# GAN & DCGAN 讨论

Posted by c cm on July 23, 2017

GAN的两篇入门paper

• Generative adversarial nets(Ian Goodfellow, 2014)，简称为GAN
• Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks(Alec Radford, etc, 2015)，简称为DCGAN。

### 1. GAN对噪声z的分布有要求吗？常用哪些分布？

G和D的博弈，G需要尽量贴近p_data，D需要识别出真实数据和G生成的数据

D的loss 伯努利分布的对数似然函数

G的loss

• zero game时，Equation 1
• 实际训练,Equation 2
• 不用zero game loss的原因，In practice, equation 1 may not provide sufficient gradient for G to learn well. Early in learning,when G is poor, D can reject samples with high confidence because they are clearly different fromthe training data. In this case, log(1 − D(G(z))) saturates. Rather than training G to minimizelog(1 − D(G(z))) we can train G to maximize log D(G(z)). This objective function results in thesame fixed point of the dynamics of G and D but provides much stronger gradients early in learning.

GAN优化的目标是JSD，是指在D最优的时候，GAN的目标才等价于JSD。

### 4. 在一轮迭代中，G和D的更新次数哪个多？为了让G学得更好一点，能不能让G多更新？

D更新次数更多，如果G更新太多次会导致diversity不足。

1. in particular, G must not be trained too much without updating D, in order to avoid “the Helvetica scenario” in which G collapses too many values of z to the same value of x to have enough diversity to model p_data
2. Optimizing D to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting. Instead, we alternate between k steps of optimizing D and one step of optimizing G. This results in D being maintained near its optimal solution, so long as G changes slowly enough.

### 5. 在GAN中添加batch normalization层有什么作用？

1. 解决随机初始化参数不理想，
2. 防止梯度爆炸，只是降低概率，其他不可控因素还是可能导致梯度爆炸

DCGAN中经验：

Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer

### 6. DCGAN对激活函数做了哪些限制？ DCGAN哪些地方使用卷积，哪些地方使用反卷积(fractional-strided卷积)，哪些地方使用全连接？

• 激活函数：
• G: ReLU, output tanh
• D: leaky rectified activation (GAN mahout), output softmax/sigmoit
• 卷积应该是D降采样用的，反卷积是G上采样用的
• 全连接层: 去掉