Facebook’s AI research director Yann LeCun called GANs “the most interesting idea in the last 10 years in ML.” Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning.
This article is the first in a series of three, in which we will develop, step by step, an intuitive (part 1), theoretical (part 2) and practical (part 3) understanding of the most groundbreaking invention in the field of computer vision and deep learning in the past 10 years: Generative Adversarial Networks.
In this first article, we will focus on getting comfortable with the concept of GANs, assuming that the reader has some familiarity with the basics of deep learning, such as optimisation, gradients and backpropagation.
Generative Adversarial Networks are a promising idea, introduced by Ian Goodfellow et al. in 2014, that uses adversarial training to learn a generative model capable of producing completely unseen images.
A GAN comprises two neural networks: a generator and a discriminator.
A Generator takes a random noise signal as input and generates an image from it.
Input: The noise signal is a random sample in the form of a vector, usually drawn from a uniform or normal distribution. This is also called a latent vector.
Function: The generator is required to learn the probability distribution of the real image data and use that information to transform the noise signal into a realistic image. As this image is created from a noise signal and not taken from the real data, it is a fake image.
Output: A fake image.
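As a rough illustration of the generator's input and output, here is a minimal NumPy sketch of an untrained two-layer perceptron that maps a latent vector to an image-shaped array. The layer sizes, the 28x28 image shape, the tanh output and all names are assumptions made for this example, not details from the original paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

LATENT_DIM = 100          # size of the noise vector z (assumed)
HIDDEN_DIM = 128          # width of the hidden layer (assumed)
IMAGE_SHAPE = (28, 28)    # output image size (assumed)
N_PIXELS = IMAGE_SHAPE[0] * IMAGE_SHAPE[1]

# Randomly initialised, untrained weights of a two-layer perceptron.
W1 = rng.normal(0.0, 0.02, size=(HIDDEN_DIM, LATENT_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(0.0, 0.02, size=(N_PIXELS, HIDDEN_DIM))
b2 = np.zeros(N_PIXELS)

def generator(z):
    """Map a latent vector z to a fake image with pixel values in [-1, 1]."""
    hidden = np.maximum(0.0, W1 @ z + b1)    # ReLU hidden layer
    pixels = np.tanh(W2 @ hidden + b2)       # tanh keeps pixels in [-1, 1]
    return pixels.reshape(IMAGE_SHAPE)

z = rng.standard_normal(LATENT_DIM)   # latent vector sampled from N(0, 1)
fake_image = generator(z)
print(fake_image.shape)
```

Because the weights are untrained, the output is just structured noise; training is what pushes these outputs towards realistic images.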
A Discriminator is a regular neural network classifier whose aim is to discriminate between the real and fake images that are fed to it as inputs.
Input: A set of real images and another set of similar-sized fake images generated by the generator net.
Function: The discriminator is required to learn features that distinguish a real image from a fake image, with the output being a number close to 1 for a real image and a number close to 0 for a fake image. This number is the discriminator’s estimate of the probability that the image is real.
Output: A probability value.
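The discriminator can be sketched in the same spirit: a small classifier that takes a batch of images and returns one probability per image. Everything below (layer sizes, names, and the random arrays standing in for real and fake images) is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

IMAGE_PIXELS = 28 * 28    # flattened image size (assumed)
HIDDEN_DIM = 128          # width of the hidden layer (assumed)

# Randomly initialised, untrained weights of a two-layer perceptron.
W1 = rng.normal(0.0, 0.02, size=(HIDDEN_DIM, IMAGE_PIXELS))
b1 = np.zeros(HIDDEN_DIM)
w2 = rng.normal(0.0, 0.02, size=HIDDEN_DIM)
b2 = 0.0

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def discriminator(images):
    """Return P(image is real) for a batch of images, shape (batch, 28, 28)."""
    flat = images.reshape(len(images), -1)          # (batch, 784)
    hidden = np.maximum(0.0, flat @ W1.T + b1)      # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2)                # one probability per image

# Random arrays standing in for a real batch and a generated batch.
real_batch = rng.uniform(-1.0, 1.0, size=(4, 28, 28))
fake_batch = rng.uniform(-1.0, 1.0, size=(4, 28, 28))

p_real = discriminator(real_batch)
p_fake = discriminator(fake_batch)
print(p_real, p_fake)
```

Before any training, the outputs for both batches hover around 0.5, since the network has no basis yet for telling the two apart.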
If one thinks a little more deeply about this structure, it is clear that the discriminator net acts as an adversary to the generative net, hence the name ‘Generative Adversarial Network’.
We can say that the discriminator is performing well if it correctly classifies the real and fake images, and that the generator is performing well if it generates images realistic enough that the discriminator cannot detect them as fake.
To understand this better, we will discuss it in detail using an interesting example from the original paper.
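These two notions of "performing well" can be made concrete with a small sketch. The function names, the 0.5 decision threshold and the sample probabilities below are hypothetical and chosen only for illustration.

```python
def discriminator_accuracy(p_real, p_fake, threshold=0.5):
    """Fraction of images the discriminator labels correctly."""
    correct_real = sum(p > threshold for p in p_real)    # real judged real
    correct_fake = sum(p <= threshold for p in p_fake)   # fake judged fake
    return (correct_real + correct_fake) / (len(p_real) + len(p_fake))

def generator_fooling_rate(p_fake, threshold=0.5):
    """Fraction of fake images the discriminator mistakes for real."""
    return sum(p > threshold for p in p_fake) / len(p_fake)

# Made-up discriminator outputs for four real and four fake images.
p_real = [0.9, 0.8, 0.7, 0.4]
p_fake = [0.1, 0.2, 0.6, 0.3]

print(discriminator_accuracy(p_real, p_fake))   # 0.75
print(generator_fooling_rate(p_fake))           # 0.25
```

The two quantities pull against each other: every fake image that fools the discriminator lowers its accuracy.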
“The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.”
The example here compares the generative model and the adversarial/discriminative neural net to a team of counterfeiters and the police, respectively.
The counterfeiters are trying to generate fake currency.
The police are trying to correctly distinguish between fake and real currency.
The counterfeiters generate fake currency but they will not do a great job initially.
The police will not be able to detect the fake currency immediately but will soon learn some patterns to differentiate the fake from the real.
The counterfeiters will try to learn from their mistakes. They will try to understand the patterns the police are using to differentiate the fake currency notes from the real ones, and improve upon them. They will now do a better job at generating realistic fake currency.
This time the police will struggle a bit, but will soon learn new patterns to differentiate the fake from the real.
The counterfeiters will again try to learn the new patterns that are getting their currency caught by the police, and then fix those patterns to match real currency.
This process of counterfeiters and police both learning and improving separately will go on until the police can no longer detect the differences between the real and fake currency. This means that after a series of learning steps, the counterfeiters will become skilled at creating real-looking currency.
If you have understood this, then congratulations, because this is exactly how a GAN works. The discriminator and generator are trained separately, taking turns, until they reach an equilibrium state where the generator has learnt the distribution of the real input data and the discriminator cannot differentiate between real and fake images.
Now that the intuition is clear, let’s think about how a neural network, the GAN, creates this magic. We will use the same notation as in the original paper.
The noise input z to the generator is a random sample taken from a distribution p_z(z), which is usually a normal distribution with some mean and variance.
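In practice the noise is usually drawn a whole batch at a time. A minimal sketch with NumPy follows; the batch size, latent dimension, and the choice of zero mean and unit variance are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

BATCH_SIZE = 64     # number of latent vectors per batch (assumed)
LATENT_DIM = 100    # length of each latent vector (assumed)

# One latent vector per row, each entry drawn from a normal p_z
# with mean 0 and standard deviation 1.
z_normal = rng.normal(loc=0.0, scale=1.0, size=(BATCH_SIZE, LATENT_DIM))

# The uniform alternative: each entry drawn from [-1, 1).
z_uniform = rng.uniform(low=-1.0, high=1.0, size=(BATCH_SIZE, LATENT_DIM))

print(z_normal.shape, z_uniform.shape)
```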
The generator is a multilayer perceptron, represented by a differentiable function G with trainable parameters θ_g, that learns to transform any input noise z into a sample from a distribution p_g, which it tries to match to the real data distribution. From the latent vector z, the function G tries to learn representative features that help it create an image. For example, in the case of a currency note, these could be the font style of the numbers or the position of the text.
The discriminator is also a multilayer perceptron, represented by a differentiable function D with trainable parameters θ_d, that learns to differentiate an input from the generator distribution p_g from an input x coming from the real data distribution p_data. The output is a scalar value: the probability D assigns to this input being real.
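With this notation, the original paper writes the training objective as a two-player minimax game over a value function. It is reproduced here only for reference; the symbols V (the value function), E (expectation) and D*_G (the best discriminator for a fixed generator) come from the paper, not from the text above.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator G, the best possible discriminator is
D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When the generator distribution p_g equals p_data, the second expression is 1/2 for every input, which matches the intuition that a perfectly fooled discriminator can do no better than guess.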
1. Initially, the networks know nothing about real or fake images, so the discriminator assigns more or less random probability values to all inputs, real or fake.
2. As the first part of the training, D is trained over a batch of inputs from both distributions, p_data and p_g. The losses from all the inputs are added together and backpropagated through the discriminator so that it can learn better weights and correctly classify real and fake. This is repeated for k steps.
3. As the second part of the training, G is trained over a batch of inputs from the distribution p_g and the loss is backpropagated to the generator. It is important to note that the generator’s loss function is the negative of the discriminator’s loss function, because the generator wants the fake images to be assigned a value of 1 and not 0, unlike the discriminator. Also, while the generator is being trained, the discriminator’s parameters are made non-trainable, but the gradient still backpropagates through the discriminator. In effect, the discriminator helps the generator learn what mistakes it was making. This is like the police disclosing to the counterfeiters the patterns they use to differentiate real currency from fake.
4. The training process in steps 2 and 3 is repeated until the optimal state p_g = p_data is achieved.
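The alternating procedure can be sketched on a deliberately tiny problem: one-dimensional "images", a linear generator G(z) = a*z + b, and a logistic discriminator D(x) = sigmoid(w*x + c), with gradients written out by hand. All of these choices, along with the real-data distribution N(4, 1), the learning rate and the function names, are assumptions made for illustration. The generator here minimises log(1 - D(G(z))), i.e. the part of the negated discriminator loss that depends on G.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sigmoid(a):
    return 0.5 * (1.0 + np.tanh(0.5 * a))   # overflow-free logistic function

def generate(g, z):
    """G(z): map noise to fake samples."""
    return g["a"] * z + g["b"]

def discriminate(d, x):
    """D(x): probability that x is real."""
    return sigmoid(d["w"] * x + d["c"])

def discriminator_step(d, g, x_real, z, lr):
    """One gradient step on -[log D(x) + log(1 - D(G(z)))], updating only D."""
    x_fake = generate(g, z)
    s_real, s_fake = discriminate(d, x_real), discriminate(d, x_fake)
    grad_w = np.mean((s_real - 1.0) * x_real) + np.mean(s_fake * x_fake)
    grad_c = np.mean(s_real - 1.0) + np.mean(s_fake)
    return {"w": d["w"] - lr * grad_w, "c": d["c"] - lr * grad_c}

def generator_step(d, g, z, lr):
    """One gradient step on log(1 - D(G(z))), updating only G.

    D's parameters stay fixed, but the gradient flows through D (via w).
    """
    s_fake = discriminate(d, generate(g, z))
    grad_x = -s_fake * d["w"]     # gradient of the loss w.r.t. each fake sample
    grad_a = np.mean(grad_x * z)
    grad_b = np.mean(grad_x)
    return {"a": g["a"] - lr * grad_a, "b": g["b"] - lr * grad_b}

def sample_real(n):
    return rng.normal(4.0, 1.0, size=n)   # stand-in for p_data (assumed)

def sample_noise(n):
    return rng.standard_normal(n)         # p_z

d = {"w": 0.0, "c": 0.0}    # discriminator parameters (theta_d)
g = {"a": 1.0, "b": 0.0}    # generator parameters (theta_g)
K, BATCH, LR = 1, 64, 0.05

for iteration in range(200):
    for _ in range(K):   # first part: train D for k steps
        d = discriminator_step(d, g, sample_real(BATCH), sample_noise(BATCH), LR)
    g = generator_step(d, g, sample_noise(BATCH), LR)   # second part: train G

print("mean of generated samples:", g["b"])
```

Note that `generator_step` returns new generator parameters and leaves `d` untouched, mirroring the "non-trainable discriminator" in the second part of training. A toy like this can oscillate rather than settle, so it should be read as a picture of the control flow, not as evidence of convergence.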
GANs have shown immense potential in the field of computer vision because of their varied applications. As intelligent machines require a lot of training data, this framework for generating realistic images can help to create training data for many other tasks.
Hopefully, this article was interesting and helpful in understanding GANs.
In part 2, we will discuss in more detail the theoretical analysis and the difficulties in training GANs.
In part 3, we will discuss some GAN applications and the code implementation of a popular GAN framework, better known as DCGAN (Deep Convolutional GAN).
Data Science enthusiast with a deep interest in AI and a desire to share her learnings in the simplest manner. Has prior software development experience in building Windows applications and blockchain applications.