This post documents a detailed proof of the conditions to achieve optimality for the original GAN.

First, let’s recap the GAN formulation. We have a minimax game

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big].
$$

The goal is to maximize $V(D, G)$ over $D$ while $G$ is fixed, and at the same time minimize $V(D, G)$ over $G$ while $D$ is fixed.

In this formulation, both $D$ and $G$ are neural networks (e.g. Multilayer Perceptrons or CNNs) parameterized by $\theta_d$ and $\theta_g$, respectively.
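As a quick illustration of this value function (this is my own sketch, not code from the post; `D`, `G`, `x_real`, and `z_noise` are hypothetical callables and arrays), a Monte Carlo estimate of $V(D, G)$ from finite batches could look like this:

```python
import numpy as np

def value_fn(D, G, x_real, z_noise):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    D: callable mapping samples to probabilities in (0, 1)
    G: callable mapping noise vectors to generated samples
    x_real: batch of samples drawn from p_data
    z_noise: batch of noise vectors drawn from p_z
    """
    eps = 1e-12  # numerical guard against log(0)
    term_real = np.mean(np.log(D(x_real) + eps))            # E_{x ~ p_data}[log D(x)]
    term_fake = np.mean(np.log(1.0 - D(G(z_noise)) + eps))  # E_{z ~ p_z}[log(1 - D(G(z)))]
    return term_real + term_fake
```

The Discriminator's training step would ascend this quantity in $\theta_d$, while the Generator's step would descend it in $\theta_g$.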

We want to find

  • The condition that $D$ must satisfy to maximize $V(D, G)$, while keeping $G$ fixed (Proposition 1)
  • The condition that $G$ must satisfy to minimize $V(D, G)$, while keeping $D$ fixed (Proposition 2)

A note on notation and assumptions:

  • Both $x$ and $z$ are random vectors (i.e. vectors where each component is a random variable), with $x \sim p_{\text{data}}(x)$ and $z \sim p_z(z)$.

Proposition 1: When $G$ is fixed, the optimal Discriminator is

$$
D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.
$$

Proof:

Note that $z$ is a random vector with $z \sim p_z(z)$, so $G(z)$ is also a random vector that follows some distribution, denoted $p_g$. We can therefore rewrite the value function as an integral over $x$ alone:

$$
V(D, G) = \int_x p_{\text{data}}(x) \log D(x)\, dx + \int_z p_z(z) \log\big(1 - D(G(z))\big)\, dz
        = \int_x \Big[ p_{\text{data}}(x) \log D(x) + p_g(x) \log\big(1 - D(x)\big) \Big]\, dx.
$$

Because $z$ is integrated out, the integrand only depends on $x$, and $V(D, G)$ is maximized if the expression being integrated,

$$
p_{\text{data}}(x) \log D(x) + p_g(x) \log\big(1 - D(x)\big),
$$

is maximized pointwise for every $x$.

Let $a = p_{\text{data}}(x)$, $b = p_g(x)$, $y = D(x)$, and $f(y) = a \log y + b \log(1 - y)$. Then

$$
f'(y) = \frac{a}{y} - \frac{b}{1 - y} = 0 \iff y = \frac{a}{a + b}.
$$

It follows that $y = \frac{a}{a + b}$ is a necessary condition for achieving the optimum.

We also have $f''(y) = -\frac{a}{y^2} - \frac{b}{(1 - y)^2} < 0$, because $a \geq 0$, $b \geq 0$ (not both zero), and $y \in (0, 1)$. So $f(y)$ achieves its maximum at $y = \frac{a}{a + b}$, i.e. $D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$. End of Proof.
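As a quick sanity check of Proposition 1 (a sketch of my own; the values of $a$ and $b$ are arbitrary, not from the post), the script below evaluates $f(y) = a \log y + b \log(1 - y)$ on a grid and confirms the maximizer sits at $a / (a + b)$:

```python
import numpy as np

# Arbitrary non-negative "densities" at a fixed point x (illustrative values only)
a, b = 0.7, 0.3   # a = p_data(x), b = p_g(x)

y = np.linspace(1e-6, 1 - 1e-6, 100_000)   # candidate discriminator outputs D(x)
f = a * np.log(y) + b * np.log(1 - y)      # the pointwise objective from the proof

y_best = y[np.argmax(f)]
print(f"grid maximizer  = {y_best:.4f}")      # ~0.7000
print(f"a / (a + b)     = {a / (a + b):.4f}")  # 0.7000
```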

Proposition 2: When $D$ is optimal and fixed, we have $p_{g^*} = p_{\text{data}}$ almost everywhere, where $p_{g^*}$ is the probability distribution of the samples generated from the optimal Generator $G^*$.

Proof:

When $D$ is optimal and fixed, we have

$$
\begin{aligned}
\max_D V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D^*_G(x)\big] + \mathbb{E}_{x \sim p_g}\big[\log\big(1 - D^*_G(x)\big)\big] \\
&= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right].
\end{aligned}
$$

We denote this quantity $C(G) = \max_D V(D, G)$ to indicate that it is now a function of $G$.

Next we will show the relationship between $C(G)$ and the Jensen-Shannon Divergence (between $p_{\text{data}}$ and $p_g$):

$$
\begin{aligned}
C(G) &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right] \\
&= -\log 4 + \mathrm{KL}\!\left(p_{\text{data}} \,\Big\|\, \frac{p_{\text{data}} + p_g}{2}\right) + \mathrm{KL}\!\left(p_g \,\Big\|\, \frac{p_{\text{data}} + p_g}{2}\right) \\
&= -\log 4 + 2 \cdot \mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big).
\end{aligned}
$$

It follows that $C(G) \geq -\log 4$, since $\mathrm{JSD}(p_{\text{data}} \,\|\, p_g) \geq 0$, and $C(G)$ is minimized only when $\mathrm{JSD}(p_{\text{data}} \,\|\, p_g) = 0$, i.e. the two probability distributions are equal almost everywhere: $p_g = p_{\text{data}}$. End of proof.
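To make the last step concrete, here is a small sketch (my own illustration with made-up discrete distributions, not part of the original derivation) that computes $C(G) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \,\|\, p_g)$ and confirms it bottoms out at $-\log 4$ exactly when $p_g = p_{\text{data}}$:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, skipping zero-probability bins of p."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def c_of_g(p_data, p_g):
    """C(G) = -log 4 + 2 * JSD(p_data || p_g) for discrete p_data, p_g."""
    m = 0.5 * (p_data + p_g)                       # the mixture (p_data + p_g) / 2
    jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)
    return -np.log(4) + 2 * jsd

p_data = np.array([0.2, 0.5, 0.3])                 # made-up "real" distribution
print(c_of_g(p_data, p_data))                      # -log 4 ~ -1.3863 (the minimum)
print(c_of_g(p_data, np.array([0.6, 0.2, 0.2])))   # strictly greater than -log 4
```

Any $p_g$ that differs from $p_{\text{data}}$ on a set of positive probability yields a strictly positive JSD, and hence a value of $C(G)$ strictly above $-\log 4$.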