Book: Generative Deep Learning by David Foster
Chapter 1: Generative Modeling
Generative Modeling
- Model the probability of observing an observation x
- p(x)
flowchart LR
    training_data -- training --> generative_model
    generative_model -- "sampling (plus random noise)" --> generated_sample
Discriminative Modeling
- Model the probability of a label y given an observation x
- p(y|x)
flowchart LR
    training_data["Training Data
    label
    observation"]
    training_data -- training --> discriminative_model
    result["Prediction
    0.83
    likely to be van Gogh"]
    discriminative_model -- prediction --> result
Conditional Generative Model
- Model the probability of an observation x given a label y
- p(x|y)
Representation Learning
- high-dimensional data
- representation
- latent space
- encoder-decoder
- manifold
The fundamentals of representation learning resemble the mathematics of non-linear behavior used in electrical engineering and digital communications theory.
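A minimal encoder-decoder sketch (my own illustration, not the book's code; the 2-dimensional latent space and the 28x28 input shape are assumptions) showing how a high-dimensional observation is mapped to a point in a latent space and back:

```python
# Encoder-decoder sketch (assumed Keras/TensorFlow setup; latent_dim and the
# 28x28 input shape are hypothetical). The encoder compresses a
# high-dimensional observation into a low-dimensional latent representation;
# the decoder maps a latent point back to the original data space.
from tensorflow.keras import layers, models

latent_dim = 2  # hypothetical size of the latent space

encoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(latent_dim),                 # point in the latent space
])

decoder = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),              # back to the original observation shape
])
```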
Chapter 2: Deep Learning
Multilayer Perceptron (MLP)
- discriminative model
- supervised learning
- loss function: compare predicted to actual
- optimizer: adjusts the weights of the neural network based on the gradient of the loss function
    - Adam (Adaptive Moment Estimation)
    - RMSProp (Root Mean Square Propagation)
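A minimal Keras sketch of such an MLP (my own illustration, not the book's code; the input shape, layer sizes, and learning rate are assumptions) showing the loss function and the Adam optimizer wired together:

```python
# MLP sketch (assumed Keras/TensorFlow setup; input shape, layer sizes, and
# learning rate are hypothetical, not the book's exact values).
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),          # assumed image input
    layers.Flatten(),
    layers.Dense(200, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(10, activation="softmax"),   # probabilities over 10 classes
])

model.compile(
    loss="categorical_crossentropy",                   # compares predicted to actual labels
    optimizer=optimizers.Adam(learning_rate=0.0005),   # adjusts weights from the loss gradient
    metrics=["accuracy"],
)

# supervised training on labelled data (x_train, y_train assumed to exist):
# model.fit(x_train, y_train, batch_size=32, epochs=10)
```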
 
Convolutional Neural Network (CNN)
- convolutional layer: a collection of filters applied across the input
- strides: step size used to move the filter across the input
- padding: padding="same" pads the input data with zeros so the output is the same size as the input when strides=1
- stacking: convolutional layers are stacked so deeper layers learn higher-level features (see the Conv2D sketch below)
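A short Conv2D sketch (assumed Keras/TensorFlow setup; the filter counts and input shape are illustrative assumptions) showing strides, padding="same", and stacking:

```python
# Conv2D sketch (assumed Keras/TensorFlow setup; filter counts and input
# shape are hypothetical) showing strides, padding="same", and stacking.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    # strides=1 with padding="same": output stays 32x32
    layers.Conv2D(filters=10, kernel_size=(3, 3), strides=1, padding="same", activation="relu"),
    # strides=2 halves the spatial size: 32x32 -> 16x16
    layers.Conv2D(filters=20, kernel_size=(3, 3), strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
```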
- Batch normalization: helps prevent the exploding gradient problem, where the gradient grows too large and the weights oscillate wildly
- covariate shift: as the weights move farther away from their random initial values, the distribution of each layer's inputs drifts during training
- training with batch normalization reduces the covariate shift problem
- prediction using batch normalization: the layer normalizes with its stored moving averages rather than the current batch's statistics
- trainable parameters
    - scale (gamma)
    - shift (beta)
- nontrainable parameters
    - moving average of the mean
    - moving average of the standard deviation
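A minimal sketch (assumed Keras/TensorFlow setup; layer sizes are illustrative) of where these parameters live; model.summary() reports gamma/beta as trainable and the moving statistics as non-trainable:

```python
# BatchNormalization sketch (assumed Keras/TensorFlow setup; layer sizes are
# hypothetical). The layer learns scale (gamma) and shift (beta) during
# training and keeps moving averages of the mean/variance for prediction.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding="same"),
    layers.BatchNormalization(),   # trainable: gamma, beta; non-trainable: moving mean, moving variance
    layers.Activation("relu"),
])

model.summary()  # the summary separates trainable from non-trainable parameter counts
```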
 
- Dropout
    - during training, a random set of units from the previous layer have their outputs set to zero
    - reduces reliance on any single unit, so the network generalizes better to unseen data
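A minimal Dropout sketch (assumed Keras/TensorFlow setup; the 0.25 rate and layer sizes are arbitrary examples), active only while training:

```python
# Dropout sketch (assumed Keras/TensorFlow setup; the 0.25 rate is an
# arbitrary example). During training, 25% of the previous layer's outputs
# are set to zero at random; at prediction time the layer passes values through.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(rate=0.25),
    layers.Dense(10, activation="softmax"),
])
```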
 
- Modern architectures tend to favor batch normalization over dropout for regularization