Book: Generative Deep Learning by David Foster
Chapter 1: Generative Modeling
Generative Modeling
- Model the probability of observing an observation x
- p(x)
flowchart LR
    training_data -- training --> generative_model
    generative_model -- "sampling (plus random noise)" --> generated_sample
Discriminative Modeling
- Model the probability of a label y given an observation x
- p(y|x)
flowchart LR
    training_data["Training Data
    label
    observation"]
    training_data -- training --> discriminative_model
    result["Prediction
    0.83
    likely to be van Gogh"]
    discriminative_model -- prediction --> result
Conditional Generative Model
- Model the probability of an observation x given a label y
- p(x|y)
Representation Learning
- high-dimensional data
- representation
- latent space
- encoder-decoder
- manifold
The fundamentals of representation learning resemble the mathematics of non-linear behavior used in electrical engineering and digital communications theory.
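A minimal encoder-decoder sketch (my own illustration, not the book's code; the 2-dimensional latent space and the 28x28 input shape are assumptions) showing how a high-dimensional observation is mapped to a point in a latent space and back:

```python
# Encoder-decoder sketch (assumed Keras/TensorFlow setup; latent_dim and the
# 28x28 input shape are hypothetical). The encoder compresses a
# high-dimensional observation into a low-dimensional latent representation;
# the decoder maps a latent point back to the original data space.
from tensorflow.keras import layers, models

latent_dim = 2  # hypothetical size of the latent space

encoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(latent_dim),                 # point in the latent space
])

decoder = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),              # back to the original observation shape
])
```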
Chapter 2: Deep Learning
Multilayer Perceptron (MLP)
- discriminative model
- supervised learning
- loss function: compare predicted to actual
- optimizer: adjusts the weights of the neural network based on the gradient of the loss function
    - Adam (Adaptive Moment Estimation)
    - RMSProp (Root Mean Square Propagation)
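A minimal Keras sketch of such an MLP (my own illustration, not the book's code; the input shape, layer sizes, and learning rate are assumptions) showing the loss function and the Adam optimizer wired together:

```python
# MLP sketch (assumed Keras/TensorFlow setup; input shape, layer sizes, and
# learning rate are hypothetical, not the book's exact values).
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),          # assumed image input
    layers.Flatten(),
    layers.Dense(200, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(10, activation="softmax"),   # probabilities over 10 classes
])

model.compile(
    loss="categorical_crossentropy",                   # compares predicted to actual labels
    optimizer=optimizers.Adam(learning_rate=0.0005),   # adjusts weights from the loss gradient
    metrics=["accuracy"],
)

# supervised training on labelled data (x_train, y_train assumed to exist):
# model.fit(x_train, y_train, batch_size=32, epochs=10)
```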
 
Convolutional Neural Network (CNN)
- convolutional layer: a collection of filters applied across the input
- strides: step size used to move the filter across the input
- padding: padding="same" pads the input data with zeros so the output is the same size as the input when strides=1
- stacking: convolutional layers are stacked so deeper layers learn higher-level features (see the Conv2D sketch below)
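A short Conv2D sketch (assumed Keras/TensorFlow setup; the filter counts and input shape are illustrative assumptions) showing strides, padding="same", and stacking:

```python
# Conv2D sketch (assumed Keras/TensorFlow setup; filter counts and input
# shape are hypothetical) showing strides, padding="same", and stacking.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    # strides=1 with padding="same": output stays 32x32
    layers.Conv2D(filters=10, kernel_size=(3, 3), strides=1, padding="same", activation="relu"),
    # strides=2 halves the spatial size: 32x32 -> 16x16
    layers.Conv2D(filters=20, kernel_size=(3, 3), strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
```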
- Batch normalization: helps prevent the exploding gradient problem, where the gradient grows too large and the weights oscillate wildly
- covariate shift: as the weights move farther away from their random initial values, the distribution of each layer's inputs drifts during training
- training with batch normalization reduces the covariate shift problem
- prediction using batch normalization: the layer normalizes with its stored moving averages rather than the current batch's statistics
- trainable parameters
    - scale (gamma)
    - shift (beta)
- nontrainable parameters
    - moving average of the mean
    - moving average of the standard deviation
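A minimal sketch (assumed Keras/TensorFlow setup; layer sizes are illustrative) of where these parameters live; model.summary() reports gamma/beta as trainable and the moving statistics as non-trainable:

```python
# BatchNormalization sketch (assumed Keras/TensorFlow setup; layer sizes are
# hypothetical). The layer learns scale (gamma) and shift (beta) during
# training and keeps moving averages of the mean/variance for prediction.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding="same"),
    layers.BatchNormalization(),   # trainable: gamma, beta; non-trainable: moving mean, moving variance
    layers.Activation("relu"),
])

model.summary()  # the summary separates trainable from non-trainable parameter counts
```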
 
- Dropout
    - during training, a random set of units from the previous layer have their outputs set to zero
    - reduces reliance on any single unit, so the network generalizes better to unseen data
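A minimal Dropout sketch (assumed Keras/TensorFlow setup; the 0.25 rate and layer sizes are arbitrary examples), active only while training:

```python
# Dropout sketch (assumed Keras/TensorFlow setup; the 0.25 rate is an
# arbitrary example). During training, 25% of the previous layer's outputs
# are set to zero at random; at prediction time the layer passes values through.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(rate=0.25),
    layers.Dense(10, activation="softmax"),
])
```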
 
- Modern architectures tend to favor batch normalization over dropout for regularization