Normalizing Flows (NF)

Normalizing Flows (NF) model complex distributions in high-dimensional spaces without the need to reduce dimensionality. This is accomplished by transforming a simple base distribution into a complex distribution of the same dimensionality through a sequence of invertible transformation functions. The probability density at a point under the complex distribution is then computed from the density of the simple distribution and the Jacobians of the transformation functions.

Key Concepts

  1. Invertible Transformations: Each transformation in the NF must be invertible, allowing for the original data to be precisely recovered from the transformed data.

  2. Density Transformation: The probability density of the original data is computed from the density of the transformed data via the Jacobian determinant of the transformation (the change-of-variables formula).

  3. Composite Transformations: Multiple simple transformations are composed together to form a complex transformation flow. Each simple transformation is invertible and has a known Jacobian determinant.

Mathematical Representation

Given data x, we aim to map it to a simple distribution z (usually a standard Gaussian) through a series of invertible transformations f_1, f_2, ..., f_K:

$$ z = f_K \circ f_{K-1} \circ \cdots \circ f_1(x) $$

Since these transformations are invertible, we can map z back to the original data space:

$$ x = f_1^{-1} \circ f_2^{-1} \circ \cdots \circ f_K^{-1}(z) $$

To calculate the probability density of x, we use the Jacobian determinant of the transformation:

$$ p_X(x) = p_Z\big(f(x)\big) \left| \det \frac{\partial f(x)}{\partial x} \right|, \qquad f = f_K \circ \cdots \circ f_1 $$

where:

$$ \frac{\partial f(x)}{\partial x} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} $$

is the Jacobian matrix of f (f_i here denotes the i-th output component of f, not the i-th layer). For the composition, with z_0 = x and z_k = f_k(z_{k-1}) so that z_K = z, the log-density decomposes into one term per transformation:

$$ \log p_X(x) = \log p_Z(z) + \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}} \right| $$

There are several things to note here:

  1. x and z need to be continuous and have the same dimension n.
  2. The Jacobian matrix is an n × n matrix whose entry at location (i, j) is the partial derivative ∂f_i/∂x_j of the i-th output component with respect to the j-th input.
  3. det(A) denotes the determinant of a square matrix A.
  4. For any invertible matrix A, det(A^{-1}) = (det A)^{-1}; applied to the inverse map x = f^{-1}(z), this gives an equivalent form of the change-of-variables formula: p_X(x) = p_Z(z) |det(∂f^{-1}(z)/∂z)|^{-1} with z = f(x).
  5. If |det(∂f(x)/∂x)| = 1 everywhere, the mapping is volume preserving, which means that the transformed distribution p_X occupies the same “volume” as the original p_Z; in that case p_X(x) = p_Z(f(x)).
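
To make the change-of-variables formula concrete, here is a minimal NumPy/SciPy sketch (all values and names are illustrative) that verifies it for a single invertible affine transformation:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Invertible affine map z = f(x) = A x + b, a one-layer "flow".
A = np.array([[2.0, 0.3],
              [0.0, 0.5]])        # invertible, since det(A) != 0
b = np.array([1.0, -1.0])

x = np.array([0.7, -0.2])
z = A @ x + b                     # forward transformation z = f(x)

# Change of variables: p_X(x) = p_Z(f(x)) * |det(df/dx)|.
# For an affine map, the Jacobian df/dx is simply A.
log_p_x = (multivariate_normal.logpdf(z, mean=np.zeros(2), cov=np.eye(2))
           + np.log(abs(np.linalg.det(A))))

# Sanity check: x = A^{-1}(z - b) with z ~ N(0, I) is itself Gaussian,
# with mean -A^{-1} b and covariance A^{-1} A^{-T}.
A_inv = np.linalg.inv(A)
exact = multivariate_normal.logpdf(x, mean=-A_inv @ b, cov=A_inv @ A_inv.T)
print(np.isclose(log_p_x, exact))  # True: both densities agree
```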

Applications

Normalizing flows have several applications, including:

  • Generative Models: Generating new data samples similar to the training data.
  • Probability Density Estimation: Precisely estimating the probability density of complex data.
  • Data Transformation and Preprocessing: Applying complex, non-linear transformations to data for better modeling in data science and machine learning.

Combining VAE, VI, and NF

Variational Autoencoders (VAEs) can combine Variational Inference (VI) with Normalizing Flows (NF) to enhance the modeling of latent variables: the flow transforms the simple approximate posterior produced by the encoder into a more flexible distribution. This combination, known as NF-VAE or Flow-VAE, improves the expressiveness of the latent space.

Key Points

  1. Dimension Consistency: Each transformation function f_i in the NF must have the same input and output dimensions to ensure invertibility.

  2. Invertibility: The transformation f_i must be invertible, with a well-defined inverse transformation f_i^{-1}.

  3. Jacobian Determinant: Computing the Jacobian determinant of the transformation is essential for adjusting the probability density of the transformed data.
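
As an illustration of how a flow enriches a VAE posterior, the following sketch (plain NumPy, forward pass only; all parameters are hypothetical stand-ins for learned quantities) applies a stack of planar flows z_k = z_{k-1} + u · tanh(wᵀ z_{k-1} + b) to a sample from the encoder's Gaussian and accumulates the log-det corrections to log q(z | x):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3                          # latent dimension, number of flows

# Hypothetical planar-flow parameters (learned in a real Flow-VAE).
# In practice u is constrained so that u . w >= -1, which guarantees
# invertibility when the nonlinearity is tanh.
flows = [(0.1 * rng.normal(size=D),  # u
          0.1 * rng.normal(size=D),  # w
          0.0)                       # b
         for _ in range(K)]

# Reparameterized sample z_0 ~ q(z_0 | x) from the encoder's Gaussian.
mu, log_sigma = np.zeros(D), np.zeros(D)
z = mu + np.exp(log_sigma) * rng.normal(size=D)

# log q(z_0 | x) for a diagonal Gaussian.
log_q = -0.5 * np.sum(((z - mu) / np.exp(log_sigma)) ** 2
                      + 2.0 * log_sigma + np.log(2.0 * np.pi))

for u, w, b in flows:
    a = np.tanh(w @ z + b)
    psi = (1.0 - a ** 2) * w             # h'(w . z + b) * w
    log_q -= np.log(abs(1.0 + u @ psi))  # log|det df/dz| correction
    z = z + u * a                        # planar transformation

print("z_K =", z)
print("log q(z_K | x) =", log_q)
```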

Typical Transformation Functions

  • Affine Coupling Layer: Splits the input into two parts, leaves one part unchanged, and applies to the other part an affine transformation whose scale s and shift t depend on the unchanged part (a minimal implementation sketch follows this list):

    $$ y_{1:d} = x_{1:d}, \qquad y_{d+1:n} = x_{d+1:n} \odot \exp\big(s(x_{1:d})\big) + t(x_{1:d}) $$

  • Planar Flow: Introduces an invertible transformation of the form f(z) = z + u h(wᵀz + b), with parameters constrained to ensure invertibility and a Jacobian determinant that is simple to compute.

  • Nonlinear Independent Components Estimation (NICE): Utilizes additive coupling layers and rescaling layers to achieve invertible transformations with simple analytical forms.

  • Real Non-Volume Preserving (RealNVP): Extends NICE by incorporating multiplicative scaling into the coupling layers, so the transformation is no longer volume preserving.

  • Masked Autoregressive Flow (MAF): Uses an autoregressive model for the forward (density-evaluation) mapping, making likelihood computation fast and parallel while sampling remains sequential, one dimension at a time.

  • Inverse Autoregressive Flow (IAF): Inverts the generating process to parallelize sampling while maintaining efficient likelihood computation for generated points.

  • ActNorm: Shifts and scales each dimension for normalization, with the scale s and bias b typically initialized from data so that initial activations have zero mean and unit variance (a sketch follows this list):

    $$ y = s \odot x + b $$
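
Below is a minimal sketch of a RealNVP-style affine coupling layer in plain NumPy. The conditioner networks s and t are replaced by hypothetical closed-form stand-ins; in practice they are neural networks mapping x_{1:d} to vectors the size of x_{d+1:n}:

```python
import numpy as np

def conditioner(x1):
    # Hypothetical stand-in for the learned networks s(x1), t(x1).
    # Here both halves have equal size, so shapes line up with x2.
    return np.tanh(x1), x1 ** 2       # scale s (kept small), shift t

def coupling_forward(x, d):
    x1, x2 = x[:d], x[d:]
    s, t = conditioner(x1)
    y2 = x2 * np.exp(s) + t           # affine transform of the second half
    log_det = np.sum(s)               # log|det J| = sum of log-scales
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y, d):
    y1, y2 = y[:d], y[d:]
    s, t = conditioner(y1)            # y1 == x1, so s and t are recoverable
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])

x = np.array([0.5, -1.0, 2.0, 0.1])
y, log_det = coupling_forward(x, d=2)
x_rec = coupling_inverse(y, d=2)
print(np.allclose(x, x_rec))          # True: the layer inverts exactly
```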

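Similarly, here is a minimal ActNorm sketch (illustrative NumPy) with Glow-style data-dependent initialization: s and b are chosen so that the first batch has zero mean and unit variance per dimension, and the log-det term is the sum of log-scales:

```python
import numpy as np

rng = np.random.default_rng(1)
batch = rng.normal(loc=3.0, scale=2.0, size=(256, 4))  # (batch, dims)

# Data-dependent initialization: normalize the first batch
# to zero mean and unit variance per dimension.
s = 1.0 / batch.std(axis=0)
b = -batch.mean(axis=0) * s

y = s * batch + b                     # ActNorm forward: y = s * x + b
log_det = np.sum(np.log(np.abs(s)))   # per-example log|det J|
x_rec = (y - b) / s                   # exact inverse

print(y.mean(axis=0).round(6), y.std(axis=0).round(6))  # ~0 and ~1
print(np.allclose(batch, x_rec))                        # True
```
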
Conclusion

Normalizing flows provide a powerful method to represent, model, and sample from complex high-dimensional distributions without directly reducing dimensionality. They transform the probability density of a simple distribution into that of a complex distribution through a series of invertible transformations, enabling precise density modeling and sample generation for complex distributions. This approach is widely used in generative models, probability density estimation, and other machine learning tasks.