KL divergence

InformationTheory.KLdivergence.Hist (Type)
Hist(; bins_x::Tuple = (-1,), bins_y::Tuple = (-1,))

A method for calculating KL divergence using histograms.

Fields

  • bins_x::Tuple: A tuple specifying the binning strategy for the histogram of the first distribution (P). If (-1,), the binning is determined automatically by StatsBase.fit. Otherwise, a tuple of bin edges for each dimension should be provided.
  • bins_y::Tuple: A tuple specifying the binning strategy for the histogram of the second distribution (Q). If (-1,), the binning is determined automatically by StatsBase.fit. Otherwise, a tuple of bin edges for each dimension should be provided.
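As an illustration, the type might be constructed as follows. This is a hypothetical usage sketch assuming the package's exported names; the keyword form follows the signature above:

```julia
using InformationTheory.KLdivergence

# Automatic binning, delegated to StatsBase.fit
m_auto = Hist()

# Explicit bin edges: one tuple entry per dimension of the data
edges = range(-5, 5; length = 41)
m_edges = Hist(bins_x = (edges,), bins_y = (edges,))
```

With explicit edges, both histograms are fit on a fixed grid, which makes the two density estimates directly comparable bin by bin.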
InformationTheory.KLdivergence.kNN (Type)
kNN(; k::Int = 5)

A method for calculating KL divergence using a k-nearest neighbors (k-NN) based estimator.

Fields

  • k::Int: The number of nearest neighbors to consider for each point.
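A hypothetical end-to-end call might look like the following; the exported names are assumed from the signatures in this document, and the data layout (one vector per dimension) follows the `kldiv` argument description below:

```julia
using InformationTheory.KLdivergence

method = kNN(k = 5)                    # use the 5 nearest neighbors

# Two-dimensional samples: one vector per dimension
x = (randn(1000), randn(1000))         # data from P
y = (randn(1000) .+ 1.0, randn(1000))  # data from Q, shifted in the first dimension
D = kldiv(method, x, y)                # estimate of D(P||Q)
```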
InformationTheory.KLdivergence.kldiv (Method)
kldiv(method::Hist, x::Tuple, y::Tuple)

Calculates the KL divergence between two distributions, P and Q, represented by data x and y, using a histogram-based method.

Arguments

  • method::Hist: The histogram-based KL divergence calculation method.
  • x::Tuple: A tuple of vectors representing the data for the first distribution (P). Each vector is a dimension.
  • y::Tuple: A tuple of vectors representing the data for the second distribution (Q). Each vector is a dimension.

Returns

  • D::Float64: The calculated KL divergence D(P||Q).

Details

The function fits histograms to the data x and y to approximate the probability density functions p and q of the two distributions. The KL divergence D(P||Q) is then estimated by summing p(x) * log(p(x) / q(x)) over the histogram bins.

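To make the procedure concrete, here is a minimal self-contained 1-D sketch of a histogram-based KL estimate, using only Base Julia. The function name, the shared-edges simplification, and the zero-bin handling are illustrative choices, not the package's internals:

```julia
# Sketch of a histogram KL estimate for 1-D data, assuming both samples
# share the same bin edges.
function kldiv_hist_sketch(x::Vector{Float64}, y::Vector{Float64}, edges::AbstractRange)
    nb = length(edges) - 1
    px = zeros(nb)
    qy = zeros(nb)
    # Map a value to its bin index (values outside the range go to the edge bins)
    binindex(v) = clamp(searchsortedlast(edges, v), 1, nb)
    for v in x
        px[binindex(v)] += 1
    end
    for v in y
        qy[binindex(v)] += 1
    end
    px ./= sum(px)                     # normalize counts to bin probabilities
    qy ./= sum(qy)
    D = 0.0
    for i in 1:nb
        # Skip bins where either estimate is zero (one common, slightly
        # biased convention; p > 0 with q = 0 would make the sum infinite)
        if px[i] > 0 && qy[i] > 0
            D += px[i] * log(px[i] / qy[i])
        end
    end
    return D
end

x = randn(10_000)              # sample from P = N(0, 1)
y = randn(10_000) .+ 0.5       # sample from Q = N(0.5, 1)
D = kldiv_hist_sketch(x, y, -5.0:0.25:5.0)
```

For these two Gaussians the true divergence is 0.5^2 / 2 = 0.125, so the estimate should land near that value; the bin width trades bias against variance.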
InformationTheory.KLdivergence.kldiv (Method)
kldiv(method::kNN, x::Tuple, y::Tuple)

Calculates the KL divergence between two distributions, P and Q, represented by data x and y, using a k-NN based method.

Arguments

  • method::kNN: The k-NN based KL divergence calculation method.
  • x::Tuple: A tuple of vectors representing the data for the first distribution (P).
  • y::Tuple: A tuple of vectors representing the data for the second distribution (Q).

Returns

  • D::Float64: The calculated KL divergence D(P||Q).

Details

This function uses a non-parametric estimator based on the distances from each point in x to its k-nearest neighbors within x and within y. Because it avoids explicit binning, it tends to scale better to high-dimensional data than the histogram-based method.

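One standard estimator of this family (Wang, Kulkarni & Verdú, 2009) can be sketched in a few lines for 1-D data using only Base Julia. This is an illustration of the technique, not the package's actual implementation, and the O(n^2) neighbor search is kept naive for clarity:

```julia
# 1-D sketch of the classic k-NN KL estimator:
#   D(P||Q) ≈ (d/n) * Σ_i log(ν_k(x_i) / ρ_k(x_i)) + log(m / (n - 1))
# where ρ_k is the distance from x_i to its k-th nearest neighbor in x
# (excluding itself), ν_k the distance to its k-th nearest neighbor in y,
# and d = 1 here.
function kldiv_knn_sketch(x::Vector{Float64}, y::Vector{Float64}; k::Int = 5)
    n, m = length(x), length(y)
    D = 0.0
    for xi in x
        # k-th smallest distance to the other points of x (self excluded)
        rho = partialsort([abs(xi - xj) for xj in x if xj != xi], k)
        # k-th smallest distance to the points of y
        nu  = partialsort([abs(xi - yj) for yj in y], k)
        D += log(nu / rho)
    end
    return D / n + log(m / (n - 1))
end

x = randn(2000)              # sample from P = N(0, 1)
y = randn(2000) .+ 0.5       # sample from Q = N(0.5, 1)
D = kldiv_knn_sketch(x, y; k = 5)
```

The estimator needs no density estimate at all, only local distances, which is why this approach extends naturally to higher dimensions (with `d/n` replacing `1/n` and Euclidean distances replacing `abs`).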