Shannon Entropy
InformationTheory.ShannonEntropy.Hist — Type
Hist(; bins::Tuple = (-1,))
A method for calculating Shannon entropy using histograms.
Fields
bins::Tuple: A tuple specifying the binning strategy for the histogram. If (-1,), the binning is determined automatically by StatsBase.fit. Otherwise, a tuple of bin edges for each dimension should be provided.
InformationTheory.ShannonEntropy.KSG — Type
KSG(; k::Int = 5)
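A method for calculating Shannon entropy using the Kozachenko-Leonenko k-nearest-neighbor estimator, the entropy estimator on which the Kraskov-Stögbauer-Grassberger (KSG) approach is built.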
Fields
k::Int: The number of nearest neighbors to consider for each point.
InformationTheory.ShannonEntropy.ShannonEntropyMethod — Type
ShannonEntropyMethod
An abstract type for different methods of calculating Shannon entropy.
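The layout described here (one abstract type, one concrete type per estimator, and one generic function with a method per type) can be sketched with Julia's multiple dispatch. This is only an illustration under stated assumptions: every name ending in Sketch, and the describe function, are hypothetical and are not taken from the package source. Only the default values mirror the documented signatures.

```julia
# Hypothetical names throughout; this mirrors the documented layout,
# not the package's actual source code.
abstract type EntropyMethodSketch end

# Keyword constructors with defaults, matching the documented signatures
# Hist(; bins::Tuple = (-1,)) and KSG(; k::Int = 5).
Base.@kwdef struct HistSketch <: EntropyMethodSketch
    bins::Tuple = (-1,)
end

Base.@kwdef struct KSGSketch <: EntropyMethodSketch
    k::Int = 5
end

# One generic function, one method per estimator type.
describe(m::HistSketch) =
    m.bins == (-1,) ? "histogram, automatic bins" : "histogram, user-supplied bin edges"
describe(m::KSGSketch) = "k-nearest neighbors with k = $(m.k)"

for m in (HistSketch(), HistSketch(bins = (0.0:0.5:2.0,)), KSGSketch(k = 3))
    println(describe(m))
end
```

With this pattern, a new estimator only needs a new subtype and a new method; existing call sites stay unchanged.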
InformationTheory.ShannonEntropy.shannon — Method
shannon(method::Hist, x::AbstractVector...)
Calculates the Shannon entropy of a set of variables x using a histogram-based method.
Arguments
method::Hist: The histogram-based Shannon entropy calculation method.
x::AbstractVector...: One or more vectors representing the data for which to calculate the entropy. Each vector is a dimension of the data.
Returns
H::Float64: The calculated Shannon entropy.
Details
The function first fits a histogram to the data x. The probability density function (PDF) is then approximated from the histogram. Finally, the Shannon entropy is calculated by integrating -p(x) * log(p(x)) over the domain of x, where p(x) is the PDF.
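The three steps above (fit a histogram, turn counts into a piecewise-constant density, integrate -p(x) * log(p(x))) can be sketched for the one-dimensional case as follows. This is a minimal illustration, not the package's implementation: hist_entropy_sketch and its nbins keyword are hypothetical, equal-width bins are assumed, and the natural logarithm is assumed, so the result is in nats.

```julia
# Sketch of a 1-D histogram estimate of differential entropy.
# `hist_entropy_sketch` and `nbins` are hypothetical names, not part of the package.
function hist_entropy_sketch(x::AbstractVector{<:Real}; nbins::Int = 10)
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    width > 0 || throw(ArgumentError("data must span a nonzero range"))

    # Step 1: fit the histogram (equal-width bins over [lo, hi]).
    counts = zeros(Int, nbins)
    for xi in x
        # Map each sample to a bin index; the maximum goes into the last bin.
        idx = min(floor(Int, (xi - lo) / width) + 1, nbins)
        counts[idx] += 1
    end

    # Steps 2 and 3: approximate the PDF per bin and integrate -p * log(p).
    n = length(x)
    H = 0.0
    for c in counts
        c == 0 && continue          # empty bins contribute nothing
        p = c / (n * width)         # piecewise-constant density in this bin
        H -= p * log(p) * width     # integral of -p log p over the bin
    end
    return H
end

# Evenly spread samples on [0, 2]; a uniform density on that interval
# has differential entropy log(2).
x = collect(range(0, 2; length = 1000))
println(hist_entropy_sketch(x; nbins = 10))
```

The estimate depends on the bin width, which is one reason the bins field lets callers supply explicit edges.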
InformationTheory.ShannonEntropy.shannon — Method
shannon(method::KSG, x::AbstractVector...)
Calculates the Shannon entropy of a set of variables x using the KSG estimator.
Arguments
method::KSG: The KSG Shannon entropy calculation method.
x::AbstractVector...: One or more vectors representing the data for which to calculate the entropy. Each vector is a dimension of the data.
Returns
H::Float64: The calculated Shannon entropy.
Details
The function uses the k-nearest neighbors (k-NN) algorithm to estimate the probability density function (PDF) of the data. The Shannon entropy is then calculated from the distance of each point to its k-th nearest neighbor. Because no binning is involved, this method tends to scale better than histograms as the number of dimensions grows.
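To make the role of the k-th neighbor distances concrete, here is a sketch of the standard Kozachenko-Leonenko estimate, H ≈ ψ(N) - ψ(k) + log(c_d) + (d/N) Σ log(ε_i), where ε_i is the distance from point i to its k-th nearest neighbor and c_d is the volume of the d-dimensional unit ball. The sketch rests on several assumptions that may differ from the package: it uses the max-norm (so c_d = 2^d), a brute-force neighbor search, and natural logarithms. The names knn_entropy_sketch and digamma_int are hypothetical.

```julia
# Sketch of the Kozachenko-Leonenko k-NN entropy estimate with the max-norm.
# `knn_entropy_sketch` and `digamma_int` are hypothetical names, not the package's code.

# Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j.
digamma_int(n::Int) =
    -Base.MathConstants.eulergamma + sum(1 / j for j in 1:(n - 1); init = 0.0)

function knn_entropy_sketch(x::AbstractVector...; k::Int = 5)
    d = length(x)                  # one vector per dimension
    n = length(x[1])
    all(length(v) == n for v in x) || throw(ArgumentError("all dimensions need the same length"))
    1 <= k < n || throw(ArgumentError("k must be at least 1 and smaller than the sample count"))

    log_eps_sum = 0.0
    dists = Vector{Float64}(undef, n - 1)
    for i in 1:n
        m = 0
        for j in 1:n
            j == i && continue
            m += 1
            # Chebyshev (max-norm) distance between samples i and j.
            dists[m] = maximum(abs(v[i] - v[j]) for v in x)
        end
        # Distance to the k-th nearest neighbor (brute force, O(n^2) overall).
        # Duplicate samples can make this zero, which yields log(0) = -Inf.
        log_eps_sum += log(partialsort!(dists, k))
    end

    # The unit max-norm ball in d dimensions has volume 2^d.
    return digamma_int(n) - digamma_int(k) + d * log(2) + d * log_eps_sum / n
end

# Two-dimensional example with deterministic, distinct sample values.
xs = [sin(i) for i in 1:300]
ys = [cos(1.7 * i) for i in 1:300]
println(knn_entropy_sketch(xs, ys; k = 5))
```

A quick sanity check on any estimator of this form: rescaling every coordinate by a factor a shifts the estimate by d * log(a), because every neighbor distance scales by a. For real workloads, a tree-based neighbor search would replace the quadratic loop.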