Introduction to Dirichlet Distribution
Introduction
The Dirichlet distribution, also called the multivariate beta distribution, is widely used in text mining techniques such as the Dirichlet process and latent Dirichlet allocation. To understand these techniques well, we first have to understand the Dirichlet distribution thoroughly. To understand the Dirichlet distribution from scratch, we also need to understand the binomial distribution, the multinomial distribution, the gamma function, the beta distribution, and how they relate to each other.
In this tutorial, we go through the fundamentals of the binomial distribution, the multinomial distribution, the gamma function, the beta distribution, and the Dirichlet distribution. This lays the foundations for the Dirichlet process and latent Dirichlet allocation.
Binomial Distribution
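The binomial distribution has two parameters: the number of trials $n$ and the success probability $p$. It is the discrete probability distribution of the number of successes $x$ in a sequence of $n$ independent Bernoulli trials, each of which succeeds with probability $p$. Formally, we write $x \sim \text{Binomial}(n, p)$ and

$$P(x; n, p) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

where $x \in \{0, 1, \ldots, n\}$ and

$$\binom{n}{x} = \frac{n!}{x!(n - x)!}$$

The formula is easy to understand. We choose which $x$ of the $n$ trials are the successes, and there are $\binom{n}{x}$ ways to do this. Each such arrangement of $x$ successes and $n - x$ failures occurs with probability $p^{x}(1 - p)^{n - x}$.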
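The following minimal sketch is a quick numerical check of the formula. It evaluates the binomial probability mass function with Python's standard library (`math.comb`). The helper name `binomial_pmf` is hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# The probabilities over x = 0..n sum to 1 (binomial theorem).
n, p = 10, 0.3
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(round(total, 12))  # 1.0
print(binomial_pmf(2, 4, 0.5))  # C(4, 2) / 2^4 = 6 / 16 = 0.375
```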
Multinomial Distribution
The multinomial distribution is simply a higher-dimensional generalization of the binomial distribution. In the binomial distribution the random variable is a single scalar. In the multinomial distribution it is a vector.
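In the multinomial distribution, the trials are no longer Bernoulli trials. Instead, each trial has $k$ possible outcomes, and outcome $i$ occurs with probability $p_i$. After $n$ independent trials, let $x_i$ be the number of times outcome $i$ occurred. Formally, we write $(x_1, \ldots, x_k) \sim \text{Multinomial}(n; p_1, \ldots, p_k)$ and

$$P(x_1, \ldots, x_k; n, p_1, \ldots, p_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

where for $i \in \{1, \ldots, k\}$ we have $x_i \in \{0, 1, \ldots, n\}$ and $p_i \in [0, 1]$, with $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p_i = 1$.

The formula is also not hard to understand. We select $x_1$ of the $n$ trials for outcome $1$, then $x_2$ of the remaining $n - x_1$ trials for outcome $2$, and so on. The number of ways to do this is

$$\binom{n}{x_1}\binom{n - x_1}{x_2}\cdots\binom{n - x_1 - \cdots - x_{k-1}}{x_k} = \frac{n!}{x_1! \, x_2! \cdots x_k!}$$

and each such arrangement occurs with probability $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$. When $k = 2$, this reduces to the binomial distribution.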
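The sketch below assumes only the Python standard library and does three things:

- evaluates the multinomial formula directly;
- confirms that it matches the binomial formula when $k = 2$;
- checks that the probabilities of all count vectors summing to $n$ add up to $1$.

`multinomial_pmf` is a hypothetical helper name.

```python
from itertools import product
from math import comb, factorial, prod


def multinomial_pmf(xs, n, ps):
    """P(x_1..x_k; n, p_1..p_k) = n! / (x_1!...x_k!) * p_1^x_1 ... p_k^x_k."""
    if sum(xs) != n or any(x < 0 for x in xs):
        return 0.0
    coef = factorial(n) // prod(factorial(x) for x in xs)
    return coef * prod(p**x for p, x in zip(ps, xs))


# With k = 2 the multinomial reduces to the binomial.
n, p, x = 6, 0.25, 2
binom = comb(n, x) * p**x * (1 - p) ** (n - x)
print(abs(multinomial_pmf([x, n - x], n, [p, 1 - p]) - binom) < 1e-15)  # True

# Summing over every count vector (x_1, x_2, x_3) with sum n gives 1.
ps = [0.2, 0.3, 0.5]
n = 5
total = sum(
    multinomial_pmf(list(xs), n, ps)
    for xs in product(range(n + 1), repeat=len(ps))
    if sum(xs) == n
)
print(round(total, 12))  # 1.0
```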
Gamma Function
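We discuss the gamma function rather than the gamma distribution, because the gamma distribution is not directly needed for the Dirichlet distribution. The gamma function is well defined for any complex number $z$ whose real part $\Re(z)$ is positive:

$$\Gamma(z) = \int_{0}^{\infty} x^{z - 1} e^{-x} \, dx$$

where $\Re(z) > 0$.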
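To make the definition concrete, here is a minimal sketch that approximates the gamma integral numerically for a few real $z \geq 1$. It compares the result with the standard library's `math.gamma`. The following choices are assumptions made for illustration:

- the composite Simpson's rule;
- truncating the upper limit at $60$;
- the helper names `simpson` and `gamma_by_integral`.

Restricting to $z \geq 1$ keeps the integrand finite at $x = 0$.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def gamma_by_integral(z, upper=60.0):
    """Approximate Gamma(z) = int_0^inf x^(z-1) e^(-x) dx for real z >= 1.

    The infinite upper limit is truncated at `upper`; for the small z used
    here the neglected tail is negligible.
    """
    return simpson(lambda x: x ** (z - 1) * math.exp(-x), 0.0, upper)


for z in (1.0, 2.5, 4.0):
    print(z, gamma_by_integral(z), math.gamma(z))
```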
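The gamma function has a special property, which we will use to derive the properties of the beta distribution and the Dirichlet distribution:

$$\Gamma(z + 1) = z \Gamma(z)$$

The proof uses the definition of the gamma function and integration by parts.

$$\begin{aligned}
\Gamma(z + 1) &= \int_{0}^{\infty} x^{z} e^{-x} \, dx \\
&= \Big[-x^{z} e^{-x}\Big]_{0}^{\infty} + \int_{0}^{\infty} z x^{z - 1} e^{-x} \, dx \\
&= 0 + z \int_{0}^{\infty} x^{z - 1} e^{-x} \, dx \\
&= z \Gamma(z)
\end{aligned}$$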
This concludes the proof.
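The recurrence can also be spot-checked numerically. This minimal sketch relies on Python's `math.gamma`, which implements the gamma function for real arguments. It compares both sides at a few integer and non-integer points.

```python
import math

# Check Gamma(z + 1) == z * Gamma(z) at a few real points, including
# non-integers, using the standard library's math.gamma.
for z in (0.3, 1.0, 2.5, 7.2):
    lhs = math.gamma(z + 1)
    rhs = z * math.gamma(z)
    print(z, lhs, rhs, math.isclose(lhs, rhs, rel_tol=1e-12))
```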
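There are some special values for the gamma function:

$$\Gamma(1) = 1, \qquad \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$$

The first, $\Gamma(1) = \int_{0}^{\infty} e^{-x} \, dx = 1$, follows directly from the definition. It might not be trivial to find $\Gamma(\frac{1}{2})$, so we derive it from the Gaussian distribution.

The probability density of a Gaussian distribution is well defined on $(-\infty, \infty)$ and integrates to $1$:

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right) dx = 1$$

When $\mu = 0$ and $\sigma^{2} = \frac{1}{2}$, we have

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}} e^{-x^{2}} \, dx = 1$$

This Gaussian distribution is symmetric about $x = 0$, so each half carries half of the probability mass:

$$\int_{0}^{\infty} \frac{1}{\sqrt{\pi}} e^{-x^{2}} \, dx = \frac{1}{2}$$

Therefore,

$$\int_{0}^{\infty} e^{-x^{2}} \, dx = \frac{\sqrt{\pi}}{2}$$

We use this integral to compute $\Gamma(\frac{1}{2})$. Substituting $x = u^{2}$ and $dx = 2u \, du$,

$$\begin{aligned}
\Gamma\left(\frac{1}{2}\right) &= \int_{0}^{\infty} x^{-\frac{1}{2}} e^{-x} \, dx \\
&= \int_{0}^{\infty} u^{-1} e^{-u^{2}} \, 2u \, du \\
&= 2 \int_{0}^{\infty} e^{-u^{2}} \, du \\
&= \sqrt{\pi}
\end{aligned}$$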
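The derivation of $\Gamma(\frac{1}{2})$ can be checked numerically as well. The sketch below approximates $2 \int_0^\infty e^{-u^2} \, du$ with a composite Simpson's rule, using a hypothetical helper and an upper limit truncated at $10$ by assumption. It compares the result with $\sqrt{\pi}$ and `math.gamma(0.5)`.

```python
import math


def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


# Gamma(1/2) = 2 * int_0^inf exp(-u^2) du. The upper limit is truncated
# at 10, where exp(-100) is far below double precision.
approx = 2 * simpson(lambda u: math.exp(-u * u), 0.0, 10.0)
print(approx, math.sqrt(math.pi), math.gamma(0.5))
```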
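Because $\Gamma(z + 1) = z\Gamma(z)$, these two special values determine the gamma function at every positive integer and half-integer. For a positive integer $n$, $\Gamma(n) = (n - 1)!$. For example, $\Gamma(\frac{3}{2}) = \frac{1}{2}\Gamma(\frac{1}{2}) = \frac{\sqrt{\pi}}{2}$.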
Jacobian
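The Jacobian, denoted by $\det \mathbf{J}$, extends the factor $\frac{dx}{du}$ of one-dimensional substitution to multiple integrals.

Consider a continuous 1-to-1 transformation $\Phi$ that maps a region $D$ in the $(u_1, u_2, \ldots, u_n)$ space to a region $R$ in the $(x_1, x_2, \ldots, x_n)$ space:

$$x_i = \Phi_i(u_1, u_2, \ldots, u_n), \quad i = 1, 2, \ldots, n$$

The Jacobian matrix, denoted by $\mathbf{J}$, collects all the first-order partial derivatives:

$$\mathbf{J} = \frac{\partial(x_1, \ldots, x_n)}{\partial(u_1, \ldots, u_n)} = \begin{bmatrix}
\frac{\partial x_1}{\partial u_1} & \cdots & \frac{\partial x_1}{\partial u_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial x_n}{\partial u_1} & \cdots & \frac{\partial x_n}{\partial u_n}
\end{bmatrix}$$

The Jacobian is defined as the determinant of the Jacobian matrix, $\det \mathbf{J}$.

We then have the following transformation for multiple integrals:

$$\int \cdots \int_{R} f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n = \int \cdots \int_{D} f\big(\Phi(u_1, \ldots, u_n)\big) \, \left|\det \mathbf{J}\right| \, du_1 \cdots du_n$$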
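The following minimal sketch illustrates the Jacobian and the change-of-variables formula with polar coordinates, $\Phi(r, \theta) = (r\cos\theta, r\sin\theta)$. This is a standard example, not one taken from this tutorial. The sketch:

- approximates the Jacobian with central finite differences (the step size `h` is an assumption);
- checks that the Jacobian equals $r$;
- uses the factor $r$ to evaluate $\iint e^{-(x^2 + y^2)} \, dx \, dy$ over the unit disk.

The helper names are hypothetical.

```python
import math


def polar_to_cartesian(r, theta):
    """Phi(r, theta) = (r cos theta, r sin theta)."""
    return r * math.cos(theta), r * math.sin(theta)


def jacobian_det_2d(phi, u1, u2, h=1e-6):
    """det of the 2x2 Jacobian matrix of phi at (u1, u2), via central differences."""
    # Each list is a column of J: the partial derivatives of (x1, x2) w.r.t. one u.
    x_u1 = [(a - b) / (2 * h) for a, b in zip(phi(u1 + h, u2), phi(u1 - h, u2))]
    x_u2 = [(a - b) / (2 * h) for a, b in zip(phi(u1, u2 + h), phi(u1, u2 - h))]
    # det = dx1/du1 * dx2/du2 - dx1/du2 * dx2/du1
    return x_u1[0] * x_u2[1] - x_u2[0] * x_u1[1]


# For polar coordinates the Jacobian is exactly r.
for r, theta in [(0.5, 0.3), (2.0, 1.7), (3.0, 4.0)]:
    print(r, jacobian_det_2d(polar_to_cartesian, r, theta))

# Change of variables: the integral of exp(-(x^2 + y^2)) over the unit disk
# becomes int_0^{2 pi} int_0^1 exp(-r^2) * r dr dtheta = pi * (1 - e^{-1}).
# The radial integral uses the midpoint rule; the theta integral is 2 pi.
n = 100_000
dr = 1.0 / n
radial = sum(math.exp(-(((i + 0.5) * dr) ** 2)) * (i + 0.5) * dr for i in range(n)) * dr
print(2 * math.pi * radial, math.pi * (1 - math.exp(-1)))
```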
Beta Distribution
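The beta distribution is a family of continuous probability distributions defined on the interval $[0, 1]$. It has two positive shape parameters, $\alpha$ and $\beta$. Formally, we write $x \sim \text{Beta}(\alpha, \beta)$ and

$$P(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$$

where $\alpha > 0$, $\beta > 0$, and the normalizer is the beta function

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$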
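Here is a minimal sketch of the beta density in Python. The normalizer is computed from `math.gamma` exactly as in the formula above. The helper names `beta_fn` and `beta_pdf` are hypothetical. The checks use two special cases:

- $\text{Beta}(1, 1)$, which is uniform;
- $\text{Beta}(2, 2)$, whose density is $6x(1 - x)$ because $B(2, 2) = \frac{1}{6}$.

```python
import math


def beta_fn(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


# Beta(1, 1) is uniform on [0, 1]; Beta(2, 2) has density 6 x (1 - x).
for x in (0.1, 0.5, 0.8):
    print(x, beta_pdf(x, 1, 1), beta_pdf(x, 2, 2), 6 * x * (1 - x))
```

`math.gamma` overflows for arguments above roughly $171$. For large shape parameters, one would compute the normalizer in log space with `math.lgamma` instead.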
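It is less well known why the normalizer is $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$, so we prove it here.

Because the density integrates to $1$ over $[0, 1]$,

$$\int_{0}^{1} \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1} \, dx = 1$$

we have

$$B(\alpha, \beta) = \int_{0}^{1} x^{\alpha - 1} (1 - x)^{\beta - 1} \, dx$$

Next, we work out $\Gamma(\alpha)\Gamma(\beta)$.

$$\begin{aligned}
\Gamma(\alpha)\Gamma(\beta) &= \int_{0}^{\infty} u^{\alpha - 1} e^{-u} \, du \int_{0}^{\infty} v^{\beta - 1} e^{-v} \, dv \\
&= \int_{0}^{\infty} \int_{0}^{\infty} u^{\alpha - 1} v^{\beta - 1} e^{-(u + v)} \, du \, dv
\end{aligned}$$

We set $u = zt$ and $v = z(1 - t)$, where $z = u + v \in [0, \infty)$ and $t = \frac{u}{u + v} \in [0, 1]$.

The Jacobian matrix is

$$\mathbf{J} = \frac{\partial(u, v)}{\partial(z, t)} = \begin{bmatrix}
\frac{\partial u}{\partial z} & \frac{\partial u}{\partial t} \\
\frac{\partial v}{\partial z} & \frac{\partial v}{\partial t}
\end{bmatrix} = \begin{bmatrix}
t & z \\
1 - t & -z
\end{bmatrix}$$

The Jacobian is

$$\det \mathbf{J} = -zt - z(1 - t) = -z, \qquad \left|\det \mathbf{J}\right| = z$$

By applying the transformation for multiple integrals,

$$\begin{aligned}
\Gamma(\alpha)\Gamma(\beta) &= \int_{0}^{\infty} \int_{0}^{1} (zt)^{\alpha - 1} \big(z(1 - t)\big)^{\beta - 1} e^{-z} \, z \, dt \, dz \\
&= \int_{0}^{\infty} z^{\alpha + \beta - 1} e^{-z} \, dz \int_{0}^{1} t^{\alpha - 1} (1 - t)^{\beta - 1} \, dt \\
&= \Gamma(\alpha + \beta) B(\alpha, \beta)
\end{aligned}$$

This concludes the proof that

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$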
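The identity can be spot-checked numerically. This sketch integrates $t^{\alpha - 1}(1 - t)^{\beta - 1}$ over $[0, 1]$ with a composite Simpson's rule. It then compares the result with $\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$. The helper names are hypothetical. The sketch assumes $\alpha, \beta \geq 1$, so the integrand stays finite at the endpoints.

```python
import math


def simpson(f, a, b, n=2_000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def beta_integral(a, b):
    """int_0^1 t^(a-1) (1-t)^(b-1) dt, for a, b >= 1 so the integrand is finite."""
    return simpson(lambda t: t ** (a - 1) * (1 - t) ** (b - 1), 0.0, 1.0)


for a, b in [(1, 1), (2, 3), (2.5, 4.0), (5, 5)]:
    closed_form = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    print(a, b, beta_integral(a, b), closed_form)
```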
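The beta distribution is also the conjugate prior for the binomial distribution. We have the prior

$$P(p; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} p^{\alpha - 1} (1 - p)^{\beta - 1}$$

and the likelihood

$$P(x \mid p; n) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

so the posterior satisfies

$$P(p \mid x; n, \alpha, \beta) \propto P(x \mid p; n) \, P(p; \alpha, \beta) \propto p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1}$$

Therefore, the posterior is again a beta distribution:

$$p \mid x \sim \text{Beta}(\alpha + x, \beta + n - x)$$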
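The conjugate update can be checked numerically. We normalize prior times likelihood on a grid and compare it with the closed-form posterior density. The following are assumptions chosen for illustration:

- the prior $\text{Beta}(2, 3)$;
- the data ($7$ successes in $10$ trials);
- the midpoint-rule grid size;
- all helper names below.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density, with the normalizer written via the gamma function."""
    return p ** (a - 1) * (1 - p) ** (b - 1) * math.gamma(a + b) / (math.gamma(a) * math.gamma(b))


def binomial_likelihood(x, n, p):
    """P(x | p; n) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def posterior_params(a, b, x, n):
    """Conjugate update: Beta(a, b) prior + x successes in n trials."""
    return a + x, b + n - x


a, b = 2.0, 3.0  # prior Beta(2, 3)
x, n = 7, 10     # observed 7 successes in 10 trials

# Normalize prior * likelihood numerically with the midpoint rule on (0, 1).
m = 20_000
grid = [(i + 0.5) / m for i in range(m)]
evidence = sum(binomial_likelihood(x, n, p) * beta_pdf(p, a, b) for p in grid) / m

a_post, b_post = posterior_params(a, b, x, n)
for p in (0.3, 0.6, 0.9):
    numeric = binomial_likelihood(x, n, p) * beta_pdf(p, a, b) / evidence
    print(p, numeric, beta_pdf(p, a_post, b_post))
```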
Dirichlet Distribution
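Just as the multinomial distribution generalizes the binomial distribution, the Dirichlet distribution generalizes the beta distribution to multiple variables. The Dirichlet distribution is a family of continuous probability distributions over discrete probability distributions $(p_1, \ldots, p_k)$ with $k$ categories. Each $p_i \in [0, 1]$, and $\sum_{i=1}^{k} p_i = 1$. The distribution has positive concentration parameters $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_k)$. Formally, we write $(p_1, \ldots, p_k) \sim \text{Dir}(\boldsymbol{\alpha})$ and

$$P(p_1, \ldots, p_k; \alpha_1, \ldots, \alpha_k) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{k} p_i^{\alpha_i - 1}$$

where $\alpha_i > 0$ for $i \in \{1, \ldots, k\}$, and the normalizer is the multivariate beta function

$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{k} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}$$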
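Not surprisingly, when $k = 2$, the Dirichlet distribution reduces to the beta distribution. With $p_1 = x$ and $p_2 = 1 - x$, $\text{Dir}(\alpha_1, \alpha_2)$ is $\text{Beta}(\alpha_1, \alpha_2)$.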
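Here is a minimal sketch of the Dirichlet density, assuming only Python's standard library. It works in log space with `math.lgamma`, which avoids the overflow `math.gamma` would hit for large $\alpha_i$. It also checks the $k = 2$ reduction against the beta density. The helper names and the parameters $(2, 3.5)$ are hypothetical.

```python
import math


def log_multivariate_beta(alphas):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def dirichlet_pdf(ps, alphas):
    """Dirichlet density at a point ps of the probability simplex (all p_i > 0)."""
    if not math.isclose(sum(ps), 1.0, abs_tol=1e-12):
        raise ValueError("ps must sum to 1")
    log_kernel = sum((a - 1) * math.log(p) for p, a in zip(ps, alphas))
    return math.exp(log_kernel - log_multivariate_beta(alphas))


def beta_pdf(x, a, b):
    """Beta(a, b) density, for comparison."""
    return x ** (a - 1) * (1 - x) ** (b - 1) * math.gamma(a + b) / (math.gamma(a) * math.gamma(b))


# With k = 2, Dir(a1, a2) evaluated at (x, 1 - x) matches Beta(a1, a2) at x.
for x in (0.2, 0.5, 0.75):
    print(x, dirichlet_pdf([x, 1 - x], [2.0, 3.5]), beta_pdf(x, 2.0, 3.5))
```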
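As with the normalizer of the beta distribution, we now show that

$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{k} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}$$

On the probability simplex, only $p_1, \ldots, p_{k-1}$ are free, and $p_k = 1 - \sum_{i=1}^{k-1} p_i$. Because the density integrates to $1$ over the simplex,

$$\int \cdots \int_{p_i \geq 0, \, \sum_{i=1}^{k-1} p_i \leq 1} \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{k} p_i^{\alpha_i - 1} \, dp_1 \cdots dp_{k-1} = 1$$

we have

$$B(\boldsymbol{\alpha}) = \int \cdots \int_{p_i \geq 0, \, \sum_{i=1}^{k-1} p_i \leq 1} \prod_{i=1}^{k} p_i^{\alpha_i - 1} \, dp_1 \cdots dp_{k-1}$$

Next, we work out $\prod_{i=1}^{k} \Gamma(\alpha_i)$.

$$\begin{aligned}
\prod_{i=1}^{k} \Gamma(\alpha_i) &= \prod_{i=1}^{k} \int_{0}^{\infty} u_i^{\alpha_i - 1} e^{-u_i} \, du_i \\
&= \int_{0}^{\infty} \cdots \int_{0}^{\infty} \prod_{i=1}^{k} u_i^{\alpha_i - 1} \, e^{-\sum_{i=1}^{k} u_i} \, du_1 \cdots du_k
\end{aligned}$$

We set

$$u_i = z t_i \quad \text{for } i \in \{1, \ldots, k - 1\}, \qquad u_k = z t_k = z \left(1 - \sum_{i=1}^{k-1} t_i\right)$$

where $z = \sum_{i=1}^{k} u_i \in [0, \infty)$, $t_i = \frac{u_i}{z} \geq 0$, and $\sum_{i=1}^{k-1} t_i \leq 1$.

The Jacobian matrix with respect to $(t_1, \ldots, t_{k-1}, z)$ is

$$\mathbf{J} = \frac{\partial(u_1, \ldots, u_{k-1}, u_k)}{\partial(t_1, \ldots, t_{k-1}, z)} = \begin{bmatrix}
z & 0 & \cdots & 0 & t_1 \\
0 & z & \cdots & 0 & t_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & z & t_{k-1} \\
-z & -z & \cdots & -z & t_k
\end{bmatrix}$$

We compute the Jacobian via Gaussian elimination. Adding each of rows $1$ through $k - 1$ to row $k$ leaves the determinant unchanged. It turns row $k$ into $\left(0, \ldots, 0, \sum_{i=1}^{k} t_i\right) = (0, \ldots, 0, 1)$. The resulting matrix is upper triangular, so

$$\det \mathbf{J} = z^{k - 1}$$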
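The elimination argument can be reproduced numerically. The sketch below builds the Jacobian matrix above for a hypothetical point $(t_1, \ldots, t_{k-1}, z)$ with $k = 5$. It computes the determinant by Gaussian elimination. The routine uses partial pivoting as a general safeguard. For this matrix the pivots tie and the upper row is kept, so it performs exactly the row additions described in the text. Helper names are hypothetical.

```python
import random


def dirichlet_jacobian_matrix(ts, z):
    """J = d(u_1..u_k) / d(t_1..t_{k-1}, z) for u_i = z t_i, u_k = z (1 - sum t_i).

    `ts` holds t_1..t_{k-1}; t_k = 1 - sum(ts).
    """
    k = len(ts) + 1
    t_k = 1.0 - sum(ts)
    rows = []
    for i in range(k - 1):
        row = [z if j == i else 0.0 for j in range(k - 1)]
        rows.append(row + [ts[i]])
    rows.append([-z] * (k - 1) + [t_k])
    return rows


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # max() returns the first row among ties, so the upper row is kept.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


random.seed(0)
k, z = 5, 1.7
raw = [random.random() for _ in range(k)]
total = sum(raw)
ts = [x / total for x in raw[: k - 1]]  # t_1..t_{k-1}, with sum(ts) < 1
print(determinant(dirichlet_jacobian_matrix(ts, z)), z ** (k - 1))
```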
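We now apply the transformation for multiple integrals, with $\left|\det \mathbf{J}\right| = z^{k-1}$ and $\sum_{i=1}^{k} u_i = z$:

$$\begin{aligned}
\prod_{i=1}^{k} \Gamma(\alpha_i) &= \int_{0}^{\infty} \int \cdots \int_{t_i \geq 0, \, \sum_{i=1}^{k-1} t_i \leq 1} \prod_{i=1}^{k} (z t_i)^{\alpha_i - 1} \, e^{-z} \, z^{k - 1} \, dt_1 \cdots dt_{k-1} \, dz \\
&= \int_{0}^{\infty} z^{\sum_{i=1}^{k} \alpha_i - 1} e^{-z} \, dz \int \cdots \int_{t_i \geq 0, \, \sum_{i=1}^{k-1} t_i \leq 1} \prod_{i=1}^{k} t_i^{\alpha_i - 1} \, dt_1 \cdots dt_{k-1} \\
&= \Gamma\left(\sum_{i=1}^{k} \alpha_i\right) B(\boldsymbol{\alpha})
\end{aligned}$$

This concludes the proof that

$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{k} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}$$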
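The normalizer can also be checked numerically for $k = 3$. This sketch integrates the unnormalized Dirichlet kernel over the simplex $\{p_1, p_2 \geq 0, \, p_1 + p_2 \leq 1\}$ with nested composite Simpson's rules. It compares the result with $\frac{\prod_i \Gamma(\alpha_i)}{\Gamma(\sum_i \alpha_i)}$. The helper names and grid size are hypothetical. The sketch assumes every $\alpha_i \geq 1$, so the integrand stays finite on the boundary.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def simplex_kernel_integral(a1, a2, a3):
    """int over {p1, p2 >= 0, p1 + p2 <= 1} of p1^(a1-1) p2^(a2-1) p3^(a3-1),
    with p3 = 1 - p1 - p2. Assumes all a_i >= 1 so the integrand stays finite."""

    def inner(p1):
        upper = 1.0 - p1
        # max(..., 0.0) guards against a tiny negative p3 from rounding, which
        # would make a fractional power return a complex number.
        return simpson(
            lambda p2: p1 ** (a1 - 1) * p2 ** (a2 - 1) * max(upper - p2, 0.0) ** (a3 - 1),
            0.0,
            upper,
        )

    return simpson(inner, 0.0, 1.0)


for alphas in [(1, 1, 1), (2, 3, 4), (2.5, 3.5, 2.0)]:
    closed_form = math.prod(math.gamma(a) for a in alphas) / math.gamma(sum(alphas))
    print(alphas, simplex_kernel_integral(*alphas), closed_form)
```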
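The Dirichlet distribution is also the conjugate prior for the multinomial distribution. We have the prior

$$P(p_1, \ldots, p_k; \boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{k} p_i^{\alpha_i - 1}$$

and the likelihood

$$P(x_1, \ldots, x_k \mid p_1, \ldots, p_k; n) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}$$

so the posterior satisfies

$$P(p_1, \ldots, p_k \mid x_1, \ldots, x_k; n, \boldsymbol{\alpha}) \propto \prod_{i=1}^{k} p_i^{x_i + \alpha_i - 1}$$

Therefore, the posterior is again a Dirichlet distribution:

$$(p_1, \ldots, p_k) \mid (x_1, \ldots, x_k) \sim \text{Dir}(\alpha_1 + x_1, \ldots, \alpha_k + x_k)$$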
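Rather than integrating the evidence, this sketch checks the conjugate update through proportionality. If the posterior is $\text{Dir}(\boldsymbol{\alpha} + \mathbf{x})$, then $\log(\text{prior} \times \text{likelihood}) - \log(\text{posterior})$ equals the same constant, $\log P(\mathbf{x})$, at every point of the simplex. The prior, counts, evaluation points, and helper names are hypothetical.

```python
import math


def dirichlet_log_pdf(ps, alphas):
    """log Dir(ps; alphas), with log B(alpha) computed via math.lgamma."""
    log_b = sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))
    return sum((a - 1) * math.log(p) for p, a in zip(ps, alphas)) - log_b


def multinomial_log_likelihood(xs, ps):
    """log of n! / (x_1! ... x_k!) * prod_i p_i^x_i, with n = sum(xs)."""
    n = sum(xs)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in xs)
    return log_coef + sum(x * math.log(p) for x, p in zip(xs, ps))


def dirichlet_posterior(alphas, xs):
    """Conjugate update: Dir(alpha) prior + multinomial counts -> Dir(alpha + x)."""
    return [a + x for a, x in zip(alphas, xs)]


alphas = [1.0, 2.0, 3.0]
xs = [4, 0, 6]
post = dirichlet_posterior(alphas, xs)  # [5.0, 2.0, 9.0]

# prior * likelihood and the posterior density differ only by a constant
# factor (the evidence), so their log-difference is the same at every point.
points = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3], [0.25, 0.25, 0.5]]
for ps in points:
    log_unnormalized = dirichlet_log_pdf(ps, alphas) + multinomial_log_likelihood(xs, ps)
    print(ps, log_unnormalized - dirichlet_log_pdf(ps, post))
```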
References
Introduction to Dirichlet Distribution
https://leimao.github.io/blog/Introduction-to-Dirichlet-Distribution/