Discriminative Model VS Generative Model
Introduction
Sometimes, we feel confused about the difference between a discriminative model and a generative model. This is often because the mathematical definitions have never been stated explicitly.
In this blog post, I would like to discuss the mathematical definitions of the discriminative model and the generative model, and the relationships between them.
Discriminative Model
A discriminative model is a model of the conditional probability of the target variable $Y$ given the observation variable $X$. Symbolically, we are modeling $P(Y|X)$. In layman’s terms, we give an observation to the model and want the model to tell us the probability of each possible target.
For an example in which $X$ and $Y$ are both discrete, we input a $32 \times 32$ RGB dog image $x$ to the model, and the model outputs the probability of each target. There are $|Y|$ possible targets being modeled. We could iterate through the probabilities of all $|Y|$ targets and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{Y} P(Y|X = x)$, which is the target the model considers most likely.
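As a minimal sketch of this idea (assuming a hypothetical three-class problem and made-up class probabilities rather than a real classifier), the discriminative prediction is just the argmax over the targets:
```python
import numpy as np

# Hypothetical class probabilities P(Y | X = x) produced by some classifier
# for a single 32 x 32 RGB image x. The targets are made up for illustration.
targets = ["cat", "dog", "horse"]
p_y_given_x = np.array([0.15, 0.70, 0.15])  # must sum to 1

# The most likely target according to the model: argmax_Y P(Y | X = x)
most_likely_target = targets[np.argmax(p_y_given_x)]
print(most_likely_target)  # dog
```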
Generative Model
I found that there are many slightly different definitions of a generative model.
In some definitions, a generative model is a model of the conditional probability of the observation variable $X$ given the target variable $Y$. Symbolically, we are modeling $P(X | Y)$. In layman’s terms, we give a target to the model and want the model to tell us the probability of each possible observation.
For an example in which $X$ and $Y$ are both discrete, we input the “dog” label $y$ to the model, and the model outputs the probability of each $32 \times 32$ RGB image. There are $| X | = 256^{3 \times 32 \times 32}$ possible RGB images being modeled. We could iterate through the probabilities of all $|X|$ observations and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{X} P(X|Y = y)$, which is the dog image the model considers most likely.
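As a toy sketch, suppose the observation space contained only three hypothetical “images” instead of all $256^{3 \times 32 \times 32}$ of them; with a made-up table for $P(X | Y = \text{dog})$, the most likely dog image is the argmax over the observations:
```python
# Toy conditional distribution P(X | Y = "dog") over a tiny, made-up
# observation space of three "images" (stand-ins for all possible images).
p_x_given_dog = {
    "image_0": 0.2,
    "image_1": 0.5,
    "image_2": 0.3,
}

# The most likely dog image according to the model: argmax_X P(X | Y = dog)
most_likely_dog_image = max(p_x_given_dog, key=p_x_given_dog.get)
print(most_likely_dog_image)  # image_1
```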
In other definitions, a generative model is a model of the joint probability of the observation variable $X$ and the target variable $Y$. Symbolically, we are modeling $P(X, Y)$. In layman’s terms, we don’t commit to a particular question; we just want the model to tell us the probability of each possible combination of observation and target.
For an example in which $X$ and $Y$ are both discrete, the model models the joint distribution of $32 \times 32$ RGB images and the different targets. There are $| X | \times | Y | = 256^{3 \times 32 \times 32} | Y |$ possible combinations being modeled. To find the dog image the model considers most likely, suppose the “dog” label is $y$; we could iterate through the probabilities of all $|X|$ observations and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{X} P(X, Y = y)$.
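Under the same toy assumptions, the joint definition stores one probability per (observation, target) pair, and fixing $Y$ to the “dog” label before maximizing over $X$ only touches the “dog” entries of the table:
```python
# Toy joint distribution P(X, Y) over three made-up "images" and two targets.
# All entries together sum to 1.
p_xy = {
    ("image_0", "cat"): 0.10, ("image_0", "dog"): 0.08,
    ("image_1", "cat"): 0.05, ("image_1", "dog"): 0.30,
    ("image_2", "cat"): 0.25, ("image_2", "dog"): 0.22,
}

# The most likely dog image: argmax_X P(X, Y = dog), i.e., maximize over the
# entries whose target is "dog".
most_likely_dog_image = max(
    (x for (x, y) in p_xy if y == "dog"),
    key=lambda x: p_xy[(x, "dog")],
)
print(most_likely_dog_image)  # image_1
```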
Relationships
Generative Conditional and Joint Distribution Model Definitions
The two definitions for generative models are closely related given the definition of joint probability.
$$
\begin{align}
P(X, Y) = P(X | Y) P(Y)
\end{align}
$$
This means that if the probability distribution of the target variable, $P(Y)$, can be modeled, the two generative model definitions can be converted into each other. Usually, modeling the prior $P(Y)$ is not very hard.
When we are only interested in finding out which observation, $x_1$ or $x_2$, is more likely given a target $y$, it does not matter whether we model $P(X | Y)$ or $P(X, Y)$, because
$$
\frac{P(X = x_1, Y = y)}{P(X = x_2, Y = y)} = \frac{P(X = x_1 | Y = y)}{P(X = x_2 | Y = y)}
$$
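As a minimal numeric check of both points, with made-up tables for $P(X | Y)$ and $P(Y)$, we can build the joint distribution from the conditional one and verify that the joint ratio equals the conditional ratio for a fixed target:
```python
# Made-up conditional distributions P(X | Y) and prior P(Y).
p_x_given_y = {
    "cat": {"image_0": 0.6, "image_1": 0.4},
    "dog": {"image_0": 0.1, "image_1": 0.9},
}
p_y = {"cat": 0.3, "dog": 0.7}

# Convert the conditional generative model to the joint one:
# P(X = x, Y = y) = P(X = x | Y = y) P(Y = y)
p_xy = {
    (x, y): p_x_given_y[y][x] * p_y[y]
    for y in p_y
    for x in p_x_given_y[y]
}

# For a fixed target y, the ratio between two observations is the same
# whether we use the joint or the conditional distribution.
joint_ratio = p_xy[("image_0", "dog")] / p_xy[("image_1", "dog")]
cond_ratio = p_x_given_y["dog"]["image_0"] / p_x_given_y["dog"]["image_1"]
print(joint_ratio, cond_ratio)  # both approximately 0.111
```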
Discriminative and Generative Models
Generative models carry richer information than discriminative models. A generative model can easily be converted to a discriminative model because of Bayes’ theorem.
$$
\begin{align}
P(Y | X) &= \frac{P(X | Y) P(Y)}{P(X)} \\
&= \frac{P(X, Y)}{P(X)} \\
\end{align}
$$
where
$$
\begin{align}
P(X) &= \sum_{y \in Y} P(X | Y = y) P(Y = y) \\
&= \sum_{y \in Y} P(X, Y = y) \\
\end{align}
$$
for discrete variable $Y$, and
$$
\begin{align}
P(X) &= \int_{Y} P(X | Y = y) P(Y = y) dy \\
&= \int_{Y} P(X, Y = y) dy \\
\end{align}
$$
for continuous variable $Y$.
In this case, the generative joint distribution model can readily be converted to a discriminative model, whereas the generative conditional distribution model additionally requires a model of $P(Y)$, although usually $P(Y)$ is not hard to model.
When we are only interested in finding out which target, $y_1$ or $y_2$, is more likely given an observation $x$, $P(X)$ can be ignored since it is a constant with respect to the target variable $Y$.
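A minimal sketch of this conversion for a discrete $Y$, again with made-up numbers: the posterior $P(Y | X = x)$ is obtained from $P(X | Y)$ and $P(Y)$ by applying Bayes’ theorem, with $P(X = x)$ computed by marginalizing over the targets:
```python
# Made-up generative model: P(X | Y) and the prior P(Y).
p_x_given_y = {
    "cat": {"image_0": 0.6, "image_1": 0.4},
    "dog": {"image_0": 0.1, "image_1": 0.9},
}
p_y = {"cat": 0.3, "dog": 0.7}

def posterior(x):
    """Compute P(Y | X = x) from the generative model via Bayes' theorem."""
    # P(X = x) = sum_y P(X = x | Y = y) P(Y = y)
    p_x = sum(p_x_given_y[y][x] * p_y[y] for y in p_y)
    # P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
    return {y: p_x_given_y[y][x] * p_y[y] / p_x for y in p_y}

print(posterior("image_1"))  # {'cat': 0.16, 'dog': 0.84}
```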
$$
\begin{align}
P(Y | X) &\propto P(X | Y) P(Y) \\
&\propto P(X, Y) \\
\end{align}
$$
$$
\begin{align}
\frac{P(Y = y_1 | X = x)}{P(Y = y_2 | X = x)} &= \frac{P(X = x | Y = y_1) P(Y = y_1)}{P(X = x | Y = y_2) P(Y = y_2)} \\
&= \frac{P(X = x, Y = y_1)}{P(X = x, Y = y_2)} \\
\end{align}
$$
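As a small check with the same made-up numbers, comparing two targets for a fixed observation only needs the unnormalized scores $P(X = x | Y = y) P(Y = y) = P(X = x, Y = y)$; the normalizing constant $P(X = x)$ cancels and never has to be computed:
```python
# Made-up generative model, as before.
p_x_given_y = {
    "cat": {"image_0": 0.6, "image_1": 0.4},
    "dog": {"image_0": 0.1, "image_1": 0.9},
}
p_y = {"cat": 0.3, "dog": 0.7}

x = "image_1"
# Unnormalized scores P(X = x | Y = y) P(Y = y) = P(X = x, Y = y).
scores = {y: p_x_given_y[y][x] * p_y[y] for y in p_y}

# The ratio of posteriors equals the ratio of the unnormalized scores,
# so the most likely target is found without ever computing P(X = x).
print(scores["dog"] / scores["cat"])  # 5.25
print(max(scores, key=scores.get))    # dog
```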
A perfect example of applying a generative model to a discriminative task is the naive Bayes classifier.
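As a minimal hand-rolled sketch (not a production implementation), a Bernoulli naive Bayes classifier estimates the generative parts $P(Y)$ and $P(X | Y)$ (with a conditional-independence assumption across features) from counts, and then predicts by taking the argmax over the unnormalized posterior $P(X = x, Y = c)$:
```python
import numpy as np

# Tiny, made-up training set with binary features.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 1, 0],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])

classes = np.unique(y)
# Generative parts: prior P(Y = c) and per-feature Bernoulli P(X_j = 1 | Y = c),
# with Laplace smoothing to avoid zero probabilities.
prior = {c: np.mean(y == c) for c in classes}
theta = {c: (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes}

def predict(x):
    """Discriminative use of the generative model: argmax_c P(X = x, Y = c)."""
    scores = {
        c: prior[c] * np.prod(np.where(x == 1, theta[c], 1 - theta[c]))
        for c in classes
    }
    return max(scores, key=scores.get)

print(predict(np.array([1, 0, 1])))  # 1
```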