
Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.


Introduction

Sometimes, the difference between a discriminative model and a generative model can be confusing. This is often because the mathematical definitions are not stated explicitly.


In this blog post, I would like to discuss the mathematical definitions of discriminative and generative models, and the relationships between them.

Discriminative Model

A discriminative model is a model of the conditional probability of the target variable $Y$, given the observation variable $X$. Symbolically, we are modeling $P(Y|X)$. In layman’s terms, we give an observation to the model, and the model tells us the probability of each possible target.


For an example in which $X$ and $Y$ are both discrete, we input a $32 \times 32$ RGB dog image $x$ to the model, and the model outputs the probability of each of the $|Y|$ possible targets being modeled. We could iterate through the probabilities of all the $|Y|$ targets and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{Y} P(Y|X = x)$, which is the target the model considers most likely.
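This lookup can be sketched with a toy discriminative model. The targets and probabilities below are made up purely for illustration; a real model would compute them from the input image.

```python
# Toy sketch of a discriminative model: for a fixed observation x, the
# model directly outputs P(Y | X = x) over a small set of targets.
# These probabilities are made up for illustration.
p_y_given_x = {"dog": 0.7, "cat": 0.2, "bird": 0.1}

# The prediction is argmax_Y P(Y | X = x).
prediction = max(p_y_given_x, key=p_y_given_x.get)
print(prediction)  # dog
```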

Generative Model

I found that there are many slightly different definitions of a generative model.


In some definitions, a generative model is a model of the conditional probability of the observation variable $X$, given the target variable $Y$. Symbolically, we are modeling $P(X | Y)$. In layman’s terms, we give a target to the model, and the model tells us the probability of each possible observation.


For an example in which $X$ and $Y$ are both discrete, we input the “dog” label $y$ to the model, and the model outputs the probability of each $32 \times 32$ RGB image. There are $| X | = 256^{3 \times 32 \times 32}$ possible RGB images being modeled, since each pixel has three channels with $256$ values each. We could iterate through the probabilities of all the $|X|$ observations and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{X} P(X|Y = y)$, which is the image the model considers the most likely dog image.
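The same argmax can be sketched on a drastically shrunken observation space. Instead of $32 \times 32$ RGB images, the toy "images" below are 2-pixel binary tuples, so $|X| = 4$; the distribution is made up for illustration.

```python
# Toy sketch of a generative conditional model P(X | Y = "dog") over
# 2-pixel binary "images", so |X| = 2^2 = 4. Made-up probabilities.
p_x_given_dog = {
    (0, 0): 0.1,
    (0, 1): 0.2,
    (1, 0): 0.1,
    (1, 1): 0.6,
}
assert abs(sum(p_x_given_dog.values()) - 1.0) < 1e-9

# argmax_X P(X | Y = "dog"): the most likely "dog image".
best_image = max(p_x_given_dog, key=p_x_given_dog.get)
print(best_image)  # (1, 1)
```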


In other definitions, a generative model is a model of the joint probability of the observation variable $X$ and the target variable $Y$. Symbolically, we are modeling $P(X, Y)$. In layman’s terms, we don’t ask for anything specific; the model tells us the probability of each possible combination of observation and target.


For an example in which $X$ and $Y$ are both discrete, the model models the joint distribution of $32 \times 32$ RGB images and the different targets. There are $| X | \times | Y | = 256^{3 \times 32 \times 32} | Y |$ possible combinations being modeled. To find the most likely dog image according to the model, suppose the “dog” label is $y$; we could iterate through the probabilities of all the $|X|$ observations and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{X} P(X, Y = y)$.
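Again shrinking the spaces, a joint model over 2-pixel binary "images" and two labels can be written out exhaustively; the probabilities are made up and sum to one. Finding the most likely dog image means restricting the joint table to the dog slice before maximizing.

```python
# Toy sketch of a generative joint model P(X, Y) over 2-pixel binary
# "images" and two labels. Made-up probabilities that sum to 1.
p_xy = {
    ((0, 0), "dog"): 0.05, ((0, 1), "dog"): 0.10,
    ((1, 0), "dog"): 0.05, ((1, 1), "dog"): 0.30,
    ((0, 0), "cat"): 0.20, ((0, 1), "cat"): 0.10,
    ((1, 0), "cat"): 0.15, ((1, 1), "cat"): 0.05,
}
assert abs(sum(p_xy.values()) - 1.0) < 1e-9

# argmax_X P(X, Y = "dog"): restrict to the dog slice, then maximize.
best_dog_image = max(
    (x for (x, y) in p_xy if y == "dog"),
    key=lambda x: p_xy[(x, "dog")],
)
print(best_dog_image)  # (1, 1)
```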

Relationships

Generative Conditional and Joint Distribution Model Definitions

The two definitions for generative models are closely related given the definition of joint probability.

\[\begin{align} P(X, Y) = P(X | Y) P(Y) \end{align}\]

This means that if the probability distribution of the target variable, $P(Y)$, can be modeled, the two generative model definitions can be converted into each other. Usually, modeling the prior $P(Y)$ is not very hard.
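The conversion in both directions can be sketched with a toy prior and conditional; all numbers below are made up for illustration.

```python
# Sketch: converting between the two generative definitions via
# P(X, Y) = P(X | Y) P(Y). Made-up probabilities for illustration.
p_y = {"dog": 0.5, "cat": 0.5}                      # prior P(Y)
p_x_given_y = {                                     # conditional P(X | Y)
    "dog": {(0,): 0.4, (1,): 0.6},
    "cat": {(0,): 0.9, (1,): 0.1},
}

# Conditional + prior -> joint: P(X = x, Y = y) = P(X = x | Y = y) P(Y = y).
p_xy = {(x, y): p_x_given_y[y][x] * p_y[y]
        for y in p_y for x in p_x_given_y[y]}

# Joint -> conditional, by dividing out the prior P(Y = y).
p_x_given_y_back = {
    y: {x: p_xy[(x, y)] / p_y[y] for x in p_x_given_y[y]}
    for y in p_y
}
for y in p_y:
    for x in p_x_given_y[y]:
        assert abs(p_x_given_y_back[y][x] - p_x_given_y[y][x]) < 1e-12
```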


When we are only interested in finding out which observation, $x_1$ or $x_2$, is more likely given a target $y$, it does not matter whether we model $P(X | Y)$ or $P(X, Y)$, because

\[\frac{P(X = x_1, Y = y)}{P(X = x_2, Y = y)} = \frac{P(X = x_1 | Y = y)}{P(X = x_2 | Y = y)}\]

Discriminative and Generative Models

Generative models carry richer information than discriminative models. A generative model can easily be converted into a discriminative model because of Bayes’ theorem.

\[\begin{align} P(Y | X) &= \frac{P(X | Y) P(Y)}{P(X)} \\ &= \frac{P(X, Y)}{P(X)} \\ \end{align}\]

where

\[\begin{align} P(X) &= \sum_{y \in Y} P(X | Y = y) P(Y = y) \\ &= \sum_{y \in Y} P(X, Y = y) \\ \end{align}\]

for discrete variable $Y$, and

\[\begin{align} P(X) &= \int_{Y} P(X | Y = y) P(Y = y) dy \\ &= \int_{Y} P(X, Y = y) dy \\ \end{align}\]

for continuous variable $Y$.


In this case, the generative joint distribution model is readily converted into a discriminative model, whereas the generative conditional distribution model additionally requires a model of $P(Y)$, although $P(Y)$ is usually not hard to model.
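The joint-to-discriminative conversion can be sketched directly from the formulas above; the joint probabilities and observation names below are made up for illustration.

```python
# Sketch: converting a generative joint model into a discriminative one
# via P(Y | X) = P(X, Y) / P(X), with P(X) = sum_y P(X, Y = y).
# Made-up joint probabilities over two observations and two targets.
p_xy = {
    ("x1", "dog"): 0.3, ("x1", "cat"): 0.1,
    ("x2", "dog"): 0.2, ("x2", "cat"): 0.4,
}

def posterior(x):
    # P(X = x), by marginalizing the joint over Y.
    p_x = sum(p for (xi, _), p in p_xy.items() if xi == x)
    # P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
    return {y: p_xy[(x, y)] / p_x for (xi, y) in p_xy if xi == x}

post = posterior("x1")  # {"dog": 0.75, "cat": 0.25}
```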


When we are only interested in finding out which target, $y_1$ or $y_2$, is more likely given an observation $x$, $P(X)$ can be ignored, since it is a constant with respect to the target variable $Y$.

\[\begin{align} P(Y | X) &\propto P(X | Y) P(Y) \\ &\propto P(X, Y) \\ \end{align}\] \[\begin{align} \frac{P(Y = y_1 | X = x)}{P(Y = y_2 | X = x)} &= \frac{P(X = x | Y = y_1) P(Y = y_1)}{P(X = x | Y = y_2) P(Y = y_2)} \\ &= \frac{P(X = x, Y = y_1)}{P(X = x, Y = y_2)} \\ \end{align}\]
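The cancellation of $P(X)$ can be verified numerically on a toy joint distribution; the numbers are made up for illustration.

```python
# Sketch: comparing two targets for one observation never needs P(X),
# since the ratio of posteriors equals the ratio of joints.
# Made-up joint probabilities.
p_xy = {("x", "y1"): 0.12, ("x", "y2"): 0.03}
p_x = p_xy[("x", "y1")] + p_xy[("x", "y2")]  # marginal P(X = x)

# Ratio of posteriors: P(X) divides out of numerator and denominator.
posterior_ratio = (p_xy[("x", "y1")] / p_x) / (p_xy[("x", "y2")] / p_x)
joint_ratio = p_xy[("x", "y1")] / p_xy[("x", "y2")]
assert abs(posterior_ratio - joint_ratio) < 1e-9
```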

A perfect example of applying a generative model to a discriminative task is the naive Bayes classifier.
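A minimal naive Bayes sketch makes the connection concrete: the model is generative, with a prior $P(Y)$ and per-feature conditionals $P(X_i | Y)$ assumed conditionally independent, yet classification is discriminative via $\argmax_y P(Y = y) \prod_i P(x_i | y)$, dropping the constant $P(X)$. The features, classes, and probabilities below are all made up for illustration.

```python
# Minimal naive Bayes sketch: a generative model used discriminatively.
# Prior P(Y) and per-feature conditionals P(feature present | Y), with
# features assumed conditionally independent given Y. Made-up numbers.
p_y = {"spam": 0.4, "ham": 0.6}
p_feat_given_y = {
    "spam": {"offer": 0.8, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.7},
}

def classify(features):
    scores = {}
    for y, prior in p_y.items():
        score = prior
        for f, present in features.items():
            p = p_feat_given_y[y][f]
            score *= p if present else (1 - p)
        scores[y] = score  # proportional to P(Y = y | X); P(X) is dropped
    return max(scores, key=scores.get)

print(classify({"offer": True, "meeting": False}))  # spam
```

Note that the scores are joint probabilities $P(X = x, Y = y)$, not posteriors; since $P(X)$ is the same for every class, the argmax is unchanged.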
