### Introduction

The difference between a discriminative model and a generative model is easy to get confused about, mostly because the two terms are often used without an explicit mathematical definition.

In this blog post, I would like to discuss the mathematical definitions of discriminative models and generative models, and the relationships between them.

### Discriminative Model

A discriminative model is a model of the conditional probability of the target variable $Y$, given the observation variable $X$. Symbolically, we are modeling $P(Y|X)$. In layman’s terms, we give an observation to the model, and we want the model to tell us the probability of each possible target.

As an example in which $X$ and $Y$ are both discrete, we input a $32 \times 32$ RGB dog image $x$ to the model, and the model outputs the probability of each target. There are $|Y|$ possible targets being modeled. We could iterate through the probabilities of all $|Y|$ targets and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{Y} P(Y|X = x)$, which is the target the model considers most likely.
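To make the $\argmax_{Y} P(Y|X = x)$ step concrete, here is a minimal sketch in plain Python. The label set, the toy observation, and the scoring rule inside `predict_proba` are all made up for illustration; a real discriminative model would be a trained classifier rather than this stand-in.

```python
import math

# Hypothetical label set; a real model would have its own targets.
LABELS = ["cat", "dog", "bird"]


def softmax(scores):
    """Turn arbitrary real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x):
    """Stand-in for a trained discriminative model: returns P(Y | X = x).

    The scoring rule below is invented purely for illustration.
    """
    scores = [sum(x) * w for w in (0.1, 0.3, -0.2)]
    return dict(zip(LABELS, softmax(scores)))


x = [0.5, 1.0, 1.5]               # a toy "observation", not a real image
probs = predict_proba(x)          # P(Y | X = x) for every target
best = max(probs, key=probs.get)  # argmax_Y P(Y | X = x)
print(probs, best)
```

The model only ever answers questions of the form "given this $x$, how likely is each $y$"; it says nothing about how likely $x$ itself is.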

### Generative Model

I have found several slightly different definitions of a generative model.

In some definitions, a generative model is a model of the conditional probability of the observation variable $X$, given the target variable $Y$. Symbolically, we are modeling $P(X | Y)$. In layman’s terms, we give a target to the model, and we want the model to tell us the probability of each possible observation.

As an example in which $X$ and $Y$ are both discrete, we input the “dog” label $y$ to the model, and the model outputs the probability of each $32 \times 32$ RGB image. Since each of the $32 \times 32 \times 3$ channel values can take 256 values, there are $| X | = 256^{32 \times 32 \times 3}$ possible RGB images being modeled. In principle, we could iterate through the probabilities of all $|X|$ observations and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{X} P(X|Y = y)$, which is the dog image the model considers most probable.

In other definitions, a generative model is a model of the joint probability of the observation variable $X$ and the target variable $Y$. Symbolically, we are modeling $P(X, Y)$. In layman’s terms, we do not commit to a particular question in advance; we just want the model to tell us the probability of each possible combination of observation and target.

As an example in which $X$ and $Y$ are both discrete, the model describes the joint distribution of $32 \times 32$ RGB images and the different targets. There are $| X | \times | Y | = 256^{32 \times 32 \times 3} | Y |$ possible combinations being modeled. To find the dog image the model considers most probable, suppose the “dog” label is $y$; in principle, we could iterate through the probabilities of all $|X|$ observations and find the maximum one, i.e., $\DeclareMathOperator*{\argmax}{argmax} \argmax_{X} P(X, Y = y)$.
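On a space small enough to enumerate, the search $\argmax_{X} P(X, Y = y)$ can be sketched directly. The three "images" and the probabilities in the table below are invented for illustration; a real image space is far too large to scan this way.

```python
# Toy joint distribution P(X, Y) over 3 observations and 2 targets.
# The numbers are invented for illustration and sum to 1.
joint = {
    ("img_a", "dog"): 0.30, ("img_a", "cat"): 0.05,
    ("img_b", "dog"): 0.15, ("img_b", "cat"): 0.20,
    ("img_c", "dog"): 0.05, ("img_c", "cat"): 0.25,
}


def most_probable_observation(joint, y):
    """Return argmax_X P(X, Y = y) by scanning every observation."""
    candidates = {x: p for (x, y_), p in joint.items() if y_ == y}
    return max(candidates, key=candidates.get)


print(most_probable_observation(joint, "dog"))
print(most_probable_observation(joint, "cat"))
```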

### Relationships

#### Generative Conditional and Joint Distribution Model Definitions

The two definitions of generative models are closely related through the product rule for joint probability.

\[\begin{align} P(X, Y) = P(X | Y) P(Y) \end{align}\]

This means that if the distribution of the target variable, $P(Y)$, can be modeled, each of the two generative model definitions can be converted into the other. Modeling the prior $P(Y)$ is usually not very hard.
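As a small sanity check of this conversion, the sketch below builds a joint table from a conditional table and a prior, then recovers the conditional from the joint. All table entries are made-up numbers on a tiny discrete space. It also checks that, for a fixed target, ratios of joint probabilities equal ratios of conditional probabilities, since the factor $P(Y = y)$ cancels.

```python
import math

# Made-up conditional P(X | Y) and prior P(Y) on a tiny discrete space.
cond = {
    "dog": {"img_a": 0.6, "img_b": 0.3, "img_c": 0.1},
    "cat": {"img_a": 0.1, "img_b": 0.4, "img_c": 0.5},
}
prior = {"dog": 0.5, "cat": 0.5}

# Conditional -> joint: P(X, Y) = P(X | Y) P(Y)
joint = {(x, y): cond[y][x] * prior[y] for y in cond for x in cond[y]}

# Joint -> conditional: recover P(Y) by summing over X, then divide.
prior_back = {
    y: sum(p for (_, y_), p in joint.items() if y_ == y) for y in prior
}
cond_back = {
    y: {x: joint[(x, y)] / prior_back[y] for x in cond[y]} for y in cond
}

# For a fixed target, P(Y = y) cancels in the ratio of two observations.
ratio_joint = joint[("img_a", "dog")] / joint[("img_b", "dog")]
ratio_cond = cond["dog"]["img_a"] / cond["dog"]["img_b"]
print(math.isclose(ratio_joint, ratio_cond))
```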

When we are only interested in finding out which observation, $x_1$ or $x_2$, is more likely given a target $y$, it does not matter whether we model $P(X | Y)$ or $P(X, Y)$, because the common factor $P(Y = y)$ cancels in the ratio:

\[\frac{P(X = x_1, Y = y)}{P(X = x_2, Y = y)} = \frac{P(X = x_1 | Y = y)}{P(X = x_2 | Y = y)}\]

#### Discriminative and Generative Models

Generative models carry more information than discriminative models, and a generative model can easily be converted to a discriminative model. This follows from Bayes' theorem.

\[\begin{align} P(Y | X) &= \frac{P(X | Y) P(Y)}{P(X)} \\ &= \frac{P(X, Y)}{P(X)} \end{align}\]

where

\[\begin{align} P(X) &= \sum_{y \in Y} P(X | Y = y) P(Y = y) \\ &= \sum_{y \in Y} P(X, Y = y) \end{align}\]

for discrete variable $Y$, and

\[\begin{align} P(X) &= \int_{Y} P(X | Y = y) P(Y = y) dy \\ &= \int_{Y} P(X, Y = y) dy \end{align}\]

for continuous variable $Y$.

In this case, the generative joint distribution model can be converted to a discriminative model directly, whereas the generative conditional distribution model additionally requires a model of $P(Y)$, although $P(Y)$ is usually not hard to model.
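For a discrete toy case, the conversion from a joint model to a discriminative one is just a marginalization followed by a division. The sketch below assumes an invented joint table; `posterior` is a hypothetical helper name, not part of any library.

```python
import math

# Toy joint distribution P(X, Y); the numbers are invented and sum to 1.
joint = {
    ("img_a", "dog"): 0.30, ("img_a", "cat"): 0.05,
    ("img_b", "dog"): 0.15, ("img_b", "cat"): 0.20,
    ("img_c", "dog"): 0.05, ("img_c", "cat"): 0.25,
}


def posterior(joint, x):
    """P(Y | X = x) = P(X = x, Y) / sum_y P(X = x, Y = y)."""
    row = {y: p for (x_, y), p in joint.items() if x_ == x}
    evidence = sum(row.values())  # P(X = x), the marginal over Y
    return {y: p / evidence for y, p in row.items()}


post = posterior(joint, "img_b")
print(post)
print(math.isclose(sum(post.values()), 1.0))
```

Starting from $P(X | Y)$ instead, one would first multiply by a separately modeled prior $P(Y)$ to obtain the joint table, and then proceed in the same way.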

When we are only interested in finding out which target, $y_1$ or $y_2$, is more likely given an observation $x$, $P(X)$ can be ignored because it is constant with respect to the target variable $Y$.

\[\begin{align} P(Y | X) &\propto P(X | Y) P(Y) \\ &\propto P(X, Y) \end{align}\]

\[\begin{align} \frac{P(Y = y_1 | X = x)}{P(Y = y_2 | X = x)} &= \frac{P(X = x | Y = y_1) P(Y = y_1)}{P(X = x | Y = y_2) P(Y = y_2)} \\ &= \frac{P(X = x, Y = y_1)}{P(X = x, Y = y_2)} \end{align}\]

A classic example of applying a generative model to a discriminative task is the naive Bayes classifier.
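To illustrate, here is a minimal sketch of a naive Bayes classifier on binary features, written in plain Python. The Bernoulli feature model, the Laplace smoothing parameter `alpha`, the feature names, and the four training samples are all assumptions made for this example. The classifier models $P(X | Y)$ and $P(Y)$, and predicts by comparing $\log P(X = x, Y = y)$ across targets, never computing $P(X)$.

```python
import math
from collections import Counter, defaultdict


def fit(samples, labels, alpha=1.0):
    """Estimate P(Y) and per-feature P(X_i = 1 | Y) with Laplace smoothing."""
    n_features = len(samples[0])
    class_counts = Counter(labels)
    ones = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            ones[y][i] += v
    prior = {y: c / len(labels) for y, c in class_counts.items()}
    likelihood = {
        y: [
            (ones[y][i] + alpha) / (class_counts[y] + 2 * alpha)
            for i in range(n_features)
        ]
        for y in class_counts
    }
    return prior, likelihood


def log_joint(x, y, prior, likelihood):
    """log P(X = x, Y = y) under the feature-independence assumption."""
    total = math.log(prior[y])
    for v, p in zip(x, likelihood[y]):
        total += math.log(p if v else 1.0 - p)
    return total


def predict(x, prior, likelihood):
    """argmax_Y P(X = x, Y), which equals argmax_Y P(Y | X = x)."""
    return max(prior, key=lambda y: log_joint(x, y, prior, likelihood))


# Toy binary features (hypothetical): [has_fur, barks, meows]
samples = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
labels = ["dog", "dog", "cat", "cat"]
prior, likelihood = fit(samples, labels)
print(predict([1, 1, 0], prior, likelihood))
print(predict([1, 0, 1], prior, likelihood))
```

Working in log space is a common design choice here: products of many small probabilities underflow quickly, while sums of logs do not, and the $\argmax$ is unchanged.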