Maximum Expected Utility and Decision Making

06-21-202106-21-2021 blog 5 minutes read (About 775 words) visits

Introduction

Given a Bayesian network, sometimes we would like to make a decision based on the information in the Bayesian network. We could add additional action nodes and utility nodes to the Bayesian network and it becomes an influence diagram. Decision making is just finding the optimal action so that the expected utility of the influence diagram becomes the largest. Maximum expected utility is often used for finding the optimal action or policy.

In this blog post, I would like to discuss maximum expected utility for decision making in the influence diagram.

Decision Making

Decision making situation $\mathcal{D}$, we have the random variable action $A$ and the random variable state $X$. Note that the action variable and the state variable can both be multivariate. $X$ and $A$ forms an influence diagram $D(X, A)$. The utility function given $A$ and $X$ in the influence diagram is $U(X, A)$.

The goal of decision making is find the optimal action or policy such that the expected utility is maximized.

Maximum Expected Utility

Special Case

We define the expected utility given action $A$ as

$$
\begin{align}
\mathbb{E} [U(A)]
&= \sum_{x \in X} P(X = x | A) U(X = x, A) \\
\end{align}
$$

The optimal action $a^{\ast}$ that maximizes the expected utility is

$$
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\begin{align}
a^{\ast} = \argmax_{a \in A} \mathbb{E} [U(A = a)]
\end{align}
$$

In this particular setting, the state variable $X$ has no influence on the action variable $A$, but the action variable $A$ might have influence on the state variable $X$.

General Case

More generally, the state variable $X$ could affect the action variable $A$ and vise versa, so we have to use the joint distribution $P(X, A)$.

$$
\begin{align}
\mathbb{E} [U]
&= \sum_{x \in X, a \in A} P(X = x, A = a) U(X = x, A = a) \\
\end{align}
$$

At the first glance, this is an constant value that dependents on neither $X$ nor $A$. What is the variable for optimization? Let’s factorize $P(X, A)$ first. Suppose $X$ is multivariate and $X = \{X_1, X_2, \cdots, X_n\}$,

$$
\begin{align}
P(X, A) = \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) P(A | \text{Pa}(A))
\end{align}
$$

We found $P(A | \text{Pa}(A))$ is actually a policy, sometimes referred as decision rule, that remains to be optimized. We further denote $P(A | \text{Pa}(A))$ as $\pi(A | Z)$ or $\delta(A | Z)$ where $Z = \text{Pa}(A)$, and $W = X - Z - A$.

$$
\begin{align}
\mathbb{E} [U(\pi)]
&= \sum_{X, A} P(X, A) U(X, A) \\
&= \sum_{X_1, X_2, \cdots, X_n, A} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) \pi(A | Z) U(X, A) \bigg) \\
&= \sum_{Z, A} \sum_{W} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) \pi(A | Z) U(X, A) \bigg) \\
&= \sum_{Z, A} \pi(A | Z) \sum_{W} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \bigg) \\
\end{align}
$$

Note that $\sum_{W} \Big( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \Big)$ is just a function that depends on $A$ and $Z$, we denote

$$
\mu(A, Z) = \sum_{W} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \bigg)
$$

Therefore, $\mathbb{E} [U(\pi)]$ could be abbreviated as

$$
\begin{align}
\mathbb{E} [U(\pi)]
&= \sum_{Z, A} \pi(A | Z) \mu(A, Z) \\
\end{align}
$$

Sometimes, we could change the expression in another form. Suppose $U(X, A) = U(X^{\prime}, A)$ where $X^{\prime} \subseteq X$,

$$
\begin{align}
\mathbb{E} [U(\pi)]
&= \sum_{X^{\prime}, A} P(X^{\prime}, A) U(X^{\prime}, A) \\
\end{align}
$$

where $P(X^{\prime}, A)$ is the marginal probability distribution.

To find the the optimal policy $\pi^{\ast}$,

$$
\begin{align}
\pi^{\ast} &= \argmax_{\pi} \mathbb{E} [U(\pi)] \\
&= \argmax_{\pi} \sum_{Z, A} \pi(A | Z) \mu(A, Z) \\
&= \argmax_{\pi} \sum_{Z} \sum_{A} \pi(A | Z) \mu(A, Z) \\
\end{align}
$$

It is not difficult to find that the optimal policy $\pi^{\ast}$ is

$$
\begin{align}
\pi^{\ast}(A | Z)
&=
\begin{cases}
1 & \text{if $A = \argmax_{A} \mu(A, Z)$}\\
0 & \text{otherwise}\\
\end{cases} \\
\end{align}
$$

In this case the probability distribution $\pi^{\ast}$ degenerates to a deterministic function.

$$
a^{\ast} = \argmax_{a \in A} \mu(A = a, Z = z)
$$

In this particular setting, we would need to observe $Z$ in order to find the optimal action.

Final Remarks

These fundamental concepts are crucial for Markov decision process and reinforcement learning.

Maximum Expected Utility and Decision Making

https://leimao.github.io/blog/Maximum-Expected-Utility-Decision-Making/

Author

Lei Mao

Posted on

06-21-2021

Updated on

06-21-2021

Licensed under

Probabilistic Graphical Model

Maximum Expected Utility and Decision Making

Introduction

Decision Making

Maximum Expected Utility

Special Case

General Case

Final Remarks

Author

Posted on

Updated on

Licensed under

Like this article? Support the author with

Comments

Advertisement

Catalogue