### Introduction

Given a Bayesian network, we sometimes want to make a decision based on the information it encodes. By adding action nodes and utility nodes to the Bayesian network, we obtain an influence diagram. Decision making is then just finding the optimal action so that the expected utility of the influence diagram is the largest. Maximum expected utility is often used for finding the optimal action or policy.

In this blog post, I would like to discuss maximum expected utility for decision making in influence diagrams.

### Decision Making

In a decision-making situation $\mathcal{D}$, we have a random action variable $A$ and a random state variable $X$. Note that both the action variable and the state variable can be multivariate. $X$ and $A$ form an influence diagram $D(X, A)$, and the utility function given $A$ and $X$ in the influence diagram is $U(X, A)$.

The goal of decision making is to find the optimal action or policy such that the expected utility is maximized.

### Maximum Expected Utility

#### Special Case

We define the expected utility given action $A$ as

\[\begin{align} \mathbb{E} [U(A)] &= \sum_{x \in X} P(X = x | A) U(X = x, A) \\ \end{align}\]The optimal action $a^{\ast}$ that maximizes the expected utility is

\[\DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax} \begin{align} a^{\ast} = \argmax_{a \in A} \mathbb{E} [U(A = a)] \end{align}\]In this particular setting, the state variable $X$ has no influence on the action variable $A$, but the action variable $A$ might have influence on the state variable $X$.
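As a concrete sketch of the special case, the optimal action can be found by direct enumeration. The distribution $P(X \mid A)$, the utilities $U(x, a)$, and the action names below are made-up toy numbers, not from any particular model:

```python
# Special case: the action A influences the state X, but not vice versa.
# P(X | A) and U(x, a) below are made-up toy numbers for illustration.
P_X_given_A = {
    "a0": {"x0": 0.7, "x1": 0.3},
    "a1": {"x0": 0.2, "x1": 0.8},
}
U = {
    ("x0", "a0"): 10.0, ("x1", "a0"): -5.0,
    ("x0", "a1"): 1.0,  ("x1", "a1"): 4.0,
}

def expected_utility(a):
    # E[U(A = a)] = sum over x of P(X = x | A = a) * U(x, a)
    return sum(p * U[(x, a)] for x, p in P_X_given_A[a].items())

# a* = argmax_a E[U(A = a)], by direct enumeration over the actions.
best_action = max(P_X_given_A, key=expected_utility)
```

Enumeration like this works because both $A$ and $X$ are small and discrete; with many state variables, the inner sum over $x$ would instead be computed by inference in the Bayesian network.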

#### General Case

More generally, the state variable $X$ could affect the action variable $A$ and vice versa, so we have to use the joint distribution $P(X, A)$.

\[\begin{align} \mathbb{E} [U] &= \sum_{x \in X, a \in A} P(X = x, A = a) U(X = x, A = a) \\ \end{align}\]At first glance, this is a constant value that depends on neither $X$ nor $A$. What is the variable for optimization? Let's factorize $P(X, A)$ first. Suppose $X$ is multivariate and $X = \{X_1, X_2, \cdots, X_n\}$,

\[\begin{align} P(X, A) = \prod_{i=1}^{n} P(X_i | \text{Pa}(X_i) ) P(A | \text{Pa}(A)) \end{align}\]We find that $P(A | \text{Pa}(A))$ is actually a *policy*, sometimes referred to as a *decision rule*, that remains to be optimized. We further denote $P(A | \text{Pa}(A))$ as $\pi(A | Z)$ or $\delta(A | Z)$, where $Z = \text{Pa}(A)$ and $W = X - Z - A$.

Note that $\sum_{W} \Big( \prod_{i=1}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \Big)$ is just a function that depends on $A$ and $Z$, so we denote

\[\mu(A, Z) = \sum_{W} \bigg( \prod_{i=1}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \bigg)\]Therefore, $\mathbb{E} [U(\pi)]$ could be abbreviated as

\[\begin{align} \mathbb{E} [U(\pi)] &= \sum_{Z, A} \pi(A | Z) \mu(A, Z) \\ \end{align}\]Sometimes we can express this in another form. Suppose $U(X, A) = U(X^{\prime}, A)$ where $X^{\prime} \subseteq X$,

\[\begin{align} \mathbb{E} [U(\pi)] &= \sum_{X^{\prime}, A} P(X^{\prime}, A) U(X^{\prime}, A) \\ \end{align}\]where $P(X^{\prime}, A)$ is the marginal probability distribution.
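To make the general case concrete, here is a minimal sketch that evaluates $\mathbb{E}[U(\pi)]$ for a given policy. The tiny influence diagram ($Z$ is a parent of $A$, and $W$ depends on $Z$ and $A$) and all of its numbers are made-up assumptions for illustration:

```python
# Toy influence diagram with made-up numbers: Z is a parent of the action A,
# W depends on Z and A, and the utility depends on W and A.
# Here X = {Z, W}, so P(X, A) = P(Z) * P(W | Z, A) * pi(A | Z).
ACTIONS = ("a0", "a1")
P_Z = {"z0": 0.4, "z1": 0.6}
P_W_given_ZA = {
    ("z0", "a0"): {"w0": 0.9, "w1": 0.1},
    ("z0", "a1"): {"w0": 0.3, "w1": 0.7},
    ("z1", "a0"): {"w0": 0.5, "w1": 0.5},
    ("z1", "a1"): {"w0": 0.1, "w1": 0.9},
}
U = {("w0", "a0"): 2.0, ("w1", "a0"): 0.0,
     ("w0", "a1"): -1.0, ("w1", "a1"): 5.0}

def mu(a, z):
    # mu(A, Z) = sum over W of P(Z) * P(W | Z, A) * U(W, A)
    return sum(P_Z[z] * p * U[(w, a)] for w, p in P_W_given_ZA[(z, a)].items())

def expected_utility(pi):
    # E[U(pi)] = sum over Z, A of pi(A | Z) * mu(A, Z)
    return sum(pi[(a, z)] * mu(a, z) for z in P_Z for a in ACTIONS)

# Expected utility of a uniformly random baseline policy.
uniform = {(a, z): 0.5 for z in P_Z for a in ACTIONS}
value = expected_utility(uniform)
```

In a larger influence diagram, the sum over $W$ inside $\mu(A, Z)$ would typically be computed with variable elimination rather than brute-force enumeration.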

To find the optimal policy $\pi^{\ast}$,

\[\begin{align} \pi^{\ast} &= \argmax_{\pi} \mathbb{E} [U(\pi)] \\ &= \argmax_{\pi} \sum_{Z, A} \pi(A | Z) \mu(A, Z) \\ &= \argmax_{\pi} \sum_{Z} \sum_{A} \pi(A | Z) \mu(A, Z) \\ \end{align}\]It is not difficult to see that the optimal policy $\pi^{\ast}$ is

\[\begin{align} \pi^{\ast}(A | Z) &= \begin{cases} 1 & \text{if $A = \argmax_{A} \mu(A, Z)$}\\ 0 & \text{otherwise}\\ \end{cases} \\ \end{align}\]In this case, the probability distribution $\pi^{\ast}$ degenerates to a deterministic function.

\[a^{\ast} = \argmax_{a \in A} \mu(A = a, Z = z)\]In this particular setting, we would need to observe $Z$ in order to find the optimal action.
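Extracting the deterministic optimal policy is then just a per-observation argmax over $\mu(A, Z)$. The $\mu$ table below uses made-up values for illustration:

```python
# Suppose mu(A, Z) has already been computed; the values here are made up
# for illustration (mu already folds in the probability of each z).
mu = {("a0", "z0"): 0.72, ("a1", "z0"): 1.28,
      ("a0", "z1"): 0.60, ("a1", "z1"): 2.64}
ACTIONS, CONTEXTS = ("a0", "a1"), ("z0", "z1")

def optimal_action(z):
    # pi*(A | Z = z) puts all probability mass on argmax_A mu(A, z).
    return max(ACTIONS, key=lambda a: mu[(a, z)])

policy = {z: optimal_action(z) for z in CONTEXTS}
# Since pi* is deterministic, E[U(pi*)] = sum over z of mu(a*(z), z).
best_value = sum(mu[(optimal_action(z), z)] for z in CONTEXTS)
```

For each observed $z$, `optimal_action(z)` is exactly $a^{\ast} = \argmax_{a \in A} \mu(A = a, Z = z)$ computed by enumeration.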

### Final Remarks

These fundamental concepts are crucial for Markov decision processes and reinforcement learning.