Pseudolikelihood in Graphical Models

Introduction

Because I did not use pseudolikelihood quite frequently, I often forget what it is and how to calculate it. The introductory materials from Wikipedia and other resources are always vague and confusing to beginners. So I decided to write a blog post on pseudolikelihood in detail in case in the future I forget it again.

Problem Definition

We usually have to calculate, given a graphical model (directed or undirected), what the probability of each possible state is. Concretely, we have the following undirected graphical model. We would like to know Pr(X1=a2,X2=b3,X3=c2).

Undirected Graphical Model

Undirected Graphical Model

X1,X2,X3 can take 4 different values, respectively. We also ignore the value of the edges. The score of the state S(X1,X2,X3) in the graph could be calculated using f(X1,X2,X3). To calculate Pr(X1=a2,X2=b3,X3=c3), we will usually have to first calculate the scores for all the possible states in the graph and apply normalization functions to turn the scores to probabilities. Usually the normalization function could be softmax function.

Z=141414ef(X1=ai,X2=bi,X3=ci)

Pr(X1=ai,X2=bi,X3=ci)=ef(X1=ai,X2=bi,X3=ci)/Z

Therefore,

Pr(X1=a2,X2=b3,X3=c2)=ef(X1=a2,X2=b3,X3=c2)/Z

More generally, if there are n nodes in the graph and each node could choose from k different values. The complexity of calculating Pr(X) will be O(kn). When the graphical model becomes larger and more complex, calculating Pr(X) becomes intractable.

Without applying softmax function globally, the probability of the state S(X1,X2,X3) could also be calculated from the “chain-rule” of conditional probabilities:

Pr(X1=a2,X2=b3,X3=c2)=Pr(X3=c3|X2=b3,X1=a2)×Pr(X2=b3|X1=a2)×Pr(X1=a2)

However, to calculate the conditional probabilities on the right side, it is still required to calculate the probabilies of all the states, which still takes O(kn).

Pseudolikelihood Approximation

Pr(X=x)=i=1nPr(Xi=xi|xi)

In our case,

Pr(X1=a2,X2=b3,X3=c2)=Pr(X1=a2|X2=b3,X3=c2)×Pr(X2=b3|X1=a2,X3=c2)×Pr(X3=c3|X1=a2,X2=b2)

We first calculate Pr(X1=a2|X2=b3,X3=c2).

Given X2=b3,X3=c2, X1 has four possible values a1,a2,a3,a4. We can therefore calculate the socres for the four possible states S(X1=a1,X2=b3,X3=c2), S(X1=a2,X2=b3,X3=c2), S(X1=a3,X2=b3,X3=c2), S(X1=a4,X2=b3,X3=c2).

Z=14ef(X1=ai,X2=b3,X3=c2)

Apply softmax function to calculate Pr(X1=a2|X2=b3,X3=c2)

Pr(X1=a2|X2=b3,X3=c2)=ef(X1=a2,X2=b3,X3=c2)/Z

Similarly, we can calculate Pr(X2=b3|X1=a2,X3=c2) and Pr(X3=c3|X1=a2,X2=b2)

Once the conditional probabilities have all been calculated, we could calculate the pseudolikelihood of state Pr(X1=a2,X2=b3,X3=c2) by multiplying the three conditional probabilities together.

More generally, if there are n nodes in the graph and each node could choose from k different values. The complexity of calculating the pseudolikelihood Pr(X) will be O(kn), which is much more efficient than calculating the true likelihood.

Conclusion

Pseudolikelihood makes the calculation of the probability of the state from “intractable” to “tractable”, which accelerates our computation of the graphical models.

Pseudolikelihood in Graphical Models

https://leimao.github.io/blog/Pseudolikelihood/

Author

Lei Mao

Posted on

02-16-2018

Updated on

02-16-2018

Licensed under


Comments