Angry Middle-Aged Man
A New Yorker profile of Larry David from January 12, 2004.
Here are slides that have a good discussion of Markov random fields for image analysis. The most useful part of the slides is a fairly detailed discussion of Bayesian inference, including a clear explanation of the use of conjugate priors (something I have found merely glossed over in many other places).
Summary: when doing Bayesian inference, prior knowledge is sometimes vague enough that tractability can guide the choice of prior, so we use a prior distribution on the parameters that is compatible with our likelihood function but also leads to a tractable posterior distribution after applying Bayes' rule. Conjugate priors let you do this. Formally, if you have a family of likelihood functions L = { p(g|f) | f in F }, then a family of priors P = { p(f|θ) | θ in Θ } is a conjugate family for L if p(g|f) in L and p(f|θ) in P implies that the posterior p(f|g) is also in P (generally with updated hyperparameters). The point is that in the Bayesian “posterior ∝ likelihood * prior” relationship, you generally have some model form specified for the likelihood function, so the wiggle room is in choosing the prior. If you choose a prior from the conjugate family for the likelihood function, the math becomes much nicer: the posterior has the same functional form as the prior, and inference reduces to updating hyperparameters. Some of the most used examples: the Gaussian family is self-conjugate (a Gaussian prior on the mean of a Gaussian likelihood with known variance gives a Gaussian posterior), and the Dirichlet distribution is conjugate to the multinomial distribution. More detailed examples are worked out in the slides.
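To make the "updating hyperparameters" point concrete, here is a minimal sketch of the two conjugate pairs mentioned above. This is my own illustration rather than anything taken from the slides; the function names are made up, and the Gaussian case assumes the likelihood variance is known.

```python
def dirichlet_posterior(alpha, counts):
    """Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts).

    The posterior stays in the Dirichlet family; only the hyperparameters move.
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet(alpha) distribution: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def gaussian_posterior(prior_mean, prior_var, noise_var, data):
    """Gaussian prior on an unknown mean, Gaussian likelihood with KNOWN variance.

    Returns the (mean, variance) of the Gaussian posterior over the mean.
    Precisions (inverse variances) add; the posterior mean is a
    precision-weighted combination of the prior mean and the data.
    """
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Hypothetical usage: a uniform Dirichlet prior over three categories,
# then observing counts (3, 0, 2).
posterior_alpha = dirichlet_posterior([1, 1, 1], [3, 0, 2])
posterior_mean, posterior_var = gaussian_posterior(0.0, 1.0, 1.0, [2.0, 2.0])
```

In both cases no integral is ever computed: Bayes' rule collapses to arithmetic on the hyperparameters, which is the whole appeal of choosing a conjugate family.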
I’ve been spending a lot of time on graphical models lately (a marriage of graph theory and probability theory; not random graphs), and came across an intimidating paper called The Toric Algebra of Graphical Models. Abstract:
We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley–Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.
I’d like to read this sometime. [Digression: the Hammersley-Clifford theorem still confuses me -- I have had trouble figuring out why one needs to assume that the probability distribution is positive even after seeing counterexamples showing that the theorem breaks otherwise. Fine, it breaks, but I still want an intuitive reason why. Apparently the assumption bothered Hammersley and Clifford too -- to the point of delaying publication -- so at least it's not just me being stupid.]
I became curious about the “toric algebra” business in the title, and this language apparently comes from a field called algebraic statistics; here’s a description from a short course on it:
Algebraic statistics advocates the use of algebraic geometry, commutative algebra, and geometric combinatorics as tools for making statistical inferences. The starting point for this connection is the observation that most statistical models for discrete random variables are, in fact, algebraic varieties. While some of the varieties that appear are classical varieties (like Segre varieties and toric varieties), most are new, and there are many challenging open problems about the algebraic structure of these varieties. These lectures will provide an introduction to algebraic statistics, emphasizing both the interesting algebraic questions that arise and the statistical consequences of the algebraic analysis.
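The smallest example I know of the "models are varieties" claim is the independence model for two binary variables: a 2x2 joint table factors as a product of its marginals exactly when its determinant p00*p11 - p01*p10 vanishes, which is the equation of a Segre variety. The sketch below is my own illustration (not from the course), with made-up function names, and uses exact fractions so the polynomial really evaluates to zero.

```python
from fractions import Fraction


def independence_defect(p):
    """Evaluate the polynomial p00*p11 - p01*p10 on a 2x2 joint table.

    It is zero exactly when the two binary variables are independent.
    """
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]


def product_table(row_marginal, col_marginal):
    """Joint table of two independent variables: p_ij = row_i * col_j."""
    return [[r * c for c in col_marginal] for r in row_marginal]


# A table built from marginals lies on the variety.
independent = product_table(
    [Fraction(1, 3), Fraction(2, 3)],
    [Fraction(1, 4), Fraction(3, 4)],
)

# Two perfectly correlated bits do not.
dependent = [
    [Fraction(1, 2), Fraction(0)],
    [Fraction(0), Fraction(1, 2)],
]

print(independence_defect(independent))
print(independence_defect(dependent))
```

So "is this distribution in the model?" becomes "does this point satisfy these polynomial equations?", which is where the algebraic geometry gets its foot in the door.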
This is neat, because I like algebra, and I’ve recently become interested in probability and statistics for machine learning. As with any math mashup, there is always the question of what the point is, but this sounds both interesting and applicable. The paper, which specifically discusses the application to graphical models, sounds like a good way to get started after getting some background from the short course.
[Side note: Apparently, there are people who actually read this blog, though it was basically intended as a dumping ground for things I read. This is why there is almost no context when things are posted and insufficient explanation of technical material. In the last few months, I've actually read too much rather than too little to keep posting material here, though I would like to start again. Hopefully, one or two of the six or seven people who have looked here may be pleased about this. :) ]