How Things Even Out

How Things Even Out
by Jack Handey

March 3, 2008, The New Yorker

Things tend to even out. Religion, some people say, has caused wars and fighting. Yes, but it’s also boring to sit through a church service, so it evens out. One moment you’re depressed because your doctor tells you that you have alcoholism. But then you cheer up when you go home and find a hidden bottle of vodka you had forgotten about.

Read the rest of this entry »

Comments

Richard Hamming: You and Your Research

Bell Labs, March 7, 1986.

Read the rest of this entry »

Comments

Angry Middle-Aged Man

A New Yorker profile of Larry David from January 12, 2004.

Read the rest of this entry »

Comments

Bayesian Methods and Markov Random Fields

Here are slides that have a good discussion of Markov random fields for image analysis. The most useful part of the slides is a fairly detailed discussion of Bayesian inference, including a clear explanation of the use of conjugate priors (something I have found merely glossed over in many other places).

Summary: when doing Bayesian inference, sometimes the prior knowledge is vague enough for tractability concerns to come into play, so we use a prior distribution on parameters that is compatible with our likelihood function but also leads to a tractable posterior distribution after applying Bayes rule. Conjugate priors let you do this. Formally, if you have a family of likelihood functions L = { p(g|f) | f in F }, then a family of priors P = { p(f|θ) | θ in Θ } is a conjugate family for L if p(g|f) in L and p(f|θ) in P implies p(f|g) in P. The point is that in the Bayesian “posterior ~ likelihood * prior” relationship, you generally have some model form specified for the likelihood function, so you have some wiggle room in choosing the prior. If you choose a prior from the conjugate family for the likelihood function, the math becomes much nicer. Some of the most used examples: the Gaussian family is self-conjugate, and the Dirichlet distribution is conjugate to the multinomial distribution. More detailed examples are worked out in the slides.

Comments

Algebraic Statistics

I’ve been spending a lot of time on graphical models lately (a marriage of graph theory and probability theory; not random graphs), and came across an intimidating paper called The Toric Algebra of Graphical Models. Abstract:

We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley–Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.

I’d like to read this sometime. [Digression: the Hammersley-Clifford theorem still confuses me — I have had trouble figuring out why one needs to assume that the probability distribution is positive even after seeing counterexamples showing that the theorem breaks otherwise. Fine, it breaks, but I still want an intuitive reason why. Apparently the assumption bothered Hammersley and Clifford too — to the point of delaying publication — so at least it’s not just me being stupid.]

I became curious about the “toric algebra” business in the title, and this language apparently comes from a field called algebraic statistics; here’s a description from a short course on it:

Algebraic statistics advocates the use of algebraic geometry, commutative algebra, and geometric combinatorics as tools for making statistical inferences. The starting point for this connection is the observation that most statistical models for discrete random variables are, in fact, algebraic varieties. While some of the varieties that appear are classical varieties (like Segre varieties and toric varieties), most are new, and there are many challenging open problems about the algebraic structure of these varieties. These lectures will provide an introduction to algebraic statistics, emphasizing both the interesting algebraic questions that arise and the statistical consequences of the algebraic analysis.

This is neat, because I like algebra, and I’ve recently become interested in probability and statistics for machine learning, but as with math mashups, there is always the question of what the point is. But this sounds both interesting and applicable; the paper specifically discussing its application to graphical models sounds like a good way to get started after getting some background from the short course.

[Side note: Apparently, there are people who actually read this blog, though it was basically intended as a dumping ground for things I read. This is why there is almost no context when things are posted and insufficient explanation of technical material. In the last few months, I’ve actually read too much rather than too little to keep posting material here, though I would like to start again. Hopefully, one or two of the six or seven people who have looked here may be pleased about this. :) ]

Comments

Negative databases

An August 31, 2006 article from The Economist on negative databases, or “how philosophy can help create secure databases.”

Read the rest of this entry »

Comments

Douglas Hofstadter: Analogy as the Core of Cognition

This is from an essay by Douglas Hofstadter that was delivered as a Stanford Presidential Lecture; it was also previously published in: The Analogical Mind: Perspectives from Cognitive Science, edited by Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov (MIT Press). That book might be worth checking out.

This paper is quite long, and not all of it is that interesting, so the entire article is not pasted below. Part of the introduction nicely summarizes the thesis:

One should not think of analogy-making as a special variety of reasoning (as in the dull and uninspiring phrase “analogical reasoning and problem-solving,” a long-standing cliché in the cognitive-science world), for that is to do analogy a terrible disservice. After all, reasoning and problem-solving have (at least I dearly hope!) been at long last recognized as lying far indeed from the core of human thought. If analogy were merely a special variety of something that in itself lies way out on the peripheries, then it would be but an itty-bitty blip in the broad blue sky of cognition. To me, however, analogy is anything but a bitty blip — rather, it’s the very blue that fills the whole sky of cognition — analogy is everything, or very nearly so, in my view. […]

The thrust of my chapter is to persuade readers of this unorthodox viewpoint, or failing that, at least to give them a strong whiff of it. In that sense, then, my article shares with Richard Dawkins’s eye-opening book The Selfish Gene (Dawkins 1976) the quality of trying to make a scientific contribution mostly by suggesting to readers a shift of viewpoint — a new take on familiar phenomena. For Dawkins, the shift was to turn causality on its head, so that the old quip “a chicken is an egg’s way of making another egg” might be taken not as a joke but quite seriously. In my case, the shift is to suggest that every concept we have is essentially nothing but a tightly packaged bundle of analogies, and to suggest that all we do when we think is to move fluidly from concept to concept — in other words, to leap from one analogy-bundle to another — and to suggest, lastly, that such concept-to-concept leaps are themselves made via analogical connection, to boot.

This essay reminds me of two things: the Barry Mazur essay on category theory that I posted a few days ago, and Jeff Hawkins’ On Intelligence. (Warning: Hofstadter uses the word “category” a lot in his essay, but this has nothing to do with category theory.)

Analogies and Functors

There is an obvious analogy to be made between functors (or any morphisms) and analogies. A functor translates the objects and relationships from one mathematical theory to another; the canonical example is that of the fundamental group functor in algebraic topology, which lets us turn problems of topology into those of group theory. What is this if not a rigorous kind of analogy?

The Hofstadter paper is interesting in that it proposes an extension of this relationship. In particular, Mazur talks about replacing a mathematical object with its network of relationships, while Hofstadter talks about replacing concepts with their “bundle of analogies,” which is more or less the same idea. Recall what Mazur says:

The lights are dimmed on mathematical objects and beamed rather on the corresponding functors; that is, on the networks of relationships entailed by the objects. The functor has center stage, the object that it represents appears almost as an afterthought.

What Hofstadter proposes in his paper is to similarly dim the lights on “concepts” and beam them instead on their networks of analogies; analogies should be center stage when discussing cognition in the same way that functors are frequently center stage when discussing mathematics.

Is there anything deeper to this connection? I don’t know. At least if it’s a superficiality, it’s a neat one.

Analogies and Hierarchical Temporal Memories

It would be interesting to go back to On Intelligence and see how much of this fits with Jeff Hawkins’ theory of how the brain works. At first glance, many of the passages from the essay seem to agree with Hawkins’ ideas. I’m not going to compare them or review them in detail here, but just quote some relevant passages from both.

In this passage, Hofstadter discusses something that he calls “chunking,” which has obvious similarities to the hierarchy in the perceptual system that Hawkins describes.

We begin with a couple of simple queries about familiar phenomena: “Why do babies not remember events that happen to them?” and “Why does each new year seem to pass faster than the one before?”

I wouldn’t swear that I have the final answer to either one of these queries, but I do have a hunch, and I will here speculate on the basis of that hunch. And thus: the answer to both is basically the same, I would argue, and it has to do with the relentless, lifelong process of chunking — taking “small” concepts and putting them together into bigger and bigger ones, thus recursively building up a giant repertoire of concepts in the mind.

How, then, might chunking provide the clue to these riddles? Well, babies’ concepts are simply too small. They have no way of framing entire events whatsoever in terms of their novice concepts. It is as if babies were looking at life through a randomly drifting keyhole, and at each moment could make out only the most local aspects of scenes before them. It would be hopeless to try to figure out how a whole room is organized, for instance, given just a keyhole view, even a randomly drifting keyhole view.

Or, to trot out another analogy, life is like a chess game, and babies are like beginners looking at a complex scene on a board, not having the faintest idea how to organize it into higher-level structures. As has been well known for decades, experienced chess players chunk the setup of pieces on the board nearly instantaneously into small dynamic groupings defined by their strategic meanings, and thanks to this automatic, intuitive chunking, they can make good moves nearly instantaneously and also can remember complex chess situations for very long times. Much the same holds for bridge players, who effortlessly remember every bid and every play in a game, and months later can still recite entire games at the drop of a hat.

Here, Hofstadter discusses the disconnect between sensory input and high-level perception. This is consistent with what Hawkins says would happen in an HTM at the higher levels of the hierarchy.

In fact, I should stress that the upper echelons of high-level perception totally transcend the normal flavor of the word “perception,” for at the highest levels, input modality plays essentially no role. Let me explain. Suppose I read a newspaper article about the violent expulsion of one group of people by another group from some geographical region, and the phrase “ethnic cleansing,” nowhere present in the article, pops into my head. What has happened here is a quintessential example of high-level perception — but what was the input medium? Someone might say it was vision, since I used my eyes to read the newspaper. But really, was I perceiving ethnic cleansing visually? Hardly. Indeed, I might have heard the newspaper article read aloud to me and had the same exact thought pop to mind. Would that mean that I had aurally perceived ethnic cleansing? Or else I might be blind and have read the article in Braille — in other words, with my fingertips, not my eyes or ears. Would that mean that I had tactilely perceived ethnic cleansing? The suggestion is absurd.

The sensory input modality of a complex story is totally irrelevant; all that matters is how it jointly activates a host of interrelated concepts, in such a way that further concepts (e.g., “ethnic cleansing”) are automatically accessed and brought up to center stage. […]

The triggering of prior mental categories by some kind of input — whether sensory or more abstract — is, I insist, an act of analogy-making. Why is this? Because whenever a set of incoming stimuli activates one or more mental categories, some amount of slippage must occur (no instance of a category ever being precisely identical to a prior instance). Categories are quintessentially fluid entities; they adapt to a set of incoming stimuli and try to align themselves with it. The process of inexact matching between prior categories and new things being perceived (whether those “things” are physical objects or bite-size events or grand sagas) is analogy-making par excellence. How could anyone deny this? After all, it is the mental mapping onto each other of two entities — one old and sound asleep in the recesses of long-term memory, the other new and gaily dancing on the mind’s center stage — that in fact differ from each other in a myriad of ways.

Below, Hofstadter makes some comments on the common core underlying various things out in the world, which gels with Hawkins’ emphasis on “discovering causes.”

I now make an observation that, though banal and obvious, needs to be made explicitly nonetheless — namely, things “out there” (objects, situations, whatever) that are labeled by the same lexical item have something, some core, in common; also, whatever it is that those things “out there” share is shared with the abstract mental structure that lurks behind the label used for them. Getting to the core of things is, after all, what categories are for. In fact, I would go somewhat further and claim that getting to the core of things is what thinking itself is for-thus once again placing high-level perception front and center in the definition of cognition.

For comparison, consider the following paragraph from Numenta’s whitepaper on how HTMs work. The connection to the passage directly above should be fairly clear.

The HTM receives the spatio-temporal pattern coming from the senses. At first, the HTM has no knowledge of the causes in the world, but through a learning process that will be described below, it “discovers” what the causes are. The end goal of this process is that the HTM develops internal representations of the causes in the world. In a brain, nerve cells learn to represent causes in the world, such as a cell that becomes active whenever you see a face. In an HTM, causes are represented by numbers in a vector. At any moment in time, given current and past input, an HTM will assign a likelihood that individual causes are currently being sensed. The HTM’s output is manifest as a set of probabilities for each of the learned causes. This moment-to- moment distribution of possible causes is called a “belief”. If an HTM knows about ten causes in the world, it will have ten variables representing those causes. The value of those variables – its belief – is what the HTM believes is happening in its world at that instant.

While none of this is necessarily that unexpected, and I may be making a mountain out of a molehill, it’s still interesting that so much of what they say appears to overlap. It seems that Hofstadter agrees with Hawkins about what the core activity of the brain is, and in particular, he wants to call that activity analogy-making. (Perhaps Hawkins mentioned analogies in his book too, and I’ve simply forgotten.) In any case, I need to think about it more; I’m still not sure whether this is something or nothing.

Comments

Manifold Destiny

Sylvia Nasar and David Gruber on the battle over who solved the Poincare Conjecture; from the New Yorker of August 28, 2006.

Read the rest of this entry »

Comments off

Barry Mazur: When is one thing equal to another thing?

Some excerpts from Barry Mazur’s paper on When is one thing equal to some other thing?. This is a rough draft of the paper from January 2006, so there may be some awkwardnesses. Also, many sections from the original paper are left out, and the text below will not make sense as an article by itself. All the footnotes and other citations are omitted.

In particular, the whole discussion of replacing an object with its network of relationships (and the following discussion of object and representation) is interesting. It’s also probably worth following up on the bit about a Wittgenstenian interpretation of Yoneda’s Lemma. This paragraph from near the end sums up some of this well:

It sometimes happens that the introduction of a term in a mathematical discussion is the signal that an important shift of viewpoint is taking place, or is about to take place. An emphasis on “representability” of functors in a branch of mathematics suggests an ever so slight, but ever so important, shift. The lights are dimmed on mathematical objects and beamed rather on the corresponding functors; that is, on the networks of relationships entailed by the objects. The functor has center stage, the object that it represents appears almost as an afterthought. The lights are dimmed on on equality of mathematical objects as well, and focussed, rather, on canonical isomorphisms, and equivalence.

Read the rest of this entry »

Comments (1)

Anna Goldenberg: Structural Learning of Large Bayesian Networks for Social Network Modeling

An interesting talk abstract from Anna Goldenberg at CMU. Some of the corresponding publications are online.


Bayesian Networks have been successfully applied in many areas such as pharmaceutical, decision making by doctors, air control, marketing. Structural learning of Bayesian Networks is usually a desirable but costly operation. In some domains it is possible to collect expert knowledge to manually create a structure for a Bayes Net. However, social networks, warehousing data, or supermarket purchasing records may contain hundreds of thousands of attributes. Providing expert Bayes Net structure in such cases is cumbersome if not impossible, even if as in the case with many of those domains the events are choices of very small subsets of the large pool of available entities. The complexity of existing algorithms for structural search prevents Bayes Net learning on datasets of that size.

This work introduces an algorithm for tractable structural learning in Bayes Nets by exploring structures on the local level. The algorithm exploits the computational efficiency of Frequent Sets for gathering statistics that are most likely to be useful for structure search given the assumption of sparse data. I will show the relevance of this work to modeling Social Networks. Finally, I will present an empirical evaluation of our algorithm applied to several massive datasets.

Note: If the time is left and there is a sufficient interest in the audience, I will in addition present a new generative model for evolution of social networks that I have developed in collaboration with Alice Zheng.

Comments

« Previous entries ·