Archive for August, 2006

Douglas Hofstadter: Analogy as the Core of Cognition

This is from an essay by Douglas Hofstadter that was delivered as a Stanford Presidential Lecture; it was also previously published in: The Analogical Mind: Perspectives from Cognitive Science, edited by Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov (MIT Press). That book might be worth checking out.

This paper is quite long, and not all of it is that interesting, so the entire article is not pasted below. Part of the introduction nicely summarizes the thesis:

One should not think of analogy-making as a special variety of reasoning (as in the dull and uninspiring phrase “analogical reasoning and problem-solving,” a long-standing cliché in the cognitive-science world), for that is to do analogy a terrible disservice. After all, reasoning and problem-solving have (at least I dearly hope!) been at long last recognized as lying far indeed from the core of human thought. If analogy were merely a special variety of something that in itself lies way out on the peripheries, then it would be but an itty-bitty blip in the broad blue sky of cognition. To me, however, analogy is anything but a bitty blip — rather, it’s the very blue that fills the whole sky of cognition — analogy is everything, or very nearly so, in my view. [...]

The thrust of my chapter is to persuade readers of this unorthodox viewpoint, or failing that, at least to give them a strong whiff of it. In that sense, then, my article shares with Richard Dawkins’s eye-opening book The Selfish Gene (Dawkins 1976) the quality of trying to make a scientific contribution mostly by suggesting to readers a shift of viewpoint — a new take on familiar phenomena. For Dawkins, the shift was to turn causality on its head, so that the old quip “a chicken is an egg’s way of making another egg” might be taken not as a joke but quite seriously. In my case, the shift is to suggest that every concept we have is essentially nothing but a tightly packaged bundle of analogies, and to suggest that all we do when we think is to move fluidly from concept to concept — in other words, to leap from one analogy-bundle to another — and to suggest, lastly, that such concept-to-concept leaps are themselves made via analogical connection, to boot.

This essay reminds me of two things: the Barry Mazur essay on category theory that I posted a few days ago, and Jeff Hawkins’ On Intelligence. (Warning: Hofstadter uses the word “category” a lot in his essay, but this has nothing to do with category theory.)

Analogies and Functors

There is an obvious analogy to be made between functors (or any morphisms) and analogies. A functor translates the objects and relationships from one mathematical theory to another; the canonical example is that of the fundamental group functor in algebraic topology, which lets us turn problems of topology into those of group theory. What is this if not a rigorous kind of analogy?
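For reference, the conditions that make a functor a "rigorous kind of analogy" can be written out explicitly. This is only a sketch in standard notation; the symbols (F, f, g, X, Y, and the category names) are introduced here for illustration and do not come from either essay.

```latex
% A functor F : C -> D sends objects to objects and morphisms to morphisms,
% and preserves identities and composition:
F(\mathrm{id}_X) = \mathrm{id}_{F(X)}, \qquad F(g \circ f) = F(g) \circ F(f).

% The fundamental group is a functor \pi_1 : \mathbf{Top}_* \to \mathbf{Grp}.
% A basepoint-preserving continuous map f : (X, x_0) \to (Y, y_0)
% induces a group homomorphism, and composition is respected:
f_* : \pi_1(X, x_0) \to \pi_1(Y, y_0), \qquad (g \circ f)_* = g_* \circ f_*.
```

The second line is what lets a topological question be translated into a group-theoretic one: relationships between spaces carry over to relationships between groups.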

The Hofstadter paper is interesting in that it proposes an extension of this relationship. In particular, Mazur talks about replacing a mathematical object with its network of relationships, while Hofstadter talks about replacing concepts with their “bundle of analogies,” which is more or less the same idea. Recall what Mazur says:

The lights are dimmed on mathematical objects and beamed rather on the corresponding functors; that is, on the networks of relationships entailed by the objects. The functor has center stage, the object that it represents appears almost as an afterthought.

What Hofstadter proposes in his paper is to similarly dim the lights on “concepts” and beam them instead on their networks of analogies; analogies should be center stage when discussing cognition in the same way that functors are frequently center stage when discussing mathematics.

Is there anything deeper to this connection? I don’t know. At least if it’s a superficiality, it’s a neat one.

Analogies and Hierarchical Temporal Memories

It would be interesting to go back to On Intelligence and see how much of this fits with Jeff Hawkins’ theory of how the brain works. At first glance, many of the passages from the essay seem to agree with Hawkins’ ideas. I’m not going to compare them or review them in detail here, but just quote some relevant passages from both.

In this passage, Hofstadter discusses something that he calls “chunking,” which has obvious similarities to the hierarchy in the perceptual system that Hawkins describes.

We begin with a couple of simple queries about familiar phenomena: “Why do babies not remember events that happen to them?” and “Why does each new year seem to pass faster than the one before?”

I wouldn’t swear that I have the final answer to either one of these queries, but I do have a hunch, and I will here speculate on the basis of that hunch. And thus: the answer to both is basically the same, I would argue, and it has to do with the relentless, lifelong process of chunking — taking “small” concepts and putting them together into bigger and bigger ones, thus recursively building up a giant repertoire of concepts in the mind.

How, then, might chunking provide the clue to these riddles? Well, babies’ concepts are simply too small. They have no way of framing entire events whatsoever in terms of their novice concepts. It is as if babies were looking at life through a randomly drifting keyhole, and at each moment could make out only the most local aspects of scenes before them. It would be hopeless to try to figure out how a whole room is organized, for instance, given just a keyhole view, even a randomly drifting keyhole view.

Or, to trot out another analogy, life is like a chess game, and babies are like beginners looking at a complex scene on a board, not having the faintest idea how to organize it into higher-level structures. As has been well known for decades, experienced chess players chunk the setup of pieces on the board nearly instantaneously into small dynamic groupings defined by their strategic meanings, and thanks to this automatic, intuitive chunking, they can make good moves nearly instantaneously and also can remember complex chess situations for very long times. Much the same holds for bridge players, who effortlessly remember every bid and every play in a game, and months later can still recite entire games at the drop of a hat.

Here, Hofstadter discusses the disconnect between sensory input and high-level perception. This is consistent with what Hawkins says would happen in an HTM at the higher levels of the hierarchy.

In fact, I should stress that the upper echelons of high-level perception totally transcend the normal flavor of the word “perception,” for at the highest levels, input modality plays essentially no role. Let me explain. Suppose I read a newspaper article about the violent expulsion of one group of people by another group from some geographical region, and the phrase “ethnic cleansing,” nowhere present in the article, pops into my head. What has happened here is a quintessential example of high-level perception — but what was the input medium? Someone might say it was vision, since I used my eyes to read the newspaper. But really, was I perceiving ethnic cleansing visually? Hardly. Indeed, I might have heard the newspaper article read aloud to me and had the same exact thought pop to mind. Would that mean that I had aurally perceived ethnic cleansing? Or else I might be blind and have read the article in Braille — in other words, with my fingertips, not my eyes or ears. Would that mean that I had tactilely perceived ethnic cleansing? The suggestion is absurd.

The sensory input modality of a complex story is totally irrelevant; all that matters is how it jointly activates a host of interrelated concepts, in such a way that further concepts (e.g., “ethnic cleansing”) are automatically accessed and brought up to center stage. [...]

The triggering of prior mental categories by some kind of input — whether sensory or more abstract — is, I insist, an act of analogy-making. Why is this? Because whenever a set of incoming stimuli activates one or more mental categories, some amount of slippage must occur (no instance of a category ever being precisely identical to a prior instance). Categories are quintessentially fluid entities; they adapt to a set of incoming stimuli and try to align themselves with it. The process of inexact matching between prior categories and new things being perceived (whether those “things” are physical objects or bite-size events or grand sagas) is analogy-making par excellence. How could anyone deny this? After all, it is the mental mapping onto each other of two entities — one old and sound asleep in the recesses of long-term memory, the other new and gaily dancing on the mind’s center stage — that in fact differ from each other in a myriad of ways.

Below, Hofstadter makes some comments on the common core underlying various things out in the world, which gels with Hawkins’ emphasis on “discovering causes.”

I now make an observation that, though banal and obvious, needs to be made explicitly nonetheless — namely, things “out there” (objects, situations, whatever) that are labeled by the same lexical item have something, some core, in common; also, whatever it is that those things “out there” share is shared with the abstract mental structure that lurks behind the label used for them. Getting to the core of things is, after all, what categories are for. In fact, I would go somewhat further and claim that getting to the core of things is what thinking itself is for — thus once again placing high-level perception front and center in the definition of cognition.

For comparison, consider the following paragraph from Numenta’s whitepaper on how HTMs work. The connection to the passage directly above should be fairly clear.

The HTM receives the spatio-temporal pattern coming from the senses. At first, the HTM has no knowledge of the causes in the world, but through a learning process that will be described below, it “discovers” what the causes are. The end goal of this process is that the HTM develops internal representations of the causes in the world. In a brain, nerve cells learn to represent causes in the world, such as a cell that becomes active whenever you see a face. In an HTM, causes are represented by numbers in a vector. At any moment in time, given current and past input, an HTM will assign a likelihood that individual causes are currently being sensed. The HTM’s output is manifest as a set of probabilities for each of the learned causes. This moment-to-moment distribution of possible causes is called a “belief”. If an HTM knows about ten causes in the world, it will have ten variables representing those causes. The value of those variables – its belief – is what the HTM believes is happening in its world at that instant.
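To make the idea of a "belief" as a vector of probabilities over causes concrete, here is a minimal sketch. It assumes a plain Bayesian update over a fixed list of causes. That is an assumption made for illustration, not Numenta's actual learning or inference algorithm, and the cause names and numbers are hypothetical.

```python
from typing import Dict


def update_belief(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over a finite set of causes: posterior is proportional to prior * likelihood."""
    unnormalized = {cause: prior[cause] * likelihood[cause] for cause in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the observation has zero probability under every cause")
    return {cause: weight / total for cause, weight in unnormalized.items()}


# Hypothetical causes; before any input, every cause is equally likely.
causes = ["face", "car", "tree"]
initial_belief = {cause: 1.0 / len(causes) for cause in causes}

# Hypothetical likelihood of the current sensory input under each cause.
observation_likelihood = {"face": 0.8, "car": 0.1, "tree": 0.1}

# One value per cause, summing to 1: the "belief" at this instant.
belief_after_one = update_belief(initial_belief, observation_likelihood)

# Feeding in the same evidence again sharpens the belief further.
belief_after_two = update_belief(belief_after_one, observation_likelihood)
```

The point of the sketch is only the shape of the output: one number per known cause, updated moment to moment as input arrives.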

While none of this is necessarily that unexpected, and I may be making a mountain out of a molehill, it’s still interesting that so much of what they say appears to overlap. It seems that Hofstadter agrees with Hawkins about what the core activity of the brain is, and in particular, he wants to call that activity analogy-making. (Perhaps Hawkins mentioned analogies in his book too, and I’ve simply forgotten.) In any case, I need to think about it more; I’m still not sure whether this is something or nothing.


Manifold Destiny

Sylvia Nasar and David Gruber on the battle over who solved the Poincaré Conjecture; from the New Yorker of August 28, 2006.


Barry Mazur: When is one thing equal to another thing?

Some excerpts from Barry Mazur’s paper When is one thing equal to some other thing? This is a rough draft of the paper from January 2006, so there may be some awkward passages. Also, many sections from the original paper are left out, and the text below will not make sense as an article by itself. All the footnotes and other citations are omitted.

In particular, the whole discussion of replacing an object with its network of relationships (and the following discussion of object and representation) is interesting. It’s also probably worth following up on the bit about a Wittgensteinian interpretation of Yoneda’s Lemma. This paragraph from near the end sums up some of this well:

It sometimes happens that the introduction of a term in a mathematical discussion is the signal that an important shift of viewpoint is taking place, or is about to take place. An emphasis on “representability” of functors in a branch of mathematics suggests an ever so slight, but ever so important, shift. The lights are dimmed on mathematical objects and beamed rather on the corresponding functors; that is, on the networks of relationships entailed by the objects. The functor has center stage, the object that it represents appears almost as an afterthought. The lights are dimmed on equality of mathematical objects as well, and focussed, rather, on canonical isomorphisms, and equivalence.


Anna Goldenberg: Structural Learning of Large Bayesian Networks for Social Network Modeling

An interesting talk abstract from Anna Goldenberg at CMU. Some of the corresponding publications are online.


Bayesian Networks have been successfully applied in many areas such as pharmaceutical, decision making by doctors, air control, marketing. Structural learning of Bayesian Networks is usually a desirable but costly operation. In some domains it is possible to collect expert knowledge to manually create a structure for a Bayes Net. However, social networks, warehousing data, or supermarket purchasing records may contain hundreds of thousands of attributes. Providing expert Bayes Net structure in such cases is cumbersome if not impossible, even if as in the case with many of those domains the events are choices of very small subsets of the large pool of available entities. The complexity of existing algorithms for structural search prevents Bayes Net learning on datasets of that size.

This work introduces an algorithm for tractable structural learning in Bayes Nets by exploring structures on the local level. The algorithm exploits the computational efficiency of Frequent Sets for gathering statistics that are most likely to be useful for structure search given the assumption of sparse data. I will show the relevance of this work to modeling Social Networks. Finally, I will present an empirical evaluation of our algorithm applied to several massive datasets.

Note: If the time is left and there is a sufficient interest in the audience, I will in addition present a new generative model for evolution of social networks that I have developed in collaboration with Alice Zheng.
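As a rough illustration of the "Frequent Sets" ingredient mentioned in the abstract, the sketch below counts how often small sets of entities co-occur across events. This is not Goldenberg's algorithm, only a brute-force counter that relies on the same sparsity assumption (each event involves very few entities). The function name, the `max_size` cutoff, and the example data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def frequent_itemsets(
    transactions: Iterable[Set[str]], min_support: int, max_size: int = 2
) -> Dict[FrozenSet[str], int]:
    """Return every itemset of size <= max_size that occurs in at least min_support transactions.

    Enumerating the subsets of each transaction is cheap only because
    each transaction is assumed to be small (sparse data).
    """
    counts: Counter = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical events: each is the small set of people who took part.
events = [
    {"alice", "bob"},
    {"alice", "bob", "carol"},
    {"bob", "carol"},
    {"alice", "bob"},
]

frequent = frequent_itemsets(events, min_support=2)
```

Co-occurrence counts like these are the kind of local statistic a structure search could consult when deciding which edges are worth considering, instead of scoring every possible pair of attributes.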


Michael Jordan introducing Graphical Models

A nice, quick introduction from the preface of Jordan’s 1999 book on graphical models.


Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering – uncertainty and complexity – and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity – a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms.

Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism – examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages – in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems.
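The modularity described above can be seen in a very small example. The sketch below assumes a three-variable chain A → B → C with made-up conditional probability tables. The variable names and numbers are hypothetical, and the marginal is computed by brute-force enumeration, not by one of the efficient general-purpose algorithms the preface alludes to.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {
    True: {True: 0.9, False: 0.1},   # P(B | A=True)
    False: {True: 0.2, False: 0.8},  # P(B | A=False)
}
p_c_given_b = {
    True: {True: 0.5, False: 0.5},   # P(C | B=True)
    False: {True: 0.1, False: 0.9},  # P(C | B=False)
}


def joint(a: bool, b: bool, c: bool) -> float:
    """The graph dictates the factorization: P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def marginal_c(c: bool) -> float:
    """P(C = c), obtained by summing the joint over A and B."""
    return sum(joint(a, b, c) for a, b in product([True, False], repeat=2))


# Because each local table is normalized, the joint sums to 1 over all assignments.
total_probability = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
probability_c_true = marginal_c(True)
```

Each table is a simple part, and probability theory is the glue: multiplying the parts gives a consistent joint distribution without ever writing down all eight entries by hand.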


John Searle: Philosophy and the Scientific Worldview

From John Searle’s Mind: An Introduction.


I have now completed the task I have set for myself in the first chapter [of Mind: An Introduction]. I have tried to give an account of the mind that will situate mental phenomena as part of the natural world. Our account of the mind in all its aspects — consciousness, intentionality, free will, mental causation, perception, intentional action, etc. — is naturalistic in this sense: first, it treats mental phenomena as just a part of nature. We should think of consciousness and intentionality as just as much a part of the natural world as photosynthesis or digestion. Second, the explanatory apparatus that we use to give a causal account of mental phenomena is an apparatus that we need to account for nature generally. The level at which we attempt to account for mental phenomena is biological rather than, say, at the level of subatomic physics. The reason for this is that consciousness and other mental phenomena are biological phenomena; they are created by biological processes and are specific to certain sorts of biological organisms. Of course, this is not to deny that our individual minds are shaped by our culture. But culture is not something in opposition to biology; rather, culture is the form that biology takes in different communities. One culture may differ from another culture, but there are limits to the differences. Each must be an expression of the underlying biological commonality of the human species. There could not be a long-term conflict between nature and culture, for if there were, nature would always win; culture would always lose.

People sometimes speak of the “scientific world-view” as if it were one view of how things are among others, as if there might be all sorts of world-views and “science” gave us one of them. In one way this is right; but in another way this is misleading and indeed suggests something false. It is possible to look at the same reality with different interests in mind. There is an economic point of view, an aesthetic point of view, a political point of view, etc., and the point of view of scientific investigation, in this sense, is one point of view among others. However, there is a way of interpreting this conception where it suggests that science names a specific kind of ontology, as if there were a scientific reality that is different from, for example, the reality of common sense. I think that is profoundly mistaken. The view implicit in this book, which I now want to make explicit, is that science does not name an ontological domain; it names rather a set of methods for finding out about anything at all that admits of scientific investigation. The fact that hydrogen atoms have one electron, for example, was discovered by something called the “scientific method,” but that fact, once discovered, is not the property of science; it is entirely public property. It is a fact like any other. So if we are interested in reality and truth, there is really no such thing as “scientific reality” or “scientific truth.” There are just the facts that we know. I cannot tell you how much confusion in philosophy has been generated by the failure to perceive these points. So, for example, there are frequently debates about the reality of the entities postulated by science. But either these entities exist or they do not. The view that I have of the matter is this: the fact that hydrogen atoms have one electron is a fact like the fact that I have one nose. The only difference is that for quite accidental reasons of evolution, I do not need any professional assistance to discover that I only have one nose, whereas given our structure and given the structure of hydrogen atoms, it takes a good deal of professional expertise to discover how many electrons are in a hydrogen atom.

There is no such thing as the scientific world. There is, rather, just the world, and what we are trying to do is describe how it works and describe our situation in it. As far as we know, its most fundamental principles are given by atomic physics, and, for that little corner of it that most concerns us, evolutionary biology. The two basic principles on which any such investigation as the one I have been engaging in depends on are, first, the notion that the most fundamental entities in reality are those described by atomic physics; and, second, that we, as biological beasts, are the product of long periods of evolution, perhaps as long as five billion years. Now, once you accept these points, and they are not points about science but about how the world works, then some of the questions about the human mind admit of rather simple philosophical answers, though that does not imply that they admit of simple neurobiological answers.

We do not live in several different, or even two different, worlds, a mental world and a physical world, a scientific world and the world of common sense. Rather, there is just one world; it is the world we all live in, and we need to account for how we exist as a part of it.


Paul Graham on Workplaces

In his essay What Business Can Learn from Open Source, Paul Graham has some interesting things to say about work environments.


Another thing blogs and open source software have in common is that they’re often made by people working at home. That may not seem surprising. But it should be. It’s the architectural equivalent of a home-made aircraft shooting down an F-18. Companies spend millions to build office buildings for a single purpose: to be a place to work. And yet people working in their own homes, which aren’t even designed to be workplaces, end up being more productive.

This proves something a lot of us have suspected. The average office is a miserable place to get work done. And a lot of what makes offices bad are the very qualities we associate with professionalism. The sterility of offices is supposed to suggest efficiency. But suggesting efficiency is a different thing from actually being efficient.

The atmosphere of the average workplace is to productivity what flames painted on the side of a car are to speed. And it’s not just the way offices look that’s bleak. The way people act is just as bad.

Things are different in a startup. Often as not a startup begins in an apartment. Instead of matching beige cubicles they have an assortment of furniture they bought used. They work odd hours, wearing the most casual of clothing. They look at whatever they want online without worrying whether it’s “work safe.” The cheery, bland language of the office is replaced by wicked humor. And you know what? The company at this stage is probably the most productive it’s ever going to be.

Maybe it’s not a coincidence. Maybe some aspects of professionalism are actually a net lose.

To me the most demoralizing aspect of the traditional office is that you’re supposed to be there at certain times. There are usually a few people in a company who really have to, but the reason most employees work fixed hours is that the company can’t measure their productivity.

The basic idea behind office hours is that if you can’t make people work, you can at least prevent them from having fun. If employees have to be in the building a certain number of hours a day, and are forbidden to do non-work things while there, then they must be working. In theory. In practice they spend a lot of their time in a no-man’s land, where they’re neither working nor having fun.

If you could measure how much work people did, many companies wouldn’t need any fixed workday. You could just say: this is what you have to do. Do it whenever you like, wherever you like. If your work requires you to talk to other people in the company, then you may need to be here a certain amount. Otherwise we don’t care.

That may seem utopian, but it’s what we told people who came to work for our company. There were no fixed office hours. I never showed up before 11 in the morning. But we weren’t saying this to be benevolent. We were saying: if you work here we expect you to get a lot done. Don’t try to fool us just by being here a lot.

The problem with the facetime model is not just that it’s demoralizing, but that the people pretending to work interrupt the ones actually working. I’m convinced the facetime model is the main reason large organizations have so many meetings. Per capita, large organizations accomplish very little. And yet all those people have to be on site at least eight hours a day. When so much time goes in one end and so little achievement comes out the other, something has to give. And meetings are the main mechanism for taking up the slack.

For one year I worked at a regular nine to five job, and I remember well the strange, cozy feeling that comes over one during meetings. I was very aware, because of the novelty, that I was being paid for programming. It seemed just amazing, as if there was a machine on my desk that spat out a dollar bill every two minutes no matter what I did. Even while I was in the bathroom! But because the imaginary machine was always running, I felt I always ought to be working. And so meetings felt wonderfully relaxing. They counted as work, just like programming, but they were so much easier. All you had to do was sit and look attentive.

Meetings are like an opiate with a network effect. So is email, on a smaller scale. And in addition to the direct cost in time, there’s the cost in fragmentation— breaking people’s day up into bits too small to be useful.

You can see how dependent you’ve become on something by removing it suddenly. So for big companies I propose the following experiment. Set aside one day where meetings are forbidden— where everyone has to sit at their desk all day and work without interruption on things they can do without talking to anyone else. Some amount of communication is necessary in most jobs, but I’m sure many employees could find eight hours worth of stuff they could do by themselves. You could call it “Work Day.”

The other problem with pretend work is that it often looks better than real work. When I’m writing or hacking I spend as much time just thinking as I do actually typing. Half the time I’m sitting drinking a cup of tea, or walking around the neighborhood. This is a critical phase— this is where ideas come from— and yet I’d feel guilty doing this in most offices, with everyone else looking busy.

It’s hard to see how bad some practice is till you have something to compare it to. And that’s one reason open source, and even blogging in some cases, are so important. They show us what real work looks like.

We’re funding eight new startups at the moment. A friend asked what they were doing for office space, and seemed surprised when I said we expected them to work out of whatever apartments they found to live in. But we didn’t propose that to save money. We did it because we want their software to be good. Working in crappy informal spaces is one of the things startups do right without realizing it. As soon as you get into an office, work and life start to drift apart.

That is one of the key tenets of professionalism. Work and life are supposed to be separate. But that part, I’m convinced, is a mistake.


Hello world

Here’s the obligatory first post. As mentioned in the about page, I plan on just storing snippets from articles or other blogs here, not updating the world about the last time I brushed my teeth. I think that will also make this blog a lot easier to maintain.

I already store articles I like as individual pages, but this started getting cumbersome. I thought about writing some kind of “online scrapbook” software (partly as an excuse to try out Django), but this seemed so much easier.
