the jsomers.net blog.

Metacat

A couple of puzzles

James Marshall's doctoral thesis, "Metacat: a Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception," produced under the tutelage of Douglas Hofstadter at Indiana University's Fluid Analogies Research Group (FARG), describes a computer program that is capable of solving "string analogy" puzzles like this one:

abc -> abd, ijk -> ?

What's interesting about this particular problem is that the most obvious answer, ijd, doesn't actually occur to most people. We tend to see ijl instead.

The reason (I think) is that we automatically recognize that the letters in each of abc and ijk follow one another in the alphabet, so what sticks out for us is the "1-2-3" pattern. With that in mind, the most natural rule that changes abc to abd is not "replace the last letter by d," but rather something like "increment the last letter." Applying that to ijk gives ijl.
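To make the two competing rules concrete, here is a minimal Python sketch. The function names are my own invention for illustration, not anything from Metacat itself, and the sketch sidesteps the hard part, which is deciding which rule is the natural one.

```python
import string

ALPHABET = string.ascii_lowercase


def successor(letter):
    """Return the next letter of the alphabet (no wraparound past 'z')."""
    index = ALPHABET.index(letter)
    if index == len(ALPHABET) - 1:
        raise ValueError("'z' has no successor")
    return ALPHABET[index + 1]


def replace_last_with_d(s):
    # Literal rule: "replace the last letter by d".
    return s[:-1] + "d"


def increment_last(s):
    # Abstract rule: "increment the last letter".
    return s[:-1] + successor(s[-1])


# Both rules explain abc -> abd equally well...
assert replace_last_with_d("abc") == increment_last("abc") == "abd"

# ...but they disagree once applied to the target string.
print(replace_last_with_d("ijk"))  # ijd
print(increment_last("ijk"))       # ijl
```

Nothing in the code prefers one rule over the other; that preference is exactly what a person supplies without noticing.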

The point of articulating these steps explicitly is to demonstrate that there is a lot of work going on under the hood even when we solve the simplest abstract problems -- work that a computer program like Metacat, and by extension its programmers, have to know how to do precisely if the thing is going to have any chance at tackling these puzzles at a human level. They have to know, for example, how to determine that the most salient feature in the problem above is the relative position of the letters -- and this turns out to be a pretty significant task.

Of course things only get more difficult as the problems get more complex, e.g.,

abc -> abd, mrrjjj -> ?

You may even have to stop for a minute to think about this one. There are several attractive answers, and the "best" one isn't the most obvious.

One possibility is mrrjjk, which follows the same logic as our solution to the first problem. But it loses major points for missing the three "clusters" of identical letters in the target string. Accounting for these gives the much more elegant mrrkkk.

That's not quite optimal, though, because if we look closely we notice that while abc follows the "1-2-3" pattern in terms of alphabet position, mrrjjj does it with letter frequency. So if our rule turns out to be "increment the last letter-group" (accounting again for those clusters), we ought to make sure we "increment" in the appropriate way.

Our best bet, then, is actually mrrjjjj.
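The three candidate answers correspond to three readings of "increment the last thing," which can be sketched in Python as follows. This is only an illustration under my own naming; it hard-codes the three readings rather than discovering them, which is the part Metacat actually has to work for.

```python
from itertools import groupby
import string

ALPHABET = string.ascii_lowercase


def successor(letter):
    """Next letter of the alphabet; raises IndexError for 'z' (not handled here)."""
    return ALPHABET[ALPHABET.index(letter) + 1]


def clusters(s):
    """Split a string into runs of identical letters."""
    return ["".join(run) for _, run in groupby(s)]


def increment_last_letter(s):
    # Ignores the clusters entirely.
    return s[:-1] + successor(s[-1])


def increment_last_group_by_letter(s):
    # Sees the clusters, but increments alphabet position.
    groups = clusters(s)
    last = groups[-1]
    groups[-1] = successor(last[0]) * len(last)
    return "".join(groups)


def increment_last_group_by_length(s):
    # Sees the clusters and increments the group's length instead.
    groups = clusters(s)
    last = groups[-1]
    groups[-1] = last[0] * (len(last) + 1)
    return "".join(groups)


print(clusters("mrrjjj"))                        # ['m', 'rr', 'jjj']
print([len(g) for g in clusters("mrrjjj")])      # [1, 2, 3]
print(increment_last_letter("mrrjjj"))           # mrrjjk
print(increment_last_group_by_letter("mrrjjj"))  # mrrkkk
print(increment_last_group_by_length("mrrjjj"))  # mrrjjjj
```

The group lengths 1, 2, 3 are where the "1-2-3" pattern hides in mrrjjj, which is why the last reading is the most satisfying.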

Remarkably, Metacat is able to solve this problem and others still more difficult. Perhaps more important, though, is that the method it uses to do this is in some ways just a slowed-down version of our own unconscious cognition, which, Marshall insists, is really all about analogies.

Analogies as the core of thought

These little puzzles may not seem like much. They are, after all, restricted to a highly stylized microdomain: simple manipulations of letter strings. What's so general about that?

Consider, as a parallel example, a set of problems where someone is asked to "complete" an integer sequence, as in:

1, 2, 3, ...

or

1, 1, 2, 3, 5, ...

For a person who (say) knows nothing more about math than the successorship of integers -- pretend that he's even ignorant about simple arithmetical operations like addition -- the task would become an exercise in (a) scouring for relationships among the numbers, and (b) flexibly updating his best hypothesis as he considers more terms. Both of which strike me as exceedingly general abilities.

(Incidentally, there is a program called Seek-Whence out of FARG that tries to complete integer sequences.)
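Step (b), updating a best hypothesis as more terms arrive, can be sketched in a few lines of Python. To be clear, this is not how Seek-Whence works; the two candidate rules are hand-written assumptions of mine, and the interesting problem of coming up with candidate rules in the first place is left out entirely.

```python
def successor_rule(prefix):
    """Predict the next term as the integer after the last one."""
    return prefix[-1] + 1


def sum_of_last_two_rule(prefix):
    """Predict the next term as the sum of the two most recent terms."""
    return prefix[-1] + prefix[-2]


# Each hypothesis: (rule, number of terms it needs before it can predict).
HYPOTHESES = {
    "successor": (successor_rule, 1),
    "sum of last two": (sum_of_last_two_rule, 2),
}


def surviving_hypotheses(terms):
    """Return the names of the hypotheses consistent with every term seen so far."""
    survivors = []
    for name, (rule, needed) in HYPOTHESES.items():
        consistent = all(
            rule(terms[:i]) == terms[i] for i in range(needed, len(terms))
        )
        if consistent:
            survivors.append(name)
    return survivors


print(surviving_hypotheses([1, 2, 3]))        # ['successor', 'sum of last two']
print(surviving_hypotheses([1, 2, 3, 4]))     # ['successor']
print(surviving_hypotheses([1, 1, 2, 3, 5]))  # ['sum of last two']
```

Note that 1, 2, 3 alone cannot tell the two rules apart; it takes a fourth term to settle the matter, which is the "flexible updating" in miniature.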

In the same way, Metacat is less concerned with letters than it is with broadly defined structural relationships.

To see how, think of the steps you had to take to solve one of the puzzles above. Your first move, most likely, was to try to discover some "rule" that transforms the original string. Then, either consciously or not, you tried to highlight the relevant abstract structural similarities between the original string and the target string. And finally, you applied your rule to this abstract representation of the target to produce a solution.

The key to the process, and the part most aptly characterized as an "analogy," is the mapping you made between the original and target strings, where you had to see the two different sets of letters as, at some level, playing the same role. Which, when you come right down to it, is exactly what's happening any time you make an analogy.

If someone were to ask me, for example, who Canada's "president" is, it would be quite natural for me to tell them the name of our prime minister, rather than saying "nobody" -- because in this context prime minister plays roughly the same role as president, namely, head of government, that the person is probably interested in. Similarly, when YouTube pitched themselves as "the Flickr of video," the notion immediately made sense to users, who could easily imagine transforming Flickr's features to incorporate videos instead of pictures.

More mundane analogies pervade everyday life. Our language is full of them: we "spend" time, "retrieve" memories, "get ideas across," and "shoot down" arguments (see Lakoff and Johnson's Metaphors We Live By for more). And even in our basic interactions with objects, we can't help but think laterally instead of literally. Marshall gives the example of eating food off of a frisbee while at a picnic.

Now, in one sense it's somewhat unremarkable to see a frisbee as a plate, but if nothing else it does illustrate the fluidity of our concepts, which for Marshall is pretty much the key to the whole operation. Here's how he characterizes it:

To some extent every concept in the mind consists of a central core idea surrounded by a much larger "halo" of other related concepts. The amount of overlap between different conceptual halos is not rigid and unchangeable but can instead vary according to the situation at hand. Much work has been done in cognitive psychology investigating the nature of the distances between concepts and categories [Shepard, 1962; Tversky, 1977; Smith and Medin, 1981; Goldstone et al., 1991]. For most people, certain concepts lie relatively close to one another in conceptual space, such as the concepts of mother and father (or perhaps mother and parent) while others are farther apart, at least under normal circumstances. However, like the boundaries defining individual concepts, the degree of association between different concepts can change radically under contextual pressure with the potential result that two or more normally quite dissimilar concepts are brought close together, so that they are both perceived as applying equally well to a particular situation, such as when the Earth is seen as an instance of both the mother concept and the planet concept. This phenomenon, referred to in the Copycat model [Metacat's predecessor] as conceptual slippage, is what enables apparently unrelated situations to be perceived as being fundamentally "the same" at a deeper, more abstract level.

What's nice about this view is that it explains apparently ineffable features of the mind, like creativity and insight, as merely special cases of a more general phenomenon. So when Einstein equated acceleration with gravity (and saw gravity as "curved spacetime"), or when Shannon defined information in terms of entropy, they were just exploring unlikely analogies (that ended up being true) -- or, in the above terms, they were "slipping" particularly distant concepts.

What makes such fruitful ideating possible is a kind of "elastic looseness," wherein one's concepts are allowed to range widely enough to combine in novel ways, but still constrained away from nonsense. Lewis Carroll was especially good at toeing that line -- Jabberwocky is wildly imaginative and, at first glance, meaningless, but there is enough structure in there to allow us to translate his made-up words (say, by breaking up the obvious portmanteaus), and "fill in" the plot from context. Most of Dr. Seuss's stuff is the same way: absurd slippage that might be incomprehensible, except that it's packaged in an allegory that even youngsters can swallow.

Analogies, then, appear to be fundamental to every kind of thought, from survival-level object recognition (detecting new threats, for instance) all the way to artistic or scientific innovation. And Metacat is, if nothing else, an attempt to articulate explicitly all of the subcognitive machinery that makes such analogies possible. So, assuming Marshall knew what he was doing, figuring out how Metacat works should be a lot like figuring out how the mind works.

Metacat in action

This is what Metacat looks like at the end of a typical run, after it has found a plausible solution:

(Notice that the title of the window says "Workspace," which is the name of this particular part of the program. It's only one of many components, but it's probably the most fun to watch, if only because that's where all the conceptual structures get built on top of the strings themselves. Put elsely: if you were working on one of these problems by hand with a piece of paper, you'd probably draw something that looked like the picture above.)

There is a lot going on here. If you look carefully, though, you'll notice that there are really only two "types" of line -- straight and squiggly -- and that together they comprise the three maps we discussed above: (1) from the original string horizontally to its transformed version, (2) from the original string vertically to the target string, and (3) from the target string horizontally to the solution. In addition there are boxes showing the "groups" formed by "bonds" that represent successor/predecessor/sameness relations, short textual descriptions (e.g. "lmost=>lmost") of salient mappings, and natural language explanations of the overarching "rules" that determine how the strings are modified.

It's worth stressing that the horizontal maps are chiefly concerned with differences between the strings, while the vertical map is meant to highlight similarities; this should make sense in light of the discussion above, where we focused on the vertical map as the crux of the analogy -- which is of course more about sameness than differentness.

One may understandably be curious about how all of these Workspace structures are formed. The answer is that a whole slew of computational modules, called "codelets," are sent to the Workspace, one at a time, each with a single low-level task. Here are a few examples (hopefully the names are roughly self-explanatory):

  • Bottom-up bond scouts
  • Bond builders
  • Group evaluators
  • Description builders
  • Rule scouts

Of course the order in which various codelets are executed, and their relative frequency, effectively determines which part of the solution space is being searched at a given time. Which means that the function for choosing codelets in each step should probably be more sophisticated than a random draw, and should in some sense reflect the actual semantics of the problem at hand.

And it does. What Metacat does is to tie the probability of each codelet's being selected in the next round to the state of the "Slipnet," which is a kind of semantic network containing variously activated "concepts." Here's what it looks like (more active concepts have bigger circles above them):

I will leave it to Marshall to describe how this works, and what it means:

In some ways the Slipnet is similar to a traditional semantic network in that it consists of a set of nodes connected by links. Each of these links has an intrinsic length that represents the general degree of association between the linked nodes, with shorter links connecting more strongly associated nodes. [. . .] Each node corresponds to an individual concept, or rather, to the core of an individual concept. A concept is more properly thought of as being represented by a diffuse region in the Slipnet centered on a single node. Nodes connected to the core node by links are included in the central node's "conceptual halo" as a probabilistic function of the link lengths. This allows single nodes to be shared among several different concepts at once, depending on the links involved. Thus, concepts in the Slipnet are not sharply defined: rather, they are inherently blurry, and can overlap to varying degrees.

Unlike traditional semantic networks, however, the Slipnet is a dynamic structure. Nodes in the Slipnet receive frequent infusions of activation as a function of the type of perceptual activity occurring in the Workspace. Activation spreads throughout a node's conceptual halo, flowing across the links emanating from the core node to its neighbors. The amount of spreading activation is mediated by the link lengths, so that more distant nodes receive less activation. However, the link lengths themselves are not necessarily fixed. Some links are labeled by particular Slipnet nodes and may stretch or shrink in accordance with the activation of the label node. A labeled link encodes a specific type of relationship between two concepts, in addition to the conceptual distance separating them. For example, the link between the predecessor and successor nodes is labeled by the opposite node, and the link from the a node to the b node is labeled by the successor node. Whenever a node becomes strongly activated, all links labeled by it shrink. As a result, pairs of concepts connected by these links are brought closer together in the Slipnet, allowing activation to spread more easily between the two, and also making it more likely for conceptual slippages to occur between them.

And,

Whenever new Workspace structures are built, concepts in the Slipnet relating to them receive activation, which then spreads to neighboring concepts. In turn, highly-activated concepts exert top-down pressure on subsequent perceptual processing by promoting the creation of new instances of these concepts in the Workspace. Thus which types of new Workspace structures get built depends strongly on which concepts are relevant (i.e., highly activated) in a given context.

I hope that the correspondence to human cognition is apparent. I am particularly attracted to this idea of a "network of concepts" that (a) directs the activity of lower-level computational/cognitive modules, and then (b) changes shape based on feedback from those modules. It seems to me that when I perceive and think, that is exactly the kind of loop I'm in.
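For the mechanically inclined, here is a toy Python sketch of the two Slipnet behaviors Marshall describes: activation spreading more readily across shorter links, and labeled links shrinking when their label node is strongly activated. The 0-to-100 length scale, the threshold, the shrink factor, and the spreading rate are all numbers I made up for illustration; none of them come from Metacat's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    source: str
    target: str
    length: float                # conceptual distance, on an assumed 0..100 scale
    label: Optional[str] = None  # name of the node that labels this link, if any


def effective_length(link, activation, shrink_factor=0.4, threshold=0.5):
    """A labeled link shrinks while its label node is strongly activated."""
    if link.label is not None and activation.get(link.label, 0.0) >= threshold:
        return link.length * shrink_factor
    return link.length


def spread(activation, links, rate=0.5):
    """One step of spreading activation; shorter links carry more of it."""
    updated = dict(activation)
    for link in links:
        closeness = 1.0 - effective_length(link, activation) / 100.0
        flow = rate * activation.get(link.source, 0.0) * closeness
        updated[link.target] = min(1.0, updated.get(link.target, 0.0) + flow)
    return updated


# The two labeled links mentioned in the quoted passage.
links = [
    Link("successor", "predecessor", length=80.0, label="opposite"),
    Link("a", "b", length=60.0, label="successor"),
]

calm = {"successor": 1.0, "opposite": 0.0}
primed = {"successor": 1.0, "opposite": 1.0}

print(spread(calm, links)["predecessor"])    # small: the link is long
print(spread(primed, links)["predecessor"])  # larger: "opposite" shrank the link
```

With opposite activated, successor and predecessor sit closer together, so activation flows between them more easily. In Marshall's terms, a slippage between the two has become more likely.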

If you have the time (and I assume that if you're reading this sentence, you do), it might be fun to watch a video of three of these components -- the Workspace, the Coderack (the pool where codelets wait to be selected), and the Slipnet -- working together on an actual problem. Watch it here.

You might notice a little thermometer in the top-right corner there. That gives a measure of the "perceptual order in the Workspace": when things are frenzied, and Metacat hasn't settled into any particular "line of thought," it adds more randomness to its codelet selections; but when it starts to home in on an idea, it "cools off" and eases its way into a conceptual groove, determinedly selecting those codelets most likely to finish the job.
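One simple way to get that behavior is to raise each codelet's urgency to a power that grows as the temperature falls, and then draw in proportion to the result. The Python sketch below does this; the exponent formula and the urgency numbers are illustrative assumptions on my part, and I am not claiming this is the formula Metacat uses.

```python
import random


def selection_weights(urgencies, temperature):
    """Flatten (hot) or sharpen (cold) urgencies; temperature runs 0..100."""
    # Illustrative exponent: 0.5 when hot (T=100), about 3.8 when cold (T=0).
    exponent = (100.0 - temperature) / 30.0 + 0.5
    return {name: urgency ** exponent for name, urgency in urgencies.items()}


def selection_probabilities(urgencies, temperature):
    """Normalize the weights into a probability for each codelet type."""
    weights = selection_weights(urgencies, temperature)
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def choose_codelet(urgencies, temperature, rng=random):
    """Draw one codelet type at random, in proportion to its weight."""
    weights = selection_weights(urgencies, temperature)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical urgencies for three codelet types.
urgencies = {"bond-builder": 8.0, "group-evaluator": 4.0, "rule-scout": 2.0}

hot = selection_probabilities(urgencies, temperature=100)
cold = selection_probabilities(urgencies, temperature=0)
print(hot)   # fairly even odds: exploration
print(cold)  # bond-builder dominates: commitment
print(choose_codelet(urgencies, temperature=50))
```

When it's hot, even low-urgency codelets get a real chance to run; when it's cold, the front-runner wins almost every time, which is also how you end up in a snag.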

An unfortunate side effect of this otherwise wonderful (and again, psychologically plausible) mechanism is that Metacat risks getting caught in a local optimum, or a "snag," somewhere short of a good solution.

That problem is largely what motivated the idea of a successor to Copycat in the first place. Indeed, what makes Metacat "meta" is a set of self-watching features designed to keep the program out of snags, by (a) watching its own activity in the Workspace and Slipnet so that it can recognize dead ends, and (b) "remembering" particular analogy problems and old solutions, to draw on if it does get stuck.

These meta features show up in the architectural overview that Marshall gives on page 56, which is probably worth taking a look at anyway:

That, in a nutshell, is how Metacat works. It's also probably a good approximation to how your mind works, if not at a neural level then at a conceptual one -- which is probably more interesting, at least for now.

If you'd like to read more, or install the software yourself, or even dive into the tens of thousands of lines of LISP code that make this run, check out Marshall's project page here: http://science.slc.edu/~jmarshall/metacat/.

Those maps at the beginning of books, or, a few words about teaching

If you're out to understand a story that's really located, as in deeply bound to a particular place, you would do well to have at least a murky mental picture of nearby landmarks, both natural and manmade. For books like Joyce's Ulysses, where the action so often hinges on spatial minutiae -- like which side of the street Bloom's on -- you probably need something more vivid, and of course more accurate.

Now some authors (or more likely, their publishers) will occasionally offer a partial solution to this problem by providing an aerial map of the region. But this rarely turns out to be helpful, if it's ever even read.

The problem is not that these are bad maps, like that they omit salient details or leave too little to the imagination. It's just that they're in the wrong part of the book. Almost always, they're buried in what book-people call the "front matter": the edition notice, title page, dedication, table of contents, preface, foreword, prologue, introduction, etc.

Even if you read that stuff (which many people don't, either because it's boring or because they're careful to avoid spoilers), odds are you won't linger for long on the map. The reason is that it means about as little to you now, when you've first picked up the book, as it ever will. Every feature worth caring about -- the fact that George Willard's new house is a two-minute walk to the railroad, or that Hern's Grocery opens onto the east side of the street (so Kate will see the sun setting on her way out) -- is tied in some way to the characters, who haven't yet arrived.

It is of course possible to flip between the text and the map, but this is only likely to happen (a) if something about the scenery is especially (or really, espatially) confusing, or (b) if you're forced to. But if the map has proven valueless the first time you've looked at it, you probably won't want to look at it again.

Which suggests the following question: when should it be presented?

I think the end of the book is just as bad as the beginning, because by then it's too late to start thinking about new scenes in terms of the map, which is how you'd populate it with the kind of rich associational content that makes maps useful.

What you want, then, is to find some place late enough that at least a few locales or landmarks will ring a bell (ideally pushing you to mentally reorganize some of what you've just read), yet also early enough for there to be time for meat to grow on your scaffold.

The best experience I've had with this kind of thing was in Don Gifford's Ulysses Annotated, a must-have for anyone trying to tackle Joyce's masterpiece.

What Gifford did was print a map at the end of each chapter (or more exactly, at the beginning of each chapter's endnotes) that was relevant to that chapter alone. What's more, he'd trace the route that the characters had taken through that particular part of the city. Which meant that as you routinely consulted the guide, you were effectively confronted with your main character's real-time location -- because you had in mind exactly where he was, or what he was looking at, and you could find it on Gifford's map; and then of course you'd look ahead a little, which helped you anticipate and orient yourself for the character's next move. By the end of the chapter you effectively had a little movie in your head, because you'd traced the whole thing out as you went along.

Of course, the efficacy here has a lot to do with your being forced to consult the map. But Gifford made it worth your while, and other books could follow his lead if they thought creatively.

* * *

The reason I wanted to articulate this question of "where to put the map" in detail is that I think it serves as a useful model for learning in general.

That is, when I introspect about content that I've retained quickly and understood well, I realize that so much of what made it click had to do with the timing and ordering of my encounters with scaffold-material:

  • terms/definitions
  • theorems
  • maps
  • graphs
  • tables
  • timelines
  • categories/classes
  • nested combinations of the above

versus meat-material:

  • examples
  • proofs
  • problems
  • exercises
  • stories
  • first-hand experiences
  • instantiations
  • etc.

The key for me, again, is to scaffold only after I've got some meat, but still early enough to enable ample use of that scaffold.

The idea of "the meta level," for example, is a lot easier to understand if you tacitly have in mind many examples of it before you finally encounter the "concept"; but there is some pressure to learn it sooner rather than later, because once you do articulate that definition precisely, you'll be able to "call" it as a kind of cognitive computational module -- an especially useful one at that.

This is why I advocate problem-based approaches to computer programming [pdf], because you learn the big ideas from the bottom up: you start by working hard on concrete examples, which are then generalized (often with help from others) into useful principles, honed by more work, and generalized again. The cycle is repeated and before you know it you will really know it.

This is in contrast to the standard (K-12) approach, which if I'm not mistaken starts with "concepts" and has you do exercises as applications of these concepts. It's backwards. It's like putting the map at the beginning of the book.

Feynman’s Rigor

All of the things I admire about Richard Feynman -- his intellect, and verve, and eloquence -- seem like special cases of a more general feature of his mind, namely, its almost fanatical devotion to rigor.

Here's an example of what I mean, taken from a wonderful essay, "Richard Feynman and The Connection Machine":

Concentrating on the algorithm for a basic arithmetic operation was typical of Richard's approach. He loved the details. In studying the router, he paid attention to the action of each individual gate and in writing a program he insisted on understanding the implementation of every instruction. He distrusted abstractions that could not be directly related to the facts. When several years later I wrote a general interest article on the Connection Machine for [Scientific American], he was disappointed that it left out too many details. He asked, "How is anyone supposed to know that this isn't just a bunch of crap?"

Feynman would only claim to know something if he knew he knew it. And the way he got there, I think, was by never stretching himself too thin -- never moving to step C without first nailing A and B. So when he approached a new problem he would start with concrete examples at the lowest level; when he read, he would "translate" everything into his own words (or better yet, a vivid picture). My guess is that for him, the worst feeling in the world was not being able to explain an idea clearly, because that meant he didn't really own it.

Of course I'd like to think I'm the same way, but I'm not. I skim passages and skip equations. I claim to know more than I do.

But I say that fully confident that most everyone I know is the same way; it turns out that being a "details guy" is harder than it sounds.

* * *

Now the exciting thing, I think, is that you can teach yourself to work a little more rigorously.

One way is to buy a good abstract algebra textbook and work through all of the problems. Algebra's a good place to start because (a) it doesn't require calculus, (b) the problems are intuitive, and (c) it's mostly proof-based. Which means you'll get the full thrill of really knowing things (because you proved them), without having to learn a whole new language (e.g. analysis).

But a better recommendation might be to start hacking. For one thing, you can start building stuff right away; with math it takes a lot longer (7-8 years) to get to the point where you can produce something original.

What's really good about hacking, though, for the purposes of rigorizing, is that you can't make a program work without an honest-to-God understanding of the details. The reason is that the interpreter doesn't actually do much "interpreting"; it does exactly what you say, no more or less. Which means you have to know exactly what you are talking about.

That imperative becomes clearest when something goes wrong, because that's when you really have to look under the hood. Was that index supposed to cut off at the second-to-last item of your list or the third-to-last? What's happening to those instance variables at the end of each loop? Why is that function not getting called? The only way to ensure a long chain of computation ends up the way it's supposed to is to know what's happening at every step, and in that sense, debugging a program teaches you to do explicitly what guys like Feynman seem to do naturally: work hard at level zero, keeping a close eye on every moving part.
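The index question, for instance, comes down to a single character in Python. A minimal illustration, with a made-up list:

```python
items = ["a", "b", "c", "d", "e"]

# Stops after the second-to-last item.
up_to_second_to_last = items[:-1]

# Stops after the third-to-last item.
up_to_third_to_last = items[:-2]

print(up_to_second_to_last)  # ['a', 'b', 'c', 'd']
print(up_to_third_to_last)   # ['a', 'b', 'c']
```

Both lines run without complaint, so the only way to know which one you meant is to know exactly what the slice does.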