the jsomers.net blog.

Jimbits vol. 1

I’m reluctant to post anything shorter than a few hundred words. The reason is that it feels easier to write little blurbs about this and that than to compose a longer, more involved piece, and I’ve tried to incubate an allergy to ease. (For various reasons.)

That said, there are often things about which I have something to say, maybe a paragraph’s worth or two, and I’d rather try to put that out there than either (a) let the thought rot in my notes archive, or (b) stretch it thin for the sake of a higher word count.

It also helps that I’ve come up with a nice moniker for this sort of thing: “Jimbits.”

Here’s the first batch:

Reading out loud

I wrote the following after the first “discussion section” of a class on modernist novels:

Having students read passages aloud on the first day of class is, on the one hand, a good way to break the ice — it gets them talking — but on the other it engenders tension, because (a) people suck at reading stuff aloud, and (b) people get extraordinarily uncomfortable when they see someone struggling in a public performance.

This latter point struck me a while ago after watching all kinds of amateur speakers deliver announcements at school assemblies, but I still don’t have a satisfying explanation for it. I think it’s a combination of our general aversion to awkwardness, empathy for the speaker, and some kind of mind-meld where each person’s reaction is multiplied via identification with everyone else around them.

I’m not sure. But I think I do know why people can’t read aloud, or at least I have a theory:

Reading aloud causes a person to go into some attenuated version of “public speaking mode,” with the corresponding nervousness. Reading printed words precisely is already difficult — simply because most of our natural conversation is extemporaneous — but the added stress all but guarantees errors. Which, in turn, may induce a kind of “self-consciousness loop” should the person realize he’s bombing.

It’s a lot easier to recover when you’re making the words up as you go along, just because there’s no need to backtrack and there are no “mistakes” to fix.

Intellectual fovea

“The foveal system of the human eye is the only part of the retina that permits 100% visual acuity,” according to Wikipedia. The impressive bit is that it “comprises less than 1% of retinal size but takes up over 50% of the visual cortex in the brain,” acting like a laser amid what is mostly diffuse lamplight.

The point being that almost all of the data in your field of vision is restricted to a tiny area; when you read, for instance, most of what you see is a “big region of wordy-looking texture,” not individual words that you can process and recall (Baum, What is Thought?, p. 412).

My question, then, is whether one can have a similar kind of arrangement for intellectual attention. Maybe, for instance, one foveates the linguistic features of a text — sentence structure, cadence, word choice, and the like — and leaves content to the periphery. Or slows down for exposition but skims equations. Or, perhaps more commonly, attends to arguments in field x at the expense of everything else.

I ask for two reasons. One, odds are that experts at anything have a fovea for their particular expertise, or a way of looking at the world that is highly resolved in one direction. And two, I’m guessing that something similar is at work anytime someone is deeply biased; religious nuts, for example, might have a “Jesus fovea.” In either case it seems like an important thing to identify (though of course it’s possible that I just have a fovea fovea).

Linguistic leverage

Every time I think about it, I’m a bit blown back by the fact that words actually have kinetic effects on people’s brains. If you are reading this sentence, for example, stuff in your head is moving in a way it wouldn’t have otherwise.

The effect depends quite strongly, though, on exactly which words are chosen. They have to be in the right language, presented at the appropriate time, and received with careful attention, if they’re going to operate with the most potency. And of course their strength grows at least linearly with the size of their audience.

By analogy, consider the Reduce function in Mathematica. According to Wolfram’s page about it, “Reduce and related functions use about 350 pages of Mathematica code and 1400 pages of C code.” That’s leverage: the label, Reduce, is only powerful because it’s connected to a massive cache of underlying machinery. I think words work the same way.

Anatomy of a Project Euler problem, and, some remarks on the OEIS

[This post turned into a kind of narrative about solving a particular problem, which I’m happy with, but if you’re interested in my general remarks about the On-Line Encyclopedia of Integer Sequences (the original intention of this post) jump here.]

Over the weekend I had a chance to work on a few Project Euler problems. I think of them as little “mental workouts,” in that they provide a somewhat frictionless (controlled) environment in which to practice math and hacking. And like physical workouts, PE problems are satisfying in direct proportion to their difficulty.

Problem 106, reproduced here, has been one of my all-time favorites (of the 111 I’ve solved):

Let S(A) represent the sum of elements in set A of size n. We shall call it a special sum set if for any two non-empty disjoint subsets, B and C, the following properties are true:

  1. S(B) ≠ S(C); that is, sums of subsets cannot be equal.
  2. If B contains more elements than C then S(B) > S(C).

For this problem we shall assume that a given set contains n strictly increasing elements and it already satisfies the second rule.

Surprisingly, out of the 25 possible subset pairs that can be obtained from a set for which n = 4, only 1 of these pairs need to be tested for equality (first rule). Similarly, when n = 7, only 70 out of the 966 subset pairs need to be tested.

For n = 12, how many of the 261625 subset pairs that can be obtained need to be tested for equality?

(If you’re interested in the rest of this post, I encourage you to read the description again, at least until you have some idea of what it’s about; it’s a bit of a mouthful the first time through. And if you’re interested in solving the problem, by all means do it — but be sure to come back when you’ve finished.)

The first thing that struck me about this question is that it doesn’t feel well-defined. What does it mean for us to “need” to test a pair? Isn’t that a judgment call?

My first step, then, was to print out the 25 subset pairs where n = 4; the idea would be to go through them and cross out the ones that “obviously” couldn’t be equal, and see if there was a pattern to it.

I immediately discovered one rule: pairs of singletons can’t possibly have equal “sums.” Since the sets are disjoint by assumption, any member of one can’t be part of the other — which, when your sets only have one member each, implies a strict inequality. Simple.

The next rule, too, follows directly from a fact mentioned in the problem description, namely, that property #2 is satisfied. So we can throw out any pair containing sets of different sizes, because these can’t be equal (by assumption).

Now it turns out that when you account for those two rules, you’re only left with three pairs — out of which only one “needs” to be tested, according to the problem description. They are:

[1, 2] [3, 4]
[2, 3] [1, 4]
[1, 3] [2, 4]

Can you see the rule?

Well, in the first and third cases, the left-hand set is strictly dominated by the right-hand set: if you move through both sets simultaneously, proceeding from left to right, the number you choose from the right-hand set is in every case greater than its partner from the left. So the right-hand set unequivocally has the larger sum, and no test is needed.
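As a sanity check, the rule is easy to express in code. Here’s a small Python helper of my own (not part of the eventual solution) that tests strict dominance, run against the three pairs above:

```python
def strictly_dominated(a, b):
    # True if b "strictly dominates" a: walking both sorted sets from left
    # to right, each element of b is greater than its partner in a
    return all(y > x for x, y in zip(sorted(a), sorted(b)))

print(strictly_dominated([1, 2], [3, 4]))  # True  -- no need to test
print(strictly_dominated([1, 3], [2, 4]))  # True  -- no need to test
# neither set dominates the other, so this pair must be tested:
print(strictly_dominated([2, 3], [1, 4]) or strictly_dominated([1, 4], [2, 3]))  # False
```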

With that in mind, it looks like we have all of the rules for telling us exactly how many pairs we need to check. But before we jump all the way up to n = 12, it makes sense to code up a little script on n = 7 to make sure our rules eliminate all but 70 of the 966 possible subset pairs (as we’re told in the problem description). Here’s a little code that does this (using a fast powerset() function pasted to the Python mailing list):

def powerset(s):
    # bitmask-based powerset (standing in for the fast version from the
    # Python mailing list); subset at index m contains element i+1 of s
    # whenever bit i of m is set -- the loop below relies on this ordering
    return [[x for i, x in enumerate(s) if m >> i & 1] for m in range(1 << len(s))]

s = [1, 2, 3, 4, 5, 6, 7]
subsets = powerset(s)

ct = 0
# Enumerate all possible pairs, then filter:
for i, a in enumerate(subsets[1:]):
    for b in subsets[i:]:
        if len(a + b) == len(set(a + b)):  # Disjoint
            if len(a) == len(b) and len(a) > 1:  # Rules 1 and 2
                diffs = [y - x for x, y in zip(a, b)]
                if any(d <= 0 for d in diffs):  # Rule 3
                    ct += 1
print(ct)

It prints 70, which verifies that our rules are sound.

Unfortunately, though, we can’t just add five elements to s and be done with it. When n = 12, the power set of s contains 2^12 = 4096 elements, which means that if we “enumerated all possible pairs,” as in the script above, we’d run through our loop (2^12)^2 ≈ 16.8 million times, which would take something like 9 hours to complete on a computer like mine.

So the problem comes down to (a) finding the number of pairs Px of disjoint, equally-sized subsets for sizes x = 2, …, 6 (i.e., 12/2), and (b) subtracting from each of these Px the number of pairs that are “strictly dominated,” in the sense above.

It took me a long time to figure out how to even approach (a). What I finally did was to make a table, for n = 4 and x = 2, of all the possible subset pairs; I’d then cross out the ones that weren’t disjoint and try to find a pattern. So I started to draw something like this:

         (1, 2) (1, 3) ... (6, 7)
        +------+------+---+------
(1, 2)  |
(1, 3)  |
 ...    |
(6, 7)  |

which was going to be filled with X’s in every spot where two sets contained the same element. But I stopped way early, because in the process of filling out my first row, I realized that the pair (1, 2) had essentially “blotted out” exactly two possibilities (1 and 2) from my list, leaving 3, 4, 5, 6, and 7. Which meant that if I was going to ensure “disjointness,” all I had to do was figure out the number of ways to select 2 elements out of the 5 remaining — which is just “5 choose 2”, or 5!/(2!(5-2)!).

To figure out my total P2, then, I just had to multiply by the number of rows, which turns out to be “7 choose 2”, or as a Python function, nCr(7, 2).

I checked my work using the lists I had printed out earlier, and it turned out that I was always off by a factor of two: I was double-counting. In terms of the table, I had forgotten that you only had to look above (or below) the diagonal.
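With the double-counting accounted for, the count, call it Px, can be written down directly. Here’s a quick check of my own (using math.comb, so Python 3.8+) against a brute-force enumeration for n = 7, x = 2:

```python
from itertools import combinations
from math import comb

def P(n, x):
    # unordered pairs of disjoint x-subsets of an n-element set: pick the
    # first subset, pick the second from what's left, then halve to undo
    # the double count
    return comb(n, x) * comb(n - x, x) // 2

# brute-force check for n = 7, x = 2
pairs = [(a, b) for a, b in combinations(list(combinations(range(1, 8), 2)), 2)
         if set(a).isdisjoint(b)]
print(P(7, 2), len(pairs))  # both 105
```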

With that done, all that was left was to subtract out the number of strictly dominated pairs.

But this was a lot more difficult than it sounds.

To restate the problem concretely, let’s say n = 7 and x = 2. What we’re after is the number of ways of selecting two pairs of elements from our 7-element list such that each of the guys from pair A has a partner in pair B that is strictly greater than it. Visually, where our pair A is represented by stars below the numbers and pair B by lines above, what we want is something like this:

      _   _
1 2 3 4 5 6 7
  *     *

as opposed to this:

    _   _
1 2 3 4 5 6 7
      *   *

because in this second case, 6 (from pair A) isn’t dominated by anyone from pair B.

My first stab at a systematic rule here was to try fixing two stars, like so:

1 2 3 4 5 6 7
  *   *

and to see how many ways there were to arrange two lines so that they strictly dominated the stars. Working with a pencil, I started drawing, erasing, and tallying.

This was a brutal mess. There were simply too many possibilities, and I wasn’t at all clear about what would happen if I increased n or x; I couldn’t find a system, and was even having trouble moving the lines systematically without losing track. Arranging the stars differently just muddled things up even more.

I went to dinner and tried to think a bit while eating. One idea I had was to write, above each number, the different star-numbers that would be dominated if I chose that number (i.e., if I drew a line above it) — sort of like how people solve Sudoku puzzles by listing all the possibilities and propagating constraints.

I don’t think that was a good idea, but it may have got me thinking, because when I got back to work to try it out I came up with something much better. It was the clearest “insight” of the day.

What I realized was that n doesn’t matter, at least in one important sense. Say x = 2 and n = 1,000,000. For any arrangement of stars and lines, there are all kinds of ways to shift things around; probably millions of them. But for the most part, none of these moves will change the essential dominance relationship — the ordering of lines and stars relative to one another. There are really only a fixed number of arrangements that matter, and increasing n just adds empty space.

In fact, we can systematically investigate every arrangement-that-matters by letting n = 4, which is small enough to work out by hand. Then, and here’s the trick, anytime we increase n, all we need to think about is the number of ways of choosing 4 numbers from that set.

Let’s say, just to be explicit about it, that there are 2 ways of arranging two stars and two lines around the list 1 2 3 4 such that the lines dominate the stars. (There are.) Then, if we let n = 7, or 12, or 2,000, all we need to do is calculate the number of ways of embedding our 4-member list in there — given by nCr(n, 4) — and multiply that number by 2. Done.
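The claim is easy to check by brute force for n = 7 and x = 2, where the embedding count nCr(7, 4) is 35 and there are 2 arrangements-that-matter:

```python
from itertools import combinations
from math import comb

def strictly_dominates(b, a):
    # tuples from combinations() come out sorted, so compare position-wise
    return all(y > x for x, y in zip(a, b))

n, x = 7, 2
# each unordered pair is counted once, since dominance can only run one way
count = sum(1 for a in combinations(range(1, n + 1), x)
              for b in combinations(range(1, n + 1), x)
              if set(a).isdisjoint(b) and strictly_dominates(b, a))
print(count, comb(n, 2 * x) * 2)  # both 70
```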

We do have a bit more work to do, though, because all we know precisely at this point is what happens when x = 2, and we need to consider the cases where x = 3, 4, 5, and 6, because those are all the possibilities when n = 12. (Singletons aren’t allowed, and anything bigger than 6 runs out of space.)

I was able to calculate these values explicitly for x = 2, 3, and 4, and roughly for x = 5 (I wasn’t sure about the last one, because it required lots of drawing, erasing, and tallying). So I had a series that looked like 2:2, 3:5, 4:14, 5:~43. I could have tried to figure it out for x = 6, but I saw that the series was growing quickly and I wasn’t too keen on making hundreds of tallies. But I also wasn’t sure that I could figure out a formula; I assumed it would be some tricky recursive combinatorial expression, and frankly, I had a much simpler idea:

* * *

I went to the On-Line Encyclopedia of Integer Sequences, otherwise known as “OEIS” or “Sloane’s.”

What it is is a repository of integer sequences (obviously) to which anyone can contribute. Each entry includes:

  • A list of the first twenty or so terms, along with links to more. There’s even a link to an audio file where you can hear the terms spoken out loud.
  • Comments — usually one or many “interpretations” of the list, assuming it has some number-theoretic, algebraic, or geometric significance.
  • Heavy-duty (i.e. peer-reviewed) references to papers that use the sequence in some way; occasionally this includes the paper that “discovered” it.
  • Links to online explanations or discussions.
  • Formulas and computer code (usually Maple and Mathematica) for producing it.
  • Cross references to related sequences.

The way I think about it is less as an encyclopedia of sequences per se than as a kind of projection of mathematics onto sequence-space. The point being that an integer sequence, because it often has so many mathematical interpretations, acts in some sense like the intersection of those interpretations. There are of course many ways in which two or more mathematical concepts are linked, but a shared integer sequence is among the most useful, precisely because it can be browsed and searched like an encyclopedia.

It also helps that this encyclopedia is online — and therefore hyperlinked — and that its contributors and editors are mostly professional mathematicians or computer scientists. It makes for an incredible resource, one more modest than Wikipedia or IMDb but in some ways more interesting.

The chief difference, I think, is that one can more readily imagine someone using the OEIS to do original research. It’s as much a rich set of raw data as it is a compendium of what’s already known.

Of course it’s possible to be misled by a particular entry, which is why one still has to do the math before taking a connection too seriously — for instance, I bet that nearly any random sequence of five or six strictly increasing numbers less than 100 will show up on Sloane’s; so it might not be prudent to, say, walk into an elevator, see which numbers are highlighted, look them up online, and assume you’ve found some connection between your building and the Stolarsky array read by antidiagonals. But it’s a rich resource for those who know what to do with it.

* * *

So that’s where I went to find the next member of my series (x = 6). Since I wasn’t sure of my value for 5 (43), I just omitted it and searched for “2, 5, 14”, which brought up the “Catalan numbers” as the first entry.

As you may gather from the length of that page, the Catalan sequence is quite important and admits of a whole slew of interpretations. One of these in particular jumps out at me as straightforwardly relevant to the problem at hand:

Ways of joining 2n points on a circle to form n nonintersecting chords.

All you have to do is think of “chords crossing” as “a violation of strict dominance,” and the correspondence should be clear. In my mind it’s a beautiful visualization of the idea.

In any case, it turns out that the next member of our series (the value for x = 6) is going to be 132. Which means we’re ready to solve the problem, because we now have a formula for determining the number of strictly dominated pairs given an array length n and subset pairs each of size x. It’s simply the number of ways of embedding 2x elements in n, i.e., nCr(n, 2x), multiplied by what turns out to be the xth Catalan number for that embedded array of 2x points.
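As it happens, the Catalan numbers also have a simple closed form, nCr(2x, x)/(x + 1), which reproduces the whole series without any drawing or tallying:

```python
from math import comb

def catalan(x):
    # closed form: the xth Catalan number is C(2x, x) / (x + 1)
    return comb(2 * x, x) // (x + 1)

print([catalan(x) for x in range(2, 7)])  # [2, 5, 14, 42, 132]
```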

Setting n = 12 and ranging over the possible values for x, 2 through 6, gives our answer. Here’s the code:

from functools import reduce

def fact(n):
    "factorial (with fact(0) = 1, thanks to the initializer)"
    return reduce(lambda x, y: x * y, range(1, n + 1), 1)

def nCr(n, r):
    "n choose r"
    return fact(n) // (fact(r) * fact(n - r))

def x_with_x(n, x):
    "number of *disjoint* x-subset pairs one can form with an array of length n"
    if n - x >= x:
        return (nCr(n, x) * nCr(n - x, x)) // 2
    else:
        return 0

catalans = {2: 2, 3: 5, 4: 14, 5: 42, 6: 132}

def dominated(n, x):
    "number of 'strictly dominated' x_with_x guys -- these need not be checked"
    return nCr(n, 2 * x) * catalans[x]

def need_to_check(n):
    "number of disjoint subset pairs we need to check for array length n"
    tot = 0
    for x in range(2, n // 2 + 1):
        tot += x_with_x(n, x) - dominated(n, x)
    return tot

print(need_to_check(12))

It takes .27 seconds to run on my machine.

Metacat

A couple of puzzles

James Marshall’s doctoral thesis, “Metacat: a Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception,” produced under the tutelage of Douglas Hofstadter at Indiana University’s Fluid Analogies Research Group (FARG), describes a computer program that is capable of solving “string analogy” puzzles like this one:

abc -> abd, ijk -> ?

What’s interesting about this particular problem is that the most obvious answer, ijd, doesn’t actually occur to most people. We tend to see ijl instead.

The reason (I think) is that we automatically recognize that the letters in each of abc and ijk follow one another in the alphabet, so what sticks out for us is the “1-2-3” pattern. With that in mind, the most natural rule that changes abc to abd is not “replace the last letter by d,” but rather something like “increment the last letter.” Applying that to ijk gives ijl.
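As a toy illustration (mine, and nothing to do with Metacat’s actual machinery), the literal-minded rule is a one-liner in Python:

```python
def increment_last(s):
    # bump the final letter one step along the alphabet (no wraparound at "z")
    return s[:-1] + chr(ord(s[-1]) + 1)

print(increment_last("ijk"))  # ijl
print(increment_last("abc"))  # abd
```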

The point of articulating these steps explicitly is to demonstrate that there is a lot of work going on under the hood even when we solve the simplest abstract problems. Work which a computer program like Metacat, and by extension its programmers, have to know how to do precisely if the thing is going to have any chance at tackling these puzzles at a human level. They have to know, for example, how to determine that the most salient feature in the problem above is the relative position of the letters — and this turns out to be a pretty significant task.

Of course things only get more difficult as the problems get more complex, e.g.,

abc -> abd, mrrjjj -> ?

You may even have to stop for a minute to think about this one. There are several attractive answers, and the “best” one isn’t the most obvious.

One possibility is mrrjjk, which follows the same logic as our solution to the first problem. But it loses major points for missing the three “clusters” of identical letters in the target string. Accounting for these gives the much more elegant mrrkkk.

That’s not quite optimal, though, because if we look closely we notice that while abc follows the “1-2-3” pattern in terms of alphabet position, mrrjjj does it with letter frequency. So if our rule turns out to be “increment the last letter-group” (accounting again for those clusters), we ought to make sure we “increment” in the appropriate way.

Our best bet, then, is actually mrrjjjj.
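Again as a toy sketch of my own, not Metacat’s: once you see mrrjjj as letter-groups, “incrementing” the last group means lengthening it by one rather than advancing it in the alphabet:

```python
from itertools import groupby

def increment_last_group(s):
    # split into runs of identical letters, then grow the last run by one,
    # since mrrjjj's 1-2-3 pattern lives in run lengths, not letter position
    runs = [(ch, len(list(g))) for ch, g in groupby(s)]
    ch, length = runs[-1]
    runs[-1] = (ch, length + 1)
    return "".join(ch * n for ch, n in runs)

print(increment_last_group("mrrjjj"))  # mrrjjjj
```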

Remarkably, Metacat is able to solve this problem and others still more difficult. Perhaps more important, though, is that the method it uses to do this is in some ways just a slowed-down version of our own unconscious cognition, which, Marshall insists, is really all about analogies.

Analogies as the core of thought

These little puzzles may not seem like much. They are, after all, restricted to a highly stylized microdomain: simple manipulations of letter strings. What’s so general about that?

Consider, as a parallel example, a set of problems where someone is asked to “complete” an integer sequence, as in:

1, 2, 3, ...

or

1, 1, 2, 3, 5, ...

For a person who (say) knows nothing more about math than the successorship of integers — pretend that he’s even ignorant about simple arithmetical operations like addition — the task would become an exercise in (a) scouring for relationships among the numbers, and (b) flexibly updating his best hypothesis as he considers more terms. Both of which strike me as exceedingly general abilities.

(Incidentally, there is a program called Seek-Whence out of FARG that tries to complete integer sequences.)

In the same way, Metacat is less concerned with letters than it is with broadly defined structural relationships.

To see how, think of the steps you had to take to solve one of the puzzles above. Your first move, most likely, was to try to discover some “rule” that transforms the original string. Then, either consciously or not, you tried to highlight the relevant abstract structural similarities between the original string and the target string. And finally, you applied your rule to this abstract representation of the target to produce a solution.

The key to the process, and the part most aptly characterized as an “analogy,” is the mapping you made between the original and target strings, where you had to see the two different sets of letters as, at some level, playing the same role. Which when you come down to it is exactly what’s happening anytime you make an analogy.

If someone were to ask me, for example, who Canada’s “president” is, it would be quite natural for me to tell them the name of our prime minister, rather than saying “nobody” — because in this context prime minister plays roughly the same role as president, namely, head of government, which is what the person is probably interested in. Similarly, when YouTube pitched itself as “the Flickr of video,” the notion immediately made sense to users, who could easily imagine transforming Flickr’s features to incorporate videos instead of pictures.

More mundane analogies pervade everyday life. Our language is full of them: we “spend” time, “retrieve” memories, “get ideas across,” and “shoot down” arguments (see Lakoff and Johnson’s Metaphors We Live By for more). And even in our basic interactions with objects, we can’t help but think laterally instead of literally. Marshall gives the example of eating food off of a frisbee while at a picnic.

Now, in one sense it’s somewhat unremarkable to see a frisbee as a plate, but if nothing else it does illustrate the fluidity of our concepts, which for Marshall is pretty much the key to the whole operation. Here’s how he characterizes it:

To some extent every concept in the mind consists of a central core idea surrounded by a much larger “halo” of other related concepts. The amount of overlap between different conceptual halos is not rigid and unchangeable but can instead vary according to the situation at hand. Much work has been done in cognitive psychology investigating the nature of the distances between concepts and categories [Shepard, 1962; Tversky, 1977; Smith and Medin, 1981; Goldstone et al., 1991]. For most people, certain concepts lie relatively close to one another in conceptual space, such as the concepts of mother and father (or perhaps mother and parent) while others are farther apart, at least under normal circumstances. However, like the boundaries defining individual concepts, the degree of association between different concepts can change radically under contextual pressure with the potential result that two or more normally quite dissimilar concepts are brought close together, so that they are both perceived as applying equally well to a particular situation, such as when the Earth is seen as an instance of both the mother concept and the planet concept. This phenomenon, referred to in the Copycat model [Metacat’s predecessor] as conceptual slippage, is what enables apparently unrelated situations to be perceived as being fundamentally “the same” at a deeper, more abstract level.

What’s nice about this view is that it explains apparently ineffable features of the mind, like creativity and insight, as merely special cases of a more general phenomenon. So when Einstein equated acceleration with gravity (and saw gravity as “curved spacetime”), or when Shannon defined information in terms of entropy, they were just exploring unlikely analogies (that ended up being true) — or, in the above terms, they were “slipping” particularly distant concepts.

What makes such fruitful ideating possible is a kind of “elastic looseness,” wherein one’s concepts are allowed to range widely enough to combine in novel ways, but are still constrained away from nonsense. Lewis Carroll was especially good at walking that line — Jabberwocky is wildly imaginative and, at first glance, meaningless, but there is enough structure in there to allow us to translate his made-up words (say, by breaking up the obvious portmanteaus) and “fill in” the plot from context. Most of Dr. Seuss’s stuff is the same way: absurd slippage that might be incomprehensible, except that it’s packaged in an allegory that even youngsters can swallow.

Analogies, then, appear to be fundamental to every kind of thought, from survival-level object recognition (detecting new threats, for instance) all the way to artistic or scientific innovation. And Metacat is, if nothing else, an attempt to articulate explicitly all of the subcognitive machinery that makes such analogies possible. So, assuming Marshall knew what he was doing, figuring out how Metacat works should be a lot like figuring out how the mind works.

Metacat in action

This is what Metacat looks like at the end of a typical run, after it has found a plausible solution:

(Notice that the title of the window says “Workspace,” which is the name of this particular part of the program. It’s only one of many components, but it’s probably the most fun to watch, if only because that’s where all the conceptual structures get built on top of the strings themselves. Put elsely: if you were working on one of these problems by hand with a piece of paper, you’d probably draw something that looked like the picture above.)

There is a lot going on here. If you look carefully, though, you’ll notice that there are really only two “types” of line — straight and squiggly — and that together they make up the three maps we discussed above: (1) from the original string horizontally to its transformed version, (2) from the original string vertically to the target string, and (3) from the target string horizontally to the solution. In addition there are boxes showing the “groups” formed by “bonds” that represent successor/predecessor/sameness relations, short textual descriptions (e.g. “lmost=>lmost”) of salient mappings, and natural-language explanations of the overarching “rules” that determine how the strings are modified.

It’s worth stressing that the horizontal maps are chiefly concerned with differences between the strings, while the vertical map is meant to highlight similarities; this should make sense in light of the discussion above, where we focused on the vertical map as the crux of the analogy — which is of course more about sameness than differentness.

One may understandably be curious about how all of these Workspace structures are formed. The answer is that a whole slew of computational modules, called “codelets,” are sent to the Workspace, one at a time, each with a single low-level task. Here are a few examples (hopefully the names are roughly self-explanatory):

  • Bottom-up bond scouts
  • Bond builders
  • Group evaluators
  • Description builders
  • Rule scouts

Of course the order in which various codelets are executed, and their relative frequency, effectively determine which part of the solution space is being searched at a given time. Which means that the function for choosing codelets in each step should probably be more sophisticated than a random draw, and should in some sense reflect the actual semantics of the problem at hand.

And it does. What Metacat does is tie the probability of each codelet’s being selected in the next round to the state of the “Slipnet,” a kind of semantic network containing variously activated “concepts.” Here’s what it looks like (more active concepts have bigger circles above them):

I will leave it to Marshall to describe how this works, and what it means:

In some ways the Slipnet is similar to a traditional semantic network in that it consists of a set of nodes connected by links. Each of these links has an intrinsic length that represents the general degree of association between the linked nodes, with shorter links connecting more strongly associated nodes. [. . .] Each node corresponds to an individual concept, or rather, to the core of an individual concept. A concept is more properly thought of as being represented by a diffuse region in the Slipnet centered on a single node. Nodes connected to the core node by links are included in the central node’s “conceptual halo” as a probabilistic function of the link lengths. This allows single nodes to be shared among several different concepts at once, depending on the links involved. Thus, concepts in the Slipnet are not sharply defined: rather, they are inherently blurry, and can overlap to varying degrees.

Unlike traditional semantic networks, however, the Slipnet is a dynamic structure. Nodes in the Slipnet receive frequent infusions of activation as a function of the type of perceptual activity occurring in the Workspace. Activation spreads throughout a node’s conceptual halo, flowing across the links emanating from the core node to its neighbors. The amount of spreading activation is mediated by the link lengths, so that more distant nodes receive less activation. However, the link lengths themselves are not necessarily fixed. Some links are labeled by particular Slipnet nodes and may stretch or shrink in accordance with the activation of the label node. A labeled link encodes a specific type of relationship between two concepts, in addition to the conceptual distance separating them. For example, the link between the predecessor and successor nodes is labeled by the opposite node, and the link from the a node to the b node is labeled by the successor node. Whenever a node becomes strongly activated, all links labeled by it shrink. As a result, pairs of concepts connected by these links are brought closer together in the Slipnet, allowing activation to spread more easily between the two, and also making it more likely for conceptual slippages to occur between them.

And,

Whenever new Workspace structures are built, concepts in the Slipnet relating to them receive activation, which then spreads to neighboring concepts. In turn, highly-activated concepts exert top-down pressure on subsequent perceptual processing by promoting the creation of new instances of these concepts in the Workspace. Thus which types of new Workspace structures get built depends strongly on which concepts are relevant (i.e., highly activated) in a given context.

I hope that the correspondence to human cognition is apparent. I am particularly attracted to this idea of a “network of concepts” that (a) directs the activity of lower-level computational/cognitive modules, and then (b) changes shape based on feedback from those modules. It seems to me that when I perceive and think, that is exactly the kind of loop I’m in.
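To make the mechanics a little more concrete, here’s a toy sketch of the spreading-activation idea; the node names, constants, and update rule are my own illustrative guesses, not Marshall’s actual implementation:

```python
# Toy Slipnet sketch: link lengths encode conceptual distance, labeled
# links shrink when their label node activates, and activation spreads
# more easily across shorter links. Constants here are made up.

class Slipnet:
    def __init__(self):
        self.activation = {}   # node -> activation level in [0, 1]
        self.links = {}        # (a, b) -> intrinsic length (shorter = closer)
        self.labels = {}       # (a, b) -> labeling node, if any

    def add_link(self, a, b, length, label=None):
        for node in (a, b):
            self.activation.setdefault(node, 0.0)
        self.links[(a, b)] = length
        if label is not None:
            self.labels[(a, b)] = label
            self.activation.setdefault(label, 0.0)

    def effective_length(self, a, b):
        # Links labeled by a strongly activated node shrink, bringing
        # their endpoints closer and making slippages more likely.
        length = self.links[(a, b)]
        label = self.labels.get((a, b))
        if label is not None:
            length *= 1.0 - 0.5 * self.activation[label]
        return length

    def spread(self):
        # Activation flows across links; more distant nodes get less.
        new = dict(self.activation)
        for (a, b) in self.links:
            flow = self.activation[a] / (1.0 + self.effective_length(a, b))
            new[b] = min(1.0, new[b] + 0.5 * flow)
        self.activation = new

net = Slipnet()
net.add_link("predecessor", "successor", length=2.0, label="opposite")
net.activation["predecessor"] = 1.0
net.activation["opposite"] = 1.0   # activating the label shrinks the link
net.spread()
```

Note how activating the label node (“opposite”) shrinks the labeled link, which lets more activation flow from “predecessor” to “successor”: exactly the precondition for a conceptual slippage.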

If you have the time (and I assume that if you’re reading this sentence, you do), it might be fun to watch a video of all three of these components — the Workspace, Coderack, and Slipnet — working together on an actual problem. Watch it here.

You might notice a little thermometer in the top-right corner there. That gives a measure of the “perceptual order in the Workspace”: when things are frenzied, and Metacat hasn’t settled into any particular “line of thought,” it adds more randomness to its codelet selections; but when it starts to home in on an idea, it “cools off” and eases its way into a conceptual groove, determinedly selecting those codelets most likely to finish the job.
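The codelet-selection scheme can be sketched as a temperature-weighted lottery; this is only a cartoon of the idea (the formula Metacat actually uses differs):

```python
# Temperature-weighted codelet selection, a sketch of the idea only.
# High temperature flattens the urgency distribution (more randomness);
# low temperature sharpens it toward the most urgent codelet.
import random

def select_codelet(codelets, temperature, rng=random):
    # codelets: list of (name, urgency) pairs; temperature in (0, 100]
    weights = [urgency ** (100.0 / temperature) for _, urgency in codelets]
    total = sum(weights)
    r = rng.random() * total
    for (name, _), w in zip(codelets, weights):
        r -= w
        if r <= 0:
            return name
    return codelets[-1][0]
```

At temperature 100 the weights are just the raw urgencies; as the system cools, the exponent grows and the most urgent codelets crowd out everything else.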

An unfortunate side effect of this otherwise wonderful (and again, psychologically plausible) mechanism is that Metacat risks getting caught in a local optimum, or a “snag,” somewhere short of a good solution.

That problem is largely what motivated the idea of a successor to Copycat in the first place. Indeed, what makes Metacat “meta” is a set of self-watching features designed to keep the program out of snags, by (a) watching its own activity in the Workspace and Slipnet so that it can recognize dead ends, and (b) “remembering” particular analogy problems and old solutions, to draw on if it does get stuck.

These meta features show up in the architectural overview that Marshall gives on page 56, which is probably worth taking a look at anyway.

That, in a nutshell, is how Metacat works. It’s also probably a good approximation to how your mind works, if not at a neural level then at a conceptual one — which is probably more interesting, at least for now.

If you’d like to read more, or install the software yourself, or even dive into the tens of thousands of lines of LISP code that makes this run, check out Marshall’s project page here: http://science.slc.edu/~jmarshall/metacat/.

N: A Coffeehouse Conversation on Minds and Metaphor

The speakers are SIMPLICIO and SALVIATI, both students of Nietzsche. They have just read “On Truth and Falsity in their Ultramoral Sense.”

SALVIATI: It’s quite a thrill, isn’t it, to learn that the words we know and love are effectively arbitrary? That “hard” only means hard by fiat?

SIMPLICIO: It would be a thrill if it were true. But words aren’t quite as arbitrary as you or Nietzsche seem to think. Consider how the ones we use frequently, like “I” and “you,” are much shorter than those encountered rarely, like “subjectivity” and “historicization.” That’s no accident: it is just more efficient, when designing a language, to make sure that your common words are shorter than the uncommon ones. The idea can be made rigorous if we appeal to something called the Kraft-McMillan theorem…
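(An editorial aside: Simplicio’s claim is easy to demonstrate. Huffman coding produces optimal codeword lengths, which satisfy the Kraft inequality, and it assigns shorter codes to more frequent symbols; the word frequencies below are invented for illustration.)

```python
# Huffman codeword lengths: frequent symbols get short codes, rare
# symbols get long ones, and the lengths satisfy sum(2**-len) <= 1.
import heapq

def huffman_lengths(freqs):
    # freqs: symbol -> frequency; returns symbol -> codeword length
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

word_freqs = {"I": 40, "you": 30, "subjectivity": 2, "historicization": 1}
lengths = huffman_lengths(word_freqs)
```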

SALVIATI: I’m sure it can, but that won’t help you here. To rebut your point all I have to do is ask, so what if we evolved a somewhat “efficient” language? Efficiency is a human value. All you’ve done is replace a little arbitrariness here with a little there.

SIMPLICIO: What you’re saying is that the concept of efficiency itself is no less an arbitrary human “construct,” if you will, than the words we live by? At least by Nietzsche’s standard of arbitrariness?

SALVIATI: Exactly. Even if our languages have internal structure–maybe word lengths are distributed according to some general rule that maximizes efficiency; maybe there’s a “universal grammar,” and every dialect in every country uses some subset of the exact same vowel sounds–that won’t impress Nietzsche, because those regularities are all wrapped up in e.g. the way the human brain is wired. And that’s precisely his point: none of that internal structure buys you a substantive map between words and their objects, and in that very important sense, words are arbitrary. A rock by any other name would still be whatever the hell a “rock” actually is.

SIMPLICIO: Hmm. What about onomatopoeia? In those cases there’s a very clear relation between the word, say “quack,” and its object, namely, the sound a duck makes. Is there not?

SALVIATI: Not exactly. You have to be careful to distinguish between the thing itself–the sound a duck makes–and your percept of it, viz., the sound a duck makes to you. And that’s really the central problem that Nietzsche was driving at. You, and I, and every other idiot human out there, act as though our perceptual equipment gives us a window to the world: we hear a duck and think, “that’s what a duck sounds like”; we see a stop sign and say, “that sign is red.” But the only reason ducks quack and stop signs are red is because our “nerve impulses,” as N. put it, tell us so. It might help to think of your eyes as less like cameras receiving input from the world and more like projectors making this stuff up; the picture only seems objective because everyone else has the same kind of projector as you.

SIMPLICIO: See, I’m willing to grant that words (with some caveats) are basically cooked up at random, i.e. that as long as we were consistent, reversing “hard” and “soft” wouldn’t make that much of a difference. But I am decidedly not on board with the idea that our perception of hardness itself is somehow arbitrary. There are actual rocks! They’re hard!

SALVIATI: Again, you talk as though you have access to objective reality,–

SIMPLICIO: I do, as a matter of fact. These eyes of mine you insist on calling “projectors” are, believe it or not, little curved windows: a black hole and some eyeball-shaped glass. All they do is let the light in.

SALVIATI: Don’t forget your retinas, your optic nerve, your lateral geniculate nucleus, your occipital lobe, your visual cortex, and all the neural pathways that integrate everything you see into a unified “mental image.”

SIMPLICIO: And what exactly do you think all this gear is doing? “Making stuff up”?

SALVIATI: Do I believe that the visual processing machinery housed in and around your brain is literally constructing images out of nothing? No. Obviously there is some correspondence between the “light you’re letting in” and what you perceive. But your brain does do an incredible amount of computation before you see what you see, and who are you to say that it’s the “right” computation? Another way of putting that (and it’s an angle Nietzsche took in the essay) is this: there are all kinds of animals on Earth, including ones with radically different sensory apparatus than yours–the canonical example (since Nagel at least) being the bat and its sense of echolocation. What privileges your particular picture of things?

SIMPLICIO: Evolution, that’s what. We happen to be (cognitively, at least) the most advanced species on the planet. Which is to say, in the optimization problem that is evolution we’re the best solution so far. So presumably the way we see things is closer to reality than the way bats see them. Evidence: bats didn’t land a bat on the moon, and there aren’t any bat movies about bat superheroes, or bat malls where you can buy, I don’t know, blood at exorbitant prices, and as f–

SALVIATI: You have to keep in mind what you’re optimizing for.

SIMPLICIO: Pardon?

SALVIATI: You just said that “in the optimization problem that is evolution, humans are the best solution so far.” My question is, what is this optimization problem optimizing for? What’s the goal?

SIMPLICIO: Reproductive fitness. The number of grandchildren you have.

SALVIATI: Right–evolution doesn’t care if your beliefs are closer to reality (whatever that means) than a bat’s. All it cares about is your ability to replicate your DNA before you die. (Which probably has something to do with the 10²⁰ new viral particles created by the human population alone each day…)

SIMPLICIO: Well, evolution doesn’t care about anything; it’s a dumb, blind process that just does what it does, no agency or intentionality involved. But presumably, having a better model of the world–knowing for instance where things are, what can and cannot be eaten, how to tell leaves apart from branches, etc.–improves reproductive fitness. As such, the degree to which one’s beliefs map to reality in those kinds of ways is really the key to the whole operation, if you think about it.

SALVIATI: No no no… that is exactly what I’m arguing against! [pauses] Consider the tendency among mammals to see macroscopic objects: leaves, tigers, rocks, etc.

SIMPLICIO: More than a “tendency,” I’d say.

SALVIATI: Sure. But what if this roughly universal feature of mammalian minds was only a metaphor, a helpful mapping from one frame to another and nothing more? So just as a word maps neatly, consistently, but totally arbitrarily to a percept–as in “hard” mapping to the physical sensation of hardness–so does a percept map neatly, consistently, but totally arbitrarily to some thing-in-the-world.

SIMPLICIO: So in effect what you’re claiming here is that tigers, say, are as tenuous as words.

SALVIATI: Exactly! In the same way that “dangerous,” for example, is a feature we ascribe to certain things (like tigers), and not intrinsic to those things, we ascribe “tigerness” to certain patches of matter or whatever.

SIMPLICIO: Yes, but there are a lot of rules about how we ascribe “tigerness”; we don’t go around willy-nilly accusing things of being a tiger! It must be striped, have four legs, live in the jungle, etc. etc.

SALVIATI: I hope you recognize that all you’ve done there is to break down the “tigerness” problem into several subproblems–“stripedness,” “four-leggedness,” etc.–that themselves admit of the same process. So all I have to do to defeat this particular plaint of yours is ask you to define “tigerness” in a way that doesn’t lead to an infinite regress of recursively defined types.

SIMPLICIO: I don’t quite see what you mean.

SALVIATI: Well, your goal here is to show that while the map between the word “tiger” and the percept of a tiger may be arbitrary, the map between that percept and the actual tiger is not. Instead, you claim, it’s grounded in all kinds of rules that capture something of the real structure of the tiger itself. Something that flies around and shits keyboards, for instance, would not set off any “tigerness” alarm bells.

SIMPLICIO: Yes, I’d like to think that “tigerness” would be pretty low on the list of concepts activated by some kind of flying object that pooped computer peripherals.

SALVIATI: Okay. Then what you’re not seeing yet is that these structural features that you take to be actually there are really in your head. That is, if you grant that “tigerness” is just the composition, in your mind, of things like “stripedness” and “four-leggedness,” I need merely point out that these sub-concepts are likewise composed–“four-leggedness” is built out of the concepts “legs” and “four.” And the process never bottoms out.

SIMPLICIO: Sure it does! That’s what Nietzsche failed to realize, too. It’s not like my brain makes these patterns that lead up to tigers out of nothing. There is a bottom: the light that hits my retina, for instance, is itself highly patterned–before it gets there. Feynman puts it this way: imagine that you’re sitting by the pool and a kid jumps in and makes a nice big splash. Of course he’s not the only one in the water: there are all kinds of kids swimming in there left and right, playing “Marco, Polo” and swatting at each other with those foam noodles… which makes, overall, for a lot of choppiness. The question Feynman then asks is whether there are enough clues in all that activity so that a clever little insect floating on the surface somewhere in the corner could deduce, just from being disturbed by the waves, who dived in when, who’s swimming in what direction, and so on.

SALVIATI: Sounds tricky.

SIMPLICIO: It sure does. But guess what Feynman says next: that’s what we’re doing when we’re looking at something. In other words, that’s what all that heavy-duty machinery at the back of your brain is for; it’s doing the kinds of calculations that that little insect would have to do to figure out what the waves mean. Point being, there may not be macroscopic objects per se, but there are, like you said, little patterned “patches of matter” (and electromagnetic fields and the like) that we sense and make sense of. We take those patterns and work our way up chains of abstraction, losing bits and pieces of the hard data as we go, until photonic fluctuations become the image of an animal.

SALVIATI: So you are granting that these “chains of abstraction” are just as metaphorical as those that we use to transfer percepts into words?

SIMPLICIO: Yes, I’ll grant that. But there are real patterns out there in the world, and brains that extract them effectively, and act on them effectively, will fare better than those that don’t. That’s how evolution works. Which means that our brains must be pretty damn excellent at this stuff, because otherwise we wouldn’t be having this conversation.

SALVIATI: Nicely argued. I’m going to get a cup of coffee, and when I come back I’ll tell you how Nietzsche, about 140 years ago, ripped the rug out from under most of what you’ve just said. Want anything?

* * *

SALVIATI: Okay. So your premise here is something like this: your brain is coded for by your DNA, which in total comprises roughly ten megabytes of “active” information (with the rest being “junk,” as they say). This DNA was arrived at after billions of billions of trials–a series of parallel computations, if you will–as organisms of various kinds were born, competed for scarce resources, and died since the dawn of life. That so much computation is embedded in so little data (less than the source code for Microsoft Word, actually) must mean that there has been a massive amount of compression: your DNA must be an extremely compact representation of some of the actual structure in the world, in the same way that (2π/φ²)·n is a compact representation of the floret pattern in a sunflower. The odds of such a small DNA code producing (very) capable organisms by chance are laughably tiny, and thus, we can reasonably assert that the code actually codes for something, namely features of the real world.
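(Another aside: the sunflower formula really is that compact. The n-th floret sits at angle n·2π/φ², the “golden angle” of about 137.5°, and with Vogel’s square-root-radius model the whole spiral falls out of a few lines; the scale factor is an arbitrary choice.)

```python
# Vogel's phyllotaxis model: one constant, the golden angle, generates
# the entire floret spiral of a sunflower head.
import math

PHI = (1 + math.sqrt(5)) / 2            # the golden ratio
GOLDEN_ANGLE = 2 * math.pi / PHI ** 2   # ~137.5 degrees

def floret(n, scale=1.0):
    # The n-th floret: angle steps by the golden angle, radius ~ sqrt(n)
    theta = n * GOLDEN_ANGLE
    r = scale * math.sqrt(n)
    return (r * math.cos(theta), r * math.sin(theta))

florets = [floret(n) for n in range(1, 200)]
```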

SIMPLICIO: I couldn’t have said it better myself. Think of how Newtonian mechanics are baked into what are effectively spear-throwing algorithms in the human brain, or how our muscle spindles feed into simple trigonometric functions computed by motor neurons to give us an incredibly accurate sense of how our limbs are oriented, or how edge detector neurons use Gaussian filters to discern macroscopic objects. Like that little insect in the corner of a pool, we’re able to figure out what’s going on in this mess because the very laws of physics–e.g. that gravity accelerates falling objects at 9.8 m/s²–are represented to some approximation in that overeducated squishy noggin of yours.

SALVIATI: Or they’re hidden behind the bush.

SIMPLICIO: Sorry?

SALVIATI: That’s Nietzsche. He said that “If somebody hides a thing behind a bush, seeks it again and finds it in the self-same place, then there is not much to boast of.” And here you are putting all kinds of mathematical patterns “in the real world” and acting as though you found them there. Look at the bottom of page 186:

All obedience to law which impresses us so forcibly in the orbits of stars and in chemical processes coincides at the bottom with those qualities which we ourselves attach to those things, so that it is we who thereby make the impression upon ourselves.

SIMPLICIO: So in your view I’m conflating the map with the territory?

SALVIATI: That’s it! Remember how I said that “efficiency” was a human concept, and that therefore your appeal to it was just as arbitrary as the choice of the word “rock”? And how I then argued that even things that seem elemental, like “four-leggedness,” are actually just arbitrary concepts themselves composed of other arbitrary (read: human) concepts?

SIMPLICIO: Sure.

SALVIATI: Well that’s the way it goes with math, too! We look at Newton’s equations and assume they map in a very perspicuous way to something actually in the world, like a map maps to the territory. We feel special because we assume we’ve evolved the best damn picture of the territory there is. What we don’t realize is that everything is in the map.

SIMPLICIO: Even the very foundations of math, like set theory and the natural numbers… you would say these are “made up”?

SALVIATI: Yes: there’s nothing “natural” about the natural numbers at all.

SIMPLICIO: It’s one thing to point out that we do arithmetic in base ten because we have five fingers; that’s quite transparently an arbitrary convention. But numerosity itself–the very idea of a “number”–is not a human product. If anything is true, that is.

SALVIATI: Bingo: there is no truth, at least in the ultramoral sense. It’s all Vanity-with-a-capital-V.

SIMPLICIO: Cute, but I’m not convinced. What, in your view, is so unnatural about the natural numbers?

SALVIATI: Well you agree that they’re not quite at the bottom of all mathematics? That they’re in turn built out of more primitive objects?

SIMPLICIO: Sure–in a very strict sense we can construct N inductively by embedding sets, so that 0 = {}, 1 = {0} = {{}}, 2 = {1} = {{{}}}, and so on.

SALVIATI: And what are these sets, exactly?

SIMPLICIO: Collections of distinct objects. Or, one could think of a set as a rule that divides the universe into (a) the things that belong inside the set and (b) the things that belong outside of it. The even numbers, for example, can be written in the first way (as a collection of distinct objects) as {2, 4, 6, 8, …} and in the second (as a kind of rule) as {n ∈ N : n ≡ 0 (mod 2)}.

SALVIATI: So at absolute bottom is, in one formulation, the notion of “discrete collections,” and in another, the rule “A vs. not-A.” Correct?

SIMPLICIO: Yes, absolutely. That is ground zero.

SALVIATI: Well, sir, don’t you see where this is going?

SIMPLICIO: Sure I do. You want to argue that these “absolute fundamentals” are no less metaphorical, no less anthropomorphic and “in the map” than higher-order percepts or words. You want to argue that pattern itself is in my head. That the laws of nature have about the same ontological status as the laws of man.

SALVIATI: Well?

SIMPLICIO: I think that’s wonderful. I really do. One question, though: so what? What becomes of the world after you’ve turned it on its head?

SALVIATI: Well it’s not like I’m speaking to Congress here; the only thing I’m really aiming to change, if only a little bit, is your point of view.

SIMPLICIO: How so?

SALVIATI: Maybe it’s a bit pessimistic, but as a rule I think you would do quite well to never underestimate a person’s insecurity, selfishness, or self-deception. And for a practitioner of post-Popperian science like yourself, combating that evil triumvirate–if only for the sake of humility–means seeing one’s search for truth as deeply metaphorical, and ultimately subjective.

SIMPLICIO: Not that there’s anything wrong with that…

SALVIATI: Exactly.

[pdf version]
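A footnote to the dialogue: Simplicio’s inductive construction of the naturals is easy to make concrete with Python frozensets, using Zermelo’s encoding in which each number is the singleton of its predecessor. Purely illustrative, of course:

```python
# Zermelo's encoding of the naturals: 0 = {}, and n + 1 = {n}.
def zermelo(n):
    s = frozenset()           # 0 = {}
    for _ in range(n):
        s = frozenset([s])    # successor: wrap in a singleton
    return s

ZERO = zermelo(0)   # frozenset()
ONE = zermelo(1)    # frozenset({frozenset()})
TWO = zermelo(2)    # frozenset({frozenset({frozenset()})})
```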

Those maps at the beginning of books, or, a few words about teaching

If you’re out to understand a story that’s really located, as in deeply bound to a particular place, you would do well to have at least a murky mental picture of nearby landmarks, both natural and manmade. For books like Joyce’s Ulysses, where the action so often hinges on spatial minutiae — like which side of the street Bloom’s on — you probably need something more vivid, and of course more accurate.

Now some authors (or more likely, their publishers) will occasionally offer a partial solution to this problem by providing an aerial map of the region. But this rarely turns out to be helpful, if it’s ever even read.

The problem is not that these are bad maps, as though they omitted salient details or left too little to the imagination. It’s just that they’re in the wrong part of the book. Almost always, they’re buried in what book-people call the “front matter”: the edition notice, title page, dedication, table of contents, preface, foreword, prologue, introduction, etc.

Even if you read that stuff (which many people don’t, either because it’s boring or because they’re careful to avoid spoilers), odds are you won’t linger for long on the map. The reason is that it means about as little to you now, when you’ve first picked up the book, as it ever will. Every feature worth caring about — the fact that George Willard’s new house is a two-minute walk to the railroad, or that Hern’s Grocery opens onto the East side of the street (so Kate will see the sun setting on her way out) — is tied in some way to the characters, who haven’t yet arrived.

It is of course possible to flip between the text and the map, but this is only likely to happen (a) if something about the scenery is especially (or really, espatially) confusing, or (b) if you’re forced to. And if the map proved valueless the first time you looked at it, you probably won’t want to look at it again.

Which suggests the following question: when should it be presented?

I think the end of the book is just as bad as the beginning, because by then it’s too late to start thinking about new scenes in terms of the map, which is how you’d populate it with the kind of rich associational content that makes maps useful.

What you want, then, is to find some place late enough that at least a few locales or landmarks will ring a bell — ideally pushing you to mentally reorganize some of what you’ve just read — yet also early enough for there to be time for meat to grow on your scaffold.

The best experience I’ve had with this kind of thing was in Don Gifford’s Ulysses Annotated, a must-have for anyone trying to tackle Joyce’s masterpiece.

What Gifford did was print a map at the end of each chapter (or more exactly, at the beginning of each chapter’s endnotes) that was relevant to that chapter alone. What’s more, he’d trace the route that the characters had taken through that particular part of the city. Which meant that as you routinely consulted the guide, you were effectively confronted with your main character’s real-time location — because you had in mind exactly where he was, or what he was looking at, and you could find it on Gifford’s map; and then of course you’d look ahead a little, which helped you anticipate and orient yourself for the character’s next move. By the end of the chapter you effectively had a little movie in your head, because you’d traced the whole thing out as you went along.

Of course, the efficacy here has a lot to do with the fact that you were in fact forced to consult the map. But Gifford made it worth your while, and other books could follow his lead if they thought creatively.

* * *

The reason I wanted to articulate this question of “where to put the map” in detail is that I think it serves as a useful model for learning in general.

That is, when I introspect about content that I’ve retained quickly and understood well, I realize that so much of what made it click had to do with the timing and ordering of my encounters with scaffold-material:

  • terms/definitions
  • theorems
  • maps
  • graphs
  • tables
  • timelines
  • categories/classes
  • nested combinations of the above

versus meat-material:

  • examples
  • proofs
  • problems
  • exercises
  • stories
  • first-hand experiences
  • instantiations
  • etc.

The key for me, again, is to scaffold only after I’ve got some meat, but still early enough to enable ample use of that scaffold.

The idea of “the meta level,” for example, is a lot easier to understand if you tacitly have in mind many examples of it before you finally encounter the “concept”; but there is some pressure to learn it sooner rather than later, because once you do articulate that definition precisely, you’ll be able to “call” it as a kind of cognitive computational module — an especially useful one at that.

This is why I advocate problem-based approaches to computer programming [pdf], because you learn the big ideas from the bottom up: you start by working hard on concrete examples, which are then generalized (often with help from others) into useful principles, honed by more work, and generalized again. The cycle is repeated and before you know it you will really know it.

This is in contrast to the standard (K-12) approach, which if I’m not mistaken starts with “concepts” and has you do exercises as applications of these concepts. It’s backwards. It’s like putting the map at the beginning of the book.