the jsomers.net blog.

Spillage

[Composed on March 12th, 2010, at 2:17 in the morning.]

I’m very tired. I’m writing, though, to keep a streak alive: one hour per day, every day. Last month I did twenty-four days in a row. Tonight will be my sixth. I’m shooting for hundreds.

Writing under these conditions can have you producing crap in no time. But creative writing teachers and professional authors actually encourage this sort of meandering associative spillage. They say it’s okay to write with abandon as long as you clean it up later. All writing is rewriting, they say.

When I play squash I love to go after cutesy drop shots and boasts. Part of it is that it’s fun to feel as though I’m finessing the ball, and part of it is that these shots are devastating when they succeed.

But they rarely do. That’s why my coach urged me to hit deep solid drives instead. Those I could more reliably control; they were easier to practice; and when they failed, I wouldn’t be left totally exposed.

Writing can work the same way. I like to compose long, breathy sentences, sentences packed with commas and clauses in an effusion of iambic feet, the sort of sentences that build, in a meandering prosodic way, a feeling in the reader of tension—eased by an interruption—that predictably unwind, rhythmically beat by beat, toward a quick but satisfying end.

It’s hard to stop writing like that once you start. You sort of have to snap yourself out of it: keep things short and punchy. That’s what I’m doing right now. These sentences are my writerly attempt to hit solid deep balls to the corner. Are they working?

Project Euler problem 191, or, how I learned to stop counting and love induction

Of the 142 Project Euler problems I’ve battled so far, it may just be that #191 has put up the most interesting fight, not because the problem itself was all that difficult or unusual or enlightening, but because the road to solving it—seven to ten hours of work, seven sheets of paper, several trips to the online encyclopedia of integer sequences, three or four minor insights, and one major breakthrough—was for me almost a paradigm of puzzle-solving, or, closer to the point, a paradigm of how a mathematically crippled mistake-prone dabbler like myself solves puzzles.

Here’s the problem statement that kicked it all off:

A particular school offers cash rewards to children with good attendance and punctuality. If they are absent for three consecutive days or late on more than one occasion then they forfeit their prize.

During an n-day period a trinary string is formed for each child consisting of L’s (late), O’s (on time), and A’s (absent).

Although there are eighty-one trinary strings for a 4-day period that can be formed, exactly forty-three strings would lead to a prize:

  OOOO OOOA OOOL OOAO OOAA OOAL OOLO OOLA OAOO OAOA
  OAOL OAAO OAAL OALO OALA OLOO OLOA OLAO OLAA AOOO
  AOOA AOOL AOAO AOAA AOAL AOLO AOLA AAOO AAOA AAOL
  AALO AALA ALOO ALOA ALAO ALAA LOOO LOOA LOAO LOAA
  LAOO LAOA LAAO

How many “prize” strings exist over a 30-day period?

I encourage you to give it a shot. If you’ve solved lots of puzzles like this in the past, you’ll probably be able to cook up a solution in under an hour. If not, expect to spend an afternoon or two scratching out all sorts of cases and conditions and combinatoric formulations. Either way it should be a great deal of fun.

When I first saw the problem I had only two approaches in mind:

  1. If we let n be the number of days, we know that there are 3n possible strings of Os and As and Ls. To find the number of “prize” strings, then, all we have to do is subtract from 3n the number of strings that couldn’t be winners.

  2. Or, we could go the constructive route and try to only count prize strings.

Intuition told me that the first way would be a lot easier, so that’s the way I went.

A first crack

The problem statement helps us out by giving the solution to a small case, n = 4, which we can use to validate our approach before ramping up to the much trickier case where n = 30.

What we’re trying to do is enumerate all of the strings with either the substring AAA or two or more not-necessarily-adjacent Ls, because these are the strings that are ineligible for a prize. We know that we have 34 = 81 possible strings to begin with, and from the problem statement we’ve been told that 43 of these are prize strings, so whatever rules or program we come up with to prune them should wipe out exactly 38 (=81 – 43).

At a first pass I mustered the following rough (and wildly incorrect) count:

  • There are exactly two strings containing AAA.
  • Six strings contain two Ls, a number I arrived at by taking “4 choose 2″, or 4!/2!2!
  • Four strings have three Ls (by the same argument as the two-L case).
  • One string has four Ls (ditto).

Unfortunately that left me with only thirteen candidates for elimination, far short of required 43. Can you see what I was doing wrong?

I left out a whole lot of permutations! Each of the cases above is actually under-specified, in the sense that there are multiple strings that could satisfy the given condition. For example, although I said there were only two strings containing AAA, there are actually a lot more, because each template—AAA_ or _AAA—can be completed with any one of O, L, or A. Similarly, in cases where we’re counting Ls, we need to remember that the spots without Ls can have either an O or an A, so that each L-template really has 2k possible fillings, where k is just the number of non-L spots.

Trouble looms

Are you beginning to smell the complexity?

Consider the rule for counting strings with an AAA in them. Even if you multiply 2 (for the two templates AAA_ and _AAA) by 3 (because in each template you can have three possible fillings: O, L, or A), you’re still incorrectly counting the number of ineligible strings. In particular, you’ve double-counted the case where an A fills the blank, because regardless of which template you use, you end up with the same string AAAA.

At first this might not seem like a big deal, but that little AAAA snag and others like it will turn out to be major wrenches in the works, hitches enough to drive me into what an impartial observer might have described as hours of psychotic numerology.

The trick is to see what happens when you crank n up to 30. Remember what we’re trying to do: ultimately, we want the number of eligible prize strings. To wit, we want to first count the non-prize strings—the number of strings with three or more As in a row (LOOAAAO..., ...AAAAO, L...AAAAAAOO...O, etc.) or two or more Ls (OAAL...AOLO, AOOLLALLL..., etc.)—and then subtract that number from 330, the number of total possible strings.

Suppose we start by trying to count the first type of string, the ones with a bunch of AAAs in them. At some point we’re going to run into a string that also has a few Ls, and it’s important that we somehow keep track of when this happens. If not, we’ll end up counting such a string twice, first when we go through looking for AAA-style strings and second when we look for strings with multiple Ls.

This is exactly the kind of bookkeeping you don’t want to run into when hacking on a combinatorics problem, because it almost always means you’ve taken a wrong turn. Project Euler is careful to choose problems that can be solved elegantly, usually with less than ten lines of code, so it’s a major red flag anytime you’ve got pages of branching cases and ad hoc “rules.”

Divide and conquer?

At this point I probably should have gone back to square one. Instead, though, I hatched what I thought was a workable game plan: I’d avoid the double-counting issue by breaking those non-prize strings into mutually exclusive sets.

I’d proceed in three steps:

  1. First, count the strings of length 30 with two or more Ls in them. The non-L spots in these strings can filled any whichway. They can even have runs of three or more As in a row—it doesn’t matter.
  2. Second, count all of the strings with runs of As in them that also don’t have any Ls. So we’re just considering the length-30 strings with some mix of Os and As, and counting only the ones that include three or more As in a row.
  3. Finally, count all of the strings with three or more As in a row that also have exactly one L in them.

Do you see how that works? The trick is that we’re taking care of all the messy stuff in Step 1, and restricting ourselves in Steps 2 and 3 to small precise collections of ineligible strings—strings where only the AAAs matter. So our three sets are guaranteed, by construction, not too overlap with one another.

So much for the theory—at a certain point one has to actually count these things, either with a formula or a simple program. Will my sets be well enough behaved, or will they too have all sorts of quirks and caveats?

I tamed the first set rather quickly. It turned out that counting those strings, the ones with with two or more Ls, was simply a matter of taking a sum from i = 2 to i = 30 of “30 choose i” * 2(30 – i).

That first term, “30 choose i,” just counts the number of ways of arranging i Ls in a string with 30 different spots. For i = 3 it would count the number of templates that look like __L...L_..._L or LL__...L..., i.e., templates with exactly three Ls in them. The second term just counts the number of possible fillings-in of each template: for each spot there are by assumption two choices (O or A), and each template has 30 – i open spots (spots without an L in them).

The second set is a lot more complicated, but once I thought about it I was immediately reminded of another Project Euler problem, one I’d worked on—without solving—just a few weeks before. It was problem #114.

Old friends and new

One way to see the correspondence is to imagine the red blocks in problem #114 as As and the black ones as Os. If you solve it for n = 30, then, you’ve implicitly solve my Step 2 above, because you’ve counted the number of strings with at least one run of three or more As and no Ls.

Or so I thought. What you’ve actually done is undercounted those strings by a huge margin, because you’ve left out strings where an A—a red block—fills one of the spots not already covered by a run of As. Problem #114 explicitly requires that the only red blocks you can have must appear in runs of three or more. You can’t, in other words, have stragglers.

You can imagine what I did next. I tried not only to solve #114, a problem I couldn’t crack weeks earlier (even with the help of a mathematically-inclined friend who wanted to chip in), but to do so knowing full well it’d only get me partway toward a solution to the problem at hand—thanks to those straggler As. It cost me about two hours of mostly unnecessary work, an expense I would have avoided if I had never seen #114 in the first place. It was an unlikely lesson: drawing on experience can sometimes do more harm than good.

After scrapping that angle I decided to do something a bit ignoble. I decided to write some code to exhaustively spit out trinary strings, not of length 30—such a program would take weeks to run—but for small ns. For each n I’d keep a count of the number of strings that would fall into each of the sets I named above, with a specific focus on the strings in Step 2: strings with zero Ls and at least one run of three or more As in a row. I’d then take that partial series and plug it into the Online Encyclopedia of Integer Sequences. If I hit a match I might be able to steal a nice clean formula.

(I’ve raved elsewhere about the OEIS, so I won’t do that again here. Suffice it to say it’s an invaluable resource and a constant temptation when working on Project Euler problems. One must use it sparingly and as a last resort.)

Much to my delight, the very sequence I had generated was chronicled in great detail on the OEIS as sequence #A050231, “the number of n-tosses having a run of 3 or more heads for a fair coin.”

Yoink!

Descent into madness

#A050231 was a huge boon—rather than computing the count in Step 2 above, I could literally copy and paste the correct value from a list someone else had already generated and posted to the OEIS. And with that I was only one term away from a full solution. All that was left was to work through Step 3: counting strings just like the ones in Step 2 that also include a single L.

When you think about it, it actually seems like a trivial modification could get you from the strings that A050231 is counting to the strings that also include a single L (call them L-strings). To get the number of L-strings, all you have to do is to take each A050231-string—each string of As and Os—and multiply it by the number of O-spots (as opposed to A-spots), those being precisely the spots where you can put the L. As an example, for the A050231-string OAAAOAOAAAA I would have three corresponding L-strings, LAAAOAOAAAA, OAAALAOAAAA, and OAAAOALAAAA, one for each O-spot.

But since I wasn’t man enough to generate A050231 myself, and so didn’t understand its structure at anything but the shallowest level, I had no way of doing that simple multiplication, because for each n I only knew the number of A050231-strings, without knowing how many O-spots each of them contained.

So after fumbling around the OEIS for just over an hour trying to find some code or sequence or math that might get me those numbers, I was left seemingly without a way out. But instead of dropping the idea, I decided to work it out in painstaking detail. I decided to work out a sequence that tracked, for each string of length n, (a) the number of ways of arranging As, with the proviso that each “way” must include a run of three or more, and (b) the number of empty spots (O-spots) in each of these arrangements. I’d then look for patterns.

You can see the insane work that came out of this insane project by looking at pages 1 and 2 in this PDF of scratch paper. The “crazy-person triangles” that resulted include a list of comma-separated terms for each n, where the large numbers specify the number of O-spots we’re working with (call it k) and the subscripts count the number possible arrangements of Type 2 strings—As and Os and at least one run of three or more—for each of those ks. (I’m pretty sure only the second triangle is correct, which is why I went on to larger ns with it. It’s been long enough since I did this work, though, that I don’t really know what I was thinking or what’s actually going on in the first triangle.)

I used a helper script to compute those subscripts as n and k increased. I had to stop after n = 14 or so, both because I was running out of space on the page and because the subscripts were taking too long to generate. But even with eleven different rows of numbers, I wasn’t able to find any meaningful patterns. I did find all sorts of unmeaningful patterns, mostly by subtracting sequences of numbers and then recursively subtracting the differences until I hit some sort of constant or steady state; but I’m pretty sure any sequence can degrade that way after enough “diffs of diffs.” You can see on page 2 of my scrap work that I came upon a few promising generating formulas, both horizontally (as k varies) and vertically (as n varies), but nothing panned out.

The clouds part

After all that—after all that—I decided to scrap the work I’d done so far and take another tack. Because it hit me like a ton of bricks: I should be solving this problem inductively.

Think about it this way. Suppose I know how exactly how many prize strings there are for a given n. Now increment n by one. What changes?

Well, there are only three letters that can be added to a given string: O, A, and L. If I add an L to a string that already has an L, all of a sudden that string becomes ineligible; same thing if I add an A to a prize string ending in AA. If I had an O to any string, it has no effect. If I add any letter to an ineligible string, it stays ineligible. Etc.

It’s starting to sound like all we need to do, then, is keep counts of certain string “types”—strings ending with AA, strings with one L, ineligible strings, etc.—and formulate rules for how these counts change each time we increment n.

That’s exactly what I did. After lots of trial and error, I determined that I needed to track six different types of special strings:

  1. Eligible strings ending in AA.
  2. Eligible strings with zero Ls.
  3. Eligible strings ending in A (and only A).
  4. Eligible strings ending in AA that also have exactly one L.
  5. Eligible strings ending in A that also have exactly one L.
  6. Eligible strings with one L that don’t end in A.

It turns out that just this set of strings, call them a through f, are enough to keep an accurate accounting of prize strings as you increment n all the way up to 30.

An example should illustrate the method. Take c-strings, strings that end in a single A. Suppose that when n = 3 we have exactly five of them (we do: they’re OOA, AOA, LOA, OLA, and ALA). What happens to that count (5) when we increment n? Well, we have to consider three cases, one for each of the possible new letters O, A, and L.

(Keep in mind that what we’re doing in each case is essentially generating 3n + 1 new strings—that is, we’re taking each of the 3n strings we have and appending one letter (O, for instance) to all of them. Do that three times and we have our full round of strings for the n + 1 case, because 3n * 3 is 3n + 1. I say “essentially” above because what we’re really doing is generating 3 * t new strings, where t is the number of eligible (prize) strings for a given n. We ignore ineligible strings.

Either way, the question is how many of these new -O strings and -L strings and -A strings are of type c.)

  • Case O: If we add an O then all of the five eligible strings that used to end in a single A no longer end in a single A. And all of the strings that didn’t end in a single A still don’t. So the net effect of the Os is to decrease our count by 5.
  • Case L: This works the same as the O, obviously. A net loss of 5 c-strings.
  • Case A: If we add an A to one of the five c-strings, it’ll no longer be a c-string. But if we add an A to any of the other strings it will become a c-string. So there’s a net gain here of t – 5.

So we have a total number of ((t – 5) – 5) – 5 c-strings in our next round, which for n = 3 is ((27 – 5) – 5) – 5 = 12. If you were to enumerate all of the eligible trinary strings for n = 4, you’d see that there are in fact exactly 12 of them that end in a single A. Our method works!

What’s nice is that we now have an exact formula that tells us how to get the number of c-strings in round n + 1 if we know the number of c-strings in round n. And what’s more, that formula depends only on t, the total number of eligible prize strings in a given round; c; and a, where a tracks the number of strings ending in two As. (Where does that a come from? Well, when you add an A to a string that already ends in two As, you “kill” it (render it ineligible). So you need to know how many a-strings are floating around.)

Similarly, all the rest of af depend only on t and the values of af themselves. That’s the sense in which the set af is complete.

The upshot of all this is that we can now write an obnoxiously simple program to find t when n = 30. We simply seed a list of values for n, t, a, b, c, d, e, and f[1, 3, 0, 2, 1, 0, 0, 1]—and loop over it thirty times, applying our various formulas (like the one for c-strings above) with each iteration. Here are the simplified formulas for each element:

  • n -> n + 1
  • t -> 2 * t + b - a
  • a -> c
  • b -> 2 * b - a + d
  • c -> t - (a + c)
  • d -> e
  • e -> f
  • f -> t

I encourage you to try to derive these yourself. You can consult my scratch paper for hints.

We’re left with this wonderful, short, extremely fast solution:

# [n, tot, -AA, 0Ls, -A, 1L-AA, 1L-A, 1LnotA]
lst = [1, 3, 0, 2, 1, 0, 0, 1]
while lst[0] < 30
  n, t, a, b, c, d, e, f = lst
  lst = [n + 1, 2 * t + b - a, c, 2 * b - a + d, t - (a + c), e, f, t]
end
p lst[1] # => 1918080160

Boredom

I don’t seem to get bored anymore. I mean that I no longer get the feeling that I want to do something, but not any of the things I can think of doing. Of course I still get bored of individual activities.

What happened?

I think that the biggest change as I’ve gotten older is that I now see activities at a higher grain. Where I used to survey my options as “reading,” “watching TV,” “playing a video game,” or “running around outside,” I now break each of these into lots of sub-options: “reading the newspaper,” “reading the Economist,” “reading Reader posts,” “reading Cryptonomicon,” “reading Le Ton Beau de Marot,” etc.

Since boredom is the feeling of not having an outlet for my energy—of not liking my options—and since I now seem to have more options, I have a smaller chance of getting bored. I can always think of something to do because my mind has matured to be stimulable by—and capable of—more and more things.

It’s the old idea of “seeing the world in a grain of sand”: the closer you zoom into something, the richer it gets. And what have I been doing in my intellectual life over the past eight or nine years? Zooming in.

* * *

With that in mind, I can now see what might threaten my newfound unborability: anything that forces me, or encourages me, to zoom out.

The trouble is that these kinds of things are proliferating. Most of what I read on the web, for instance, is trying to summarize something for me. And most of the web sites I visit summarize those summaries.

What seems to be happening is what I described in a post called “Wandering the Web Stacks“: I cultivate a few feeds, aggregations of abstractions of ideas, and when I’m done walking through them, I feel as though there’s nothing left to read online.

* * *

These feeds aren’t always pernicious. Sometimes they draw me down a rabbit hole, down a few levels of abstraction, until suddenly I’m consuming huge volumes on a single topic. That topic then becomes one of the options I can cash in to fend off boredom.

I think it’s worth reminding myself that drawing me down rabbit holes is what those feeds are for.

Futurizing

“The Record”

Data storage is now cheap enough and file transfers fast enough that you could walk around with a voice recorder that was always on, periodically shuttling your audio to a server when the local drive started to fill up.

On its own this already has serious implications. You’d never have to worry about missing a great line or joke; you could replay and dissect every subtle argument; no conversation would be forgotten.

But imagine if this capability were combined with fast accurate speech recognition. For one, the plain text that such a gizmo might produce would be trivial to store and share. But more importantly, your speech would be searchable, skimmable, excerptable, etc., just like any other kind of text.

That’s where things get crazy. Imagine being able to perform the conversational equivalent of highly targeted keyword searches—“What are two guys in Arizona saying about ‘X’?” Think of advertising and media; think of the law; think of the bubbling up of ideas and jokes and people. Whole debates might be pruned and published; quips that get enough of a laugh could automatically be tweeted; meetings minutes and class notes and informal colloquies would all be broadcast and indexed and archived. You could get famous just for being well-spoken.

The name-tag scenario

In Season 5, Episode 7 of Seinfeld—“The Non-Fat Yogurt”—Elaine, shooting the bull with her boyfriend Lloyd Braun, suggests that “everybody should wear name tags all the time—to make the city friendlier.” “It would be like a small town,” she says.

Later in the episode we discover that Braun has shared Elaine’s proposal with his boss, the sitting mayor, who essentially torpedos his bid for reelection by making the name tags a key part of his platform.

Silliness aside, it’s quite likely that someday soon everyone’s name will be visible, in one form or another, to everyone else. Maybe we’ll walk around with Terminator-style heads-up displays; maybe our smartphones will show us the Facebook profiles of everyone within thirty feet; maybe we’ll all have state-issued wristbands.

If something like that does happen, what will become of names? Will I still perk up whenever I hear “James,” or will the sound wear out? Will it be kosher to even use a stranger’s name, or will new norms for introductions evolve? Will people be more empathetic, or less? Will we just seamlessly adapt?

E-mails of note?

Letters of Note is one of my favorite blogs. But it makes me wonder what will become of our correspondence, now that it’s all electronic. On the one hand it’s infinitely more accessible—neatly typed, frequently copied, and keyword-searchable. But all the good stuff is buried in tens of thousands of useless ads and notices, and what’s worse, siloed in password-protected inboxes hosted by companies that might very well dissolve.

What a shame it would be if our progeny couldn’t read our intimate exchanges the way I did Feynman’s, or Joyce’s, or Lewis Carroll’s. And what a thrill it would be if they had, say, J.D. Salinger’s gmail password.

“technology,” expectations, pseudoreading

“Technology”

On the one hand the way the word “technology” is used is overly narrow. It refers essentially to electronic gizmos when it could refer to any kind of artcraft, including bookshelves, water bottles, RAID arrays, Markdown, magazines, keyboards, LEDs, pizza boxes, windowing systems, driveways, Bayes’ rule, door handles, languages, Occam’s razor, software, pens, lists, and so on.

On the other hand, even in its current guise—as a gloss for “computer shit”—the word is overly broad. To say that we should “bring technology into the classroom” is to make a million different proposals all at once. To “cover tech” is to write about anything new and nerdy.

What role, then, is the word “technology” playing in the language game?

Low expectations

Disappointments abound: first anythings with a girl, your own fiction, new people. Everything’s better when you set a lower bar. Example: read negative reviews before seeing a movie—set yourself up just like your friend did when he said he’d arrive late, but came early. Make less rare the rare pleasure of a pleasant surprise.

The question is whether you can hold your expectations down against their will. Can you intend tonight to take the poison tomorrow, knowing what you know about the pain, the reward, and, most importantly, the way out?

Pseudoreading

Normally while reading my eyes and mind work in concert, or at least seem to: as my foveae jump and jounce around the page, the rest of my readerly machinery—everything from my ocular nerve and visual cortex through to the “workspace” of my consciousness and the seat of my “I”—seems to tag along.

Sometimes the whole apparatus goes off task at once, as when I put a book down to make myself a sandwich. But that’s not very interesting. The interesting case is when the various layers of the system unhinge.

That seems to be what’s happening when I daydream in the middle of a sentence. My eyes follow through, happily tracking the rest of the line, even as my mind wanders onto something entirely unrelated. It’s as if all those lower layers continue their sophisticated unconscious work of turning globs of ink into a mental representation of words, only to have the whole lot discarded just before it becomes a conscious thought.

The trick is that when the daydream ends, my focus returns as if it never left. It’s easy not to catch the difference, and so just to read, read, read, to read without reading, without making sure that my mind is along for the ride.