
By inspection

Mathematicians will sometimes say that a result follows “by inspection,” meaning roughly that its proof is so straightforward that to work it out would almost be patronizing. [1]

It’s a useful phrase to have in mind, and it’s led me to a somewhat strange idea—that science, and I mean this quite generally, has developed as though one of its goals was in fact to make more and more things provable that way, “by inspection.”

Consider, for example, how thousands of years ago one needed all kinds of clever arguments to convince someone that the Earth was round, whereas today you can just send them to space and have them look. Or how a student on his laptop can now brute-force a problem that once required tricky mathematics. Or how IBM can take 3D pictures of atomic bonds whose structure was always deduced, never seen.

New technologies, then, seem to serve twin purposes: one, to generate more inputs to an inferential model, like telescopes do for cosmologists and accelerators do for physicists, and two, to replace these models with simple pictures of the truth.

So we can imagine a future in which doctors see diseases, rather than teasing them out based on patients’ symptoms or history; where chain-termination DNA sequencing is replaced by high-resolution radiography; where we explore the deep oceans instead of just guessing what’s there.

Think of how J. Craig Venter’s team discovered tens of thousands of new organisms in a barrel of seawater that marine microbiologists said would be empty. [2] The difference was that he had tools—for rapid DNA sequencing, mostly—that they didn’t. He could see more. [3]

* * *

Footnotes

[1] Example: suppose we’re playing a game of chess and we claim that our opponent has been checkmated. We could be highly rigorous and say, “in particular, you cannot move here because of X, nor here because of Y, nor here…”, but we would be insulting his intelligence. So instead we just leave it to him to inspect each of the possible moves until he sees, quite plainly, that there is no way out.

[2] See p. 40 of this awesome book of conversations / talks about life (think biology, not philosophy) [PDF, via edge.org].

[3] In the same way, individual minds have a way of moving from slow, heavily (and cleverly) deductive inferential procedures to a kind of rapid inspection. That is, when you learn a great deal about something, you are in some sense just building highly defined manipulable mental objects, ones that you can look at, bend, and combine at will. It’s like how expert chess players, instead of meticulously playing out lines, “see the whole board” and fluidly explore it, or how expert coders feel like they can “hold the program in their head.”

This ability is hard-earned. Which is why it’s probably not a good idea to, say, forget about mathematics because we can now compute so quickly, or rely on imaging machines at the expense of clinical expertise. If we look for shortcuts we’ll likely get lost.

But it’s worth recognizing the power of richer mental pictures—that they are worth more than just the work put in to build them.

The Snickers trick

Not counting beverages, the four most popular products sold in American vending machines are, in order: the Snickers, peanut M&Ms, Doritos, and Cheetos.

While nos. 2-4 may be unexpected, it sort of goes without saying that the number one bestselling vending machine product—and most popular chocolate bar of all time—would be the Snickers. That nutty, nougaty, pleasantly plenteous Snickers. Anyone could have guessed that.

So why, in spite of its utter dominance among snack foods, do vending machine operators tend to—and I’m not making this up—under-stock that particular treat? That is, why do they supply their machines with fewer Snickers than consumers demand?

The Snickers bar drives people to vending machines like no other snack: thanks in large part to some truly excellent advertising by Masterfoods USA, hungry people who need food right now think Snickers. And they know where to find it.

But—and here’s the key to the whole operation—the Snickers bar actually has a relatively slim margin: for the operator, a Snickers brings in less profit than even the gum and mints that sit on the lowest shelves of the machine.

Hence the under-stocking. Hungry people who go to a vending machine in search of a Snickers won’t just leave if there are none left—they’ll buy something else. In particular, they’ll buy something with a higher margin, like the cookies. (Machine operators clean up on those cookies.)

Which means it’s actually in the operator’s interest to stock the machines with just enough Snickers to get people thinking they could find one there, but not so many that they always do. It’s their tricky way of driving sales to more profitable substitutes.

Update: see Sharon T.’s comment below for clarification. (Sharon was my source for this story.)

Offline Wikipedia

This very cool and prolific hacker, Thanassis Tsiodras, wrote an article a few years back explaining how to build a fast offline reader for Wikipedia. I have it running on my machine right now, and it’s awesome.

The trouble is, his instructions don’t quite get the job done; they require modification in some key places before everything will actually work, at least on my MacBook Pro (Intel) running OS X Leopard. So I thought I’d do a slightly friendlier step-by-step with all the missing bits filled in.

Requirements

  • About 15GB of free space. This is mostly for the Wikipedia articles themselves.
  • 5-6 hours. The part that takes longest is partially decompressing and then indexing the bzipped Wikipedia files.
  • Mac OS X Developer tools, with the optional Unix command-line tools; you need to have gcc, for example, to get this running. These can be found on one of the Mac OS X installation DVDs. (A quick check is sketched just below.)
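
If you’re not sure whether the command-line tools actually made it onto your machine, a quick sanity check from Terminal is just to ask for gcc (this is only a sketch of the check, not part of the build itself):

    # confirm that gcc is installed and on your PATH
    which gcc && gcc --version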

Laying the groundwork

  1. Get the Mac OS X Developer tools, including the Unix command-line tools. If you don’t have these installed, nothing else will work. Luckily, they’re easily installable off of the Mac OS X system DVDs that came with your computer. If you’ve lost the DVDs, head here to download the files; you’ll need to set up a free ADC (Apple Developer Connection) account to actually start the download.
  2. Get the latest copy of Wikipedia! Look for the “enwiki-latest-pages-articles.xml.bz2” link on this page. It’s a big file, so be prepared to wait. Do not expand this, since the whole point of Thanassis’s tool is that you can use the compressed version.
  3. Get the actual “offline wikipedia” package (download it here). This has all of the custom code that Thanassis wrote to glue the whole thing together.
  4. Get Xapian, which is used to do the indexing (download it here).
  5. Set up Xapian. After you’ve expanded the tar.gz file, cd into the newly created xapian-core-1.0.xx directory. Like every other *nixy package, type sudo ./configure, sudo make, and sudo make install to get it cooking. (The whole sequence is sketched just after this list.)
  6. Get Django, which is used as the local web server. You can follow their quick & easy instructions for getting that set up here.
  7. Get the “mediawiki_sa” parser/renderer here. To expand that particular file you’ll need the 7z unzipper, which you can download here.
  8. Get LaTeX for Mac here.
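
To make step 5 concrete, here’s roughly what the Xapian build looks like in Terminal. The xapian-core-1.0.xx directory name is just a stand-in for whichever version you actually downloaded; the commands are the same ones described above:

    # expand the Xapian source and build it (directory name varies by version)
    tar xzf xapian-core-1.0.xx.tar.gz
    cd xapian-core-1.0.xx
    sudo ./configure
    sudo make
    sudo make install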

Building it

Once you have that ridiculously large set of software tools all set up on your computer, you should be ready to configure and build the Wikipedia reader.

The first thing you’ll need to do is to move the still-compressed enwiki-latest-pages-articles.xml.bz2 file into your offline.wikipedia/wiki-splits directory.

But you have to make sure to tell the offline.wikipedia Makefile what you’ve done, so open up offline.wikipedia/Makefile in your favorite text editor and change the XMLBZ2 = ... line at the top so that it reads “XMLBZ2 = enwiki-latest-pages-articles.xml.bz2”.
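
If you prefer to do the move from Terminal, it looks something like this (the path assumes the dump landed in your Downloads folder; adjust it if you saved the file somewhere else). The Makefile line itself is easiest to change by hand in your editor:

    # move the still-compressed dump into the wiki-splits directory
    mv ~/Downloads/enwiki-latest-pages-articles.xml.bz2 offline.wikipedia/wiki-splits/
    # after which the top of offline.wikipedia/Makefile should read:
    #   XMLBZ2 = enwiki-latest-pages-articles.xml.bz2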

Next, take that parser/renderer you downloaded and expanded in step 7 above, and move it into the offline.wikipedia directory.

Again, you have to tell the Makefile what you’ve done–so open it up again (offline.wikipedia/Makefile) and delete the line (#21) that starts

@svn co http://fslab.de/svn/wpofflineclient/trunk/mediawiki_sa/ mediawiki_sa...

You don’t need that anymore (and it wouldn’t have worked anyway).
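
If you’d rather not open the Makefile a second time, a one-liner along these lines should do the same job (this assumes the BSD sed that ships with OS X, and that fslab.de only appears on that one line; double-check the Makefile afterwards):

    # delete the svn checkout line from offline.wikipedia/Makefile
    sed -i '' '/fslab\.de/d' offline.wikipedia/Makefile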

With that little bit of tweaking, you should be able to successfully build the reader. Type sudo make in the offline.wikipedia directory. You should see some output indicating that you’re “Splitting” and “Indexing.” The indexing will take a few hours, so at this point you ought to get a cup of coffee or some such.

Finishing up and fixing the math renderer

Even though the program will tell you that your offline version of Wikipedia is ready to run, it probably isn’t. There are a couple of little settings you need to change before it will work. (Although I’d give it a try first!)

For one, you may need to change line 64 in offline.wikipedia/mywiki/show.pl: just replace php5 with php. Once you do that, you should be able to load Wikipedia pages without a hitch–which is to say, you’re basically done.
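
If you’d like to make that change from the command line instead of an editor, a targeted sed should work (again, BSD sed; the 64 matches the line number mentioned above, so confirm that line 64 is still where php5 lives in your copy):

    # swap php5 for php on line 64 of show.pl
    sed -i '' '64s/php5/php/' offline.wikipedia/mywiki/show.pl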

(If it doesn’t work at this point, first carefully read the error message that you’re seeing in the server console, and failing that, add a comment below and I’ll try to help you out).

Trouble is, the mathematics rendering will probably be broken. That might not matter for your particular purposes, but if you plan to read any math at all, it’s definitely something you’ll need.

What you have to do is recompile the texvc binary that’s sitting in offline.wikipedia/mediawiki_sa/math. But first, you’ll need a few more programs:

  1. Get an OCaml binary here.
  2. You need ImageMagick. This is an extremely large pain in the ass to install, unless you have MacPorts. So:
    • Get MacPorts here.
    • Once that’s installed, all you need to do is type sudo port install ImageMagick. Boom!

When that’s all ready, head to the offline.wikipedia/mediawiki_sa/math directory. Then, delete all the files ending in .cmi or .cmx. Those are the by-products of the first compilation, and they can’t be there when you run it again.

All you have to do now is type sudo make. If all goes well it should finish without error and you should have a working TeX renderer. Just to make sure, type ./texvc. If you don’t see any errors or output, you’re in good shape.
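
Put together, the whole rebuild is just a few commands (the paths assume you’re starting from the directory that contains offline.wikipedia):

    # rebuild the texvc math renderer
    cd offline.wikipedia/mediawiki_sa/math
    rm -f *.cmi *.cmx    # clear the by-products of the first compilation
    sudo make            # recompile texvc
    ./texvc              # no errors and no output means you're in good shape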

Finally, I’ve styled my version of the reader up a bit (out of the box it’s a little ugly). If you’d like to do the same, open up offline.wikipedia/mywiki/show.pl and add the following lines underneath the </form> tag on line 98:

    <style type="text/css">
        body {
            font-family: Verdana;
            font-size: 12.23px;
            line-height: 1.5em;
        }
        a {
            color: #1166bb;
            text-decoration: none;
        }
        a:hover {
            border-bottom: 1px solid #1166bb;
        }
        .reference {
            font-size: 7.12px;
        }
        .references {
            font-size: 10.12px;
        }
    </style>

Nothing too fancy–just some prettier type.

You’re done!

Now you should have a working version of the offline Wikipedia. It’s incredible: on a bus, plane, or train, in a car, basement, Starbucks (where they make you pay for Internet), classroom, or cabin, or on a mountaintop, you still have access to the world’s best encyclopedia.

The wrong way to search Google in Firefox

The default Firefox toolbar looks like this:

[Screenshot: the default Firefox toolbar, with the Google quick-search box on the right.]

Notice how the “quick search” box on the right uses Google. This seems convenient, because it’s giving you easy access (⌘+K) to the world’s most powerful search engine, until you realize that the address bar already does this. If you type words in there instead of a URL, it’ll automatically direct your query to Google.

But Firefox obfuscates that fact by putting the Google engine in the quick search box, with two pernicious effects:

  1. Users assume they have to use that for their googling, and so may never learn about all the cool things the address bar can do.
  2. Users lose the option of having another more targeted search engine—I use Wikipedia—because they think it’ll be at the expense of Google. But of course you can use both: Google in the address bar (⌘+L) and Wikipedia as the quick search (⌘+K).

This wouldn’t irritate me so much if it weren’t for the fact that every time someone else uses my computer, they remove my Wikipedia engine and re-select Google.

Eighty years without flippers

According to the excellent short documentary, Tilt: The Battle to Save Pinball, 1947 was about the biggest year in the game’s history. Why? Because that’s when they introduced flippers:

STEVE KORDEK (PINBALL DESIGNER 1948-1999): There were a lot of other things that happened earlier, like lights and sound effects, picking up the ball out of holes, and then of course at that time, the introduction of the tilt, which Harry Williams was responsible for. But, the biggest change in the entire industry was the introduction of the flippers. Because it made all other games that were manufactured before then obsolete. Absolutely obsolete.

NARRATOR: For nearly eighty years, the game of pinball consisted entirely of pulling back a plunger and watching a ball fall into a hole. With the addition of flippers, pinball changed from a game of luck into a game of skill.

This is about as absurd to me as the fact that basketball, in its very early days, was played with an intact peach basket, and each time a player scored, someone had to climb up some stairs to retrieve the ball. I’m not sure how long this went on before they decided to cut out the bottom, but god help us if it was more than a few minutes.

Of course it’s worth asking whether there are peach-basket bottoms that we have yet to cut—cases where we are, so to speak, still playing pinball without flippers.

I can conjure up a few candidates:

  1. We use two different machines to do our laundry. Even if there were some principled reason to separate washers and dryers (and I doubt that; cf. the dishwasher), the least we could do is invent a trap door, pulley system, or robotic arm to get the clothes from one to the other.
  2. This is not an insight of mine, but it bears repeating: the fact that we now use computers mostly to share media will in time look hopelessly primitive. So will the fact that most of today’s code is written manually by human programmers.
  3. There are no longer technical barriers to digitizing the full text of every book and making it easily (if not freely) available. Yet I still sometimes have to ruffle through hundreds of pages to find an old excerpt or annotation, and I can do that only if I have a copy of the book nearby.
  4. We clean up after going “number two” by taking several passes back there with a loosely constructed mitten of thin, dry paper, which we then inspect for poopiness as a way of knowing when to stop.

Can you think of any others?