I should have loved biology

By James Somers

I should have loved biology but I found it to be a lifeless recitation of names: the Golgi apparatus and the Krebs cycle; mitosis, meiosis; DNA, RNA, mRNA, tRNA.

In the textbooks, astonishing facts were presented without astonishment. Someone probably told me that every cell in my body has the same DNA. But no one shook me by the shoulders, saying how crazy that was. I needed Lewis Thomas, who wrote in The Medusa and the Snail:

For the real amazement, if you wish to be amazed, is this process. You start out as a single cell derived from the coupling of a sperm and an egg; this divides in two, then four, then eight, and so on, and at a certain stage there emerges a single cell which has as all its progeny the human brain. The mere existence of such a cell should be one of the great astonishments of the earth. People ought to be walking around all day, all through their waking hours calling to each other in endless wonderment, talking of nothing except that cell.

I wish my high school biology teacher had asked the class how an embryo could possibly differentiate—and then paused to let us really think about it. The whole subject is in the answer to that question. A chemical gradient in the embryonic fluid is enough of a signal to slightly alter the gene expression program of some cells, not others; now the embryo knows “up” from “down”; cells at one end begin producing different proteins than cells at the other, and these, in turn, release more refined chemical signals; ...; soon, you have brain cells and foot cells.

How come we memorized chemical formulas but didn’t talk about that? It was only in college, when I read Douglas Hofstadter’s Gödel, Escher, Bach, that I came to understand cells as recursively self-modifying programs. The language alone was evocative. It suggested that the embryo—DNA making RNA, RNA making protein, protein regulating the transcription of DNA into RNA—was like a small Lisp program, with macros begetting macros begetting macros, the source code containing within it all of the instructions required for life on Earth. Could anything more interesting be imagined?

Someone should have said this to me:

Imagine a flashy spaceship lands in your backyard. The door opens and you are invited to investigate everything to see what you can learn. The technology is clearly millions of years beyond what we can make.

This is biology.

–Bert Hubert, “Our Amazing Immune System”

In biology class, biology wasn’t presented as a quest for the secrets of life. The textbooks wrung out the questing. We were never acquainted with real biologists, the real questions they had, the real experiments they did to answer them. We were just given their conclusions.

The Roche Biochemical Pathways Poster
Plans for an alien machine, in Contact

For instance I never learned that a man named Oswald Avery, in the 1940s, puzzled over two cultures of Streptococcus bacteria. One had a rough texture when grown in a dish; the other was smooth, and glistened. Avery noticed that when he mixed the smooth strain with the rough strain, every generation after was smooth, too. Heredity in a dish. What made it work? This was one of the most exciting mysteries of the time—in fact of all time.

Most experts thought that protein was somehow responsible, that traits were encoded soupily, via differing concentrations of chemicals. Avery suspected a role for nucleic acid. So, he did an experiment, one we could have replicated on our benches in school. Using just a centrifuge, water, detergent, and acid, he purified nucleic acid from his smooth strep culture. Precipitated with alcohol, it became fibrous. He added a tiny bit of it to the rough culture, and lo, that culture became smooth in the following generations. This fibrous stuff, then, was “the transforming principle”—the long-sought agent of heredity. Avery’s experiment set off a frenzy of work that, a decade later, ended in the discovery of the double helix.

In his “Mathematician’s Lament,” Paul Lockhart describes how school cheapens mathematics by robbing us of the questions. We’re not just asked, hey, how much of the box does the triangle take up?

That’s a puzzle we might delight in. (If you drop a vertical from the top of the triangle, you end up with two rectangles cut in half; you discover that the area inside the triangle is equal to the area outside.) Instead, we’re told that if you ever find yourself wanting the area of a triangle, here’s the procedure: take one half of the base times the height.

Biology is like that, but worse because it’s a messier subject. The facts seem extra arbitrary. We’re told to distinguish “lipid bilayers” from “endoplasmic reticula” without understanding why we care about either in the first place.

Enormous subjects are best approached in thin, deep slices. I discovered this when first learning how to program. The textbooks never worked; it all only started to click when I started to do little projects for myself. The project wasn’t just motivation but an organizing principle, a magnet to arrange the random iron filings I picked up along the way. I’d care to learn about some abstract concept, like “memoization,” because I needed it to solve my problem; and these concepts would lose their abstractness in the light of my example.

Biology is no different. Learning begins with questions. How do embryos differentiate? Why are my eyes blue? How does a hamster turn cheese into muscle? Why does the coronavirus make some people much sicker than others?

*

A few months ago, I started a magazine assignment to answer some questions about SARS-CoV-2 and the immune system. I encountered paragraphs like this:

In low-MOI infections (MOI, 0.2), exogenous expression of ACE2 enabled SARS-CoV-2 to replicate and comprise ~54% of the total reads mapping more than 300x coverage across the ~30-kb genome (Figures 1A and 1B). Western blot analyses corroborated these RNA-seq data… It is noteworthy that, despite this dramatic increase in viral load, we observed neither activation of TBK1, the kinase responsible for IFN-I and IFN-III expression, nor induction of STAT1 and MX1, IFN-I-stimulated genes (Figure S1A; Sharma et al., 2003)…

“Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19,” Cell

It was hard to get through a sentence without having to consult Wikipedia. In immunology in particular the nomenclature is expansive. One sentence might refer to “leukocytes,” the next to “monocytes,” the next to “lymphocytes.” There are a lot of squares-and-rectangles situations: all interleukins are cytokines, but not all cytokines are interleukins?

I’ve never come across a subject so fractal in its complexity. It reminds me of computing that way. A day of programming might involve constructing an elaborate regular expression, investigating a file descriptor leak, debugging a race condition in the application you just wrote, and thinking through the interface of a module. Everywhere you look—the compiler, the shell, the CPU, the DOM—is an abstraction hiding lifetimes of work. Biology is like this, just much, much worse, because living systems aren’t intentionally designed. It’s all a big slop of global mutable state. Control is achieved by upregulating this thing while turning down the promoter of that thing’s repressor. You think you know how something works—like when I thought I had a handle on the neutrophil, an important front-line player in the innate immune system—only to learn that it comes in several flavors, and more are still being discovered, and some of them seem to do the opposite of the ones you thought you knew. Everything in biology is like this. It’s all exceptions to the rule.

But biology, like computing, has a bottom, and the bottom is not abstract. It’s physical. It’s shapes bumping into each other. In fact the great revelation of twentieth-century molecular biology was the coupling of structure to function. An aperiodic crystal that forms paired helices is the natural store of heredity because of its ability to curl up and unwind and double itself with complements. Hemoglobin, one of the first proteins studied in full crystallographic detail, was shown to be an efficient carrier of oxygen because of how oxygen molecules snap into its body like Legos, each snap widening the remaining slots, so that it loads itself up practically at a gulp. Most proteins are like this. The ones that drive locomotion twist like little motors; the ones that contract muscles climb and compress each other. Cells, too, are constantly in conversation, and the language they speak is shape. It’s keys entering locks: a protein might straddle the cell membrane, and when a cytokine (that’s a kind of signaling molecule) docks with it, it changes its shape, so that its grip loosens on some other molecule on the interior side of the membrane, as though fumbling a football; that football might be a signal itself, on its way to the nucleus.

I think my understanding of biology was too flow-charty in high school. I knew that DNA → RNA → protein and that this was called “gene expression,” but I was confused on the basics, like, how did genes actually “turn on”? And once they were on, were they on for good? It’s clearer when you think physically. Mammalian DNA isn’t laid out as one long double helix; it’s tightly coiled and coiled again, like this, around little circular proteins called histones:

DNA curled around histones. Image from this Moderna video, at 1:10

The structure of the resulting fiber has an effect on which genes are expressed. This is because the little molecular machine that transcribes DNA into RNA has to actually ride along the helix, and it can only ride along some parts of it, namely the parts that aren’t curled up out of sight. “Expressing” a gene just means that at a given moment, the machine is accessing a specific portion of DNA, resulting in lots of RNA transcripts, resulting in lots of the protein that the gene codes for. Kink the fiber a bit and you change what the machine can see, thus changing the distribution of proteins it produces. You have “reprogrammed” the cell. (There are many ways to control gene expression, maybe the most common being “repressors” that park somewhere on the DNA, physically blocking the transcription machinery.)

One of the workhorse techniques in modern biology, called RNA sequencing, or RNA-seq for short, takes a frozen cell and counts the RNA transcripts inside it. In effect you get a snapshot of all the proteins being expressed at that moment. The result is literally a big table mapping genes to transcript counts. You see that being one kind of cell versus another—or being in one kind of cellular mood versus another, say in health versus disease—is just a matter of having a different distribution across this table. RNA-seq results are often represented as vectors in high-dimensional space, the counts in the table forming the coordinates; cells move through this expression space as they self-regulate and adapt to their environment.
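To make the "table of counts as a point in space" idea concrete, here is a minimal sketch in Python. The gene names are real, but the cell labels and every count are invented for illustration, and the normalization and distance measure are simple assumptions of mine, not the method any particular RNA-seq pipeline uses.

```python
import math

# Hypothetical transcript counts for three made-up cells. The numbers are
# invented purely for illustration; each list lines up with GENES.
GENES = ["ACE2", "STAT1", "MX1", "TBK1"]
counts = {
    "healthy_a": [120, 30, 10, 40],
    "healthy_b": [110, 35, 12, 38],
    "infected": [115, 400, 350, 45],
}

def normalize(vector):
    """Scale raw counts to fractions so cells with different totals compare."""
    total = sum(vector)
    return [c / total for c in vector]

def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each cell becomes one point whose coordinates are its normalized counts.
profiles = {name: normalize(v) for name, v in counts.items()}

d_same = distance(profiles["healthy_a"], profiles["healthy_b"])
d_diff = distance(profiles["healthy_a"], profiles["infected"])
print(f"healthy_a vs healthy_b: {d_same:.3f}")
print(f"healthy_a vs infected:  {d_diff:.3f}")
```

With these made-up numbers the two "healthy" cells sit close together and the "infected" one sits far away, which is all that "a different distribution across the table" means geometrically. Real data has tens of thousands of genes rather than four.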

*

How do you develop a physical understanding of biology? I like pictures. One of my favorite books is called The Machinery of Life, by David Goodsell. It’s full of gorgeous hand-drawn illustrations. In one of them, a bacterium’s flagellar motor is shown in context, then zoomed in on in an inset, with a third picture highlighting its functional elements.

What makes the book work is that it’s basically a re-introduction to molecular biology with the following premise: the cell is a very fast and crowded place, full of little machines, most of them protein, which you understand by taking a close look. It does an especially terrific job through insets like the above relating things at different scales. “Imagine your room filled with grains of rice. That will give you an idea of the billion or so cells that make up your fingertip.”

The writing is very good. It somehow gets you imagining the motion of these machines. It’s tempting when thinking about the cellular world to simply miniaturize our own; but at the cellular scale things behave weirdly. Movement is essentially by random diffusion. “The motions and the interactions of biological molecules are completely dominated by the surrounding water molecules… Inside the cell, [a] protein is battered from all sides by water molecules. It bounces back and forth, always at great speed, but takes a long time to get anywhere.”

It turns out that random diffusion is an incredibly slow way to travel large distances, but an incredibly fast way to explore at short distances. Being a protein inside a cell is like being at a crowded house party where it might take an hour to get across the room, but by the time you get there you’ve bumped into everybody six hundred thousand times.
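You can get a feel for this with a toy simulation. The sketch below is a one-dimensional random walk, which is a drastic simplification of a protein jostling around in a cell; the step counts, trial count, and seed are arbitrary choices of mine. What it shows is that the typical distance from the start grows only like the square root of the number of steps.

```python
import math
import random

def rms_displacement(steps, trials=500, seed=0):
    """Root-mean-square distance from the start after `steps` random steps.

    Each step moves one unit left or right with equal probability; the
    result is averaged over `trials` independent walks.
    """
    rng = random.Random(seed)
    total_sq = 0
    for _ in range(trials):
        position = 0
        for _ in range(steps):
            position += rng.choice((-1, 1))
        total_sq += position * position
    return math.sqrt(total_sq / trials)

# Quadrupling the number of steps should only roughly double the distance.
for steps in (100, 400, 1600):
    print(f"{steps:5d} steps -> typical distance {rms_displacement(steps):.1f}")
```

Put the other way around, getting ten times farther takes about a hundred times as many steps. That is the house party: lots of collisions, very little net progress.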

This point is made beautifully in another favorite book of mine, A Computer Scientist’s Guide to Cell Biology, by William W. Cohen:

Molecules that come close to an organelle tend to remain close to it for a while, and brush against it many times—Figure 20 gives some intuitions as to why this is true.

The result of this is that if receptors for a protein p cover even a small fraction of the surface of an organelle, the organelle will be surprisingly efficient at recognizing p. As an example, if only 0.02% of a typical eukaryotic cell’s surface has a receptor for p, the cell will be about half as efficient as if the entire surface were coated with receptors for p.

This is the kind of fact that instantly clarifies how biology could possibly work. “Cell-sized objects thus have a ‘high bandwidth,’” Cohen writes. “They can recognize or absorb hundreds of different chemical signals, even if they are bounded by membranes.”
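One classic estimate that produces numbers of this kind is the Berg-Purcell formula for a sphere of radius a studded with N small absorbing patches of radius s: the capture rate, relative to a fully absorbing sphere, is Ns / (Ns + πa). Whether this is the exact calculation behind Cohen's figure is an assumption on my part, and the cell and receptor sizes below are round illustrative values, not measurements.

```python
import math

def capture_efficiency(n_receptors, receptor_radius, cell_radius):
    """Berg-Purcell estimate: capture rate relative to a fully coated sphere."""
    ns = n_receptors * receptor_radius
    return ns / (ns + math.pi * cell_radius)

def surface_fraction(n_receptors, receptor_radius, cell_radius):
    """Fraction of the sphere's surface covered by disk-shaped receptors."""
    receptor_area = n_receptors * math.pi * receptor_radius ** 2
    return receptor_area / (4 * math.pi * cell_radius ** 2)

# Illustrative assumptions, in metres: a 5-micron cell, 1-nanometre receptors.
cell_radius = 5e-6
receptor_radius = 1e-9

# Number of receptors at which the formula gives half of maximal capture.
n_half = math.pi * cell_radius / receptor_radius

efficiency = capture_efficiency(n_half, receptor_radius, cell_radius)
coverage = surface_fraction(n_half, receptor_radius, cell_radius)
print(f"receptors needed: {n_half:,.0f}")
print(f"efficiency:       {efficiency:.2f}")
print(f"surface covered:  {coverage:.4%}")
```

Under these assumptions, half-maximal capture needs only a hundredth of a percent or so of the surface, the same order of magnitude as the figure Cohen quotes. The reason is the one he gives: a molecule that gets near the cell brushes against it many times before wandering off.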

Cohen’s book is pitched as an attempt to distill what he learned in acquiring a “reading knowledge” of biology—enough to be able to follow along with a paper in Cell. He’s very good at explaining methods: how do biologists know what they know? For a computer scientist, a biologist’s methods can seem insane; the trouble comes from the fact that cells are too small, too numerous, too complex to analyze the way a programmer would, say in a step-by-step debugger. What biologists mostly do is stuff like:

Cohen found, and I have too, that in trying to acquire a reading knowledge of biology it’s almost more useful to study the methods than any individual facts. That’s because the methods are highly conserved across studies. Everybody does Western blots. Everybody does flow cytometry and RNA-seq. You’ll see this stuff in every paper. (Or variations on the same themes: separation, sorting, selection, genetic manipulation.)

So that’s the foundation. Or almost: I have left for last my favorite resource of all, an incredible book called The Eighth Day of Creation: Makers of the Revolution in Biology, by Horace Freeland Judson. Parts of this book were serialized in the New Yorker in the 1970s. It is the Power Broker of biology, a tomic masterwork. It is not just comprehensive—Judson had hundreds of conversations with Francis Crick, with Jacques Monod and François Jacob, with their friends and spouses and colleagues; he read every paper, he read all their letters—but it pulls no punches scientifically. Judson always just describes the real thing.

And he emphasizes wrong turns. For example, before the discovery of tRNA—the adapter molecules that link triplets of RNA bases to the amino acids they code for—there was much confusion. It was widely believed that there had to be some kind of punctuation, because how else would one know where to start transcribing, or how to delimit one codon from the next? Certain mental models were ingrained: a going theory was that RNA formed specially shaped pockets for the different amino acids. The idea was that if you zoomed in on each triplet or quartet or whatever (the scheme was then unknown), it would always form the same unique shape that only one kind of amino acid could fit into. The amino acid chain would be formed right there alongside the RNA strand, using it almost as a mold. This was thought to happen in the nucleus. The idea that protein synthesis happened via an adapter, and that the nucleic acids therefore acted less like a mold than a digital code, more purely information—this was a major surprise.

Sitting on the grass at Woods Hole, Crick was talking about genes and proteins, in particular about his assumption that they were colinear and Benzer and Brenner’s plan to show as much, when Ephrussi took him aback by asking how he knew that amino acids were not put in their primary sequence by something in the cytoplasm. . . . “I don’t think Boris necessarily believed it, but it was an idea he thought wasn’t impossible.”

. . .

Crick also cast his skeptical eye over Watson and Rich’s attempts to build models of RNA. “Of course, you realize that our ideas on that were totally wrong. We thought that RNA had some structure with the twenty cavities, it was that period. Mm-hmm. Unfortunately people have forgotten what it is we didn’t know at the time.”

Put another way, the book gives us a view of science before discovery. It is a practitioner’s view of the subject. It is the opposite of a textbook.

*

Trying to study the immune system has gotten me into a Bret Victor sort of mood, wondering what could be done, or built, to make understanding this subject easier. A few things come to mind:

There are some incredible YouTube explainers. Ninja Nerd Science’s videos on the immune system were a miracle—all delivered by a kid in grad school. He is a genius. What he does so well is what Goodsell, in that Machinery of Life book, does so well, what those famous “Inner Life of a Cell” 3D animations do so well: he helps you “see the unseeable.”

Ninja Nerd Lectures YouTube channel

But I wonder whether it should be easier for regular people to create useful illustrations. Consider how easy it is to write, tooling-wise: on the web, you are only ever one click away from a Markdown-enabled textarea that allows you to create and publish pretty, hyperlinked documents. Anyone with a keyboard can contribute a few sentences to Wikipedia or answer a question on Stack Exchange. Drawing, by contrast, is hard, and animating is at least an order of magnitude harder. And yet these media are essential for understanding biological processes.

So what do we do?

It’s telling that when I was recently on a Zoom with a PhD student who was explaining RNA-seq, he pulled out his iPad Pro and essentially made a Khan Academy lecture as he talked, drawing along the way. These tools need to become more common and cheaper.

But we also need more software like pattern brushes in Adobe Illustrator, BioRender, and CellPAINT to make it un-tedious to draw complex objects. We need more software like Molecular Maya, but simplified even further, à la Victor’s Stop Drawing Dead Fish, to make animating accessible to anyone who can gesture.

Quickly draw an endothelial lining with pattern brushes in Adobe Illustrator
Molecular Maya’s double-stranded DNA kit

Using vector graphics and Undo history, it should be possible to make collaboratively editable images, i.e., images that can be slowly improved as part of a knowledge project like Wikipedia or Stack Exchange.

I want to be able to take a screenshot of the whiteboard in a Ninja Nerd lecture—a big beautiful diagram of the players in the adaptive immune system—and lasso sections of it, linking to sub-diagrams, some filled in by me, some by others, illustrating each of the parts in turn. We should have big, collaboratively edited zoomable “maps”—hierarchical diagrams—that are easy to navigate, work in standard browsers, are embeddable in blog posts, and so on.

Of course we need to teach more people how to draw. It’s an underrated skill. And how to write vividly, as in the wonderful books above.

But biology is uniquely suited to simulation—it’s a world of machines that are too small to see. The trouble is, it requires too much specialized skill to create three-dimensional interactive simulations. We need a toolkit that’s like MockMechanics, or Minecraft, that maybe even is Minecraft, but focused on biology. Or something much better.

It’s no coincidence that Watson and Crick depended for their discovery on a literal physical model that was machined for them specially. Victor’s Dynamicland imagines an immersive collaborative space in which such models can be built—now that we have computers—as quickly as you can have a conversation.

This is exactly what I wanted as I was writing my immune system article. I wanted to conjure models I could play with in my hand. I wanted a museum where I could walk around inside the epithelium during an immune response. I wanted to put ideas into physical space, like on a pinboard—TLRs go here, with the other innate armament; CD4+ T cells are there, in the adaptive world—but I wanted it to be as searchable, copy-pasteable, shareable, and composable as text.

Bret Victor’s vision of dynamic tools for thinking

I think we also need inspiration. There is a romance in biology, as in any other science, that a movie like Good Will Hunting could bring out. We need heroes. Whoever delivers us from this pandemic in the form of a slam dunk vaccine, or a cheap quick reliable test, should become a household name, not for their own glory but for our kids—a Feynman for them to dream about someday becoming.

See jsomers.net for more of my writing.