Turned up to eleven: Fair and Balanced

Wednesday, July 10, 2002

Genetics, Mathematical Limits to Understanding, and Engineering the Future (I)

To pick up a serious thread from yesterday's entry, I want to get into some of the nitty-gritty of what it takes to build serious models of genetic systems, and how this might "scale up" into the type of re-engineering that "Godless Capitalist" has suggested is right around the corner. I need to be careful here, because nay-saying is a dangerous game in which it is easy to be proven wrong. I am not trying to say that humans will never be able to engineer their genes (in fact, it is happening right now), but that the obstacles are real and significant, and that the information necessary to make rational changes in complex traits (i.e. those involving several, or even hundreds of, genes) may be farther away than many realize.

To discuss this, we need to assess the difficulties inherent in studying a complex system in detail. To set up this notion, I will employ an analogy culled from a book I am currently reading, Complexification, written by mathematician and popular science author John L. Casti. In the book, which is about modelling complex dynamical systems, Casti describes a famous unsolved problem in celestial mechanics, the Three-body Problem. Special cases have been solved (as in the preceding web page, which describes a severely restricted special case), but the generalized form is a mystery. Basically, the idea is that the motion of two masses interacting through Newton's law of gravitation, F = G m1 m2 / r^2, can be solved exactly, but when three masses interact in a single system, the same pairwise law no longer yields a general solution, and as of yet no general closed-form description of the motion of such a system has been found. Unfortunately, the universe is made up of many, many objects, and oftentimes they are close enough together to make the Three-body Problem significant (it can be extended to the "Many-body Problem"). The reason I bring this up in this context is that the problem seems deceptively simple. After all, orbital mechanics has been studied for something like 400 years (since Kepler). It seems inconceivable that we are unable to formulate a mathematical expression for the motion of three interacting celestial bodies. And yet here we are, 400 years later, still with very limited insight into this issue (at least, those of us without advanced degrees in mathematics or physics). But genetic systems are not in any way similar to massive orbital systems, are they?

Well, the answer to that depends on the scale at which you look at the problem. At the population genetics scale (à la GC), the problem seems to be statistical in nature (measuring allele frequencies, probability distributions, etc.). These are all valid techniques, but they are limited in predictive and explanatory power. Simply put, population genetics can make very strong predictions of a statistical nature, but only under a number of restrictive assumptions (random breeding within the population is the most obvious one). The biggest problem for any statistical approach, however, is that it cannot assess causality. No statistical test can do more than measure correlation, and no matter how strong the correlation is, it never proves causality. That last statement is axiomatic, i.e. it is fundamental to the very nature of statistical analysis. So this is a strong limitation on statistical analysis of biological systems.
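A toy simulation makes that limitation concrete. This is only a sketch under invented assumptions: the names (hidden, trait_a, trait_b), the noise levels, and the sample size are all made up for illustration and come from no real dataset. A hidden factor drives two measured traits, so the traits correlate strongly even though neither causes the other. When trait A is instead set by hand, independently of the hidden factor, the correlation with trait B should largely disappear.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 10000                # hypothetical sample size

# Hypothetical hidden factor that influences both traits.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: each trait = hidden factor + its own noise.
# Neither trait has any effect on the other.
trait_a = [h + rng.gauss(0, 0.5) for h in hidden]
trait_b = [h + rng.gauss(0, 0.5) for h in hidden]
observed_r = pearson(trait_a, trait_b)

# "Experiment": trait A is assigned directly, ignoring the hidden factor.
# Trait B is still produced by the hidden factor alone.
forced_a = [rng.gauss(0, 1) for _ in range(n)]
trait_b_after = [h + rng.gauss(0, 0.5) for h in hidden]
intervened_r = pearson(forced_a, trait_b_after)

print(f"correlation in observed data:   {observed_r:.2f}")
print(f"correlation after intervention: {intervened_r:.2f}")
```

The observational correlation alone cannot distinguish "A causes B" from "something else causes both"; only the manipulated version separates the two cases.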

Molecular biology can bring to the table some very strong tools for studying causal relationships. Most importantly, molecular tools allow for direct experimental manipulation of the genetic makeup of the study organism and subsequent discovery of causal linkage. For example, in immunology, the rearrangement of the V(D)J region of the T-cell receptor is catalyzed by two recombinase enzymes, RAG-1 and RAG-2. Now, by looking at people or animals with Severe Combined Immunodeficiency Disease (SCID), and correlating mutations in these two genes with occurrence of the disease, you can get a pretty good idea that those mutations are causative, but that is not proof. Proof (in the biological science sense) is when you breed a mouse that is genetically normal except for a single mutation in the RAG-1 or RAG-2 gene (in both alleles, since mice are diploid), and compare that mouse to other mice that are genetically identical except at that locus. When those mice turn out to have SCID, that is proof.

Why is this distinction so important? Well, we can revisit the tale of SCID mice and SCID humans to examine that. While RAG-deficient mice are SCID, not all SCID is caused by this defect. In fact, human SCID is a heterogeneous disease caused by a number of genetic defects, all of which result in a similar pathology. In order to design therapies (genetic or otherwise), statistical or correlative data are insufficient. The genetic underpinnings of the situation are crucial.

In answer to the question posed above re the Three-body Problem, I will submit the conjecture that complex traits, involving gene regulation by multiple factors and expression of several genes, are exactly analogous to the Three-body Problem, and share many of the same difficulties when it comes to prediction of outcomes.
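For the orbital side of the analogy, the prediction difficulty can be sketched numerically. The following is a minimal illustration, not a serious celestial mechanics code: the units (G = 1), the masses, the starting positions and velocities, and the function names are all assumptions chosen for convenience, and the fixed-step integrator is only approximate. It applies Newton's pairwise law to three bodies, runs the system twice with one starting coordinate nudged by a millionth of a unit, and reports how far apart the two runs end up.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption)

def accelerations(pos, mass):
    """Sum Newton's pairwise law F = G*m1*m2/r^2 over every pair of bodies."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r = math.hypot(dx, dy)
            f = G * mass[i] * mass[j] / r**2
            # Force on body i points toward body j; j feels the opposite.
            fx, fy = f * dx / r, f * dy / r
            acc[i][0] += fx / mass[i]
            acc[i][1] += fy / mass[i]
            acc[j][0] -= fx / mass[j]
            acc[j][1] -= fy / mass[j]
    return acc

def simulate(pos, vel, mass, dt=0.001, steps=10000):
    """Step the system forward with velocity Verlet; return final positions."""
    pos = [p[:] for p in pos]
    vel = [v[:] for v in vel]
    acc = accelerations(pos, mass)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half-step velocity
                pos[i][k] += dt * vel[i][k]        # full-step position
        acc = accelerations(pos, mass)
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * acc[i][k]  # second half-step
    return pos

# Hypothetical starting configuration: three equal masses in a plane.
mass = [1.0, 1.0, 1.0]
start = [[-1.0, 0.0], [1.0, 0.0], [0.0, 0.5]]
vel0 = [[0.0, -0.4], [0.0, 0.4], [0.3, 0.0]]

# Same system, with one coordinate of the third body shifted very slightly.
nudged = [p[:] for p in start]
nudged[2][0] += 1e-6

end_a = simulate(start, vel0, mass)
end_b = simulate(nudged, vel0, mass)
gap = max(math.dist(p, q) for p, q in zip(end_a, end_b))
print(f"initial difference: 1e-06, final difference: {gap:.3g}")
```

Every force in the sketch comes from the simple two-body formula; the difficulty lies entirely in how the interactions compound. In configurations that behave chaotically, small differences in the starting state tend to grow over time, which is the sense in which outcomes become hard to predict, though nothing here should be read as a claim about any particular number.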