It is dangerous to dismiss all this DNA as useless because we do not understand what it says. The Chinese term 'Shi' can — apparently — have seventy-three different meanings depending on how it is pronounced. It is possible to construct a sentence such as 'The master is fond of licking lion spittle' just by using 'Shi' again and again. This would seem like empty repetition to those who cannot speak Chinese.
Much of the inherited landscape is littered with the corpses of abandoned genes, sometimes the same one again and again. The DNA sequences of these 'pseudogenes' look rather like those of their functional relatives, but are riddled with decay and no longer make anything. At some time in their history a crucial part of the machinery was damaged. Since then they have been rusting. Oddly enough, the same pseudogenes may turn up at several points along the journey.
After many miles of dull and repetitive DNA terrain, we begin to see places where some product is made. These are the functional genes. They, too, have some surprises in their structure. Each can be recognised because the order of its DNA letters begins to read as three-letter words in the genetic code, a hint that it could produce a protein. In most cases there are few clues about what its product does, although its structure can be deduced (and its shape inferred) from the order of its DNA letters.
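The idea of DNA "starting to read in three-letter words" can be made concrete with a toy scan. What follows is a minimal sketch, assuming the simplest rule of thumb: a candidate stretch begins with the start word ATG and runs, three letters at a time, to one of the stop words TAA, TAG or TGA. The function name and the `min_codons` threshold are hypothetical choices for illustration; real gene-finding is far more involved, not least because, as described below, most human genes are interrupted by non-coding DNA, which this sketch ignores.

```python
# Hypothetical toy scanner: find stretches that read as three-letter words
# from a start word (ATG) up to the first stop word in the same frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for candidate protein-coding stretches.

    Assumption: a candidate begins at ATG and ends at the first in-frame stop
    codon; stretches with no stop codon, or shorter than min_codons words
    before the stop, are not reported. Overlapping candidates may be returned.
    """
    dna = dna.upper()
    frames = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Step through the sequence three letters (one codon) at a time.
        for pos in range(start, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    frames.append((start, pos + 3))  # include the stop codon
                break
    return frames


if __name__ == "__main__":
    seq = "CCATGAAACCCGGGTAGTT"
    for s, e in find_open_reading_frames(seq):
        print(s, e, seq[s:e])
```

On the sample sequence the scan reports one stretch, ATG AAA CCC GGG TAG, beginning at position 2. Such a hit is only a hint, in the book's sense: it says nothing about what, if anything, the product does.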
Most genes are arranged in groups that make related products, with about a thousand of these 'gene families' altogether. One is involved in the manufacture of the red pigment of the blood. Most of the DNA in the bone-marrow cells which produce the red cells of the blood is switched off. One small group of genes is hard at work. As a result they are better known than any other. Much of human molecular biology grew from research on this particular genetic industrial centre, the globin genes.
They have two factories. One is halfway along the genetic road to John o'Groat's, in Leeds. It makes one part of the protein involved in carrying oxygen. The beta-globin industrial estate contains about half a dozen sections of DNA that code for related things. The one responsible for part of adult haemoglobin (and involved, when it goes wrong, in sickle-cell disease) is quite small: about three feet long on this map's scale. A few feet away is another, which makes a globin found in the embryo. Close to that is the decayed hulk of some equipment which stopped working years ago. The beta-globin factory covers about a hundred feet altogether, most of which seems to be unused space between functional genes. It co-operates with a sister estate, the alpha-globin unit, a long way away (near London, on this mythical map), which produces a related protein. When joined together, the two products make the red blood pigment itself. Most genes are arranged in families, either close together or scattered all over the genome.
The map of ourselves shows that genes vary greatly in size, from about five hundred letters long to more than two million. One makes the largest known protein, titin, a molecular shock-absorber: a long, pleated structure found in muscles, in blood cells and in chromosomes. Whatever the size of its product, the titin gene is by no means the largest. Most human genes have their functional segments interrupted by lengths of non-coding DNA; in the gene involved in Huntington's disease, for example, by nearly seventy of them. In many genes (such as the one which goes wrong in muscular dystrophy) the great majority of the DNA codes for nothing. The non-coding material, whose importance varies greatly from gene to gene, takes part in the first stage of the production process, but this segment of the genetic alphabet is snipped out of the message before the protein is assembled. This seems an odd way to go about things, but it is the one which evolution has come up with.
The general picture began to emerge as soon as the mappers began work. In the year 2000, almost exactly a century after the rediscovery of Mendel's rules, their labours were, in effect, complete, and the whole human gene sequence was laid out in all its tedium before a less than startled world. Three thousand million letters (or, as it now appears, slightly more) is a lot. For accuracy, each section had to be sequenced ten times or more, and even at a thousand DNA bases a second (which is what the machinery pumps out) that was not easy. Sixteen centres, in France, Japan, Germany, China, Britain and the United States, combined to do the job. Most were funded by governments or charities, with the notorious exception of the Celera Genomics company (motto: 'Discovery Can't Wait!'), whose head had defected from a government programme. Advances in technology cut the original estimate of three billion dollars tenfold which, for a project with far more scientific weight than the Moon landings (described by President Clinton as the most wondrous map ever produced), was a remarkable bargain. For much of the time, the private and public sectors were at daggers drawn (vividly illustrated by Celera's description of the director of one public laboratory behaving as if he had been bitten by a rabid dog).
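Why was this "not easy"? A back-of-the-envelope check using only the figures in the paragraph above makes the scale clear. This is a rough sketch, assuming the quoted rate of a thousand bases a second is the total output, running without a break; the function name is hypothetical.

```python
# Rough arithmetic using the figures quoted in the text (assumed, not measured):
GENOME_LETTERS = 3_000_000_000  # "three thousand million letters"
COVERAGE = 10                   # "each section had to be sequenced ten times"
BASES_PER_SECOND = 1_000        # "a thousand DNA bases a second"
SECONDS_PER_DAY = 24 * 60 * 60


def sequencing_days(letters: int, coverage: int, rate: float) -> float:
    """Days of non-stop running needed to read `letters` bases `coverage` times."""
    total_bases = letters * coverage        # every letter is read many times over
    seconds = total_bases / rate            # time at the assumed constant rate
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    days = sequencing_days(GENOME_LETTERS, COVERAGE, BASES_PER_SECOND)
    print(f"{days:.0f} days")
```

Thirty thousand million bases at a thousand a second comes to thirty million seconds, or roughly 347 days of uninterrupted work, close to a year, before any of the annotation begins.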
Because (as so often in science) much of the effort lies not in obtaining information, but in making sense of it, a shotgun marriage between the rivals was, at the last moment, arranged. To 'annotate' the genome — to work out just what the newly-sequenced genes do — was a task so formidable that it demanded the use of one of the world's most powerful supercomputers.
From the mass of data, the small segments that code for proteins, and the even smaller sections that act as on-off switches for the working genes, are picked out (which is where the computing power comes in). Fortunately, many human genes look rather like those in fruit flies, yeast and nematode worms (all of which have been sequenced), and a massive comparison of each length of human DNA with those in other creatures points to shared segments that must, presumably, represent working genes. However, some useless sections disguise themselves as valuable by chance, and some pieces have been defined as working genes (again, perhaps wrongly) only by a slight shift in the ratio of particular pairs of bases. As a result, even with the complete DNA sequence, the precise number of genes needed to make a human being remains, and will remain, uncertain (although most of the researchers guess at a figure of fifty thousand or so).
Even when the functional segments are found, the task of understanding what they do has only just begun. The complete sequence, hailed by Presidents (and Prime Ministers) though it might be, is little more than an arbitrary step on the road to understanding.
Even so, the genome has revealed, if not its secrets, at least its structure; and that is remarkable enough. Take, as an instance, chromosome 22: the smallest of the twenty-three pairs, the Rutland of the genome and, in the last weeks of the twentieth century, the first to have its entire sequence established. It was mapped with an error rate of fewer than one in fifty thousand bases and with just a few short gaps. Apart from its small size (at thirty-three and a half million bases it represents about a hundredth of the whole sequence), it is an unremarkable chromosome, quite representative of its larger cousins (the biggest of which, chromosome 1, is eight times longer).
Before the global map, a few scattered genes had been tracked to chromosome 22. They included genes for, among others, a rare disease that causes heart problems and facial distortion, a birth defect called 'cat's eye syndrome', and genes involved in a severe disease of nerve degeneration.