We’re in the process of discovering, in molecular detail, just how we humans evolved from the ancestor we share with the chimpanzee. This knowledge is almost blasphemous (the work of creation) and should be widely understood. You don’t need to know the chemistry of molecular biology at all to understand it, if I’ve written the post correctly. It is simply too important (and, in a sense, beautiful) to keep locked up in the minds of people who’ve spent most of their adult lives studying the chemistry and biochemistry of life.
Papers on human evolution use terms like “selective pressure” and “force of evolution”, giving them quasi-conscious properties (now called agency in better circles). You might as well substitute God for these terms; your choice. We are now beginning to understand how these forces operate at the molecular level, but they can be understood at a more abstract level with almost no chemistry at all, which is what this post is about.
First, what is natural selection? Darwin’s great book of 1859, “On the Origin of Species”, draws heavily on his exchanges with animal breeders (of cattle, horses and even pigeons) and their attempts to improve their animals for human use (more meat, more speed, prettier pigeons). He called this selection. He thought that species arose and changed through similar activity in nature, which he called natural selection.
Parenthetically, it is worth reading Darwin’s book in the original. You know a lot more than he did, and it is humbling to watch Darwin’s powerful mind dealing with the fragmentary and limited information available to him back then. If you have the time, I suggest that you read Darwin’s 1859 book chapter by chapter along with a very interesting companion, Darwin’s Ghost by Steve Jones (published in 1999), which updates Darwin’s book to contemporary thinking and knowledge chapter by chapter. Despite the 140 years of advances in knowledge available to Jones, Darwin’s thinking beats Jones hands down, chapter by chapter.
There are 3 main players making us what we are — proteins, DNA and RNA.
Proteins are what you see when you look in the mirror. DNA and RNA lurk behind the scenes. A protein is just a chain of Legos strung together. The Legos come in twenty different varieties, each with a different shape and different (chemical) properties. Individual Legos are known as amino acids, and all twenty have names; examples are glycine, tryptophan, phenylalanine and 17 more. Most proteins are a single chain of 100 or more Legos strung together all in a row. Exactly which Lego goes in each position is crucial. Sickle cell anemia is due to a change in just one Lego, at the 7th position of the beta chain of hemoglobin (which has 147 Legos in its chain). Normally the glutamic acid Lego is there, but sickle hemoglobin has the valine Lego in position 7, the other 146 Legos being identical in both.
DNA and RNA are simpler than proteins. Their elements (called nucleotides) come in 4 varieties, whose names are abbreviated to A, C, G and T in English for DNA. RNA also has 4 elements, so close chemically that you can regard them as A, C, G, T written in Cyrillic. (Strictly, RNA uses U, uracil, in place of T, but I’ll use the same 4 letters for both.) Again, they form linear strings, just as words, sentences, paragraphs etc. are linear strings of alphabetic characters.
Our genetic material (genome) is found in chromosomes, which are enormous linear strings of letters (nucleotides). Our largest chromosome (#1) has about 249,000,000 nucleotides all in a row. Our smallest (#21) has roughly 46,000,000, and the Y chromosome about 62,000,000. One set of our chromosomes (we carry two sets, 46 chromosomes in all) contains over 3 billion nucleotides. It’s hard to get your mind around a number like that. The 7 Harry Potter books contain about 1 million words. Figuring 5 letters (nucleotides) per word, that’s 5 million letters, so it would take about 600 copies of the 7 books to match the 3 billion letters in one copy of our genome (and each of our cells carries two).
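The Harry Potter comparison is simple division, sketched below in Python. The inputs are the post’s own round numbers (about 1 million words, 5 letters per word, 3 billion letters), not precise counts, so the result is an order-of-magnitude picture rather than an exact figure.

```python
# Round numbers taken from the text above; none of them are exact counts.
genome_letters = 3_000_000_000   # letters in one copy of our genome ("over 3 billion")
harry_potter_words = 1_000_000   # approximate word count of all 7 books
letters_per_word = 5             # the post's rough assumption

letters_per_set = harry_potter_words * letters_per_word
copies = genome_letters // letters_per_set

print(letters_per_set)  # 5000000
print(copies)           # 600
```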
The order of the letters is crucial in chromosomes, just as it is in words (consider united and untied).
There are 16 possible combinations of 2 of the 4 letters: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT. This isn’t enough to code for 20 Legos (amino acids), so the genetic code uses groups of 3 nucleotides (called triplets, or codons) to code for the 20 amino acids. There are 4 x 4 x 4 = 64 possible triplets for our 20 Legos, leading to redundancy: most amino acids are coded for by more than one triplet, and 3 triplets serve as ‘stop’ signals instead.
Now, it’s a long story getting from a sequence of nucleotide triplets in our DNA to actually stringing together the Legos to form a protein, one involving all sorts of beautiful and intricate molecular machines which I’ll have to skip.
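The counting argument can be checked by brute force. This Python sketch simply enumerates every string of 2 and of 3 letters drawn from A, C, G, T using the standard library’s itertools.product; it says nothing about which triplet codes for which amino acid.

```python
from itertools import product

BASES = "ACGT"

# Every ordered string of 2 letters, then of 3 letters, from the 4 DNA letters.
pairs = ["".join(p) for p in product(BASES, repeat=2)]
triplets = ["".join(t) for t in product(BASES, repeat=3)]

print(len(pairs))     # 16
print(pairs)          # same order as listed above, from 'AA' to 'TT'
print(len(triplets))  # 64
```

Sixteen pairs fall short of 20 amino acids, while 64 triplets overshoot, which is where the redundancy comes from.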
Go back to hemoglobin and its 147 Legos. Its chain is coded for by 147 x 3 = 441 letters all in a row. The change from glutamic acid (coded for by GAG) at position 7 in the gene for normal hemoglobin to valine (coded for by GTG) is even smaller than a one-Lego change: it is a single letter, A to T, in the middle of one triplet, 1 letter out of 441. This is what a mutation actually is: a change in the sequence of nucleotides in our genome.
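The sickle cell change can be written out at the level of letters. In this Python sketch, CODON_EXCERPT is a hypothetical, hand-made excerpt holding only the four codons needed here, not the full 64-entry genetic code (the assignments it does contain are the standard ones). It shows why the sickle mutation is GAG to GTG: GTC also codes for valine, but would take two letter changes instead of one.

```python
# Hypothetical excerpt of the standard genetic code: only the codons used below.
CODON_EXCERPT = {
    "GAG": "glutamic acid",
    "GAA": "glutamic acid",
    "GTG": "valine",
    "GTC": "valine",
}


def letters_changed(codon_a: str, codon_b: str) -> int:
    """Count the positions at which two 3-letter codons differ."""
    return sum(1 for x, y in zip(codon_a, codon_b) if x != y)


normal, sickle = "GAG", "GTG"
print(147 * 3)                                              # 441
print(CODON_EXCERPT[normal], "->", CODON_EXCERPT[sickle])   # glutamic acid -> valine
print(letters_changed(normal, sickle))                      # 1
print(letters_changed(normal, "GTC"))                       # 2
```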
We now know the complete sequence of letters in our DNA (genome). We also know how to look for the portions of our DNA coding for our proteins (of which we have about 20,000). It came as a huge surprise that the letters coding for those 20,000 proteins account for only about 2% of the 3 billion letters in our genome.
In an earlier, more hubristic era, the 98% of the genome not coding for protein was called junk. Now we know better. The proteins can be considered the ‘bricks’ we are built from, and we are a collection of them.
We also have the complete sequence of letters (nucleotides) making up the chimpanzee genome (and those of the gorilla and two types of orangutan). This was recent. Long before we could sequence DNA, we knew that human and chimpanzee proteins were 98% identical in their Lego sequences. This is the origin of the idea that we are 98% chimpanzee.
The facts are correct, the interpretation wrong. We are far more than the protein bricks that make us up.
This is like saying Monticello and Independence Hall are just the same because they’re both made out of bricks. One could chemically identify Monticello’s bricks as coming from the Virginia piedmont and Independence Hall’s as coming from the red clay of New Jersey, but the real difference between the buildings is their plans.
It’s not the proteins themselves, but where, when and how much of them are made. The control for this (the plan, if you will) lies outside the genes coding for the proteins, in the other 98% of the genome.
It’s been a long conceptual slog getting to this point, so get up and stretch. HARs, the main point, are coming up.
The obvious way to find the plan is to lay the human and chimp genome sequences out side by side and look for areas that have changed (mutated) the most between us and the chimp, but hardly at all between the chimp and other animals (such as the chicken, which has also been sequenced). These areas are called Human Accelerated Regions (HARs). An early example (2006) was a sequence of 118 nucleotides which had 18 changes (mutations) between us and the chimp but only two between the chimp and the chicken.
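The side-by-side idea can be sketched as a toy scan in Python. This is a hypothetical illustration, not how HARs were actually found: it assumes the three sequences are already aligned letter for letter with no gaps, and the function names, window size and ratio threshold are made up for the example. Real studies work from multi-species alignments and apply statistical tests rather than a fixed ratio.

```python
def count_differences(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def accelerated_windows(human: str, chimp: str, chicken: str,
                        window: int, min_ratio: float) -> list[tuple[int, int, int]]:
    """Toy scan over non-overlapping windows of pre-aligned sequences.

    Flags a window when human-chimp differences are at least `min_ratio`
    times the chimp-chicken differences (plus 1, so a window where chimp
    and chicken are identical doesn't cause division by zero).
    """
    hits = []
    for start in range(0, len(human) - window + 1, window):
        end = start + window
        human_chimp = count_differences(human[start:end], chimp[start:end])
        chimp_chicken = count_differences(chimp[start:end], chicken[start:end])
        if human_chimp / (chimp_chicken + 1) >= min_ratio:
            hits.append((start, human_chimp, chimp_chicken))
    return hits


# Made-up sequences: only the second 10-letter window changed, and only on the human line.
chimp = "A" * 20
chicken = "A" * 20
human = "A" * 10 + "CCCCC" + "A" * 5
print(accelerated_windows(human, chimp, chicken, window=10, min_ratio=3))
# [(10, 5, 0)]
```

The report lists the window’s start position, then the human-chimp and chimp-chicken difference counts: many changes on the human line, none between chimp and chicken, which is the signature the text describes.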
We have some 2,772 HARs in our genome, averaging 269 nucleotides long. When you think about it, that isn’t very much. 2,772 x 269 nucleotides is just 745,668 positions out of our 3,200,000,000-letter genome, about one position in every 4,300, and only some of the positions within HARs actually changed. But the point of HARs is that their changes aren’t spread evenly across the genome; they are lumped together.
All 2,772 HARs arose in the 5 to 12 million years since the last common ancestor of the human and the chimp. We are literally looking at what makes us human, which is why this might be considered blasphemous: it reduces human creation to changes within 745,668 letters.
Well, we all knew that our proteins and the chimp’s are pretty much the same, so it should come as no surprise that most HARs (96%) occur in regions of the genome not coding for protein.
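The HAR arithmetic can be checked in a few lines of Python. The inputs are the figures given in the text (2,772 HARs, 269 nucleotides on average, a 3.2-billion-letter genome); the code only multiplies and divides them.

```python
# Figures from the text above.
har_count = 2_772
mean_har_length = 269              # nucleotides
genome_length = 3_200_000_000      # letters

har_positions = har_count * mean_har_length
print(har_positions)                                   # 745668
print(round(genome_length / har_positions))            # 4291, i.e. one HAR position per ~4,300 letters
print(round(100 * har_positions / genome_length, 3))   # 0.023 (percent of the genome)
```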
We are just starting to look at what the HARs do. Many of them are involved in increasing the number of brain cells (neurons) formed during development. This is what makes it a privilege to be alive now: research on what the HARs do and how they do it will be coming thick and fast. One mechanism is known: a HAR segment of the genome can bind to the segment of a gene coding for a protein and turn the gene on (i.e. cause the protein to be made by cellular machinery that I haven’t discussed) or turn it off.
Discussing exactly how HAR binding turns genes for proteins on or off would require discussing induced pluripotent stem cells, promoters, transcription factors, enhancers, long noncoding RNA etc. This is fascinating stuff and the subject of decades of work and thousands of researchers’ careers, but discussing it would detract from thinking about the inner logic of the chemistry and biochemistry producing the plan, an abstract nonphysical object. Also, it would take way too long. So I’m going to avoid describing how the spirit (the ‘plan’) is made flesh (the proteins that carry out the plan).
HARs target many genes for proteins important in nerve cell (neuron) function. Controlling how much of a protein is made is certainly part of the plan, and it is exciting to think that HARs are controlling what makes us so different from the chimp: the proteins in our brains.
“Sneaking a Look at God’s Cards” is a book about quantum mechanics. That’s what the human enterprise is doing with HARs.