A Big History of Life on Earth Lecture Notes

Paul F. Lurquin Professor Emeritus School of Molecular Biosciences Washington State University


Dedication

For Lucie Lurquin


Foreword

The five chapters included in this book can each be carved into a set of lectures, with approximately two to three lectures per topic. I have used these notes (now updated) for several years in my course on the “Origins of Life” at Washington State University. The course was an elective “capstone” course for seniors, with no prerequisites other than basic biology. I also taught it as an Honors course. Most students taking the course were science and mathematics majors, although business, history, and communications majors with an interest in science also enjoyed it.

The five topics are, in order:

1. Prebiotic Earth: First organic compounds and the iron-sulfur world
2. The RNA world
3. The origin of structure in physical and biological systems: Order from chaos. Artificial cells
4. The prokaryotic world and first eukaryotes
5. Increasing complexity: From simple eukaryotes to Homo sapiens

These notes are based in part on my book The Origins of Life and the Universe, Columbia University Press, 2003. The references at the end of each topic include technical and nontechnical titles. Technical and nontechnical websites are also mentioned. Enjoy!

Cover: Electron micrograph of liposomes (also see topics #2 and #3). Source: Materials Research Center, University of Freiburg, Germany


CHAPTER 1

Prebiotic Earth: First Organic Compounds and the Iron-Sulfur World

Where the hell do the likes of us come from, Hans Thomas? Have you thought about that? —JOSTEIN GAARDER, The Solitaire Mystery (1996)

Earth’s atmosphere about 4.5 billion years ago, when the Earth was formed, was very different from the one we know today. There was no oxygen, but other gases were present. In one scenario, these were methane (CH4), water vapor (H2O), nitrogen (N2), ammonia (NH3), hydrogen sulfide (H2S), and carbon dioxide (CO2). Primeval hydrogen (H2) and helium (He), formed at the birth of the universe nearly 14 billion years ago, were disappearing because Earth’s gravity was not strong enough to keep them in the atmosphere. Traces of helium would always be present, however, thanks to the radioactive decay of elements, such as uranium, thorium, and radium, in Earth’s interior. There were also oceans, whose geography we would not recognize today, since plate tectonics has moved the continents around. Volcanic activity contributed water, nitrogen, carbon dioxide, sulfur dioxide, and other gases to the atmosphere. The sight must have been awesome: the sky was red and the Sun looked bluish because nitrogen, which scatters blue light, was not yet the dominant gas in the atmosphere. The atmospheric pressure was ten times higher than it is today. There was of course no life, but there was movement: the wind blew clouds and volcanic smoke around while the waves lapped at beaches made of solid rock, not yet sand. High tides took place every three hours and the Moon appeared four times closer than today. The temperature was much higher than now, perhaps as high as 90°C, because of the greenhouse effect caused by atmospheric methane and carbon dioxide. Nevertheless, this temperature was low enough for liquid water to exist. In the absence of oxygen, there was no ozone (O3) layer and Earth was bombarded by hard ultraviolet (UV) radiation from the Sun.
Scientists have gained much information concerning Earth’s primitive atmosphere by studying our neighbor planet, Venus, whose atmosphere contains huge amounts of carbon dioxide (96 percent), some nitrogen (3 percent), and some sulfur dioxide (SO2), but no oxygen. However, being closer to the Sun, Venus receives twice as much radiation and heat, which, combined with the enormous greenhouse effect caused by the vast amount of CO2, raises the surface temperature to 460°C. This is where the parallels between Venus and Earth stop: liquid water cannot exist on Venus because it is too hot, so life is impossible on that planet. Mars has the opposite problem: being farther from the Sun, it is too cold for liquid water to exist on the surface. However, its present tenuous atmosphere is very much like Venus’s, without the sulfur dioxide. Conceivably, Venus and Earth originally had similar atmospheres, composed of, in order of decreasing abundance, H2, He, CH4, H2O, N2, NH3, and H2S. What happened to these gases on these planets? Today, Earth’s atmosphere is composed of about 79 percent nitrogen, 21 percent oxygen, and low amounts of a few other gases. Venus also lost many of its original gases. As we have seen, H2 and He quickly escaped into space. The fate of the other gases (except N2) was sealed by the UV radiation from the Sun, which decomposed them through the phenomenon known as photolysis. On Venus, where liquid water did not exist, water vapor was quickly decomposed into hydrogen (which escaped) and free oxygen, which rapidly oxidized CH4 and H2S to CO2 and SO2. The oxidation of NH3 simply produced more N2.
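The statement that Venus receives about twice as much solar radiation as Earth follows from the inverse-square law: sunlight intensity falls off as the square of the distance from the Sun. The short sketch below works through that arithmetic. The orbital distances (0.723 AU for Venus, 1.524 AU for Mars) are standard textbook values that do not appear in these notes, and the function name is purely illustrative.

```python
# Mean orbital distances in astronomical units (AU).
# These are standard textbook values, assumed here; the notes do not give them.
EARTH_DISTANCE_AU = 1.0
VENUS_DISTANCE_AU = 0.723
MARS_DISTANCE_AU = 1.524


def relative_sunlight(distance_au, reference_au=EARTH_DISTANCE_AU):
    """Sunlight intensity at distance_au relative to reference_au.

    Uses the inverse-square law: intensity is proportional to 1 / distance**2.
    """
    return (reference_au / distance_au) ** 2


venus_ratio = relative_sunlight(VENUS_DISTANCE_AU)
mars_ratio = relative_sunlight(MARS_DISTANCE_AU)

print(f"Venus receives about {venus_ratio:.2f} times the sunlight Earth does")
print(f"Mars receives about {mars_ratio:.2f} times the sunlight Earth does")
```

With these assumed distances, Venus comes out at roughly 1.9 times Earth's sunlight and Mars at a little over 0.4 times, which is consistent with the "too hot" and "too cold" contrast drawn above. Distance alone does not set surface temperature, of course; the greenhouse effect discussed in the text matters at least as much.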


On Earth, where liquid water did exist, H2O photolysis was not nearly as pronounced as on Venus because UV-induced water decomposition occurs best in the gas phase. Nevertheless, photolysis of the primeval gases did take place, albeit more slowly, because Earth is farther from the Sun. Thus CH4, NH3, and H2S also gradually disappeared. There was one enormous difference between Earth and Venus, however: thanks to the presence of liquid water, the CO2 and SO2 generated by photolysis dissolved in the oceans, reacted with the minerals present there, and became locked in rocks. This prevented the runaway greenhouse effect that occurred on Venus and made nitrogen the dominant gas. After a few hundred million years, the sky finally turned blue. Before this happened—that is, while Earth’s atmosphere was still reducing (still containing appreciable amounts of H2, CH4, and NH3)— could it be that some of the building blocks of life were somehow assembled in the primeval gases? After all, Earth’s crust and the oceans at that time could not have held much in terms of carbon- and nitrogen-containing compounds that make up living cells. Some scientists do indeed think that the first building blocks of life were synthesized via atmospheric chemistry.

ORGANIC COMPOUNDS FROM EARTH’S PUTATIVE PRIMITIVE ATMOSPHERE

Many organic molecules of the types found in living cells can be made in sparked mixtures of some gases. These observations have led to the concept of a prebiotic soup.

Figure 1. Schematic representation of the equipment used by Stanley Miller to produce organic molecules under abiotic conditions. The air from the glass apparatus is first evacuated by pumping. Water and the gases hydrogen, methane, and ammonia are then added to the system. Water is brought to a boil and water vapor starts circulating in the apparatus. Water vapor and the three other gases then fill the large glass chamber at upper right. The chamber is equipped with two electrodes connected to a power source, and sparking is initiated. Water vapor and compounds resulting from the action of sparking are then condensed by the cooling system and accumulate in the U-shaped trap. Compounds harvested from the trap are then analyzed chemically. (Source: Evolution and Religious Creation Myths: How Scientists Respond, Paul F. Lurquin and Linda Stone, Oxford University Press, 2007)

In 1953, Stanley Miller, working in the laboratory of Harold Urey (the Nobel laureate who discovered deuterium) at the University of Chicago, published the results of a strange experiment that was based on the assumption that Earth’s atmosphere was once a reducing one. In a reducing atmosphere, free oxygen does not exist, and elements are present in a reduced, hydrogenated form such as CH4, H2O, and NH3, and, of course, H2. Miller knew that Alexander Oparin (from Russia) and J. B. S. Haldane (from England) had suggested several decades earlier that a reducing atmosphere could have allowed the synthesis of organic compounds (composed of carbon, hydrogen, nitrogen, and a few other elements) necessary for life to appear. However, they never did any experiments to test that idea. Miller, then a graduate student, decided to test this hypothesis, apparently with only reluctant support from his boss, Urey (who did not coauthor the article). Miller built an apparatus (figure 1), made of glass, into which he introduced liquid water, hydrogen, methane, and ammonia gases. (The air had previously been pumped out of the system to eliminate oxygen.) To produce water vapor, Miller boiled water held in a flask; this had the further effect of circulating the mixture of gases present in the apparatus. To simulate rain, Miller added a cooled condenser to the system and trapped the condensed water in a U-shaped tube. The condensed water thus represented the primitive ocean. Samples from this tube could be harvested and analyzed for any water-soluble molecules that formed. Finally, the apparatus contained another large glass flask, in which the gases circulated, and which housed two metal electrodes that allowed electrical sparking of the gas mixture. The electric discharge simulated the lightning that undoubtedly must have occurred in the atmosphere (the atmosphere of Jupiter is witness to huge lightning discharges and contains all the gases used by Miller). After several days of cycling the gases and sparking, Miller noticed that the condensed water in the tube had turned pink and subsequently orange-red (figure 2).

Figure 2. The “atmosphere” after several days of sparking. The equipment is in a different configuration in this figure. The four black objects are high voltage generators used for sparking. (Source: Google Images).

Clearly, some chemical reaction was taking place, as the original gases were completely colorless. Analysis of the solution revealed the presence of amino acids, the building blocks of proteins! Of the twenty amino acids found in proteins, ten were formed in Miller’s experiments. The chemistry that took place in these experiments is now understood. For example, the simple amino acid glycine results from the condensation of formaldehyde (formed from the sparked gases) with ammonia and hydrogen cyanide (also formed in the gas mixture) to produce the compound aminonitrile. Aminonitrile then reacts with water to form glycine (figure 3).

Figure 3. The two-step synthesis of glycine from formaldehyde, ammonia, and hydrogen cyanide in sparked gases.
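The two steps of the glycine synthesis can be checked by simple atom bookkeeping. The sketch below is only an illustration: the molecular formulas and coefficients are the standard ones for this kind of (Strecker-type) synthesis and are assumed here, since the notes name the reactants but do not spell out the stoichiometry. The helper names (atoms, side_total, is_balanced) are hypothetical, and the tiny formula parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter


def atoms(formula):
    """Count atoms in a simple formula such as 'C2H4N2' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side):
    """Sum atom counts over a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atoms(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True if both sides of the reaction contain the same atoms."""
    return side_total(reactants) == side_total(products)


# Step 1 (assumed stoichiometry):
# formaldehyde + ammonia + hydrogen cyanide -> aminonitrile + water
step1 = ([(1, "CH2O"), (1, "NH3"), (1, "HCN")],
         [(1, "C2H4N2"), (1, "H2O")])

# Step 2 (assumed stoichiometry):
# aminonitrile + 2 water -> glycine + ammonia
step2 = ([(1, "C2H4N2"), (2, "H2O")],
         [(1, "C2H5NO2"), (1, "NH3")])

for name, (reactants, products) in [("step 1", step1), ("step 2", step2)]:
    print(name, "balanced:", is_balanced(reactants, products))
```

Both steps balance under these assumptions. Note that the ammonia consumed in the first step is released again in the second, so overall one formaldehyde, one hydrogen cyanide, and water are what end up in glycine.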

In addition to amino acids that make up proteins, gas-discharge experiments have also yielded the four nitrogenous bases, adenine (A), cytosine (C), guanine (G), and uracil (U), the building blocks of ribonucleic acid (RNA), one of the two informational macromolecules used by living systems today. The other informational macromolecule is DNA, deoxyribonucleic acid, which contains thymine (T) instead of U. Adenine for example, results from the condensation of five molecules of hydrogen cyanide (figure 4). (It is unsettling to think that the poison once used to execute prisoners in a gas chamber can lead to the synthesis of some of the building blocks of life!) Finally, many types of sugars were also synthesized in these experiments, including ribose, the sugar found in RNA. These sugars are made through polymerization of formaldehyde, itself formed in the sparked gases. However, ribose is made in small amounts, and it is unclear how it, as opposed to all other sugars, came to form RNA. We will see a possible answer to this problem later. The startling results of Miller’s experiments have led to the notion that Earth’s primitive oceans accumulated more and more of the building blocks of life, amino acids, nitrogenous bases, and sugars, and became some sort of primordial or prebiotic soup. (Instead of the term soup, which suggests a chunky mixture—think about split pea with ham or chicken noodle soup!—I prefer the word broth.) Primordial broths of the Miller type have been replicated by many investigators using similar gas mixtures exposed to short-wave UV light or silent electric discharge, all undoubtedly present on primitive Earth. Interestingly, free oxygen completely interfered with all the reactions just described. Since we know that the oldest rocks found on Earth are not oxidized, this supports the conclusion that, indeed, oxygen was not present in the primeval atmosphere and could not have blocked the chemical reactions observed by Miller and others. 
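The two condensations mentioned above, hydrogen cyanide to adenine and formaldehyde to sugars, share a simple overall bookkeeping: in each case five copies of a one-carbon molecule add up exactly to the product. The lines below show only this net stoichiometry, using standard molecular formulas that are not given in the notes; the actual reaction pathways go through several intermediates.

```latex
\begin{align*}
5\,\mathrm{HCN} &\longrightarrow \mathrm{C_5H_5N_5} && \text{(adenine)} \\
5\,\mathrm{CH_2O} &\longrightarrow \mathrm{C_5H_{10}O_5} && \text{(ribose and other five-carbon sugars)}
\end{align*}
```

Because the second line is satisfied by every five-carbon sugar of this formula, and formaldehyde polymerization also yields sugars of other sizes, it illustrates why ribose is only one minor product among many.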
Thus the problem seemed solved: the building blocks of proteins and RNA (but not DNA, as thymine was not formed in these reactions) were made in Earth’s reducing atmosphere. But was the problem really solved? Perhaps not. There are indeed difficulties with Miller’s experiments. First, most scientists now concur that Earth’s atmosphere did not contain significant amounts of hydrogen, methane, and ammonia for the length of time sufficient to allow the formation of a primordial broth based on these compounds. In their view, the atmosphere quickly evolved into one containing mostly carbon dioxide and nitrogen (plus water vapor and unreactive gases such as argon and neon), much like the atmospheres of Venus and Mars today. Indeed, it is argued that Earth’s primeval atmosphere (as well as those of Venus and Mars) was blown off as our Sun ignited, about 4.6 billion years ago. Another contributing factor would have been the violent collision between young Earth and a Mars-sized object that became the Moon. A new atmosphere could then have been generated through volcanic activity. It turns out that volcanoes emit mostly water vapor, carbon dioxide, and nitrogen, together with small amounts of carbon monoxide and hydrogen. Modern volcanoes emit no ammonia or methane. Therefore Earth’s primitive atmosphere may have been much less reducing than imagined by Miller.

Figure 4. The synthesis of adenine from the condensation of five hydrogen cyanide molecules.

Furthermore, sparking mixtures of nitrogen, water, carbon dioxide, or carbon monoxide “à la Miller” gives somewhat disappointing results, albeit not entirely negative ones. Without hydrogen, CO2 and N2 cannot produce HCN, necessary for the formation of amino acids and nitrogenous bases. If, however, mixtures of CO2, N2, and H2 or mixtures of CO, N2, and H2 are sparked, some amino acids, in reduced numbers and yields, are produced. Thus a strongly reducing atmosphere is necessary to produce the whole variety and large amounts of organic compounds found in a Miller-type of experiment. On the other hand, the nucleic acid base uracil could be produced by sparking mixtures of methane and nitrogen. Where was this methane coming from if it did not exist in the primeval atmosphere? It could have come from hydrothermal vents found on the bottom of the ocean or it could have been delivered by cometary impacts (see later). HCN, so crucial as an intermediate in the synthesis of amino acids and bases, could have originated from the reaction of methane with nitrogen to produce it and hydrogen. This reaction could have been catalyzed by UV light from the Sun. Hydrogen produced by this reaction, plus hydrogen emitted by volcanoes, would have made Earth’s atmosphere mildly reducing. In conclusion, it was thought until the early 2000s that the atmosphere was probably not as reducing as Miller thought, but reduced carbon in the form of CH4 as well as H2 gas were probably present in modest amounts. However, this picture changed in 2005 when researchers recalculated the rate of escape of hydrogen from Earth’s atmosphere. Their premise was that previous estimates of the temperature of prebiotic Earth’s upper atmosphere were wrong. In their opinion, this temperature was much cooler than previously thought. As a result, the rate of escape of hydrogen would have been up to 100 times slower.
With hydrogen continuously vented by volcanoes, these authors estimate that the atmosphere could have contained as much as 30% hydrogen for a very long time. Under those conditions, rich atmospheric chemistry would have been possible. As we see, whether the prebiotic atmosphere was not reducing, mildly reducing, or strongly reducing is still an open question. But there is more. It was discovered in the early 1960s that simple mixtures of ammonia and hydrogen cyanide in water also form amino acids and large amounts of adenine. This suggested that bringing these three chemicals together somewhere, and not necessarily on Earth, could produce some of the building blocks of life. At about the same time, amino acids, as well as the bases adenine and guanine, were detected in meteorites. Very interestingly, some meteorites also contain amino acids and other organic compounds made in Miller’s experiments. This lends much credence to the idea that these molecules can indeed be synthesized in places other than Earth’s surface under abiotic conditions. However, the mechanisms of formation of these molecules in meteorites are not known. One famous meteorite containing organic compounds is the Murchison meteorite found in Australia. The types of compounds in this meteorite as well as their relative amounts are remarkably similar to those found in Miller’s experiment. Meteorites derive mostly from asteroids present between the orbits of Mars and Jupiter (some rare ones are of lunar or Martian origin). These meteorites were never exposed to any kind of atmosphere, reducing or oxidizing. How then do they contain organic compounds? These compounds would have had to be made in space, possibly as a result of reactions between water, hydrogen cyanide, methane, and ammonia. Do these chemicals actually exist in space? Yes, they do.

SPACE CHEMISTRY AND THE ORIGINS OF LIFE

Space itself, as well as comets and meteorites, contains organic compounds that could have seeded our planet with molecules necessary for life to appear.

Large interstellar clouds in which stars and planets are formed do contain molecules presumably necessary for the origin of life, plus many others. Also, water ice is thought to be the most common solid in the universe, and even methanol (CH3OH; wood alcohol) and ethanol (C2H5OH; drinking alcohol) exist in deep space! What is more, these clouds also contain cyanoacetylene (HC3N), a chemical that reacts with urea to form nitrogenous bases found in RNA. Thus gases and particles left over from the protoplanetary disc after its condensation into the planets composing the solar system could have delivered significant amounts of organic materials to Earth’s surface in the form of meteorites and comets. Comets in particular are thought to be aggregates of primordial material much predating the formation of the solar system as we know it today. They are found in two systems orbiting the Sun. The first system is called the Kuiper belt and its members orbit the Sun beyond the orbit of Neptune. The second system, called the Oort cloud, forms a halo, well beyond the orbits of Pluto and other Kuiper belt objects, around the solar system. Once in a while, a comet is knocked off its orbit because of the gravitational effect of the planets or a passing star. The comet then has a chance to approach the Sun at a much closer distance, with the result that some of its material evaporates and forms a visible tail. The tail can be studied and its spectrum recorded, either from Earth or from spacecraft on a rendezvous mission. Results of all studies agree that comets are rich in organic material and water. Comet Halley was even shown to possibly contain the bases found in RNA and DNA.
Comets visible from Earth are not that frequent today, but they may have been much more prevalent in the vicinity of Earth a few billion years ago when the solar system was forming. Collisions with young Earth may have been frequent and would have resulted in the seeding of the planet with building blocks of life or their precursors. Two space missions have shed more light on the presence of organic material in comets. The most spectacular one was “Deep Impact”, in which a refrigerator-size impactor was deliberately collided with comet Tempel 1. The collision occurred on July 4, 2005 and was photographed from the mother ship (figure 5). The energy released by the collision vaporized a small part of the comet, which allowed Earth-based spectral analysis of the compounds present in the nucleus. As expected, large amounts of water were released. Even though the mission was not specifically designed to single out organics present in the comet, it turns out that organic compounds called polycyclic aromatic hydrocarbons (PAHs) were easily detected. These organics are fairly complex and include chemicals such as naphthalene and anthracene. In addition, another mission, called “Stardust”, actually collected cometary material during a close fly-by of comet Wild 2 on January 2, 2004. The spacecraft’s sample capsule landed back on Earth on January 15, 2006. Stardust’s comet material collector consisted of a tennis racket-size instrument filled with aerogel, an extremely light material designed to trap comet dust with minimal damage due to collision and heating. Analysis of samples brought back to Earth confirmed the existence of PAHs. But in addition, these PAHs, and possibly other organics, were found to contain O, N, and S heteroatoms (so-called because usual PAHs only contain H and C). Some other organics contained OH, CH, CH3, CH2, C-O, and C-N molecular groups, indicative of rich organic chemistry. The amino acid glycine was also detected. Comet impacts could indeed have delivered organics to prebiotic Earth.

Figure 5. Collision between comet Tempel 1 and the “Deep Impact” impactor (Source: NASA).

Similarly, meteorites, some of which contain organic compounds, as we have seen, still collide with Earth today and may have played the same role. It is known that planet Earth accumulates about 100 tons per day of meteoritic material in the form of micrometeorites. These micrometeorites measure about 0.1 mm in diameter and are carbon rich (up to 7 percent by weight). Micrometeorite samples have been collected in the “clean” environment of Antarctic ice and have been shown to contain detectable amounts of organic molecules. It is quite possible that prebiotic Earth accumulated vast amounts of carbon-containing compounds by sweeping up micrometeoritic material. If comets and meteorites and their cohort of organic molecules truly originate from primeval interstellar clouds (at a further stage of processing in the case of meteorites), these clouds should also contain molecules found in comets and meteorites. Astronomical observations have shown that this is the case. These clouds contain, in addition to gases, solid material in the form of dust. Their spectra indicate that this dust is made of microscopic silicate granules covered with ice composed of water, methanol, methane, carbon monoxide, carbon dioxide, and PAHs. Given that interstellar clouds are constantly bathed in UV light emitted by nearby stars, the potential for interstellar chemistry is great. Interstellar ice analogues have been created in the laboratory: the compounds known to exist in interstellar clouds were sprayed in a vacuum chamber kept at very low temperature to simulate conditions found in space. The ice grains thus formed were illuminated with UV light to mimic starlight. Analysis of these ices revealed that complex organic molecules could be formed, including quinones (electron transfer intermediates in photosynthesis), amino acids, and long-chain hydrocarbons with properties similar to amphiphilic lipids found in the membranes of living cells. These amphiphilic hydrocarbons are able to form microscopic vesicles (liposomes, see cover), presumably through bilayer formation, when dispersed in water. Intriguingly, similar compounds are found in some meteorites and they, too, form these vesicles in water. Such vesicles are able to trap a variety of chemicals, and by keeping these chemicals in close contact, they may have initiated some of the chemical reactions that led to life. Interestingly, amphiphilic hydrocarbons are not formed in Miller-type experiments. In addition, experiments conducted under space-like conditions have shown that a rather common mineral, titanium dioxide (TiO2) present in space dust, can favor the synthesis of RNA bases from the simple compound formamide, also thought to be present in space. In conclusion, space contains a host of organic molecules able to generate, through chemical reactions, some compounds necessary for life to appear. Furthermore, space chemistry seems able to produce molecules not found in gases submitted to electric discharges.

Another objection to Miller’s experiments (and perhaps space chemistry) is based on the chirality of the molecules produced in sparked gases and in space. Chirality, also called handedness, refers to the three-dimensional arrangement of atoms in a molecule. A good analogy is that of the symmetry of human hands, hence the word handedness. The left hand and the right hand are directly superimposable when pressed palm to palm. They are not perfectly superimposable when pressed palm to knuckles. To achieve palm-to-palm contact, one of our hands must be rotated, or a single hand can be directly superimposed with its mirror image.
The same thing happens in nature: amino acids come in two varieties, left-handed and right-handed, which can be distinguished by their ability to rotate polarized light either to the left or to the right. It turns out that amino acids in living cells belong to the left-handed category. One of the criticisms of Miller-type experiments is that they produce equal amounts of left-handed and right-handed amino acids, something not found in living systems. In contrast, meteorites contain slightly more left-handed amino acids than right-handed ones. One hypothesis offers the possibility that UV light emitted by stars could have favored the synthesis of left-handed amino acids in some celestial bodies. This does not prove that life originated from meteorites, but it certainly does not rule out this notion. Thus space itself, not necessarily Earth and its primitive atmosphere, may have been responsible for the synthesis of the building blocks of life. Or was it?

OCEAN FLOOR CHEMISTRY

Hydrothermal vents are another potential source of prebiotic organic compounds.

An omnipresent fact of life is that science constantly questions and re-questions received knowledge. That some scientists think that organic compounds important for life were formed in the atmosphere or were brought to Earth by comets and meteorites does not make it so. Other hypotheses can always be formulated. A third hypothesis, which holds that the first organic compounds were made on Earth rather than in space or in the atmosphere, is the hydrothermal vents hypothesis. Earth’s crust is particularly thin at the level of ocean floors. There, tectonic plates slowly slide on top of the magma and by doing so, create ridges and cracks in the crust. Water percolates through these cracks, becomes superheated by contact with the magma, and dissolves many of the minerals present there. The pressure is so high that the water, heated to several hundred degrees, does not boil but is spewed back through chimney-like structures called hydrothermal vents (figure 6). These vents are often surrounded by miniature ecosystems, where many types of bacteria, worms, and crabs proliferate. Now, high pressure and temperature can create some interesting chemistry. Could it be that the building blocks of life were once synthesized from purely inorganic compounds in or near hydrothermal vents? Theoretical studies performed in the mid 1990s indicated that, yes, this is possible. The branch of chemistry called thermodynamics mathematically determines what chemical reactions are possible or impossible. Using this approach, scientists realized that in water, the synthesis of all amino acids and their linking to form small proteins was theoretically possible above 100°C and under high pressure. The ingredients needed to achieve this synthesis were CO2, ammonium salts, and H2S (some amino acids contain sulfur). It turns out that both high temperature and these particular chemicals are found in oceanic hydrothermal vents. Therefore in theory, amino acids and proteins could have been formed at great depth under the surface of the ocean. Of course, such calculations did not prove that these reactions actually took place; some scientists used the derogatory term paper chemistry to characterize these studies. The hydrothermal vent hypothesis had to be put to the test.
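The kind of thermodynamic test referred to above can be summarized by the standard Gibbs free energy criterion. It is given here only as a general reminder of what "theoretically possible" means; the notes do not state which quantities the original studies computed, and the symbols below are the usual textbook ones rather than anything taken from the source.

```latex
\Delta G = \Delta H - T\,\Delta S,
\qquad \text{a reaction is thermodynamically possible if } \Delta G < 0
```

Here \Delta H is the enthalpy change, \Delta S the entropy change, and T the absolute temperature, all at constant temperature and pressure. Because T multiplies the entropy term and both \Delta H and \Delta S depend on pressure, a synthesis that is unfavorable at the surface can in principle become favorable under vent conditions. A negative \Delta G says nothing about how fast a reaction runs, which is one reason the calculations still had to be checked experimentally.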

Figure 6. An oceanic hydrothermal vent. (Source: National Oceanic and Atmospheric Administration)

This was done using an instrument called a bomb, which is simply a reinforced container able to sustain high temperatures and pressure. It was shown that, indeed, a whole catalog of organic molecules, including amino acids and pyruvic acid (an important metabolite ubiquitous in living cells), could be formed at high pressure and temperature from H2S, CO, and CO2 as well as ammonia (from nitrate) and nitrogenated hydrocarbons. Interestingly, iron sulfide (FeS) was absolutely necessary to catalyze these reactions, to generate hydrogen for reduction reactions, and to concentrate and stabilize the reaction products. This mineral is abundant in the earth’s crust in the form of pyrrhotite. Although the formation of nitrogenous bases as found in RNA and DNA has not been reported, hydrothermal vent chemistry seems to be much more than just paper chemistry.

In summary, none of the three hypotheses aimed at explaining the origin of the building blocks of life is complete. Experiments “à la Miller” have not yielded long-chain amphiphilic hydrocarbons like those found in the membranes of living cells, but amino acids and nitrogenous bases were produced. Space-based chemistry can produce the three types of compounds, but critics argue that organic molecules ferried by comets and meteorites colliding with Earth could not have survived atmospheric entry and impact because of the high heat generated. Finally, high temperature-high pressure chemistry, as is presumably taking place in hydrothermal vents, has so far not produced nitrogenous bases and long-chain lipids. How can this conundrum be solved? It is always possible that more research on the preceding three systems will yield some new findings and solve the mystery. On the other hand, it is also possible that life did not need amino acids, nitrogenous bases, and long-chain lipids all at the same time.
One can imagine, for instance, a situation in which genetic information (in the form of bases linked together to form a nucleic acid) came first, and that this genetic information started coding for some form of primitive metabolism. Or perhaps another scenario occurred, in which mineral catalysts or primitive enzymes, made of amino acids synthesized in hydrothermal vents or elsewhere, started a “protometabolism” that eventually led to the formation of nucleic acids. In that case, genetic information would have appeared as a result of a mineral-based protometabolism and, possibly, after the products it now codes for, proteins. Yet another possibility is that life appeared as a result of a complicated cooperation between simple proteins and simple nucleic acids. Scientists have not yet developed this third scenario. Therefore we will consider the “metabolism-first” world and the “RNA-first” world in that order, keeping in mind that these two models are not necessarily incompatible.

PROTEINS AND METABOLISM FIRST: THE IRON-SULFUR WORLD

There are basically two schools of thought regarding the origin of the complex mechanisms that constitute life. The first school advocates that the primordial mechanisms were simple pre-metabolic reactions and/or spontaneous formation of simple proteins. The second school contends that informational RNA (ribonucleic acid) came first. That RNA-first view is covered in Chapter 2.

Two of the most prominent proponents of the idea that protometabolism was key to the appearance of life on Earth are the Nobel Prize winner Christian de Duve of Belgium and Günter Wächtershäuser of Germany. Some of their work is discussed below. Both espouse the view that iron- and sulfur-containing compounds were critical to the establishment of protometabolism in the prebiotic world (a world still devoid of life as we know it but poised to see its birth). Such a view hinges on two ideas: first, a source of energy was needed to make possible some prebiotic chemical reactions, and second, the formation of proteins may have happened spontaneously.

It is well known that a great number of chemical reactions taking place in living cells require an energy supply in the form of ATP, a combination of adenine, ribose, and three phosphate groups. Before life existed, there was no abundant source of ATP. What could possibly have been the source of energy that drove prebiotic reactions in the primeval broth? In addition, assuming that the primitive Earth’s atmosphere was not as reducing as thought by others, but that it contained carbon, mostly in the form of CO or CO2 (but not CH4), and no longer any hydrogen, how could hydrogenated (reduced) forms of carbon be produced? It is indeed mostly reduced carbon that is present in biological molecules. A possible answer lies in a combination of the effects of UV light from the Sun and the presence of iron on Earth’s surface. This is how things may have worked according to de Duve.
Not surprisingly, his model is based on oxidation-reduction reactions. Iron, in the form of oxides and salts, is a very common element on Earth. When chemically combined with other elements, iron becomes ionized (electrically charged) and its ions can either be doubly positively charged (Fe2+) or triply positively charged (Fe3+). Fe2+ is the reduced form of iron, whereas Fe3+ is its oxidized form. Both forms currently exist naturally on Earth and must have existed in the distant past as well. Reduced iron is soluble in water, whereas oxidized iron combined with oxygen is not. Thus in the absence of oxygen, prebiotic Earth’s waters must have contained significant amounts of dissolved reduced iron. Furthermore, water (H2O) always contains a certain amount of protons (H+) and hydroxide ions (OH–). Remembering that oxidation-reduction reactions are accompanied by electron transfer, it is easy to understand how the oxidation of iron may have powered some important chemical reactions on prebiotic Earth.

It turns out that under the influence of UV light (provided by the Sun), the Fe2+ ions can react with the protons present in water to form Fe3+ ions and hydrogen gas. In this reaction, each reduced Fe2+ ion donates an electron to a proton (H+) to give Fe3+ (an additional positive charge is gained because an electron is lost) and an H atom. Two H atoms quickly combine to give hydrogen gas, H2, which escapes into the atmosphere. The Fe3+ so generated then reacts with water to give iron oxide, which precipitates out of solution as a black material. Thus iron is oxidized in this process while protons are reduced to hydrogen gas (figure 7). This hydrogen may have rendered the atmosphere more reducing.

Furthermore, and again under the influence of UV light, iron ions could possibly have cycled back and forth between a reduced and an oxidized state while reacting with CO2. Carbon dioxide, a gas, is soluble in water.
Therefore dissolved Fe2+ ions could have donated electrons to CO2, which, in the presence of H+, would have been reduced to CH4 (methane) or ═CH2 compounds free to react with other substances. In addition, some of the newly formed ═CH2 compounds could have spontaneously reacted with Fe3+ ions (before they precipitated) to regenerate Fe2+. The interesting thing here is that this last reaction releases energy that could have been used to drive other reactions. This mechanism is called the iron cycle (figure 8). Thus iron chemistry allows for the formation of reduced carbon compounds, including methane, as well as hydrogen and possibly other reduced chemicals such as ammonia, hydrogen cyanide, and hydrogen sulfide. It also produces energy in the absence of ATP. This scenario shows that a native and strongly reducing primitive atmosphere may not have been needed; the iron cycle produced the necessary reduced ingredients in solution and these could have combined to form amino acids and nitrogenous bases to make up the prebiotic broth. On the other hand, if the primitive atmosphere was reducing, the iron cycle could have provided energy to achieve reactions involving precursor compounds made in the atmosphere. Therefore the iron cycle and the formation of organic molecules in the atmosphere may have been synergistic.
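The electron bookkeeping behind the iron cycle can be written out explicitly. The block below is only a sketch of the overall stoichiometry, not of the photochemical mechanism. The half-reactions and the eight-electron count for reducing CO2 all the way to methane are ordinary redox arithmetic added here for illustration; they are not equations quoted from de Duve.

```latex
\begin{align*}
\text{oxidation:}\quad & 2\,\mathrm{Fe^{2+}} \longrightarrow 2\,\mathrm{Fe^{3+}} + 2\,e^- \\
\text{reduction:}\quad & 2\,\mathrm{H^+} + 2\,e^- \longrightarrow \mathrm{H_2} \\
\text{sum:}\quad & 2\,\mathrm{Fe^{2+}} + 2\,\mathrm{H^+}
   \xrightarrow{\ \mathrm{UV}\ } 2\,\mathrm{Fe^{3+}} + \mathrm{H_2} \\[6pt]
\text{CO}_2\text{ to methane:}\quad & \mathrm{CO_2} + 8\,\mathrm{Fe^{2+}} + 8\,\mathrm{H^+}
   \longrightarrow \mathrm{CH_4} + 8\,\mathrm{Fe^{3+}} + 2\,\mathrm{H_2O}
\end{align*}
```

Each Fe2+ ion gives up a single electron, so making one H2 molecule takes two Fe2+ ions, whereas fully reducing one CO2 molecule to CH4 takes eight. Atoms and charges balance on both sides of every line (for the last one, a charge of +24 on each side).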

Figure 7. How the oxidation of iron ions in water irradiated by ultraviolet (UV) light produces hydrogen gas that escapes into the atmosphere. In this process, reduced iron Fe2+ is oxidized into Fe3+, which reacts with water to produce insoluble magnetite. The electrons lost by Fe2+ in the oxidation reaction are captured by H+ ions, always present in water, to give neutral hydrogen atoms that quickly combine to make molecular hydrogen gas, H2. (Source: adapted from de Duve, C. 1991. Blueprint for a Cell. Burlington, N.C.: Neil Patterson.)

Let us now examine how the iron cycle and its energy production could have been taken over by reactions that approximate living mechanisms (which the iron cycle does not). We saw earlier that the universal energy donor in living cells is ATP. Remember that it is the cleavage of one or two phosphate groups from ATP that generates energy usable by other reactions. Could the prebiotic broth have stored energy in compounds containing phosphorus but no adenine and no ribose (assuming these two were either absent or present in very low concentrations)? Yes, and the following reactions explain how.

Figure 8. The iron cycle. As Fe2+ is being oxidized into Fe3+, CO2 dissolved in water is reduced into ═CH2 compounds (hydrocarbons) that can be used as organic building blocks. The reoxidation of some of these ═CH2 compounds into CO2, coupled with the reduction of Fe3+ back to Fe2+, produces energy that can be used to drive other prebiotic reactions. The cycle is catalyzed by UV light. (Source: adapted from de Duve, C. 1991. Blueprint for a Cell. Burlington, N.C.: Neil Patterson.)

We must first assume that a class of molecules called carboxylic acids existed in the prebiotic broth. These organic acids have the general formula R—COOH, where R is a group that may be simply a hydrogen atom, or it may be a methyl group (—CH3) or a more complicated group of atoms. For example, HCOOH is formic acid (produced by some species of ants) and CH3COOH is acetic acid, that is, vinegar. It turns out that carboxylic acids are made abundantly in a sparked mixture of methane, water, hydrogen, and ammonia. They are also found, along with amino acids, in some meteorites.

Then we must assume that another class of molecules, called thiols, also existed in the prebiotic broth. Thiol derives from the Greek word for sulfur. The general formula of a thiol is R′—SH, where R′ can be H (in which case the thiol is simply H2S, hydrogen sulfide) or a group containing reduced carbon, such as —CH3, to make methyl thiol. We have seen that H2S was probably originally present in the atmosphere of prebiotic Earth and if not, it was produced by hydrothermal vents. It is also soluble in water. It could thus have easily reacted in solution, under the action of UV light, with reduced carbon formed from carbon dioxide through the iron cycle, to produce a variety of organic thiols.

Next, the carboxylic acids could have reacted with the thiols to form a class of compounds named thioesters:

R—COOH + R′—SH → R′—S~CO—R + H2O

The ~ sign in the thioester formula represents an energy-rich chemical bond. The energy necessary to form that thioester bond could have come from the iron cycle, or else this reaction could have occurred spontaneously at high temperature and high acidity (perhaps as in hydrothermal vents?).

Finally, we must assume that phosphate (H2PO4–) was present. Thus thioesters could have reacted with phosphate in the following way:

R′—S~CO—R + H2PO4– → R′—SH + R—CO—O~P—HO3–

and

R—CO—O~P—HO3– + H2PO4– → R—COOH + H2P2O72–,

where H2P2O72– is pyrophosphate. And finally,

H2P2O72– + H2O → 2 H2PO4–,

thereby regenerating phosphate. Pyrophosphate, like ATP, is an energy-rich, phosphate-containing molecule, and it could have played a role in protometabolism in the absence of adenine and ribose. What is also remarkable about this set of reactions is that the carboxylic acid (R—COOH) and the thiol (R′— SH) are regenerated and can thus enter a new cycle of energy-rich pyrophosphate production. Taken together, the iron cycle and thioesters provide an interesting scenario for energy transactions in the prebiotic broth. The author of this scenario, de Duve, has called it the “iron-thioester world.” Needless to say, not everyone agrees with this scheme; particularly uncomfortable with it are those who think that protometabolism is a consequence of the appearance of genetic information, not the other way around. Nevertheless, the iron-sulfur world has more to offer. Having tackled the energy problem, proponents of this view also think that protein enzymes may have appeared without genetic information. Proteins are made by the chemical linking of amino acids. This linking does not occur spontaneously and is performed in cells through the mechanism of translation, involving not only the amino acids but also ribosomes and transfer RNAs. Assuming that ribosomes and transfer RNAs did not exist in the iron-sulfur world, how could proteins have formed? Again, thioesters come to the rescue. As their name indicates, amino acids are acids. Not only that, they are also carboxylic acids containing the —COOH group that we encountered before. Thus amino acids could have reacted with thiols (R'—SH) to form thioesters, too. Furthermore, it has been demonstrated that amino acid thioesters can polymerize (make long chains of linked amino acids) spontaneously, without ribosomes and tRNAs as in present-day living cells! Therefore the prebiotic broth may have contained a large, random set of different proteins, all potentially composed of many different amino acid sequences. 
And these proteins, much like today’s proteins made by living organisms, could have possessed enzymatic activity. These “proto-enzymes” could have progressively synthesized the building blocks of nucleic acids (the nitrogenous bases), and they even could have diversified the pool of amino acids to form other types of protoenzymes. These many types of protoenzymes could then have been able to catalyze the synthesis of RNA or DNA—that is, genes. At this point, the prebiotic broth would have become genetically informational.
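Returning to the energy side of the iron-thioester scheme, adding up the thioester and phosphate reactions given earlier shows where the energy ends up. The summation below is a bookkeeping sketch: the "net" line is our own addition, obtained by cancelling species that appear on both sides, and the ~ symbol marks the energy-rich bonds as in the text.

```latex
\begin{align*}
\mathrm{RCOOH} + \mathrm{R'SH}
  &\longrightarrow \mathrm{R'S{\sim}COR} + \mathrm{H_2O} \\
\mathrm{R'S{\sim}COR} + \mathrm{H_2PO_4^-}
  &\longrightarrow \mathrm{R'SH} + \mathrm{RCO{-}O{\sim}PO_3H^-} \\
\mathrm{RCO{-}O{\sim}PO_3H^-} + \mathrm{H_2PO_4^-}
  &\longrightarrow \mathrm{RCOOH} + \mathrm{H_2P_2O_7^{2-}} \\[4pt]
\text{net:}\qquad 2\,\mathrm{H_2PO_4^-}
  &\longrightarrow \mathrm{H_2P_2O_7^{2-}} + \mathrm{H_2O}
\end{align*}
```

The carboxylic acid and the thiol cancel out of the net line, which is the formal way of saying that they are regenerated. What remains is the condensation of two phosphates into pyrophosphate, so the energy invested in forming the thioester bond is carried over into the pyrophosphate bond.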


There are, of course, problems with this view. For example, we do not know whether random chains formed from amino acid thioesters have any kind of significant and relevant enzymatic activity. This must be tested in the laboratory. Then it is unclear whether these putative protoenzymes (with an assumed catalytic activity) would have been able to coordinate their activities to produce any significant amount of genetic material, RNA or DNA. Thus the iron-sulfur world is an interesting, falsifiable hypothesis that must be buttressed by a considerable amount of lab work. Nevertheless, this theory is attractive because it does provide a possible path to RNA or DNA.

Another variation on the importance of sulfur-containing compounds in putative prebiotic chemistry is the recently reported formation of very short proteins (called oligopeptides, figure 9) mediated by carbonyl sulfide (SCO). In the presence of metal ions, this gas reacts spontaneously with amino acids dissolved in water to form chains of amino acids containing up to four units. Interestingly, carbonyl sulfide is emitted by both volcanoes and hydrothermal vents. At this point, it is not known whether these particular oligopeptides have any kind of enzymatic activity. But remarkably, other researchers have shown that some oligopeptides containing only two amino acids can catalyze the synthesis of certain sugars. What is more, these sugars are predominantly right-handed and thus have the chirality of sugars present in living cells!

The Wächtershäuser model for the origin of protometabolism also relies on iron and sulfur chemistry. This scientist (who is also a patent lawyer) does away altogether with the notion of primordial broth. For him, Earth did not need a helping hand from space, lightning, or a reducing atmosphere; all necessary ingredients were present from the beginning in volcanoes and hydrothermal vents.
He considers that only CO2, CO, H2S (present today in volcanic emissions and hydrothermal vents), and FeS (common in Earth’s crust) were needed to get protometabolism started. This is how it would have worked. First, as in the de Duve model, a source of energy and electrons (to achieve reduction reactions) is necessary. This source could have been the reaction between iron sulfide and hydrogen sulfide:

FeS + H2S → FeS2 + 2 e– + 2 H+,

where FeS2 is pyrite. In addition to releasing electrons for reduction, this reaction also releases energy. The mineral pyrite formed in this reaction has interesting properties, in that it can strongly bind to all sorts of electrically charged molecules. This, according to Wächtershäuser, would have allowed organic molecules to line up in close proximity on the mineral’s surface and undergo further chemical reactions, a protometabolism of sorts. For example, the universal metabolite pyruvic acid could actually be formed in a sequence of three reactions:

CO2 + 3FeS + 4H2S → CH3—SH + 3FeS2 + 2H2O
CH3—SH + CO → CH3—CO—SH
CH3—CO—SH + CO2 + FeS → CH3—CO—COOH + FeS2,

where CH3—CO—COOH is pyruvic acid. Other protometabolic reactions would have occurred in a similar fashion, including the synthesis of amino acids and nucleotides, necessary to make proteins and RNA, respectively. In this scenario, there is no need for thioesters (as in de Duve’s model). Nor is there need for organic material to be delivered from outer space or from a reducing atmosphere. On the other hand, this model provides no clue regarding the functions of putative protoenzymes and first nucleic acids.

Finally, the American biochemist Sydney Fox proposed that protein enzymes could have been produced on prebiotic Earth in the absence of genetic information. For this, he first demonstrated that amino acids analogous to those found in a Miller-type of experiment could also be synthesized by simply heating aqueous solutions of formaldehyde and ammonia.
These two compounds can be formed under a variety of plausible prebiotic scenarios. Next, he demonstrated that heating dry mixtures of amino acids leads to the production of polymers containing up to several hundred chemically linked amino acids, thus mimicking modern proteins. Fox used dry heat in these experiments for the following reason: the linking of amino acids to produce protein polymers cannot happen in water because this reaction is accompanied by the elimination of a water molecule. Thus in an aqueous environment, the water molecules present in the solution favor the reverse reaction—that is, the destruction of the bonds linking amino acids.
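Fox's argument for dry heat can be made explicit with the condensation equilibrium for two amino acids. This is a textbook-style sketch added for illustration: the side chains R1 and R2 and the labels aa1 and aa2 are placeholder symbols, and K is simply the mass-action constant of the reaction as written.

```latex
\begin{gather*}
\mathrm{H_2N{-}CHR_1{-}COOH} + \mathrm{H_2N{-}CHR_2{-}COOH}
  \;\rightleftharpoons\;
  \mathrm{H_2N{-}CHR_1{-}CO{-}NH{-}CHR_2{-}COOH} + \mathrm{H_2O} \\[6pt]
K = \frac{[\text{dipeptide}]\,[\mathrm{H_2O}]}{[\text{aa}_1]\,[\text{aa}_2]}
  \quad\Longrightarrow\quad
  [\text{dipeptide}] = K\,\frac{[\text{aa}_1]\,[\text{aa}_2]}{[\mathrm{H_2O}]}
\end{gather*}
```

Because water appears as a product, a large excess of water pushes the equilibrium back toward free amino acids, while removing water (as on a hot, dry surface) pulls it toward the peptide.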


Figure 9. How carbonyl sulfide makes short peptides by reacting with amino acids.

Fox reasoned that some of the primeval aqueous broth with its amino acids could have been splashed by the wind or waves onto hot rocks, where the water evaporated and the amino acids started polymerizing. This effect would have created a large variety of randomly formed protein-like molecules. Some of these molecules could have been endowed with enzyme activity and could have started a kind of protometabolism, as assumed by de Duve in the case of thioester enzymes. The question then is, do protein-like polymers “à la Fox” made in the laboratory possess any kind of enzyme activity? The answer is yes, they do, albeit weakly. Furthermore, Fox’s proteinoids, as they are called, can spontaneously form microscopic spheres in which molecules that will be acted upon by proteinoids can be trapped. Thus these microspheres could be seen as primitive cells, possibly performing some protometabolic functions. The main line of criticism aimed at these results and interpretations is that such microspheres, even if they were formed and persisted (they are quite unstable and fragile under laboratory conditions), could not have evolved because they contain no genetic information. Of course, one could always retort that proteinoid microspheres could have become capable of synthesizing random nucleic acids that were themselves able to evolve. Fox died in 1998 and, to my knowledge, work on proteinoids has stopped.
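The reaction schemes quoted in this chapter can be checked for conservation of atoms and charge with a few lines of code. The sketch below is illustrative only: the function names (`atoms`, `totals`, `is_balanced`) and the `REACTIONS` table are our own, formulas are written without parentheses, and the electron is entered as the pseudo-species "e-" with no atoms and a charge of -1.

```python
import re
from collections import Counter


def atoms(formula):
    """Count atoms in a flat formula such as 'CH3COSH' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def totals(side):
    """Sum atoms and charge over (coefficient, formula, charge) triples."""
    atom_sum, charge_sum = Counter(), 0
    for coeff, formula, charge in side:
        for element, n in atoms(formula).items():
            atom_sum[element] += coeff * n
        charge_sum += coeff * charge
    return atom_sum, charge_sum


def is_balanced(reactants, products):
    """True when both sides carry the same atoms and the same net charge."""
    return totals(reactants) == totals(products)


# Each entry maps a label to (reactants, products).
REACTIONS = {
    "pyrite formation": (
        [(1, "FeS", 0), (1, "H2S", 0)],
        [(1, "FeS2", 0), (2, "e-", -1), (2, "H", +1)],
    ),
    "thioacetic acid from methyl thiol": (
        [(1, "CH3SH", 0), (1, "CO", 0)],
        [(1, "CH3COSH", 0)],
    ),
    "pyruvic acid from thioacetic acid": (
        [(1, "CH3COSH", 0), (1, "CO2", 0), (1, "FeS", 0)],
        [(1, "CH3COCOOH", 0), (1, "FeS2", 0)],
    ),
    "pyrophosphate hydrolysis": (
        [(1, "H2P2O7", -2), (1, "H2O", 0)],
        [(2, "H2PO4", -1)],
    ),
}

for name, (lhs, rhs) in REACTIONS.items():
    verdict = "balanced" if is_balanced(lhs, rhs) else "NOT balanced"
    print(f"{name}: {verdict}")
```

A check like this says nothing about whether a reaction is energetically favorable or kinetically feasible; it only confirms that no atoms or charges are lost in the way the equation is written, which is a useful first sanity test for any proposed prebiotic pathway.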


CONCLUDING REMARKS

We do not know how the building blocks of life appeared on Earth. They may have originated from organic material present in interstellar clouds, from meteorites and comets, from hydrothermal vents, from a reducing atmosphere, from all four, or from sources we have not yet imagined. Whether proteins antedated nucleic acids in an iron-sulfur world or whether an RNA world gave birth to the first proteins is equally unknown. Perhaps even the iron-sulfur and RNA worlds cooperated to get life started. We have several scenarios but we do not know which one or ones prevailed. What this chapter has shown, however, is that scientists do not suffer from a lack of imagination. The hypotheses presented here are testable, and new discoveries will continue to provide material for further research and hypothesis formulation.

CONTROVERSIES

The existence of opposing, even sometimes conflicting scientific theories is a sign of health in science, not one of weakness. Controversy is the gist of science. Günter Wächtershäuser, the proponent of the iron-sulfur world, is known to have said: “Without competing theories, you do not have science, you have religion.” He is of course right. People opposed to a scientific explanation for the origin of life point out that science has not yet reached a consensus regarding this matter. They are right too. However, is a dogmatic explanation based on some creation myth in any religion a better alternative? From a scientific standpoint, there potentially exists an infinite number of religious explanations, the existing ones all based on unverifiable revelations, usually described in very ancient texts, whose perpetuation was ensured by a powerful clergy and societal pressure. These revelations cannot withstand scientific enquiry because a dogma is by definition impervious to critical thinking. On the contrary, science offers free and unbiased analysis of the natural world. It is then not surprising that scientific free thinkers have come up with a variety of scenarios for the origin of life. The major difference between dogma and science is that experiments and observations will ultimately weed out hypotheses and theories that do not conform to reality. This shows the great strength of the human mind in its ability to apply logical thinking to the natural world.

FURTHER RESEARCH

The quest for organic molecules, in particular those of prebiotic nature, away from planet Earth continues. One of the missions of the Curiosity Mars rover, which landed in August 2012, is to look for organic compounds that are potential biosignatures of extinct or current lifeforms. At the time of this writing, NASA has not reported that any organic molecules have been found. Further, missions to Europa (one of the satellites of Jupiter), believed to harbor an ocean of liquid water underneath a thick ice crust, and potentially hosting prebiotics or possibly life, are being planned. Meanwhile, here on Earth, the chemical analysis of water deep inside hydrothermal vents would help determine whether any kind of prebiotic organic chemistry is taking place there.

DISCUSSION QUESTIONS

1. On what assumptions was Miller’s experiment based?
2. What kinds of organic compounds were synthesized in Miller’s experiment?
3. What are some of the major objections raised against the results obtained by Miller?
4. What is chirality and why is it a problem regarding the origin of organic compounds?
5. What is a possible role for iron on prebiotic Earth?
6. In the absence of ATP, how could chemical energy have been produced on prebiotic Earth?
7. What is a possible role for sulfur on prebiotic Earth?


REFERENCES AND WEBSITES

Bernstein, M. P., S. A. Sandford, and L. J. Allamandola. 1999. Life’s far-flung raw materials. Scientific American 281:42–49.
Brack, A. 1998. The Molecular Origins of Life. Cambridge, England: Cambridge University Press.
de Duve, C. 1991. Blueprint for a Cell: The Nature and Origin of Life. Burlington, N.C.: Neil Patterson.
de Duve, C. 1995. Vital Dust: Life as a Cosmic Imperative. New York: Basic Books.
Hazen, R. M. 2001. Life’s rocky start. Scientific American 284:77–85.
Huber, C., and G. Wächtershäuser. 2006. α-Hydroxy and α-amino acids under possible Hadean, volcanic origin-of-life conditions. Science 314:630–632.
Leman, L., L. Orgel, and M. R. Ghadiri. 2004. Carbonyl sulfide-mediated prebiotic formation of peptides. Science 306:283–286.
Maynard Smith, J., and E. Szathmáry. 1997. The Major Transitions in Evolution. Oxford, England: Oxford University Press.
Miller, S. L. 1953. A production of amino acids under possible primitive Earth conditions. Science 117:528–529.
Tian, F., O. B. Toon, A. A. Pavlov, and H. De Sterck. 2005. A hydrogen-rich early Earth atmosphere. Science 308:1014–1017.

This site is an article by Günter Wächtershäuser, a strong proponent of the “metabolism first” model for the origin of life: ajdubre.tripod.com/Sci-Read-0/y-OriginLife-82500/OriginLifeSci-82500.html

This site tells the story of the Miller experiment: exobio.ucsd.edu/birthday_70.htm

These lecture notes are based on revised chapter 4 of “Origins of Life and the Universe”, Columbia University Press, 2003, by P. F. Lurquin.


Christian de DUVE (1917–2013)

Christian de Duve was born in 1917 in Thames Ditton, near London, England. His parents were Belgian refugees from World War I who had fled their country when it was occupied by German troops. De Duve and his family returned to Belgium in 1920 to live in the port city of Antwerp. Christian de Duve entered medical school at the University of Louvain, Belgium, in 1934, graduating in 1941. De Duve was introduced to research in biochemistry during his medical studies and quickly understood that his destiny lay in basic research, not the practice of medicine. Unfortunately, de Duve graduated during World War II, with the Germans occupying Belgium again, meaning that supplies for research were practically impossible to come by. Therefore, de Duve decided to study chemistry at Louvain, graduating with a BS degree near the end of the war. He then did postdoctoral research in Sweden and at Washington University in St. Louis. In 1947 he was offered a faculty position at the University of Louvain, where he stayed until his retirement. In 1962 he was also offered a professorship at Rockefeller University in New York, where he spent about six months per year.

For several years, de Duve’s research was centered on the metabolic effects of insulin. But then, totally serendipitously, he discovered two new categories of cytoplasmic organelles, the lysosomes and the peroxisomes. As a result, he completely dropped his insulin project, concentrating instead on these two classes of organelles. For this work, he was awarded the Nobel Prize in 1974. Lysosomes are cytoplasmic vesicles that contain digestive enzymes that degrade cellular waste products. They are also responsible for the process of phagocytosis, by which cells pick up food from the outside medium, as well as bacteria, which are then destroyed inside the lysosomes. Peroxisomes are also cytoplasmic vesicles; they destroy hydrogen peroxide made by some metabolic reactions. They also play a role in the breakdown of fatty acids.
It was only late in his career that de Duve started concentrating on the origin of life. This is not surprising, and this phenomenon actually holds true for a number of scientists interested in this problem. Why? In my opinion, until recently, funding for research in this area and credibility of the whole field were major hurdles. Even though Stanley Miller had published his astonishing results as early as 1953, it was not until much later that origin-of-life research became recognized as a genuine field of investigation. Even today, origin-of-life research is almost always part of a larger research program.


CHAPTER 2

The RNA World

“What impresses me most,” he continued, “is that everything comes from one single cell. Several million years ago a little seed appeared which split in two, and as time passed, this little seed changed into elephants and apple trees, raspberries and orangutans. Do you follow me, Hans Thomas?” —JOSTEIN GAARDER, The Solitaire Mystery (1996)

Not everyone is buying the iron-sulfur world hypothesis and its haphazard synthesis of protein catalysts in the total absence of genetic information (see topic #1). This skepticism also applies to Sydney Fox’s theory (also topic #1). Many scientists (possibly a majority) prefer to think that genetic information came first and that proteins followed suit, once some kind of transcription/translation mechanism (see below) had evolved. For this to have happened, informational macromolecules such as DNA or RNA must somehow have been synthesized in the absence of protein enzymes.

GENETIC INFORMATION FIRST: THE RNA WORLD

A second school of thought proposes that before pre-metabolism and simple proteins played important roles, populations of RNA molecules started an evolutionary process that would lead to complex life. In the late 1960s, three scientists, Carl Woese of the United States, Leslie Orgel of England (but working in San Diego), and Nobel laureate Francis Crick, also of England, independently proposed that RNA, not DNA, could have been the first genetic material. There are four reasons for this. First, many viruses possess an RNA genome. Second, the synthesis of DNA building blocks, deoxyribonucleotides (consisting of nitrogenous bases linked to the sugar deoxyribose and to phosphate groups), proceeds via ribonucleotide (the building blocks of RNA) intermediates. Third, DNA replication in cells is “primed” by short stretches of RNA. And fourth, to be expressed, DNA genes must first be transcribed into RNA molecules, which are then decoded by transfer RNAs on a ribosomal matrix composed of 50 percent RNA. These observations suggested that RNA may be more ancestral than DNA and may have constituted the very first genes. To recall, RNA is a polymer consisting of purine and pyrimidine bases linked to ribose moieties that are themselves linked by phosphodiester bonds (figure 1).

This hypothesis raises several questions. The first one pertains to the likelihood of synthesizing the building blocks of RNA, the ribonucleotides, made of the bases A, U, G, and C, the sugar ribose, and three phosphate groups. The second question deals with linking these blocks together to form RNA chains, and the third problem is how to make these RNA chains replicate (multiply) to keep them going so that they evolve into genuine living cells. Today, all three mechanisms involve protein enzymes.
In an RNA world, there were no such enzymes, and it must thus be assumed that ribonucleotide synthesis, polymerization, and RNA replication either occurred spontaneously or were catalyzed by non-enzyme catalysts. We have seen in Chapter 1 that the four bases could have been formed from hydrogen cyanide or other carbon- and nitrogen-containing molecules under reducing conditions. Alternatively, these bases could have been brought to Earth by meteorites and comets (although cytosine has never been found in meteorites). The sugar ribose could have been formed from the polymerization of formaldehyde, also presumably abundant in the prebiotic broth, as it is in space. The problem here is that the polymerization of formaldehyde produces many other types of sugars besides ribose, and these other sugars also have a propensity to react with the four bases.

Some authors have recently proposed that in the beginning, nucleic acids other than RNA may have been more prevalent. These include molecules formed with sugars containing six carbon atoms instead of the five found in ribose, or a four-carbon sugar (called threose, hence the abbreviation TNA for the nucleic acids containing it) with a structure slightly different from that of ribose. Thus it is not impossible that originally there was a whole family of nucleic acids whose members contained different sugars, with RNA finally taking over for unknown reasons. In this scenario, there would have existed a pre-RNA world before genuine RNA appeared. According to some, phosphorus could have originated from organic phosphates found in meteorites. Another recently proposed pre-RNA molecule would have been a peptide nucleic acid (called a PNA), a very robust nucleic acid-like molecule that can be made in the laboratory in which the bases are linked by peptide bonds, not bonds containing phosphorus. In addition, ribose is not needed to make a PNA (figure 2).

Figure 1. A short piece of single-stranded RNA.

In a sense, a peptide nucleic acid is a kind of hybrid between a protein and a nucleic acid. Indeed, peptide bonds of the form –CO-NH- are used to link amino acids in proteins but are never observed in natural nucleic acids. Thus, it is not known whether RNA was really the first informational molecule on prebiotic Earth.

So far, linking the four RNA bases to sugars in the laboratory, under plausible prebiotic conditions, has been successful only with A and G. Such reactions with U and C have not been observed. This is definitely a nagging problem, because it is hard to visualize RNA synthesis without the existence of ribo-U and ribo-C. Can these compounds be formed in the presence of special, and as yet undiscovered, catalysts? Were TNAs and PNAs involved? We simply do not know. However, Stanley Miller and colleagues have proposed that in a pre-RNA world, an alternative base, urazole, could have been used instead of U. Indeed, urazole reacts spontaneously with ribose (which U does not) and, like U, can pair up with A. Also, urazole can be made under plausible prebiotic conditions.

On the other hand, adding phosphate groups to preformed combinations of bases and ribose has been achieved with all four bases. The problem here is that many of the base-sugar-phosphate combinations (called nucleotides) have structures that would not allow polymerization into RNA molecules. Clearly, much remains to be done to elucidate the mechanisms by which correct structures were formed in prevalent amounts. One possibility is that mineral catalysts favored the formation of correct molecular structures, but such catalysts have not yet been identified. Alternatively, it might be imagined that the correct molecules were formed with the help of protoenzymes and pyrophosphate previously made in an iron-sulfur world.

Figure 2. A single strand of DNA (left) and PNA.

Also, sugars found in modern nucleic acids pose a handedness problem, just as seen with amino acids. The sugars present in RNA and DNA are right-handed. However, prebiotic sugars must have been made in the right- and left-handed configurations in equal amounts. We do not know how right-handed sugars were eventually selected for the synthesis of nucleic acids.

Nevertheless, assuming that pools of nucleotides with the correct structure came into existence, it is now necessary to link these nucleotides together to form RNA chains. Interestingly, this has been achieved in the laboratory by simply incubating nucleotides in the presence of minerals such as lead salts, uranium salts, zinc salts, or even clay. RNA chains consisting of up to fifty bases linked together were synthesized on a clay (montmorillonite) substrate. Quite certainly, these findings make plausible the synthesis of RNA chains under prebiotic conditions.

But then there remains the problem of replicating these RNA molecules. Without replication, the genetic information present in an RNA molecule will be lost as soon as this molecule degrades. Furthermore, without errors in the replication process, RNA molecules once formed cannot change; they cannot evolve. Therefore, ancient RNA should have been able to self-replicate and the replication mechanism should have been error-prone to allow for genetic diversity to exist and selection of the best replicators to operate. Perhaps surprisingly, RNA replication in the absence of enzymes may not have been as daunting a challenge as was once thought. Clues to the prebiotic copying of RNA molecules were provided by the discovery of ribozymes, RNA enzymes. Until 1983, the notion that all biological catalysts were protein enzymes was firmly entrenched. That year, Thomas Cech and Sidney Altman independently discovered that this dogma was wrong. They both received the Nobel Prize for this discovery.
Their findings unexpectedly showed that RNA too possesses catalytic, enzymatic activity, hence the name ribozyme (a combination of the words ribonucleic acid and enzyme). What kind of enzymatic activities are displayed by RNA molecules? First, there are conditions under which living cells “cut” some of their RNA molecules by cleaving their ribose-phosphate backbone at predetermined positions (figure 3). This cutting is done by the RNA itself, with no help whatsoever from protein enzymes.


Figure 3. A ribozyme cutting another RNA molecule. A natural ribozyme, called the hammerhead ribozyme (left), binds to a target RNA through base pair formation and cuts the sugar-phosphate backbone of the target RNA. (Source: http://helicase.pbworks.com/w/page/17605685/Lindsey-Jordan)

Next, it was discovered that ribozymes could actually copy themselves and thus replicate, again, in the total absence of protein enzymes! It must be cautioned, however, that only short RNA molecules, a few dozen nucleotides in length, can be replicated that way in the laboratory. What is more, it has not yet been demonstrated that this replication activity is a true replicase. Indeed, a true replicase can not only “read” an RNA template (this is the activity already demonstrated), it should also be able to “read” the product of the first reaction in order to reproduce the original template. This is the activity that has not yet been observed. Further, it was recently discovered that ribosomal RNA (present in the ribosomes, where protein synthesis takes place in cells) is responsible for the polymerization of amino acids into proteins. In other words, it is not a ribosomal protein enzyme that links amino acids together to make protein chains; ribosomal RNA itself does the job. Some ribozymes even have the ability to splice short RNA molecules together to produce longer ones. Also, some artificial ribozymes can spontaneously bind amino acids, thereby imitating transfer RNAs. Finally, RNA chains consisting of as few as twenty nucleotides have been shown to possess ribozyme activity. This number is well within the ability of clay to catalyze the synthesis of RNA chains. What is more, recent research has shown that some short RNA molecules can bind small organic molecules (other than amino acids) such as adenine or vitamins B1 and B12. The binding of these small molecules triggers ribozyme activity in these RNAs. 
These special ribozymes are called “ribo-switches.” It is not difficult to imagine such ribo-switches in an RNA world where ribozyme activity could have been modulated and regulated through the binding of small molecules. The discoveries that certain RNA molecules can cut themselves and other RNA molecules, that they can join RNA molecules together, that some ribozymes can replicate to some extent, and that certain RNA molecules can catalyze the formation of proteins have given credence to a putative RNA world. In this prebiotic world, primitive RNA genomes could replicate, become processed by cutting and splicing, and help make proteins that later could become genuine enzymes, taking over some of the ribozyme properties and creating an integrated metabolic circuitry. Before this could happen, RNA genomes had to evolve through replication and replication errors in order to generate genetic diversity, much as mutation and natural selection do in the world today. We will see next how this may have taken place. Science sometimes works in strange ways. One might think that basic principles, such as the origin of the building blocks of life, should be firmly established before starting a discussion of the next step. But this is not necessary if one simply assumes that these building blocks did appear somehow or other. Thus, I will assume below that the chemistry of the prebiotic broth synthesized RNA molecules capable of replication. Whether protoenzymes made in an iron-sulfur world were involved in the replication mechanism is not that relevant here, because this section deals with the evolution of the RNA world and its transformation into a world where cells, no longer isolated molecules, dominated. The focal point of this section is the problem of the evolution of the primitive genetic material into the sophisticated information storage and transfer mechanisms that now operate in living cells. 
Not surprisingly, some of the ideas developed by scientists interested in the origins of life derive directly from studies done with extant biological systems. One such system of particular interest is constituted by bacteriophages.


LESSONS FROM BACTERIOPHAGES

Experiments with modern bacterial viruses have paved the way to several hypotheses regarding the RNA world. Simply put, bacteriophages are viruses that infect and kill bacteria. Most bacteriophages have a simple structure, consisting of nucleic acid packaged in a coat made of protein. Many bacteriophages have a genome made of RNA that replicates many times in the infected host before its death occurs. The enzyme that makes possible the multiplication of a bacteriophage RNA genome is called an RNA replicase, and the gene that codes for this protein enzyme is present in the bacteriophage genome itself. To replicate this genome, the RNA replicase “reads” it; that is, it uses it as a template to link the relevant nucleotides together and form a new viral RNA chain. This process is repeated many times to produce a large number of copies of the invading bacteriophage RNA. It has been known for quite some time that RNA replicase is not a very precise enzyme; it makes mistakes at a fairly high rate by incorporating nucleotides at wrong positions in the newly formed RNA molecules. RNA replicase thus frequently produces mutant copies of the original invading bacteriophage RNA. In 1970, Sol Spiegelman of Columbia University published the results of an evolution experiment conducted in the test tube. He mixed RNA isolated and purified from bacteriophage Qβ (this RNA has a length of 4500 nucleotides), purified Qβ replicase, and all four nucleotides [adenosine-, guanosine-, cytidine-, and uridine triphosphates (ATP, GTP, CTP, and UTP)] to serve as building blocks for newly synthesized Qβ RNA molecules. It was known at the time that this simple test tube system worked very well and that copious amounts of Qβ RNA could be made this way. 
What was new, however, was the fact that Spiegelman did not wait until his test tube was full of newly made Qβ RNA; he did a number of serial transfers in which a drop from the first test tube was added after a short time to a second test tube containing the replicase and nucleotides but no new RNA. The process was repeated a third time, and so on. After a number of transfers, Spiegelman noticed that the RNA produced was no longer the 4500-base-long initial Qβ RNA; rather, the new RNA population consisted of a set of RNA molecules of approximately 500 bases, which he called variants. What had happened? Remembering that replicases are somewhat sloppy enzymes, these variants were produced when the replicase fell off the RNA template before fully completing its replication. These short RNAs replicated much faster than the full-length Qβ RNA simply because, as they were much shorter, they successfully competed with the longer RNA molecules for binding to the replicase. After enough transfers, there was nothing left of the original Qβ RNA; the test tube was now full of short variants. This truly was evolution in the test tube: the short variants outcompeted the long RNA because, being short, they possessed better fitness in the replication process.
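The competition described above can be illustrated with a toy calculation. The sketch below is not Spiegelman's protocol: it assumes, purely for illustration, that each RNA species grows exponentially between transfers at a rate inversely proportional to its length, and that each transfer carries the current proportions over to the next tube. The function name and all parameter values are hypothetical.

```python
import math

def serial_transfer(lengths, start_fractions, transfers,
                    growth_time=0.25, rate_constant=4500.0):
    """Return the population fractions before and after each transfer.

    Assumption (not taken from the experiment): a species of length n
    replicates at rate rate_constant / n, so shorter RNAs are copied faster.
    """
    fractions = list(start_fractions)
    history = [fractions[:]]
    for _ in range(transfers):
        # Exponential growth in the fresh tube, then renormalize to fractions.
        grown = [f * math.exp(rate_constant / n * growth_time)
                 for f, n in zip(fractions, lengths)]
        total = sum(grown)
        fractions = [g / total for g in grown]
        history.append(fractions[:])
    return history

# Full-length RNA (4500 bases) versus a rare short variant (500 bases).
history = serial_transfer([4500, 500], [0.999999, 0.000001], transfers=10)
for step, (full_length, variant) in enumerate(history):
    print(f"transfer {step:2d}: full-length {full_length:.6f}, variant {variant:.6f}")
```

Under these assumed rates, a variant that starts as one molecule in a million makes up more than 99 percent of the population after ten transfers, which is the qualitative outcome reported for the test tube.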

Figure 4. The sequence of the 221-nucleotide Qβ minivariant. This molecule is partially double helical owing to the fact that A can pair with U and G with C.


These experiments were taken a step further in the laboratory of Manfred Eigen of Germany, a Nobel laureate in chemistry. There researchers mixed Qβ replicase with the ATP, GTP, CTP, and UTP nucleotides but added no Qβ RNA template at all. To what must have been everyone’s surprise, RNA was synthesized! In other words, the replicase was able to make RNA without a template to copy. This also meant that a protein (the replicase) was able to produce a genetically informational macromolecule (RNA) without preexisting genetic information! Could this have happened in the prebiotic broth? We do not know, but it remains an intriguing possibility. What is more, the RNA molecules were not of just any kind. They formed a family of short molecules composed of 150 to 250 nucleotides (therefore very much shorter than genuine Qβ RNA), called minivariants. Surprisingly, a 221-nucleotide-long species (the “midivariant”), one of the minivariants, is also found in nature in bacteria infected by Qβ (figure 4). How is one to interpret all this information? First, the fact that a natural 221-nucleotide-long RNA was produced in the test tube indicated that this experiment was not completely off the wall. Next, this experiment also showed that a whole family of RNA molecules was made under these conditions, not just one single type of RNA. This is important because considerable genetic variation was generated in this system, thanks to the ability of the replicase to make mistakes in the replication process. Taken together, the results of Spiegelman and Eigen show how a simple system can produce variation, and that selective pressure (in the form of serial transfers) can select for a particular family of variants. In fact, something similar may have happened in a pure RNA world without protein enzymes. Indeed, it is known that RNA replication catalyzed by ribozymes is imperfect, and therefore the RNA world would also have been able to create RNA variants. 
In other words, the existence of RNA variants in the prebiotic world made evolution by natural selection possible, with the most fit RNA molecules outcompeting the less fit. In the beginning of a self-replicating RNA world, better fitness simply meant more efficient replication, and hence better ability to compete for “food,” the nucleotide building blocks of RNA.

QUASI-SPECIES AND HYPERCYCLES

The notion of replication error threshold helps us to understand how the RNA world could have evolved. This evolution may have depended on the ability of collections of RNA molecules to cooperate and become encapsulated inside membranes. Let us now consider a prebiotic world where a set of RNA molecules capable of replication (with or without protoenzymes) coexist in a pond with a given supply of nucleotides necessary for their replication. What is the fate of this population of RNA molecules? To tackle this problem, it is useful to think in terms of a quasi-species: a population of RNA molecules composed of individuals all possessing slightly different base sequences produced by replication errors. Or, to put it differently, a quasi-species is an ensemble of minivariants. These minivariants derive from what is called a master sequence, which is a single sequence representing the highest probability of finding a particular base at a particular position in that sequence. For example, the five short RNA molecules in the following list:

UCGUCCA
AAUUACG
ACAAAUG
ACGUGCG
ACGCACG

derive from the master sequence ACGUACG. In this example, it can be seen that the first minivariant has a U in first position, whereas the other four minivariants have an A in that position. Thus an A has the highest probability of figuring in that first position and is found there in the master sequence. The second minivariant has an A in second position and a U in third position. We can see that the other variants have a C in second position, and the third position is occupied by a G in three out of five variants. Hence, the most probable base in second position is a C, and in third position it is a G, and so on. Thus the most probable base at each position is the one found in the master sequence.
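The position-by-position reasoning used in this example can be written as a short program. This is only a sketch of the "most frequent base at each position" rule described in the text; the function name is hypothetical, and ties (which do not occur in this example) would be resolved arbitrarily.

```python
from collections import Counter

def master_sequence(variants):
    """Return the sequence made of the most frequent base at each position."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("all variants must have the same length")
    # zip(*variants) yields one tuple of bases per position (column).
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*variants))

variants = ["UCGUCCA", "AAUUACG", "ACAAAUG", "ACGUGCG", "ACGCACG"]
print(master_sequence(variants))
```

Note that the master sequence ACGUACG does not itself appear among the five variants; it is a statistical summary of the population.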

Let us now assume that the master sequence has the highest fitness and replicates faster than the variants surrounding it. Knowing that RNA replication in the prebiotic world was imperfect, it is legitimate to ask what a particular replication error rate will do to the persistence of a quasi-species and its master sequence. It can be shown that a replication error threshold is reached at a certain point, beyond which the master sequence practically disappears and the quasi-species formed around the master sequence changes dramatically. This phenomenon is illustrated in figure 5. Here, log n0/n represents the logarithm of the ratio between the master sequence and all other sequences, plotted as a function of the replication error rate (1 – Q), where Q is the fidelity of replication of each base. The value of Q varies between 0 and 1; if Q = 0, there is a 100 percent error rate at the level of that base, and if Q = 1, that base is faithfully replicated each time. Therefore the error rate per base per round of replication is (1 – Q). Figure 5 gives the example of a piece of replicating RNA 100 nucleotides long. At low replication error rates, the master sequence represents the vast majority of all the sequences present in the quasi-species. However, at a replication error rate of about 0.05 (on average 5 errors per round of replication for this 100-nucleotide RNA), the master sequence all but disappears and the quasi-species becomes randomized. (Remember that if n0/n = 1, with n0, the master sequence, representing 100% of the molecules in the quasi-species, log n0/n = 0.)
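The arithmetic behind the figure of 5 errors per round can be made explicit. The sketch below assumes that errors occur independently at each base with the same probability (1 – Q); that independence assumption and the function names are illustrative and are not part of the original argument.

```python
def expected_errors(length, error_rate):
    """Average number of miscopied bases per round of replication."""
    return length * error_rate

def error_free_copy_probability(length, error_rate):
    """Probability that every base is copied correctly, assuming
    independent errors with per-base fidelity Q = 1 - error_rate."""
    return (1.0 - error_rate) ** length

length = 100  # the 100-nucleotide RNA of figure 5
for error_rate in (0.001, 0.01, 0.05):
    print(f"1 - Q = {error_rate}: "
          f"{expected_errors(length, error_rate):.1f} errors per copy, "
          f"error-free copies: {error_free_copy_probability(length, error_rate):.4f}")
```

At an error rate of 0.05, fewer than one copy in a hundred is an exact replica of its template under this simple model, which gives an intuition for why the master sequence cannot be maintained.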

Figure 5. The replication error threshold. log n0/n is the logarithm of the ratio between the master sequence and all other sequences present in the quasi-species. (Source: Wikipedia).

It can further be demonstrated that N < ln s/(1 – Q), where N is the length (in number of bases) of the RNA and ln s is the natural logarithm of the fitness of the master sequence. The limiting curve, N = ln s/(1 – Q), is a hyperbola. The graph representing that function is shown in figure 6. In it the y-axis represents RNA length and the x-axis represents the error rate (1 – Q). The region of the graph where a quasi-species can exist stably is represented by the hatched area. Outside this area, demarcated by the hyperbola, a quasi-species will disappear because the replication error rate is too high. The hyperbola thus defines an error threshold beyond which quasi-species stop constituting “clouds” of variants surrounding a given master sequence. As can be seen from the graph, smaller RNA molecules tolerate higher error rates than large RNA molecules. The validity of this equation has been verified in living cells; for example, the error rate for bacteriophage Qβ is 5 × 10⁻⁴ for a length of 4500 bases, a value that puts it below the error threshold. This is why bacteriophage Qβ still exists today.
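The inequality N < ln s/(1 – Q) can be turned into a small calculator. The sketch below simply rearranges the formula given in the text; the function names are hypothetical, and the fitness values are illustrative because the notes do not state a value of s.

```python
import math

def max_length(fitness, error_rate):
    """Largest sustainable RNA length: N < ln(s) / (1 - Q)."""
    return math.log(fitness) / error_rate

def min_fitness(length, error_rate):
    """Smallest master-sequence fitness s that keeps an RNA of the given
    length below the error threshold: s > exp(N * (1 - Q))."""
    return math.exp(length * error_rate)

# Bacteriophage Qβ: 4500 bases, error rate 5e-4 per base per replication.
print(f"Qβ needs s > {min_fitness(4500, 5e-4):.2f}")

# Illustrative prebiotic case: an assumed 1% error rate and ln s = 1.
print(f"Maximum length at 1% error rate: {max_length(math.e, 0.01):.0f} bases")
```

With the Qβ numbers quoted above, the inequality is satisfied as long as the master sequence has a fitness advantage of roughly 9.5 or more. The second call shows how an assumed error rate of one percent leads to a ceiling near 100 nucleotides.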

Figure 6. Another representation of the replication error threshold. The hyperbola (thick line) represents the threshold, and the hatched area represents the region where quasi-species characterized by a genome size N and a certain error rate can survive. N0 is the lower limit for sufficient encoded information (50 to 100 nucleotides). The number (1 – q0) is a limit below which replication energy and time are prohibitive for a system. (Source: Adapted from Smith, J. M., and E. Szathmáry. 1995. The Major Transitions in Evolution. Oxford, England: Oxford University Press.)

These theoretical developments clearly show two things. First, considering that the replication error rate in the RNA world must have been very high, it can be calculated that the maximum size of the RNA molecules that could have been maintained under prebiotic conditions was about 100 nucleotides. This is a short length but still above that of modern transfer RNA molecules (seventy to eighty nucleotides). Second, RNA quasi-species displaying too high a replication error rate were doomed to extinction. This does not mean that all the RNA molecules present in this quasi-species necessarily disappeared; it means that sequences became randomized and no longer derived from a single master sequence. However, some of the RNA molecules produced under conditions exceeding the error threshold could have themselves established new, better-adapted quasi-species that replaced the extinct ones. In other words, the RNA world was able to evolve.

Figure 7. A simple hypercycle. The circular blue arrows are replicating quasi-species (as indicated by the fact that they are circular) that may also possess other ribozyme activity. In this scheme, each of the six quasi-species depends on the replicative success of the preceding one, as indicated by the green arrows. (Source: http://pespmc1.vub.ac.be/BIOLEXAM.html)

The next problem then is to understand how and why short prebiotic RNA molecules did not simply evolve into one gigantic quasi-species particularly fit to replicate but perhaps unable to perform any other function, such as coding for proteins. In this scenario, the “living” world today would still consist of that one successful RNA quasi-species. Eigen came up with the notion of hypercycles to solve this conundrum. Eigen’s group later demonstrated that the hypercycle model accounts very nicely for the life cycle of bacteriophages. The hypercycle is thus not just a fancy mathematical model.


In brief, a hypercycle is a circular feedback system of replicators (such as RNA molecules) in which each replicator depends on the success of the other replicators. A simple hypercycle is shown in figure 7. Basically, the stability of a hypercycle depends on the degree of cooperation contributed by each of its members. This whole concept is further complicated by the fact that each member of a hypercycle is a quasi-species. However, this complication has an advantage: hypercycles, too, can evolve. Let us imagine that one of the members of the hypercycle evolves into a quasi-species that not only is able to replicate itself faster but also helps the other members of the hypercycle to replicate faster. This is possible if one of the quasi-species mutates into an efficient replicating ribozyme that can replicate the other members of the hypercycle as efficiently as itself. In that case, this hypercycle will outcompete the other hypercycles. If, on the contrary, one of the quasi-species mutates into a “selfish” replicating ribozyme that replicates itself efficiently but somehow inhibits the replication of the other members, that hypercycle will disappear. Successful hypercycles would then have been those with the most versatility, in particular those that would have started coding for protein enzymes capable of improving the fidelity of RNA replication and the ability to make their own building blocks rather than simply relying on the prebiotic broth. Recalling the concept of fitness of organisms, we can see how this concept also applies to hypercycles consisting of molecules, not just cells or organisms. There is a problem with hypercycles, however. As they are made of free-floating RNA molecules presumably dispersed in water, it is hard to see how stable hypercycles could have existed in the absence of membranes containing them. 
For example, a rock falling into the pond would have disturbed any unbounded hypercycle, however successful it may have been before this accident. There was thus a need for hypercycle encapsulation. How did this happen? As we know, modern cell membranes are based on phospholipid bilayers. These are not produced in sparked gases and it is not known whether hydrothermal vents synthesize them. On the other hand, other amphiphilic molecules, such as octanoic acid (with eight carbon atoms) and nonanoic acid (with nine carbon atoms) are found in meteorites and could have been used to encapsulate hypercycles. Biochemists have recently demonstrated that octanoic acid, a carboxylic acid with the formula CH3(CH2)6COOH, potentially present on prebiotic Earth, can spontaneously form liposomes (microscopic lipid vesicles) when dispersed in water. Many laboratory experiments have shown that liposomes can trap nucleic acids (RNA and DNA) if they form in a solution containing them. Liposomes can also trap clay particles to which RNA is adsorbed (figure 8).

Figure 8. A single liposome containing microscopic clay particles (A). (B) Similar liposome with RNA adsorbed to trapped clay particle (bright yellow area). (Source: National Institutes of Health).

This phenomenon could have led to the formation of “protocells,” entities made of one or several hypercycles contained within a membrane. Interestingly, liposomes made of amphiphilic molecules containing from eight to fourteen carbon atoms are permeable to substances dissolved in water. Therefore, prebiotic liposomes composed of such amphiphiles and containing RNA would have been able to import the building blocks necessary for RNA replication. What is more, laboratory experiments have shown that vesicles made of octanoic acid can grow spontaneously by incorporating free octanoic acid present in the medium (see topic #3). This could have led to some primitive type of liposome division in the prebiotic world. Modern cell membranes are made of phospholipids containing sixteen to eighteen carbon atoms. Liposomes made from such phospholipids are highly impermeable to water-soluble compounds. However, some ribozymes containing as few as 44 nucleotides have been shown to considerably enhance the permeability to water-soluble compounds of liposomes made of long-chain phospholipids. These ribozymes were produced by in vitro evolution (see below under Further Research). Thus, a pre-cellular RNA world could have relied on this type of ribozyme activity when short amphiphilic molecules in membranes were replaced by long ones. Living cells contain proteins embedded in their phospholipid bilayers, and these are responsible for the import and export of nutrients and waste products. One can then assume that permeability-enhancing ribozymes were replaced by more efficient membrane proteins after the latter appeared.

ORIGIN OF THE GENETIC CODE: FROM THE RNA WORLD TO PROTEINS

The genetic code consists of 64 different codons. Where did it come from? How did the RNA world start coding for proteins? As we know, in modern cells the genetic information stored in DNA is first transcribed into messenger RNA molecules. This step was unnecessary in the RNA world because genes were made of RNA, not DNA. As we also know, mRNA molecules are next decoded by transfer RNA molecules to which an amino acid is attached. This process, called translation, results in the synthesis of proteins. In translation, the positioning of tRNAs along mRNA molecules is determined by sequences of three bases, called codons, that are read by the tRNA anticodons, which also contain three bases. Now, two important questions arise. First, how did the RNA world develop the ability to attach amino acids to tRNAs, and second, what is the origin of the genetic code that specifies which codon corresponds to which amino acid? In modern cells, amino acids are attached to tRNAs through the action of protein enzymes called aminoacyl-tRNA ligases. An extant tRNA is shown in figure 9. A protein ligase-catalyzed reaction was not possible in the RNA world, where proteins did not exist. Also, where do tRNA molecules come from? Manfred Eigen’s team has done much work in this area. Much like DNA, RNA can form double helical structures through specific interactions between A and U and G and C. Thus primitive tRNAs and primitive mRNAs could have interacted by double helix formation, as they do in modern cells. For example, a proto-tRNA with a CCG anticodon could form a very short double helix with a proto-mRNA molecule containing a GGC codon and lock the amino acid it carries into place. Therefore the decoding of the genetic message in mRNAs by tRNAs is simply an intrinsic property of RNA. It is less clear how tRNA molecules “learned” how to attach specific amino acids. 
Some modern tRNAs can bind amino acids through the formation of pockets in their structure. Indeed, RNA molecules should not be seen as flat strings of nucleotides; they can assume complicated three-dimensional configurations. Proto-tRNAs may have been able to trap amino acids in the same way. Furthermore, a new type of ribozyme activity has been discovered recently: some short RNA molecules synthesized in the test tube can chemically bind amino acids without the help of protein enzymes. This activity could have been common in the RNA world. Following up on his idea of quasi-species, Eigen asked whether tRNAs could once have existed as a set of molecules grouped around a master sequence. To put it differently, Eigen wondered whether it was possible to trace the lineage of modern tRNAs to an ancestral master sequence that existed billions of years ago. If so, this would add great credibility to the concept of an RNA world. To answer this question, Eigen and coworkers compared the base sequences of 200 modern tRNA molecules, isolated from humans, animals, plants, fungi, and bacteria, and applied to these sequences mathematical techniques that help build phylogenetic trees (family trees or trees of descent). And indeed, a master sequence was found! Figure 10 shows the sequence of “the mother of all tRNAs.” The fact that this sequence folds into the classical cloverleaf configuration of modern tRNAs adds great weight to the notion that tRNAs were (and still are) a quasi-species. The first time I saw this molecule, I could not help experiencing a slight shiver and a temporary mental blank. I was looking at a molecular fossil that existed at the dawn of life!


Figure 9. Amino acid attachment to a tRNA molecule. Left: Two-dimensional cloverleaf structure of a tRNA molecule. The amino acid binds to the end of the tRNA designated —OH. Right: Three-dimensional structure of the same tRNA. On the left, the anticodon of the tRNA is seen interacting with a GCA codon (coding for alanine) in the mRNA. 1, m1, Ψ, UH2; and mG are special bases found in tRNAs only. (Source: Adapted from Hartwell, L. H., L. Hood, M. L. Goldberg, A. E. Reynolds, L. M. Silver, and R. C. Veres. 2000. Genetics. New York: McGraw-Hill.)

Figure 10. The ancestral tRNA according to Eigen and coworkers. (Source: Adapted from Eigen, M., and R. Winkler-Oswatitsch. 1981. Transfer-RNA: The early adaptor. Naturwissenschaften 68:217-228.)

Next, it became necessary to tackle the origin of the genetic code, which, we know, is composed of sixty-four different codons determining twenty amino acids. It is highly unlikely that this complicated code appeared all at once. By taking a close look at the anticodon portion of the tRNA master sequence (the portion that reads the codon in the mRNA), Eigen and coworkers realized that this master anticodon was able to decode codons of the RNY type, where R is A or G, N is any base, and Y is C or U. In the prebiotic world, in the absence of stabilizing proteins, it would have been important for proto-tRNAs and proto-mRNAs to interact as strongly as possible during the decoding process. Knowing that the interactions between G and C in codon-anticodon binding are stronger than A-to-U interactions, the RNY codons must have been the following four:


GGC, which codes for glycine
GCC, which codes for alanine
GAC, which codes for aspartic acid
GUC, which codes for valine

In other words, the first genetic code would have been able to determine only four amino acids. Does this make sense? It certainly does when we consider that these are the four amino acids that are the most prevalent in a Miller-type experiment! This of course does not prove that Earth’s primeval atmosphere was strongly reducing (as required to make amino acids in the atmosphere) or that Eigen’s interpretation is correct. Yet, if this is a coincidence, it is a strange one. Thus the original genetic code could have been unidimensional, as shown in figure 11. How did it become three-dimensional? The mechanisms are not known, but we can assume that it first became two-dimensional by addition of an A in the first position and a C or a U in third position. Three-dimensionality was finally achieved by addition of more bases (see figure 11).
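The four codon assignments listed above can be checked against the modern code. The sketch below builds the standard genetic code from the usual compact 64-letter table (first, second, and third bases each in the order U, C, A, G; one-letter amino acid symbols; * for stop) and then extracts the RNY and GNC subsets. The variable and function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def is_rny(codon):
    """True if the codon is purine (A/G), any base, pyrimidine (C/U)."""
    return codon[0] in "AG" and codon[2] in "CU"

rny_codons = sorted(c for c in CODON_TABLE if is_rny(c))
gnc_codons = {c: CODON_TABLE[c] for c in rny_codons
              if c[0] == "G" and c[2] == "C"}

print(len(rny_codons), "RNY codons")
print(gnc_codons)  # G = glycine, A = alanine, D = aspartic acid, V = valine
```

Of the sixteen RNY codons, the four of the GNC form are the ones with G-C pairs at both outer positions, which is the stability argument made in the text.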

Figure 11. Evolution of the genetic code. The primitive unidimensional code shown in (a) evolved into another unidimensional code (b) with the addition of a second base (Y) that represents U or C. A two-dimensional code (c) then appeared by addition of an R base, representing either A or G. This code then gave birth to the present, three-dimensional code shown in (d). Amino acid abbreviations are as in figure 4.2 (with U replacing T). (Source: Adapted from Eigen, M., and R. Winkler-Oswatitsch. 1981. Transfer-RNA: An early gene? Naturwissenschaften 68:282–292.)

Scientists have wondered for a long time why the code is what it is. Why is it that AUG codes for methionine and GUC for valine and not the other way around? Francis Crick has called the genetic code a “frozen accident,” but others are not so sure. For example, RNY codons code for amino acids that, when linked in proteins, tend to form structures called β-sheets. In these structures, amino acids are lined up to form extended protein configurations rather than packed ones. It also turns out that phylogenetic studies indicate that proteins that contain many β-sheet segments tend to be very ancient. This supports the idea that RNY codons are also very ancient, as already suggested by Eigen’s group. But this also suggests that the first proteins coded for by the evolving RNA world (and rich in β-sheets) may have possessed greater fitness than proteins with other structures. If this was the case, natural selection would have played a role in the evolution of the code. Also, the code seems to have evolved to minimize translation errors. This is suggested by the fact that codons that differ by only one letter are often assigned to the same amino acid or to a closely related one. For example, CUU, CUC, CUA, and CUG all code for leucine. In addition, AUU codes for isoleucine and GUU codes for valine, two amino acids with chemical properties close to those of leucine. Thus, in the case of these codons, “reading” errors involving a single base would have had minimal effect on the properties of primitive proteins.

We have now reached a point where proto-tRNAs in the RNA world could bind amino acids and also decode the codons found in proto-mRNAs. This allowed the positioning of amino acids next to one another in very close proximity. To form a protein, one last step needed to be performed: the linking of adjacent amino acids via chemical bonds. As we saw earlier, modern ribosomal RNA is capable of linking together amino acids in a growing protein chain through ribozyme activity. This could also have happened in the RNA world through the action of proto-ribosomal RNA molecules. When all these activities became integrated, that is, as soon as hypercycles (consisting of membrane-bound proto-tRNAs, proto-ribosomal RNAs, and proto-mRNAs) became able to synthesize proteins, the first protocells were born. This was a giant leap in the direction of life as we know it.

Putting aside the complexity of the RNA world and its evolution, the following is a summary and tentative sequence of events that took place in it. Short proto-tRNAs and proto-mRNAs, formed randomly by the linking of nucleotides, exist as replicating quasi-species. They replicate via ribozyme activity and cooperate via formation of hypercycles. 
Proto-tRNAs capture the four most prevalent amino acids, also via ribozyme activity. The proto-tRNAs interact with proto-mRNAs via GNC codons. Proto-tRNA-proto-mRNA interactions through codon-anticodon binding bring amino acids in close contact. Ribozyme activity carried by proto-ribosomal RNA (also a quasi-species) links the amino acids together to make simple proteins with primitive enzymatic activity. Ribozyme activity splices together short proto-mRNAs to make longer proto-mRNAs with enhanced coding ability. At this point, or conceivably earlier, the hypercycles, consisting of proto-tRNAs, proto-mRNAs and proto-ribosomal RNAs, become encapsulated by membranes. Once this happens, evolution by natural selection gains full force, as the fittest protocells start proliferating faster than the less fit. Finally, the genetic code starts evolving as proto-enzymes catalyze the synthesis of amino acids other than the four initial ones. As a result, protein (enzyme) variability and versatility increase. At a certain point, RNA replicases, able to replicate RNA molecules with much more fidelity than ribozymes, are born. This higher fidelity allows the propagation of longer mRNA molecules, and diversity increases further. Life, in the form of RNA-containing protocells, is definitely on its way. As a final note, it should be stressed that the iron-sulfur world and the RNA world are not two completely incompatible models. In the iron-sulfur world, protoenzymes formed by spontaneous thioester condensation or participation of carbonyl sulfide would have helped the various steps just described. These protoenzymes would then have been taken over by proteins made through genuine, RNA-encoded proto-translation.

PROTOCELLS

At a certain point the encapsulated RNA world must have started its own metabolism, generating what we can call protocells. What would the very first protocells have looked like? First, since DNA did not exist yet, they must have been ribo-organisms with a genome made of RNA. Next, their metabolism must have been very simple. They may have been chemoautotrophs, organisms relying entirely on mineral nutrients found naturally in the environment. One possible scenario for their overall metabolism was that they reduced atmospheric carbon dioxide in the presence of hydrogen and oxidized iron sulfide into pyrite with the help of hydrogen sulfide:

4 CO2 + 7 H2 → (CH2COOH)2 + 4 H2O,

where (CH2COOH)2 is succinic acid, a compound that can participate in other oxidation-reduction reactions. Hydrogen was provided by the formation of pyrite:

FeS + H2S → FeS2 + H2,

where FeS is iron sulfide and FeS2 is pyrite. The reduction of carbon dioxide into succinic acid releases energy that could have been stored in ATP or pyrophosphate, or both, and used to drive other cellular reactions. Protocells evolved through mutations introduced into their genome by ribozyme replication errors and, later on, by RNA replicase-induced mistakes. It is also possible that gene duplication, that is, the formation of two copies of the same gene inside one cell, accompanied the appearance of protocells. We know that gene duplication is one of the main engines of evolution. Indeed, when an essential gene is duplicated, as long as one copy of this gene remains intact, the other copy can mutate without causing harm to the cell. As mutations accumulate in the second copy of the gene, this gene diverges more and more from its companion. At a certain point, a mutant duplicated gene can start coding for a protein different from that coded for by the original gene. For example, gene duplication explains very well the origin of the genes that code for globin, the protein part of hemoglobin and myoglobin. Humans can produce six types of hemoglobin and one type of myoglobin, an oxygen-binding protein found in muscle. It has been shown that the corresponding seven genes derived from a single ancestral copy that duplicated over time. Globin gene analogs are even found in some plant species. We will see in a later lecture how gene duplication can explain at least in part the evolution of differentiated organs in eukaryotes. By analogy, it is very possible that protocells rapidly diverged by RNA gene duplication. The RNA world hypothesis is a very attractive one because it places the appearance of life squarely within the realm of evolution.
RNA, by virtue of its ability to store information in a base sequence, propagate it by replication, and evolve through mutation, allows much more flexibility than can be seen in a world in which proteins appeared first. Indeed, even though proteins can store information through their amino acid sequences, they cannot replicate and they cannot mutate. Natural selection cannot operate on them. Not surprisingly, the most ardent proponents of a protein-first world tend to be biochemists more anxious to imagine primitive metabolic pathways than genetic information storage and evolution. Conversely, adherents to the view that the RNA world appeared first tend to be of a more genetic bent and consider that metabolic pathways are a consequence of the evolution of the RNA world. This debate is far from over, because at this point, we do not have firm evidence either way. Right now, the pendulum is swinging in the direction of the RNA world, but research on both hypotheses continues. For how long did a putative RNA world exist? Remembering that life could not have been established before the end of the period of heavy meteorite bombardment in the solar system 3.8 billion years ago, and taking into account that the first genuine prokaryotes have been dated to 3.5 billion years ago, the RNA world could not have lasted more than 300 million years—a very long time. For example, dinosaurs did not even exist 300 million years ago. At that time, Earth was populated by a large number of microscopic species and many types of sponges, mollusks, and worms. Land plants had already appeared, but not yet flowering plants. Fishes had been in existence for a while, but amphibians and reptiles were recent. However, there were no mammals and no birds, and the first hominids would not appear until about 296 million years later. Nevertheless, it is legitimate to ask whether life could have appeared from nothing in only 300 million years.

FURTHER RESEARCH

In vitro evolution research has just begun. More surprises may be in store.

Many of the ribozymes described in this chapter have been synthesized in the laboratory using a technique called in vitro evolution. This technique allows scientists to generate millions of different RNA molecules in a matter of days, thereby considerably shrinking evolutionary time. This is how one of the variants of this technique works. First, one starts with a population of short DNA molecules that can be random or designed to reflect the sequence of a given ribozyme. These DNA molecules are then used as templates for transcription by an RNA polymerase. Thus, one ends up with a population of RNA molecules corresponding to the original DNA population. Next comes the critical step of selection. In this step, the population of RNA molecules is screened for a particular property. For example, if one is interested in the study of RNA molecules able to bind amino acids, one designs a technique whereby these particular RNA molecules can be separated from all the other RNA molecules in the mixture that are unable to bind an amino acid. This step enormously enriches the RNA population with molecules possessing the desired property (amino acid binding). Then, these enriched RNA molecules are reverse transcribed by an enzyme called reverse transcriptase, the result of which is to produce DNA copies of the enriched RNA. It turns out that reverse transcriptase can be forced to “make mistakes” and incorporate “wrong” bases in these DNA copies, which is equivalent to mutating these DNA copies. This is how evolution works: mutations occur and mutants are selected by the environment. Here of course, mutations are introduced by the experimenter and selection takes place in a test tube (hence the words in vitro), at the level of molecules, not living individuals. Once the collection of mutant DNA molecules is obtained, it can be replicated many times in the test tube to provide a working amount of DNA. This latter step can also be induced to introduce even more mutations. Finally, the population of mutated DNA molecules is transcribed into RNA and screened for enhanced ribozyme activity. This cycle can be repeated several times, the result being that more and more mutants (variants) of the original DNA population are produced by each cycle. The bottom line is that this technique allows the production of an enormous collection of different RNA molecules that may display totally unexpected properties, such as novel ribozyme activity.
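The select-mutate-amplify cycle described above can be mimicked with a toy computer simulation. The sketch below is only an illustration under strong assumptions: the "binding assay" is replaced by a hypothetical score (the number of positions at which a sequence matches an arbitrary target motif), and the pool size, error rate, and number of cycles are made-up values, not laboratory parameters. Selected molecules are carried over unchanged alongside their error-prone copies, which is a simplification.

```python
import random

BASES = "ACGU"

def binding_score(seq, motif):
    """Toy stand-in for a binding assay: positions matching a hypothetical motif."""
    return sum(1 for s, m in zip(seq, motif) if s == m)

def error_prone_copy(seq, error_rate, rng):
    """Copy a sequence; each base is replaced by a random base with probability error_rate."""
    return "".join(rng.choice(BASES) if rng.random() < error_rate else base
                   for base in seq)

def in_vitro_evolution(motif, pool_size=200, keep=20, error_rate=0.05,
                       cycles=30, seed=1):
    """Run repeated selection and mutagenic amplification.

    Returns the best binding score found in the pool at each cycle.
    """
    rng = random.Random(seed)
    # Start from a random pool of sequences, as with random DNA templates.
    pool = ["".join(rng.choice(BASES) for _ in motif) for _ in range(pool_size)]
    best_per_cycle = []
    for _ in range(cycles):
        # Selection: keep only the best "binders".
        pool.sort(key=lambda s: binding_score(s, motif), reverse=True)
        selected = pool[:keep]
        best_per_cycle.append(binding_score(selected[0], motif))
        # Amplification with forced copying errors (mutation).
        copies = [error_prone_copy(rng.choice(selected), error_rate, rng)
                  for _ in range(pool_size - keep)]
        pool = selected + copies
    return best_per_cycle

if __name__ == "__main__":
    history = in_vitro_evolution("GGCAUCGAUCGGAUCCGAUA")
    print("best score in first cycle:", history[0])
    print("best score in last cycle:", history[-1])
```

Because the selected molecules are kept in the pool, the best score can never decrease from one cycle to the next; the point of the sketch is simply that a few dozen rounds of selection plus copying errors move a random pool toward the selected property.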

DISCUSSION QUESTIONS

1. What is the role of ribozyme activity in the RNA world?
2. What is in vitro evolution?
3. Which of the two models, the iron-sulfur world and the RNA world, do you find more convincing and why?
4. How do quasi-species evolve? How can they cooperate to make hypercycles?
5. What functions should the first informational RNA molecules have possessed?
6. How did the genetic code evolve?

REFERENCES AND WEBSITE

Cech, T. R., J. F. Atkins and R. F. Gesteland. 2005. The RNA World. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
de Duve, C. 1995. Vital Dust: Life as a Cosmic Imperative. New York: Basic Books.
Eigen, M. 1992. Steps Towards Life: A Perspective on Evolution. Oxford, England: Oxford University Press.
Eigen, M., P. Schuster, W. Gardiner and R. Winkler-Oswatitsch. 1981. The origin of genetic information. Scientific American 244:78-94.
Eigen, M., and R. Winkler-Oswatitsch. 1981. Transfer-RNA: An Early Gene? Naturwissenschaften 68:282-292.
Eigen, M., and R. Winkler-Oswatitsch. 1981. Transfer-RNA: The Early Adaptor. Naturwissenschaften 68:217-228.
Hartwell, L. H., L. Hood, M. L. Goldberg, A. E. Reynolds, L. M. Silver, and R. C. Veres. 2000. Genetics: From Genes to Genomes. Boston: McGraw-Hill.
Joyce, G. F. 2007. A glimpse of biology’s first enzyme. Science 315:1507-1508.
Khvorova, A., Y.-G. Kwak, M. Tamkun, I. Majerfeld and M. Yarus. 1999. RNAs that bind and change the permeability of phospholipid membranes. Proc. Natl. Acad. Sci. USA 96:10649-10654.
Mansy, S.S., Schrum, J.P., Krishnamurthi, M., Tobe, S., Treco, D.A. and J.W. Szostak. 2008. Template-directed synthesis of a genetic polymer in a model protocell. Nature doi:10.1038/nature07018.


Francis Crick (1916-2004)

During his lifetime, Crick was called by some “the greatest theoretical biologist alive.” His colleague and Nobel Prize co-winner James Watson has said of him: “I have never seen Francis Crick in a modest mood.” To a journalist who once asked him: “Dr. Crick, what is your favorite hobby?” Crick replied: “Conversation…especially with pretty women!” A simple portrait of Francis Crick emerges from this: he seems to have been a genius, a flamboyant man, and a sarcastic jokester. Perhaps a little like the Beatles, his compatriots. He was also probably in the same league of fame, having discovered the structure of DNA and usually speaking to standing-room-only audiences. Francis Crick was born near Northampton, England, where his father ran a boot and shoe factory. He was a physicist by training, having obtained his BS degree in this field at University College in London in 1937. Two years later, Great Britain was at war with Germany. Like so many other scientists, Crick was recruited to work for the war effort and spent his time designing magnetic and acoustic mines. In 1949 he decided to switch fields and do biological research. This was not an unusual move at the time: many physicists had become attracted to the idea of applying sophisticated physical techniques to biological molecules. Crick joined the Cavendish Laboratory (Cambridge University), famous for its major contributions to X-ray crystallography, as a graduate student. There, with two colleagues, he developed the theory of X-ray diffraction by helical molecules. Together with American biologist James Watson, Crick discovered the double helical structure of DNA in 1953, based on experimental data obtained by Rosalind Franklin and Maurice Wilkins. For this achievement, Crick, Watson, and Wilkins were awarded the Nobel Prize in 1962. In the meantime, Crick had finally obtained his Ph.D. at age 37. Francis Crick is one of the founding fathers of the science of molecular biology.
Incredibly, Crick was turned down in 1958 for a professorship by Cambridge University. Nevertheless, he remained at Cambridge for another nineteen years. During this time, he made important contributions to our understanding of the genetic code and was first to hypothesize the existence of an “adaptor molecule” to achieve translation. This molecule turned out to be transfer RNA. Crick moved to California in 1976 where he was given a professorship at the Salk Institute, near San Diego. There, he directed his efforts toward the study of neuroscience and consciousness. He remained active until the end, editing a scientific manuscript on his deathbed.

Maynard Smith, J., and E. Szathmary. 1997. The Major Transitions in Evolution. Oxford, England: Oxford University Press.
Vitreschak, A. G., D. A. Rodionov, A. A. Mironov and M. S. Gelfand. 2004. Riboswitches: The oldest mechanism for the regulation of gene expression? Trends Genet. 20:44-50.
Zhang, S., Blain, J.C., Zielinska, D., Gryaznov, S.M. and J.W. Szostak. 2013. Fast and accurate nonenzymatic copying of an RNA-like synthetic genetic polymer. Proc. Natl. Acad. Sci. USA doi:10.1073/pnas.1312329110/-/DCSupplemental.

The following site describes how in vitro evolution experiments are carried out: rnaworld.bio.ku.edu/class/RNA/RNA00/RNA_World_4.html

These lecture notes are based on revised chapters 4 and 5 from “Origins of Life and the Universe”, 2003, Columbia University Press, by P. F. Lurquin.


CHAPTER 3

The Origin of Structure in Physical and Biological Systems: Order from Chaos. Artificial cells.

Nature’s many ordered systems can now be regarded as intricately complex structures evolving through a series of instabilities.
ERIC CHAISSON, Cosmic Evolution (2001)

All life forms, whether they are unicellular like bacteria and amoebas, or multicellular like sponges, trees and humans, are highly complex, highly organized systems. In other words, living cells and organisms are highly structured. A cursory look at the prebiotic world, where the building blocks of life were interacting chemically, does at first suggest complete randomness, complete absence of order. Indeed, there is seemingly nothing in prebiotic chemistry that would suggest otherwise. For sure, amino acids, RNA bases, and other compounds needed to produce living systems reacted with one another to increase the molecular complexity of the prebiotic broth. But it is not easy to see how structure, the spatial and temporal organization that characterizes life, would have been created out of these haphazard interactions.

Yet, something happened that allowed the RNA world—a vast collection of RNA molecules of different lengths endowed with a variety of catalytic properties, perhaps associated with randomly made protein protoenzymes—to acquire structure and evolve into a complex cellular world. The question then is: How could a world made of scattered, dissolved molecules spontaneously transform itself into organized structures that at first approximated life as we know it? Starting about four decades ago, physical chemists and physicists have addressed this important issue and come up with a partial answer: order (such as life) can appear from disorder (such as chemicals randomly dispersed in water) if prebiotic reactions took place far from thermodynamic equilibrium and contained some autocatalytic steps. These scientists showed that structure can appear from randomness in what are called dissipative systems, systems that gain order by releasing entropy into the environment and indeed do operate far from thermodynamic equilibrium. This statement raises several questions. First, do dissipative systems violate the second law of thermodynamics, that stipulates that disorder must always increase and so must entropy? Next, what does it mean to operate far from equilibrium, and what is autocatalysis?

A BROAD DEFINITION OF ENTROPY

The second law of thermodynamics states that entropy (and disorder) must increase in closed systems. Life is an open system, meaning that, within it, entropy can decrease (order can increase) locally without violating the second law.


For some, entropy is akin to a mystical (or mystifying) concept somehow related to chaos or its opposite, order. In fact, entropy is a measurable quantity represented by S, in which S = Q/T, that is, the entropy of a system is equal to the quantity of heat (Q) contained in the system divided by its absolute temperature (T). Entropy is measured in kilocalories per kelvin. However, since entropy is a thermodynamic quantity, it is preferable to speak in terms of entropy change of a system over time or space. The second law of thermodynamics states that for a closed system, entropy change must always be positive or zero, that is, in mathematical terms, δS = δQ/T ≥ 0 where δ means change. A simple example will clarify the meaning of the second law. Assume there is a well-insulated vessel from which all the air has been pumped out. Also assume that a movable partition divides the vessel into two equal halves. Let us now introduce a given volume of hot gas at temperature T1 into the left-hand portion of the vessel. The partition remains closed. Then, let us introduce the same volume of the same gas, at a lower temperature T2, into the right-hand portion of the vessel. Let us now open the partition. As soon as this is done, the two volumes of gas are in contact and diffuse into one another. As we know intuitively and from experience, the hot gas will transfer heat to the cold gas, the hot gas getting colder and the cold gas getting warmer. After a while all the gas present in the vessel will be at the same temperature: the system will have reached thermodynamic equilibrium. In such a system, it can be calculated that entropy has increased by a value that depends on the initial temperature of each gas and the quantity of heat transferred from the hot gas to the cold gas, as in δS = -δQ/T1 + δQ/T2.
The first term of the right hand side of the equation has a negative sign because the hot gas loses heat while the cold gas gains heat, the second term then having a positive sign. In this example, the value of δS is a positive number because T1 (the temperature of the hot gas) is bigger than T2 (the temperature of the cold gas). On the other hand, if the two gases were initially at the same temperature, there would be no heat exchanged between them and δS would be equal to 0 because δQ = 0. Therefore, entropy change in the gas example is either greater than zero or equal to it. Let us now consider violations of the second law. One could have imagined a hypothetical—but counterintuitive-- different situation applied to the above example. Upon mixing, the cold gas could have transferred some of its heat to the hot gas, making the hot gas hotter and the cold gas colder. In such a case, the entropy change of the system, δS, would have been negative. This is never observed, inasmuch as a red-hot iron bar dipped into a bucket of water does not get white-hot while the water freezes. To illustrate this concept even further, let us consider a screw propelling a ship. As the ship’s engine turns the screw, not only does the ship move, but in addition, the action of the screw warms up the water with which it is in contact. Thus, mechanical energy is partially converted into heat energy. An example of heat (and energy) exchange forbidden by the second law is a case where water in contact with a motionless screw would spontaneously get colder, thereby transferring energy to the screw, making it turn, and propelling the ship without the need for an engine. Even though such an action would solve the world’s energy needs once and for all, it is not part of reality because it violates the second law. Going back to the example of the gases at different temperatures, let us see how entropy relates to order vs. disorder. 
One can say that, as long as the hot and cold gases are separated by a closed partition, the system, the vessel, is in a state of order: hot gas to the left, cold gas to the right. When the partition is opened, the two gases mix and equalize their temperatures. At this point, the system has become less ordered because the original two volumes of gas can no longer be distinguished. They have become one: it is impossible to determine which portion of the gas was initially hot and which part was initially cold. Therefore, an increase in entropy, as calculated above, is accompanied by a decrease in order of the system, or, equivalently, an increase in its disorder. Based on this kind of definition, the appearance of highly ordered living systems from the disorder of the prebiotic broth should have been impossible because it would have entailed a decrease in entropy. And indeed, this is an argument often used by creationists who then go on to say that life must have been specially created through a miracle, thanks to the intercession of a creator able to defeat the second law of thermodynamics. In fact, a miraculous intervention is unnecessary because the examples given above only apply to closed systems, that is, systems that do not exchange energy with the outside world, as in the cases of our well-insulated vessel containing gases, the hot iron bar and the bucket of water, and the strange hypothetical ship. On the contrary, our planet, the system in which life exists, is not an isolated system. It is an open system that receives enormous amounts of energy from the sun in the form of heat and light. Therefore, the closed-system examples given above do not apply to Earth at large and do not apply to life.
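The two-gas example can be checked numerically with the formula δS = -δQ/T1 + δQ/T2. The sketch below is a minimal illustration: the function name and the numerical values (1 kilocalorie of heat, gases at 400 K and 300 K) are assumptions chosen for the example, and the formula is taken to hold for a quantity of heat small enough that the two temperatures barely change during the transfer.

```python
def entropy_change(heat, t_hot, t_cold):
    """Entropy change when `heat` leaves a gas at t_hot and enters a gas at t_cold.

    Implements dS = -dQ/T1 + dQ/T2 with temperatures in kelvin.
    A negative `heat` describes the forbidden case of heat flowing
    from the cold gas to the hot gas.
    """
    if t_hot <= 0 or t_cold <= 0:
        raise ValueError("absolute temperatures must be positive")
    return -heat / t_hot + heat / t_cold

if __name__ == "__main__":
    # Hypothetical values: 1 kcal flows from gas at 400 K to gas at 300 K.
    print("hot to cold:", entropy_change(1.0, 400.0, 300.0), "kcal/K")
    # Same temperatures on both sides: no heat is exchanged.
    print("equal temperatures:", entropy_change(0.0, 350.0, 350.0), "kcal/K")
    # Heat flowing from cold to hot would give a negative entropy change.
    print("cold to hot:", entropy_change(-1.0, 400.0, 300.0), "kcal/K")
```

The first case gives a positive entropy change, the second gives zero, and the third (never observed in a closed system) gives a negative value, which matches the three situations discussed in the text.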


The thermodynamics of open systems is based on the fact that there, entropy change is written δS = δeS + δiS, where δeS is the entropy change of the environment (for example the whole universe) and δiS is the internal entropy change of a local system interacting with the environment (for example a growing tree interacting with the soil, the atmosphere, and sunlight). Therefore, it can be seen that the entropy of a local system can decrease (its order can increase), provided that the entropy of the environment increases. In other words, δiS can be negative as long as δeS >> 0. Thus, order can appear somewhere as long as disorder increases elsewhere. In effect, when order is created, as when stars and planets form, and as life appears, the whole universe serves as a gigantic entropy “sink.” Dissipative systems or structures (see below) are examples of systems—such as life--where entropy decreases locally at the expense of an entropy increase of the environment. In fact, all living organisms can be construed as ordered, dissipative structures that “dump” entropy into the universe to maintain themselves, grow and reproduce. They do this by keeping their metabolic functions far from thermodynamic equilibrium.

EQUILIBRIUM VS NON-EQUILIBRIUM

Life is a system that operates under non-equilibrium conditions.

As stated, life is characterized by a vast number of chemical reactions that overall operate far from thermodynamic equilibrium. To understand this, consider an isolated system in which a reversible reaction is taking place, such as A + B ⇌ C + D (note the double arrow indicating a reversible reaction), where the forward reaction proceeds with a rate constant kd and the reverse reaction with a rate constant kr. By definition, equilibrium is achieved when the product of the concentrations of C and D divided by the product of the concentrations of A and B satisfies [C][D]/([A][B]) = kd/kr = K, where K is a constant that cannot change over time. At equilibrium, δS = 0, because we are considering here a closed system which receives no energy or matter from the outside world. In other words, the concentrations of A, B, C, and D are fixed and outside energy is not available to drive the reaction in one direction or the other. Thus, by definition, once equilibrium is reached, this system undergoes no net change; its entropy can neither decrease nor increase. Clearly, life cannot operate at thermodynamic equilibrium because if it did, all metabolic transactions would stop, meaning death for the cell or organism. Rather, life operates far from thermodynamic equilibrium. It is an open system where the net result of chemical reactions is of the irreversible type (as indicated below by the single arrow) A + B → C + D and where the notion of an overall equilibrium constant K does not apply. That life is an open system is easy to understand. Consider plants that receive light energy from the sun and use it to power photosynthesis to which atmospheric carbon dioxide fixation and further metabolism are coupled. Plants are then eaten by herbivores which, in turn, are eaten by carnivores.
We can thus say that ultimately, herbivores and carnivores are also the beneficiaries of light energy that comes to us from outside our terrestrial system, from the sun. Therefore, life is an open system that operates far from thermodynamic equilibrium because life receives energy (light) and matter (food) from the environment and processes them in a steady, continuous, irreversible manner. Life also generates structure. Just think about the simplicity of a fertilized mammalian egg relative to the complexity of an adult mammal. A young fertilized egg is a single undifferentiated cell whereas an adult is composed of trillions of cells differentiated into many different organs. At this point, it is thus legitimate to make an analogy between living systems and dissipative systems which, by definition, generate order from disorder. Both operate far from thermodynamic equilibrium, generate complexity from simplicity (more order from less order), and are open systems. Since we are interested in the origins of life and the origin of complexity, let us next see how the theory of dissipative structures may help us understand the transition between the prebiotic broth and primitive life. But first, it is important to understand better the nature of dissipative structures and how they occur in the inanimate world.
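The closed-system equilibrium described earlier in this section (the ratio of product concentrations to reactant concentrations settling at kd/kr, after which nothing changes) can be illustrated with a short simulation. This is a sketch under simple assumptions: mass-action kinetics, arbitrary illustrative rate constants (kd = 2, kr = 1), starting concentrations of 1 for A and B and 0 for C and D, and a basic fixed-step integration.

```python
def relax_to_equilibrium(kd, kr, a=1.0, b=1.0, c=0.0, d=0.0,
                         dt=1e-3, steps=20000):
    """Integrate A + B <=> C + D in a closed vessel with mass-action rates.

    Returns the final concentration ratio [C][D]/([A][B]) and the
    final net forward rate.
    """
    for _ in range(steps):
        net = kd * a * b - kr * c * d  # forward rate minus reverse rate
        a -= net * dt
        b -= net * dt
        c += net * dt
        d += net * dt
    ratio = (c * d) / (a * b)
    net = kd * a * b - kr * c * d
    return ratio, net

if __name__ == "__main__":
    ratio, net = relax_to_equilibrium(kd=2.0, kr=1.0)
    print("concentration ratio:", ratio, "expected kd/kr:", 2.0 / 1.0)
    print("net rate at the end:", net)
```

Once the ratio reaches kd/kr the net rate falls to zero and the concentrations stop changing. An open system such as a living cell avoids this dead end because reactants are continuously supplied and products continuously removed, which the closed vessel modeled here does not allow.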


THE SURFACE OF THE SUN AND OIL IN A PAN CAN GENERATE DISSIPATIVE STRUCTURES

Dissipative structures are ordered systems that form far from thermodynamic equilibrium. They can be found in inanimate matter.

The temperature of the sun is not uniform throughout its volume. On the surface, the temperature is about 6000 K whereas in the core, the temperature is about 15 million K. Thus, there exists in the sun a steep temperature gradient between the center and its surface. This is a typical situation of non-equilibrium. Interestingly, the surface of the sun seen at high magnification shows definite structure: this surface is not uniform. Figure 1 demonstrates the existence of what are called convection cells in the photosphere, the outer layer of the sun. In these structures, gas circulates perpendicularly to the surface in an up and down motion. As hot gas from the interior moves toward the surface, it cools down, then sinks, is heated again, and resurfaces. This circular motion takes place in cells that are about 1,500 kilometers in size. These convection cells give the photosphere its granular aspect.

Figure 1. Solar convection cells at high magnification (Source: Courtesy of NASA).

Fascinatingly, a similar type of structure can be replicated on Earth, using less impressive temperature gradients and a totally different medium, cooking oil. When a thin layer of oil in a frying pan is heated on medium, convection cells that look very much like the structures seen in the sun’s photosphere appear, except that in oil, these structures are about 5 millimeters in size. These are called Bénard cells, after the name of the French physicist who first reported them. Here also, the formation of these cells depends on the steep temperature gradient that exists between the bottom of the heated pan and the top of the oil layer, which is in contact with air at room temperature. In both cases, structure is formed from gas (the sun) or liquid (the oil) where totally random motions of matter particles—and hence no structure--are normally expected. Both types of cells are ordered dissipative structures that form far from thermodynamic equilibrium, in this case, thanks to the existence of temperature gradients. Of course, these two examples are a far cry from the complex structure displayed by living cells, but there is more.

AUTOCATALYSIS CAN ALSO GENERATE ORDER OUT OF CHAOS

Autocatalytic systems (systems that catalyze their own transformation) also create order far from equilibrium.

Thus far, we have seen how systems heated far from thermodynamic equilibrium can generate simple structure. In addition, another phenomenon, autocatalysis, can also generate structure in dissipative systems. This phenomenon plays no role in the formation of solar convection cells and Bénard cells in oil. What, then, is the function of autocatalysis in the production of other types of dissipative structures? Autocatalysis simply means self-catalysis, a phenomenon which can take place in chemical reactions where a compound, under its own influence, makes more of itself. The archetype of autocatalytic reactions that generate structure is the Belousov-Zhabotinsky (BZ) reaction, named after two Russian scientists who discovered and studied it. This reaction produces order from initially disordered (well-mixed) chemicals in the form of colored rings that expand in space and time as chemical waves (figure 2).

Figure 2. Three views of the Belousov-Zhabotinsky reaction at different times. The initial reaction mixture is homogeneous (not shown). Over time, ring-like structures appear and expand in space. Note that the yellow background interferes with the real colors of the rings. Source: Wikipedia.

The simplest BZ reaction mixture contains malonic acid, bromate, sulfuric acid, cerium (or manganese) ions and an iron-containing indicator that turns red or blue, depending on the progress of the reaction. In fact, the BZ reaction is quite complicated, as it involves dozens of intermediate compounds. However, it is the formation of bromous acid from bromate which is the autocatalytic step in the BZ reaction. This step is inhibited by bromide, which is also made in the BZ reaction through the reaction of bromine (also produced in the reaction) with malonic acid. But, as bromide reacts with other compounds present in the reaction mixture, and as its concentration consequently decreases, the autocatalytic formation of bromous acid resumes. This takes place in a cyclical fashion over time and space, which explains the formation of expanding rings seen in figure 2. Many variants of the BZ reaction exist, as do other reactions that also generate order in space and time. For example, when the BZ reaction is allowed to take place in a well-stirred vessel, it produces a “chemical clock” where the color of the solution changes with time from red to blue in a very precise and reproducible manner. Other examples of oscillating reactions are the chlorite and chlorite-thiosulfate oscillators, as well as reactions involving chlorite, iodide, and malonic acid (the latter reaction is shown in figure 3). Interestingly, the spatial and temporal order created by these reactions can be disturbed by altering the conditions under which the reactions take place. 
For example, increasing the temperature or changing the concentration of the reagents can cause the structures (rings or circles) to assume a chaotic behavior: under these changing conditions, the structures can acquire new shapes and can ultimately vanish altogether. The technical term for these structural changes is “bifurcations.” It turns out that subtle modifications of conditions can induce dissipative structures to veer in a variety of directions through bifurcations—instabilities, really-- and accordingly change their internal organization up to the point where structure is lost. The intriguing behavior of chemical dissipative structures has been the subject of analysis by the equations of theoretical physical chemistry and computer simulations as explained next.


Figure 3. Another example of a chemical reaction that generates a dissipative structure. The structure shown here can evolve into diagonal stripes when the parameters of the reaction are changed. Source: University of Brussels, Belgium.

THE BRUSSELATOR MODELS BIFURCATIONS AND THE VARIOUS STATES POTENTIALLY ASSUMED BY DISSIPATIVE STRUCTURES

Computer programs shed light on the appearance and disappearance of order in autocatalytic systems through bifurcations.

The Brusselator, a set of theoretical reactions invented at the University of Brussels, Belgium, is the first of several models of autocatalytic reactions displaying dissipative structure formation. Another model is called the Oregonator. In the Brusselator (figure 4), reagents A and B are fed into a reaction vessel. As the equations in the figure show, A is converted into X; B then reacts with X to produce Y and D. Then, X reacts with Y to give more X. This loop is the autocatalytic portion of the reaction. Finally, X is converted into E. The Brusselator can be tuned by varying the inputs of A and B and the diffusion coefficients of compounds X and Y. The diffusion coefficients are the rates at which intermediate compounds X and Y diffuse inside the reaction vessel. It can be shown mathematically that the variation of the concentrations of X and Y as a function of time and space is non-linear; that is, the concentrations are not linearly proportional to time and space but fluctuate according to more complicated mathematical functions. When the Brusselator (or the Oregonator) is tuned, at a certain point, it generates structures amazingly similar to those seen in the BZ and other similar reactions (figure 5). Further, with a different tweaking of the parameters (concentrations and diffusion coefficients), a stable succession of peaks and valleys of concentration is observed (figure 6).
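The behavior of the Brusselator can be explored with a few lines of code. The sketch below makes several assumptions that are not in the text: it uses the commonly quoted form of the Brusselator rate equations (A → X, B + X → Y + D, 2X + Y → 3X, X → E) with all rate constants set to 1, it describes a well-stirred vessel (no diffusion, so only oscillations in time can appear, not spatial patterns), and the values chosen for A and B are arbitrary. For this form of the model, the steady state is expected to give way to sustained oscillations when B exceeds 1 + A².

```python
def brusselator_rates(x, y, a, b):
    """Rates of change of X and Y with all rate constants set to 1 (an assumption)."""
    dx = a + x * x * y - (b + 1.0) * x
    dy = b * x - x * x * y
    return dx, dy

def simulate(a, b, x0=1.0, y0=1.0, dt=0.01, steps=20000):
    """Integrate the well-stirred Brusselator with a fourth-order Runge-Kutta scheme.

    Returns the list of X concentrations over time.
    """
    x, y = x0, y0
    xs = []
    for _ in range(steps):
        k1x, k1y = brusselator_rates(x, y, a, b)
        k2x, k2y = brusselator_rates(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, a, b)
        k3x, k3y = brusselator_rates(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, a, b)
        k4x, k4y = brusselator_rates(x + dt * k3x, y + dt * k3y, a, b)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        xs.append(x)
    return xs

def late_amplitude(xs, tail=5000):
    """Swing of X (maximum minus minimum) over the last `tail` time steps."""
    late = xs[-tail:]
    return max(late) - min(late)

if __name__ == "__main__":
    # With A = 1, the threshold 1 + A*A equals 2.
    print("B = 1.5, late swing of X:", late_amplitude(simulate(1.0, 1.5)))
    print("B = 3.0, late swing of X:", late_amplitude(simulate(1.0, 3.0)))
```

Below the threshold the concentration of X settles to a constant value; above it, X keeps rising and falling indefinitely, a simple "chemical clock". Reproducing the rings and stripes of figures 5 and 6 would require adding diffusion in space, which this sketch deliberately leaves out.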


Figure 4. Reactions that compose the Brusselator.

Figure 5. Simulation of the BZ reaction using the Oregonator (Source: courtesy of Dr. Lingfa Yang, Brandeis University).

Figure 6. A wave-like pattern generated by the Brusselator. This diagram shows the distribution of compound X in space. Source: Wikipedia.

Finally, for certain values of its parameters, the Brusselator behaves chaotically and identifiable structure is lost. Thus, the Brusselator is a good approximation of real oscillating reactions, as it reproduces the self-organization and chaotic behavior observed in the laboratory. How does one explain the different states acquired by the Brusselator and, by extension, those acquired by actual autocatalytic oscillating reactions? The answer is bifurcations, as already mentioned above. It can be demonstrated that dissipative systems operating far from thermodynamic equilibrium, like the BZ reaction and the Brusselator, are quite sensitive to small changes in their parameters. Figure 7 shows that a Brusselator operating at a given concentration of input compound B, for example, will occupy a single state until a particular threshold is reached (this is the straight line to the left of the figure). At the first concentration threshold, the system becomes unstable and can occupy two possible states; at the second threshold, four states; and at the third threshold, eight states. As higher thresholds are reached, many more states become possible and the system approaches chaotic behavior, where too many states exist for stable structure to form. These bifurcations of higher and higher order show that the system is non-linear.

Figure 7. Bifurcations experienced by a dissipative system far from thermodynamic equilibrium. The ordinate is arbitrary and represents the number of states assumed by the system. The abscissa represents the values given to one of the parameters of the system, for example, compound B in the Brusselator. Source: Wikipedia.
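The sequence of one, two, four, eight, and then very many states can be illustrated with a short sketch. Note that it does not use the Brusselator: it swaps in the logistic map, x → r·x·(1 − x), a standard toy model whose period-doubling cascade has the same general shape as a diagram of this kind. The parameter r merely plays the role of the tuned parameter, and the function name and the specific values of r are illustrative choices.

```python
# Illustration of successive bifurcations using the logistic map (a stand-in
# toy model, NOT the Brusselator). The name count_states is hypothetical.


def count_states(r, transient=10000, sample=256, tol=1e-6):
    """Count the distinct values visited by x -> r*x*(1-x) after transients die out."""
    x = 0.5
    for _ in range(transient):      # discard the approach to the attractor
        x = r * x * (1.0 - x)
    seen = []
    for _ in range(sample):         # record the states visited afterwards
        x = r * x * (1.0 - x)
        if all(abs(x - s) > tol for s in seen):
            seen.append(x)
    return len(seen)


if __name__ == "__main__":
    for r in (2.8, 3.2, 3.5, 3.56, 3.9):
        print("r =", r, "->", count_states(r), "state(s)")
```

As r is raised past successive thresholds, the count doubles from one to two to four to eight; at r = 3.9 the map is chaotic and nearly every sampled value is new.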

In conclusion, non-linear, autocatalytic reactions taking place far from equilibrium are capable, within certain boundaries, of generating order (structure) in both space and time. These dissipative structures can experience abrupt changes when external conditions are modified. “Swinging” a system in one direction or another is reminiscent of the evolution of living systems by natural selection. But do the Brusselator and oscillating reactions truly mimic life (and perhaps its evolution), and in particular, do they apply to prebiotic evolution?

Possible applications of dissipative structures to life and its origins

Dissipative structures apply to conditions that may have existed in the prebiotic world.

In truth, the theory of dissipative structures applied to living systems is still in its infancy. This is because living cells and organisms are enormously more complex than the reactions shown in fig. 4. So far, dissipative structures mimic reasonably well some models for gene regulation and simple cellular differentiation. To that effect, figure 8 shows the fluctuating levels of a certain type of messenger RNA in a fruit fly (Drosophila melanogaster) embryo. The similarity between fig. 6 (the Brusselator in one of its modes) and fig. 8 is striking, suggesting that the theory of dissipative systems may be relevant to embryonic development. Nevertheless, a very large amount of work remains to be done in this area. But what about the prebiotic broth and its evolution?

Figure 8. Wave-like pattern of messenger RNA in a developing fruit fly embryo (Source: Dissecting the transcriptional control of body patterning, PLoS Biol (2004) 2(9): e319, courtesy of the Public Library of Science).

In the prebiotic broth, in principle, chemical interactions must have been much simpler than in living systems, and approximate models like the Brusselator or its variants could be relevant. In fact, the following examples suggest that autocatalytic reactions could have brought about structure in the prebiotic world. First, we think that the RNA world could have been autocatalytic in the sense that RNA molecules could have catalyzed their own replication, cutting, and splicing through ribozyme activity. Further, other ribozyme activity could have been responsible for the binding, ordering, and linking of amino acids to form simple proteins. Finally, collections of different RNA molecules called hypercycles (recall topic #2) could have coordinated a number of ribozyme activities. In and of themselves, hypercycles would not necessarily have generated visible structures such as those seen in the BZ reaction. Nonetheless, the concept of order or structure goes beyond merely visible phenomena. In fact, hypercycles also represent structure in that they integrate chemical reactions, many being autocatalytic ribozyme reactions. To my knowledge, nobody has yet applied the theory of dissipative structures to ribozyme hypercycles, but this could be a (challenging) project worth pursuing. For now, it is possible that the RNA world experienced many bifurcations due to changing chemical and physical conditions, each and every one of them potentially leading to successful or unsuccessful evolutionary steps.

Figure 9. Electron micrograph of fatty acid vesicles. The special technique used here shows the imprints of the vesicles in a platinum-carbon substrate (Source: Autocatalytic self-replicating micelles as models for prebiotic structures, P. A. Bachmann, P. L. Luisi and J. Lang, Nature (1992) 357:57-59, with permission)

In order to evolve in the direction of cellular life, the RNA world must have become encapsulated inside microscopic vesicles that would have mimicked simple cell membranes. This is another example of structure formation because molecules organized into membranes are more ordered than the same molecules randomly dispersed in the environment. Today, some types of fatty acids are major components of all cell membranes. How could primitive membranes have formed in the prebiotic RNA world? First, let us assume that a variety of fatty acids existed on prebiotic Earth. It has been demonstrated in the laboratory under plausible prebiotic conditions that fatty acids containing from 8 to 14 carbon atoms can spontaneously form vesicles in an autocatalytic fashion. An electron micrograph of such vesicles is shown in figure 9.

Figure 10. Self-assembly and growth of fatty acid vesicles. A. Self-assembly without (lower line) and with clay particles (upper line), B. Growth of self-assembled vesicles over time, C. Experiment showing that all vesicles and not just some of them experience growth. The left peak shows the vesicle size distribution at the beginning of the experiment and the right peak shows their size distribution several hours later (Source: adapted from Experimental models of primitive cellular compartments: Encapsulation, growth, and division, M. M. Hanczyc, S. M. Fujikawa and J. W. Szostak, Science (2003) 302:618-621).

Thus, assuming that RNA hypercycles coexisted side by side with vesicle-forming fatty acids, one can see how these hypercycles could have become enclosed within these vesicles to form the first concentrated, integrated genetic networks. The complexity of the RNA world could have been further increased by the interactions of prebiotic RNA molecules with clays such as montmorillonite. Not only that, montmorillonite has recently been shown to accelerate considerably the autocatalytic formation of fatty acid vesicles and promote their growth. Figure 10 illustrates this phenomenon. Spontaneous division of these growing vesicles has not yet been demonstrated, although division has been observed when suspensions of vesicles are forced through small-pore filters. Moreover, it has also been shown that microscopic grains of montmorillonite associated with short RNA molecules can be trapped inside fatty acid vesicles (see topic #2). One can thus imagine that RNA synthesis through ribozyme activity could have taken place inside such vesicles. In summary, complex structures such as RNA hypercycles contained in microscopic vesicles could form spontaneously, in an autocatalytic manner. It remains to be seen whether these vesicles can effect some kind of primitive genetic function (such as the synthesis of short proteins) when they are formed in the presence of ribozymes able to bind amino acids and other ribozymes able to link these amino acids together.


SPONTANEOUS CHEMICAL REACTIONS AND THE CONCEPT OF FREE ENERGY

Free energy flowing through chemical systems powers spontaneous reactions.

We saw earlier that autocatalysis and temperature gradients can drive the formation of dissipative structures that generate order from disorder. In the prebiotic world, temperature gradients must have existed near hydrothermal vents, although it is not known whether ribozyme and fatty acid vesicle synthesis are possible under these conditions. We must also consider the possibility that spontaneous generation of structure could have taken place in an environment where the temperature was largely constant. Therefore, it is important to understand where, in general, the energy needed to allow spontaneous formation of structure comes from. Indeed, there is no “free lunch” in nature, meaning that spontaneous reactions must still be powered by a source of energy. This is where the science of thermodynamics comes to the rescue with the concept of “free energy.” In thermodynamics, concepts such as entropy, energy, and several others are interrelated in a series of equations. One thermodynamic function not yet seen here is F, free energy. This function is also designated by the letter G. Free energy can be defined by the relation

δE = δF + TδS + SδT

where E is the total energy of a system, T is absolute temperature, and S is entropy. To put it briefly, free energy F is energy capable of doing work. Many cosmologists consider it quite possible that the total energy of the universe is equal to zero, with all the matter-energy (as per Einstein’s E = mc²) balanced by the expansion of the universe (which can be construed as negative energy). Assuming that this is the case, then δE = 0 (even if the energy of the universe is not 0, δE is still zero, because the universe cannot receive energy from, nor lose energy to, any outside system). In addition, assuming a phenomenon taking place at constant temperature, we also have δT = 0. The above equation can then be simplified to

δF = −TδS

It can be shown that for spontaneous, irreversible chemical reactions, δF is a negative number, meaning that for these reactions to take place, the entropy (S) of the universe must increase, as already alluded to earlier in this chapter. But for entropy to decrease locally, as in order-generating systems such as dissipative structures, the free energy F must increase concomitantly. What can possibly be the ultimate source of free energy?
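The steps leading to the simplified relation can be written out as a short derivation. This sketch assumes that F is the free energy defined as F = E − TS; the starting definition is not spelled out in the text and is supplied here as the usual one consistent with the relation given.

```latex
\begin{align}
F &= E - TS
  && \text{(assumed definition of free energy)} \\
\delta F &= \delta E - T\,\delta S - S\,\delta T
  && \text{(differentiate both sides)} \\
\delta E &= \delta F + T\,\delta S + S\,\delta T
  && \text{(rearranged; the relation in the text)} \\
\delta F &= -T\,\delta S
  && \text{(with } \delta E = 0 \text{ and } \delta T = 0\text{)}
\end{align}
```

Because T is an absolute temperature and therefore positive, δF and δS always have opposite signs in the last line: an entropy increase (δS > 0) goes with a release of free energy (δF < 0), and a local entropy decrease requires an input of free energy.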

THE EXPANDING UNIVERSE AS CREATOR OF FREE ENERGY AND ORDER

The ultimate source of free energy is the expansion of the universe and the decoupling of matter and radiation.

That the universe expands, stretching space in the process, has been known for decades. Universal expansion is a consequence of Einstein’s theory of general relativity and has been observationally confirmed countless times. Expansion started at the moment of what is known as the “Big Bang,” the birth of the universe. According to cosmological theory, in the early expanding universe, matter and radiation were in thermodynamic equilibrium and the entropy of the universe was thus maximized. At this stage, the universe could not have acquired or produced any structure. Further, expansion alone could not have changed that situation because the quantity of heat of the universe remained (and still remains) constant. Indeed, the universe being all there is, it cannot receive heat from, or donate heat to, anything that is not itself. Thus, recalling that δS = δQ/T, with δQ = 0, δS = 0. In other words, the entropy of the early universe could not change. Under these conditions, no free energy could be produced because with δS = 0, δF is also 0. But conditions changed drastically about 400,000 years after the Big Bang, when matter and radiation decoupled. At that time, and at a temperature of about 3,000 K, the universe became transparent because photons and matter stopped being thermally equilibrated. This is because the universe had become cold enough to allow the formation of atomic matter, meaning that matter no longer existed in the form of a plasma in which electrically charged subatomic particles (protons and electrons) move freely. In other words, after 400,000 years, matter no longer interacted strongly with radiation. What is more, it has been shown that, from that time on, the temperatures of matter and radiation have not decreased at the same rate. 
This means that matter/radiation decoupling introduced a temperature gradient, a condition of non-equilibrium, in the whole universe. It also follows from the existence of a thermal gradient between matter and radiation that the entropy of the universe does not increase as fast as it could, a phenomenon ultimately responsible for the creation of free energy. Thus, free energy, the ingredient necessary to form structure, is a direct consequence of the expansion of the universe. In fact, the release of free energy in an expanding universe can be seen as the ultimate “force” that made the formation of stars, galaxies, planets, and life possible. This view of structure formation in the universe is well described in astrophysicist Eric Chaisson’s book Cosmic Evolution (2001). In short, the appearance of order at all levels, cosmic evolution, directly derives from the Big Bang. Fascinatingly, this concept unifies cosmology, physics, and biology in a sort of grand synthesis. Next, let us see whether, and perhaps how, humans will one day be able to reach what one could call the Holy Grail of structure formation: the synthesis of life.

SYNTHETIC LIFE

Researchers are attempting to create life in the laboratory. Two main avenues of research are currently being explored.

Humans have long been titillated by the possibility of creating life forms from non-living components. From ancient alchemists, to Mary Shelley’s Victor Frankenstein, to contemporary molecular geneticists, humans have fantasized about creating life in the laboratory. Today, we are probably not too far from achieving this goal. In 2002, researchers at the State University of New York-Stony Brook reported that they had recreated the RNA-containing poliovirus (which causes poliomyelitis) from commercially available compounds, such as polymerases, DNA, and RNA building blocks, which are routinely manufactured by companies that sell fine reagents to research laboratories. This synthetic poliovirus was indistinguishable from its natural counterpart, including in its ability to infect animals. To recreate the virus, researchers used the published sequence of the viral genome to synthesize a DNA copy of this RNA genome by linking together short stretches of DNA of appropriate sequences. This DNA molecule was then transcribed in vitro into RNA copies which were subsequently introduced into mouse cells. These RNA molecules were fully functional as they multiplied in the host cells and directed the production of new viral particles. Of course, the poliovirus genome is short (7,500 nucleotides), which is one of the reasons why it was chosen for reconstruction. Nevertheless, it took two years of work to build this short genome just a few years ago. In 2006, other researchers reported they had achieved the same feat in a matter of days, thanks to technical improvements which allow the much more rapid synthesis of long stretches of DNA. Therefore, it is now realistic to contemplate the in vitro synthesis of short bacterial genomes. 
For the sake of simplicity, let us imagine that 500 genes are sufficient to provide a variety of basic functions that characterize life. Using an average length of 1,000 base pairs per gene, a common value for bacterial genes, an organism with a chromosome containing just 500,000 base pairs of DNA could possibly be alive in the sense that it could perform basic metabolism and cell division. Such a chromosome would only be about 67 times bigger than the synthetic poliovirus genome made in 2002. Also, since we know the sequences of tens of thousands of bacterial genes, it may no longer be an impossible task to decide which essential genes to synthesize and link together to form an artificial bacterial chromosome. This feat, and more, was achieved in 2010 by a team under the leadership of Craig Venter and Hamilton Smith, Nobel laureate. Venter’s lab was already well known for having sequenced the human genome a few years earlier. In order to generate a life form equipped with a synthetic chromosome, the researchers first sequenced the genome of a free-living bacterium called Mycoplasma mycoides. Next, they recreated that same sequence, starting from DNA building blocks (deoxynucleotides), by assembling in the test tube 1,078 chemically synthesized DNA fragments into a single genome consisting of 1,077,947 base pairs. To make this genome unique, they inserted four “watermarks,” short DNA sequences not found in the natural M. mycoides genome. Finally, they transplanted this synthetic genome into closely related Mycoplasma capricolum cells permeabilized with polyethylene glycol. In some cases, the genome of recipient M. capricolum cells was eliminated and replaced by the synthetic M. mycoides genome. Cells with the transplanted genome continued to divide normally, produced proteins coded for by M. mycoides DNA, and exhibited the four “watermarks” deliberately inserted into the donor genome. 
Therefore, this team turned one bacterial species into another by achieving a full DNA transplant. As we saw, the synthetic DNA was transplanted into living cells equipped with all the necessary hardware to replicate the introduced chromosome and express its genes. Therefore, it cannot be said—as some have reported—that artificial life had been created in the laboratory, even though converting a bacterial species into another one is quite a remarkable feat and is a major step in the right direction.
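The back-of-the-envelope genome sizes used in the minimal-chromosome estimate above can be checked with a few lines of arithmetic. The sketch uses only figures quoted in the text (500 genes, 1,000 base pairs per gene, a 7,500-nucleotide poliovirus genome); the variable names are illustrative.

```python
# Back-of-the-envelope check of the minimal-genome estimate quoted in the text.
# All figures come from the text; the variable names are hypothetical.

GENES_NEEDED = 500            # assumed number of genes sufficient for basic life
AVERAGE_GENE_LENGTH_BP = 1000  # typical bacterial gene length, in base pairs
POLIOVIRUS_GENOME_NT = 7500    # length of the poliovirus genome, in nucleotides

minimal_chromosome_bp = GENES_NEEDED * AVERAGE_GENE_LENGTH_BP
size_ratio = minimal_chromosome_bp / POLIOVIRUS_GENOME_NT

if __name__ == "__main__":
    print("Minimal chromosome:", minimal_chromosome_bp, "base pairs")
    print("Ratio to poliovirus genome:", round(size_ratio, 1))
```

The ratio comes out at roughly 67, in line with the figure given for the hypothetical 500-gene chromosome.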


Are there other ways of finding a proper envelope to package an artificial chromosome and observe its functioning? Rather than trying to build both an artificial chromosome and an artificial cell to contain it, an approach closely related to the M. mycoides example would be to take advantage of natural DNA-less bacterial cells as “packaging” material. Several mutant strains of the bacterium Escherichia coli are unable to divide normally and produce two identical DNA-containing daughter cells. Rather, these mutants produce one normal daughter cell and one small one, called a minicell, which contains no DNA. Minicells are of course unable to divide further, but they contain all the enzymatic machinery needed to perform transcription and translation, such as RNA polymerase, ribosomes, and transfer RNAs, as well as a host of other necessary factors. A first step in the creation of life in the laboratory would be to introduce an artificial bacterial chromosome into minicells and see whether this chromosome can replicate and use its genes to direct the synthesis of specific proteins and other cellular components. Eventually, one would like to see these minicells divide, in which case an entirely new life form would have been created in the laboratory. These new life forms could be called “Frankenbacteria” or “Bactofrankensteins,” and it goes without saying that at first, strict containment measures will have to be taken to prevent the accidental release of these creatures into the environment. This sort of approach to the synthesis of life is sometimes referred to as “top-down” because it consists in trying to determine which minimal set of genes is necessary to confer the characteristics of life upon minimal cells. In a top-down approach, the DNA of the synthetic cells is human-made, but the hosts for this DNA (such as minicells) are not. Another approach, called “bottom-up,” is more ambitious. 
Here, the goal is to produce entirely artificial cells, perhaps as they appeared a long time ago in the RNA world, before being supplanted by a population of DNA-based last-universal-common-ancestors (LUCAs), the direct ancestors of modern prokaryotes. Recalling what we saw earlier, a bottom-up approach could start with liposomes (another name for lipid vesicles) encapsulating several ingredients, such as genes and the machinery needed to express them. Important steps in this direction have already been taken. The results described below were obtained with DNA genes, not RNA genes, as the aim of the authors was not to try to replicate what may have happened in the RNA world. Rather, they were interested in testing the functioning of genes encapsulated within artificial membranes. It has been known for years that DNA-free cell extracts called coupled transcription-translation cell-free systems are able to synthesize specific proteins in vitro when specific genes are added to them. These systems are very inefficient, however, and stop working after just a few hours. Researchers came up with the idea of trapping a transcription-translation system from E. coli inside liposomes made from a phospholipid called lecithin. This type of phospholipid is found at high concentration in egg yolk. The cell-free E. coli transcription-translation system contained ribosomes, tRNAs, translation factors, RNA polymerase, as well as all 20 amino acids to make proteins, and the four nucleotides ATP, GTP, UTP and CTP to make RNA. The DNA genes added to the transcription-translation system were either a jellyfish gene that codes for a green fluorescent protein or a firefly gene that codes for the protein enzyme luciferase. These proteins are easily detectable because both emit visible light that can be recorded. 
Results showed gene expression in these liposome-encapsulated transcription-translation systems, an excellent indication that confinement of the system inside a small volume (ranging from about 0.5 μm³ to about 14,000 μm³, which covers more or less the volume range of prokaryotic and eukaryotic cells) did not interfere with its functioning. These liposomes made the gene products for a period of only five hours, however, a performance hardly better than that of non-encapsulated systems. The researchers reasoned that the encapsulated systems stopped functioning so soon because the pools of amino acids and nucleotides inside the liposomes had been exhausted. However, refreshing these pools was not possible by simply adding more amino acids and nucleotides to the incubation mixture containing the liposomes. This is because these liposomes, made of lecithin, a mixture of phospholipids composed of molecules containing 16 and 18 carbon atoms, are highly impermeable to water-dissolved molecules such as amino acids and nucleotides. The challenge was then to make the liposomes permeable. For this, the researchers took advantage of a protein from the bacterium Staphylococcus aureus called α-hemolysin. Alpha-hemolysin can become incorporated into phospholipid bilayers (the double lipid layers that make up liposomes) and, by doing so, creates small holes (pores) that allow the passage of small molecules dissolved in water. Results showed that when the encapsulated transcription-translation systems containing both the green fluorescent protein gene and the α-hemolysin gene were supplied with an outside source of amino acids and nucleotides, they sustained gene expression for at least four days. In other words, permeabilized liposomes functioned about 20 times longer than non-permeabilized liposomes. Thus, even though these liposomes could not divide, they imitated “bare-bones” cells containing only two genes, and whose membranes were made of simple phospholipids. 
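Two of the numbers quoted above can be put in perspective with a small calculation: the liposome volumes can be converted to diameters, and the gain in working time can be computed from four days versus five hours. The sketch assumes the liposomes are perfect spheres, which the text does not state, and the function name is illustrative.

```python
import math

# Assumption: liposomes are treated as perfect spheres (not stated in the text).
# The function name sphere_diameter_um is hypothetical.


def sphere_diameter_um(volume_um3):
    """Diameter (micrometers) of a sphere with the given volume (cubic micrometers)."""
    radius = (3.0 * volume_um3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius


# Ratio of working times: at least four days versus about five hours.
permeabilized_hours = 4 * 24
sealed_hours = 5
lifetime_ratio = permeabilized_hours / sealed_hours

if __name__ == "__main__":
    for volume in (0.5, 14000.0):
        print(volume, "cubic micrometers ->", round(sphere_diameter_um(volume), 2), "micrometers across")
    print("Permeabilized liposomes ran about", round(lifetime_ratio, 1), "times longer")
```

Under the spherical assumption, the quoted volume range corresponds to diameters from about 1 micrometer to about 30 micrometers, and the working-time ratio is about 19, which rounds to the factor of 20 given in the text.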
The next challenge will be to see whether the genetic complexity of these encapsulated systems can be increased by adding more genes and eventually, to see whether these systems can be made to imitate cell division. It will take time to explore a variety of systems, but the experiments described above show that total synthesis of life in the laboratory is not an impossible dream.

CONCLUDING REMARKS

The topic discussed above relies mostly on theoretical considerations involving thermodynamic and cosmological arguments to explain the origin of structure, including living structure. Whenever I present the reasoning developed here to my students (who are in part non-science majors), I can detect at first puzzlement, then surprise, and finally deep skepticism (sometimes even hostility). Most science majors are usually unfazed because they have had to suffer through at least elementary thermodynamics, including entropy and free energy. For them, that the Big Bang and its aftermath continue to generate free energy, that dissipative structures can form because they release entropy into the universe, and that autocatalytic reactions exist are not such big deals. They have been exposed to the jargon. But I think that knowing the tricks of the trade does not fully explain these different reactions to the topic. In my opinion, people with a non-scientific background often do not quite understand that science is much more than a collection of so-called “facts” eventually categorized through inductive reasoning. Restricting oneself to this attitude makes it easy to reject anything, including dissipative structures, that may be perceived as being “just another theory,” in spite of the fact that thermodynamics is commonly used in very practical ways by chemists and engineers. In reality, the type of “grand thinking” explained above can be very fruitful, even if partially wrong, because it is innovative, multidisciplinary, and off the beaten track (far from equilibrium!). On the other hand, we must recognize that at this point in time, not much laboratory work has been done to determine whether dissipative structures are truly heuristic in our efforts to solve the puzzle of the origins of life. The observation of the autocatalytic formation of fatty acid vesicles in water may be a first step in that direction, however. Further, we saw that several research groups are actively seeking to create life in the laboratory, inspired in part by events that may have taken place in the RNA world.

CONTROVERSIES

Are theoretical studies and computer simulations relevant to the origin of life?

Much of the material presented above is highly theoretical. At this point in time it is not yet clear exactly how the concepts of dissipative structures, autocatalytic reactions, and free energy can be integrated to provide an experimental framework pertinent to the origins of life. In his book The End of Science (1996), science journalist John Horgan argues that theoretical studies involving computer simulations, such as those conducted by evolutionary biologist Stuart Kauffman at the Santa Fe Institute, and the late physical chemist Ilya Prigogine at the Universities of Texas and Brussels, are too abstract to be of much relevance to the origins of life and structure. However, in spite of Horgan’s gloomy 1996 prediction, science has not yet ended quite a few years after his book came out. On the other hand, computer models such as the Brusselator, the Oregonator, and Stuart Kauffman’s have not yet produced scientific breakthroughs regarding the origin of simple life and its evolution. One can always hope that a future combination of theory and laboratory work, preferably involving multidisciplinary research teams, will lead to the building of credible models for the spontaneous appearance of the first cells.

FURTHER RESEARCH

The study of self-organizing lipid vesicles continues to produce some surprising results.

An excellent example of where future research can lead is provided by graduate student Irene Chen’s paper that won the 2006 Grand Prize of the General Electric scientific essay contest. Her work was performed in the laboratory of Jack Szostak, Nobel laureate, at Harvard University. In this paper, Irene Chen suggests a possible mechanism for the growth of prebiotic lipid vesicles containing RNA. Her observations indicate that RNA-filled vesicles experience osmotic stress (high internal pressure) whereas empty vesicles do not. As a result, RNA-containing vesicles are able to “steal” lipid molecules from empty vesicles and thus experience growth at the expense of the latter. What is more, incorporation of the “stolen” lipid molecules results in the acidification of the interior of the RNA-filled vesicles. This means that these vesicles experience an acidity (pH) gradient across their membranes, a common occurrence in living cells, which use pH gradients to store energy and power some metabolic steps such as the uptake of small molecules. Thus, the simple chemical and physical properties of artificial lipid vesicles can mimic to some extent the properties of living cells. Furthermore, vesicles containing vigorously replicating RNA are expected to experience increasing osmotic stress, allowing them to grow at the expense of vesicles containing RNA molecules replicating more slowly or not at all. This mechanism would be equivalent to Darwinian evolution where fit vesicles (the ones containing actively replicating RNA) overtake less fit vesicles.





DISCUSSION QUESTIONS

1. Discuss the concept of entropy. Under what conditions can entropy decrease locally?
2. Can structure appear in systems at thermodynamic equilibrium? Why or why not?
3. What are dissipative structures (or systems)? What are bifurcations?
4. Computer programs can be used to study autocatalytic reactions far from equilibrium. What do these programs tell us?
5. How can one make artificial cells?
6. What is the putative role of fatty acids in the prebiotic world?
7. What function could clay have exercised in the prebiotic world?
8. What is the origin of free energy in the universe? How was (and still is) free energy “used” in the universe?
9. Discuss the top-down and the bottom-up approaches to create life in the lab.
10. What genes were used in the first life creation bottom-up experiments? What did they code for?

REFERENCES AND WEBSITES

Bachmann, P. A., P. L. Luisi and J. Lang. 1992. Autocatalytic self-replicating micelles as models for prebiotic structures. Nature 357:57-59.
Black, R. A., M. C. Blosser, B. L. Stottrup, R. Tavakley, D. W. Deamer and S. L. Keller. 2013. Nucleobases bind to and stabilize aggregates of a prebiotic amphiphile, providing a viable mechanism for the emergence of protocells. Proc. Natl. Acad. Sci. USA. doi:10.1073/pnas.1300963110.
Chaisson, E. J. 2001. Cosmic Evolution: The Rise of Complexity in Nature. Cambridge, MA: Harvard University Press.
Chen, I. A. 2006. The emergence of cells during the origin of life. Science 314:1558-1559.
Church, G. and E. Regis. 2012. Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves. New York, NY: Basic Books.
Forster, A. C. and G. M. Church. 2006. Towards synthesis of a minimal cell. Molecular Systems Biology 2. Article number: 45. Published online: 22 August 2006. doi:10.1038/msb4100090.
Gibson, D. G. and 23 others. 2010. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329:52-56.
Horgan, J. 1996. The End of Science: Facing the Limits of Knowledge in the Twilight of the Scientific Age. Reading, MA: Addison-Wesley Publishing Company, Inc.
Kauffman, S. 1995. At Home in the Universe. New York, NY: Oxford University Press.
Lurquin, P. F. and K. Athanasiou. 2000. Electric field-mediated DNA encapsulation into large liposomes. Biochem. Biophys. Res. Comm. 267:838-841.

The Web page below is a good theoretical and practical introduction to oscillating chemical reactions. This is the English translation of a site intended for French high school students.
http://www.faidherbe.org/site/cours/dupuis/ascil.htm

The following three Web pages describe the Brusselator. They are for the mathematically savvy. http://math.ohio-state.edu/~ault/Papers/BRusselator.pdf http://mathworld.wolfram.com/BrusselatorEquations.html http://cmp.caltech.edu/~mcc/STChaos/Brusselator.html


Ilya Prigogine (1917-2003)

Source: Nobel Prize website.

In spite of his first and last names, which are neither French nor Dutch (the two main languages of Belgium), Ilya Prigogine was a Belgian theoretical chemist and physicist. He was born in Moscow, Russia, just a few months before the Bolshevik revolution. In 1921 his family, consisting of his parents, his older brother, and himself, left Russia for Germany, where they would stay until 1927. They finally moved to Belgium in 1929. Twenty years later, Prigogine became a Belgian citizen.

His interests as a teenager were history, philosophy, archaeology, and classical piano, but not science, at which he would excel later in life. Despite his classical education, Prigogine enrolled in the chemistry curriculum at the University of Brussels, graduating with a PhD in 1941, just before the occupying German authorities closed down the university. After World War II, Prigogine became a professor of theoretical chemistry and physics at his alma mater, where he taught, among others, a course in quantum mechanics. Later, he also became a professor at the University of Texas, dividing his time approximately equally between Brussels and Austin.

Prigogine is best known for developing the thermodynamics of non-linear systems far from equilibrium. It is for this work that he received the 1977 Nobel Prize in Chemistry. Prior to his contributions (and those of his many associates and students), thermodynamics dealt almost exclusively with reversible systems (such as chemical reactions) at or near equilibrium. Prigogine felt, probably owing to his earlier study of French philosopher Henri Bergson’s interpretation of time and change, that many natural phenomena are irreversible and never reach equilibrium. This is certainly the case for life and many metabolic reactions. From then on, he introduced the idea of irreversibility into thermodynamics and its second law, statistical mechanics (which also applies to thermodynamics), quantum mechanics, and cosmology. One consequence of this work was the discovery of dissipative structures (first mentioned by him in 1967) and their evolution through bifurcations and fluctuations. This is how the Brusselator originated.

Prigogine always thought that the dichotomy between the humanities and the sciences should be removed, as, in his words, “Time [including in its philosophical meaning of duration for us humans] and complexity are concepts that present intrinsic mutual relations” (note 1). Prigo, as we called him, was also an outstanding teacher, demanding but not outlandishly so, in PFL’s own experience. Unfortunately, his imperious manners, never displayed in his immediate academic entourage where he was loved and respected, irritated, put down, and turned off many.

1. http://nobelprize.org/nobel_prizes/chemistry/laureates/1977/prigogine-autobio.html


CHAPTER 4

The Prokaryotic World and First Eukaryotes

If the genetic code is universal, it is probably because every organism that has succeeded in living up till now is descended from one single ancestor. But, it is impossible to measure the probability of an event that occurred only once.
FRANÇOIS JACOB, The Logic of Life (1973)

FROM RNA GENES TO DNA GENES

Later in evolution, RNA genomes were converted into DNA genomes, and the DNA world we know today appeared. Since DNA genomes took over almost all life-forms (except some viruses), they must have had evolutionary advantages. What were they? The chemistry of DNA shows that it has at least three advantages over RNA as far as genomes are concerned.

First, DNA is more stable than RNA in the cell’s environment, because thymine and deoxyribose are more chemically stable than their RNA counterparts, uracil and ribose. Natural selection would, of course, have favored cells whose genomes had become more stable and longer lasting.

Second, ribozyme activity depends on the presence of an extra OH group (the 2′-hydroxyl) in ribose, the sugar found in RNA. Since DNA contains deoxyribose and thus lacks that extra OH group, it would have been unable to engage in the many ribozyme activities we have studied. This was a definite advantage once long genes were created. Indeed, ribozyme activity, important as it was at the very beginning of life, always runs the risk of cutting large RNA molecules into smaller pieces. This would have destroyed long RNA genes. The appearance of DNA genes solved this problem.

Finally, the creation of DNA completely separated gene replication from translation. In the RNA world, ribozyme activity accomplished both at the same time, meaning that a serious error at one level would have created serious errors at all levels. This is avoided in the DNA world.

But then, how was RNA converted into DNA? This must have happened in the distant past through the creation of new genes and the mutation of preexisting ones. Remember that in modern cells, the building blocks of DNA are first synthesized as RNA building blocks. Thus deoxyribose is made from ribose through the action of an enzyme that eliminates the extra OH group from ribose. This enzyme is called a ribonucleotide reductase.
Furthermore, the DNA base thymine is produced by the addition of a methyl (CH3) group to the RNA base uracil through the action of a methylase enzyme. Then, the deoxyribonucleotides dATP, dGTP, dCTP, and dTTP (the last being deoxythymidine triphosphate) must have been used to synthesize DNA by copying the original RNA gene templates. This function is accomplished today by enzymes called reverse transcriptases. HIV, the virus that causes acquired immunodeficiency syndrome (AIDS), for example, has an RNA genome that is converted into a DNA copy by the virus’s own reverse transcriptase after infection of the host cell. Other viruses, including some plant viruses, also contain a reverse transcriptase gene. Reverse transcriptases may be mutant forms of earlier RNA replicases. Enzymes that today replicate DNA (DNA polymerases) and transcribe it into mRNAs (RNA polymerases) may also be mutant forms of older RNA replicases or reverse transcriptases. The mechanisms leading from the RNA world to the DNA world are not well understood, however, and several hypotheses have been proposed to explain this shift.

Modern life-forms are separated into three domains: the Bacteria, the Archaea, and the Eukarya. Bacteria and Archaea are all microscopic organisms that differ in a number of physiological and molecular properties (see below). Eukarya range from unicellular organisms like yeast and molds to mammals and oak trees. Patrick Forterre, a French geneticist, recently asked why there are three domains of life, and how they could have originated from the RNA world. Forterre notes that much of what we know about phylogenies (trees of descent) is based on the translation apparatus (notably tRNAs and ribosomes) of living cells. For example, archaeal genes for ribosomal proteins can easily be distinguished from their bacterial and eukaryal counterparts, separating living organisms into three neat categories. However, Forterre points out that things get more confusing when the DNA replication machineries of the three domains are compared. There, archaeal enzymes such as DNA polymerases and helicases (enzymes that separate the two strands of the DNA double helix in preparation for replication) are closely related to their eukaryal counterparts but are very different from their bacterial counterparts. In addition, bacterial DNA replication enzymes are similar to enzymes found in viruses. How is this possible?

Forterre’s hypothesis is that the cellular RNA world was infected by DNA-containing viruses, forcing it to convert to a cellular DNA world. For now, let us restrict ourselves to Bacteria and Archaea and their early evolution. The virus hypothesis holds that viruses already existed in the RNA world. What is more, some of these viruses, according to the hypothesis, had innovated the synthesis of DNA using RNA templates, thanks to reverse transcription and conversion of RNA precursors into DNA precursors. This would have conferred a great selective advantage on DNA-containing viruses, because the RNA world would have been defenseless against a new nucleic acid (viral DNA) that ribo-organisms would not have “recognized” and therefore could not have destroyed. Some of these DNA viruses would then have established themselves inside their host ribo-organisms, the RNA genomes of which they would have gradually converted into DNA. Once conversion was completed, the DNA world would have outcompeted the RNA world thanks to the selective advantages of DNA over RNA (see above).

In Forterre’s scenario, one particular type of DNA virus would have infected ribo-organisms to lead to the evolution of Bacteria, while another type of DNA virus would have infected other ribo-organisms to lead to the evolution of the Archaea. Indeed, this would explain why Bacteria and Archaea have very different DNA replication enzymes while possessing similar genes involved in translation: the latter genes were acquired from the RNA world, but the former would have originated from DNA viruses. As we will see later, the virus hypothesis also attempts to explain the origin of eukaryotes. For now, let us assume that there was indeed such a thing as a DNA-based progenote, regardless of its origin, and let us study its fate.

FIRST DNA-CONTAINING CELLS AND THEIR EVOLUTION

The progenote, a primeval population of DNA-based microorganisms, must have been able to live in the total absence of oxygen. Its metabolic capabilities are not well understood. Descendants of the progenote probably developed non-oxygenic photosynthesis, followed by oxygenic photosynthesis and then aerobic respiration.

Microscopic fossils and their chemical footprints are not particularly rare. Fossil forms of cyanobacteria, unicellular bacteria able to perform photosynthesis, have been found in rocks dating back 3.5 billion years. This date has been contested recently, however, and cyanobacteria may be slightly more recent than formerly thought. Evidence for the existence of organic molecules that could, in all likelihood, have been synthesized only by living cells has been found even in slightly older rocks. Cyanobacteria still exist today and can form, in association with other organisms, domed, layered structures called stromatolites, which are found on the coast of Western Australia, among other places. Notably, fossil stromatolites (figure 1) have been discovered in Ontario, Canada, and elsewhere, and they contain microscopic structures that look like modern cyanobacteria, at least as far as shape and size are concerned. Organic compounds that resemble chlorophyll degradation products have also been found in these rock formations, as well as low levels of carbon 13, a sign that photosynthetic life-forms proliferated there a long time ago.

The element carbon exists in the form of three isotopes, carbon 12, 13, and 14. All three isotopes contain six protons in their nucleus, but carbon 12 has six neutrons, carbon 13 has seven, and carbon 14 has eight. Carbon 14 is radioactive, but it decays rather rapidly, so that it can be used only to date materials that are up to a few tens of thousands of years old, not billions of years old.
Carbon 13 is stable and represents about 1 percent of the total carbon found as CO2 in Earth’s atmosphere, whereas carbon 12 represents 99 percent. The process of photosynthesis favors the utilization of carbon 12 over that of carbon 13 in the form of CO2. This means that photosynthetic organisms contain low levels of carbon 13, and this was exactly what was found in the rocks where cyanobacteria-like fossils were discovered. The evidence that the 3.5-billion-year-old microscopic fossils were once living cyanobacteria is thus quite good.
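To see why carbon 14 is of no use for rocks this old, the decay arithmetic can be sketched in a few lines of Python. This is only an illustration: the half-life of about 5,730 years is the commonly cited value and is not given in these notes, and the function name is made up for the example.

```python
# Assumption: the commonly cited half-life of carbon 14 (about 5,730 years).
HALF_LIFE_C14_YEARS = 5730.0

def fraction_remaining(age_years, half_life=HALF_LIFE_C14_YEARS):
    """Fraction of the original carbon 14 left after age_years (exponential decay)."""
    return 0.5 ** (age_years / half_life)

# One half-life, a practical upper limit for dating, and the age of the oldest fossils
for age in (5_730, 50_000, 3_500_000_000):
    print(f"{age:>13,} years: fraction remaining = {fraction_remaining(age):.3e}")
```

After one half-life, half of the carbon 14 is left; after 50,000 years, only a fraction of a percent remains; after 3.5 billion years the computed fraction is indistinguishable from zero. This is why the argument in the text rests on the stable isotopes carbon 12 and carbon 13 instead.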


However, cyanobacteria are sophisticated organisms; they contain chlorophyll, and their modern version can fix atmospheric nitrogen. It is not very likely, therefore, that cyanobacteria were the first DNA-containing cells. They probably evolved from a more rudimentary type of cell, called by some a progenote and by others the last common ancestor, that would have been the mother of all life-forms based on DNA. This progenote is thus placed at the very root of the global phylogenetic tree (the tree of life) and was itself derived from RNA-containing ribo-organisms.

Figure 1. Fossil stromatolite. (Source: http://www.ucmp.berkeley.edu/bacteria/cyanofr.html)

What would the progenote have looked like? So far, we can only speculate. Scientists agree that it must have been an anaerobic organism, since there was still no oxygen in the atmosphere. In terms of metabolism, the progenote probably did not rely on complex nutrients (they did not exist yet, either), was not photosynthetic, and may have been a sulfur metabolizer or a methanogen, or both. Sulfur metabolizers convert elemental sulfur into hydrogen sulfide, whereas methanogens react carbon dioxide with hydrogen to form methane. Both reactions are reductions and could have been coupled with an electron transfer system to generate ATP. Several modern prokaryotic species occupy niches that are rich in sulfur, methane, and hydrogen and that are also characterized by high temperature and acidity. Thus many scientists think that the progenote originated in a hot environment, such as the volcanic ponds and springs that exist in Yellowstone National Park. What is more, these hot, volcanic gas-containing niches are populated today by several species of prokaryotes, mostly belonging to the domain Archaea.

Of course, not everybody agrees that DNA-based life had a hot origin. Recently, researchers conducted an extensive computer search of the G and C content of ribosomal RNAs from forty different organisms ranging from heat-loving bacteria to mammals. In nucleic acids, a high G+C content indicates more stability at high temperature than a high A+U content. Based on a phylogenetic analysis, they found that the putative progenote was not particularly rich in G+C bases and hence may not have been a heat-loving species after all.

What does this controversy over the hot origin of DNA-based life mean? First, it means that the clash of ideas is very much a part of the scientific process. Second, it means that we really do not yet know what the progenote truly was. Third, it may also mean that some scientists are swayed by models that seem to be in line with conventional thinking. It is a fact that biology does not really have a good theoretical background against which claims and even experimental observations can be verified or rejected. A hot origin for the appearance of DNA-based life (and perhaps the RNA world as well) is consistent with the concept that Earth was a hot, volcanic, meteorite-bombarded place right before or during the emergence of life. This is conventional, mainstream thinking. But we are not entirely sure that Earth was a hot place at the dawn of life. In fact, based on calculations describing stellar evolution, some scientists think that the young Sun, about 4 billion years ago, delivered only 75 percent of the thermal radiation it is delivering today. This “cold Sun” scenario would have been tempered by a strong greenhouse effect, similar to that of Venus but less pronounced, that could have made Earth hot. Some scientists dismiss this hypothesis and claim that life appeared on a glacial, “snowball” Earth. This, they say, would have much increased the stability of fragile organic molecules necessary for life to appear. What can be made of all these conflicting ideas? Not too much so far; the jury is still out.

Another problem simply has to do with vocabulary.
When prokaryotes were discovered in extreme environments, such as those displaying very high salinity, temperature, or acidity, they were dubbed archaebacteria (now Archaea), in contrast to regular bacteria or eubacteria (now Bacteria), which inhabit friendlier niches, at least in human terms. When it was discovered that Archaea significantly differ in their molecular composition from Bacteria, the concept that they were very primitive, archaic organisms (hence their name) living under harsh conditions took root. There is no evidence that Archaea are more archaic than other prokaryotes, but the name has stuck. In fact, it is now believed that Archaea may have given rise to eukaryotes, organisms such as bread mold, spiders, goldfish, and humans. This does not make them more or less primitive than Bacteria.

One of the differences between Archaea, Bacteria, and Eukarya is the presence of D-amino acids (right-handed) in their cell wall. We know that all amino acids used for protein synthesis (even by Archaea) are of the left-handed, L type. The presence of D-amino acids in Archaea is also seen as a primitive character because the primeval broth would have contained equal amounts of L and D types of amino acids. Some books go on to say that during evolution, more “advanced” Bacteria and Eukarya lost the ability to incorporate D-amino acids in their cell envelope. However, it is equally justified to say that Archaea gained this ability. And, by the way, some members of the Bacteria do use D-amino acids to a limited extent to synthesize specific compounds. These considerations simply emphasize our present ignorance of what the progenote and its immediate descendants were.

One can always speculate, however. In the absence of oxygen, close descendants of the progenote must have had a metabolism of the anaerobic type. Anaerobic organisms still exist today, such as many Bacteria and Archaea, while among Eukarya, a well-known example is brewer’s yeast, used to make beer.
These organisms use fermentation to produce energy in the form of ATP. Descendants of the progenote also may have been able to fix atmospheric N2 to produce ammonia, which is then converted into amino acids and nitrogenous bases. Nitrogen fixation uses the ATP formed during the fermentation process. In addition, they must have developed an electron transfer system based on porphyrin-containing compounds (figure 2). Let us remember that electron transfer chemical reactions are crucial in metabolism. They take place in oxidation/reduction reactions (where the term “oxidation” does not necessarily mean that oxygen is involved; generally speaking, in chemistry, oxidation refers to the loss of electrons by one chemical, which are transferred to another). Also remember from topic #1 that some authors have imagined that pre-metabolic electron transfer reactions involved the iron cycle. Without oxygen as a reducible substrate, the progenote may have “breathed” sulfur compounds instead to produce more ATP. Some modern prokaryotes do just that, and they use porphyrins as electron carriers. Fascinatingly, porphyrins are made in Miller-type gas-discharge experiments.

Finally, the progenote and its offspring may have developed non-oxygenic photosynthesis (also based on porphyrins), in which CO2 is reduced into glucose, which is then fermented. Non-oxygenic photosynthesis exists today in certain microorganisms. It does not rely on the splitting of water and hence produces no oxygen. The electron donor here is not H2O; it can be H2S or even H2. A recently published phylogenetic analysis of genes involved in photosynthesis strongly supports the idea that non-oxygenic photosynthesis based on bacteriochlorophyll preceded photosynthesis based on chlorophyll. Bacteriochlorophyll is found in bacteria of the green sulfur and non-sulfur types, whereas chlorophyll is found in cyanobacteria and plants. This, of course, raises the question as to why cyanobacteria and plants have the same type of chlorophyll. More about that later.

Figure 2. The general structure of porphyrins. In hemoglobin, myoglobin, and cytochromes (heme proteins that bind oxygen or transfer electrons), the porphyrin ring contains an iron ion at its center. In chlorophyll, this ion is magnesium.

Then, the photosynthetic apparatus must have evolved in such a way that oxygenic photosynthesis became possible. This is the type of photosynthesis that cyanobacteria are capable of. The “invention” of oxygenic photosynthesis was to have dramatic consequences for the evolution of life on Earth, as we will see. The overall equation that describes oxygenic photosynthesis is

6 CO2 + 6 H2O + energy → C6H12O6 + 6 O2

where the energy is provided by solar photons, and C6H12O6 is glucose. Oxygen gas (O2) is simply released into the environment. Cyanobacteria thus created an “oxygen crisis” and contributed to what may have been the first mass extinction afflicting many species.

Prior to the appearance of oxygenic photosynthesis, life had evolved in a completely anaerobic environment. It turns out that oxygen is a very reactive gas and is in fact quite toxic to obligate anaerobes. When cyanobacteria first started evolving oxygen, it must have reacted quickly with other gases present in solution, such as ammonia, carbon monoxide, and hydrogen sulfide, to yield nitrogen, carbon dioxide, water, and sulfur dioxide. It would also have reacted with dissolved minerals such as reduced iron. This is why not all life was wiped out. Gradually, however, as oxygen became more plentiful, it started dissolving in water and eventually escaped into the atmosphere. This spelled doom for many species of Bacteria and Archaea, which then became extinct or were relegated to oxygen-poor areas such as lake or ocean bottoms. The species that survived in the presence of oxygen could have done so only by creating oxygen-detoxifying enzymes (such as catalases and peroxidases); those that had not accomplished this became confined to anoxic niches.

Another consequence of oxygen production via photosynthesis was the progressive formation of the ozone layer. Ozone strongly absorbs the most mutagenic wavelengths of ultraviolet (UV) light, so the ozone layer would have slowed down evolution by gene mutation. However, concomitantly, the blocking of UV light enhanced the survivability of oxygen-tolerant cells living near the surface of water or on wet land.

Finally, the last and perhaps most important consequence of the oxygen crisis was the appearance of aerobic respiration. Aerobic respiration uses oxygen gas as a metabolite to produce high levels of energy in the form of ATP. Aerobic respiration consists of a set of oxidation-reduction reactions involving specific cytochromes, all equipped with porphyrin rings. It is possible that these cytochromes evolved from those used in anaerobic respiration, where metabolites other than oxygen are used as electron acceptors.

In conclusion, the evolution of cyanobacteria from the ancestral progenote and its descendants led to the appearance of two major energy-producing mechanisms that an enormous majority of eukaryotes use today: oxygenic photosynthesis and aerobic respiration. It is estimated that oxygen in the atmosphere reached its present level about 2 billion years ago. Cyanobacteria are such successful organisms that the descendants of those that oxygenated Earth’s atmosphere are still ubiquitous today.
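As a quick check on the overall equation for oxygenic photosynthesis given earlier in this section (6 CO2 + 6 H2O → C6H12O6 + 6 O2), the short Python sketch below counts the atoms on each side. The helper function and variable names are invented for this illustration only.

```python
from collections import Counter

def atoms(formula, coefficient):
    """Atom counts contributed by `coefficient` molecules of a compound."""
    return Counter({element: n * coefficient for element, n in formula.items()})

# Elemental composition of each molecule in the overall equation
CO2 = {"C": 1, "O": 2}
H2O = {"H": 2, "O": 1}
GLUCOSE = {"C": 6, "H": 12, "O": 6}
O2 = {"O": 2}

reactants = atoms(CO2, 6) + atoms(H2O, 6)      # 6 CO2 + 6 H2O
products = atoms(GLUCOSE, 1) + atoms(O2, 6)    # C6H12O6 + 6 O2

print(dict(reactants))
print(dict(products))
print("balanced:", reactants == products)
```

Both sides contain 6 carbon, 12 hydrogen, and 18 oxygen atoms, so the equation is balanced. The count says nothing about energy: the solar photons in the equation are what drive the reaction.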

PATHS OF FAST EVOLUTION

The first prokaryotes may have evolved at a fast pace, thanks to horizontal (or lateral) gene transfer. The modern prokaryotic world is enormously diverse. We now know that evolution by gene mutation is but one cause of this diversity. Indeed, prokaryotes have also developed some kind of primitive sex and other methods that allow transfer of whole groups of genes from one cell to another. In all likelihood, prokaryotes living billions of years ago could do the same thing. The phenomena allowing exchange of DNA between bacterial cells are known collectively as horizontal gene transfer and comprise the mechanisms of conjugation, transduction, and transformation.

In conjugation, two prokaryotic cells become united by a tubular bridge through which the chromosome of one of the mating partners is transferred into the other cell. In this process, thousands of genes can potentially be exchanged by two mating bacterial cells (figure 3). The blending of many different genes in a single cell can of course have important consequences for its evolution.

The second mode of horizontal gene transfer, transduction, involves DNA-containing bacteriophages as little gene carriers. When a bacteriophage infects a prokaryotic cell, its DNA is rapidly replicated many times. Concurrently, some of the bacteriophage genes are transcribed and translated to synthesize one or several coat proteins. These are destined to coat the newly formed bacteriophage DNA and so produce many new viral particles. As the new bacteriophage DNA is being packaged by the coat proteins, DNA from the prokaryotic host can become accidentally trapped in some of the new bacteriophage particles. Since the amount of DNA that can be contained in the bacteriophage coat is limited, the presence of host DNA prevents trapping of bacteriophage genes. Thus bacteriophage particles containing host DNA, but no bacteriophage DNA, are formed.
When the particles are released by the dying host, those bacteriophages that contain bacterial DNA (but no bacteriophage genes) can still infect a new host bacterium, but they do not kill it. The result is that the new host acquires many genes from the previously infected (and now dead) host. These additional genes blend with the existing genes of the new host and thus modify its genome. Finally, bacterial cells die all the time in nature and, in the process, their membrane gets disrupted and their DNA is released into the environment. Some bacterial species have evolved the ability to pick up this released DNA and use it as their own to code for new functions. This phenomenon is known as transformation. Clearly, horizontal gene transfer between prokaryotes can lead to very rapid evolution and diversification, since it involves the transfer of many genes at a time. Now that the genomes of dozens of prokaryotic species have been fully sequenced, it is becoming clear that such sharing of genes was quite common in the past. Prokaryotes had plenty of time to diversify; they ruled the world alone between 3.5 billion and 2 billion years ago.

FIRST EUKARYOTES

Several hypotheses have been formulated to explain the appearance of eukaryotic cells. Three hypotheses are discussed in the next two sections: the endosymbiont hypothesis, the hydrogen hypothesis, and the viral hypothesis.

A major transition in the living world took place about 2 billion years ago: the appearance of the first eukaryotic cells. The 2.1-billion-year-old fossilized remains of a photosynthetic alga (containing chloroplasts) were discovered not too long ago in Michigan and constitute the oldest known eukaryotic fossil. As we know, eukaryotic cells are very significantly different from prokaryotic cells. One of the major differences is the existence in eukaryotes of a complex cytomembrane system that includes the nuclear membrane, the cytoskeleton, and organelles such as mitochondria and chloroplasts. The cytomembrane system allows animal cells to perform endocytosis, the process by which the cell membrane creates invaginations (little pockets) in which extracellular material, including nutrients, can become trapped. These invaginations are then sealed off inside the cell, and the vesicles so created fuse with preexisting cellular bodies such as lysosomes. Lysosomes contain an array of enzymes that digest whatever gets trapped in the cell membrane’s invaginations.

Another important difference between prokaryotes and eukaryotes is the way in which the eukaryotic genome is organized. Bacterial chromosomes are circular, meaning that their DNA is a double helix closed on itself. By contrast, eukaryotic chromosomes are linear, and their DNA is associated with many proteins that have no counterparts in most prokaryotes. These proteins are organized together with the DNA in the form of nucleosomes (figure 4). Such an organization is not found in prokaryotes. Furthermore, the coding sequence of eukaryotic genes is usually interrupted by noncoding DNA sequences called introns (figure 5).

Figure 3. Two bacterial cells united by a conjugation tube through which DNA from the donor cell (the “hairy” one) is transferred to the recipient cell. (Source: http://www.environmentalgraffiti.com/biology/news-culture-exchange-world-bacteria)

Figure 4. Organization of eukaryotic DNA in a nucleosome. DNA is represented by the coiled red cylinder. DNA is wrapped around a core of eight histone proteins, two copies each of histones H2A, H2B, H3, and H4. Histone H1 (in green) stabilizes the DNA coil. (Source: http://www.accessexcellence.org/RC/VL/GG/nucleosome.php.)


Figure 5. Introns in eukaryotic genes. A eukaryotic DNA gene contains coding (exons) and noncoding (introns) base sequences. Both exons and introns are copied into an mRNA molecule (preRNA), which is then processed for intron removal. After intron elimination, exons are stitched together to form an uninterrupted coding sequence consisting of exons only (mature RNA). (Source: adapted from Alcamo, I. E. 1996. DNA Technology: The Awesome Skill. Dubuque, Iowa: Wm. C. Brown.)
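The processing step described in the figure 5 caption can be mimicked with a toy Python sketch. Everything here is hypothetical: the short sequence, the exon coordinates, and the function name are made up for illustration, and real splicing is carried out by cellular machinery that recognizes signals in the transcript rather than a list of positions.

```python
def splice(pre_rna, exons):
    """Join the exon segments of a transcript.

    `exons` is a list of (start, end) positions, end not included,
    listed in the order in which the exons appear in the transcript.
    """
    return "".join(pre_rna[start:end] for start, end in exons)

# Hypothetical toy transcript: exons in uppercase, introns in lowercase.
# The introns start with "gu" and end with "ag", as most real introns do.
pre_rna = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guaugcccacag" + "CCCUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mature_rna = splice(pre_rna, exons)
print(mature_rna)
```

The printed mature RNA contains only the uppercase exon letters, joined in their original order, which is the “stitching together” of exons shown in the figure.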

Introns are extremely rare in prokaryotes. Finally, there are morphological differences between the two types of cells; some of the main ones are illustrated in figure 6.

Figure 6. A summary of the main morphological differences between prokaryotic and eukaryotic cells. The eukaryotic cell is a composite containing features not necessarily present in all eukaryotic cells. Differences and degree of complexity are obvious. (Source: Margulis, L. Early Life. 1984. Sudbury, Mass.: Jones and Bartlett, figure 1.1, page 3. Reprinted with permission of the publisher.)

How could all these differences have come about? The presence of chloroplasts and mitochondria in plant cells and mitochondria in animal cells is best explained by a hypothesis developed in the late 1960s by Lynn Margulis of Boston University. Her model is known as the endosymbiont hypothesis. Margulis had been interested in the symbiotic relationships between the algae and fungi that form lichens. In lichens, these two types of organisms are intimately mixed, and they cooperate in their fight against the harsh environment in which they live. Taking this idea one step further, Margulis hypothesized that chloroplasts and mitochondria were at one time free-living prokaryotes that somehow became engulfed by larger cells and established themselves as symbionts inside their hosts (hence the name endosymbionts). At the time, this hypothesis was greeted with a good deal of skepticism. Now that the properties of chloroplasts and mitochondria are much better understood, Margulis’s hypothesis has become a classical model presented in elementary biology textbooks. Indeed, we now know that both mitochondria and chloroplasts contain their own DNA. The general architecture of their genomes, their protein-synthesizing apparatus, and their sensitivity to certain antibiotics make it clear that they are of prokaryotic origin. Mitochondria and chloroplasts need the cooperation of nuclear genes to function properly because, over time, many genes they originally housed have been transferred to the nucleus of plant and animal cells.

Thus in the endosymbiont model, animal cells presumably originated from the capture of aerobic bacteria (possibly from a class known as purple bacteria and capable of making ATP via respiration) by larger anaerobic bacteria able only to ferment. These trapped bacteria thus became mitochondria. Plant cells, in turn, came from large anaerobic prokaryotes that had engulfed the progenitors of mitochondria (as in the case of animal cells), as well as cyanobacteria, whose photosynthetic machinery later became chloroplasts. In both cases, the evolutionary advantage of endosymbiosis was a vastly greater ability to produce ATP. Furthermore, since all eukaryotes that possess chloroplasts also possess mitochondria, many think that acquisition of mitochondria came first.

Curiously, it has been recently demonstrated that the eukaryotic human pathogen Toxoplasma, which is not a plant, contains an organelle that looks very much like a remnant from a unicellular photosynthetic organism!
This organelle contains DNA that is related to that of a green alga (also a eukaryote, not a prokaryote like a cyanobacterium) but that has no chlorophyll and has lost the genes responsible for photosynthesis. The explanation for this is that this organelle once was a free-living eukaryotic cell engulfed by another eukaryotic cell. Thus this capture must have occurred more recently than the ones that gave rise to the first eukaryotes. Further study of Toxoplasma may shed light on the phenomenon of endosymbiosis in general. Although the endosymbiont model is now well accepted by the scientific community, the problem of identifying the hosts that captured purple bacteria and cyanobacteria to produce the first eukaryotes is not solved. What could these hosts have been? Not all eukaryotes contain organelles. For example, the unicellular human parasite Giardia lamblia has neither mitochondria nor chloroplasts. Could it be that the ancestors of Giardia were primeval eukaryotes that never captured any endosymbiotic guests, in contrast to most other eukaryotes? This seems unlikely, because there is good evidence that the nuclear genome of Giardia contains genes of prokaryotic origin. This evidence suggests that the ancestors of Giardia once contained organelles of prokaryotic descent that transferred some of their genes to the nucleus and then disappeared. Giardia is thus not a good example of a “primeval” eukaryote. Some other answers may be found when the genomes of more unicellular eukaryotes, with and without mitochondria or chloroplasts, are sequenced. In the meantime, scientists have formulated a more intricate hypothesis to explain the origin of the first eukaryotic cells. This model is called the hydrogen hypothesis because it assumes that eukaryotes originated from the fusion of an anaerobic, hydrogen-dependent archaeal cell with an aerobic bacterial cell that was able to respire oxygen (to produce CO2) and that also produced hydrogen as a waste product.
The evolutionary advantage of this fusion would have been that the two partners, now together, could have made use of waste products (CO2 and H2) and could have benefited from plentiful ATP produced by respiration. This hypothesis rests on the following observations. First, some modern eukaryotes, devoid of mitochondria, contain an organelle, called the hydrogenosome, that produces hydrogen gas and ATP. The hydrogenosome contains no DNA (which may have been lost or transferred to the nucleus), but some of its components resemble those of mitochondria. This suggests a bacterial origin for these hydrogenosomes. Furthermore, many contemporary Archaea strictly depend on H2 and CO2 to produce ATP. Thus an ancestral archaeal cell could have had a similar metabolism. Finally, phylogenetic analysis shows that eukaryotes are more closely related to Archaea than they are to Bacteria (more about that later). Therefore saying that eukaryotes descend from both Bacteria and Archaea makes a lot of sense (figure 7). This hypothesis is falsifiable, because if correct, it means that eukaryotes harboring mitochondria have kept the respiration function (oxygen metabolism) of the original bacterial portion of the fusion but have lost the ability to generate hydrogen. On the other hand, in eukaryotes devoid of mitochondria but able to produce hydrogen, it is the respiration function that has been lost and the ability to generate hydrogen that has been kept. Finally, for those eukaryotes that contain neither mitochondria nor hydrogenosomes, it can be hypothesized that many of the bacterial genes were transferred to the archaeal cell genome (presumably to provide some needed functions) before the bacterial portion of the partnership disappeared. Again, sequencing of a variety of eukaryotic genomes will either support or refute this hypothesis. In all cases, traces of ancestral genes, responsible for respiration or
hydrogen production or both, should be found in eukaryotes that can perform only one of these two functions or none at all. We do not have answers yet because, for now, most genome sequencing efforts are restricted to the human genome, to those of model organisms such as the mouse, the rat, and the fruit fly, and to those of agronomically important species (cattle, cultivated plants) and human pathogens. However, this state of affairs is changing rapidly now that DNA sequencing techniques have become so much less expensive.

Figure 7. The hydrogen hypothesis. Three modern outcomes of the hypothetical fusion between an archaeal cell and a bacterial cell are shown. A: An amitochondriate anaerobic eukaryote, such as Giardia, produces ATP by glycolysis and fermentation. In some cases, lactate is produced instead of ethanol. Both the ability to produce hydrogen and the ability to perform respiration have been lost. B: In these anaerobic cells, the hydrogenosome represents the remnant of the aerobic bacterial cell that originally produced hydrogen as a waste product. The respiration function has been lost. C: This mitochondriate aerobic cell has lost the ability to produce hydrogen but has kept the respiration function present in the mitochondrion. All three types of cells have kept the anaerobic glycolytic pathway inherited from the archaeal fusion partner. (Source: adapted from Martin, W., and M. Müller. 1998. The hydrogen hypothesis for the first eukaryote. Nature 392:37–41.)

The hydrogen hypothesis, however, does not explain how the bacterial-archaeal partnership developed a cytoskeleton and an intricate membrane system. Indeed, neither Bacteria nor Archaea have a cytoskeleton or a complicated membrane system, although Bacteria, at least, harbor genes homologous to eukaryotic cytoskeletal genes. Not enough is known yet about the genetics of the cytoskeleton and eukaryotic membranes to make educated guesses about their origin. It is interesting to note that some eukaryotes have a very simple membrane system. Also, eukaryotes possess in their membranes some protein and lipid components that seem to be of archaeal origin. Clearly, much more research is needed in these areas. In addition, the hydrogen hypothesis does not explain the existence of chloroplasts. We must assume that once formed, the proto-eukaryote would have engulfed cyanobacteria to give rise to plant lineages. Other features of eukaryotic cells that the hydrogen hypothesis does not explain are the existence of linear chromosomes (as opposed to circular bacterial and archaeal chromosomes) and the ability to “cap” and “tail” molecules of messenger RNA. Indeed, unlike prokaryotic mRNAs, eukaryotic mRNAs always contain a modified guanine as the very first base. In addition, eukaryotic mRNAs are terminated with a “tail” of many adenines strung together. The function of the modified G is to firmly dock eukaryotic mRNAs to ribosomes just before translation begins. Prokaryotes use a completely different mechanism to attach their mRNAs to ribosomes. The function of the tail is to provide eukaryotic mRNAs with protection from degradation. Since eukaryotic cells do not divide more often than about once every 24 hours, their mRNAs must display greater stability than prokaryotic mRNAs, whose hosts divide much faster. What is the origin of these three features (linear chromosomes, the cap, and the tail)?

VIRUSES AND THE EVOLUTION OF LIFE Were viruses involved in the appearance and evolution of eukaryotes? Many modern viruses harbor a linear DNA chromosome and also cap and tail their mRNAs. Hence, a hypothesis, published in 2001, proposes that all eukaryotes are ultimately derived from an ancient virus that fused with an archaeal mycoplasma. A mycoplasma is a prokaryotic cell that is devoid of a cell wall; it is able to undergo membrane fusion with other organisms simply because there is no cell wall to interfere with this fusion mechanism. Modern archaeal mycoplasmas do exist and may descend from very old ancestral forms. What is more, many viruses possess an outer lipid membrane that covers their protein capsule. These viruses penetrate modern cells by fusing with the cellular membrane. Therefore the viral hypothesis contends that such a virus invaded an archaeal mycoplasma host, where it became established and became the proto-eukaryotic linear chromosome by recruiting genes from the archaeal chromosome. Many archaeal genes were subsequently lost by this chimeric organism. But then, we also know that modern eukaryotes contain the descendants of many bacterial genes. Where are these coming from? Well, since the ancestral mycoplasma—now containing a viral chromosome—did not have a cell wall, it may also have been able to perform endocytosis and may thus have engulfed bacterial prey whose genes were then marshaled to perform new, useful functions. If correct, the virus hypothesis surmises that viruses are quite old, as old as prokaryotes. We do not actually know this to be the case. At any rate, the virus hypothesis would be supported by the discovery of complex viruses capable of infecting archaeal mycoplasmas. Such viruses have not yet been discovered. Finally, the virus hypothesis and the hydrogen hypothesis should not be seen as contradictory. 
Rather, it can be argued that the mycoplasma host engulfed carbon dioxide- and hydrogen-producing bacteria (as in the hydrogen hypothesis) and in addition, it also acquired permanent DNA viruses that ended up making the linear eukaryotic chromosome (figure 8).

Figure 8. The virus hypothesis. A: An archaeal mycoplasma with bacterial endosymbionts trapped in vacuoles. Virus particles surround this cell. B: A mycoplasma, with endosymbionts, stably infected by a complex DNA-containing virus destined to become the eukaryotic chromosome. After fusion with the mycoplasma membrane, the virus particles have lost their lipid membrane. (Source: adapted from Bell, P. J. L. 2001. Viral eukaryogenesis: Was the ancestor of the nucleus a complex DNA virus? Journal of Molecular Evolution 53:251–256.)

We saw earlier that another variant of the virus hypothesis, the Forterre hypothesis, proposes that Bacteria and Archaea diversified from an RNA world that became infected by DNA-containing viruses. This hypothesis also proposes that the first eukaryotic cells appeared under the same circumstances. But in the latter case, a third type of DNA virus is invoked: a virus that had a common ancestor with the virus that gave rise to the Archaea. This feature nicely explains why Archaea and Eukarya possess similar DNA replication machinery, which is, however, very different from that found in Bacteria (figure 9).

Figure 9. The Forterre hypothesis on the evolution of the DNA world from the RNA world mediated by DNA-containing viruses. FvB, FvA, and FvE are DNA-containing founder viruses for Bacteria, Archaea, and Eukarya. LUCA stands for Last Universal Common Ancestor. (Source: US National Academy of Sciences).

CHROMOSOME AND GENE ORGANIZATION IN EUKARYA The last issue I wish to discuss is the origin of the complex genome organization in eukaryotes. As we saw in figure 4, eukaryotic DNA is wrapped around a core of histone proteins. There are five such different histone proteins in all eukaryotes. Interestingly, five histone genes have been identified in the genome of the archaeon Methanococcus jannaschii, a prokaryotic species that dwells near hydrothermal vents. But surprisingly, the genome of this archaeon is not organized in eukaryotic-like nucleosomes; the function of the histones in this organism is nevertheless to compact DNA. What is more, the fundamental genes coding for DNA replication and transcription in Archaea are very similar to the corresponding genes found in Eukarya, as we just saw above. This adds much weight to the hypothesis that eukaryotes are more closely related to archaeal cells than they are to bacterial cells. Again, the term archaeal (meaning archaic) should not be taken literally in this context, because Archaea look in some ways less archaic (at least in a human idiosyncratic framework) than Bacteria. The great obscuring feature of all this research on the origins of eukaryotes is that archaeal genes are found in bacterial cells and vice versa. This is in all likelihood the result of horizontal gene transfer that probably occurred eons ago (and very possibly still occurs today) among prokaryotic species. Potentially, the sequencing of many more prokaryotic and eukaryotic genomes will provide clues to the difficult question of the origin of eukaryotes. Finally, and again concerning eukaryotic genome organization, there is the question of the origin of introns. Introns, DNA sequences that interrupt the coding sequences of eukaryotic genes, are extremely rare in prokaryotes, but they are not absent. In addition, chloroplast genes, thought to be of ancient bacterial ancestry, do contain introns.
The question then is, have introns been there from the beginning in prokaryotes (or even before that, in the RNA world) and were they subsequently lost by the enormous majority of prokaryotes? On the other hand, are introns a new, eukaryotic “invention” that found its way to the genomes of rare, contemporary prokaryotes? Two hypotheses have been formulated. One holds that introns are ancestral and assumes that they may have originated in the RNA world. In this model, introns are seen as nucleic acid sequences, devoid of function and present by mistake in original RNA genes. These noncoding sequences then allowed exon (the coding part of genes) shuffling and creation of new genes. Indeed, the presence of noncoding sequences in ancestral genes would have allowed “cutting and pasting” of exons to create a great diversity of proteins during evolution. These introns would then have been largely lost by most prokaryotes and a few unicellular eukaryotes (Giardia, for example, does not seem to have introns in its genes). In the second hypothesis, introns are simply a property of the eukaryotic world. Which is right? If the exon shuffling model is correct, some argue, it should be possible to recognize some common patterns in ancient genes (genes that may have developed in the RNA world and that are found in all cells today). These genes would have originated from the assembly of old exons, and these blocks of DNA should still be recognizable today in these ancient genes. No pattern was found in the genes coding for alcohol dehydrogenase, globins (proteins carrying the ancient porphyrin rings), pyruvate kinase, and triose phosphate isomerase. With the exception of globins, these proteins are involved in the ancestral glycolysis/fermentation pathway. The conclusion from these results is that introns are not ancient. Of course, one can retort that the study of four genes is much too limited to reach any kind of conclusion. 
Additionally, the opponents of the “old introns” hypothesis have not provided an alternative explanation for their origin. In conclusion, we do not know with any kind of certainty what the origin of introns is.

CONCLUDING REMARKS As you are now aware, the science of the origins of life is very challenging. Exciting hypotheses have been formulated and crisp mathematical models explaining the origin of structure have been developed, yet enormous uncertainties remain at practically every step of the way. Are we getting closer to an answer? This is impossible to predict, as is usual in science. More research, in particular the full sequencing of hundreds of genomes, may give us more clues as to their ascent. This will take some time, however, because sequencing efforts are presently mostly restricted to organisms of medical or agronomic importance. Or we may find answers not on Earth but elsewhere in the solar system, if extant life or traces of extinct life exist on other planets and satellites. The evolution of life on Earth clearly proceeded well beyond the appearance of eukaryotic cells. We are here to prove it! However, we have not discussed so far in these notes the full path that life took in the 2 billion years that followed the creation of the first complex cells. In a nutshell, during this time, multicellularity was achieved, then cell differentiation into different organs burst into existence, followed by the evolution of sex, animals, plants, and humans. This took only about 1 billion years, a drop in the universe’s time bucket and only about one third of the time it took Earth to become host to unicellular eukaryotic life. The next topic examines how the fairly new branch of science called evolution and development, or evo-devo, contributes to our understanding of the appearance of increased complexity in the past 2 billion years. Finally, the next section also describes what we know about the evolution of the creature that came up with the odd idea of asking questions about origins: Homo sapiens.

CONTROVERSIES Some investigators contend that eukaryotes are not reducible to Bacteria and Archaea. Geneticist Patrick Forterre says of the models put forth to explain the origin of the Bacteria, Archaea, and Eukarya: “All these models have drawbacks that are usually emphasized in turn by each of their proponents to repudiate the others.” This situation is reminiscent of the dispute between the proponents of a prebiotic soup with organics made in the atmosphere and those supporting hydrothermal vents playing the same role. But in fact, these disputes keep science alive and well. Nonetheless, it should by now be clear to the reader that no consensus exists regarding the early evolution of the three domains of life, let alone the existence of an RNA world. For many researchers, phylogenies show that Bacteria and Archaea contributed to the origin of Eukarya, although the newest virus hypothesis reduces the homology between Archaea and Eukarya to infections by viruses sharing a common ancestor. Some other researchers go even further than this: for them, Eukarya owe nothing to either Bacteria or Archaea. Based on their interpretation of gene homologies and types of proteins found in the three domains of life, these investigators conclude that evolution did not necessarily start from simple prokaryotes to generate complex eukaryotes. The opposite could also occur, with evolution reducing the complexity of genomes rather than increasing it. In that case, according to these authors, the common ancestor of all DNA-based life on Earth could have been a population of unicellular organisms, some of which already possessed complex features unique to some eukaryotic cells, such as the ability to ingest microscopic prey (phagocytosis). Other members of the population were less complex and could have evolved by genome reduction.
These researchers call this eukaryotic ancestor the “raptor,” but unfortunately they do not indicate how the “raptor” appeared or where it came from, although they totally dismiss the fusion hypothesis. The big problem here is that even though all researchers have access to the same DNA sequence data, they cannot agree on a single interpretation of these data. In science, this is often an indication that even more data are needed, hence the need for future research.

FURTHER RESEARCH Deciphering the genomes of rare unicellular eukaryotes, as well as gene replacement experiments, may shed light on the evolution of early life. As just mentioned, it may well be that the sequencing of the genomes of a whole series of unicellular eukaryotes will clarify the presently murky picture of eukaryote evolution. Many species of such eukaryotes are known to taxonomists, but not so many are biologically well described and even fewer have had their DNA sequenced. Thus, only time and genome sequences will tell. The virus hypothesis is also amenable to laboratory work. For example, one could try to create new domains of life by removing the DNA replication apparatus of Bacteria and replacing it with one isolated from Archaea, or the other way around. Remember that the virus hypothesis proposes that the origin of the DNA replication machinery is to be found in DNA viruses that evolved in the RNA world. Also, one could engineer RNA genes, introduce them into modern Bacteria and Archaea expressing a reverse transcriptase gene, and then see what happens to these introduced RNA genes. Would they be converted into stable and functional DNA genes?

DISCUSSION QUESTIONS
What is the virus hypothesis for the evolution of early life?
How did oxygenic photosynthesis impact the evolution of early life-forms?
What is horizontal (lateral) gene transfer? How does it work?
What is endosymbiosis? How could it be important in evolution?
What is the hydrogen hypothesis? Is it widely accepted?

REFERENCES AND WEBSITES
de Duve, C. 1995. Vital Dust: Life as a Cosmic Imperative. New York: Basic Books.
Forterre, P. 2006. Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: A hypothesis for the origin of cellular domain. Proc. Natl. Acad. Sci. USA 103:3669–3674.
Hamilton, G. 2006. The gene weavers. Nature 441:683–685.
Hartwell, L. H., L. Hood, M. L. Goldberg, A. E. Reynolds, L. M. Silver, and R. C. Veres. 2000. Genetics: From Genes to Genomes. Boston: McGraw-Hill.
Kurland, C. G., L. J. Collins, and D. Penny. 2006. Genomics and the irreducible nature of eukaryotic cells. Science 312:1011–1014.
Margulis, L. 1984. Early Life. Boston: Jones and Bartlett.
Maynard Smith, J., and E. Szathmary. 1997. The Major Transitions in Evolution. Oxford, England: Oxford University Press.
Pennisi, E. 2004. The birth of the nucleus. Science 305:766–768.
Whitfield, J. 2006. Base invaders. Nature 439:130–131.

The following website presents an extensive description of the putative passage of the RNA world to the DNA world: http://emergentcomputation.com/preDNA.html and this website gives one view of the three domains of life http://www.astrobio.net/news/article184.html

These lecture notes are based on a revised version of chapter 5 of “Origins of Life and the Universe”, Columbia University Press, 2003, by P. F. Lurquin.

LYNN MARGULIS (1938-)

Lynn Margulis obtained her AB from the University of Chicago in 1957 and her PhD from the University of California-Berkeley in 1965. She is famous for having developed the endosymbiont theory to explain the origin of eukaryotes. Honors did not come easily to Margulis, however. At the time she formulated her theory, in 1966, she was a faculty member at Boston University and was married to Carl Sagan, who was to become a famous astronomer and science popularizer. Her article was rejected by many journals but nevertheless ended up published in 1967 in the Journal of Theoretical Biology. This article is now a classic. But for many years, the endosymbiont theory had very little credibility in the world of science. It is to her credit that Lynn Margulis showed great tenacity, giving many talks at home and abroad, and pursued her research in spite of much opposition. She was vindicated when the presence of prokaryotic DNA was demonstrated in mitochondria and chloroplasts. Some claim that the endosymbiont model shows the weaknesses of the theory of Darwinian evolution, which is based on slow, cumulative changes in organisms. Margulis herself has suggested as much. However, one should not get too enthusiastic too quickly about this. Darwin knew nothing about mutations and endosymbionts and therefore could not provide a mechanism for evolution by natural selection anyway. What Margulis has demonstrated is that, in addition to mutations, which are indeed rare, another mechanism, the capture of prokaryotes early in evolution (and perhaps even taking place today), can provide selective advantages to organisms. This in no way negates the roles of Darwinian fitness and natural selection. Lynn Margulis is currently a faculty member at the University of Massachusetts-Amherst, where she conducts research on the possible symbiotic origin of eukaryotic cilia and the Gaia hypothesis. Both are controversial subjects, which is not surprising in Margulis’s case.
The Gaia hypothesis states that the biosphere of our planet is a self-regulating system, something with which many ecologists would disagree. Margulis is a prolific book author and has published several works with her son, Dorion Sagan.

CHAPTER 5

Increasing Complexity: From Simple Eukaryotes to Homo sapiens

Evolution is no longer [considered to be] necessarily progressive, it no longer strives toward perfection or any other goal. Ernst Mayr

VARIATIONS IN THE BLUEPRINT. Evolution is a branching-out process. By comparing protein and DNA sequences in many different species, scientists have been able to show that all studied life-forms on Earth are genetically related. This suggests that all existing species descend from a common root that appeared a very long time ago.

We all know that once in a while, an organism displays unexpected properties. For example, some bacteria in a large population cease to be able to synthesize vitamin B1 while all the others go on manufacturing this compound. In humans, one child may be unable to metabolize the sugar galactose and so will suffer from the hereditary disease galactosemia. These defects are caused by a mutation in a specific gene. This mutation is then passed on to the progeny of these organisms. How can this happen? It turns out that DNA replication is not absolutely perfect. Once in a rare while, the DNA replication machinery inserts a wrong base in a growing DNA strand. This incorrect DNA is transmitted to the descendants of the cell in which the mistake has occurred, and it continues to be propagated as an incorrect DNA molecule. Since proteins depend directly on the sequence of bases present in the DNA, insertion of a wrong base results in an incorrect codon and thus insertion of an incorrect amino acid into the protein coded for by that gene. This is what a mutation is: the formation of an incorrect gene resulting in an incorrect protein, which, if it is an enzyme, has a great likelihood of not performing a correct function. In the two examples given earlier, a mutation occurred in one of the genes involved in the vitamin B1 biosynthetic pathway, whereas the galactosemic child carried a mutation in the gene coding for the enzyme responsible for converting galactose-1-phosphate into another galactose derivative. Galactose-1-phosphate then accumulates and becomes toxic. Mutations are a fact of life; they cannot be completely avoided, since the mechanism of DNA replication itself is not error-proof. Thus, mutations lead directly to the appearance of genetic variants in populations of organisms. Most mutations are deleterious to the organisms that harbor them. Most, but not all.
You probably know that some diseases caused by bacterial pathogens, which used to be easily curable with antibiotics a decade or so ago, are now resistant to treatment. This is because these bacteria have become resistant to antibiotics. In some cases, this resistance is caused by a simple spontaneous mutation in their DNA. Now if you are that bacterial mutant, you will thrive in an environment laced with antibiotics, whereas your nonmutant colleagues will become extinguished. What happens next? The dying nonmutant bacteria stop consuming food (present, for example, in the human intestine) and clear the way for the mutants to proliferate wildly since they no longer have any competition for resources. The result is uncontrolled multiplication of pathogenic mutants causing the patient to become very sick in spite of antibiotic treatment. From the standpoint of the mutants, nothing could be better: they have no competitors for food, they proliferate abundantly, and they are spread in nature—in humans, by diarrhea or coughing, for example. In other words, from a tiny minority, the mutants have become an overwhelming majority. This is great success in the struggle for life. Thus some mutations can be beneficial to those (here, the bacteria) that harbor them.
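The rise of the resistant mutants from a tiny minority to an overwhelming majority can be sketched numerically. The Python snippet below is only an illustration under stated assumptions: the starting mutant fraction and the two per-generation survival values are hypothetical numbers chosen for the example, not measurements, and the function name is ours.

```python
def resistant_fraction(initial_fraction, sensitive_survival,
                       resistant_survival, generations):
    """Track the fraction of resistant bacteria across generations.

    All parameter values passed in below are hypothetical; the model only
    assumes that, under antibiotic treatment, resistant cells survive each
    generation more often than sensitive ones.
    """
    fraction = initial_fraction
    history = [fraction]
    for _ in range(generations):
        # Relative numbers of survivors of each type after one generation.
        resistant = fraction * resistant_survival
        sensitive = (1.0 - fraction) * sensitive_survival
        # Renormalize so the two types again sum to 1.
        fraction = resistant / (resistant + sensitive)
        history.append(fraction)
    return history


# Hypothetical scenario: one mutant per million cells, and the antibiotic
# lets only 10% of sensitive cells survive each generation.
history = resistant_fraction(1e-6, 0.1, 1.0, 10)
for generation, fraction in enumerate(history):
    print(generation, round(fraction, 6))
```

Under these assumed values the mutants are still a minority after five generations but make up nearly the whole population after ten, which is the pattern the antibiotic example describes.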

This leads us directly to the concept of natural selection, critical in the understanding of evolution in general and evolution of early life-forms in particular. Selection consists in amplifying (making more numerous) organisms that otherwise would be present in a given environment in very small numbers. In the preceding example, the selective agent was the antibiotic that wiped out benign bacteria, as well as antibiotic-sensitive pathogens in the human victim, thereby creating a new niche for the pathogenic antibiotic-resistant mutants to occupy. To put it slightly differently, the bacterial mutants display greater fitness in the antibiotic-laden environment than the non-mutants. Greater fitness means greater proliferation. In this example, the selective agent is an antibiotic, a human-made chemical. We call this process artificial selection. Other examples of artificial selection are higher yield in crop plants and milk production in cows, both brought about by selective breeding implemented by human agents. It is not difficult to imagine a similar situation occurring in nature, without antibiotics and without intervention of breeders. Populations of living creatures are genetically heterogeneous. This genetic diversity is clearly visible in humans, and though it is not so visible in a herd of buffaloes, the genetic variation is there (one notable exception is cheetahs, which are so inbred as to make them almost clones of one another; this is why they are in danger of extinction). In other words, large populations are collections of individuals sharing genes that make them belong to this or that species, but there exist many slight variants for many of these genes, making each individual unique. When we look at wild populations that have occupied their ecological niches for thousands of years, we know that the individuals composing these populations are fit to survive and reproduce successfully in that particular ecological niche.
This means that the genetic variation found in this population is well adapted to the environmental conditions of the niche. Now suppose that these environmental conditions change. This change can be catastrophic or gradual, like a sudden drought or a subtle climate change. It is at this level that genetic variation will play a large role; those individuals that, through subtle genetic differences already in place, possess better resistance to the new environmental threats, will be better fitted to the new conditions than their partners and so will become more successful competitors for resources. Over time (millions or billions of years), much genetic variation can be accumulated by living populations, and many ecological situations can change. This leads to the appearance of new species and the extinction of others. In a nutshell, this is how natural selection and evolution take place. The key to understanding evolution is that genetic variation preexists in populations. The variants (it is better not to call them mutants to avoid the risk of negative connotation) may represent a small minority under current environmental conditions, but they can quickly become a majority if these conditions change in a way that makes the variants better fit than the majority. The force at work (through, for example, climate change, fire, flood, asteroids falling, or volcanic activity) is natural selection. It is natural selection that has molded the different species, hence the different gene combinations and their accompanying organisms that exist on Earth. Another important evolutionary mechanism is called genetic drift or simply drift. This mechanism is based on chance alone. To understand this we must take into account two different concepts. First, genetic variants exist at certain frequencies in natural populations, with some variants more or less numerous than others. Second, not all variants reproduce equally. For example, very young and old individuals do not reproduce. 
In effect, only about one third of all individuals in a population reproduce at each generation. Therefore, each successive generation represents only a sample of the variants that existed in the previous generation. In small populations, this sample may not be representative of the overall variant distribution, the result being, in some cases, that certain variants may simply disappear because they did not reproduce. Thus, drift can “swing” variant frequencies in one direction or another in a short amount of time. This is also evolution of a population. Not all of the innumerable and possible DNA base combinations exist in the biosphere because a very large number of them would be incompatible with the conditions that have existed and now exist on the planet. In fact, there is strong and fascinating evidence that all living things are descended from a common ancestor that lived prior to 3.5 billion years ago. The branch of genetics called phylogeny busies itself with the genetic relationships that exist within and among species. For a long time, phylogeny was mostly based on the external appearance of organisms, and it was concerned with characteristics such as shape, skeletal features (including those of fossils), and organ structure. Then, when it became possible to determine the amino acid sequence of proteins, in the 1950s, molecular technology became prevalent. The invention of DNA sequencing in the late 1970s made comparisons at the level of genes a reality. It all started with the study of two proteins, cytochrome c and hemoglobin, in various mammalian lineages. Researchers discovered that the number of amino acid substitutions in these two proteins was proportional to the evolutionary distance separating the species compared. For example, there is only one amino acid change when human and monkey cytochrome c are compared. There are, however, twelve such changes when human and dog cytochrome c are compared. This number increases to sixty-six in a comparison between humans and yeast. Furthermore, the number of substitutions is only one in a comparison between the horse and the donkey. The fossil record, too, shows that humans and apes are more closely related than humans and dogs. Similarly, we know that horses and donkeys are very closely related. Thus a comparison between the fossil record and the rate of amino acid substitutions in proteins gave scientists the idea that such substitutions could be used as a molecular clock. The more changes there are between two similar proteins found in different species, the more distantly related the species are. Similar experiments with other proteins gave an excellent correlation between the percent of amino acid substitution and time since various species diverged, based on fossil evidence. The fossil record can thus be used to calibrate the molecular clock. Fossils can indeed be dated with great accuracy by a variety of techniques, only some based on radioactive decay. Since changes in amino acid composition are the result of changes in the DNA codons, DNA sequencing can also be used to study the relatedness between species. Studies at the DNA level have confirmed the results obtained with proteins. Measuring the evolutionary distance based on DNA sequence divergence between extant organisms allows scientists to perform what has been called molecular archaeology. This is accomplished by reversing the procedure described above. There, fossil evidence was used to calibrate the molecular clock. Then why not use that calibrated molecular clock to calculate the time at which various species diverged? This would be extremely useful in those cases where fossils do not exist, such as with most of the predecessors of the thousands of microbial species that exist today.
This was indeed done, and it has led to the construction of what are called phylogenetic trees. Figure 1 shows the general phylogenetic tree of life with its three domains, the Bacteria, the Archaea, and the Eukarya. Bacteria and Archaea are all microscopic, single-celled organisms whose DNA is not enclosed by a membrane. They are known collectively as prokaryotes. In contrast, Eukarya (also called eukaryotes) comprise single-celled and multicellular organisms, from yeast, to lettuce, to humans. Their DNA is always found in a membranous body called the nucleus (also see topic #4). Scientists built the phylogenetic tree linking the three domains of life using comparisons between thousands of protein and DNA sequences. In this tree, the lengths of the various branches are proportional to the length of time since divergence. For example, the long branch uniting the Eukarya with the Archaea is about 1.8 billion years long. This means two things: first, that Eukarya derive from Archaea (but also see topic #4), and second, that it took about 1.8 billion years for some Archaea to evolve into Eukarya. That this tree represents reality is shown by the technique known as gene resurrection. As its name indicates, this technique allows the production of genes that are now extinct, for example a dinosaur gene. To do this, scientists sequenced the stretch of DNA that codes for the protein eye pigment rhodopsin from a mammal, a bird, a reptile, and an amphibian. By looking at the base pair changes present in these different genes, scientists built a phylogenetic tree showing the evolutionary path of the rhodopsin gene and estimated the amount of time during which this gene evolved as it appeared in the different species. Next, using sophisticated computer algorithms, scientists recreated a DNA sequence that must have existed at the time of the dinosaurs. This DNA sequence was then synthesized in the laboratory and subsequently introduced into living cells. 
The synthetic dinosaur gene did indeed produce rhodopsin! This tells us that the phylogenetic tree was informative, because if it had not been, rhodopsin could not have been produced. Instead, scientists would have obtained a protein bearing no resemblance to rhodopsin or perhaps even no protein at all. Intriguingly, since rhodopsin allows vision in dim light conditions, we can now infer that dinosaurs were able to see in semi-darkness!

Going back to figure 1, the tree shown here is rooted (the root is the thick vertical line), and this root represents the universal ancestor(s) from which all life on Earth is descended. As we saw earlier, the universal ancestor(s) appeared in the putative RNA world.

Clearly, life has come a long way. But does this mean that humans, complex eukaryotes as they are, represent the ultimate in what evolution can produce? It is tempting for us humans to believe that we have somehow reached the absolute and perfect pinnacle of a long evolutionary process that started as long as 3.8 billion years ago. This temptation most certainly comes from our ability to reflect upon our origins (an ability not shared by any other life form) and to develop scientific models that make sense of our empirical observations of the world. Undoubtedly, this unique capacity originated in our big brain, the most complex assembly of cells found in nature. But as the great evolutionary biologist Ernst Mayr (1904-2005) reminds us, evolution does not imply progress nor does it lead to perfection. Indeed, humans cannot be said to be more perfect than, say, beavers or rhododendrons. In fact, humans, beavers, and rhododendrons simply evolved within and now occupy those ecological niches for which they are each best adapted. As we know, beavers are adapted to an aquatic environment surrounded by trees, which they use for food and for building dams. Rhododendrons are adapted to sandy soils and mild, humid weather. How is this relevant to human evolution and why is this comparison not trivial? There is now good evidence that the first Homo sapiens, who appeared as early as about 160,000 years ago, were not adapted to life in water and did not evolve in sandy areas. Rather, the first humans evolved in a tropical, savanna-like environment that would not have been conducive to the evolution of beavers and rhododendrons. In other words, humans, beavers, and rhododendrons each went their separate evolutionary ways, in different locations, without any indication that one was absolutely “better” than the others.

Figure 1. The tree of life showing the three domains of life. Each domain contains several branches (most of them unnamed in this drawing, but known) that correspond to different organisms. Archaea and Bacteria are all microorganisms devoid of nucleus and grouped under the name prokaryotes. The domain Eukarya is composed of unicellular and multicellular organisms, all equipped with a cell nucleus where DNA is concentrated. Eukarya are also known by the name eukaryotes. Divergence between Archaea and Bacteria took place 2 billion years ago, and that between Archaea and Eukarya, 1.8 billion years ago. Fungi, plants, and animals diverged from other Eukarya about 1 billion years ago. Homo sapiens (not represented at the scale of this diagram, but included under Animals) diverged from other hominids about 160,000 years ago. The numerous gene exchanges that occurred over time through lateral gene transfer are not indicated for clarity.

Of course, humans have also evolved culturally, another feature made possible by our large brain, and as a result, they occupy today niches that beavers and rhododendrons could not possibly inhabit. However, we will see later in this topic that some aspects of human cultural evolution are tied to our recent biological evolution. Thus, like all lifeforms, we will continue to evolve, but not necessarily in a direction that will make us more “perfect” in absolute terms.

What is undeniable, however, is that the eukaryotic world has become much more complex in the past two billion years or so. This is clearly shown by the fossil record, where the oldest eukaryotic fossils are unicellular whereas younger fossils show much more complicated cellular assemblies. As we know, multicellular eukaryotes (e.g., sponges, birds, mammals, etc.) are more complex than unicellular eukaryotes (e.g., amoebas, paramecium, molds, etc.). In addition, multicellular eukaryotes show different degrees of organization, from simple sponges to more complicated insects, to more complicated still, mammals. This type of hierarchical complexity was first codified by the Swedish naturalist Carl Linnaeus (1707-1778), often rightly called the “father of taxonomy.”

What, then, is taxonomy? Simply put, taxonomy is a system that classifies living organisms according to their degree of complexity. In the early days of Linnaeus and his followers, taxonomy was based mostly on distinctive anatomical features displayed by lifeforms. Later, physiology, biochemistry and finally, DNA sequencing, were added to gain a more precise picture of the complexity seen in the living world. It should be noted that initially, taxonomy did not imply any evolutionary relationship between lifeforms since the idea of descent with modification (branching out of variants) was unknown at the time. In other words, taxonomical closeness between organisms did not necessarily mean that they were closely related evolutionarily. However, genome sequencing, which allows the building of phylogenies (trees of descent), now enables us to make correlations between taxonomical complexity and positioning of organisms in the tree of life. Recall that DNA-based phylogenies are put together using the molecular clock and the fossil record. The following sections focus on the evolution of complexity in animals and their phylogenetic relationships.
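As a reminder of how the molecular clock recalled above is calibrated and then run in reverse, here is a minimal Python sketch. It uses the cytochrome c differences quoted earlier in these notes (1 for human versus monkey, 12 for human versus dog, 66 for human versus yeast). The calibration date of 90 million years for the human-dog split is a hypothetical placeholder chosen only to make the arithmetic concrete; it is not a value given in these notes. The sketch also assumes a strictly linear clock, which real analyses do not, so the resulting dates should not be taken as estimates of anything.

```python
# Illustrative linear molecular clock: calibrate on one pair, then date the others.
# Amino acid differences in cytochrome c relative to humans (values from the notes).
CYTOCHROME_C_DIFFERENCES = {"monkey": 1, "dog": 12, "yeast": 66}

# ASSUMPTION: hypothetical fossil calibration, for illustration only.
ASSUMED_CALIBRATION_SPECIES = "dog"
ASSUMED_CALIBRATION_MYA = 90.0  # million years ago (placeholder, not from the notes)


def calibrate_rate(differences, fossil_age_mya):
    """Return substitutions accumulated per million years since divergence."""
    return differences / fossil_age_mya


def estimate_divergence_mya(differences, rate):
    """Invert the clock: convert a number of differences into a divergence time."""
    return differences / rate


rate = calibrate_rate(
    CYTOCHROME_C_DIFFERENCES[ASSUMED_CALIBRATION_SPECIES],
    ASSUMED_CALIBRATION_MYA,
)

estimates = {
    species: estimate_divergence_mya(diffs, rate)
    for species, diffs in CYTOCHROME_C_DIFFERENCES.items()
}

for species, mya in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"human vs {species}: about {mya:.1f} million years (linear clock)")
```

The only point of the sketch is the two-step logic: fossils fix the rate, and the rate then dates splits for which no fossils exist. A strictly linear clock underestimates very old divergences, because repeated substitutions at the same position are counted only once.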

A WHIRLWIND TOUR OF INCREASED EUKARYOTIC COMPLEXITY

Taxonomists recognize several major steps in the formation of the body plan and the body complexity of animals. The sequencing of DNA from many organisms supports the idea that body plan evolution with time can be correlated with taxonomical complexity, with a twist. A broad category that groups organisms according to shared physical traits and common ancestry is the phylum. For example, the phylum chordates (chordata) clusters all vertebrates together. Another phylum, the arthropods (arthropoda), groups all insects, spiders, crabs, lobsters, centipedes, and many more. In what follows, we will consider just eight animal phyla, out of a total of 41, and see how complexity developed over time.

An animal is by definition an organism whose developing embryo passes through a stage called a blastula. A blastula is a young ball-like embryo composed of tens to hundreds of cells surrounding a fluid-filled cavity. No other multicellular organisms (such as plants and fungi) progress through a blastula stage in their development. The common ancestor of all animals is thought to have been a unicellular, chloroplast-less eukaryote perhaps similar to modern-day choanoflagellates (fig. 2) that probably lived about 2 billion years ago.

Figure 2. Photomicrograph of the choanoflagellate Monosiga brevicollis (Source: Wikipedia)

Much, much later, about 600 million years ago, during a period called the Neoproterozoic, we find evidence of multicellular life in the form of two new phyla: sponges (Porifera) and cnidarians (Cnidaria). Sponges and cnidarians, possessing no hard body parts, did not fossilize well and therefore, their fossils are rare. However, some fossils do exist and have been dated. For example, figure 3 shows fossil giant jellyfish, a typical cnidarian. Sponges are the simplest known animals. Some of their cells resemble choanoflagellates, except that in the case of sponges, these cells are grouped into a single organism rather than being free. Sponges have no differentiated organs, although they contain different types of specialized cells, making them more complex than unicellular eukaryotes. However, their level of complexity is still low. In fact, some sponges can be disaggregated by filtration through a mesh, the loose cells then spontaneously regrouping to reform a full sponge. In addition, the body plan of sponges shows no symmetry of any kind.

Figure 3. Fossil jellyfish. These fossils measure about 1 meter in diameter. (Source: Wikipedia).

Cnidarians occupy the next level of complexity in that they possess specialized tissues, and their body plan shows simple radial symmetry. This type of symmetry occurs when body parts are distributed evenly around a central point. Typical cnidarians are sea anemones, corals, hydra, and jellyfish (figure 4). Even though cnidarians harbor specialized tissues, they have no organs (heart, liver, etc.) as we commonly understand them. A cnidarian is probably best described as a sack where the opening serves as both mouth and anus and where tentacles bring food to the mouth.

Figure 4. Typical cnidarians: sea nettles (Source: Wikipedia).

Next in line is the phylum flatworms, of which the human tapeworm is a typical example. When cnidarians and flatworms diverged, the latter “invented” a characteristic common to all animals that branched out at a later time: bilateral symmetry. This type of symmetry applies when the body plan is such that both sides of the body are mirror images of each other. Bilateral symmetry is obvious in mammals, for example. Bilateral symmetry, shared by all animals called bilaterians, appeared at least as early as the Cambrian, a period that lasted from 543 million to 505 million years ago. It is during this period that most animal phyla (but not all extant species, of course) recognizable today made their first appearance. One speaks of the Cambrian radiation to refer to that fact. The Cambrian fossil record is abundant because it is also at that time that hard body parts, which fossilize better, first appeared. Figure 5 shows the fossil of a Cambrian bilateral marine animal found in British Columbia.

Figure 5. Fossil of the extinct, marine Cambrian animal Marella splendens. Bilateral symmetry is evident. The size of this fossil is about 2 cm. (Source: Wikipedia).

The next innovation regarding body plan took place in a branch that eventually would lead to all the other phyla in existence today: the formation of a central body cavity. In other words, the only animals that do not possess a body cavity are the sponges, the cnidarians, and the flatworms. The existence of a central body cavity is linked to the arrangement of three basic tissue layers: the ectoderm, the mesoderm, and the endoderm. In the developing embryo, the ectoderm is a cell layer present on the outside, the endoderm is present on the inside, and the mesoderm separates the ectoderm from the endoderm. In adult animals possessing a body cavity, the latter is lined with the mesoderm layer. Roundworms (such as the small nematode Caenorhabditis elegans, a laboratory animal) are the simplest animals with three tissue layers arranged around a body cavity.

Finally, the last split that we will consider for the moment is the separation between protostomes and deuterostomes. Flatworms, roundworms, molluscs (snails, mussels, squid), and arthropods (Arthropoda) are protostomes. Echinoderms (sea urchins, sea stars) and chordates (Chordata) (mostly animals with backbones, including humans) are deuterostomes. In protostomes, a pocket forming on the side of the blastula becomes a mouth first and then deepens to make the anus second. In deuterostomes, the pocket generates the anus first and the mouth last. One consequence of this process is that in protostomes, mesodermal tissues (like the heart) are located near the back of the animal whereas ectodermal tissues (like the nerve cord) are located near the front. In deuterostomes, this arrangement is reversed. In all cases, endodermal tissues (like the gut) are located centrally. Figure 6 represents the branching of five of the animal phyla discussed above.

Figure 6. A tree showing the branching out of five different phyla. No time scale is implied. (Source: http://biology.st-andrews.ac.uk/stories/2011_08_17.aspx)

An important question now arises: does the taxonomical classification of figure 6, based on morphology, physiology, and embryology, correspond to a phylogenetic ranking based on DNA sequences? In other words, is figure 6 also a partial tree of life for the animal kingdom? The answer so far is yes. Indeed, an increase in body and tissue complexity is accompanied by increased genetic complexity. But this answer is presently only partial. Even though the genomes of many different organisms have been deciphered, we are very far from having sampled all animal phyla. For example, we know the sequence of a sponge genome as well as that of a cnidarian, the sea anemone Nematostella vectensis. We also know the genome of one roundworm (C. elegans), that of several arthropods (for example Drosophila melanogaster, the honey bee, and two different mosquito species), and that of an echinoderm, the sea urchin. Among chordates, we know the genomes of the mouse, the rat, and the chimpanzee, for example. And of course, we know the human genome. It should be kept in mind that complex genome sequencing is expensive and that for obvious reasons, most animal genomes deciphered so far are those of laboratory animals or those of animals with medical or agronomical importance. In fact, the deciphering of whole genomes to study evolution has only just begun.

What then, can be concluded from the comparison of the genomes from animals belonging to different phyla? This comparison revealed a few surprises. The DNA of the sea anemone was the first known sequence of a “simple” animal, a cnidarian, with two tissue layers (as opposed to three in bilaterians) but possessing a nerve net (although not a central nervous system), radial symmetry, and a primitive muscular system. All other fully sequenced genomes are from bilaterians.
The ancestor of both cnidarians and bilaterians is thought to have lived about 700 million years ago, during the Neoproterozoic era. By using the notion of the molecular clock, and by identifying genes common to bilaterians and a cnidarian, one should be able to draw conclusions regarding the genomic complexity of this ancestor, as follows.


The sea anemone genome contains about 18,000 protein-coding genes, a number similar to that found in the fruit fly (16,000 genes) and a nematode (C. elegans, 19,000 genes), and not too dissimilar from the number found in humans (25,000 genes). Genome comparisons revealed that the common ancestor must have harbored 7766 “core” genes common to the cnidarians and bilaterians. But interestingly, both the fruit fly and the nematode have lost 1292 of these common genes, meaning that the collection of “core” genes found in the sea anemone and humans is more similar than the collection found in Drosophila and C. elegans. In addition, sea anemone genes contain a large number of introns (base sequences not present in mature messenger RNA) that are very similar to human introns. This similarity does not hold true for the fly and the nematode genes, where introns are much less numerous. In passing, this observation also suggests that complex eukaryotic introns existed as early as about 700 million years ago. Further, the clustering of sea anemone genes is more similar to the clustering observed in humans than it is in the two invertebrates, the fly and the worm. This, then, suggests that two bilaterians, the fruit fly and C. elegans, evolved by losing genes and scrambling them. By contrast, this did not happen as much in vertebrates. And finally, when the sea anemone genome was compared with the DNA sequences of plants, fungi and protists (amoebalike unicellular eukaryotes), researchers found that about 1500 of the ancestral genes (about 20%) originated after animals diverged from plants and fungi. Some of these genes seem completely new (they are animal innovations) while others seem to have originated from the combination of parts of old and/or new and ancient eukaryotic genes. Included in these novel genes are those controlling cell adhesion and others controlling the formation of a neuromuscular system. 
Presumably, these genes evolved further to generate the more complex systems found in bilaterians. It is only when we decipher the genomes of many more bilaterians that we will have a finer picture of the evolutionary events that caused the various splits seen in figure 6.
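The gene counts quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in the text (7766 ancestral “core” genes, 1292 of them lost in the fruit fly and the nematode, and about 1500 that arose after animals diverged from plants and fungi). The variable names are illustrative, and the reading of “lost 1292” as genes absent from both invertebrates is an assumption about how that figure was counted.

```python
# Arithmetic check on the sea anemone comparison, using figures from the notes.
core_genes_in_ancestor = 7766         # "core" genes shared by cnidarians and bilaterians
core_genes_lost_in_fly_and_worm = 1292  # ASSUMPTION: absent from both fly and nematode
animal_specific_core_genes = 1500     # "about 1500" arose after the split from plants/fungi

# Core genes still present in the two invertebrates.
retained_in_fly_and_worm = core_genes_in_ancestor - core_genes_lost_in_fly_and_worm

# Fractions of the ancestral core set.
fraction_lost = core_genes_lost_in_fly_and_worm / core_genes_in_ancestor
fraction_animal_specific = animal_specific_core_genes / core_genes_in_ancestor

print(f"core genes retained in fly and nematode: {retained_in_fly_and_worm}")
print(f"share of core genes lost: {fraction_lost:.1%}")
print(f"share of core genes that are animal innovations: {fraction_animal_specific:.1%}")
```

The last ratio comes out slightly above 19%, which is what the text rounds to “about 20%.”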

HOW DOES SPECIATION OCCUR?

As we just saw, new phyla (and also new species) appeared through the formation and fixation of new genes and new gene combinations. As we know, these events occurred through mutation, natural selection, and drift, the three basic tenets of evolutionary theory. But there is more: to gain a better understanding of the evolution of species, one must also take into account extinction, sexual selection, and geographic isolation, three mechanisms through which natural selection is exercised. A change in chromosome number, called polyploidy, can also lead to speciation.

Most species that have ever existed on Earth are now extinct. It is a sobering observation that extinction is a fact of life. Particularly impressive are the five mass extinctions that have afflicted our planet during the past 500 million years. One of them, which took place about 250 million years ago, wiped out over 50% of all species then in existence. But it is perhaps the last mass extinction, which occurred at the Cretaceous-Tertiary (K-T) boundary 65 million years ago, that is best understood. This mass extinction brought about the demise of the dinosaurs and destroyed about 15% of all species alive at the time. Most scientists agree that this catastrophic event was caused by a large asteroid impact on what is now the Yucatan peninsula in Mexico. This asteroid may have been as much as 15 km wide, the size of a mountain. Evidence for an asteroid impact is found in part in the presence of high concentrations (up to 50 times background level) of the element iridium in a thin, worldwide, sedimentary rock layer laid down at the K-T boundary. Iridium is rare on Earth, but it is much more common in meteorites, which is consistent with the idea that an iridium-rich body collided with our planet, vaporized, and dispersed iridium all over.
It is estimated that the energy released by the impact was equivalent to 2 million times that released by the most powerful hydrogen bombs ever tested by humans. The consequences were catastrophic. First, the collision ignited global giant fires, as shown by the widespread presence of soot in the K-T boundary layer. Second, while present in the atmosphere, soot blocked solar radiation, which must have resulted in global cooling. Third, the impact must have triggered violent earthquakes, tsunamis, and massive volcanic explosions. Finally, since the impacted region was rich in calcium sulfate, the vaporization and decomposition of this mineral must have produced sulfur dioxide which, when reacting with water vapor, produced sulfuric acid and hence acid rain. In short, the impact deeply disturbed ecological niches all over our planet, eradicating many species. But of course, changing ecological niches can also be seen as opportunity for non-extinct species to occupy newly created niches and potentially proliferate more extensively than before the disturbance. This is perhaps how mammalian species diversified more abundantly after the K-T event.


It should be noted that natural events are not solely responsible for the extinction of species. Indeed, it is estimated that over 1,000 species have become extinct since the year 1600 CE. In most cases, the cause of these extinctions is known: the presence of humans. Some species have been hunted to extinction (e.g., the moa, a giant, flightless Polynesian bird), and more recently, extinctions can be blamed on humans destroying natural habitats. According to some estimates, if the rate of extinction persists, it would take only about 100 years for 60% of all species alive today to go extinct. Astonishingly, this number is close to the cumulative percent extinction caused by the five great mass extinctions that took place between 450 million and 65 million years ago! Reflecting on these numbers, geneticists Scott Freeman and Jon Herron have dubbed our impact on natural habitats the “human meteorite.”

Another mechanism of speciation is sexual selection. Here, potential speciation is caused by mate choice. To understand this, consider a species where a particular trait allows individuals of one sex to mate with greater success than individuals devoid of this trait. Since evolution is the same as differential reproduction, one can see that a trait favoring success in competition for mates can cause divergence within a species once this trait appears. This type of selection is supported by, among others, observations made with Hawaiian species of Drosophila. There, two species of the fly, D. heteroneura and D. silvestris, differ markedly in the shape of their heads. Male (but not female) D. heteroneura have a wide head, whereas both sexes of D. silvestris have a narrow head. Further, male D. heteroneura compete for females by butting heads, whereas the narrow-headed D. silvestris males grapple with one another. How could head shape and competition for females have led to speciation (divergence) in Hawaiian Drosophila? Let us imagine an isolated fly population ancestral to both D. heteroneura and D. silvestris.
All flies of both sexes have narrow heads and the males compete by grappling. Let us now imagine a mutation leading to the formation of a wide head in males accompanied by a behavioral change: combat by head butting. Assuming that head-butting is more successful than grappling, wide-headed males would copulate more often than their narrow-headed male competitors, thereby increasing the proportion of wide-headed males in subsequent progenies. This mutation then has a good chance of becoming fixed. Of course, this does not mean that narrow-headed males never copulate in the presence of wide-headed males and therefore no longer reproduce. Rather, narrow-headed males would avoid their head-butting rivals and copulate away from them, in effect, isolating themselves. Over time we thus see that sexual selection can lead to the formation of two new subpopulations from an original single population: wide-headed D. heteroneura and narrow-headed D. silvestris. In this scenario, the two species have become reproductively isolated (and fully diverged) thanks to a mutation affecting the manner in which copulation is most successful.

Is this scenario amenable to empirical verification? The answer is yes. As with many traits in nature, head width is slightly variable among male D. heteroneura. Observations have shown that the wider the head, the more frequently a male fly will win a head-butting contest and the more frequently it will copulate. These observations by themselves do not prove that this is how the two fly species diverged, but they strongly support the interpretation given above. There are other examples in nature of evolution by sexual selection: for instance, it is thought that cichlid fishes in Lake Victoria in Africa, as well as ordinary sticklebacks, are currently diverging (speciating), the former based on body color and the latter based on body size.
In the case of cichlid fishes, evolution by sexual selection also seems accompanied by speciation based on feeding habits: some cichlids are bottom feeders whereas others are surface feeders. Interestingly, both types of feeders can be seen to mate in aquariums, but they do not do so in nature, suggesting that reproductive isolation is in progress in these fishes.

As we just saw, reproductive isolation accompanies the formation of new species. But reproductive isolation also exists if subpopulations are physically prevented from mating, for example by the existence of geographic barriers. Geographic isolation explains well why marsupials (e.g., kangaroos, koalas, wombats, etc.) are abundant in Australia but rare everywhere else in the world. The best explanation for this is that marsupials are poor competitors and do not thrive in niches where placental mammals also exist. Marsupials and placental mammals (e.g., lions, tigers, humans, etc.) derive from a common ancestor as shown by DNA-based phylogenies. This common ancestor lived about 180 million years ago and may have first evolved in Eastern Asia. Interestingly, at that time, Australia was not the island-continent that we know today. Rather, geological evidence shows that Australia was part of a supercontinent that started breaking up about 167 million years ago. Before that time, the ancestors of present-day marsupials were moving in the direction of what was to become Australia, while the ancestors of present-day placental mammals were not. As soon as Australia broke free, marsupials were in a sense “trapped” there, practically in the absence of competing placental mammals. Thus, marsupials continued to proliferate abundantly and evolve in Australia, but much less so in the rest of the world.

Evolution by geographic isolation is perhaps best documented among the 1,100 or so species of Drosophila that inhabit the Hawaiian islands. These flies differ considerably in body size, wing size, and body color. They also occupy many different ecological niches. Also, many of these different species are specific to one of the islands that constitute the Hawaiian archipelago. For example, going from west to east, D. hemispiza is found in Oahu, D. planitibia is found in Maui, and D. heteroneura is found in Hawaii. DNA-based phylogenies using the molecular clock show that D. hemispiza is the most ancient species, followed by D. planitibia and then D. heteroneura. Meanwhile, the geological past of the Hawaiian islands is well known: they did not emerge from the Pacific Ocean all at the same time. In fact, we know that Oahu is older than Maui, which in turn is older than Hawaii. Thus, the temporal sequence of the evolution of Drosophila in the archipelago matches the temporal sequence of the birth of the various islands. What scenario can explain the speciation of these flies, starting with the ancestral D. hemispiza present today in Oahu? One can imagine that a small number of D. hemispiza flies (even perhaps a single gravid female) were dispersed by the wind or on floating debris and so reached Maui soon after its emergence. Maui would of course have been devoid of any Drosophila. There, D. hemispiza would have been able to diverge and evolve independently from the ancestral population and eventually lead to a new species, D. planitibia. Subsequently, subpopulations of D. planitibia could have dispersed to reach Hawaii, after it had formed, where they would have differentiated into D. heteroneura.
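The dispersal scenario above involves very small founding groups, and the sampling logic of genetic drift described earlier in these notes implies that such groups can carry variant frequencies quite unlike those of the population they left. The Python sketch below illustrates this under simple assumptions that are not taken from the notes: a single variant at a frequency of 0.3 in the ancestral population, diploid founders drawn at random, and arbitrary group sizes of 2 and 500 founders. The function names are hypothetical.

```python
import random
import statistics


def founder_frequency(ancestral_freq, n_founders, rng):
    """Frequency of a variant among the gene copies carried by random founders."""
    copies = 2 * n_founders  # each diploid founder carries two gene copies
    carried = sum(1 for _ in range(copies) if rng.random() < ancestral_freq)
    return carried / copies


def spread_of_founding_events(ancestral_freq, n_founders, n_events=1000, seed=0):
    """Standard deviation of the variant frequency across many founding events."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    frequencies = [
        founder_frequency(ancestral_freq, n_founders, rng) for _ in range(n_events)
    ]
    return statistics.pstdev(frequencies)


# ASSUMPTION: illustrative values only.
ANCESTRAL_FREQ = 0.3
small_spread = spread_of_founding_events(ANCESTRAL_FREQ, n_founders=2)
large_spread = spread_of_founding_events(ANCESTRAL_FREQ, n_founders=500)

print(f"spread with 2 founders:   {small_spread:.3f}")
print(f"spread with 500 founders: {large_spread:.3f}")
```

With only a couple of founders, the variant frequency in the new population scatters widely around the ancestral value and the variant is often lost altogether, whereas a large founding group reproduces the ancestral frequency closely.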
Let us also recall that genetic drift would have been the result of the dispersal of small numbers of individuals, meaning that gene frequencies in the newly founded populations, as they expanded, could have been very different from those in the ancestral population. This type of colonization of new niches by small subpopulations is called a founder effect. We will see later that some aspects of human evolution can also be explained by a founder effect. Finally, reproductive isolation can also be brought about by a change in chromosome number. This mechanism is particularly important in plants, which can tolerate significant changes in chromosome numbers, whereas animals usually cannot. In order to understand how chromosome numbers can be modified in eukaryotes, it is important to recall how gametes (reproductive cells) form during the special cell division mechanism known as meiosis. Contrary to all the other cells found in a eukaryotic organism, gametes contain a single set of chromosomes and are said to be haploid. For example, human eggs and sperm cells contain a chromosomal set consisting of 23 chromosomes. When a sperm cell fertilizes an egg, the two sets are put together to generate a diploid embryo whose cells now possess 46 chromosomes. To designate the diploid state in humans, one writes that the chromosome number is 2n = 46. Thus, the haploid state in humans is n=23. Since haploid gametes are derived from diploid cells, there must exist a mechanism that halves the chromosome number during gamete production. This mechanism is meiosis. Meiosis cannot take place in cells that contain an odd number of chromosomes because if this were the case, gametes so produced would not contain a full haploid set of chromosomes. For example, if 2n were equal to 45, it would be impossible to generate gametes each containing 23 chromosomes. It should also be kept in mind that rarely, meiosis fails to separate the two sets of chromosomes during gamete production. 
In the case of humans, these gametes would contain 46 chromosomes instead of 23. If a human sperm cell containing 46 chromosomes were to fertilize a human egg also containing 46 chromosomes, the product would be an embryo harboring 4n=92 chromosomes in its cells. If a 2n=46 sperm cell fertilized a normal n=23 egg, the result would be a 3n=69 embryo. Such chromosomal makeups are not compatible with life in humans and other mammals. However, plants are much more tolerant of changes in chromosome numbers. Let us consider a plant species where 2n=18 and where n is thus equal to 9, the number of chromosomes found in the male and female gametes of this plant. Let us also consider another, closely related, plant species that is sexually compatible with the first one but where 2n=20 and thus n=10. When the gametes from the first plant species fertilize the gametes of the second species, the hybrid plants resulting from this cross will contain 19 (9+10) chromosomes and thus, 2n=19. Meiosis in these hybrid plants cannot produce gametes containing whole haploid sets because 19 is not divisible by two. This hybrid is thus sterile. However, once in a while, meiosis completely fails to operate. Let us assume that faulty meiosis occurs in the male gamete of the above first parent, thus producing gametes containing 18 chromosomes. Let us also assume that concomitantly, faulty meiosis takes place in the female gametes of the second parent, producing gametes containing 20 chromosomes, also double the total number. The union of these two gametes will produce individuals where 2n=38 chromosomes. Meiosis will thus generate gametes with


n=19. Alternatively, chromosome doubling can accidentally occur in the hybrid itself, making possible the formation of functional gametes, also with n=19. Thus, one can see how entirely new, fertile hybrid individuals can appear in crosses involving different species, provided chromosome doubling takes place. What is more, these hybrids are reproductively isolated from both parent species because the cells of the hybrid now contain two different duplicated genomes: two chromosome sets from each of the two original parents. Each hybrid gamete therefore carries one full set from each parent. A union between a hybrid gamete and a gamete from either one of the original parents would produce individuals containing two sets of chromosomes from one of the original parents and a single set of chromosomes from the other original parent. Such individuals are sterile due to chromosome imbalance. Therefore, one can see that the new hybrids can only reproduce among themselves and are thus reproductively isolated. The example given above is real: this is how Brassica napus (rutabaga, 2n=38) was produced from the hybridization of Brassica oleracea (cabbage, 2n=18) with Brassica campestris (turnip, 2n=20). It is estimated that up to 1,000 species of land plants, including many cultivated plants, are the result of polyploidization events.
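The chromosome bookkeeping in the Brassica example can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the fertility rule (a plant is fertile only if every parental genome is present in an even number of copies, so that each chromosome has a pairing partner) is a deliberate simplification of the argument made above, not a full model of meiosis.

```python
# Haploid chromosome numbers (n) of the two parent species, taken from the text.
HAPLOID_SIZE = {"oleracea": 9, "campestris": 10}


def chromosome_count(sets):
    """Total chromosomes in a cell holding sets[species] haploid sets of each genome."""
    return sum(HAPLOID_SIZE[species] * copies for species, copies in sets.items())


def is_fertile(sets):
    """Simplifying assumption: meiosis succeeds only if every genome
    is present in an even number of copies."""
    return all(copies % 2 == 0 for copies in sets.values())


def fuse(gamete_a, gamete_b):
    """Fertilization: add up the haploid sets carried by two gametes."""
    merged = dict(gamete_a)
    for species, copies in gamete_b.items():
        merged[species] = merged.get(species, 0) + copies
    return merged


def normal_gamete(sets):
    """Gamete produced by a successful meiosis: half of each genome."""
    if not is_fertile(sets):
        raise ValueError("sterile: at least one genome has no pairing partner")
    return {species: copies // 2 for species, copies in sets.items()}


# Normal gametes from both parents: the sterile 19-chromosome hybrid.
f1_hybrid = fuse({"oleracea": 1}, {"campestris": 1})
# Unreduced (faulty-meiosis) gametes from both parents: the 2n=38 hybrid.
doubled_hybrid = fuse({"oleracea": 2}, {"campestris": 2})
# Doubled hybrid crossed back to the first parent.
backcross = fuse(normal_gamete(doubled_hybrid), {"oleracea": 1})

for name, plant in [("F1 hybrid", f1_hybrid),
                    ("doubled hybrid", doubled_hybrid),
                    ("backcross", backcross)]:
    status = "fertile" if is_fertile(plant) else "sterile"
    print(name, chromosome_count(plant), status)
```

Under these assumptions the three plants come out with 19, 38, and 28 chromosomes, and only the doubled hybrid is fertile, which is the reproductive isolation described in the text.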

EVOLUTION, DEVELOPMENT, AND GENE NETWORKS

The study of embryonic development and the genes involved in this process helps us understand how animal body plans evolved. Many of the genes controlling development are linked in networks and are the result of gene duplication events.

We saw earlier that all life forms share a good number of genes that possess similar base sequences and often code for similar functions. These genes are said to be homologous. Their homology indicates common ancestry. Often also, in eukaryotes, homologous genes are found in families, or clusters, composed of variants of each other. In other words, the members of these families arose by duplication followed by mutation of an ancestral (and presumably single) copy of the original gene present at the root of each family. Gene families are not rare. For example, the roundworm C. elegans contains 665 gene families composed of two genes. This number is 93 for families composed of four genes, and 98 for families of ten genes or more.

Gene duplication can commonly be observed in the laboratory and is known to result from a mechanism operating during meiosis. Before gametes are formed, chromosomes line up in pre-gametic cells and undergo the phenomenon known as cross-over. During this process, chromosomes overlap and exchange pieces of material. If the chromosomes are perfectly lined up, which happens in the vast majority of cases, chromosomes that swap material remain normal and intact. However, if the lining-up is imperfect, unequal cross-over ensues, with the result that some chromosomes will actually gain a piece of chromosomal material from the neighboring chromosomes with which they overlap. The result is the duplication of sometimes long stretches of DNA. It is estimated that the human genome has gained up to 26 million base pairs through gene duplication since the human-chimpanzee split 5-7 million years ago.
What is the fate of the genes present in the DNA fragments that now form one or several gene families? The original genes will continue to function normally but, since they are not needed to ensure full, normal function, duplicated genes can mutate and either become non-functional or acquire entirely new functions over time. It is of course the latter possibility that can have major evolutionary consequences because it adds new information to genomes. For example, gene duplication explains very well the existence of several types of human hemoglobins, from embryonic and fetal hemoglobins to hemoglobins found in adults. Of course, these hemoglobins perform the same function, the ferrying of oxygen. But that is not all: gene duplication also explains well how animal body plans and organs have evolved over time.

The study of embryonic development coupled with the study of evolution constitutes the rather new branch of biology called evolution and development or, more concisely, evo-devo. Since evo-devo research sheds light on the origin of species, it is also called molecular paleontology. A young embryo is originally composed of identical cells all containing the same genes. But as the embryo develops, some of its cells receive molecular signals (often stored in particular locations in the egg) that turn some genes on and some off. Some of the activated genes have the ability to activate other genes or suppress their activity. This is how cellular differentiation takes place in a growing embryo: cascades, also called networks, of genes that are turned on or off determine the genetic activity of cells and tissues, which progressively acquire specific functions. In addition, when one thinks about speciation, one realizes that the shape of an organism, or its body plan, is an important factor in discriminating one species (or phylum) from another. It is well known that particular body plans are first determined in a growing embryo.
It thus makes sense to study the correlation between embryonic development and gene cascades in order to understand how different body plans, and hence species, appeared over time. It turns out that many gene cascades in eukaryotes are homologous and have evolved through gene mutation. For example, homologous genes responsible for eye development have been cloned from the fruit fly, the mouse, and humans. Even though fruit flies have compound eyes whereas mice and humans do not, a “master” gene responsible for eye formation in all three species shows very significant sequence homology. In fact, when the mouse “master” gene is introduced into fruit flies, it determines the formation of extra compound eyes. This experiment demonstrates that this “master” gene is evolutionarily conserved, not only in its sequence, but also in its function: this gene orders the system to “just make eyes.” Other genes under its control then determine the nature of the eyes (compound or not) in a cascade of events that is species-specific.

Another example, this time dealing with body plan, is the Hox gene family. These genes were first discovered in mutant fruit flies that, for example, developed two pairs of wings instead of just one. The conclusion of these experiments was that the Hox genes are responsible for body plan expression. An extensive study of Hox genes across all major animal phyla showed that these genes are ubiquitous and share several common characteristics. First, they occur in clusters, suggesting that the original Hox sequence duplicated over evolutionary time. Second, their linear arrangement on chromosomes corresponds to the temporal expression of these genes in a developing embryo: the first gene is expressed first, the second gene second, and so on. In addition, each Hox gene contains a highly conserved 180-base pair stretch of DNA called a homeobox. The proteins coded for by Hox genes use the homeobox to bind to other genes and thus regulate their transcription, that is, their ability to make messenger RNA.
Therefore, Hox genes are near the beginning of cascades of gene networks that determine the body plans and organ formation in animals. For example, we now understand well how a structure like the heart has evolved, from a simple contractile tube as found in the sea squirt to the four-chambered heart of reptiles, birds, and mammals. This increase in complexity is reflected in the increased complexity of at least five gene networks, each composed of several gene families. One of these gene families, Tbx, is composed of a single gene in the cnidarians, three in insects, five in amphibians, and seven in animals having a four-chambered heart.

Figure 7. Hox genes in Drosophila, Amphioxus, a hypothetical ancestor, and the mouse. Top: the 8 Drosophila Hox genes arranged in a single cluster. Lines connect each Hox gene to the segment of the fly embryo where it is expressed; Middle: the 10 Hox genes found in Amphioxus. Amphioxus is a small (a few centimeters in length) animal that lives buried in the sand of shallow areas of temperate seas. It is seen as a descendant of the precursor of vertebrates. Based on DNA homologies, a putative common ancestor of insects and vertebrates may have had 6 Hox genes as shown. Bottom: the 38 mouse Hox genes arranged in four clusters. These clusters are located on different chromosomes. (Source: http://www.press.uchicago.edu/books/gee/carroll1.jpeg).


If indeed the Hox gene families result from the duplication over time of one or several Hox genes, animals evolutionarily closer to the common ancestor should contain fewer Hox genes than animals that appeared later. This is indeed the case: for example, cnidarians and sponges have five Hox genes (which are not clustered), tube worms have ten, mice have 38, and zebra fish have 52. In most vertebrates, Hox genes are arranged in four clusters, whereas Drosophila has a single cluster of eight Hox genes (figure 7). As we just saw, the same principle applies to other gene families that code for the development of the heart. It should be noted that the equivalent of animal Hox genes also exists in plants, where some of these genes are called MADS-box genes. There too, MADS-box genes are responsible for the development of structure, such as the formation of petals, sepals, and sexual organs.

The emerging take-home message of evo-devo studies is that complex structures in eukaryotes are under the control of gene clusters (Hox genes, MADS-box genes, etc.) that direct the expression of gene networks whose protein products end up composing the highly differentiated organs of animals and plants. Evo-devo is still in its infancy, and it will take time to unravel the genetic cascades at work in complex eukaryotes, as well as the evolutionary path of these cascades. All in all, evo-devo studies powerfully buttress the notion that complexity is at least in part explained by the duplication of regulatory genes that acquired new functions by mutation, and hence allowed the diversification of structure and function seen today in the living world.

THE LAST FRONTIER (for now): HUMAN EVOLUTION AND THE FOSSIL RECORD

Pre-human and anatomically modern human fossils paint an interesting picture of human evolution that goes back more than four million years. Fossil finds strongly suggest that Homo sapiens first appeared in Africa.

Humans are part of the phylum “chordates.” We are thus deuterostomes equipped with a body cavity, bilateral symmetry, and differentiated organs. But of course, we are much more than that. In particular, we have developed sophisticated cultures thanks to our big and complex brain, and we invented science. No other animal can make these claims. What, then, does science say about our origins? In the following section, we will focus on information provided by paleoanthropology, genetics, and cultural anthropology.

Paleoanthropology is the study of human and pre-human fossils, collectively called hominids. In many cases, fossil bones can be dated with accuracy, using radiometric and non-radiometric techniques. In what follows, the dates were confirmed by several independent techniques and are widely accepted by the scientific community. Humans are characterized by a large cranial capacity and bipedalism. Thus, paleoanthropologists focus on these two traits in particular to determine whether some fossils should be included in the human lineage. The oldest discovered fossil that seems to be that of a bipedal creature is the 5- to 6-million-year-old Orrorin tugenensis discovered in Kenya. The age of this animal puts it close to the chimpanzee/pre-human split, which is thought to have occurred between five and seven million years ago, based on rare fossil finds and phylogenetic dating. Much more information is available about later bipedal animals of the genus Australopithecus, such as Australopithecus afarensis, because more fossil remains of this genus have been found. Australopithecus appeared about 4.2 million years ago and became extinct about 1.5 million years ago. These animals were about 1 meter tall and weighed about 20 kilos.
They had the cranial capacity of a modern chimp. They lived in multiple places in sub-Saharan Africa but they have never been found in the rest of the world. Australopithecus may or may not have evolved from Orrorin. Our own genus, Homo, appeared in Kenya and Ethiopia about 2.5 million years ago. It is thought that this new genus (and its first species, Homo habilis) evolved from Australopithecus. H. habilis was, of course, bipedal, had a bigger cranial capacity than Australopithecus, and was clearly able to make simple tools. H. habilis fossils have been found only in Africa. This species became extinct about 1.5 million years ago. But it is Homo erectus, probably a descendant of H. habilis, that we would most likely recognize as “human,” provided that these creatures wore a hat to mask their sloping forehead and prominent brow ridge. H. erectus appeared in Africa about 1.9 million years ago, made more sophisticated stone tools that they used to butcher animals, and had domesticated fire. There is also evidence that H. erectus was able to build simple shelters. These creatures measured up to 1.8 meters tall, weighed up to 70 kilos, and had a cranial capacity twice that of a chimp (modern humans have a cranial capacity of about 1,350 cubic centimeters, roughly 3.5 times that of a chimp; see figure 8). H. erectus became extinct, perhaps as late as 150,000 years ago.


Figure 8. Skulls of H. sapiens, H. erectus, and A. afarensis (l. to r.). Source: collection of P. Lurquin

H. erectus occupies a special place among ancient hominids. Indeed, these creatures traveled very long distances. In fact, H. erectus fossils have been found not only in Africa, but also in Europe, Southeast Asia (Indonesia), and East Asia (China). Since all the H. erectus fossils found outside Africa are younger than the African fossils, it is reasonable to think that these hominids migrated out of Africa rather than evolving independently in several different locales. Let us also remember that the ancestors of H. erectus are found in Africa only. The current view of paleoanthropologists is that H. erectus then evolved, about one million years ago, into Homo antecessor, the putative common ancestor of Neanderthals and modern humans. Neanderthals became extinct about 30,000 years ago, leaving Homo sapiens, ourselves, the only extant hominids on the planet.

It should be noted that the “streamlined” hominid lineage presented above is in all likelihood a great simplification. There is evidence of a complicated hominid radiation in Africa going back several million years, and we do not really know whether all these hominids interbred or not, making our origins somewhat murky. What is known with greater confidence is that anatomically modern humans and Neanderthals interbred to some extent (see later). Why and how Neanderthals went extinct remains a mystery, although we know that their extinction was not due to massive incorporation into the H. sapiens gene pool.

In conclusion, early hominids appeared and evolved in Africa. Is this also true for H. sapiens? The answer is yes. The fact that H. sapiens first appeared in Africa is strongly supported by fossil finds. The oldest H. sapiens fossils known today were found in Ethiopia in 2003. They were dated at 154,000-160,000 years ago. These humans were not fully modern, but only in the sense that they showed a somewhat protruding brow ridge, categorizing them as “archaic.” Their brain size was the same as ours.
Completely modern-looking, 130,000-year-old human fossils were also found in Ethiopia, making this corner of East Africa the likely birthplace of our species. Therefore, Africa occupies a very special place in the world: it is only there that we find an uninterrupted line of bipedal fossils, from Orrorin to Homo sapiens. The tale of fossils is that we are all Africans.

WHAT DNA SAYS ABOUT THE ORIGINS OF HUMAN BEINGS

Human phylogenetic trees based on Y-chromosome DNA and mitochondrial DNA also show that we originated in Africa.


We saw earlier how the concept of the molecular clock allowed scientists to build a universal phylogenetic tree relating all extant species studied so far, going back about 3.5 billion years. Needless to say, the same concept can be used to construct phylogenetic trees connecting human populations. Indeed, existing human populations are sufficiently genetically diverse to allow one to search for subtle DNA base pair changes, that is, mutations (or, conversely, DNA base pair conservation) in humans inhabiting different continents or parts thereof. As with the universal tree of life, one can thus hope to determine which human populations arose first and how these populations diversified over time.

Ideally, looking for base pair changes or conservation should be conducted at the level of the whole human genome. But given the large size of our DNA (in excess of 3.1 billion base pairs) it is so far too expensive to sequence the hundreds of human genomes necessary for phylogenetic tree building. Nonetheless, DNA sequencing techniques are rapidly getting faster and cheaper so that routine whole genome sequencing should be available in the not too distant future. Meanwhile, geneticists concentrate on shorter stretches of human DNA, in particular the male-associated Y-chromosome DNA and mitochondrial (mt) DNA found in both men and women. There are excellent reasons to sequence the Y chromosome for phylogenetic purposes. First, the Y chromosome is quite short relative to all other chromosomes, while still containing 60 million base pairs (figure 9).

Figure 9. Scanning electron micrograph of a human Y chromosome (right) next to an X chromosome. Source: http://scitechdaily.com/images/Human-Y-Chromosome-Much-Older-Than-Previously-Thought.jpg

Second, most of the Y chromosome contains no genes, meaning that those sections of the Y devoid of genes are insensitive to natural selection and only evolve by genetic drift. Recall that the molecular clock “ticks” most regularly when natural selection does not influence gene frequencies in populations. Thus, scientists sequence portions of the Y chromosome known not to contain any genes. Third, the Y chromosome is exclusively passed down from fathers to sons. In other words, Y chromosome sequencing studies tell us the history of human males.

To understand the following, remember that phylogenies tell us which organisms or populations appeared first and also tell us at which point in time a particular change in a DNA base pair (a mutation) took place. Also, similar DNA sequences are grouped into blocks called haplogroups. Figure 10 represents a simplified human phylogenetic tree based on the Y chromosome. In this tree, the common ancestor of all humans is located at the top. One thing is immediately noticeable: haplogroups A and B, the closest to the common ancestor, are found in Africa only, in particular in East Africa.


Figure 10. A highly simplified phylogenetic tree of human Y chromosome haplogroups found in various parts of the world. The symbols in the branches of the tree designate the specific mutations that characterize each branch. Dozens of mutations have been mapped and dated. NG = New Guinea. Source: US National Academy of Sciences

This shows that the most ancient Y chromosomes are found in Africa, which confirms what the fossils also tell us. All other haplogroups derive from African haplogroup B and are also found elsewhere in the world. Haplogroup Q (not shown) is the youngest and is found among Native Americans, making them the last population to differentiate. This is in good agreement with paleontological evidence, which indicates that Native Americans colonized the Americas no earlier than about 35,000 years ago. As mentioned, a phylogenetic tree is also informative regarding the time at which haplogroups differentiated. When, then, does genetics say that the first H. sapiens Y chromosome appeared in Africa? This date is about 103,000 years ago, significantly younger than the 160,000 years provided by the fossil record. We will see below why this is possibly so. Interestingly, the equations of population genetics allow us to estimate how many males composed the group from which we are all descended. This group (tribe?) numbered between 1,000 and 10,000 men.

But what about female lineages? It turns out that mtDNA is exclusively passed down from mothers to their children, both male and female. This is because a sperm cell fertilizing an egg does not introduce its mitochondria into this egg. Thus, a growing human embryo contains maternal mitochondria, and thus maternal mtDNA, only. This means that, by studying mtDNA sequences in both males and females, one can construct female lineages going back to the beginning. In contrast to the Y chromosome, mtDNA is very short, since it contains only 16,600 base pairs. It is thus easy to sequence in its entirety (figure 11). Here again, geneticists focus on the portions of mtDNA thought to evolve mostly by drift, such as the D-loop. An mtDNA-based phylogenetic tree is very similar to the Y chromosome tree and confirms its findings. For example, human females also appeared in East Africa.
However, the genetic date for these first females is much closer to the paleontological date: 168,000 years. Here also, the size of the female portion of the first ancestral population is estimated to be between 1,000 and 10,000 individuals. How can one reconcile the phylogenetic age of the initial male population with both the paleontological age and the phylogenetic age of the initial female population? On the one hand, it may well be that more data will bring these dates closer together. On the other hand, the discrepancy may be due to an early cultural factor, polygyny, that is, the practice of one man marrying, and having children with, several women at the same time. This practice leads to a male genetic bottleneck (a narrowing of the gene pool) because only a fraction of the males in the population reproduce. It turns out that a genetic bottleneck reduces the apparent

age of a population, in this case, the male fraction of the population. To understand this, consider the following anecdote based on a true story. The chief of a village in the Amazon claimed that he alone had access to the women of the village and that all the children in the village were his exclusively. Assuming this were true, a DNA analysis of the Y chromosomes of the boys in the village would show that the chief was indeed the founding father. But this analysis would also show that the date of origin of the entire male population of the village was exactly the age of the chief, say perhaps 40 years or so! This very young age is clearly impossible since the chief himself would have had a father, a grandfather, and so on. Therefore, polygyny can reduce the apparent age of a male population. Of course, we do not know whether and for how long the first modern human populations practiced polygyny and perhaps we never will.

Now that we have a good grasp on where and when the first humans appeared, let us see how fossils and phylogenetic trees help us understand how humans migrated and occupied the planet, starting about 150,000 years ago. We will also see that while migrating, H. sapiens mated and had progeny with two other types of human populations, the Neanderthals and the Denisovans.
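The polygyny argument above can be made quantitative with a rough sketch. It relies on a standard textbook result that is not given in these notes: in a neutral, constant-size, randomly mating population of N breeding males, the Y chromosomes of a sample of k men are expected to trace back to a common ancestor about 2N(1 - 1/k) generations ago. The population size, the breeding fraction, and the 25-year generation time below are hypothetical values chosen purely for illustration.

```python
def expected_tmrca_generations(breeding_males, sample_size):
    """Expected generations back to the common ancestor of the sampled
    Y chromosomes under the neutral haploid coalescent: 2N(1 - 1/k)."""
    if breeding_males < 1 or sample_size < 2:
        raise ValueError("need at least 1 breeding male and 2 sampled men")
    return 2.0 * breeding_males * (1.0 - 1.0 / sample_size)


def expected_tmrca_years(breeding_males, sample_size, generation_years=25.0):
    """Same expectation in years; generation_years is an assumed value."""
    return expected_tmrca_generations(breeding_males, sample_size) * generation_years


ADULT_MALES = 5000   # hypothetical census number of adult males
SAMPLE = 100         # hypothetical number of men whose Y chromosome is typed

# Case 1: every adult male fathers children.
all_breed = expected_tmrca_years(ADULT_MALES, SAMPLE)
# Case 2: polygyny, with only one male in five fathering children.
polygyny = expected_tmrca_years(ADULT_MALES // 5, SAMPLE)

print(f"all males breed: about {all_breed:,.0f} years")
print(f"one in five breeds: about {polygyny:,.0f} years")
print(f"apparent age shrinks by a factor of {all_breed / polygyny:.1f}")
```

In this simplified model the apparent age of the male lineage scales directly with the number of breeding males, so a fivefold reduction in breeding males makes the Y-chromosome ancestor look five times younger, while the mtDNA estimate, which depends on the number of breeding females, is unaffected.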

Figure 11. A map of human mitochondrial DNA. This DNA is circular and double helical. Boxes and letters designate some of the 37 genes carried by mtDNA. The D Loop is where mtDNA starts its replication. It varies significantly among individuals. The cox genes code for subunits of cytochrome oxidase, an enzyme involved in oxygen metabolism (respiration).

HOMO SAPIENS FIRST COLONIZED AFRICA AND THEN, PROGRESSIVELY, THE REST OF THE WORLD, STARTING WITH ASIA

Prehistoric human migrations can be traced by dating human fossils and looking at the distribution of Y-chromosome DNA and mtDNA haplogroups around the world. Comparing the sequences of modern whole human genomes with the sequences of extinct Neanderthals and Denisovans, two types of populations that coexisted for tens of thousands of years with H. sapiens, has revealed that some of us carry genes from these extinct humans.

As we saw, quasi-modern human fossils dating back about 160,000 years have been found in Ethiopia. Somewhat younger fossils, about 130,000 years old, have been discovered in both Ethiopia and Israel. More recent fossils yet, 90,000-120,000 years old, have been found in South Africa. The next oldest fossils were discovered in Australia and dated at 40,000-60,000 years ago. After about 25,000-40,000 years ago, human fossils are found practically everywhere, except in the Americas and Polynesia. Unfortunately, human fossils are rare, and not enough have been discovered to build a full picture of prehistoric human migrations. However, these migrations can be retraced by looking at DNA haplogroups of known age, as in figure 10. Many human populations all over the world have been DNA-typed, and we have a good idea of the prevalence of some haplogroups over others in many parts of the world. For example, recall that African haplogroups are the most ancient, whereas European and Native American haplogroups are much more recent. It is also possible to determine which haplogroups derived from which other ones and when. To understand this, recall that, while traveling thousands of kilometers over tens of thousands of years, humans continued to mutate and hence generated more derived haplogroups that scientists have been able to pinpoint geographically. Thus, we also know where on the planet new haplogroups appeared. For simplicity, we will not refer in the following to the different haplogroups and their nomenclature.

DNA-based evidence shows that human groups left East Africa to colonize sub-Saharan Africa about 100,000 years ago. Then, about 50,000-60,000 years ago, human groups left Africa, possibly through the Middle East (thus going first in a north and then northeast direction) or else by crossing the Red Sea, due east from Ethiopia. By following coastal routes, they reached the Indian subcontinent. It should be remembered that back then, the Sahara was not a desert and would not have prevented our hunter-gatherer forebears from feeding themselves. What is more, soon after the first modern humans left Africa, they mated and had children with populations that had preceded them in the Middle East/West Asia by many thousands of years, the Neanderthals. Neanderthal humans, an archaic form anatomically distinct from modern humans, had settled in Europe and West Asia tens of thousands of years prior to the migration of modern humans out of Africa. Neanderthal DNA was isolated from fossil bones and sequenced in 2010. When compared to modern human DNA, this sequence revealed that non-Africans harbor between 1 and 4% Neanderthal DNA in their genome. Thus, hybridization between modern humans and Neanderthals did not occur in Africa, but probably took place in the Middle East, about 80,000 to 50,000 years ago. Further, it was also shown that another archaic population distinct from the Neanderthals (though closely related to them), the Denisovans of Central/East Asia, contributed 1 to 6% DNA to modern human populations on their way to East Asia and Oceania, perhaps 45,000 years ago. Thus, for example, current Melanesian populations of Oceania carry both Neanderthal and Denisovan DNA.
It is not clear what advantage(s) are conferred by the archaic human genes. One possibility is that archaic humans like the Neanderthals and the Denisovans possessed better defenses against pathogens, a trait that would have been favored by natural selection in anatomically modern/archaic hybrids. Going back to the anatomically modern human migrations, DNA typing reveals that, after colonizing North India, then Central Asia, then Southeast Asia and Australasia, about 40,000 years ago, Central Asian subgroups migrated East (towards China) and West (towards Europe). Finally, the Americas were colonized about 35,000-15,000 years ago by Siberians. All these migrations evidently took place mostly on foot, with simple boats needed only a few times. Indeed, our planet was cooler than it is today and much water was locked in glaciers, making sea levels significantly lower. This meant that Australia, for example, was reachable by crossing short stretches of ocean. In addition, Asia and North America were connected by a land bridge which is now under the Bering Strait. Polynesia was colonized much later, during historical times, about 1,000 years ago. The major human migrations are shown in figure 12. In summary, it took humans about 40,000-50,000 years to colonize the world outside Africa, barring Polynesia. As a final point, it would be a mistake to think that throngs of humans were on the move. Population density was quite low everywhere, including in ancestral Africa. Rather, one should imagine small splinter groups leaving small main groups and migrating in different directions. In other words, each subgroup created a bottleneck (a reduction in population) and a founder effect, explaining why haplogroups are still identifiable today in specific locations. Let us now turn to some aspects of cultural anthropology and show how evolutionary thinking can clarify a distinctive human property: culture.

BIOLOGICAL EVOLUTION AND CULTURAL EVOLUTION INTERSECT

Milk consumption was developed by some cultures but not by others. We show here what the biological premises of this choice are. Also, it is possible to view human languages, as well as the spread of agriculture during the Neolithic, through an evolutionary lens.

Most of us know a person who is lactose-intolerant, that is, a person unable to digest the sugar present in milk, lactose. Conventional wisdom also tells us, wrongly, that lactose intolerance is due to a mutation. In fact, the opposite is true: it is lactose tolerance which is the result of a mutational event which took place during the Neolithic, a period that extended from about 10,000 to 5,000 years ago in the Middle East and Europe. Just like other mammals, humans are able to digest lactose as infants. But later in life, some of us lose that capacity and become lactose-intolerant. Lactose intolerance is not spread evenly in the world. For example, its frequency is 99% in Thailand and

it is generally very high in East and Southeast Asia. On the contrary, lactose intolerance is only 2% among the Danes and 0% among the Czechs. How can one explain this? First, let us contrast animal husbandry and Europe and East Asia. In Europe, bovines are used for the meat and their milk. In East Asia, cattle also exist, but they are used as draught animals and sometimes for their meat. They are not used for milk production. In fact, pigs are the traditional source of animal protein and nutrition in East Asia, and pigs are not dairy animals. In other words, European societies are dairy-using cultures whereas East Asian societies are not. What happened during the Neolithic in Europe was the domestication of cattle, in particular for their milk. Indeed, Neolithic Europeans must have discovered the high nutritional value of milk. However, lactose intolerance was present everywhere at high frequency. But meanwhile, some individuals in European populations had spontaneously mutated to become lactose-tolerant. These individuals then benefited more from the nutrition from milk than their non-mutant counterparts. Hence, the former got healthier, stronger, and thus reproduced more frequently because they had better fitness. Over several thousand years, lactose tolerance became prevalent in European cultures. This did not happen in East and Southeast Asia because there, in the absence of a dairy-based culture, lactose-tolerant mutants had no fitness advantage over lactose-intolerant people. Thus, lactose tolerance never spread in this part of the world.
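The reasoning above, a rare mutation spreading where it confers a fitness advantage and staying rare where it does not, can be sketched with the standard one-locus selection recursion from population genetics. The numbers are assumptions made for illustration only: a starting allele frequency of 1%, a 5% fitness advantage for carriers in a dairying culture, no advantage elsewhere, and 300 generations (several thousand years if a generation lasts roughly 25 years). Dominance of the tolerance allele is also an assumption of this sketch, and random drift is ignored.

```python
def next_frequency(p, s):
    """One generation of selection on a dominant, beneficial allele.

    p is the current frequency of the tolerance allele and s the assumed
    fitness advantage of carriers (genotypes with one or two copies).
    """
    q = 1.0 - p
    mean_fitness = 1.0 + s * (1.0 - q * q)  # non-carriers (frequency q*q) have fitness 1
    return p * (1.0 + s) / mean_fitness

def trajectory(p0, s, generations):
    """Allele frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs

# Hypothetical parameter values, not estimates from real populations.
dairy = trajectory(p0=0.01, s=0.05, generations=300)
no_dairy = trajectory(p0=0.01, s=0.0, generations=300)

print(f"dairying culture, final allele frequency: {dairy[-1]:.3f}")
print(f"non-dairying culture, final allele frequency: {no_dairy[-1]:.3f}")
```

With an advantage, the allele frequency rises in every generation and the allele ends up in the majority; with no advantage, this deterministic model leaves it at its starting frequency, which mirrors the contrast drawn between Europe and East Asia.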

Figure 12. A simple summary of human Paleolithic migrations. The colonization of Polynesia has been omitted. Source: Stone, Lurquin and Cavalli-Sforza. Genes, Culture, and Human Evolution. Blackwell Publishing, 2007.

Another place where genetics, phylogenetic trees in particular, intersects with culture is in the area of linguistics. Linguistics is the study of languages, including their origin and relatedness. It has been known for a long time that languages can be categorized in families based on their similarities. For example, the Indo-European family groups most European languages together with languages spoken in Iran and the Indian subcontinent. This grouping suggests that all these languages are derived from a single language called proto-Indo-European (PIE) that may have been spoken at least 5,000 years ago. As PIE speakers migrated towards the West (Europe) and East (Iran, India, Pakistan), PIE evolved and diversified, in a manner analogous to the diversification of genetic haplogroups. Some researchers think that the same principle applies to all languages. Figure 13 shows how a genetic tree of human populations corresponds well to a tree of language families. Since human phylogenies show that we are all descended from East Africans, there exists the intriguing possibility that all human languages are descended from the language spoken by our forebears, perhaps as early as 150,000 years ago or so. It should be noted that not all linguists accept this interpretation.

Finally, as a last example of nature/culture intersection, let us review what is known about the spread of agriculture during the European Neolithic. From about 150,000 years ago till about 9,000 years ago, our ancestors were hunter-gatherers who survived on game and other animals they hunted, and on wild plants and fruits they collected. This period is called the Paleolithic. A great transition from this way of life to the development of agriculture and the establishment of the first cities then occurred during the Neolithic period. It is generally agreed that this transition first took place in the Middle East, perhaps near today’s border between Turkey and Iraq. Then, this new culture spread from the Middle East to the westernmost confines of Europe over a period of about 3,000 years. The timing of this spread is well documented by carbon-14 dating. But what was the mechanism of this spread? Was this a case of cultural diffusion, in which neighboring populations learned by imitation? Or was this a case of demic diffusion, where people on the move take their culture, including agriculture, with them and establish it in the new locales that they occupy? The study of Middle Eastern and European Y-chromosome DNA haplogroups has shed considerable light on this issue.

Figure 13. Matchup between an evolutionary tree of languages and an evolutionary tree of human populations. Source: Stone, Lurquin and Cavalli-Sforza. Genes, Culture, and Human Evolution. Blackwell Publishing, 2007.

Sequencing (and thus dating) and comparing the haplogroups found in Europe leads to the estimate that, overall, haplogroups of Middle Eastern Neolithic origin constitute about 50% of the total, the other 50% representing haplogroups of Paleolithic origin. However, this distribution is just an average. When the two classes of haplogroups (Neolithic and Paleolithic) are superimposed on a map of Europe (figure 14), it is apparent that Neolithic haplogroups are much more abundant in Southeastern Europe (and thus close to the Middle East) than they are, for example, in Northern Germany. This is consistent with Neolithic people leaving the Middle East and first colonizing Greece and the Balkans, where they mostly displaced Paleolithic people. Farther away from the Middle East, this displacement became less frequent, as the Neolithic migrant population became more diluted among the Paleolithic people. Interestingly, the island of Sardinia shows no Neolithic Middle Eastern influence. To sum up, DNA data support the idea that agriculture spread by demic diffusion without, however, negating a possible contribution of cultural diffusion.
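The dilution argument above can be made concrete with a deliberately crude toy model. It assumes, purely for illustration, that farming advances in discrete steps away from the Middle East and that at each step a fixed share of the newly settled farming population comes from local hunter-gatherers who adopt agriculture. The step count and the 10% local share are hypothetical, and the function name is invented for this sketch; the model is not the method used in the haplogroup studies cited here.

```python
def neolithic_fraction(steps, local_share):
    """Fraction of Neolithic (farmer) ancestry after each step of the advance.

    local_share is the assumed fraction of each newly settled population
    that comes from local hunter-gatherers who take up farming.
    """
    fractions = [1.0]  # the Middle Eastern source population
    for _ in range(steps):
        fractions.append(fractions[-1] * (1.0 - local_share))
    return fractions

pure_demic = neolithic_fraction(steps=10, local_share=0.0)     # farmers replace locals
mixed = neolithic_fraction(steps=10, local_share=0.1)          # some locals join in
pure_cultural = neolithic_fraction(steps=10, local_share=1.0)  # only the idea travels

for name, result in [("pure demic", pure_demic),
                     ("mixed", mixed),
                     ("pure cultural", pure_cultural)]:
    print(name, [round(f, 2) for f in result])
```

Pure demic diffusion would leave Neolithic ancestry at 100% everywhere, and pure cultural diffusion would leave none outside the source region. Only the mixed case produces a gradient that declines with distance from the Middle East, which is the qualitative pattern described for figure 14.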


Figure 14. A map of Europe and part of the Middle East showing the frequencies of Neolithic Y haplogroups (white circles and sectors) vs. Paleolithic Y haplogroups (black circles and sectors). These pie charts give the most probable distribution. Source: Stone, Lurquin and Cavalli-Sforza. Genes, Culture, and Human Evolution. Blackwell Publishing, 2007.

CONCLUDING REMARKS

The enormous variety and complexity of the eukaryotic world can now be analyzed and sorted out thanks to the science of genomics, that is, our ability to sequence genomes and learn how they are expressed. The branch of science called evolution and development, though young, has already enabled us to understand some aspects of body plan and organ evolution. Scientists know very well that they have only scratched the surface, however. Nevertheless, our findings strongly support the notion of common ancestry of all living things. Not only that, but we have also discovered much about human origins, with our common ancestors living in Africa a long time ago and, from there, colonizing the world. Last but not least, an evolutionary perspective also allows us to revisit some old questions of cultural anthropology. In short, evolutionary ideas enable scientists to cross disciplines and thus study our world in a multi-pronged fashion that was not widely available even in the recent past.

CONTROVERSIES

Which is correct: the uniregional model or the multiregional model for human evolution?

The model of human evolution presented above can be called a uniregional model with hybridization. This is because it supports the idea that anatomically modern human beings appeared in one location only, Africa. Recent findings indicate that some subpopulations that left Africa then hybridized to a low extent with two populations of archaic humans, the Neanderthals and the Denisovans. This model is accepted by most scientists, but not by all. Some paleoanthropologists and geneticists favor a multiregional model for human evolution in which H. sapiens evolved quasi simultaneously in at least four different locations: Africa, Europe, Asia, and perhaps also Australasia. The multiregional model also posits that modern humans evolved from four geographically different branches of H. erectus. At least, this is one incarnation of the multiregional model, of which there exist several shifting variants. So far, the genetic evidence does not much favor the multiregional model. For example, a massive study of Asian Y chromosomes shows that the original Asian Y-DNA sequence appeared in Africa, not in Asia, thereby weakening the hypothesis that Asians evolved from an Asian H. erectus population. On the other hand, the original uniregional model, which did not take into account hybridization with archaic humans, was not completely correct either. As usual, research continues.

FURTHER RESEARCH

What differentiates us from chimps?

The chimp and human genomes have been fully sequenced. This knowledge should in principle tell us what differentiates us most from our closest cousins. For now, we already know that humans possess a variant of a gene called FOXP2 that is not present in chimps. Interestingly, this gene controls brain structures necessary for language and speech. Equally interestingly, Neanderthals harbored the same FOXP2 variant as modern humans. Furthermore, the senses of hearing and smell (and the genes that control them) have evolved more rapidly in humans than in chimps. The meaning of these findings is not clear at the present time. Finally, there is some preliminary evidence that the human brain might still be evolving. Further research will definitely concentrate on aspects of the development and neurogenetics of our brain.

DISCUSSION QUESTIONS

Describe the timeline and the major innovative steps in eukaryote evolution.
Discuss the various natural processes that lead to speciation in animals and plants.
What are gene duplication and gene networks?
What is the relationship between evolutionary studies and the study of embryo development?
What is the role of Hox genes?
How do DNA studies help us understand human origins?
Can you think of other nature/nurture intersections in humans?

REFERENCES

Balter, M. 2005. Are humans still evolving? Science 309:234-237
Cavalli-Sforza, L. L. 2000. Genes, People, and Languages. New York, NY: North Point Press
De Duve, C. with N. Patterson. 2012. Genetics of the Original Sin. New Haven & London: Yale University Press
Freeman, S. and J. C. Herron. 2007. Evolutionary Analysis, 4th ed. Upper Saddle River, NJ: Pearson Prentice Hall
Hammer, M. F. 2013. Human hybrids. Scientific American 308(5):66-71 (May 2013)
Lurquin, P. F. and L. Stone. 2007. Evolution and Religious Creation Myths: How Scientists Respond. New York, NY: Oxford University Press
Pennisi, E. 2006. Mining the molecules that made our mind. Science 313:1908-1913
Stone, L., P. F. Lurquin and L. L. Cavalli-Sforza. 2007. Genes, Culture, and Human Evolution: A Synthesis. Malden, MA: Blackwell
Wells, S. 2002. The Journey of Man. New York, NY: Random House

This section is based in part on chapter 3 from “The origins of life and the universe” by Paul F. Lurquin, Columbia University Press, 2003.


L. Luca Cavalli-Sforza (1922- )

L. Luca Cavalli-Sforza (henceforth Cavalli) has had a distinguished career in genetics. He was born in Italy, where he earned an MD degree just before the end of World War II. His passion, however, was not in medicine but in pure science.

L. L. Cavalli-Sforza. Source: http://www.nature.com/news/2007/071017/images/news.2007.166.jpg

This is where he made his landmark contributions. First, his early work in the 1950s was instrumental in the discovery and explanation of horizontal gene transfer by conjugation in bacteria. But in the early 1960s, Cavalli shifted his emphasis to human genetics. As he said, “humans are more charismatic than bacteria.” Having gained a strong background in statistics, both in Italy and in England (with Sir Ronald Fisher), Cavalli turned his attention to human population genetics. He soon discovered that human populations evolve mostly by drift, a concept that has been useful in our understanding of early human evolution. His research also led soon thereafter to the notion that agriculture spread through demic diffusion in Neolithic Europe. Cavalli then relocated to Stanford University in 1970. There, he started a research program based on Y chromosome and mtDNA studies that was used to build models of human origins in Africa and showed how humans spread all over the globe. Cavalli is also interested in cultural evolution and linguistics. With his Stanford colleague, Marcus Feldman, he has developed a theory of cultural evolution that attempts to unite biological and cultural factors. This theory is still in its infancy and awaits empirical confirmation. Some have called Cavalli “the greatest human geneticist alive.” His first scientific biography was written by Linda Stone and this author [1]. Cavalli now lives in Italy.

[1] Linda Stone and Paul F. Lurquin. 2005. A Genetic and Cultural Odyssey: The Life and Work of L. Luca Cavalli-Sforza. New York, NY: Columbia University Press.
