Current status: it’s complicated
Getting over Out-of-Africa, our rebound in the meantime and the quest for a forever theory
Related: Yo mama's mama's mama's mama... etc., Our African origins: the more we understand, the less we know and Out-of-Africa's midlife crisis.
If you followed human evolution in the second half of the 20th century, your image of a scientist uncovering the mysteries of our origins might have featured a figure in a pith helmet, brush in hand, crouched on a sun-baked African plain. The gold standard was finding well-articulated fossilized human remains by laboriously sifting through the soil. It was usually thankless work; year after year of digging could yield absolutely nothing to report to your institution or granting agencies. But when intrepid researchers hit pay dirt, the insights and the rewards were immense. Donald Johanson, who discovered “Lucy,” the first member of Australopithecus afarensis to be described in the literature, went from being an obscure professor at an Ohio university to a celebrity scientist extensively written up in The New York Times and profiled on national television programs like 60 Minutes.
The honest image of a scientist plumbing our species’ origins today bears no resemblance to this because science has utterly changed; star geneticists Svante Pääbo and David Reich are found in front of the computer, not out in the field. And yet fossils do still matter. The 2017 discovery of an individual with modern human features at Jebel Irhoud in Morocco dating to 300,000 years ago overturned previous orthodoxies in one fell swoop. But with only 6,000 fossilized human remains in various states of completeness ever discovered in over a century of paleoanthropology, the statistical power of these data to explore the range of evolutionary pathways leading to Homo sapiens is starkly limited. With only a few exceptions, like the site at the Rising Star Cave in South Africa, where Lee Berger’s team discovered 15 Homo naledi individuals and 1,500 fossil components, sample sizes of one individual are the expectation. And even the more complete fossils are usually fragmentary; Johanson retrieved only 40% of Lucy. Though you might measure and quantify hundreds of features in a single complete skeleton, more often than not the prize find of an entire career might comprise as little as a tooth or one incomplete portion of a skull. This is one reason that over the last few decades, analysis of DNA has graduated from underpowered supplement to paleoanthropology to full partner in the enterprise of human evolutionary science.
In the 20th century, molecular biologists tested evolutionary questions through primitive comparisons of proteins or phylogenetic reconstructions of the ~1,000 positions in mitochondrial DNA’s hypervariable sequence. But in the 21st century the whole of the human genome has been mapped, providing millions of positions that reveal unique information about our species’ genealogy. And, in 2010, Pääbo and his collaborators published whole-genome sequences of two extinct human lineages, Neanderthals and Denisovans. Now, not only could geneticists reconstruct human evolutionary history by drawing on the surfeit of data yielded by present-day populations, they could minutely sample key points in our evolutionary past, reducing the number of alternative pathways in our range of hypothesized phylogenetic trees.
The pile of data injected into the field from paleogenetics provides researchers with new tools to satisfactorily answer simple long-standing questions. When geneticists first looked at the genome sequence of Neanderthals more than a decade ago, they immediately noticed that compared to modern samples, the Neanderthal was genetically closer to everyone outside of Africa than to any modern humans within that continent. When they sequenced another 60,000-year-old sample from Denisova cave in Siberia, they discovered an individual who was genetically closer to Neanderthals than to modern humans, but distinct enough to represent a wholly different lineage. This new individual inexplicably exhibited a closer genetic affinity to the people of New Guinea than to any other human population alive today. These trivially simple comparisons were only possible because of multiple decades of advances in molecular biological techniques that enabled DNA extraction from partially fossilized remains over 40,000 years old. Synchronously with these breakthroughs, computing power mushroomed to be equal to the gargantuan new task of processing and analyzing the unending gusher of fresh data.
Evolutionary biologists had been arguing for decades about whether modern human populations carried ancestry from other lineages like Neanderthals, or whether we were solely descended from Africans. One group of paleoanthropologists, buttressed by genetic findings from mitochondrial DNA, argued for a recent and massive expansion within and from Africa, the Out-of-Africa event. Others argued for some continuity, pointing to similarities in skeletal features between specific regional ancient human populations and modern ones (for example, Neanderthals and modern Europeans), and the risks of extrapolating from a single genetic locus, mtDNA. With hindsight, before paleogenetics and genomics scaled up the information we could extract from the past to millions of markers, researchers clearly lacked the granularity of data to discern any dynamic more subtle than massive levels of admixture or hybridization. Today we know from many published prehistoric DNA results that all humans outside of Africa have Neanderthal ancestry, averaging around 2% (and most within Africa do so as well, but a far smaller fraction in direct proportion to Eurasian ancestry). For Denisovans, Melanesians carry 3-5% of this ancestry, while most other Asians have 0.1-0.3%. Such low fractions would be impossible to detect, let alone precisely measure, via mere comparisons between skeletal features of a few individuals here or there, or with 20th-century genetic techniques reliant on a few variant positions across hundreds of modern humans. Even the simplest of questions, whether modern humans have any ancestry from Neanderthals, required millions and millions of genetic variables to definitively answer. It’s easy to forget that it’s only been 13 years since this has been considered settled fact.
Maybe if we were lucky, our species’ past would be simple. Easy to understand, easy to model and easy to test. Twenty years ago a simple model of our origins was the scientific consensus, but in hindsight, simple models were also the only level of complexity we had the power to test. In retrospect, researchers were condemned to hunting for the proverbial key solely under the streetlight’s glow because it was the only place they could truly see. Today, instead of rather arbitrary and sharply delimited single genetic regions like mtDNA or Y chromosomes, or the morphological features of a few hundred high-quality skeletons, researchers have millions and millions of genetic markers to analyze and compare, some from humans who lived hundreds of thousands of years ago. The time has come to update our models; but in the process of gaining vastly greater insight, we can expect to lose any semblance of simplicity or tidiness. The human past may no longer be summarized in a succinct meme you can put on t-shirts, but a less elegant truth is ultimately superior to a misleading oversimplification. Not to mention, likely more enduring. Our windfall of data has overturned what we knew to be true, but the long, hard work of building models to capture reality as it truly is remains in its infancy.
It’s raining models
By design, any model ignores many of the world’s messier details, but in exchange, it clarifies and helps us grasp reality. Geneticists employ the Hardy-Weinberg equilibrium (p² + 2pq + q² = 1) even though its simplifying assumptions (zero migration, zero mutation, zero selection, zero drift and purely random mating) literally never occur in nature. Why? Because it’s a powerfully insightful and robust model for answering questions about evolutionary processes. It clearly doesn’t explain everything, but it explains the important things. This appeal to simplicity rests on Occam’s razor, or the law of parsimony. The best, most useful and often most correct explanations are usually those with the fewest fiddly conditions. But at times that reasonable bias towards parsimony risks impeding our best understanding. Five hundred years ago, ornate Ptolemaic geocentric astronomical models were still useful in predicting the trajectory of astral bodies, but they had become exceedingly complex, with many additional epicycles required over time in order to correctly predict planetary movements. The attraction of the heliocentric Copernican worldview was less its predictive power than its elegance (some scholars have argued it was actually initially weaker at prediction than the ultra-elaborate Ptolemaic models then current). It’s worth recalling that in its infancy, heliocentrism offered a “good enough” explanation at a much lower cost of complexity. Meanwhile, we find ourselves today in quite the opposite situation; there is no longer any way to massage or stretch the parsimonious explanation to fit the data on hand. Simplicity was nice when it explained what we knew. But we know vastly more now and there’s no avoiding the conclusion that for the time being… “it’s complicated.”
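For readers who like to see the arithmetic, here is a minimal sketch of the Hardy-Weinberg expectation. The function name and the example allele frequency of 0.7 are my own illustrative choices, not anything drawn from a particular dataset.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for a two-allele locus.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    Assumes the idealized conditions named in the text (no migration,
    mutation, selection or drift, and random mating).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


# Illustrative allele frequency (an assumption, not real data).
freqs = hardy_weinberg(0.7)
for genotype, frequency in freqs.items():
    print(f"{genotype}: {frequency:.2f}")
print(f"total: {sum(freqs.values()):.2f}")
```

The three terms always sum to one, which is all that p² + 2pq + q² = 1 says. The usefulness comes from comparing these expectations with observed genotype counts and asking which assumption was violated.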
In human evolutionary models, the primary features in phylogenetic trees are 1. nodes where populations bifurcate and 2. the length of the descendant branches. Prior to the discovery of Lucy in 1974 and the advent of molecular evolutionary studies, paleoanthropology held that the human lineage split from the other apes as early as 10 million years ago. Models at the time often posited Ramapithecus, an Indian prehistoric ape dated to 12 million years ago, as the precursor of the human lineage. Through fossils and genetics, it has since become clear that the divergence between chimpanzees and humans dates to 6 to 7 million years ago, which requires reconfiguring the position of prehistoric Miocene apes like Ramapithecus in the phylogenetic tree. In the more recent human evolutionary story, we now know that Neanderthals universally contributed to the modern non-African genome, turning our genealogical tree into more of a lattice, with branches grafted back on each other. When the Neanderthal admixture into modern humans was initially detected in 2010, archaeological evidence and the even distribution of Neanderthal ancestry from Europe to Australia pointed to the likelihood that the crucial admixture event occurred about 50,000 years ago in the Middle East. This is where a cluster of Initial Upper Paleolithic (IUP) sites was located, eventually spreading east across Eurasia. The Middle East 50,000 years ago is also the last time and place there was a single human population that would have given rise to non-Africans, from Australia to Europe. If Neanderthal admixture had instead occurred in Europe or East Asia, it would be difficult to explain why it was so consistently spread in the same fraction everywhere without adding complex migration scenarios. A single admixture in the Middle East before the separation of non-African lineages is here the most parsimonious model.
Happily, more data and sophisticated methods have since turned a tentative supposition into an almost certain fact. When distinct populations first mix, the hybrid genome is defined by signature long stretches of unitary ancestry. Through recombination events during sexual reproduction, these segments slowly break up generation by generation which results in progressively smaller alternating tracts of ancestry. Just like half-lives in radioactive decay, the length of these segments allows scientists to reconstruct the time since admixture; the shorter the lengths, the longer since mixing. Several ancient DNA sequences of prehistoric humans from 40,000 years ago or earlier exhibit clear evidence of much longer Neanderthal segments than in current humans. Comparing the length of the segments with a model of admixture, researchers narrowed down the window for the mixing event to a bit earlier than 50,000 years ago (so only 10,000 years would have elapsed by the time of the prehistoric genomes).
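The tract-length logic can be put into a back-of-the-envelope sketch. It assumes a deliberately simplified single-pulse model in which the introgressed fraction is small, so that the mean tract length after t generations is roughly 1/t Morgans (100/t centimorgans). The 29-year generation time and the function names are my assumptions for illustration; published analyses fit far richer models.

```python
GENERATION_YEARS = 29.0  # assumed average generation time, not a figure from the text


def expected_tract_cm(generations):
    """Approximate mean introgressed tract length (cM) after a single pulse."""
    return 100.0 / generations


def years_since_admixture(mean_tract_cm, generation_years=GENERATION_YEARS):
    """Invert the approximation: shorter mean tracts imply older admixture."""
    generations = 100.0 / mean_tract_cm
    return generations * generation_years


# Someone who lived 10,000 years after the pulse vs. someone alive 50,000 years after it.
ancient = expected_tract_cm(10_000 / GENERATION_YEARS)
modern = expected_tract_cm(50_000 / GENERATION_YEARS)

print(f"ancient genome, mean tract: {ancient:.2f} cM")
print(f"modern genome, mean tract:  {modern:.3f} cM")
print(f"ratio: {ancient / modern:.1f}x longer in the ancient genome")
```

Under these assumptions, a genome sampled five times closer to the admixture event carries tracts about five times longer on average, which is why prehistoric genomes are so informative for dating the mixing.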
We are now accelerating our understanding of the human evolutionary past through both the generation of exponentially greater amounts of data and the application of more powerful analytic methods that leverage the horsepower of modern computing. In 2010, simply counting up the similarities between the Neanderthal genome and those of Yoruba Africans and British Europeans showed more matching variants between Neanderthals and Europeans than between Neanderthals and Africans. The analysis was trivially simple, even if getting a viable Neanderthal genome was a serious feat. A simple question (Which modern human populations’ ancestors mixed with Neanderthals?) required a simple model and a simple analysis.
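One standard way of formalizing this kind of counting is the D-statistic, often called the ABBA-BABA test. The sketch below uses invented site patterns purely to show the bookkeeping. Each four-letter string gives the allele carried by an African, a European, a Neanderthal and a chimpanzee outgroup in that order, with A meaning ancestral and B meaning derived; the counts are hypothetical, not real data.

```python
from collections import Counter


def d_statistic(site_patterns):
    """D = (ABBA - BABA) / (ABBA + BABA) over a list of four-letter site patterns.

    Pattern order: (African, European, Neanderthal, chimp outgroup).
    ABBA: the European shares the derived allele with the Neanderthal.
    BABA: the African shares the derived allele with the Neanderthal.
    """
    counts = Counter(site_patterns)
    abba, baba = counts["ABBA"], counts["BABA"]
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical toy counts chosen only for illustration.
toy_sites = ["ABBA"] * 105 + ["BABA"] * 95 + ["BBAA"] * 300
d = d_statistic(toy_sites)
print(f"D = {d:.2f}")
```

With no gene flow, the two patterns should be equally common and D should hover around zero. A consistently positive D in this ordering signals excess allele sharing between the European and the Neanderthal. Real analyses also need a measure of uncertainty, which this sketch omits.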
But if the past was more complicated than we once imagined, then simplicity will not always best serve us. Rather than an explosively branching tree with a single origin, scholars now confront a lattice of human gene flow stretched over millions of years, with our modern lineage’s origins the end product of complex recurrent dynamics, rather than a single punctuated speciation event.
Let us simulate!
I adapted the figure above from a landmark 2023 paper published in Nature, A weakly structured stem for human origins in Africa. Even a quick glance at the confoundingly complex illustration of their most statistically likely model belies the milquetoast title. We’ve obviously come a long way from the year 2000 when a single origin for modern humans was the orthodoxy. Then, the argument went, an East African tribe spread all across the world in a single burst starting 50-60,000 years ago, replacing our Neanderthal cousins in Europe as well as all other human lineages in Asia and Africa. The decades since have seen progressive and incremental modifications and complexifications, not out of caprice but necessity. Scientists now agree on at least two major introgressions into Homo sapiens from Eurasian human groups that pre-date the Out-of-Africa migration, Neanderthals and Denisovans. As copious data flooded in from whole genomes of people from all over the world, it was evident non-Africans are a branch of Africans genetically (confirming what was already known from the study of mtDNA and Y chromosomes), but non-Africans went through a massive bottleneck, an effective breeding population that collapsed to around 1,000 humans for 100 generations starting some 60,000 years ago. A population bottleneck was suspected from the shallow mtDNA and Y chromosomal genealogies in the 1980’s and 1990’s, with “Mitochondrial Eve” and “Y chromosomal Adam,” converging back to a common ancestor 150-200,000 years ago before rapidly expanding all across the world. But looking at the whole genomes of thousands of humans, an extreme population crash is nowhere evident in the reconstructions of African population history. A clearly anatomically modern human discovered at Omo Kibish in Ethiopia in 2004 dating to 195,000 years ago was the definitive nail in the coffin of the belief that both the bottleneck and expansion 60,000 years ago were central to our species’ emergence as a whole. 
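To get a feel for what a bottleneck of that size means, here is a small sketch using the textbook Wright-Fisher expectation that heterozygosity declines by a factor of (1 - 1/(2N)) per generation. The population size and duration come from the paragraph above. Everything else, including the choice of this idealized formula, is my simplification and not the method of any particular study.

```python
def heterozygosity_retained(effective_size, generations):
    """Expected fraction of heterozygosity left after drift in an idealized population."""
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Effective size of about 1,000 sustained for about 100 generations.
retained = heterozygosity_retained(1_000, 100)
print(f"retained: {retained:.3f}")
print(f"lost:     {1.0 - retained:.3f}")
```

Under this idealization the bottleneck costs roughly 5% of heterozygosity. A loss on that scale is the kind of signal that whole-genome reconstructions can look for in non-Africans and fail to find in Africans.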
Homo sapiens predates the Out-of-Africa event by hundreds of thousands of years; to understand our origins, we need to know what happened then in Africa. Could there have been a single modern human population? Several? Did modern humans and other human lineages coexist at length? What were their relationships? The past was clearly even more uncharted and messy than we had thought.
A decade ago, a geneticist friend confidently predicted to me that the story of humanity’s recent origin would turn out to be the collapse of ancient population structure. In this paradigm of deep structure within Africa, the root of our lineage dates back 200,000 years, when anatomically modern humans like Omo Kibish flourished and the divergence between Khoisan and other populations is clear in the genomic data. Subsequently, admixture between lineages would have resulted in hybrid daughter populations, the true precursors of modern groups like East Africans, Eurasians or Khoisan.
By the early 2010’s, ancient DNA was clearly pointing to Neanderthal and Denisovan introgression outside of Africa. Additionally, wholly within Africa, anomalies in the data suggested admixture from a very different “ghost lineage” into modern populations, analogous to what occurred in Eurasia. A ghost lineage is simply a group that has long since been totally absorbed into a contemporary population, and for which we lack archaeological evidence. It’s all in the genes. Imagine if we hadn’t long had artifacts and fossils of Neanderthals; they would then be a ghost lineage for all non-African humans. Over the last decade, the interpretive framework was simple: assume distinct geographically separated human lineages (Africans vs. Neanderthals) with bursts of gene flow between the branches to explain deviations from the tree (Neanderthals → Africans). The work of explanation would be to assign the gene flows between the branches of these disparate human groups. But what if the human past wasn’t as clean and neat as in our models?
The authors of A weakly structured stem for human origins in Africa used the latest simulation methods to arrive at a more complicated, but likely truer, understanding of our lineage’s origins. They modeled the hypothetical interrelationships of two African populations that split a million years ago, before the separation of Neanderthals from our own ancestors 550,000 years ago. They labeled these two lineages “stem 1” and “stem 2.” The term stem just indicates that they are at the root of both modern and “archaic” populations, like Neanderthals and Denisovans. Otherwise, the labels are generic because the researchers built an abstract model first, free from any expectations drawn from archaeology as to where these populations might have lived.
They used only British whole genomes as their proxy for non-Africans. Adding more Eurasian populations would complicate the model without adding much insight; it would add input and output variables, making it harder to interpret the results. The most accurate model of the universe would, after all, be a perfect simulation of the universe, but that much specificity and fidelity undermines the whole point of constructing models. To represent Africans broadly, the team used the Mende, a group in the far west of the continent familiar to geneticists from the 1000 Genomes Project, and the Nama, the last of southern Africa’s Khoisan pastoralists. They also sequenced three Ethiopian ethnicities, the Amhara, Oromo and Gumuz. These Ethiopian populations represent ancient East Africans as a group. The Amhara and Oromo are Afro-Asiatic speakers with substantial West Eurasian ancestry from the Middle East. The Oromo speak a Cushitic language, related to Somali, while Amharic is Semitic, most closely related to Yemen’s old pre-Arabian languages. The Gumuz speak a Nilo-Saharan language, and until recently were hunter-gatherers. The chosen Ethiopian populations uniquely preserve an earlier Sub-Saharan African heritage; over the last 2,500 years, eastern Africa has been overwhelmed by Bantu-speakers from Cameroon who have overwritten much of the region’s earlier genetic structure.
So how did the authors arrive at the baroque figure above, with its abundant forks, gene flows and population mergers? First, whole-genome data, with millions and millions of genetic markers, allowed them to estimate statistics that describe the West Africans (Mende), East Africans (Ethiopians), South Africans (Nama) and non-Africans (British). These statistics measure the diversity of the genome or the correlation between markers within the genome. The team then ran demographic simulations under variable conditions that generated a range of artificial population-genetic statistics. They were looking for parameter values and models in the simulations that produced statistics approximating those in modern populations. The aim was to arrive at a simulated history that might give insights into our species’ real past within Africa through a model that could generate results similar to real data.
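The general logic of matching simulated statistics to observed ones can be shown with a toy. This is emphatically not the authors' pipeline: it is a sketch under my own assumptions, with a single parameter (effective population size), a single summary statistic (mean heterozygosity), a bare-bones Wright-Fisher simulator, and an "observed" value that is itself simulated as a stand-in for real data. All names and numbers are hypothetical.

```python
import random


def simulate_mean_heterozygosity(effective_size, generations, loci, rng):
    """Drift at independent loci, each starting at allele frequency 0.5."""
    copies = 2 * effective_size
    total = 0.0
    for _locus in range(loci):
        freq = 0.5
        for _generation in range(generations):
            # Binomial sampling of allele copies for the next generation.
            count = sum(1 for _copy in range(copies) if rng.random() < freq)
            freq = count / copies
        total += 2.0 * freq * (1.0 - freq)
    return total / loci


def best_fitting_size(observed, candidates, generations, loci, seed=0):
    """Return the candidate size whose simulated statistic lies closest to the observed one."""
    rng = random.Random(seed)
    distances = {
        size: abs(simulate_mean_heterozygosity(size, generations, loci, rng) - observed)
        for size in candidates
    }
    return min(distances, key=distances.get), distances


# Stand-in for real data: a statistic produced by a hidden "true" size of 50.
observed = simulate_mean_heterozygosity(50, 20, 100, random.Random(42))
best, distances = best_fitting_size(observed, [20, 50, 100, 200], generations=20, loci=100)
print("closest candidate size:", best)
```

Because drift is random, the closest candidate will not always be the true one with so few loci. Scaling the same idea up to many statistics, many parameters and millions of markers is what makes the real exercise both powerful and computationally demanding.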
What they found is that the more complex model above is more likely for the genomic data we have than the earlier simple model of introgression between archaic and modern lineages within Africa. Exploring various simulations, they also found that an assumption of admixture occurring in large pulses and through population fusions between stem 1 and stem 2 was superior to only continuous gene flow. Pitting their models against each other, simulations always indicated that about 550,000 years ago the ancestors of Neandersovans, the Eurasian population that would give rise to Neanderthals in western Eurasia and Denisovans in eastern Eurasia, left Africa and branched off from stem 1 rather than stem 2. This indicates that stem 1 occupied Africa’s east, as pre-modern humans seem to invariably exit the continent from this direction. Then, 500,000 years ago, stem 1 itself bifurcates into stem 1s and stem 1e, representing the tentative southern and eastern branches of modern humanity. About 120,000 years ago, stem 1s absorbs a great deal of gene flow from a stem 2 population (70% stem 1s and 30% stem 2), and this subgroup becomes ancestral to the Nama Khoisan. Subsequently, the model estimates that gene flow between the proto-Khoisan and all other human populations is very low, in line with contemporary genetic surveys that place these populations as the most distinct from other modern lineages.
Since stem 1 and stem 2 populations have exchanged genes repeatedly over the last nearly million years, the two lineages are not nearly as distinct as they would be in archaic-introgression models. Only 1-4% of the variation in modern African populations is due to early differences between these two populations, dating back to a million years ago. At about the same time the proto-Khoisan group that led to the Nama was formed, stem 1e and stem 2 combined in equal proportions to give rise to the proto-West/East African population. This population then split in two 60,000 years ago, with western and eastern branches. The Out-of-Africa tribes branched off from the proto-East African population 50,000 years ago in the model. Finally, about 25,000 years ago, at the height of the Last Glacial Maximum, the ancestors of the Mende absorbed the last stem 2 population in totality (at a ratio of about 20% of the ancestry of the proto-West Africans). This is an important result because it explains the inference from archaic introgression models that West Africans harbor a very distinct “ghost population” within them.
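The merger proportions described above lend themselves to simple bookkeeping. The sketch below tracks only the pulse events named in the text and ignores continuous migration between stems and any later gene flow, so the totals it produces are consequences of my simplified reading rather than figures reported by the paper. The function and variable names are hypothetical.

```python
def merge(recipient, donor, donor_fraction):
    """Ancestry of a population formed when `donor` contributes `donor_fraction`."""
    labels = set(recipient) | set(donor)
    return {
        label: (1.0 - donor_fraction) * recipient.get(label, 0.0)
        + donor_fraction * donor.get(label, 0.0)
        for label in labels
    }


stem_1s = {"stem 1": 1.0}
stem_1e = {"stem 1": 1.0}
stem_2 = {"stem 2": 1.0}

# About 120,000 years ago: 70% stem 1s and 30% stem 2.
proto_khoisan = merge(stem_1s, stem_2, 0.30)
# Around the same time: stem 1e and stem 2 in equal proportions.
proto_west_east = merge(stem_1e, stem_2, 0.50)
# About 25,000 years ago: a further 20% stem 2 pulse into the ancestors of the Mende.
mende_ancestors = merge(proto_west_east, stem_2, 0.20)

for name, ancestry in [
    ("proto-Khoisan", proto_khoisan),
    ("proto-West/East", proto_west_east),
    ("Mende ancestors", mende_ancestors),
]:
    print(f"{name}: {ancestry['stem 2']:.0%} stem 2")
```

Stacking the two pulses gives the Mende ancestors more stem 2 ancestry than either pulse alone would suggest, which is one intuition for why simpler models read West Africans as carrying a distinct ghost contribution.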
The framework outlined above can explain all of the genetic features that archaic introgression explains and is statistically more likely when both paradigms are explicitly compared in a simulation framework. This does not mean that this is the last word. But it demonstrates how complexity and richness can be introduced into our understanding of human population history through more subtle model-building. It is inevitable that eventually ancient DNA will illuminate aspects of Africa’s deep history, clarify inferences and resolve alternative hypotheses that the models cannot adjudicate. Currently the oldest genomes from the continent date to 18,000 years ago, long after the seminal period prior to the out-of-Africa event 50-60,000 years ago. Even in Eurasia, with colder conditions more conducive to ancient DNA preservation, only a handful of genomes predate 30,000 years ago. Until paleogenetics catches up, leveraging the massive amount of data from modern populations, now verging on millions of whole-genome-sequenced individuals, and building computationally tractable models, are our only realistic interim short-term options to further refine our understanding of what seems to have been a bewilderingly complex past. When we are lucky enough to access and sequence ancient African genomes, if our models have been well enough built, they will fill in an interpretative landscape already roughly sketched by the model builders, rather than requiring fresh hypotheses from whole cloth.
On becoming human
The key argument in the paper I have highlighted above is not about the specific two-stem population model and its subsequent dynamics, ingenious though that may be. Though the authors argue that their model is more statistically supported than a simple single-origin of modern humans with archaic introgression, they make the case that we should reconceptualize the origin of our species in Africa and also update the sophistication of our methodological toolkit. The two are closely related, as the power of the tools can reconfigure the space of possible scenarios we entertain. Rather than a burst out of a single population that then absorbed stray archaic lineages, the authors argue that Homo sapiens was the end product of repeated instances of what they call “population fragmentation-and-coalescence.” This frameshift has several implications relevant outside population genetics, and can inform the evolutionary understanding of modern human origins.
The archaic introgression model implied that a distinct modern human population developed at least 200,000 years ago, with some theories pushing it as early as 300,000 years ago. This is an outgrowth of estimating the split between the Khoisan and all other human lineages. In phylogenetic parlance, the Khoisan are “basal” to all other humans. They split off our common stem first. The separation of the Khoisan from other populations this early means that by definition modern tribes were present within Africa earlier and that the modern human population would leave remains and an archaeological footprint. This is why Jebel Irhoud in Morocco, with dates older than 300,000 years, was so sensational; some went so far as to argue for the origins of humanity in northwest Africa.
A weakly structured model of human origins obviates the need for an exceedingly ancient ur-modern population specifically somewhere in Africa at a precise time. Not unlike the multi-regional model, our species’ lineage in Africa may have been characterized by several distinct and related populations over hundreds of thousands of years, perhaps all the way back to the root of the genus Homo two million years ago. Instead of attempting to deduce the original human homeland within Africa, we might have to think of the whole continent as the canvas on which our evolutionary history has been painted layer upon layer over the past few million years. In this understanding, the Out-of-Africa event was the exception, not the rule. The single massive wave of expansion that swallowed our Eurasian cousins, the Neanderthals and Denisovans, each in one swift gulp, is important, and left a tremendous footprint in the archaeological and genetic record, but is not something from which we should generalize to our whole history.
Rather than looking for the specific place in Africa where our species arose, a weakly structured framework shifts the focus to when the dynamics that precipitated our recent worldwide expansion occurred. The most supported model in the paper implies the emergence of the proto-Khoisan, and their subsequent isolation from other human populations, approximately 120,000 years ago. This falls in the Eemian interglacial, dated from 130,000 to 115,000 years ago. These results suggest that the coalescence of modern humans began during the last warm period on Earth before our own, when hippos and hyenas roamed Britain.
This will be a more complex and messy narrative than the simple “African Eden” model for which the Out-of-Africa event preconditioned us. Perhaps for cultural and psychological reasons, many humans remain wedded to the idea that our origin was singular, exceptional and explosive. But it may be that it was multi-regional, prosaic and characterized by a long, slow-burning fuse and gradual flux. The origin of Homo sapiens in a gradual manner through recurrent mixing of geographically distinct populations should also make us consider the framing that we are humans qua humans, and the rest of Homo, from Neanderthals to Denisovans, occupy a humanoid gray zone. If our origins were weakly structured, and the whole of Africa was the playground for our lineage for millions of years, there was never a first human population. Homo was human from the beginning, and our Neanderthal and Denisovan cousins were human as well. There was simply becoming Homo sapiens, a long and gradual process, the evolution, not of the first humans, but the last.
From this vantage
It’s a shame to think that no one in the literate public witnessing Charles Darwin’s bombshell of a new paradigm in the 1850’s was still alive in the 1950’s to behold the unveiling of Watson, Crick, Wilkins and Franklin’s work on DNA. It took nearly a century to characterize the exact molecular mechanism prophesied by Darwin’s audacious theory.
And yet our time is not Darwin’s. Our paradigm shifts are a bit more modest, our pace of both innovation and applying human ingenuity undeniably more breakneck. Darwin’s personal observations and data collection aboard the Beagle, like Alfred Russel Wallace’s shortly after on the ground in Southeast Asia, impelled them to wholly upend long-standing assumptions explaining life on earth. Both plainly saw something didn’t, and wouldn’t ever, square between their findings and existing theories. We are at an analogous point. For a time, Out-of-Africa adequately fit the scanty data and the means of analysis at our disposal.
But like Darwin reviewing his copious physical specimens once they reached Down House after the voyage or Wallace surveying all he had observed over his sojourn, those working in human population genetics today cannot ignore the prodigious jackpot of long-sought data that has deluged their field in the space of just two short decades.
Like a 6,000-year-old planet, Out-of-Africa is undeniably in our rearview mirrors; a piece that captures the origin of non-Africans, not the whole puzzle of our species’ emergence. Physical specimens of human remains that have improbably survived for eons under the African continent’s unforgiving conditions will likely be our jackpot to rival Franklin’s pioneering images of DNA. But until that lucky day, the field will creep inexorably closer to a theory for all time, its incremental refinements enabled by the incessant tinkering and inventiveness of insatiably curious humans at the field’s forefront today. And for my money, odds are pretty good that those theorists active in the field today, pushing doggedly towards a next theory of best fit, see a day when that Rosetta Stone of physical data out of Africa will be fitted into the edifice they are now constructing in anticipation.