Homo with a side of sapiens: the brainy silent partner we co-opted 300,000 years ago
Unsupervised Learning Journal Club #4

For 2025, I’m trying out a new occasional feature for paying subscribers, the Unsupervised Learning Journal Club: I’ll offer a brisk review and consideration of an interesting paper in human population genomics.
In the spirit of a conventional journal club, at the end of each post, interested subscribers can vote on next papers to review. I’m open both to covering the latest papers/preprints and reflecting back on seminal publications from across these first decades of the genomic era.
If your lab has work we might like or you otherwise want to suggest a paper for me to cover, feel free to respond to this email or comment on this post.
The first three editions are here:
Wealth, war and worse: plague’s ubiquity across millennia of human conquest
Where Queens Ruled: ancient DNA confirms legendary Matrilineal Celts were no exception
Eternally Illyrian: How Albanians resisted Rome and outlasted a Slavic onslaught
Free subscribers can get a sense of the format from my ungated coverage of two favorite 2024 papers:
The other man: Neanderthal findings test our power of imagination
We were selected: tracing what humans were made for
Unsupervised Learning Journal Club #4
Today’s paper, the overwhelming preference of last edition’s survey voters, is A structured coalescent model reveals deep ancestral structure shared by all modern humans. It comes out of Aylwyn Scally and Richard Durbin’s groups at Cambridge, and appeared in Nature on 18 March 2025. The first author is Trevor Cousins.
From trees to tangles
One of the most powerful metaphors for understanding evolutionary processes is the phylogenetic tree: a representation of the branching patterns by which life has diverged and diversified into many disparate lineages from a single common ancestor four billion years ago. The idea has medieval origins, and was at the heart of Charles Darwin's argument in On the Origin of Species. Descent with modification along its many branches was how the tree unfurled across evolutionary time.
But a tree metaphor always had limitations, not least because pedigrees have a certain propensity to turn in on themselves; wayward branches fuse together to spawn something new. In Darwin’s age, hybridization across species in the animal world was seen as a relatively rare occurrence, an aberration most famously illustrated with sterile mules, evolutionary dead ends. Yet even Darwin and his contemporaries, avid gardeners, knew that in plants hybridization was not exceptional at all; they were well aware that botany’s phylogenetic tree was more of a tangled bush.
Today, modern genomics and ancient DNA are exposing the reality that even in mammals, tree structures are crisscrossed with connections, a wild lattice of hybridizations. Recently, paleogenomics has raised the possibility that the dire wolf, which George R. R. Martin’s Game of Thrones made famous, might have been a mixture of two canid lineages as long separated as humans have been from chimpanzees: some six million years. And indeed, that separation between chimpanzee and human lineages itself may not have been as simple as a single divergence date would indicate. Before proto-humans and proto-chimpanzees finally parted ways for all time, they underwent an intense bout of hybridization.
And what was true at the root of the entire bipedal lineage now appears indisputably true later on in human evolution, as we trace the tree structure out along its network of branches. In 2010, paleogeneticists discovered that modern humans still carry both Neanderthal and Denisovan heritage, the latter an ancient East Asian hominin discovered purely through a DNA sequence. Other researchers, using more indirect means leveraging mass volumes of human genomes and the new power to test extremely complex models in a computational framework, have now inferred similar events of archaic admixture into the proto-modern lineage within Africa. What we had sketched as a gracefully branching human family tree turns out to be a messy tangle, with branches twisting together to form new lineages, before again unbundling outward via demographic expansions and replacements that radiated populations out across the entire Old World.
And yet the proportion of Neanderthal DNA in modern populations is about 2% at most, and Denisovan input rises marginally higher solely among New Guineans (otherwise appearing in only trace amounts across most of South, East and Southeast Asia). Regardless of race, most of humanity’s heritage goes back to one population resident in Africa 100,000 years ago. Outside of Africa, all extant populations’ lineages coalesce back to 60,000 years ago, to a small tribe of as few as 1,000 who survived a sharp and prolonged bottleneck before mixing with some Neanderthals and sweeping outward across Eurasia, and into Oceania. But what if these ancestral Africans, from which the vast preponderance of our ancestry comes, were themselves the product of a recent admixture event, and one that left a much more appreciable legacy than the later marginal genetic absorption of mere traces of archaic Neanderthal and Denisovan lineages? Instead of an assimilation, imagine an amalgamation. Perhaps the first modern humans were freaks, a race of hybrid monsters born of what we would today perceive to be abominable unions?
That is what today’s Nature paper, A structured coalescent model reveals deep ancestral structure shared by all modern humans, with first author Trevor Cousins, argues. The paper concludes that our modern human lineage arose out of an admixture event dating to some 300,000 years ago between two hominin species who had split 1.5 million years ago. One lineage, “population A,” contributed about 80% of our species’ ancestry. It emerged out of the same lineage that begat Neanderthals and Denisovans. The other, “population B,” contributed the remaining 20% of our ancestry. It was only after the mixing between these two human species, evolutionarily as distant as wolves are from coyotes, that the proto-modern human lineage arose, diversified, and went on to conquer Africa, and then the world. So how did population B alter us? The evidence suggests that population B contributed a major software upgrade to be run on hardware mostly from population A, B’s brains hitched to A’s brawn.

Coalescing back into the past
Perhaps surprisingly at first blush, Cousins et al. arrived at this audacious conclusion without leaning on ancient DNA. Upon reflection, this stands to reason: on these time scales, what passes for ancient DNA to us is scarcely any closer to 300,000 years ago than we ourselves are. The oldest African DNA we’ve found dates to about 15,000 years ago, so paleogenomics sheds very little light on our species' deeper African history. To get around this, Cousins et al. focused on deeper, more subtle analyses of the vast troves of whole genome sequences from contemporary populations now available to researchers. If 2025’s mountain of data is their gold mine, then a new method, cobraa, is the game-changing extractive device. The researchers extended a method devised in 2011, PSMC (Pairwise Sequentially Markovian Coalescent). Over the last decade and a half, PSMC has equipped researchers to take whole-genome sequence data and reconstruct an individual’s demographic past and, more generally, the population sizes and dynamics of their ancestral lineage as reflected in their pedigree fanning out into the past.
The 21st-century computational heavy lifting of PSMC, and now cobraa, rests on the foundation of a spare, elegant 20th-century population-genetic concept, the coalescent. Pioneered in the early 1980s, the coalescent emerged at a time when we finally had some DNA data, though still a paltry amount, plus some basic computational resources, but nothing truly powerful yet. In theory, with computers, population geneticists could simulate evolutionary dynamics like selection and drift, observing their impact on allele frequencies forward in time, running experiments in silico. But practically, the branching possibilities were too many for that era’s primitive computational power (the top-of-the-line IBM mainframe computers then had 16 or 32 MB of RAM; today 8 GB of RAM is a modest specification for a desktop computer, so 250-500 times more memory than a top-of-the-line mainframe 45 years ago). Faced with these limitations, some geneticists realized that reversing the arrow of time would make their analyses much more tractable; rather than project numerous evolutionary possibilities forward, they could start with the genetic endpoints, and converge back along the pedigree to a common ancestor. In a coalescent framework, the genealogy has fewer and fewer nodes every generation back, meaning that the analyses become less resource-intensive as they proceed.
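To make the backward-in-time logic concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not anything from PSMC or cobraa: a constant-size, randomly mating Wright-Fisher population, with the function name and the sample and population sizes chosen by me purely for illustration. It follows only the sampled lineages back through the generations and records how many remain.

```python
import random

def lineage_counts_backward(n_samples, n_gene_copies, seed=0):
    """Track how many ancestral lineages remain each generation back in time.

    Toy Wright-Fisher assumption: every surviving lineage picks a parent gene
    copy uniformly at random, and lineages that pick the same parent coalesce.
    """
    rng = random.Random(seed)
    k = n_samples
    counts = [k]
    while k > 1:
        # The set collapses lineages that happened to choose the same parent.
        parents = {rng.randrange(n_gene_copies) for _ in range(k)}
        k = len(parents)
        counts.append(k)
    return counts

# Hypothetical sizes, chosen only to keep the run fast.
counts = lineage_counts_backward(n_samples=50, n_gene_copies=2000)
step = max(1, len(counts) // 10)
print(f"generations back to a single common ancestor: {len(counts) - 1}")
print("lineages remaining at evenly spaced points back in time:", counts[::step])
```

The count of lineages can only fall or hold steady as you step back, so the work per generation shrinks, whereas a forward simulation would have to carry all 2,000 gene copies through every generation whether or not they left descendants in the sample.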
The original coalescent focused on a single locus and its genealogy; this was the age of genetics, not genomics. In a population of effective size Ne (Ne being the number of individuals who pass genes to the next generation), the expected time for two distinct gene copies to coalesce is 2Ne generations. By looking at the time depths of the branches leading back to coalescence within a genealogy, you can estimate the Ne that would produce these patterns. The genetic variation is the data you load into the coalescent model, and ancestral population sizes are a parameter whose values you allow the model to tinker with in an effort to reproduce the data’s patterns. The genomic framework extends simple coalescent logic by looking at patterns across thousands of genes; instead of a single genealogy we now test thousands of coalescents. Rather than a single locus, as in the pre-genomic era, PSMC and cobraa scan the entire genome, inching along to examine sliding windows of 100 sequential DNA bases at a time, assessing whether a genomic region within a single individual is heterozygous (the two gene copies differ) or homozygous (the two copies are identical). By comparing an individual’s two complete copies of the human genome, cobraa can estimate patterns of intra-locus coalescence. Since the two gene copies are contributed by two parents, intra-locus coalescence within a single individual reflects distinct parental histories. A model, which includes population history and mutation rates as parameters, can estimate variables like effective population size by iteratively fitting the data better and better.
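The 2Ne expectation is easy to check by simulation. The sketch below is a toy of my own, not code from the paper: it assumes an idealized diploid Wright-Fisher population of constant size, so that two lineages land on the same parental gene copy with probability 1/(2Ne) in each generation back. The value of Ne and the number of replicates are arbitrary choices for illustration.

```python
import random
import statistics

def pairwise_coalescence_time(ne, rng):
    """Generations back until two gene copies share a parental copy.

    Assumes 2 * ne gene copies in the population, so the two lineages pick
    the same parent with probability 1 / (2 * ne) each generation.
    """
    p = 1.0 / (2 * ne)
    t = 1
    while rng.random() >= p:
        t += 1
    return t

rng = random.Random(42)
ne = 200  # hypothetical effective population size, kept small for speed
times = [pairwise_coalescence_time(ne, rng) for _ in range(5000)]
mean_time = statistics.mean(times)
print(f"Ne = {ne}: simulated mean {mean_time:.0f} generations; theory says 2*Ne = {2 * ne}")
```

Run the logic in reverse and you have the core of the inference: if the typical pairwise coalescence time in the data is T generations, the population that produced it had an effective size of roughly T/2.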
As an upgrade that goes beyond PSMC’s computational genomic model, one of cobraa’s major extensions comes by considering possible scenarios of structure within the ancestral population. While PSMC was limited to assuming a single population, thereby smoothing out complex histories into a rough average summary, cobraa is powerful enough to contemplate scenarios where different populations experience different histories, before eventually fusing into the single hybrid populations familiar to us. Cousins et al. validated cobraa by testing it against simulated histories (whose ground truth they knew beforehand, having designed the simulations). Then they applied the method to a small sample of human genomes representing numerous disparate populations. Cousins et al. looked at 26 human populations, sampling a single individual from each group in the 1000 Genomes Project, and so constructed 26 population histories (they replicated the finding by looking at similar populations in the Human Genome Diversity Project).
The figure that opens this piece illustrates their model of best fit. Back to about 300,000 years ago, their results align with earlier publications. But further back in time, novel inferences are generated, because a structured ancestral set of populations turns out to better fit the data than a single randomly mating horde, the best PSMC could hope to offer. Here, the fusion of two populations, at a 4:1 ratio, proves a better fit than a balanced and reciprocal admixture. Population B, at a 20% share, can be thought of as mixing into the background of population A, at 80%. But compared to Neanderthal contribution into out-of-Africa humans, this input is more substantial by an order of magnitude. Populations A and B had diverged about 1.2 million years prior, about twice the distance separating our early out-of-Africa population from Neanderthals and Denisovans. Population A, the majority ancestral contributor, is a sister lineage to Neanderthals and Denisovans.
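Why would a structured history leave a detectable signal at all? A rough sketch helps, with the loud caveat that this is my own toy and not cobraa's algorithm. It assumes a 30-year generation time (so the admixture sits about 10,000 generations back and the A/B split about 50,000), an effective size of 10,000 for every population at every time, and exponentially distributed waiting times; all of these numbers and names are illustrative assumptions. The key rule is that while A and B are separate, two lineages sitting in different populations cannot coalesce.

```python
import random

def pairwise_tmrca_structured(rng, gamma=0.2, t_admix=10_000, t_split=50_000,
                              ne_recent=10_000, ne_a=10_000, ne_b=10_000,
                              ne_ancestral=10_000):
    """Sample one pairwise coalescence time (generations) under pulse admixture."""
    # Phase 1: one merged population between the present and the admixture.
    t = rng.expovariate(1.0 / (2 * ne_recent))
    if t < t_admix:
        return t
    # At the pulse, each lineage independently traces back into B with
    # probability gamma, otherwise into A.
    in_b = [rng.random() < gamma for _ in range(2)]
    # Phase 2: only lineages in the same population can coalesce before the split.
    if in_b[0] == in_b[1]:
        ne = ne_b if in_b[0] else ne_a
        t = t_admix + rng.expovariate(1.0 / (2 * ne))
        if t < t_split:
            return t
    # Phase 3: both lineages are back in the shared ancestral population.
    return t_split + rng.expovariate(1.0 / (2 * ne_ancestral))

def fraction_older_than(threshold, n_pairs, seed, **kwargs):
    """Share of simulated pairs whose common ancestor predates the threshold."""
    rng = random.Random(seed)
    hits = sum(pairwise_tmrca_structured(rng, **kwargs) > threshold
               for _ in range(n_pairs))
    return hits / n_pairs

structured = fraction_older_than(50_000, 20_000, seed=1, gamma=0.2)
panmictic = fraction_older_than(50_000, 20_000, seed=1, gamma=0.0)
print(f"pairs coalescing before the split, 20% from B: {structured:.3f}")
print(f"pairs coalescing before the split, no structure: {panmictic:.3f}")
```

Setting gamma to zero collapses the toy back into a single randomly mating population. With 20% of ancestry routed through B, noticeably more pairs of gene copies are forced to wait until the two populations rejoin in the deep past, and that kind of excess of very old coalescences is what a structured model can accommodate and a single-population model cannot.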
Additionally, population A went through an extreme bottleneck at its founding; all the estimates of our lineage’s divergence from these archaic Eurasian hominins actually derive solely from our population-A heritage. The 20% of our ancestry from population B is far more divergent. Given this new insight, we may need to revisit our estimated dates of the splits between Neanderthals and modern humans (because we probably actually diverged significantly more recently than assumed). Also, because it is the minority input, population B’s ancestral contribution appears to have been subject to purifying selection; on average, its contributions are clearer and more copiously represented the further away the DNA segment lies from the 1.5% of the genome that codes for proteins. But there are telling exceptions. In some classes of genes, population B’s contributions are disproportionate, most importantly in those implicated in the brain.