"The confusion comes from how percentages are used in different places. What the 2-4% refers to is how much DNA from a Caucasian or Asian is 100% identical to Neanderthal DNA.
In other words, a typical European and a Neanderthal are 99.7% similar over 96-98% of their DNA and 100% similar to a Neanderthal over 2-4% of their DNA. A sub-Saharan African will be 99.7% the same as a Neanderthal over 100% of their DNA.
So the 2-4% of a non-African person’s genome that is of Neanderthal origin will still be 99.7% similar to the matching stretch of DNA in an African person’s genome. In terms of the whole genome, that is only 0.006% of actual difference!" - https://genetics.thetech.org/ask-a-geneticist/human-neanderthal-similarity-africans-europeans
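The arithmetic behind that last figure can be checked with a minimal sketch. It assumes, as the quoted passage does, that the only extra difference comes from the Neanderthal-derived stretch and that this stretch differs from the matching African sequence at 0.3% of positions (100% minus 99.7%). The function name and variables are illustrative and do not come from the article.

```python
def whole_genome_difference(neanderthal_fraction, per_base_difference):
    """Fraction of the whole genome that differs, assuming differences
    occur only inside the Neanderthal-derived stretch."""
    return neanderthal_fraction * per_base_difference

# 99.7% similarity means about 0.3% of positions differ (figure from the quote).
per_base_difference = 1 - 0.997

# The quote gives a 2-4% range for the Neanderthal-derived share.
for fraction in (0.02, 0.04):
    diff = whole_genome_difference(fraction, per_base_difference)
    print(f"{fraction:.0%} Neanderthal-derived -> {diff:.4%} of the genome differs")
```

Under these assumptions the 2% end of the range gives about 0.006%, which matches the quoted figure, and the 4% end gives about 0.012%.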
There are many brilliant, numerate people reading Razib who can correct this if it is wrong. I scrub toilets for a living, but I was curious about the numerically concrete answer to your question, so I asked the internet, and the above is the answer I received. So take that for what it is worth. There is more at the link about "DNA Similarity Between Species", i.e., humans and chimpanzees.
Thank you very much, Carleton. That article you cited clears it up nicely. I come away with two conclusions:
1. Geneticists have been using sloppy language when they throw these unqualified percentages into their writings targeted at laypersons.
2. More fundamentally, I have to assume that when geneticists talk among themselves they have far more precise and nuanced ways of expressing degrees of (dis)similarity. So the use of simple percentages must have been adopted expressly for popular science writing aimed at laypersons, and in that regard the choice was, in my humble opinion, a Fail. The percentages do not intuitively convey anything meaningful to a layperson, even if one were somehow to recognize which of the different ways of measuring was actually being reported. I am wondering if some other unit of measure would resonate more: the number of generations since the last common ancestor? The fraction of 1 represented by the last common ancestor (i.e., 1 over some power of 2)? Metrics like these would start to embed an intuitive feel for the scales of time we are talking about.
LK, the 99% number is confusing to a lot of geneticists and relies, I think, on old amino acid metrics. I will look up genome-wide data to confirm.
I'm not sure the additional percentages from that link really cleared things up or elucidated the differences for me. The genome is over 3 billion base pairs, so even a 0.006% difference is a massive number of base pairs. Also, I don't think laypeople, myself included, grasp the complexity present within the genome. We had the computational power and knowledge to send a man to the moon more than three decades before we were able to sequence the human genome.
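To put a number on "massive", here is a rough sketch that converts a percentage difference into a count of base pairs. It assumes a genome of exactly 3 billion base pairs (the comment says "over 3 billion", so this rounds down) and treats the percentage as a plain fraction of all positions. The constant and function names are made up for illustration.

```python
# Assumed round figure; the comment above says the genome is "over 3 billion" base pairs.
GENOME_SIZE_BP = 3_000_000_000

def differing_base_pairs(fraction_different, genome_size_bp=GENOME_SIZE_BP):
    """Approximate number of base pairs that differ, given a fraction of the genome."""
    return round(fraction_different * genome_size_bp)

# 0.006% expressed as a fraction is 0.006 / 100.
print(differing_base_pairs(0.006 / 100))
```

With these inputs the result is on the order of 180,000 base pairs, so a difference that looks negligible as a percentage is still a large absolute count.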