Wonderful counterpoint to the literary & symbolic aspects of genealogy e.g. here:
https://www.plough.com/en/topics/justice/culture-of-life/yearning-for-roots
https://www.plough.com/en/topics/life/parenting/the-name-of-my-forty-sixth-great-grandfather
My father was massively into genealogy in his retirement. He travelled all over New England poring over documents. Went to places like Chicago that had large centers for research. Volunteered at the local Mormon church to access their database. He basically became an expert in early American history by doing all the work. And he said it was more rewarding to solve these puzzles than an academic career in philosophy debating idealism versus realism. I often wonder what he would think about this new aspect of his little hobby, which ended up with him authoring a 1,500-page book of family history.
As a Punjabi South Asian, I got 80% South Asian, 10% West Asian, 4% Central Asian, 5% Finnish, and around 1% Irish/Scottish/Welsh. I took the test through MyHeritage. Do you think there would be any value in uploading it to another site? Could you also speak to the potential accuracy of my current results?
what's your caste? that doesn't look crazy though the finnish is a surprise.
The Finnish certainly caught me off guard. My guess is that someone immigrated from Finland to the UK and one of their descendants ended up in the Royal Army. Or would 5% be too little for something in the past 200-300 years?
https://en.wikipedia.org/wiki/Ahluwalia_%28caste%29?wprov=sfla1
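A rough way to sanity-check the 5% question: on average, one ancestor n generations back contributes 1/2^n of your autosomal DNA. The sketch below assumes a generation length of 25 to 30 years (an assumption for illustration, not a figure from this thread) and ignores the randomness of recombination, which makes real shares scatter around the average.

```python
def expected_share(generations_back: int) -> float:
    """Average autosomal share inherited from ONE ancestor n generations back."""
    return 0.5 ** generations_back


def years_back(generations_back: int, years_per_generation=(25, 30)):
    """Rough calendar range for n generations (generation length is an assumption)."""
    low, high = years_per_generation
    return generations_back * low, generations_back * high


for n in range(1, 13):
    low, high = years_back(n)
    print(f"{n:2d} generations back: {expected_share(n):7.3%} on average, "
          f"roughly {low}-{high} years ago")
```

Under those assumptions, a single ancestor contributing about 5% would sit four to five generations back (6.25% and 3.125% on average), so on the order of 100 to 150 years. A single ancestor 200 to 300 years back would on average contribute well under 1%, so a 5% signal from that far back would need several such ancestors or a different explanation, such as how the reference populations were defined.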
Try 23andMe. MyHeritage DNA always gives some random European results that make no sense at all. I'm surprised Indians haven't caught on to this. A Punjabi Khatri should probably not get numbers like the ones you got.
punjabis get this because they are steppe enriched vis-a-vis other indians. it's not "wrong," it's just that 23andme "trains" its populations in very precise and demarcated ways
I feel like when people do these, they want to see those results that are "trained" and not results that don't make sense with regards to what they know their history to be.
sort of. different things for diff cultures too
MyHeritage gives my Irish father 7.5% Finnish ancestry.
I suspect it’s Norwegian ancestry though, as he has about 160 Norwegian relatives, with about 25 in Tromso in the far North - an area which was Finnish during the Viking era.
sounds like they included a lot of norwegians in the finn sample?
Aren't the Saami (native people who live in the north of Scandinavia) genetically related to the Finns? Pretty sure that the Saami languages are related to Finnish. So I would expect that anyone who has relatives in northern Scandinavia could have some Saami blood, and thus be somehow genetically related to the Finns.
Don't all of these services keep copies of your genetic data? I'm not comfortable with that.
you can supposedly delete your data from their platform
This is the reason I won’t use any of these services despite wanting to know my genetics and ethnic origin. They own or will keep the data and it will be used in terrible or immoral ways in the future.
My WGS was done by BGI in an IQ study a decade ago and I have the whole file. Unfortunately, I have no idea how to read it. I could search for an individual entry with the standard search feature, but unless I knew what it meant it would be of no value. I'm certainly keeping it, just sitting there on my computer, and I hope my children and grandchildren will get some use from the data.
you have raw reads and bam i assume?
I have text pages of lines like this: rs200905204 1 232961 TT
Is that a raw read?
That's not raw reads, that's a genotype call, probably from an array. To break it down:
dbSNP record: rs200905204 (using the GRCh37 genome build); chromosome: 1; position: 232961; reference base: T; your genotype: TT (i.e., both of your copies are the same as the reference)
So, it won't have any rare variants, only what is on the microarray.
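For anyone with a similar file, here is a minimal sketch of reading one such line in Python. It assumes the four whitespace-separated fields are rsID, chromosome, position and genotype, which matches the line quoted above but is not confirmed for the whole file. The names `GenotypeCall`, `parse_call` and `count_non_reference` are made up for this example, and the reference base still has to come from an outside source such as dbSNP.

```python
from typing import NamedTuple


class GenotypeCall(NamedTuple):
    rsid: str        # dbSNP identifier, e.g. "rs200905204"
    chromosome: str  # kept as text so "X", "Y" and "MT" also work
    position: int    # coordinate on the genome build the file was made with
    genotype: str    # the two called alleles, e.g. "TT"


def parse_call(line: str) -> GenotypeCall:
    """Split one array-style line into its four assumed fields."""
    rsid, chromosome, position, genotype = line.split()
    return GenotypeCall(rsid, chromosome, int(position), genotype)


def count_non_reference(call: GenotypeCall, reference_base: str) -> int:
    """Number of alleles (0, 1 or 2) that differ from the reference base."""
    return sum(1 for allele in call.genotype if allele != reference_base)


call = parse_call("rs200905204 1 232961 TT")
print(call)
# The reference base T is taken from the comment above, not looked up here.
print(count_non_reference(call, reference_base="T"))
```

A count of 0 means both copies match the reference, 1 means heterozygous, and 2 means homozygous for a non-reference allele.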
yeah you are right, if it was ten years ago it was an array
(though today he could get a vcf with rsid calls)
Do you have thoughts on genomic data privacy? Both whether it's important, and which services are good for it.
honestly there isn't much. now with people's relatives getting typed... and hospitals have traditionally been bad at security.
From my nerdy point of view, WGS looks like the coolest thing in the universe and I agree with you that it's a time of wonders when we can get our genome just like that... However, the privacy implications scare me shitless. It's not only that I'm opening my own source code to the world but also that of the people close to me, whether they consent or not.
Even though I think it's cool, these kinds of genetic tests have always looked suspect to me, in the same way that I don't have an Alexa because I don't want a 24/7 microphone in my home. Do you have a write-up, or know of one, about what the implications of having your genetic information open to the world are?
I'm really not that interested in the family origins results (ok, I'm very interested, but I pass because sating my curiosity is not worth it), but it's the medical stuff that could tip the balance. For an average person who does not have a family history of hardcore diseases, do you actually get actionable information from a WGS?
the key is 'average' person. most ppl are pretty boring. so you can probably pass if you have privacy concerns.
the main issue honestly re privacy is hospitals are bad about op sec. you should probably worry about being involved in the system generally *shrug*
You can usually only get medical tests in hospitals if a specialist thinks you need one (i.e., you or close family have a diagnosis of a genetic condition).
Hospitals order genetic tests but they are generally done in dedicated pathology labs.
The results (usually 0-5 classified variants) are put on patient record systems.
Anyone in the medical system can access this (in case you end up unconscious in an emergency). However, viewing the record is audited, and people do lose their jobs for doing that without cause.
I have heard of doctors who looked up Tinder dates for STDs, so I don't know how well that works in practice though.
I have not heard of any genetic blackmail from data leaks. I am sure it will happen though.
you can google it but there have been data breaches many times in hospitals. they pay a fine and stuff, but it's not a massive deterrent. they're not paying for the finest opsec specialists
The USA has GINA, so insurance companies cannot use your genetic information to charge you higher rates or deny you coverage.
In other countries you may be obliged to disclose any conditions found.
For actionable info on healthy people: possibly screening for recessive disorders. E.g., 1 in 25 Caucasians is a carrier of cystic fibrosis. If you are a carrier you can get your partner screened, and if you are both carriers, screen the fetus or do IVF.
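To put the 1-in-25 carrier figure in context, here is a small worked calculation. It assumes simple autosomal recessive inheritance, a partner drawn at random from the same population, and a perfectly sensitive test; these are simplifications, not claims about any particular screening service.

```python
from fractions import Fraction

carrier_rate = Fraction(1, 25)           # carrier frequency quoted above
affected_if_both_carry = Fraction(1, 4)  # recessive: each carrier parent passes the variant half the time

# Before anyone is tested: chance that both partners are carriers.
both_carriers = carrier_rate * carrier_rate

# Chance of an affected child for an untested couple.
risk_untested_couple = both_carriers * affected_if_both_carry

# Chance of an affected child once YOU are known to be a carrier
# but your partner has not been screened yet.
risk_if_you_carry = carrier_rate * affected_if_both_carry

print(f"Both partners carriers: {both_carriers}")                      # 1/625
print(f"Affected child, untested couple: {risk_untested_couple}")      # 1/2500
print(f"Affected child, you carry, partner untested: {risk_if_you_carry}")  # 1/100
print(f"Affected child, both confirmed carriers: {affected_if_both_carry}")  # 1/4
```

The jump from 1/2500 to 1/100 to 1/4 is why a positive carrier result for one partner is the point at which screening the other partner becomes worthwhile.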
they can discriminate against you for life insurance tho
I did LivingDNA because I was interested in regions within Britain (as well as Y and mt DNA), but their autosomal DNA interpretation in terms of percentages is not very accurate. I uploaded their data into GEDMatch and got much more reasonable results (c. 25% Italian instead of 58% Italian; I'm 50% Ashkenazi Jewish and 48% Anglo-Celtic Australian). What I was looking for was the 2% Sub-Saharan African ancestry which shows up across many analyses on GEDMatch but not on LivingDNA. I have now sent off a MyHeritage test as my cousin recommended it... Waiting for results...
livingDNA seems really well trained to britain and has returned very accurate results for NW europeans i know of. but i wouldn't be surprised if it's not so good with other populations
2% seems way too high for your background. in 23andMe and ancestry such a high % is usually a true positive as i said in the piece
Razib, my ancestry is largely from Eastern/South Eastern England, with the exception of two great-great-great-grandparents who were Jewish and Irish. My LivingDNA report shows me as 15.7% Northwestern Europe and 84.3% Great Britain and Ireland. The curious thing is that although 16.1% and 3.3% of my ancestry is assigned to the East Anglia and Lincolnshire groups respectively, the rest seems to be assigned to western Britain and Irish subgroups. Could you please speculate as to why this might be?
well there is some work on lots of gene flow between west and east. tho i'm kind of at a loss cuz the PoBI data set they use is pretty good
Thank you.
Does a WGS test like Nebula provide information about your Y haplogroup? To a similar level of detail as the Y-focused tests on FamilyTreeDNA? I think I want to do the deepest FamilyTreeDNA test some day, but I'm trying to understand where these tests might overlap.
i think 30x on nebula is 99% of the way there. Y's are tricky and i think the Big Y does have an advantage in that they've been doing it a while. but the difference imo is minor
I signed up for a free Nebula account just to upload my old Ancestry data and see what their website was like. I came across the "Deep Ancestry" page, which indicates some (new?) partnership with FamilyTreeDNA. So, it's possible that you get the benefits of the Big Y with the Nebula 30x.
https://portal.nebula.org/reporting/deep-ancestry
"We have partnered with FamilyTreeDNA to give our users access to the most comprehensive ancestry reporting based on our 30x Whole Genome Sequencing data. To access the ancestry reports click the button below to transfer your 30x Whole Genome Sequencing data to FamilyTreeDNA.
The FTDNA ancestry reports are available with 30x Whole Genome Sequencing only."
Thanks Razib!
Thanks, Razib, for pointing me earlier this week on Clubhouse to Nebula and their Black Friday deal. My kit is waiting back home! Snapped up your substack’s Cyber Monday deal too - glad to be a subscriber again! Happy Thanksgiving!
This post came one day too late! Last night I finally succumbed to curiosity and ordered 23andMe Health & Ancestry after being worried about privacy implications for years. I'll likely just order a WGS as well.
One question I've had that wasn't quite addressed here: how useful is DNA testing for answering the health questions of those from lower sampled ethnicities? That question is very broad but I'm curious what your general thoughts are.
For context, I'm Kikuyu. My understanding is that polygenic risk score accuracy is worst for African populations. Should I expect the 23andMe "Health Predisposition Reports" to be bunk? How much will I learn from Promethease? I understand it's highly dependent on what health metric I'm looking at and proprietary details unknown to you; but again, I'm just interested in your general thoughts.
yeah it's not great for africans for various reasons. but it will get better and better as more africans come online in the training sets
I did a 23andMe test and it gave me Mongolian and Native American ancestry at around 2%. The Native American ancestry makes sense since my family has been in Canada for 400 years, but I assumed the Mongolian ancestry was simply a proxy for Native American. However, they updated it recently and the Native American has been changed to Anatolian. I really cannot connect the dots between those two though...
mongolian is still there?
0.9% Mongolian, and it used to be 0.7% or 0.9% Native American, but now it's 1.8% Anatolian instead of Native American.
i think it's picking up mixed native+european segments as turkic?
I’ve been getting Mongolian or Japanese on my father’s kit for years, and my mother’s kit used to get Native American. Both are Irish.
I suspect it’s ancestry from the Huns who settled in Scandinavia.
I've seen papers about sequencing of fetal DNA from the mother's blood sample, but am not aware of companies that actually offer it. (Maybe I'm just googling badly) Are there any that seem legit?
Ps, bring back the hot sauce reviews!
Hi Razib, huge fan. Remember reading your gnxp blog regularly back in the day.
I am Straits Chinese on both parents' sides (both identify as "Hakka"). My ancestors arrived from the 1800s onwards until around 1900. My Ancestry results are:
58% Southern Chinese, 14% Vietnam, 13% Central and Eastern China, 12% Dai, 3% southern Philippines.
The southern Philippines makes sense, I think, as my parents' hometown is relatively close to the south of the Philippines, but I had anticipated a lot more Southeast Asian. Any suggestions for what other provider to look at for my specific situation as a Straits Chinese?
This question is probably a week too late to get any response, but Razib threw in a small aside that I've not heard before, and I pay keen attention to all his content regarding the history of our lineage in Africa over the last 200k years. Namely, this article mentions in passing that the lineage of those who left Africa ~60kya actually diverged from the lineage of sub-Saharan Africans some time before that -- ~100kya. Did I miss a podcast or article where this was explored? Inquiring Minds gotta know!!
i haven't talked about it in detail. but it does look like the model is a little more complex than a separation 60k bp. like there was a period of separation, or intermediate groups went extinct, or there was population fusion btwn back to african and deep africa
Would you say the ancestry report from Nebula is comparable to those of Ancestry/23andMe?
no
What do you think is the best way to get a good ancestry report if you have WGS data? I did sequencing.com and bought the Eone Ancestry report.
PS.: Eone Ancestry was not worth the $10 (small reference samples). I also strongly recommend against buying from sequencing.com, had a really bad experience with them (took >8 months, lost one sample, low quality for another because of the delays and half the site doesn't work after their big update).
if you have VCF you can convert to array text files. then upload to stuff like mytrueancestry?
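A minimal sketch of that VCF-to-array-text conversion, using only the Python standard library. It assumes a single-sample, uncompressed VCF and writes tab-separated rsID, chromosome, position and genotype, which is the general shape of the array text files mentioned in this thread; the exact layout each upload site expects is not specified here and should be checked. The function name `vcf_line_to_array_row` is made up for this example.

```python
import re


def vcf_line_to_array_row(line: str):
    """Convert one single-sample VCF record to an array-style row, or return None to skip it."""
    if line.startswith("#"):
        return None  # header and metadata lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns
    chrom, pos, rsid, ref, alt, _qual, _filter, _info, fmt, sample = fields[:10]
    if not rsid.startswith("rs"):
        return None  # array-style files are keyed by rsID
    alleles = [ref] if alt == "." else [ref] + alt.split(",")
    if any(len(a) != 1 or a not in "ACGT" for a in alleles):
        return None  # skip indels and symbolic alleles
    keys = fmt.split(":")
    if "GT" not in keys:
        return None
    gt = sample.split(":")[keys.index("GT")]
    indices = re.split(r"[/|]", gt)  # unphased "0/1" or phased "0|1"
    if any(not i.isdigit() or int(i) >= len(alleles) for i in indices):
        return None  # missing call such as "./."
    genotype = "".join(alleles[int(i)] for i in indices)
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # "chr1" -> "1"
    return "\t".join([rsid, chrom, pos, genotype])


def convert(vcf_path: str, out_path: str) -> int:
    """Write all convertible records from vcf_path to out_path and return how many were written."""
    written = 0
    with open(vcf_path) as src, open(out_path, "w") as dst:
        for line in src:
            row = vcf_line_to_array_row(line)
            if row is not None:
                dst.write(row + "\n")
                written += 1
    return written
```

One caveat: a typical VCF lists only the sites where you differ from the reference, so positions where you match the reference will be missing from the output unless the VCF was generated with all sites included. Upload tools that expect a full array panel may treat those as no-calls.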
I thought I'd share as a Canadian of donor-conceived origins which sites have been most helpful for me for finding my siblings (27 and counting so far... likely around 150 total out there).
Ancestry: 5 (including 1 overlap with 23andMe)
23andMe: 5 (including 1 overlap with Ancestry)
FamilyTree (upload of my Ancestry kit): 1
MyHeritageDNA (upload of my Ancestry kit): 1
That uploading option is one I would flag as missing from your piece. It's free/cheap to get a lot of info by uploading your 23andMe or Ancestry kits to FamilyTree or MyHeritageDNA, but there is no option to upload from other companies into either Ancestry or 23andMe.
I will say that MyHeritage has been a bit better for "old country" (ie Europe) matches.
Re: looking at your own genetic ancestry directly, what tools are you using for that if you don't mind me asking? R and qpAdm?
The reason I ask is I get pretty widely divergent results for my grandfather (and to a lesser extent for myself) and I'd like to suss out the truth of the matter. My grandfather comes back as anywhere between 0% and 10% non-European. Davidski suggested a minority mix of Roma, Tatar and New World Hispanic ancestry alongside a large dose of Central European. Roma and Tatar make sense geographically, but I would love to puzzle out how someone born in 1920s Czechoslovakia gets some Hispanic ancestry.