Microarray genotyping platforms report high accuracy. Of course, those figures assume that you follow their protocols under ideal conditions, and depending on the genotyping facility, real-world accuracy may be lower. I recently set out to get some good estimates of the genotyping error rate for the Illumina assay employed by 23andMe.
First, let me refer back to a previous post about determining haplotypes when you have two parents and a child. By determining haplotypes for all of my siblings and me, I am able to find regions where we share both parental haplotypes, that is, where we share diplotypes. In my family, there are four siblings (including myself), and I have had us all tested at 23andMe. To analyze genotyping errors, I decided to compare informative SNPs from shared haplotypes between my three brothers and me. The analysis had three steps:
- Determine the transmitted parental haplotypes for all four siblings.
- Determine where both haplotypes are shared by all four siblings.
- Count how many genotyping errors occurred in these shared regions.
Determining the parental haplotypes is simple, and my method for doing this has already been described. Determining where the four siblings share alleles was accomplished using a program I wrote called NucleOlap. This program compares informative SNPs from paternal and maternal haplotypes between children and produces a nice output. It is designed to recognize candidate regions responsible for dominant or recessive genetic mutations given each child’s affected or unaffected status. However, if all children are affected, it is the same thing as analyzing haplotype sharing. Check out the documentation.
For every pair of siblings, it is expected that, on average, they will share both parental haplotypes for 25% of their genome. Each additional sibling has the same 1-in-4 chance of matching at any given position, so adding a third sibling drops that 25% to 6.25%, and for four siblings both haplotypes are expected to be shared for only 1.56% of the genome. According to build 36.1 of the human genome, the haploid length of the autosomes is 2,864,255,922 base pairs! I can therefore expect that my three brothers and I share both parental haplotypes for about 44,753,999 haploid base pairs.
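The arithmetic above can be reproduced with a short Python sketch. It assumes that each sibling inherits independently and that sharing is measured on the autosomes only; the function name `expected_shared_fraction` is my own and is not part of NucleOlap.

```python
# Haploid autosomal length quoted in the text (build 36.1).
AUTOSOME_LENGTH_BP = 2_864_255_922


def expected_shared_fraction(n_siblings: int) -> float:
    """Expected fraction of the genome where all siblings share both parental haplotypes.

    The first sibling sets the reference. Each additional sibling matches the
    paternal haplotype with probability 1/2 and the maternal haplotype with
    probability 1/2, so the fraction is multiplied by 1/4 per extra sibling.
    """
    if n_siblings < 1:
        raise ValueError("need at least one sibling")
    return 0.25 ** (n_siblings - 1)


for n in (2, 3, 4):
    fraction = expected_shared_fraction(n)
    expected_bp = round(fraction * AUTOSOME_LENGTH_BP)
    print(f"{n} siblings: {fraction:.4%} of the genome, about {expected_bp:,} bp")
```

For four siblings this gives 1.5625% of the autosomes, which rounds to the 44,753,999 base pairs quoted above.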
The NucleOlap analysis found that my siblings and I all shared both haplotypes in six regions for a total of 47,656,130 haploid base pairs. Pretty good! The ideogram to the left shows the regions where my three brothers and I share the exact same genes from both parents. Shared regions occur on chromosomes 1, 2, 6, 13, 16, and 18. NucleOlap also provided me with the starting and ending positions (and SNPs) for each region. To determine where genotyping errors occurred, I compared the raw data in these regions for each child (the program output is not affected by genotyping errors because it is able to recognize and ignore them).
I carried out the analysis by gathering the SNPs in the identified shared regions and lining them up side by side in Microsoft Excel. I then checked that all four siblings had the same genotype for each SNP (as they are expected to). A sample of how this worked is shown in the picture to the right.
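I did this comparison by hand in Excel, but the same check could be sketched in Python. This is a minimal illustration, not the procedure I actually ran. It assumes no-calls are written as `--`, and it treats `AG` and `GA` as the same genotype. The SNP names and genotypes in the example are made up.

```python
from collections import Counter

NO_CALL = "--"  # assumed marker for a null call in the raw data


def classify_snp(genotypes):
    """Classify one SNP across siblings who share both parental haplotypes.

    Returns "concordant" if every sibling has the same call, "null_call" if
    the only disagreement is a missing call, and "discordant" if two siblings
    have different called genotypes.
    """
    # Sort the two alleles so that "AG" and "GA" compare as equal.
    calls = ["".join(sorted(g)) for g in genotypes if g != NO_CALL]
    has_null = len(calls) < len(genotypes)
    if len(set(calls)) > 1:
        return "discordant"
    if has_null:
        return "null_call"
    return "concordant"


# Hypothetical rows: one genotype per sibling for each SNP in a shared region.
example_snps = {
    "snp_example_1": ["AG", "AG", "AG", "AG"],
    "snp_example_2": ["CC", "--", "CC", "CC"],
    "snp_example_3": ["TT", "CT", "TT", "TT"],
}

summary = Counter(classify_snp(g) for g in example_snps.values())
print(dict(summary))
```

A SNP that has both a null call and a conflicting call is counted as discordant here, and a SNP where every sibling has a null call is counted as a null-call error. Both are choices made for this sketch.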
My analysis revealed that 10,079 SNPs were contained within the regions where my brothers and I share diplotypes. Of these 10,079 SNPs, only 86 had any genotyping errors! This means that the genotype calling was 99.15% accurate for these regions. Moreover, of the errors recorded, 79 occurred when there was a genotype call for some of the siblings and a null call (–) for others. Only 7 errors occurred where the genotypes assigned to the siblings were inconsistent. The results are summarized in the table to the left.
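The percentages follow directly from the counts above. Here is the arithmetic as a short Python sketch, using only the numbers reported in this post (the variable names are mine):

```python
total_snps = 10_079       # SNPs inside the shared regions
null_call_errors = 79     # a call for some siblings, a null call for others
discordant_errors = 7     # conflicting genotype calls between siblings

total_errors = null_call_errors + discordant_errors
error_rate = total_errors / total_snps
discordant_rate = discordant_errors / total_snps

print(f"SNPs with any error: {total_errors}")
print(f"Error rate, null calls included: {error_rate:.2%}")
print(f"Accuracy, null calls included:   {1 - error_rate:.2%}")
print(f"Discordant-call rate only:       {discordant_rate:.3%}")
```

Counting null calls as errors gives an error rate of about 0.85% (99.15% accuracy). Counting only the 7 conflicting calls gives about 0.07%.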
My conclusion: the genotyping error rate is very low for the Illumina platform used by 23andMe. Even when null calls are counted as errors, the rate is still below 1% (86 of 10,079 SNPs). My siblings and I shared 99.15% genotype identity in regions where we all share both parental haplotypes. I am very pleased with the accuracy.