Photo by: liber
So I’ve already written a post about the challenges of phasing genotype data, but now I’m here to help you accomplish that task. Let’s go through a checklist of what will be needed:
- Your personal SNP information (from 23andMe, Navigenics, deCODEme, etc.)
- The SNP information for your parents (preferably through the same company/microarray platform)
If you want to use my specific method you also need:
- The most recent Java Runtime Environment.
Of course, you can implement your own version of this phasing protocol given basic familiarity with a programming language or (more tediously) with some macros and IF statements in Microsoft Excel.
How to Phase Your Genome: A Conceptual Overview
With information from both parents, it is possible to phase your genome (for the vast majority of SNP calls). We rely on the fact that in most situations, you can identify exactly what was inherited from your father and exactly what was inherited from your mother.
For example: If at a particular position, your genotype call is AT, your father’s genotype call is AA, and your mother’s genotype call is TT, then you know that the A must have come from your father, and the T must have come from your mother. Simple! We will refer to situations where phase can be determined as informative.
The chart to the right outlines exactly which situations are informative. The good news is that every situation is informative with the exception of one: when both parents and the child are all heterozygous for the same two alleles (for example, AT, AT, and AT). Here, we are unable to say for certain which allele was inherited from each parent.
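For readers who would rather see the rule as code, here is a minimal Python sketch of the idea. It is not the code inside PhaseME.jar; the function name `phase_site` and the two-letter genotype strings are assumptions made for illustration. It tries both ways of splitting the child's two alleles between the parents and reports a phase only when exactly one split is compatible.

```python
def phase_site(child, father, mother):
    """Return (paternal_allele, maternal_allele) for the child, or None.

    Genotypes are two-letter strings such as "AT" (an assumption of this
    sketch). None means the site is uninformative, or that no split of the
    child's alleles is compatible with both parents.
    """
    # Each ordered pair (p, m) is one way to assign the child's alleles:
    # p inherited from the father, m inherited from the mother.
    candidates = {(child[0], child[1]), (child[1], child[0])}
    # Keep only assignments where each parent actually carries that allele.
    valid = {(p, m) for (p, m) in candidates if p in father and m in mother}
    if len(valid) == 1:
        return next(iter(valid))
    return None


print(phase_site("AT", "AA", "TT"))  # informative: ('A', 'T')
print(phase_site("AT", "AT", "AT"))  # all heterozygous: None
```

A homozygous child collapses to a single candidate, so it is phased whenever both parents carry the allele. No-calls are not treated specially here; a genotype such as NN simply fails to match and yields None.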
A sample implementation of how to phase a child’s DNA is illustrated below:

Implementing this Phasing Strategy: I’m Here to Help
If you have all the files mentioned above and would like to phase your genome, then I am more than happy to provide you with a Java archive that will allow you to accomplish this task. Even more, I will provide detailed instructions as to how to use this archive (it’s really simple, I swear).
You can download the Java program here as a zip file. Once the download has finished, unzip the contents into the same folder. You should see Launcher.jar and PhaseME.jar. To launch the program, click Launcher.jar (I thought that was pretty obvious), and the GUI pictured below should appear:
[Image: screenshot of the program's GUI]
The input files need to contain four columns in this order: rsid, chromosome, position, genotype. The genotype data needs to simply be AA, AT, TT, etc. without any slashes or quotation marks.
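If you want to inspect your files before loading them, a short Python sketch like the one below can read the four-column layout. It assumes the columns are tab-separated and that there is exactly one header row; the function name `read_genotypes` is made up for this example, and the delimiter should be adjusted if your files differ.

```python
import csv


def read_genotypes(path):
    """Read (rsid, chromosome, position, genotype) rows from a text file.

    Assumes tab-separated columns and a single header row.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the single header row
        for fields in reader:
            if not fields:
                continue  # ignore blank lines
            if len(fields) != 4:
                raise ValueError(f"expected 4 columns, got {len(fields)}: {fields}")
            rsid, chrom, pos, genotype = (f.strip() for f in fields)
            rows.append((rsid, chrom, int(pos), genotype))
    return rows
```

Raising an error on a row with the wrong number of columns makes stray quotation marks or extra header lines show up early rather than as missing output later.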
Before running the program, you do need to make sure that all three files (both parents and the child) have the same number of rows and list the same SNPs in the same order. I have not incorporated any checks into the program for this. My recommendation is to use an IF statement in Microsoft Excel (version 2007) to make sure that all three files line up. Also, make sure that each file has only one header row.
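As an alternative to the Excel check, the same comparison can be sketched in a few lines of Python. This is only an illustration: the name `first_mismatch` is hypothetical, and each input is assumed to be a list of (rsid, chromosome, position, genotype) tuples already loaded from the father, mother, and child files.

```python
def first_mismatch(father, mother, child):
    """Return the 1-based row where the three files stop lining up, or None.

    Each argument is a list of (rsid, chromosome, position, genotype) tuples.
    Only rsid, chromosome, and position are compared; genotypes may differ.
    """
    for i, (f, m, c) in enumerate(zip(father, mother, child), start=1):
        if not (f[:3] == m[:3] == c[:3]):
            return i
    # All shared rows agree, so any remaining problem is a length difference.
    if not (len(father) == len(mother) == len(child)):
        return min(len(father), len(mother), len(child)) + 1
    return None
```

A result of None means the three files agree row by row; any other value is the first data row (not counting the header) worth looking at.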
Finally, select the files to be compared (father, mother, child), and select an output location and choose a name for the outputs. There will be 23 outputs with the following filenames: <yourchosenname>.chr<chromosome>.phased.txt. Each chromosome has its own output that shows you the haplotype inherited from the father and the haplotype inherited from the mother. This program just ignores Y and MT data. However, the program does have the ability to recognize whether the child is male or female, and it assigns the X chromosome haplotypes accordingly.
Let me know about any problems with the program (e.g., if it does not produce any output), and I will check whether (1) your input files or (2) the program is the problem.




1. Thanks.
2. 22 files were created; there are no results for the X chromosome.
3. How do we know when the processing is finished?
Hi Leon,
I’ve checked the program, and it turns out that for 23andMe data (which I assume you used) the X chromosome does not process properly, since they report only one letter for male X chromosome calls. The program expects two letters, even for males, whose X chromosome calls would then all look homozygous.
I have not put in a bar to monitor the progress of the program, but when the Process button becomes depressed it is finished.
I’ll work on a version that handles the 23andMe X chromosome, but if you would like to try on your own, edit your 23andMe data so that for males, when only one allele is reported for the X chromosome (e.g., A), it is written as an apparent homozygote (AA).
Thanks,
Alex
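The manual edit Alex describes could also be scripted. The Python sketch below is one way to do it, under the assumptions that rows are (rsid, chromosome, position, genotype) tuples and that the X chromosome is labeled "X"; the function name `double_male_x` is hypothetical.

```python
def double_male_x(rows):
    """Rewrite single-letter X calls (e.g. "A") as two letters ("AA").

    Rows on other chromosomes, and X calls that already have two letters,
    are passed through unchanged.
    """
    fixed = []
    for rsid, chrom, pos, genotype in rows:
        if chrom == "X" and len(genotype) == 1:
            genotype = genotype * 2  # "A" -> "AA"
        fixed.append((rsid, chrom, pos, genotype))
    return fixed
```

This should only be applied to data from a male, since a single letter on the X is what marks the hemizygous call in the first place.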
Thanks very much for putting this together, Alex. I have written a little BASIC program to work with my family’s data, and the results are basically (ha!) similar except for a couple of variations:
1) I flag “Mendelian inconsistencies,” which could indicate genotyping error, or more interestingly, microdeletions when several inconsistencies are clustered close together.
Father CC
Mother CC
Child CT
2) I make no attempt to phase the genotypes when one of the triad has a no-call, and I’m sure I could derive some haplotypes if I added some rules. I see you handle some no-calls well, but there are exceptions. For instance, I saw one where G was listed for the paternal haplotype:
Father AG
Mother –
Child AA
3) I go one step further and derive the maternal and paternal alleles that are *not* passed on to the child. These aren’t truly haplotypes, but they do cover long stretches of a chromosome. I found this useful for my purposes, where my son did not inherit an autosomal dominant trait from me, and I could compare my other alleles with cousins who did inherit the familial hearing impairment.
4) I didn’t work with the X chromosome data myself, but I don’t seem to get any output for the X, even though I doubled the base call for the males.
I look forward to exploring all of your current and future tools!
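Ann's first point, flagging Mendelian inconsistencies, can be sketched in Python as follows. This is neither her BASIC program nor part of PhaseME; the function name and the decision to leave no-calls unflagged are assumptions of this example.

```python
BASES = set("ACGT")


def is_mendelian_inconsistent(child, father, mother):
    """True if the child's genotype cannot be explained by the parents'.

    Genotypes are two-letter strings. Anything else (a no-call, a
    single letter) is not judged and returns False.
    """
    calls = (child, father, mother)
    if any(len(g) != 2 or not set(g) <= BASES for g in calls):
        return False  # no-call or unexpected format: leave unflagged
    # Try both ways of assigning the child's alleles to father and mother.
    splits = [(child[0], child[1]), (child[1], child[0])]
    return not any(p in father and m in mother for p, m in splits)


print(is_mendelian_inconsistent("CT", "CC", "CC"))  # True
```

Collecting the positions of flagged SNPs would be the natural next step for spotting the clustered inconsistencies she mentions.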
Thanks for your comment, Ann. Your BASIC program sounds very interesting, and you’ve spotted some good holes for me to plug in my app.
I realize that for null calls, I was assuming that the data would be NN in my programming since the data I used had that. However, in my next version I’ll be sure to handle – as well.
Identification of hemizygous microdeletions was actually one of the next entries that I’m working on. For that, I’m putting together a program that provides some visual output along with an explanation of methods. Maybe that might be another interesting tool for personal genome/family genome analysis?
Thanks for reading!