Photo by: liber
So I’ve already written a post about the challenges of phasing genotype data, but now I’m here to help you accomplish that task. Let’s go through a checklist of what will be needed:
- Your personal SNP information (from 23andMe, Navigenics, deCODEme, etc.)
- The SNP information for your parents (preferably through the same company/microarray platform)
If you want to use my specific method you also need:
- The most recent Java Runtime Environment.
Of course, you can implement your own version of this phasing protocol with basic familiarity with a programming language or (more tediously) with some macros and IF statements in Microsoft Excel.
How to Phase Your Genome: A Conceptual Overview
With information from both parents, it is possible to phase your genome (for the vast majority of SNP calls). We rely on the fact that in most situations, you can identify exactly which allele was inherited from your father and which was inherited from your mother.
For example: If at a particular position, your genotype call is AT, your father’s genotype call is AA, and your mother’s genotype call is TT, then you know that the A must have come from your father, and the T must have come from your mother. Simple! We will refer to situations where phase can be determined as informative.
The chart to the right outlines exactly which situations are informative. The good news is that every situation is informative with the exception of one: when both parents and the child are heterozygous. Here, we are unable to say for certain what allele was inherited from each parent.
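For readers who would rather write their own version, here is a minimal sketch of the per-SNP rule described above. It is not the code inside PhaseME.jar; the class name TrioPhaser, the method phaseSite, and the convention of writing the paternal allele first are all assumptions made for illustration. It also assumes every genotype is a two-letter call, and it treats calls that are inconsistent with the parents the same way as uninformative ones.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not the author's PhaseME code. */
final class TrioPhaser {

    /**
     * Returns the child's genotype ordered as paternal allele then maternal allele,
     * or an empty Optional when the site cannot be phased (all three heterozygous,
     * a call that does not fit the parents, or a genotype that is not two letters).
     */
    static Optional<String> phaseSite(String child, String father, String mother) {
        if (child.length() != 2 || father.length() != 2 || mother.length() != 2) {
            return Optional.empty();
        }
        char c1 = child.charAt(0);
        char c2 = child.charAt(1);

        // Homozygous child: both haplotypes carry the same allele.
        if (c1 == c2) {
            boolean consistent = father.indexOf(c1) >= 0 && mother.indexOf(c1) >= 0;
            return consistent ? Optional.of("" + c1 + c1) : Optional.empty();
        }

        // Heterozygous child: test both possible parent-of-origin assignments.
        boolean c1FromFather = father.indexOf(c1) >= 0 && mother.indexOf(c2) >= 0;
        boolean c2FromFather = father.indexOf(c2) >= 0 && mother.indexOf(c1) >= 0;

        if (c1FromFather && !c2FromFather) {
            return Optional.of("" + c1 + c2);
        }
        if (c2FromFather && !c1FromFather) {
            return Optional.of("" + c2 + c1);
        }
        // Both assignments fit (everyone is heterozygous) or neither fits.
        return Optional.empty();
    }
}
```

With the example from earlier, phaseSite("AT", "AA", "TT") yields "AT" (A from the father, T from the mother), while phaseSite("AT", "AT", "AT") comes back empty because either assignment is possible.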
A sample implementation of how to phase a child’s DNA is illustrated below:
Implementing this Phasing Strategy: I’m Here to Help
If you have all the files mentioned above and would like to phase your genome, then I am more than happy to provide you with a Java archive that will allow you to accomplish this task. Even more, I will provide detailed instructions as to how to use this archive (it’s really simple, I swear).
You can download the Java program here as a zip file. Once the download has finished, unzip the contents into the same folder. You should see Launcher.jar and PhaseME.jar. To launch the program, double-click Launcher.jar (I thought that was pretty obvious), and the GUI pictured below should appear:
The input files need to contain four columns in this order: rsid, chromosome, position, genotype. The genotype data needs to simply be AA, AT, TT, etc. without any slashes or quotation marks.
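If your genotype column arrives with slashes or quotation marks, a small cleanup step along these lines could strip them before you save the file. This is only a sketch: the class and method names are hypothetical, and converting to upper case is an extra assumption rather than something the program is known to require.

```java
/** Hypothetical cleanup helper; the names here are illustrative only. */
final class GenotypeCleaner {

    /** Removes slashes, quotation marks and whitespace, e.g. "A/T" becomes AT. */
    static String clean(String raw) {
        // The character class matches '/', '"', '\'' and any whitespace.
        return raw.replaceAll("[/\"'\\s]", "").toUpperCase();
    }
}
```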
Before running the program, you do need to make sure that all three files (father, mother, child) are the same length and list the same SNPs in the same order. I have not incorporated any checks into the program for this. My recommendation is to use an IF statement in Microsoft Excel (version 2007) to make sure that all three files line up. Also, make sure that only one row of headers exists in each file.
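If Excel is not your thing, the same line-up check can be sketched in a few lines of Java. The version below assumes the files are tab-separated (the post does not say which delimiter the program expects) and that each file has exactly one header row; TrioFileCheck, siteKey, and firstMismatch are hypothetical names.

```java
import java.util.List;

/** Hypothetical pre-flight check; assumes tab-separated rows and one header row. */
final class TrioFileCheck {

    /** Builds a comparison key from the rsid, chromosome and position columns. */
    static String siteKey(String row) {
        String[] cols = row.split("\t");
        if (cols.length != 4) {
            throw new IllegalArgumentException("Expected 4 columns but got: " + row);
        }
        return cols[0] + "\t" + cols[1] + "\t" + cols[2];
    }

    /**
     * Returns -1 when the three files line up row for row. Otherwise returns the
     * 0-based index of the first row that differs, or the length of the shortest
     * file if the files match up to that point but have different lengths.
     */
    static int firstMismatch(List<String> father, List<String> mother, List<String> child) {
        int n = Math.min(child.size(), Math.min(father.size(), mother.size()));

        // Row 0 is the header, so comparison starts at row 1.
        for (int i = 1; i < n; i++) {
            String key = siteKey(child.get(i));
            if (!key.equals(siteKey(father.get(i))) || !key.equals(siteKey(mother.get(i)))) {
                return i;
            }
        }
        boolean sameLength = father.size() == n && mother.size() == n && child.size() == n;
        return sameLength ? -1 : n;
    }
}
```

Each list can be loaded with java.nio.file.Files.readAllLines, and any result other than -1 tells you which row to inspect first.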
Finally, select the files to be compared (father, mother, child), and select an output location and choose a name for the outputs. There will be 23 outputs with the following filenames: <yourchosenname>.chr<chromosome>.phased.txt. Each chromosome has its own output that shows you the haplotype inherited from the father and the haplotype inherited from the mother. This program just ignores Y and MT data. However, the program does have the ability to recognize whether the child is male or female, and it assigns the X chromosome haplotypes accordingly.
Let me know about any problems with the program (e.g., if it does not produce any output), and I will check whether (1) your input files are the problem or (2) the program is the problem.