Use Family SNP Data to Phase Your Own Genome

So I’ve already written a post about the challenges of phasing genotype data, but now I’m here to help you accomplish that task. Let’s go through a checklist of what will be needed:

  • Your personal SNP information (from a service such as 23andMe, Navigenics, or deCODEme)
  • The SNP information for your parents (preferably through the same company/microarray platform)

If you want to use my specific method you also need:

Of course, you can implement your own version of this phasing protocol with basic familiarity in a programming language or (more tediously) with some macros and if statements in Microsoft Excel.

How to Phase Your Genome: A Conceptual Overview

With information from both parents, it is possible to phase your genome (for the vast majority of SNP calls). We rely on the fact that, in most situations, you can identify exactly which allele was inherited from your father and which was inherited from your mother.

For example: If at a particular position, your genotype call is AT, your father’s genotype call is AA, and your mother’s genotype call is TT, then you know that the A must have come from your father, and the T must have come from your mother. Simple! We will refer to situations where phase can be determined as informative.

The chart to the right outlines exactly which situations are informative. The good news is that every situation is informative with the exception of one: when both parents and the child are heterozygous. Here, we are unable to say for certain which allele was inherited from each parent.

A sample implementation of how to phase a child’s DNA is illustrated below:

[Figure: Phasing Data]
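
The inference rule described above can also be written out in a few lines of code. The following is only a rough Python sketch of that rule, not the code inside PhaseME.jar; the function name `phase_snp` and the choice to return `None` for anything that cannot be phased are assumptions made for illustration. Genotypes are assumed to be two-letter strings such as AA or AT.

```python
def phase_snp(father, mother, child):
    """Return (paternal_allele, maternal_allele) for the child at one SNP.

    Returns None when phase cannot be determined: an uninformative site,
    a call that is not two letters, or a trio that is not consistent
    with Mendelian inheritance. (Hypothetical helper, for illustration.)
    """
    if any(len(g) != 2 for g in (father, mother, child)):
        return None
    c1, c2 = child
    if c1 == c2:
        # Homozygous child: both parents must carry that allele.
        if c1 in father and c1 in mother:
            return (c1, c1)
        return None
    # Heterozygous child: try both assignments and keep the ones that
    # are compatible with what each parent could have passed on.
    options = [(p, m) for p, m in ((c1, c2), (c2, c1))
               if p in father and m in mother]
    if len(options) == 1:
        return options[0]
    return None  # both fit (all three heterozygous) or neither fits


print(phase_snp("AA", "TT", "AT"))  # ('A', 'T')
print(phase_snp("AT", "AT", "AT"))  # None
```

The first call is the example from the text: the A can only have come from the father and the T from the mother. The second is the one uninformative case, where all three are heterozygous.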

Implementing this Phasing Strategy: I’m Here to Help

If you have all the files mentioned above and would like to phase your genome, then I am more than happy to provide you with a Java archive that will allow you to accomplish this task. Even more, I will provide detailed instructions as to how to use this archive (it’s really simple, I swear).

You can download the Java program here as a zip file. Once finished, unzip the contents into the same folder. You should see Launcher.jar and PhaseME.jar. To launch the program, click Launcher.jar (I thought that was pretty obvious), and the GUI pictured below should appear:

[Screenshot: PhaseME GUI]

The input files need to contain four columns in this order: rsid, chromosome, position, genotype. The genotype data needs to simply be AA, AT, TT, etc., without any slashes or quotation marks.
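
If you want to sanity-check a file against this layout before loading it, something like the Python sketch below may help. It is not part of PhaseME; the function name `read_genotypes`, the tab delimiter, and the made-up rsids in the sample are all assumptions, so adjust the delimiter to whatever your export actually uses.

```python
import io


def read_genotypes(handle, delimiter="\t"):
    """Parse rows of rsid, chromosome, position, genotype.

    Assumes exactly one header row and a tab delimiter (an assumption,
    not something the original program documents).
    """
    rows = []
    for line_no, line in enumerate(handle, start=1):
        if line_no == 1:
            continue  # skip the single header row
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split(delimiter)
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 columns, found {len(fields)}")
        rsid, chrom, pos, genotype = (f.strip() for f in fields)
        if "/" in genotype or '"' in genotype:
            raise ValueError(
                f"line {line_no}: genotype {genotype!r} should be plain "
                "letters such as AA or AT")
        rows.append((rsid, chrom, int(pos), genotype))
    return rows


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "rsid\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t1000\tAT\n"
)
print(read_genotypes(sample))  # [('rs0001', '1', 1000, 'AT')]
```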

Before running the program, you do need to make sure that all three files (father, mother, child) have the same number of rows and list the same SNPs in the same order. I have not incorporated any checks into the program for this. My recommendation is to use an IF statement in Microsoft Excel (version 2007) to make sure that all three files line up. Also, make sure that only one row of headers exists in the files.
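
For anyone who would rather script this check than do it in Excel, here is a minimal Python sketch. It assumes each file has already been read into a list of (rsid, chromosome, position, genotype) tuples; the function name `find_misaligned_rows` and the sample rows are hypothetical.

```python
def find_misaligned_rows(father, mother, child):
    """List the problems that would stop the three files from lining up.

    Each argument is a list of (rsid, chromosome, position, genotype)
    tuples. An empty result means the files line up row for row.
    """
    problems = []
    if not (len(father) == len(mother) == len(child)):
        problems.append(
            f"row counts differ: father={len(father)}, "
            f"mother={len(mother)}, child={len(child)}")
    rows = zip(father, mother, child)
    for i, (f, m, c) in enumerate(rows, start=1):
        # Compare rsid, chromosome and position only; the genotypes
        # themselves are of course allowed to differ.
        if not (f[:3] == m[:3] == c[:3]):
            problems.append(f"row {i}: {f[0]} / {m[0]} / {c[0]}")
    return problems


# Made-up rows, for illustration only.
father = [("rs0001", "1", 1000, "AA"), ("rs0002", "1", 2000, "CT")]
mother = [("rs0001", "1", 1000, "TT"), ("rs0003", "1", 3000, "GG")]
child = [("rs0001", "1", 1000, "AT"), ("rs0002", "1", 2000, "CC")]
print(find_misaligned_rows(father, mother, child))
# ['row 2: rs0002 / rs0003 / rs0002']
```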

Finally, select the files to be compared (father, mother, child), and select an output location and choose a name for the outputs. There will be 23 outputs with the following filenames: <yourchosenname>.chr<chromosome>.phased.txt. Each chromosome has its own output that shows you the haplotype inherited from the father and the haplotype inherited from the mother. This program just ignores Y and MT data. However, the program does have the ability to recognize whether the child is male or female, and it assigns the X chromosome haplotypes accordingly.
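
To make the output naming concrete, the Python sketch below builds the 23 filenames and groups rows by chromosome while skipping Y and MT. It only mirrors the description above and is not taken from the program itself; `output_name`, `group_by_chromosome`, the prefix "myname", and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Chromosomes 1 to 22 plus X: 23 outputs. Y and MT are ignored.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X"]


def output_name(prefix, chrom):
    """Build <yourchosenname>.chr<chromosome>.phased.txt."""
    return f"{prefix}.chr{chrom}.phased.txt"


def group_by_chromosome(rows):
    """Group (rsid, chromosome, position, genotype) rows by chromosome,
    dropping anything that is not in CHROMOSOMES (such as Y and MT)."""
    grouped = defaultdict(list)
    for row in rows:
        if row[1] in CHROMOSOMES:
            grouped[row[1]].append(row)
    return grouped


# Made-up rows, for illustration only.
rows = [("rs0001", "1", 1000, "AT"),
        ("rs0002", "X", 2000, "GG"),
        ("rs0003", "MT", 300, "A")]
for chrom, chrom_rows in group_by_chromosome(rows).items():
    print(output_name("myname", chrom), len(chrom_rows))
```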

Let me know about any problems with the program (e.g., if it does not produce any output), and I will check whether (1) your input files are the problem or (2) the program is the problem.

26 Responses to “Use Family SNP Data to Phase Your Own Genome”


  1. 1 Leon Kull

    1. thanks
    2. 22 files were created, there is no results for X chromosome
    3. how we know that the processing is finished?

  2. 2 admin

    Hi Leon,

    I’ve checked the program, and it turns out that for 23andMe data (which I assume you used) the X chromosome does not process properly since they only report 1 letter for male chromosomes (and the program expects two letters…even for males, they would all look homozygous on the X chromosome).

    I have not put in a bar to monitor the process of the program, but when the Process button becomes depressed it is finished.

    I’ll work on a version that handles the 23andMe X chromosome, but if you would like to try on your own, edit your 23andMe data so that for males, when only one allele is reported for the X chromosome (e.g., A), it is treated as an apparent homozygous call, “AA”.

    Thanks,
    Alex

  3. 3 Ann Turner

    Thanks very much for putting this together, Alex. I have written a little BASIC program to work with my family’s data, and the results are basically (ha!) similar except for a couple of variations:

    1) I flag “Mendelian inconsistencies,” which could indicate genotyping error, or more interestingly, microdeletions when several inconsistencies are clustered close together.

    Father CC
    Mother CC
    Child CT

    2) I make no attempt to phase the genotypes when one of the triad has a no-call, and I’m sure I could derive some haplotypes if I added some rules. I see you handle some no-calls well, but there are exceptions. For instance, I saw one where G was listed for the paternal haplotype:

    Father AG
    Mother --
    Child AA

    3) I go one step further and derive the maternal and paternal alleles that are *not* passed on to the child. These aren’t truly haplotypes, but they do cover long stretches of a chromosome. I found this useful for my purposes, where my son did not inherit an autosomal dominant trait from me, and I could compare my other alleles with cousins who did inherit the familial hearing impairment.

    4) I didn’t work with the X chromosome data myself, but I don’t seem to get any output for the X, even though I doubled the base call for the males.

    I look forward to exploring all of your current and future tools!

  4. 4 admin

    Thanks for your comment Ann. Your basic program sounds very interesting, and you’ve spotted some good holes for me to plug in my app.

    I realize that for null calls, I was assuming that the data would be NN in my programming since the data I used had that. However, in my next version I’ll be sure to include --.

    Identification of hemizygous microdeletions was actually one of the next entries that I’m working on. For that, I’m putting together a program that provides some visual output along with an explanation of methods. Maybe that might be another interesting tool for personal genome/family genome analysis?

    Thanks for reading!

  5. 5 Lesley

    Trying your program now, no output files created?
    Will this work on Affy platform?
    Thanks.
    Lesley

  6. 6 admin

    Hi Lesley,

    It should work with an Affy platform. I’m wondering whether your input files are (1) all the same length and (2) formatted properly. If you want, I can check them for you.

    -Alex

  10. 10 Laurel

    I’ve been working on writing my own program to phase my mother’s data (but her parents are dead), though since I don’t know as much about it, I’ve been going about it in a sideways fashion of taking regions where me and my parent match a given relative, and then comparing me and both my parents’ genomes there. Anyways, I’ve tried to give your program a go, but it’s not creating any output. I made sure the data files had exactly the same SNPs (had to do it for my own program’s purposes as well), so I’m wondering if it’s an issue with formatting…

  11. 11 John

    Oh, you already did exactly what I did. Even Ann has done the micro-deletion/inconsistencies thing. Reinventing the wheel is always fun :P

  12. 12 Tom Johnston

    Anyone working with FAMILYTREEDNA data? More specifically, utilizing Y-DNA raw data AND familyfinder (autosomal data) to learn more about the mtdna?

  14. 14 Gregory Anderson

    Program won’t run. I’m using Windows 7-64bit. Is this the problem?

    Result after clicking: asks if I want to save.

    Thanks, Greg
