<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Chromosome Chronicles &#187; Haplotypes</title>
	<atom:link href="http://www.chromosomechronicles.com/tag/haplotypes/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.chromosomechronicles.com</link>
	<description>Genetics 2.0: Intelligent design and unnatural selection...</description>
	<lastBuildDate>Wed, 28 Sep 2011 03:48:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Analysis of 23andMe&#8217;s Genotyping: High Accuracy of Illumina Platform Confirmed by Comparing Siblings</title>
		<link>http://www.chromosomechronicles.com/2010/03/27/analysis-of-23andmes-genotyping-high-accuracy-of-illumina-platform-confirmed-by-comparing-siblings/</link>
		<comments>http://www.chromosomechronicles.com/2010/03/27/analysis-of-23andmes-genotyping-high-accuracy-of-illumina-platform-confirmed-by-comparing-siblings/#comments</comments>
		<pubDate>Sun, 28 Mar 2010 01:09:36 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Analyze Your Own SNPs]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Consumer Genetics]]></category>
		<category><![CDATA[23andMe]]></category>
		<category><![CDATA[Genotyping]]></category>
		<category><![CDATA[Haplotypes]]></category>
		<category><![CDATA[Illumina]]></category>
		<category><![CDATA[SNPs]]></category>

		<guid isPermaLink="false">http://www.chromosomechronicles.com/?p=412</guid>
		<description><![CDATA[An analysis of 23andMe genotype data between siblings to check for genotyping errors reveals high accuracy (99.15%) of genotype calls. ]]></description>
			<content:encoded><![CDATA[
<p style="text-align: center;"><a href="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/iscan.jpg"><img class="aligncenter size-full wp-image-421" title="iScan" src="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/iscan.jpg" alt="" width="265" height="290" /></a></p>
<p>Microarray genotyping platforms report high accuracy. Of course, those figures assume that you follow the vendor&#8217;s protocols under ideal conditions, and real-world accuracy may vary with the genotyping facility. I recently set out to get some good estimates for the rate of genotyping errors from the <a title="Illumina" href="http://illumina.com/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/illumina.com/?referer=');">Illumina</a> assay employed by <a title="23andMe" href="http://23andme.com" target="_blank" onclick="pageTracker._trackPageview('/outgoing/23andme.com?referer=');">23andMe</a>.</p>
<p>First let me refer back to a previous post about <a title="Phasing: Determining Which SNPs are Inherited together" href="http://www.chromosomechronicles.com/2009/09/08/phasing-determining-which-snps-are-inherited-together/" target="_blank">determining haplotype</a> when you have two parents and a child. By determining haplotypes for all of my siblings and me, I am able to compare regions where we share both parental haplotypes, or shared diplotypes. In my family, there are four siblings (including myself), and I have had us all tested at 23andMe. To analyze genotyping errors, I decided to compare informative SNPs from shared haplotypes between my three brothers and me.</p>
<h3>Experimental Outline:</h3>
<ol>
<li>Determine the transmitted parental haplotypes for all four siblings.</li>
<li>Determine where both parental haplotypes are shared by all four siblings.</li>
<li>Count how many genotyping errors occurred in these regions.</li>
</ol>
<p>Determining the parental haplotypes is simple, and my method for doing this has already been described. Determining where the four siblings share alleles was accomplished using a program I wrote called <a title="Chromosoft - NucleOlap" href="http://chromosoft.org/products/nucleolap/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/chromosoft.org/products/nucleolap/?referer=');">NucleOlap</a>. This program compares informative SNPs from paternal and maternal haplotypes between children and produces a nice output. It is designed to recognize candidate regions responsible for dominant or recessive genetic mutations given each child&#8217;s affected or unaffected status. However, if all children are affected, it is the same thing as analyzing haplotype sharing. Check out the <a title="NucleOlap Analysis Method" href="http://chromosoft.org/wp-content/uploads/2010/03/NucleOlap.Analysis.Method.pdf" target="_blank" onclick="pageTracker._trackPageview('/outgoing/chromosoft.org/wp-content/uploads/2010/03/NucleOlap.Analysis.Method.pdf?referer=');">documentation</a>.</p>
<p>For every pair of siblings, it is expected that, on average, they will share both parental haplotypes for 25% of their genome: at any given position there is a 1/2 chance that they received the same paternal haplotype and a 1/2 chance that they received the same maternal one. Each additional sibling must match on both again, so with a third sibling that 25% falls to 6.25%, and for four siblings both haplotypes are expected to be shared for only 1.56% of the genome. According to build 36.1 of the human genome, the haploid length of the autosomes is 2,864,255,922 base pairs! I can therefore expect that my three brothers and I share both parental haplotypes for 44,753,999 haploid base pairs.</p>
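<p>The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes each sibling inherits independently at any given position.</p>

```python
# Haploid autosome length quoted in the post (build 36.1).
AUTOSOME_BP = 2_864_255_922

def expected_shared_fraction(n_siblings):
    """Expected fraction of the genome where all siblings share BOTH parental haplotypes.

    Each sibling beyond the first matches the first on the paternal haplotype
    with probability 1/2 and on the maternal haplotype with probability 1/2.
    """
    if n_siblings < 1:
        raise ValueError("need at least one sibling")
    return 0.25 ** (n_siblings - 1)

for n in (2, 3, 4):
    frac = expected_shared_fraction(n)
    print(n, f"{frac:.4%}", round(frac * AUTOSOME_BP))
```

<p>For four siblings the fraction is 1/64, which gives the 44,753,999 base pairs quoted above.</p>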
<p><a href="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/siblings.png"><img class="alignleft size-medium wp-image-413" title="Sibling Diplotype Sharing" src="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/siblings-300x300.png" alt="" width="240" height="240" /></a>The NucleOlap analysis found that my siblings and I all share both haplotypes in six regions, for a total of 47,656,130 haploid base pairs. Pretty good, given the 44,753,999 expected! The ideogram to the left shows the regions where my three brothers and I share the exact same genes from both parents. Shared regions occur on chromosomes 1, 2, 6, 13, 16, and 18. NucleOlap also provided me with the starting and ending positions (and SNPs) for each region. To determine where genotyping errors occurred, I compared the raw data for these regions across all four siblings (the program output is not affected by genotyping errors because it is able to recognize and ignore them).</p>
<p><a href="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/Error-Example.png"><img class="alignright size-medium wp-image-415" title="Error Example" src="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/Error-Example-300x111.png" alt="" width="300" height="111" /></a>I performed the analysis by gathering the SNPs in the identified shared regions and lining them up side by side in Microsoft Excel. I then checked that all four siblings had the same genotype for each SNP (as they are expected to). A sample of how this worked is shown in the picture to the right.</p>
<p>My analysis revealed that 10,079 SNPs were contained within the regions where my brothers and I share diplotypes. Of these 10,079 SNPs, only 86 had any genotyping errors! <a href="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/Error-Summary-Table.png"><img class="alignleft size-full wp-image-417" title="Error Summary Table" src="http://www.chromosomechronicles.com/wp-content/uploads/2010/03/Error-Summary-Table.png" alt="" width="218" height="148" /></a>This means that genotype calling was 99.15% accurate for these regions, counted per SNP. Moreover, 79 of the errors occurred when there was a genotype call for some of the siblings and a null call (--) for others. Only 7 errors were true inconsistencies in the genotypes assigned to the siblings. The results are summarized in the table to the left.</p>
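<p>The same side-by-side check could also be scripted instead of done in Excel. The sketch below is not the method I used; the rsids, the genotypes, and the category names are invented for illustration, and it assumes a missing call is written as two hyphens.</p>

```python
from collections import Counter

NO_CALL = "--"  # assumed marker for a missing genotype call

def classify_snp(genotypes):
    """Classify one SNP across siblings who are expected to be identical here.

    Returns "concordant", "partial_no_call" (some siblings called, some not),
    or "discordant" (the called genotypes disagree).
    """
    n_missing = sum(g == NO_CALL for g in genotypes)
    # Sort the two alleles so that "AG" and "GA" count as the same genotype.
    called = {"".join(sorted(g)) for g in genotypes if g != NO_CALL}
    if len(called) > 1:
        return "discordant"
    if 0 < n_missing < len(genotypes):
        return "partial_no_call"
    return "concordant"

# Hypothetical rows: one list of four sibling genotypes per SNP.
rows = {
    "rs0001": ["AG", "AG", "GA", "AG"],
    "rs0002": ["CC", "--", "CC", "CC"],
    "rs0003": ["CT", "CT", "TT", "CT"],
}
summary = Counter(classify_snp(g) for g in rows.values())
print(summary)
```

<p>A SNP where every sibling has a null call is counted as concordant here, which is a choice of this sketch rather than something the analysis above specifies.</p>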
<p>My conclusion: the genotyping error rate is very low, less than 1% for the Illumina platform used by 23andMe. Even taking null calls into account, this number is still below 1%. My siblings and I shared 99.15% genotype identity across the regions where we all share both parental haplotypes. I am very pleased with the accuracy.</p>
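<p>For anyone who wants to redo the arithmetic, here is a small sketch using only the counts reported above (10,079 SNPs, 79 null-call errors, 7 inconsistent calls). The helper name is made up for illustration.</p>

```python
total_snps = 10_079
null_call_errors = 79    # some siblings called, others not
discordant_errors = 7    # called genotypes disagree

def error_rate(errors, total):
    """Fraction of SNPs affected by the given number of errors."""
    return errors / total

all_errors = null_call_errors + discordant_errors
print(f"accuracy, all errors:  {1 - error_rate(all_errors, total_snps):.2%}")
print(f"error rate, all:       {error_rate(all_errors, total_snps):.2%}")
print(f"error rate, discordant only: {error_rate(discordant_errors, total_snps):.2%}")
```

<p>These rates are per SNP. Each SNP was called four times (once per sibling), so the rate per individual genotype call would be lower still.</p>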
]]></content:encoded>
			<wfw:commentRss>http://www.chromosomechronicles.com/2010/03/27/analysis-of-23andmes-genotyping-high-accuracy-of-illumina-platform-confirmed-by-comparing-siblings/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Identity by State SNP Analysis: Find Relatives, Test Paternity, and Determine Allele Sharing</title>
		<link>http://www.chromosomechronicles.com/2009/10/22/identity-by-state-snp-analysis-find-relatives-test-paternity-and-determine-allele-sharing/</link>
		<comments>http://www.chromosomechronicles.com/2009/10/22/identity-by-state-snp-analysis-find-relatives-test-paternity-and-determine-allele-sharing/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 00:52:36 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Analyze Your Own SNPs]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Consumer Genetics]]></category>
		<category><![CDATA[23andMe]]></category>
		<category><![CDATA[Analysis Tools]]></category>
		<category><![CDATA[Family Genetics]]></category>
		<category><![CDATA[Genetic Analysis]]></category>
		<category><![CDATA[Haplotypes]]></category>
		<category><![CDATA[Navigenics]]></category>
		<category><![CDATA[Personal Genetics]]></category>
		<category><![CDATA[Recombination Analysis]]></category>

		<guid isPermaLink="false">http://www.chromosomechronicles.com/?p=324</guid>
		<description><![CDATA[Identity by State SNP Analysis can be used to find relatives, test paternity, examine inbreeding, and look at recombinations between siblings.]]></description>
			<content:encoded><![CDATA[
<p style="text-align: center;"><img class="aligncenter size-full wp-image-328" title="IBS Comparison of Siblings" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/siblings1.jpg" alt="IBS Comparison of Siblings" width="512" height="512" /></p>
<p>This has to be the longest title of any post, but it covers the diverse functions of the identity-by-state method of analyzing high-density SNP data.</p>
<p>Elisha Roberson and Jonathan Pevsner published a paper this August titled, <a title="PLoS ONE" href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0006711" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.plosone.org/article/info_doi/10.1371/journal.pone.0006711?referer=');">&#8220;Visualization of Shared Genomic Regions and Meiotic Recombination in High-Density SNP Data.&#8221;</a> This paper describes methods and results for comparing the genome between any two individuals in search of shared genomic regions.</p>
<p>While many methods for determining genomic similarity rely on knowledge of family tree structure (identity by descent methods), the method described by Roberson and Pevsner compares SNPs based on identity by state, which allows them to compare any two individuals without knowledge of the family structure.</p>
<h3>Genomic Comparison via Identity by State: A Brief Overview</h3>
<p>Identity by State compares the genotypes of two individuals at each SNP and puts each comparison into one of three categories:</p>
<p>1. <strong>Identical: </strong>Both individuals have the same genotype call (Ex. AA and AA; BB and BB; AB and AB).</p>
<p>2. <strong>One-Allele Shared:</strong> Exactly one allele is shared between the two individuals (Ex. AA and AB; AB and BB).</p>
<p>3. <strong>No alleles shared: </strong>No alleles are the same (Ex. AA and BB).</p>
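<p>The three categories boil down to counting how many alleles two genotypes have in common. Here is a minimal sketch of that count; the function name is made up, and it assumes genotypes are two-letter strings with no-calls already filtered out.</p>

```python
from collections import Counter

def ibs_state(genotype_a, genotype_b):
    """Return how many alleles (0, 1, or 2) two genotypes share identical by state."""
    for g in (genotype_a, genotype_b):
        if len(g) != 2 or "-" in g:
            raise ValueError(f"expected a called two-allele genotype, got {g!r}")
    # Multiset intersection: "AA" vs "AB" shares one A, "AB" vs "BA" shares both.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())

print(ibs_state("AA", "AA"), ibs_state("AA", "AB"), ibs_state("AA", "BB"))
```

<p>Treating each genotype as an unordered pair means that AB and BA are counted as identical, which matters because the allele order in a raw data file carries no phase information.</p>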
<p>For individual SNPs, this type of analysis really does not provide any extra information. The real advantage is gained when high-density SNP information is taken for the whole genome (<a title="23andMe" href="http://23andme.com" target="_blank" onclick="pageTracker._trackPageview('/outgoing/23andme.com?referer=');">23andMe </a>customers have this, and many other Illumina and Affymetrix platforms cover the whole genome).</p>
<p>In the paper, <strong>identical </strong>Identity by State (IBS) is referred to as situation 2; <strong>one-allele shared</strong> is situation 1; and <strong>no alleles shared</strong> is situation 0. For each chromosome, these three different &#8220;situations&#8221; can be plotted along an ideogram separated based on IBS. Here are a few images from the paper.</p>
<p style="text-align: center;"><img class="aligncenter size-large wp-image-340" title="Father V Mother" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/fathermother3-1024x178.png" alt="Father V Mother" width="491" height="86" /></p>
<p style="text-align: left;">As we can see, when we plot the various IBS calls across chromosome 10 between two parents (unrelated individuals, as far as we know), there are tons of 2s and 1s, and even large numbers of 0s, with no apparent pattern. In fact, a mixed distribution of 2s, 1s, and 0s like this is evidence that the two are unrelated. I can only see one small region that seems to be devoid of 0s (arrow), which signifies something special!</p>
<p style="text-align: left;"><img class="size-medium wp-image-343 alignleft" title="IBS Chart" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/ibscallschart-300x96.png" alt="IBS Chart" width="210" height="67" />Whenever there is a <em>consistent</em> lack of 0s (a lack of unshared SNPs), it is evidence that the two individuals share actual haplotypes. In the example above, the region without the 0s does contain a good number of 1s and 2s. This indicates that these two individuals share one haplotype in that region. If there were no 1s in that region either, that would be evidence that the individuals shared both haplotypes. The chart to the left summarizes how to interpret what is shared given the IBS calls in a region.</p>
<p style="text-align: left;">Roberson and Pevsner continue to show what type of IBS call distributions are made by various combinations of relations. For example:</p>
<p style="text-align: center;"><strong>Mother V. Son</strong></p>
<p style="text-align: center;"><strong><img class="aligncenter size-full wp-image-345" title="Mother V Son" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/motherson.png" alt="Mother V Son" width="516" height="88" /></strong></p>
<p style="text-align: center;"><strong>Siblings</strong></p>
<p style="text-align: center;"><strong><img class="aligncenter size-full wp-image-349" title="Siblings" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/siblings1.png" alt="Siblings" width="508" height="74" /><br />
</strong></p>
<p style="text-align: left;">The results from comparing a mother to her child show that only 2s and 1s are expected to be visible (the occasional 0 might slip in due to genotyping error or, more rarely, mutation). This is consistent with the fact that the child receives one chromosome of each pair from the mother.</p>
<p style="text-align: left;">The situation when the siblings are compared is a bit more complicated, but also more informative! We can see regions where 2s, 1s, and 0s are all present, which indicates that the children inherited different haplotypes from both parents. Regions where only 2s and 1s are present indicate that the same haplotype was inherited from one parent, but a different one from the other. Finally, regions where only 2s are present indicate that the children inherited the same haplotype from each parent in that region. Transitions from regions with 2,1,0 to 2,1 to 2 only (and vice versa) indicate that a meiotic recombination event occurred. Very cool!</p>
<h3 style="text-align: left;"><strong>Exploring What We Can Learn with IBS Analysis</strong></h3>
<p>The paper shows pretty well that IBS analysis can be used to look at recombination events and allele sharing between siblings. This is extremely useful. However, IBS analysis can be applied to a few other things:</p>
<ul>
<li><strong>Examination of Hemizygous Deletions:</strong> Normally, parent-child comparisons only show regions of IBS-2 and IBS-1. We wouldn&#8217;t expect any 0s since the child HAS TO inherit something from the parent. Except, of course, if the parent does not pass on any genes in that region. That is why a region of heavy IBS-0 in a parent-child analysis indicates that a hemizygous deletion has been passed on from that parent to the child.</li>
<li><strong>Identification of Relatives/Potential Inbreeding:</strong> 23andMe recently released their relative finder. As a bad scientist and a worse journalist, I have done little investigation into how it works, and even less playing around with it. However, IBS is certainly one way relatives may be identified. Potential inbreeding will not show up as clearly as a close relationship, because people considering marriage are likely to be more distantly related than close relatives are, but IBS analysis on a couple who come from the same population produced the following picture:</li>
</ul>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-351" title="Father V Mother" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/fathermother2.jpg" alt="Father V Mother" width="410" height="410" /></p>
<p style="text-align: left;">The program which I wrote to create the above output colors the chromosome gray in regions with no SNP information, black in regions with no allele sharing, red in regions where one allele is shared, and green in regions where both alleles are shared. Two completely unrelated individuals would have completely black chromosomes. The picture above is of unrelated individuals with a more recent common ancestor, likely due to the fact that they come from the same population.</p>
<h3 style="text-align: left;">Implementing IBS Analysis: GenomicRelator!</h3>
<p>Again, I got bored around 3 am one night and decided to write a program. Now I&#8217;m going to be giving it away for free, because I&#8217;m cool like that (I have been told to write the following: If you wish this program (or a version of it) for commercial use, you must contact me). Sweet.</p>
<p>GenomeRelator is the program I have for you. I decided not to write a GUI, but instead to make it double-clickable within a working directory (don&#8217;t worry, I&#8217;ll cover all this). It does two things:</p>
<ol>
<li><strong>General IBS Calls:</strong> It compares any two genome files and it draws pictures of the actual IBS calls on each chromosome (Green/Red/Black).</li>
<li><strong>IBS Data Smoothing:</strong> IBS calls are not as perfect in real life as they are in the world of published papers (for some reason), but we can look at the frequency of the different calls across a fixed number of SNPs and decide whether the region contains 0s, 1s, or only 2s. It paints a prettier, easier-to-understand picture this way.</li>
</ol>
<h3>The Program</h3>
<p>Download <a title="GenomicRelator" href="http://bit.ly/2fTolG" target="_blank" onclick="pageTracker._trackPageview('/outgoing/bit.ly/2fTolG?referer=');">here</a>. It&#8217;s a ZIP file with four JAR files inside: GenomeRelator, GenomeRelatorRAW, LaunchGenomeRelator, LaunchGenomeRelatorRAW. To use this program, simply extract the files to a folder, and place both genomes you would like to compare (named file1.txt and file2.txt respectively) into that same folder. Then just double click the proper JAR file.</p>
<p>You will also need the most recent Java Runtime Environment on your computer. Always launch the file with Launch at the front of its name, since it helps the computer allocate more memory (otherwise you run out of RAM and nothing happens). Read a little more to find out what the program does, the difference between RAW and normal, and the rules for the input files.</p>
<p style="text-align: left;">GenomeRelatorRAW compares the two files and processes them according to the IBS calls producing the following picture:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-352" title="RAW Siblings" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/siblings2.jpg" alt="RAW Siblings" width="410" height="410" /></p>
<p style="text-align: left;">This just presents the data to you in a cool picture without really interpreting it, although you can visually interpret quite a bit from it. This takes much longer to render than the other version of the program since each SNP is actually evaluated and then illustrated (23andMe files might take 20 minutes to process!). I find these useful to compare to the output from the regular version of the program.</p>
<p style="text-align: left;">The regular version, GenomeRelator, produces output that &#8220;smooths&#8221; the data. The following picture comes from the regular (not RAW) version of the program:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-353" title="IBS Siblings" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/10/siblings21.jpg" alt="IBS Siblings" width="410" height="410" /></p>
<p style="text-align: left;">Essentially, it moves through the genome 250 SNPs at a time and applies some preset rules. If a region has greater than 1% IBS-0 and greater than 5% IBS-1, it is assumed to have 2, 1, and 0 (no relation). If it has less than 1% IBS-0 and greater than 5% IBS-1, it is assumed to have 2 and 1 (one allele shared). If it has less than 1% IBS-0 and less than 5% IBS-1, it is assumed to have only 2 (both alleles shared). These numbers, <strong>250</strong>, <strong>1%</strong>, and <strong>5%</strong>, are the variables at play in the program, and hopefully I will come out with a GUI version that allows you to set them yourself.</p>
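<p>To make the smoothing rules concrete, here is a rough Python sketch of the same idea. It is not the GenomeRelator source (which is Java); the names are invented, the windows are assumed not to overlap, and a window with more than 1% IBS-0 is always treated as unshared, even in the case the rules above leave open (more than 1% IBS-0 but less than 5% IBS-1).</p>

```python
WINDOW = 250      # SNPs per region
MAX_IBS0 = 0.01   # above this fraction of IBS-0 calls, nothing is shared
MAX_IBS1 = 0.05   # above this fraction of IBS-1 calls, at most one haplotype is shared

def classify_window(calls):
    """Return 0, 1, or 2 shared haplotypes for one window of per-SNP IBS calls."""
    if not calls:
        raise ValueError("empty window")
    frac0 = calls.count(0) / len(calls)
    frac1 = calls.count(1) / len(calls)
    if frac0 > MAX_IBS0:
        return 0  # 2s, 1s, and 0s all present
    if frac1 > MAX_IBS1:
        return 1  # only 2s and 1s
    return 2      # essentially only 2s

def smooth(calls, window=WINDOW):
    """Classify consecutive, non-overlapping windows (the last one may be shorter)."""
    return [classify_window(calls[i:i + window]) for i in range(0, len(calls), window)]

# Three made-up windows: all 2s, then 2s and 1s, then a mix with 0s.
example = [2] * 250 + [2] * 200 + [1] * 50 + [2] * 150 + [1] * 80 + [0] * 20
print(smooth(example))
```

<p>Tolerating a small fraction of stray 0s or 1s in a window is what lets the occasional genotyping error pass without breaking up a shared region.</p>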
<p style="text-align: left;"><strong>The rules for the input files:</strong></p>
<ol>
<li>The input files must be called file1.txt and file2.txt, and they must be placed within the same folder as the program (the jar files).</li>
<li>The input files must have one line of headers (no more no less).</li>
<li>The input files must have the same SNPs. For 23andMe customers, I am not entirely sure, but in my experience, male files have contained four extra SNPs (rs3091244, rs1229984, rs4420638, and rs34276300). Remove any SNPs necessary until both files have only the same ones.</li>
<li>The input files must contain four columns in the following order: rsid, chromosome, position, genotype. The genotype column must be one or two letters (yes, I fixed it so that male X chromosomes can survive when only one letter is there).</li>
</ol>
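<p>If you would rather prepare the two files with a script than by hand, something like the following sketch could read them and keep only the SNPs present in both. It assumes tab-separated columns, which the rules above do not state, and the rsids and positions in the example are invented.</p>

```python
import csv
import io

def read_genotypes(handle):
    """Read rsid -> (chromosome, position, genotype) from a four-column file.

    Assumes exactly one header line and tab-separated columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the single header line
    table = {}
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        rsid, chromosome, position, genotype = row
        table[rsid] = (chromosome, int(position), genotype)
    return table

def restrict_to_shared(a, b):
    """Keep only the SNPs that appear in both tables."""
    shared = a.keys() & b.keys()
    return {k: a[k] for k in shared}, {k: b[k] for k in shared}

# Hypothetical file contents; with real data, pass open("file1.txt") instead.
file1 = io.StringIO("rsid\tchromosome\tposition\tgenotype\nrs0001\t1\t1000\tAG\nrs0002\t1\t2000\tCC\n")
file2 = io.StringIO("rsid\tchromosome\tposition\tgenotype\nrs0001\t1\t1000\tAA\n")
g1, g2 = restrict_to_shared(read_genotypes(file1), read_genotypes(file2))
print(sorted(g1), sorted(g2))
```

<p>Writing the filtered tables back out as file1.txt and file2.txt is left out here to keep the sketch short.</p>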
<h3>Program Output</h3>
<p>The program will not only draw a picture, but if you run the regular version, it will also generate a number of files. A text file will be generated for each chromosome. This is just an intermediary file that the program needs in the middle of the process. A file called &#8220;IBS Summary&#8221; will also be generated for each chromosome. This file contains the list of regions (250 SNPs at a time) by their starting and ending genomic positions (the position in the genotype file). These are useful if you want to more closely examine regions of overlap (or deletion).</p>
<h3>No Hosting On Your Own</h3>
<p>I&#8217;m sorry, but I only want people to be able to download this from me, so just send them a link here if you want to distribute the program to others. It&#8217;s free for &#8220;academic&#8221; use and for &#8220;fun&#8221; use. No making money. Thanks to any who&#8217;ve read and used the program. I&#8217;d appreciate any feedback and please report all errors to me so I can do my best to fix them!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chromosomechronicles.com/2009/10/22/identity-by-state-snp-analysis-find-relatives-test-paternity-and-determine-allele-sharing/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Phasing: Determining Which SNPs are Inherited Together</title>
		<link>http://www.chromosomechronicles.com/2009/09/08/phasing-determining-which-snps-are-inherited-together/</link>
		<comments>http://www.chromosomechronicles.com/2009/09/08/phasing-determining-which-snps-are-inherited-together/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 04:25:08 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Analyze Your Own SNPs]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[DNA Test]]></category>
		<category><![CDATA[Haplotypes]]></category>
		<category><![CDATA[Phasing]]></category>

		<guid isPermaLink="false">http://www.chromosomechronicles.com/?p=170</guid>
		<description><![CDATA[Biallelic SNP data from DNA microarrays can be made much more useful if haplotype information is also available. However, the process of phasing biallelic data is not so simple.]]></description>
			<content:encoded><![CDATA[
<p><img class="aligncenter size-medium wp-image-173" title="DNA" src="http://www.chromosomechronicles.com/wp-content/uploads/2009/08/DNA-300x225.jpg" alt="DNA" width="300" height="225" /></p>
<h2><strong>The Phasing Problem</strong></h2>
<p>The human genome contains two copies of every gene&#8230;most of the time. Personal genetics products (23andMe, Navigenics, deCODEme) reflect this by reporting two alleles (each A, T, C, or G) at each SNP site (identified by an rs number, e.g. rs1234). 23andMe customers who have downloaded their raw data file (around 15 megabytes of letters and numbers) will have seen this in their genotype column.</p>
<p>While the microarray platforms employed by direct to consumer genetics companies are capable of determining both variants (one on each chromosome) inherited by an individual at each locus, these <strong>microarray platforms are unable to assign each variant to a specific chromosome</strong>. Let me explain:</p>
<p>Let&#8217;s assume that four different positions were analyzed on chromosome 1: positions 1-4. For each position, two alleles were determined by a microarray (the technology used by 23andMe and friends). The data would appear as follows:</p>
<table>
<tr><th>Position</th><th>Genotype</th></tr>
<tr><td>1</td><td>AG</td></tr>
<tr><td>2</td><td>CT</td></tr>
<tr><td>3</td><td>CT</td></tr>
<tr><td>4</td><td>CC</td></tr>
</table>
<p>The problem with this data is that we do not know which nucleotide (A,T,C,G) belongs to which chromosome. One possible arrangement is as follows:</p>
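<table>
<tr><th>Position</th><th>ChromosomeA</th><th>ChromosomeB</th></tr>
<tr><td>1</td><td>A</td><td>G</td></tr>
<tr><td>2</td><td>T</td><td>C</td></tr>
<tr><td>3</td><td>C</td><td>T</td></tr>
<tr><td>4</td><td>C</td><td>C</td></tr>
</table>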
<p>However, we do not know if this arrangement is correct since there is no information about whether or not certain alleles are linked together.</p>
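<p>To see how quickly the ambiguity grows, the following sketch lists every arrangement consistent with a set of unphased genotypes. The function name is made up for illustration. With k heterozygous positions there are 2^(k-1) distinct arrangements, since swapping the labels ChromosomeA and ChromosomeB does not produce a new one.</p>

```python
from itertools import product

def possible_arrangements(genotypes):
    """List every distinct pair of haplotypes consistent with unphased genotypes."""
    het_sites = [i for i, g in enumerate(genotypes) if g[0] != g[1]]
    arrangements = []
    # Keep the first heterozygous site fixed; flipping every site at once
    # would only swap which chromosome we call A and which we call B.
    for flips in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flip_at = dict(zip(het_sites[1:], flips))
        hap_a = "".join(g[1] if flip_at.get(i, False) else g[0] for i, g in enumerate(genotypes))
        hap_b = "".join(g[0] if flip_at.get(i, False) else g[1] for i, g in enumerate(genotypes))
        arrangements.append((hap_a, hap_b))
    return arrangements

for hap_a, hap_b in possible_arrangements(["AG", "CT", "CT", "CC"]):
    print(hap_a, hap_b)
```

<p>For the four positions above there are three heterozygous sites and therefore four candidate arrangements, one of which is the arrangement shown in the table. A real chromosome with thousands of heterozygous SNPs has far too many to enumerate, which is why phasing needs additional information.</p>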
<p>The ability to distinguish which alleles belong to which chromosome is important when considering how genes are inherited. Generally, a parent passes one of the two copies of each chromosome on to their offspring. Although recombination means that both of the parent&#8217;s copies may contribute segments to the chromosome that is passed on, alleles that sit close together on the same chromosome are typically &#8220;linked&#8221; and inherited together.</p>
<p>To determine which genes of yours are linked together (and therefore likely to be inherited together by your child), it is first necessary to figure out which alleles (indicated by the variant SNPs) exist together on the same chromosome. This process has been termed &#8220;phasing&#8221; in the bioinformatics world.</p>
<h2><strong>Phasing: Simple in Theory, Complex in Practice</strong></h2>
<p>I just lied: phasing is not a simple process, not even in theory. It has its roots in computer science and statistics, and it is typically accomplished by employing <a title="MCMC" href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo" target="_blank" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Markov_chain_Monte_Carlo?referer=');">Markov-chain Monte Carlo</a> algorithms, which I won&#8217;t even attempt to explain here (people with many more letters after their name give courses on this). Phasing is also a very computationally intensive process, so it takes a long time, even with our super-duper, post-millennial computers. Current phasing protocols have been developed for the analysis of large population datasets. This is troublesome for the individual consumer since researchers have very little reason to tailor phasing protocols to the needs of an individual. However, <strong>the ability to phase your own genome would provide you with valuable information about which genes of yours are likely to be inherited together in your children.</strong></p>
<p>To phase chromosomes, researchers often rely on family duo and trio data (data from a parent and a child, or both parents and a child) to help get more accurate results. However, phasing can also be accomplished by aggregating the genotype data from unrelated individuals (of the same, or similar, ethnicities for more accuracy). By phasing population data, researchers identify <strong>haplotypes</strong>, which are essentially segments of DNA that are common to a particular ethnic group. A number of freely available (through academic license) programs exist to help you analyze population data to determine phase: <a title="PHASE" href="http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/PHASEv2.php" target="_blank" onclick="pageTracker._trackPageview('/outgoing/depts.washington.edu/ventures/UW_Technology/Express_Licenses/PHASEv2.php?referer=');">PHASE</a>, <a title="fastPHASE" href="http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/fastPHASE.php" target="_blank" onclick="pageTracker._trackPageview('/outgoing/depts.washington.edu/ventures/UW_Technology/Express_Licenses/fastPHASE.php?referer=');">fastPHASE</a>, and <a title="shapeIT" href="http://www.griv.org/shapeit/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.griv.org/shapeit/?referer=');">ShapeIT</a> all accomplish this task (though at varying speeds and with different levels of accuracy).</p>
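<p>To give a feel for why trio data helps, here is a sketch of the simplest possible rule applied to one SNP at a time: if only one parent could have supplied a given allele, the child&#8217;s phase at that SNP is known. This is a Mendelian-rule illustration with made-up names, not what PHASE, fastPHASE, or ShapeIT do internally.</p>

```python
def phase_child_snp(child, father, mother):
    """Return (paternal_allele, maternal_allele) for one SNP, or None if undetermined.

    Genotypes are two-letter strings such as "AG". None is returned for
    no-calls, Mendelian inconsistencies, and the ambiguous case in which
    either allele could have come from either parent.
    """
    if "-" in child + father + mother:
        return None
    a, b = child
    a_paternal = a in father and b in mother  # a from father, b from mother
    b_paternal = b in father and a in mother  # b from father, a from mother
    if a == b:
        return (a, b) if a_paternal else None
    if a_paternal and not b_paternal:
        return (a, b)
    if b_paternal and not a_paternal:
        return (b, a)
    return None

print(phase_child_snp("AG", "AA", "AG"))  # father can only have given A
print(phase_child_snp("AG", "AG", "AG"))  # everyone heterozygous: ambiguous
```

<p>SNPs where the child and both parents are heterozygous stay unresolved under this rule, which is one reason statistical methods or additional relatives are still needed.</p>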
<p>I am currently interested in &#8220;Personal Genome Phasing&#8221; as a means of modeling inheritance across generations. Not immediately, but in the not-too-distant future, I expect to have a post on how one might accomplish the task of phasing your own genome (this will likely require you to have information from your parents/children/siblings as well as your own). After I have properly demonstrated my preferred phasing method, I will move on to SNP imputation!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chromosomechronicles.com/2009/09/08/phasing-determining-which-snps-are-inherited-together/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

