
This has to be the longest title of any post, but it covers the diverse functions of the identity-by-state method of analyzing high-density SNP data.
Elisha Roberson and Jonathan Pevsner published a paper this August titled “Visualization of Shared Genomic Regions and Meiotic Recombination in High-Density SNP Data.” The paper describes methods and results for comparing the genomes of any two individuals in search of shared genomic regions.
While many methods for determining genomic similarity rely on knowledge of family tree structure (identity by descent methods), the method described by Roberson and Pevsner compares SNPs based on identity by state, which allows them to compare any two individuals without knowledge of the family structure.
Genomic Comparison via Identity by State: A Brief Overview
Identity by State looks at each SNP in two individuals and puts the pair of genotype calls into one of three categories:
1. Identical: Both individuals have the same genotype call (Ex. AA and AA; BB and BB; AB and AB).
2. One-Allele Shared: Only one allele is shared between the two individuals (Ex. AA and AB; AB and BB).
3. No alleles shared: No alleles are the same (Ex. AA and BB).
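The three categories above amount to counting how many alleles two genotype calls have in common. Here is a minimal sketch of that count in Python; the function name `ibs_state` is my own, and it assumes both inputs are clean two-letter calls (no-calls such as "--" would need to be filtered out first).

```python
from collections import Counter


def ibs_state(genotype_a: str, genotype_b: str) -> int:
    """Return the number of alleles shared by two genotype calls (0, 1, or 2)."""
    # Multiset intersection: allele order inside a call does not matter,
    # so "AB" and "BA" are treated as the same genotype.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())


print(ibs_state("AA", "AA"))  # 2 (identical)
print(ibs_state("AA", "AB"))  # 1 (one allele shared)
print(ibs_state("AA", "BB"))  # 0 (no alleles shared)
```

The same function works on real nucleotide calls such as "AG" and "GG".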
For individual SNPs, this type of analysis really does not provide any extra information. The real advantage is gained when high-density SNP information is taken for the whole genome (23andMe customers have this, and many other Illumina and Affymetrix platforms cover the whole genome).
In the paper, identical Identity by State (IBS) is referred to as situation 2; one-allele shared is situation 1; and no alleles shared is situation 0. For each chromosome, these three different “situations” can be plotted along an ideogram separated based on IBS. Here are a few images from the paper.

As we can see, when we plot the IBS calls across chromosome 10 between the two parents (unrelated individuals, as far as we know), there are plenty of 2’s and 1’s, and even large numbers of 0’s. The data does not seem to have any rhyme or reason to it. In fact, a mix of 2’s, 1’s, and 0’s like this is evidence that the two are unrelated. I can only see one small region that seems to be devoid of 0’s (arrow), which signifies something special!
Whenever there is a consistent lack of 0’s (a lack of unshared SNPs), it is evidence that the two individuals share actual haplotypes. In the example above, the region without the 0’s does contain a good number of 1’s and 2’s. This indicates that these two individuals share one haplotype in that region. If there were no 1’s in that region, that would be evidence that the individuals shared both haplotypes. The chart to the left summarizes how to interpret what is shared given the IBS calls in a region.
Roberson and Pevsner continue to show what type of IBS call distributions are made by various combinations of relations. For example:
Mother vs. Son

Siblings

The results from comparing a mother to her child show that only 2’s and 1’s are expected to be visible (the occasional 0 might slip in due to genotyping error, or more rarely, mutation). This is consistent with the fact that the child receives one of his/her chromosomes from the mother.
The situation when the siblings are compared is a bit more complicated, but also more informative! We can see regions where 2’s, 1’s, and 0’s are all present, which indicates that the children inherited completely different haplotypes from the parents. Regions where only 2’s and 1’s are present indicate that the same haplotype was inherited from one parent, but a different one from the other. Finally, regions where only 2’s are present indicate that the children inherited the same haplotype from each parent in that region. Transitions from regions with 2,1,0 to 2,1 to 2 (and vice versa) indicate that a meiotic recombination event occurred. Very cool!
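To make the recombination idea concrete, here is a rough sketch that scans a list of already-classified regions and reports where the sharing state changes. Everything in it is hypothetical: the function name, the `(start, end, shared)` tuple layout, and the example positions are mine, not from the paper. The `shared` value is the number of haplotypes shared in that region (0 for 2,1,0 regions; 1 for 2,1 regions; 2 for regions with only 2’s).

```python
def find_transitions(regions):
    """Return (left_end, right_start, shared_before, shared_after) for each
    pair of neighboring regions whose sharing state differs."""
    transitions = []
    # Walk over neighboring pairs of regions in chromosome order.
    for (_, prev_end, prev_shared), (next_start, _, next_shared) in zip(regions, regions[1:]):
        if prev_shared != next_shared:
            transitions.append((prev_end, next_start, prev_shared, next_shared))
    return transitions


# Made-up regions along one chromosome of a sibling pair.
regions = [(1, 1000, 0), (1001, 2000, 0), (2001, 3000, 1), (3001, 4000, 2)]
print(find_transitions(regions))
```

Each reported tuple brackets a candidate recombination breakpoint between the end of one region and the start of the next.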
Exploring What We Can Learn with IBS Analysis
It has been shown pretty well that IBS analysis can be used to look at recombination events and allele sharing between siblings. This is extremely useful. However, IBS analysis can also be applied to a few other problems:
- Examination of Hemizygous Deletions: Normally, parent-child comparisons only present with regions of IBS-2 and 1. We wouldn’t expect any 0’s since the child HAS TO inherit something from the parent. Except, of course, if the parent does not pass on any genes in that region. That is why, if a region of heavy IBS-0 is seen in a parent-child analysis, it indicates that a hemizygous deletion has been passed on from that parent to the child.
- Identification of Relatives/Potential Inbreeding: 23andMe recently released their relative finder. As a bad scientist and a worse journalist, I have done little investigation into how it works, and even less playing around with it. However, IBS is certainly one way relatives may be identified. Potential inbreeding will not show up as strongly as a close family relationship, because people considering marriage are usually more distantly related than close relatives are, but IBS analysis on a couple who come from the same population produced the following picture:

The program which I wrote to create the above output colors the chromosome gray in regions with no SNP information, black in regions with no allele sharing, red in regions where one allele is shared, and green in regions where both alleles are shared. Two completely unrelated individuals would have completely black chromosomes. The picture above is of unrelated individuals with a more recent common ancestor, likely due to the fact that they come from the same population.
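Going back to the hemizygous deletion case in the list above, one way to look for a "region of heavy IBS-0" in a parent-child comparison is to walk along the chromosome in fixed-size windows and flag any window where the fraction of 0’s is high. This is only a sketch: the function name, the window size, and the 10% cutoff are placeholders I picked for illustration, not values from the paper or from my program.

```python
def flag_ibs0_windows(positions, calls, window=250, min_ibs0_fraction=0.10):
    """Return (start_pos, end_pos, ibs0_fraction) for every window whose
    share of IBS-0 calls reaches the cutoff.

    positions and calls are parallel lists sorted by genomic position;
    each call is 0, 1, or 2.
    """
    flagged = []
    for i in range(0, len(calls), window):
        chunk = calls[i:i + window]  # the last window may be shorter
        fraction = chunk.count(0) / len(chunk)
        if fraction >= min_ibs0_fraction:
            flagged.append((positions[i], positions[i + len(chunk) - 1], fraction))
    return flagged
```

A flagged window is a candidate for a closer look, not proof of a deletion, since genotyping errors also produce the occasional 0.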
Implementing IBS Analysis: GenomeRelator!
Again, I got bored around 3 am one night and decided to write a program. Now I’m going to be giving it away for free, because I’m cool like that (I have been told to write the following: If you wish this program (or a version of it) for commercial use, you must contact me). Sweet.
GenomeRelator is the program I have for you. I decided to not write a GUI, but instead make it double-clickable within a working directory (don’t worry, I’ll cover all this). It does two things:
- General IBS Calls: It compares any two genome files and it draws pictures of the actual IBS calls on each chromosome (Green/Red/Black).
- IBS Data Smoothing: IBS Calls are not as perfect in real life as they are in the world of published papers (for some reason), but we can look at the frequency of different calls across a fixed number of SNPs and decide whether or not the region contains 0’s, 1’s, or 2’s. It paints a prettier, easier to understand picture this way.
The Program
Download here. It’s a ZIP file with four JAR files inside: GenomeRelator, GenomeRelatorRAW, LaunchGenomeRelator, LaunchGenomeRelatorRAW. To use this program, simply extract the files to a folder, and place both genomes you would like to compare (named file1.txt and file2.txt respectively) into that same folder. Then just double click the proper JAR file.
You will need the most recent Java Runtime Environment on your computer as well. You should always click on the file with Launch in front of the name when using since it helps the computer allocate more memory (otherwise you run out of RAM and nothing happens). Read a little more to find out what the program does, the difference between RAW and normal, and the rules for the input files.
GenomeRelatorRAW compares the two files and processes them according to the IBS calls producing the following picture:

This just presents the data to you in a cool picture without really interpreting it, although you can visually interpret quite a bit from it. This takes much longer to render than the other version of the program since each SNP is actually evaluated and then illustrated (23andMe files might take 20 minutes to process!). I find these useful to compare to the output from the regular version of the program.
The regular version, GenomeRelator, produces output that “smooths” the data. The following picture comes from the regular (not RAW) version of the program:

Essentially, it moves through the genome 250 SNPs at a time with some preset rules. If a region has greater than 1% IBS-0 and greater than 5% IBS-1, it is assumed to have 2,1,0 (no relation). If it has less than 1% IBS-0 and greater than 5% IBS-1, it is assumed to have 2,1 (one allele shared). If it has less than 1% IBS-0 and less than 5% IBS-1, it is assumed to have only 2 (both alleles shared). These numbers (250, 1%, and 5%) are the variables at play in the program, and hopefully I will come out with a GUI version that lets you set them yourself.
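For the curious, the smoothing rule can be sketched in a few lines of Python. This is not the actual GenomeRelator code (which is Java), just an illustration of the idea. It assumes the "both alleles shared" case means less than 5% IBS-1, and it makes two choices the description above leaves open: any window over the IBS-0 cutoff is treated as "no relation" regardless of its IBS-1 fraction, and a value exactly at a cutoff falls on the "shared" side.

```python
def smooth_ibs(calls, window=250, max_ibs0=0.01, max_ibs1=0.05):
    """Return one estimate per window of how many haplotypes are shared:
    0 (2,1,0 region), 1 (2,1 region), or 2 (only 2's)."""
    estimates = []
    for i in range(0, len(calls), window):
        chunk = calls[i:i + window]  # the last window may be shorter
        frac0 = chunk.count(0) / len(chunk)
        frac1 = chunk.count(1) / len(chunk)
        if frac0 > max_ibs0:
            estimates.append(0)  # too many 0's: nothing shared
        elif frac1 > max_ibs1:
            estimates.append(1)  # few 0's but plenty of 1's: one shared
        else:
            estimates.append(2)  # few 0's and few 1's: both shared
    return estimates
```

Changing `window`, `max_ibs0`, and `max_ibs1` corresponds to changing the 250, 1%, and 5% settings.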
The rules for the input files:
- The input files must be called file1.txt and file2.txt, and they must be placed within the same folder as the program (the jar files).
- The input files must have exactly one header line (no more, no less).
- The input files must have the same SNPs. For 23andMe customers, I am not entirely sure, but in my experience, male files have contained four extra SNPs (rs3091244, rs1229984, rs4420638, and rs34276300). Remove any SNPs necessary until both files have only the same ones.
- The input files must contain four columns in the following order: rsid, chromosome, position, genotype. The genotype column must be one or two letters (yes, I fixed it so that male X chromosomes can survive when only one letter is there).
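If you want to check your files against these rules before running the program, something like the following Python sketch could help. It is a separate illustration, not part of GenomeRelator; the function names are made up, and it assumes the four columns are separated by tabs or spaces.

```python
def parse_genotypes(lines):
    """Map rsid -> (chromosome, position, genotype) from an iterable of lines.

    Assumes exactly one header line followed by four whitespace-separated
    columns: rsid, chromosome, position, genotype.
    """
    rows = iter(lines)
    next(rows, None)  # skip the single header line
    snps = {}
    for line in rows:
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore blank or malformed lines
        rsid, chromosome, position, genotype = fields
        snps[rsid] = (chromosome, int(position), genotype)
    return snps


def load_genotype_file(path):
    """Read one genotype file from disk, e.g. file1.txt."""
    with open(path) as handle:
        return parse_genotypes(handle)


def shared_snps(first, second):
    """Return the rsids present in both files, in the first file's order."""
    return [rsid for rsid in first if rsid in second]
```

Calling `shared_snps(load_genotype_file("file1.txt"), load_genotype_file("file2.txt"))` gives the list of SNPs to keep; anything outside that list is what needs to be removed so both files match.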
Program Output
The program will not only draw a picture, but if you run the regular version, it will also generate a number of files. A text file will be generated for each chromosome. This is just an intermediary file that the program needs in the middle of the process. A file called “IBS Summary” will also be generated for each chromosome. This file contains the list of regions (250 SNPs at a time) by their starting and ending genomic positions (the position in the genotype file). These are useful if you want to more closely examine regions of overlap (or deletion).
No Hosting On Your Own
I’m sorry, but I only want people to be able to download this from me, so just send them a link here if you want to distribute the program to others. It’s free for “academic” use and for “fun” use. No making money. Thanks to any who’ve read and used the program. I’d appreciate any feedback and please report all errors to me so I can do my best to fix them!