Genetic Identity Theft: Will You Need to Protect Your Genome?

Consumer genetics is here, and it seems unlikely to disappear anytime soon. In fact, the next generation of consumer genetics products may very well be complete genomic sequencing, as promised by Complete Genomics, and even Illumina.

Although we may be decades away from it (or maybe only years), we will one day be confronted by some form of genetic identity theft. What does this mean, and how will this happen? Let me explain.

How can my genes be stolen?

My genes are unique to me, and there is essentially a 0% chance that anyone else in the world has the exact set of As Cs Gs and Ts as I do (with the exception of identical twins). How could someone possibly steal my genetic profile?

In the not so distant future, any cellular sample may be viable for complete DNA sequencing. For instance, after enjoying a nice lunch at the diner, leaving behind just one strand of hair may be enough for a stalker/mad scientist to determine your entire DNA sequence. They would have the entire blueprint of you.

What can be done with stolen genes?

Okay, so someone may steal and sequence my DNA, but what good does that do them?

Right now, nothing. They might learn that you carry a mutated version of the HFE gene and may potentially have a child afflicted with Hemochromatosis. Even worse, they could find out that there is a 70% chance that you are lactose intolerant!

In the future, the chimeric child of DNA sequencing, stem cell research, developmental biology, and cellular reprogramming could allow someone to derive stem cells from your one strand of hair. These stem cells might then be transformed into sperm or egg precursors, and these cells would essentially allow anyone to have a child with you (without you). Think of the market for a service that advertises to prospective mothers: “Have a baby with Brad Pitt!” It’s actually kind of creepy.

It gets worse, and more obsessive. What if said “crazy Brad Pitt fan” decided that having Brad Pitt’s child was not enough? No. She wants more…she wants Brad Pitt for a child! Stealing Brad’s DNA, sequencing it, and reprogramming cells with his DNA would allow for the creation of a Brad Pitt embryo. An in vitro fertilization procedure later, and Brad Pitt’s ultimate fan can now also be his mother (to crazy fan: seriously, don’t do this…running a fan website is enough devotion).

Holy $h*t That’s Messed Up! How Can I Prevent This (esp. if I am Mr. Pitt)?

Truthfully, this is not something you have to worry about right now. The technology just is not there yet, although all methods necessary for something like this to happen are either developed or in development.

When we do reach the point where this is a real possibility, I cannot think of any way to stop someone who was committed enough to having your child (or you as their child). Maybe you can hire a personal assistant to walk around and make sure that all strands of hair, saliva and any other DNA containing materials are properly collected and destroyed.

Only with the extreme case of celebrities can I imagine there being a “black market” for stolen DNA, and even then, I doubt demand would be high enough to fuel such an industry. However, I will never say never when predicting the future in this field. To all celebrities out there: Let me know when you get that phone call, “So I’m your mother…sort of.” I give it 5-10 years.


Identity by State SNP Analysis: Find Relatives, Test Paternity, and Determine Allele Sharing

IBS Comparison of Siblings

This has to be the longest title of any post, but it covers the diverse functions of the identity-by-state method of analyzing high-density SNP data.

Elisha Roberson and Jonathan Pevsner published a paper this August titled “Visualization of Shared Genomic Regions and Meiotic Recombination in High-Density SNP Data.” This paper describes methods and results for comparing the genomes of any two individuals in search of shared genomic regions.

While many methods for determining genomic similarity rely on knowledge of family tree structure (identity by descent methods), the method described by Roberson and Pevsner compares SNPs based on identity by state, which allows them to compare any two individuals without knowledge of the family structure.

Genomic Comparison via Identity by State: A Brief Overview

Identity by State examines pairs of SNPs between two individuals and puts them into one of three categories:

1. Identical: Both individuals have the same genotype call (Ex. AA and AA; BB and BB; AB and AB).

2. One-Allele Shared: Only one call is shared between both individuals (Ex. AA and AB; AB and BB).

3. No alleles shared: No alleles are the same (Ex. AA and BB).
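The three categories above boil down to counting how many alleles two genotype calls have in common. Here is a minimal Python sketch of that idea. The function name `ibs_state` is my own for illustration (it is not taken from the paper or from any program described in this post), and it assumes each call is a clean two-letter string, so no-calls would need to be filtered out first.

```python
from collections import Counter


def ibs_state(genotype_a, genotype_b):
    """Return how many alleles (0, 1, or 2) two genotype calls share by state.

    Hypothetical helper for illustration: assumes two-letter calls such as
    "AG" and ignores allele order, so "AG" and "GA" count as identical.
    """
    if len(genotype_a) != 2 or len(genotype_b) != 2:
        raise ValueError("expected two-letter genotype calls such as 'AG'")
    # The multiset intersection keeps each allele as many times as it
    # appears in BOTH calls, which is exactly the number of shared alleles.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())


if __name__ == "__main__":
    for pair in (("AA", "AA"), ("AA", "AB"), ("AA", "BB")):
        print(pair, ibs_state(*pair))
```

The return value is the number of shared alleles, so identical calls give 2, one shared allele gives 1, and no shared alleles gives 0.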

For individual SNPs, this type of analysis really does not provide any extra information. The real advantage is gained when high-density SNP information is taken for the whole genome (23andMe customers have this, and many other Illumina and Affymetrix platforms cover the whole genome).

In the paper, identical Identity by State (IBS) is referred to as situation 2; one-allele shared is situation 1; and no alleles shared is situation 0. For each chromosome, these three different “situations” can be plotted along an ideogram separated based on IBS. Here are a few images from the paper.

Father V Mother

As we can see, when we plot the various IBS calls across chromosome 10 between two parents (unrelated individuals – as far as we know), there are tons of 2’s and 1’s, and even large numbers of 0’s. The data does not seem to have any rhyme to it. In fact, a distribution of 1’s 2’s and 0’s like this is evidence that the two are very unrelated. I can only see one small region that seems to be devoid of 0’s (arrow), which signifies something special!

Whenever there is a consistent lack of 0’s (a lack of unshared SNPs), it is evidence that the two individuals share actual haplotypes. In the example above, the region without the 0’s does contain a good number of 1’s and 2’s. This indicates that these two individuals share one haplotype in that region. If there were no 1’s in that region, that would be evidence that the individuals shared both haplotypes. The chart to the left summarizes how to interpret what is shared given the IBS calls in a region.

Roberson and Pevsner continue to show what type of IBS call distributions are made by various combinations of relations. For example:

Mother V. Son

Siblings

The results from comparing a mother to her child show that only 2’s and 1’s are expected to be visible (the occasional 0 might slip in due to genotyping error, or more rarely, mutation). This is consistent with the fact that the child receives one of his/her chromosomes from the mother.

The situation when the siblings are compared is a bit more complicated, but also more informative! We can see regions where 1’s, 2’s, and 0’s are all present, which indicates that the children inherited completely different alleles from the parents. Regions where only 1’s and 2’s are present indicate that the same allele was inherited from one parent, but a different allele from the other. Finally, regions where only 2’s are present indicate that the children inherited the same allele from each parent in that region. Transitions from regions with 2,1,0 to 2,1 to 2 and vice versa indicate that a meiotic recombination event has occurred. Very cool!

Exploring What We Can Learn with IBS Analysis

It has been shown pretty well that IBS analysis can be used to look at recombination events and allele sharing between siblings. This is extremely useful. However, IBS Analysis can apply towards a few other items:

  • Examination of Hemizygous Deletions: Normally, parent-child comparisons only present with regions of IBS-2 and 1. We wouldn’t expect any 0’s since the child HAS TO inherit something from the parent. Except, of course, if the parent does not pass on any genes in that region. That is why, if a region of heavy IBS-0 is seen in a parent-child analysis, it indicates that a hemizygous deletion has been passed on from that parent to the child.
  • Identification of Relatives/Potential Inbreeding: 23andMe recently released their relative finder. As a bad scientist and a worse journalist, I have done little investigation into how it works, and even less playing around with it. However, IBS is certainly one way relatives may be identified. Potential inbreeding will not show up as pronounced as relatives because there is likely more distance between people considering marriage than closer relatives, but IBS analysis on a couple who come from the same population produced the following picture:

Father V Mother

The program which I wrote to create the above output colors the chromosome gray in regions with no SNP information, black in regions with no allele sharing, red in regions where one allele is shared, and green in regions where both alleles are shared. Two completely unrelated individuals would have completely black chromosomes. The picture above is of unrelated individuals with a more recent common ancestor, likely due to the fact that they come from the same population.

Implementing IBS Analysis: GenomeRelator!

Again, I got bored around 3 am one night and decided to write a program. Now I’m going to be giving it away for free, because I’m cool like that (I have been told to write the following: If you wish to use this program (or a version of it) commercially, you must contact me). Sweet.

GenomeRelator is the program I have for you. I decided not to write a GUI, but instead to make it double-clickable within a working directory (don’t worry, I’ll cover all this). It does two things:

  1. General IBS Calls: It compares any two genome files and it draws pictures of the actual IBS calls on each chromosome (Green/Red/Black).
  2. IBS Data Smoothing: IBS Calls are not as perfect in real life as they are in the world of published papers (for some reason), but we can look at the frequency of different calls across a fixed number of SNPs and decide whether or not the region contains 0’s, 1’s, or 2’s. It paints a prettier, easier to understand picture this way.

The Program

Download here. It’s a ZIP file with four JAR files inside: GenomeRelator, GenomeRelatorRAW, LaunchGenomeRelator, LaunchGenomeRelatorRAW. To use this program, simply extract the files to a folder, and place both genomes you would like to compare (named file1.txt and file2.txt respectively) into that same folder. Then just double click the proper JAR file.

You will need the most recent Java Runtime Environment on your computer as well. You should always click on the file with Launch in front of the name when using since it helps the computer allocate more memory (otherwise you run out of RAM and nothing happens). Read a little more to find out what the program does, the difference between RAW and normal, and the rules for the input files.

GenomeRelatorRAW compares the two files and processes them according to the IBS calls producing the following picture:

RAW Siblings

This just presents the data to you in a cool picture without really interpreting it, although you can visually interpret quite a bit from it. This takes much longer to render than the other version of the program since each SNP is actually evaluated and then illustrated (23andMe files might take 20 minutes to process!). I find these useful to compare to the output from the regular version of the program.

The regular version, GenomeRelator, produces output that “smooths” the data. The following picture comes from the regular (not RAW) version of the program:

IBS Siblings

Essentially, it moves through the genome 250 SNPs at a time with some preset rules. If a region has greater than 1% IBS-0 and greater than 5% IBS-1, then it is assumed to have 2,1,0 (no relation). If it has less than 1% IBS-0 and greater than 5% IBS-1, then it is assumed to have 2,1 (one allele shared). If it has less than 1% IBS-0 and less than 5% IBS-1, then it is assumed to have 2 (both alleles shared). These numbers, 250, 1%, and 5%, are the variables at play in the program, and hopefully I will come out with a GUI version that allows you to set them yourself.
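To make the smoothing rules concrete, here is a rough Python sketch of the idea. It is not the GenomeRelator source (that is Java), and it makes several assumptions that are mine: windows do not overlap, the "both alleles shared" case means less than 5% IBS-1, any window with more than 1% IBS-0 counts as "no sharing" regardless of its IBS-1 fraction, and values exactly at a threshold fall into the lower category. The names `classify_window` and `smooth` are made up for this example.

```python
def classify_window(ibs_calls, max_ibs0=0.01, max_ibs1=0.05):
    """Summarize one window of per-SNP IBS calls (each 0, 1, or 2).

    Returns 0 (2,1,0 present: no haplotype shared), 1 (only 2,1: one
    haplotype shared), or 2 (only 2: both haplotypes shared).
    The thresholds are the 1% and 5% figures from the post; how ties and
    unlisted combinations are handled is an assumption of this sketch.
    """
    if not ibs_calls:
        raise ValueError("window is empty")
    fraction_ibs0 = ibs_calls.count(0) / len(ibs_calls)
    fraction_ibs1 = ibs_calls.count(1) / len(ibs_calls)
    if fraction_ibs0 > max_ibs0:
        return 0
    if fraction_ibs1 > max_ibs1:
        return 1
    return 2


def smooth(ibs_calls, window=250):
    """Classify consecutive, non-overlapping windows of `window` SNPs.

    The final window may be shorter than `window`; it is still classified.
    """
    return [
        classify_window(ibs_calls[start:start + window])
        for start in range(0, len(ibs_calls), window)
    ]


if __name__ == "__main__":
    # Three made-up windows: all 2's, a mix of 2's and 1's, and all 0's.
    example = [2] * 250 + [2] * 200 + [1] * 50 + [0] * 100
    print(smooth(example))
```

Keeping the window size and both thresholds as parameters makes it easy to experiment with how sensitive the picture is to those three numbers.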

The rules for the input files:

  1. The input files must be called file1.txt and file2.txt, and they must be placed within the same folder as the program (the jar files).
  2. The input files must have one line of headers (no more no less).
  3. The input files must have the same SNPs. For 23andMe customers, I am not entirely sure, but in my experience, male files have contained four extra SNPs (rs3091244, rs1229984, rs4420638, and rs34276300). Remove any SNPs necessary until both files have only the same ones.
  4. The input files must contain four columns in the following order: rsid, chromosome, position, genotype. The genotype column must be one or two letters (yes, I fixed it so that male X chromosomes can survive when only one letter is there).
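If trimming the files by hand sounds painful, rule 3 can be handled with a short script. The sketch below is one possible approach, not part of the program itself. It assumes the files are tab-delimited with exactly one header line and the four columns listed above; the function names and output paths are placeholders I made up.

```python
import csv


def read_genotypes(path):
    """Return (header, {rsid: row}) for a tab-delimited genotype file.

    Assumes one header line followed by rows of
    rsid, chromosome, position, genotype.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = {row[0]: row for row in reader if len(row) >= 4}
    return header, rows


def write_shared(path_a, path_b, out_a, out_b):
    """Write copies of both files containing only SNPs present in both.

    Rows are written in the order they appear in the first file, so the
    two outputs line up row for row. Returns the number of shared SNPs.
    """
    header_a, rows_a = read_genotypes(path_a)
    header_b, rows_b = read_genotypes(path_b)
    shared = [rsid for rsid in rows_a if rsid in rows_b]
    outputs = ((out_a, header_a, rows_a), (out_b, header_b, rows_b))
    for out_path, header, rows in outputs:
        with open(out_path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerow(header)
            writer.writerows(rows[rsid] for rsid in shared)
    return len(shared)
```

Write the results to new files first, check them, and then rename them to file1.txt and file2.txt so the originals are never overwritten.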

Program Output

The program will not only draw a picture, but if you run the regular version, it will also generate a number of files. A text file will be generated for each chromosome. This is just an intermediary file that the program needs in the middle of the process. A file called “IBS Summary” will also be generated for each chromosome. This file contains the list of regions (250 SNPs at a time) by their starting and ending genomic positions (the position in the genotype file). These are useful if you want to more closely examine regions of overlap (or deletion).

No Hosting On Your Own

I’m sorry, but I only want people to be able to download this from me, so just send them a link here if you want to distribute the program to others. It’s free for “academic” use and for “fun” use. No making money. Thanks to any who’ve read and used the program. I’d appreciate any feedback and please report all errors to me so I can do my best to fix them!


Genetic Engineering: Reanimate the Dead and Bring Fantasy to Life

In 1993, Steven Spielberg captivated audiences with his prehistoric thriller, “Jurassic Park.” In the film, scientists use fossilized DNA to bring the long-extinct dinosaurs back to life. The amazing thing about this movie is that the scientific achievement that was a prerequisite to reanimating extinct creatures was not realized until a full three years later.

The Advent of Cloning

In July of 1996, Ian Wilmut and colleagues at the Roslin Institute in Scotland successfully cloned Dolly the Sheep by reprogramming an adult mammary gland cell into an embryo through a process known as somatic cell nuclear transfer. The creation of an organism from an adult cell opened the door for a whole new variety of techniques and processes that have become essential to modern biology and genetics.

Dolly proved to the scientific community that a complete organism can be created from the genetic material obtained from ANYWHERE in the organism’s body. The basic method for “cloning” can be broken into three steps: 1. Obtain DNA that specifies the organism you wish to clone. 2. Transfer DNA into enucleated oocyte for reprogramming. 3. Transfer reprogrammed cell into surrogate mother for implantation and fetal development.

Bring Back the Dead: Consumer Cloning

What did I just say?! It’s not like you think, I swear. You cannot bring back your lost relatives. Even if you clone your dead loved ones, there is no way (currently) to recreate the memories and experiences that will have shaped the person that you once knew. So, even though the clone will look exactly alike, they are not the same person as your loved one. Also, when a cloning takes place, the person is “born” just like any other baby, and they must grow and mature just like any human being. There is currently no way of speeding up the process of growing up.

Is cloning publicly available? Yes, but not for humans. Congress, Republicans and Democrats alike, would have a field day with that one. However, commercial cloning is available for man’s best friend. Up until September of 2009, BioArts International offered commercial cloning of a pet cat or dog to niche consumers for the lofty price of $150,000. However, they recently closed their doors because of, among many reasons, competition from RNL Bio, a South Korean stem cell company that has begun to offer the same service.

As explained by Lou Hawthorne, the CEO of BioArts International, the market size for cloned pets does not seem to be that large right now. However, it is still a developing technology, and one day it may be offered at the right price with the right market exposure. BioArts International successfully cloned seven dogs for consumers while it offered the service. The take-home message: cloning is commercially available, although not popularized just yet.

Bring Back the Long Dead: Extinct Species

Jurassic Park was a visionary movie because it described a scientific process that was not yet possible at its time. However, we are closer now than ever to being able to bring back dead species. Let’s look at the Wooly Mammoth as a case study.

The Wooly Mammoth, also known as the tundra mammoth, is thought to have vanished around 8,000 BCE, likely due to the warming of its climate. Unlike the remains of many extinct species, wooly mammoth remains have in many cases been organically preserved thanks to their frozen environment and the large size of the animal. Organic preservation has allowed scientists to study much of the mammoth’s DNA, and leads many to claim that cloning the mammoth will one day be possible. Despite the preservation of dead mammoths, extracting the DNA and rebuilding the genome is an ongoing process.

How will this cloning occur? Scientists hope that they will be able to salvage intact, preserved DNA from frozen mammoth cells. While more recent attempts at this have not yielded completely salvageable genomes, many parts of the mammoth genome, including a complete mitochondrial DNA sequence, have been determined. From this information alone, it has been concluded that the wooly mammoth is more closely related to the Asian elephant than it is to the African elephant.

Who will be the surrogate mother of the wooly mammoth? The Asian elephant, of course. Although the two species diverged several million years ago, it is suspected that the Wooly Mammoth and the Asian elephant are still genetically similar enough that one can carry the offspring of the other. Scientists at Penn State University believe they have mapped more than 50% of the mammoth genome. Once this is complete, reprogramming and surrogacy will likely allow for cloning.

When will we see this creature roaming the planet? Pretty soon I hope. We’ll be able to pay admission at Jurassic Park, which will likely be located on a remote island in Japan. Just remember to bring your shotgun!

Fantasy Becomes Reality: Creating Unicorns

Genetic engineering will not be limited to the cloning of dead dogs and the rebirth of extinct species. By attaining an understanding of the development of all species, one day, the creation of new species may be possible. Note: If you believe this counts as “playing God” then you should not be reading this blog.

The unicorn will be our case study here. What is a unicorn? Essentially, it is a white horse with a horn on its head. According to a survey of five year old girls, some unicorns have wings, and some have magical rainbows follow them. We’ll stick with a white horn for now.

We already know where to find white horses, but where do we find a horn? Thousands of creatures have horns: deer, antelope, some lizards. After doing some research, I have decided that the horn that best fits the description of the unicorn horn is the horn of a Narwhal. From here, we get the following equation:

White Horse + Narwhal = Unicorn

Unicorn Creation

Okay, so how do we make this equation into reality? One answer: Get yourself a white horse, capture a narwhal, cut off the narwhal’s horn, and glue it onto the horse’s head! Simple. But not what we’re looking for.

Figuring out how to make a unicorn species will be difficult. It will require knowing exactly what set of genes contribute to the development of the narwhal horn, and exactly where these genes would be able to create a horn (in the proper location) for a unicorn. There will not simply be a copy and paste ability, but eventually through experimentation and trial and error, I believe that a unicorn species can be created which develops a horn on its own. Any horse can be the host species (for surrogacy) since the unicorn will be related enough genetically. This technology is a bit further in the future, but I believe it will be a possibility.

Once someone solves the unicorn, we can move on to other legendary creatures like Cerberus, the Sphinx, and all manner of chimeras. Imagine a chihuahua with wings! It will be studies in genomics, cellular reprogramming, and developmental biology that will unlock Pandora’s box and enable legendary creatures to be born.


Use Family SNP Data to Phase Your Own Genome

DNAStairs

Photo by: liber

So I’ve already written a post about the challenges of phasing genotype data, but now I’m here to help you accomplish that task. Let’s go through a checklist of what will be needed:

  • Your personal SNP information (through either 23andMe, Navigenics, deCODEme, etc.)
  • The SNP information for your parents (preferably through the same company/microarray platform)

If you want to use my specific method you also need:

Of course, you can implement your own version of this phasing protocol with basic familiarity in a programming language or (more tediously) with some macros and if statements in Microsoft Excel.

How to Phase Your Genome: A Conceptual Overview

With information from both parents, it is possible to phase your genome (for the vast majority of SNP calls). We rely on the fact that for most situations, you can identify exactly what was inherited from your father and exactly what was inherited from your mother.

For example: If at a particular position, your genotype call is AT, your father’s genotype call is AA, and your mother’s genotype call is TT, then you know that the A must have come from your father, and the T must have come from your mother. Simple! We will refer to situations where phase can be determined as informative.

The chart to the right outlines exactly which situations are informative. The good news is that every situation is informative with the exception of one: when both parents and the child are heterozygous. Here, we are unable to say for certain what allele was inherited from each parent.

A sample implementation of how to phase a child’s DNA is illustrated below:

Phasing Data
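For anyone who would rather roll their own, the per-SNP logic can be sketched in a few lines of Python. This is an illustration of the concept and not the code inside my Java archive. The function name `phase_snp` is invented for this example, it assumes clean two-letter autosomal calls, and it returns None both for the uninformative all-heterozygous case and for calls that are inconsistent with the parents (for example, a genotyping error).

```python
def phase_snp(child, father, mother):
    """Return (paternal_allele, maternal_allele) for one SNP, or None.

    Tries both ways of assigning the child's two alleles to the parents
    and keeps an assignment only if each parent actually carries the
    allele attributed to them. Exactly one valid assignment means the
    SNP is informative.
    """
    first, second = child
    candidates = set()
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if len(candidates) == 1:
        return candidates.pop()
    return None  # ambiguous (all heterozygous) or inconsistent calls


if __name__ == "__main__":
    print(phase_snp("AT", "AA", "TT"))  # the example from this post
    print(phase_snp("AT", "AT", "AT"))  # the one uninformative situation
```

A homozygous child is always phased trivially, since both orderings give the same answer. Male X chromosome calls, where only one allele is present, would need separate handling that this sketch leaves out.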

Implementing this Phasing Strategy: I’m Here to Help

If you have all the files mentioned above and would like to phase your genome, then I am more than happy to provide you with a Java archive that will allow you to accomplish this task. Even more, I will provide detailed instructions as to how to use this archive (it’s really simple, I swear).

You can download the Java program here as a zip file. Once finished, unzip the contents into the same folder. You should see Launcher.jar and PhaseME.jar. To launch the program, click Launcher.jar (I thought that was pretty obvious), and the GUI pictured below should appear:

PhaseME

The input files need to contain four columns in this order: rsid, chromosome, position, genotype. The genotype data needs to simply be AA,  AT, TT, etc. without any slashes or quotation marks.

Before running the program, you do need to make sure that your data is the same length and contains the same SNPs between both parents and the child. I have not incorporated any checks into the program for this. My recommendation is to use an IF statement in Microsoft Excel (version 2007) to make sure that all three files line up. Also, make sure that only one row of headers exists in the files.
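If Excel is not your thing, the same line-up check can be scripted. The following Python sketch is only a suggestion and is not built into the program. It assumes tab-delimited files with a single header row and the rsid in the first column, and `first_mismatch` is a name I made up.

```python
def read_rsids(path):
    """Return the rsid column of a tab-delimited file, skipping one header row."""
    with open(path) as handle:
        next(handle, None)  # skip the single header row
        return [line.split("\t", 1)[0].strip() for line in handle if line.strip()]


def first_mismatch(father_path, mother_path, child_path):
    """Return None if all three files list the same SNPs in the same order.

    Otherwise return the 0-based data row where they first differ. If one
    file simply ends early, that is the length of the shortest file.
    """
    columns = [read_rsids(path) for path in (father_path, mother_path, child_path)]
    for index, rsid_trio in enumerate(zip(*columns)):
        if len(set(rsid_trio)) != 1:
            return index
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        return min(lengths)
    return None
```

A result of None means the files are safe to load. Any number tells you which data row to look at first.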

Finally, select the files to be compared (father, mother, child), and select an output location and choose a name for the outputs. There will be 23 outputs with the following filenames: <yourchosenname>.chr<chromosome>.phased.txt. Each chromosome has its own output that shows you the haplotype inherited from the father and the haplotype inherited from the mother. This program just ignores Y and MT data. However, the program does have the ability to recognize whether the child is male or female, and it assigns the X chromosome haplotypes accordingly.

Let me know about any problems with the program (for example, if it does not produce any output), and I will check whether the problem lies with your input files or with the program itself.


Gattaca: Why fears about Genetically Superior Babies are Unfounded

Gattaca

In 1997, the movie Gattaca made a splash due to its extension of themes covered in “Brave New World.” It takes place in the future where most children are born through a process (not genetic engineering in the scientific sense) where the optimal combination of genes between two individuals is selected in an embryo to give birth to children that are smarter, better looking, and more athletic than the natural, or “faith born” children. The “genetically superior” are put above the faith born, and a class system emerges based on who has better genes.

Personally, I find this movie fascinating for a number of reasons. First of all, the science they describe is not very far outside the realm of current possibilities. Preimplantation Genetic Diagnosis (PGD), the process where embryos are diagnosed for chromosomal abnormalities and certain diseases, is a common practice for couples undergoing in vitro fertilization (IVF). Secondly, the movie highlights the fact that PGD is a technology that couples will want to use. Given the option, who wouldn’t try to avoid having their child carry a negative version of the ApoE gene (increasing the risk of Alzheimer’s), the BRCA gene (increasing the risk for breast cancer), or the multitude of other simple genetic diseases that are out there (cystic fibrosis, Tay-Sachs, hemochromatosis, sickle-cell anemia, Cooley’s anemia, and the list goes on)?

So now we are faced with the question: could the situation described in Gattaca, where a higher class of the genetically superior exists above a lower class of the “faith born,” ever become reality? My answer is NO for a number of reasons.

First of all, the number of embryos that can be diagnosed in one round of PGD is limited by the number of eggs a woman can produce. For me, this is the most convincing argument as to why “genetically optimized” babies will never be a realistic phenomenon. Let’s look at gene selection through PGD. If we know that the mother carries a negative version of the BRCA gene (let’s assume the father does not), then there is a 50% chance (basic Mendelian inheritance) that any embryo created by these two individuals will carry a negative BRCA gene.

Okay, so given two embryos, we would expect there to be one embryo that met the requirement of not carrying a negative BRCA gene. That’s fine. Now let’s add another gene to the list. ApoE: the father carries one copy of the negative ApoE allele. Again, 50% of the embryos will carry a negative ApoE gene. But wait: we’re faced with a problem now. By requiring that the embryo we want carry neither a negative BRCA nor a negative ApoE gene, we are reducing the fraction of embryos that fit the bill to 25%. That means that out of four embryos, statistically we would only expect one of them to be clear of both negative ApoE and BRCA genes.

Now we can see the limitation: selecting for one gene dictates that we need two embryos to choose from. Selecting for two genes dictates that we need 2² = 4 embryos to choose from, and selecting for three genes requires at least 2³ = 8 embryos to choose from.

Because the number of embryos required grows exponentially with the number of genes being selected, PGD is limited by the number of embryos that can be produced. Currently, when a couple goes in for IVF, the woman must undergo hormone therapy to stimulate the production of more eggs. According to one source, on average 5-15 eggs are retrieved with one cycle of treatment from a reproductive endocrinologist. With a limit of up to 15 embryos, it is unlikely that more than four genes can be selected together to produce a “genetically perfect” baby. If you wanted to control the inheritance of a mere 15 genetic loci, the odds of finding an embryo that matched your requirements would be 1/(2^15), or about 0.003%.
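For those who want to play with the numbers, here is a small Python sketch of the arithmetic. It assumes the simplified model used in this post: every gene is inherited independently and each embryo has a 50% chance of carrying the preferred version. The function names are mine, chosen just for this example.

```python
def fraction_passing(num_genes):
    """Chance that a single embryo carries the preferred version of every gene."""
    return 0.5 ** num_genes


def chance_of_at_least_one(num_genes, num_embryos):
    """Chance that at least one of `num_embryos` embryos passes every filter."""
    chance_all_fail = (1 - fraction_passing(num_genes)) ** num_embryos
    return 1 - chance_all_fail


if __name__ == "__main__":
    embryos = 15  # upper end of the 5-15 eggs per cycle quoted above
    for genes in (1, 2, 3, 4, 5, 15):
        print(
            f"{genes:>2} genes: 1 in {2 ** genes} embryos qualifies, "
            f"chance of at least one among {embryos}: "
            f"{chance_of_at_least_one(genes, embryos):.4f}"
        )
```

With 15 embryos and four genes, the chance of finding at least one suitable embryo works out to roughly 62%, and it falls quickly with each gene added. For 15 genes a single embryo qualifies with probability 1/32768, which is about 0.003%.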

In “Gattaca”, the reproductive technologist informed the couple that all genes were optimized. There are thousands upon thousands of genes in the human genome. You have a better chance of winning the lottery than finding one out of 15 embryos that contains every preferred version of every gene from each parent. Rest assured ladies and gentlemen, you will not be seeing any perfect children for quite a while (except for your own of course).
