Sequence Variants and the Genomic Databases: Standardizing the Nomenclature

For those interested in clinical diagnostics of genetic diseases, the ability to use the molecular information presented within our various genomic databases is somewhat limited. If you attempt to locate the common p.Q12X mutation (which causes Adenosine Monophosphate Deaminase Deficiency) within the NCBI Reference Sequence of the AMPD1 gene, you will find that the 12th codon does not correspond to Glutamine. According to the current RefSeq record (NM_000036.2), the mutation is actually p.Q45X. For researchers and clinicians looking for SNP primers or probes for specific mutations, the lack of fidelity and congruency between reported mutations and the “curated” databases is more than a headache. I have personally spent hours chasing down one mutation only to remain uncertain as to whether or not I had located the proper nucleotide in the end.

What is the issue? Did the original researchers not properly locate the mutation? Is the Reference Sequence incorrect? What can be done to increase the ease with which we can not only locate a specific phenotype-associated variant, but determine the nucleotides immediately surrounding that locus as well?

Understanding the Genomic Databases

The National Center for Biotechnology Information (NCBI) provides an excellent summary of the differences between its two major databases: RefSeq and GenBank. The major distinction for those interested in clinical genetics is that RefSeq is curated and GenBank is not. GenBank is essentially a public repository of all DNA sequences made available by researchers, who are responsible for updating and maintaining their own submitted sequences. RefSeq is the database that attempts to provide one sequence as the “Reference Sequence,” usually determined by looking at the most common variants among the GenBank entries. RefSeq sequences are occasionally updated, which is why it is always important to include the accession number (with its version suffix, as in NM_000036.2) when using the Reference Sequence.

RefSeq has greater utility because it provides linked records between the genomic DNA, the mRNA transcript (and cDNA record), and the translated protein. It would seem that locating a non-synonymous mutation should take just a few clicks.

An Arcane Nomenclature System

Okay, I admit, arcane is a bit harsh, but it is time to update our nomenclature system. The Human Genome Variation Society (HGVS) provides the guidelines that are currently considered the best practice in naming sequence variants. They rely on coding DNA (cDNA) and essentially number the cDNA beginning with 1 for the first Adenine in the initiation (ATG) codon. There are obvious drawbacks to using cDNA, the most important being lack of numerical assignments for intronic nucleotides.
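The arithmetic behind this numbering is simple enough to sketch. The Python function below is a hypothetical helper (its name is my own, not part of any HGVS tool) that assumes position 1 is the A of the ATG and that the position lies within the coding sequence.

```python
def codon_for_cdna_position(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, position within codon).

    Assumes c.1 is the A of the initiation ATG, so c.1-c.3 form codon 1.
    """
    if c_pos < 1:
        raise ValueError("coding positions start at 1 (the A of the ATG)")
    codon_number = (c_pos - 1) // 3 + 1
    position_in_codon = (c_pos - 1) % 3 + 1
    return codon_number, position_in_codon


# Coding position 1138 falls on the first base of codon 380.
print(codon_for_cdna_position(1138))
```

This only works inside exons, which is exactly the drawback described above: an intronic base has no codon and no coding position of its own.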

More and more intronic mutations are being found to account for deleterious phenotypes. Yet the original (deprecated) system for naming intron variants looks something like this: IVS3+22G>A (interpretation: the G 22 base pairs from the beginning of intron 3 is changed to an A). There always seems to be some ambiguity when I come across mutations described this way, because the same position can also be counted from the other end of the intron: IVS3-98G>A (the G 98 base pairs from the end of intron 3 is changed to an A).

More recently, it has been recommended that intronic mutations be described according to the closest cDNA location. Our mutation above might now be described as c.195+22G>A, where cDNA nucleotide 195 is the last nucleotide of exon 3 (which immediately precedes intron 3). Or, we might write it as c.196-98G>A.
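To make the structure of these names concrete, here is a minimal Python sketch that splits a simple substitution into its anchor, intronic offset, and bases. The pattern and function name are illustrative assumptions; it deliberately ignores everything else the HGVS guidelines allow (UTR positions, deletions, insertions, and so on).

```python
import re

# Hypothetical pattern: handles only names of the form c.<anchor>[+/-offset]<ref>><alt>.
CODING_SUBSTITUTION = re.compile(
    r"^c\.(?P<anchor>\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(name: str) -> dict:
    """Split a simple coding or intronic substitution into its parts."""
    match = CODING_SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {name!r}")
    return {
        "anchor": int(match["anchor"]),        # nearest exonic nucleotide
        "offset": int(match["offset"] or 0),   # 0 means the variant is exonic
        "ref": match["ref"],
        "alt": match["alt"],
    }


print(parse_coding_substitution("c.195+22G>A"))
print(parse_coding_substitution("c.196-98G>A"))
```

A positive offset counts forward from the last base of the upstream exon, and a negative offset counts backward from the first base of the downstream exon.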

Let’s be practical for a second. This is terrible indexing. If I want to find an intronic mutation, it will be indexed according to the cDNA record of the relevant gene. Fine, but if I search the cDNA for nucleotide 195, the intron has been spliced out of the transcript, so I cannot locate the sequence of interest there. Instead, I have to take the last few base pairs of exon 3 from the cDNA record, open up the gDNA (genomic DNA) record, and search for these base pairs. Then, once I have discovered where this exon ends in the genomic DNA, I must count 22 base pairs forward to locate the nucleotide of interest (this will all be in vain if the mutation was reported incorrectly in the literature).
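The manual search described above can be sketched in a few lines of Python. The sequences here are made up for illustration (lowercase marks a toy intron), and the function name is my own; a real exon tail would need to be long enough to occur only once in the genomic record.

```python
def locate_intronic_base(genomic: str, exon_tail: str, offset: int) -> int:
    """Return the 0-based index in `genomic` of the base `offset` bp past the exon end.

    `exon_tail` is the last few bases of the exon, copied from the cDNA record.
    """
    if offset < 1:
        raise ValueError("offset must count forward from the exon end")
    haystack, needle = genomic.upper(), exon_tail.upper()
    first = haystack.find(needle)
    if first == -1:
        raise ValueError("exon tail not found in the genomic sequence")
    if haystack.find(needle, first + 1) != -1:
        raise ValueError("exon tail matches more than once; use a longer tail")
    exon_last = first + len(needle) - 1    # index of the exon's final base
    target = exon_last + offset
    if target >= len(genomic):
        raise ValueError("offset runs past the end of the genomic sequence")
    return target


# Toy data: exon (uppercase), intron (lowercase), next exon (uppercase).
toy_genomic = "ATGGCCAAGgtaagtccctgacagTTCGAA"
index = locate_intronic_base(toy_genomic, "GCCAAG", 3)
print(index, toy_genomic[index])
```

Checking that the base found matches the reference base in the reported variant is a cheap guard against a misreported mutation.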

The lesson here: although the RefSeq genomic DNA is linked to the cDNA and the protein records, the actual nucleotides/codons/amino acids are not indexed together.

A Standardized Index: Making Sure Mutation Nomenclature is Static

The cDNA naming convention grew out of the fact that the ability to completely sequence genomic DNA for the major part of the 20th century was very limited. It made sense for mutations to be reported based on the cDNA because it reflected the two more easily obtainable (and more static) records of that time: the mRNA transcript and the protein sequence. However, now that the Human Genome Project has paved the way for easier sequencing, and we are constantly improving a standard Reference Sequence, it makes much more sense to name mutations in terms of their genomic DNA (gDNA) location.

As reported by Dalgleish et al. (2010), the Locus Reference Genomic (LRG) sequence format is being developed specifically for the purpose of accurately indexing genomic variants. The LRG will provide a static record for every gene of interest, and this record WILL NOT CHANGE. It may be annotated, etc., but in the interest of preserving reported mutations, the backbone sequence (and therefore the base pair numbering) will not change.

I have been using the Reference Sequence in a similar function whenever I locate a mutation for testing purposes. For example, I was searching for the p.G380R mutation in the FGFR3 gene, which is a common cause of Achondroplasia. The cDNA variant I was interested in was c.G1138A (c.1138G>A in the HGVS style). After searching for the mutation within the gene transcript, I located the surrounding base pairs and determined that the genomic name for the mutation is g.G10458A. In my system, I start with the Adenine of the initiation codon as 1 for the gDNA as well as the cDNA. Thus, the gDNA and cDNA index numbers diverge once the first intron begins. For point mutations, I have written a script that automatically accomplishes this by interacting with the UCSC genome browser. An indexing revelation is what helps this script to work.
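The conversion itself comes down to walking the exon table. The Python sketch below is not the script mentioned above and does not talk to the UCSC genome browser; it assumes you already have the coding exon boundaries expressed in the ATG-equals-1 genomic numbering, and the exon coordinates shown are invented for illustration.

```python
def cdna_to_gdna(c_pos: int, coding_exons: list[tuple[int, int]]) -> int:
    """Convert a 1-based coding position to a 1-based genomic position.

    `coding_exons` lists (start, end) genomic coordinates, inclusive, in
    transcript order, numbered so that g.1 is the A of the initiation ATG.
    """
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    remaining = c_pos
    for start, end in coding_exons:
        exon_length = end - start + 1
        if remaining <= exon_length:
            return start + remaining - 1
        remaining -= exon_length           # skip this exon and the intron after it
    raise ValueError("position lies beyond the coding sequence")


# Hypothetical gene: three coding exons separated by two introns.
toy_exons = [(1, 100), (351, 500), (901, 1000)]
print(cdna_to_gdna(100, toy_exons))   # last base of exon 1: numbers still agree
print(cdna_to_gdna(101, toy_exons))   # first base of exon 2: numbers diverge
```

Within the first exon the two numbering systems are identical; every intron crossed adds its length to the genomic number.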

Developing a Functional, Interactive Reference Assembly

Although the data files are linked between RefSeq gDNA, cDNA and amino acid sequences, the files are not cross-indexed well enough to be truly useful. Ultimately, I would like to be able to browse a cDNA/gDNA sequence, perform a base-pair change, determine if this change is synonymous or non-synonymous, and find out if the change has been associated with any disease phenotypes (which would entail linking individual nucleotides to OMIM records). The development of such a functional variation browser will require a lot of forethought, smart programming, and a great deal of curation. However, I believe that development of the Locus Reference Genomic (LRG) sequence is a step in the right direction.
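The synonymous versus non-synonymous step, at least, is easy to prototype. This Python sketch assumes the standard genetic code and a coding sequence that starts at the ATG; the function name and the toy sequence are my own, and the OMIM lookup is left out entirely.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(cds: str, c_pos: int, alt: str) -> tuple[str, str]:
    """Apply a single-base change at 1-based coding position `c_pos`.

    Returns ("synonymous" or "non-synonymous", protein change), with stop
    codons written as X to match names such as p.Q12X.
    """
    cds = cds.upper()
    index = c_pos - 1
    codon_start = index - index % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if index < 0 or len(ref_codon) != 3:
        raise ValueError("position is not inside a complete codon")
    alt_codon = ref_codon[:index % 3] + alt.upper() + ref_codon[index % 3 + 1:]
    ref_aa = CODON_TABLE[ref_codon].replace("*", "X")
    alt_aa = CODON_TABLE[alt_codon].replace("*", "X")
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, f"p.{ref_aa}{codon_start // 3 + 1}{alt_aa}"


toy_cds = "ATGCAGGGG"  # Met-Gln-Gly, invented for illustration
print(classify_substitution(toy_cds, 4, "T"))
print(classify_substitution(toy_cds, 7, "A"))
print(classify_substitution(toy_cds, 9, "A"))
```

The hard part of the browser I have in mind is not this translation step but the curation: tying each position to a stable index and to the phenotype records that mention it.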


Dalgleish R, Flicek P, Cunningham F, Astashyn A, Tully RE, Proctor G, Chen Y, McLaren WM, Larsson P, Vaughan BW, Béroud C, Dobson G, Lehväslaiho H, Taschner PE, den Dunnen JT, Devereau A, Birney E, Brookes AJ, & Maglott DR (2010). Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Medicine, 2(4). PMID: 20398331

