Difference between revisions of "Variant annotation"
Revision as of 12:36, 13 November 2007
We are starting to see a few examples of isoform-specific functions and/or localization in the C. elegans literature. Where we can confidently match the isoform to a Wormpep protein identifier (e.g., WP:CE25075), we make the annotation specifically to that isoform. If we can't determine from the paper which isoform was used, then by default we make the annotation to all of the protein isoforms.
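The fallback rule above can be sketched in a few lines of Python. This is only an illustration, not the curation software itself: the function name `annotation_targets` and the second identifier in the example are hypothetical.

```python
def annotation_targets(gene_isoforms, identified_isoform=None):
    """Return the protein identifiers an annotation should be attached to.

    gene_isoforms: all known protein isoform IDs for the gene.
    identified_isoform: the isoform confidently matched from the paper,
    or None if the paper does not let us determine it.
    """
    if identified_isoform is not None:
        if identified_isoform not in gene_isoforms:
            raise ValueError(f"{identified_isoform} is not an isoform of this gene")
        # Isoform known: annotate only that isoform.
        return [identified_isoform]
    # Isoform unknown: default to every isoform of the gene.
    return list(gene_isoforms)


# WP:CE25075 is the example from the text; WP:CE_OTHER is a made-up placeholder.
isoforms = ["WP:CE25075", "WP:CE_OTHER"]
specific = annotation_targets(isoforms, "WP:CE25075")
fallback = annotation_targets(isoforms)
```

Here `specific` holds only the matched isoform, while `fallback` holds every isoform of the gene.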
We almost never have enough info to curate to the level of a splice variant. Our annotations are applied at the level of the gene.
So far we have only a few genes and publications that describe splice variants, and the papers never describe different functions for the different variants. Hence, we currently don't capture annotations to different variants of gene products.
At the moment we attach all GO terms to genes. We are in the process of figuring out how to change our curation method to annotate proteins. In the meantime, we make an internal note for papers that describe isoform-specific information so that we can revisit them.
We are using UniProtKB accession IDs wherever possible and this allows us to annotate specific isoforms if required.
The human group annotates to UniProtKB accessions. When a paper provides isoform-specific information, this can be captured using the appropriate UniProt isoform ID, e.g. Q4VCS5-1 or Q4VCS5-2. When isoform-specific information is not provided, the annotation is made only to the top-level UniProt accession, e.g. Q4VCS5.
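As a rough illustration of the two identifier forms, the sketch below splits an ID into its top-level accession and an optional isoform number. The function name `split_isoform_id` is hypothetical, and the pattern is deliberately loose: it is an assumption for this example, not the official UniProtKB accession format.

```python
import re

# Loose pattern: an alphanumeric accession, optionally followed by "-<number>".
# This is a simplification for illustration, not the official accession regex.
ISOFORM_ID = re.compile(r"^(?P<accession>[A-Z0-9]+)(?:-(?P<isoform>\d+))?$")


def split_isoform_id(identifier):
    """Split 'Q4VCS5-2' into ('Q4VCS5', 2) and 'Q4VCS5' into ('Q4VCS5', None)."""
    match = ISOFORM_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    isoform = match.group("isoform")
    return match.group("accession"), (int(isoform) if isoform is not None else None)


isoform_specific = split_isoform_id("Q4VCS5-2")
top_level = split_isoform_id("Q4VCS5")
```

An isoform number of `None` marks an annotation made to the top-level accession only.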
For each annotation, MGI has a "notes field" that is not available to the public. That note has a structure as follows:
evidence: anatomy: cell type: gene product: qualifier: target: external ref: text:
If a paper specifies a particular isoform, the appropriate RefSeq ID is entered into the "gene_product" field.
E.g., for the annotation of MGI:1341722, Kcnh2, to GO:0005886, plasma membrane, by IDA, the field would look like:
We presently have only about 300 of these with experimental evidence codes, all annotated after the adoption of the structured notes, so QC has to be done for some of them. Annotations made before then have no entry, as we had no way of capturing the data. We are looking at ways to "back annotate" by identifying references used for GO annotation at MGI in which multiple isoforms are identified.
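For QC of the structured notes, something like the following sketch could pull the individual fields out of a note. It assumes the note is a single line with the field labels in the order listed above, each followed by a colon; the actual storage format at MGI may differ. The function name `parse_note` and the value `REFSEQ_ID_PLACEHOLDER` are hypothetical and do not correspond to a real record.

```python
# Field labels in the order given in the note structure above.
FIELDS = ["evidence", "anatomy", "cell type", "gene product",
          "qualifier", "target", "external ref", "text"]


def parse_note(note):
    """Split a structured note into a dict of field name -> value (may be empty)."""
    spans = []
    search_from = 0
    for name in FIELDS:
        label = name + ":"
        start = note.find(label, search_from)
        if start == -1:
            raise ValueError(f"missing field label: {label!r}")
        spans.append((name, start, start + len(label)))
        search_from = start + len(label)

    parsed = {}
    for i, (name, _, value_start) in enumerate(spans):
        # A value runs until the next label begins, or to the end of the note.
        value_end = spans[i + 1][1] if i + 1 < len(spans) else len(note)
        parsed[name] = note[value_start:value_end].strip()
    return parsed


# Hypothetical note: only the gene product field is filled in.
note = ("evidence: anatomy: cell type: gene product: REFSEQ_ID_PLACEHOLDER "
        "qualifier: target: external ref: text:")
fields = parse_note(note)
has_isoform = bool(fields["gene product"])
```

Notes whose "gene product" value is empty are the ones that would need back annotation.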
There are not many splice variants currently in the database. Each one has its own DB:ID and takes the symbol of the parent gene followed by an underscore, "v" and the variant number, then "variant of" and the parent symbol in parentheses, with the symbol hyperlinked to the report page of the parent gene. Example: geneX_v1 (variant of geneX). The variants can also be accessed from the top-level gene. Where applicable, a variant may have its own mapping, sequence, or links to external databases. Variants seldom have annotations: the information in the literature occasionally allows annotation of a splice variant, but that is rather rare.
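The naming convention can be expressed as a small sketch. The function names are hypothetical, and the hyperlink to the parent gene's report page is left out.

```python
import re

# Matches "<parent symbol>_v<number>", e.g. "geneX_v1".
VARIANT_SYMBOL = re.compile(r"^(?P<parent>.+)_v(?P<number>\d+)$")


def variant_label(parent_symbol, variant_number):
    """Build the display label for a splice variant of parent_symbol."""
    return f"{parent_symbol}_v{variant_number} (variant of {parent_symbol})"


def parent_of(variant_symbol):
    """Recover the parent gene symbol from a variant symbol such as 'geneX_v1'."""
    match = VARIANT_SYMBOL.match(variant_symbol)
    if match is None:
        raise ValueError(f"not a variant symbol: {variant_symbol!r}")
    return match.group("parent")


label = variant_label("geneX", 1)
parent = parent_of("geneX_v1")
```

With the example from the text, `label` is "geneX_v1 (variant of geneX)" and `parent` is "geneX".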
We have very few documented cases of splicing or processing variants, and our database structure currently cannot display variant gene product forms. So, we do not at this time annotate variants. We annotate only one gene product per gene. We are working on a database restructure so that we can represent different variants, but it has not yet been implemented.