Variant annotation

Arabidopsis thaliana

Caenorhabditis elegans

We are starting to see a few examples of isoform-specific functions and/or localization in the C. elegans literature. Where we can confidently match the isoform to a Wormpep protein identifier (e.g., WP:CE25075), we make the annotation specifically to that isoform. If we can't determine from the paper which isoform was used, then by default we make the annotation to all of the protein isoforms.
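The fallback rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not the curation tool itself: the function name annotation_targets and the second identifier in the usage line (WP:CE00000) are hypothetical.

```python
def annotation_targets(gene_isoforms, matched_isoform=None):
    """Return the protein identifiers an annotation should be attached to.

    gene_isoforms   -- all known protein isoform IDs for the gene
    matched_isoform -- the isoform confidently identified in the paper, if any
    """
    if matched_isoform is not None:
        if matched_isoform not in gene_isoforms:
            raise ValueError(f"{matched_isoform} is not an isoform of this gene")
        # The paper pins down one isoform: annotate only that one.
        return [matched_isoform]
    # The isoform cannot be determined: by default annotate all isoforms.
    return list(gene_isoforms)


# WP:CE00000 is a made-up placeholder used only for illustration.
isoforms = ["WP:CE25075", "WP:CE00000"]
specific = annotation_targets(isoforms, "WP:CE25075")
fallback = annotation_targets(isoforms)
```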

Danio rerio

We almost never have enough info to curate to the level of a splice variant. Our annotations are applied at the level of the gene.

Dictyostelium discoideum

So far we have only a few genes and publications that describe splice variants, and those papers never describe different functions for the different variants. Hence, we currently don't capture annotations to different variants of gene products.

Drosophila melanogaster

At the moment we attach all GO terms to genes. We are in the process of figuring out how to change our curation method to annotate proteins. In the meantime we make an internal note for papers that describe isoform-specific info so that we can revisit them.

Escherichia coli

Gallus gallus

We are using UniProtKB accession IDs wherever possible and this allows us to annotate specific isoforms if required.

Homo sapiens

The human group annotates to UniProtKB accessions. When a paper provides isoform-specific information, this can be captured using the appropriate UniProt isoform identifier, e.g. Q4VCS5-1 or Q4VCS5-2. When isoform-specific information is not provided, the annotation is made only to the top-level UniProt accession, e.g. Q4VCS5.
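As a rough illustration of the two identifier forms, the following Python sketch separates a top-level accession from an optional isoform suffix. The helper name split_uniprot_id is hypothetical, and the sketch assumes the suffix is always a hyphen followed by digits, as in the examples above.

```python
def split_uniprot_id(identifier):
    """Split 'Q4VCS5-2' into ('Q4VCS5', 2); a bare accession gives (accession, None)."""
    accession, sep, isoform = identifier.partition("-")
    if not sep:
        # No hyphen: a top-level accession with no isoform specified.
        return accession, None
    if not isoform.isdigit():
        raise ValueError(f"unexpected isoform suffix in {identifier!r}")
    return accession, int(isoform)


isoform_specific = split_uniprot_id("Q4VCS5-2")
top_level = split_uniprot_id("Q4VCS5")
```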

Mus musculus

For each annotation, MGI has a "notes field" that is not available to the public. That note has a structure as follows:

evidence:
anatomy:
cell type:
gene product:
qualifier:
target:
external ref:
text:

If a paper specifies a particular isoform, the appropriate refseq is entered into the "gene_product" field.

E.g., for the annotation of MGI:1341722, Kcnh2, to GO:0005886, plasma membrane, by IDA, the field would look like:

gene_product:SPKW:O35219-1
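A note with this structure could be read into a dictionary roughly as follows. This is a sketch under assumptions, not MGI's own tooling: it assumes one "name:value" pair per line, and it treats "gene_product" and "gene product" as the same field, since both spellings appear above. The name parse_note is hypothetical.

```python
NOTE_FIELDS = (
    "evidence", "anatomy", "cell type", "gene product",
    "qualifier", "target", "external ref", "text",
)


def parse_note(note):
    """Parse a structured note into a dict keyed by field name."""
    fields = {name: "" for name in NOTE_FIELDS}
    for line in note.splitlines():
        # Split on the first colon only, so values such as
        # 'SPKW:O35219-1' keep their own internal colon.
        key, sep, value = line.partition(":")
        key = key.strip().lower().replace("_", " ")
        if sep and key in fields:
            fields[key] = value.strip()
    return fields


note = "evidence:\ngene_product:SPKW:O35219-1\ntext:"
parsed = parse_note(note)
```

Here parsed["gene product"] holds the isoform entry, and fields missing from the note remain empty strings.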

We presently have only about 300 of these with experimental evidence codes, all annotated after the adoption of the structured notes. So QC has to be done for some. Annotations done prior to that will not have any entry, as we had no way of capturing the data. We are looking at ways to "back annotate" by identifying references already used for GO annotation at MGI in which multiple isoforms are identified.

Rattus norvegicus

There are not many splice variants currently in the database. Those that are present have their own DB:ID. Each takes the symbol of the parent gene followed by an underscore and "v" plus a variant number, then "variant of" and the parent symbol in parentheses, with that symbol hyperlinked to the report page of the parent gene. Example: geneX_v1 (variant of geneX). The variants can also be accessed from the top-level gene. Where applicable, the variants may have some mapping, sequence, and other external database links. They seldom have annotations. Occasionally the information in the literature allows for annotation of the splice variants, but that is rather rare.
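The naming convention can be written out as a one-line formatter. The function name variant_label is hypothetical, and the sketch covers only the display text, not the hyperlink to the parent gene report page.

```python
def variant_label(parent_symbol, variant_number):
    """Build the display label for a splice variant of a parent gene."""
    return f"{parent_symbol}_v{variant_number} (variant of {parent_symbol})"


label = variant_label("geneX", 1)
```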

Saccharomyces cerevisiae

We have very few documented cases of splice variants, and our database structure currently cannot display splice variants. So, we do not at this time annotate splice variants. We are working on a database restructure so that we can represent different splice variants, but it has not yet been implemented.