Annotation QC

From GO Wiki
Revision as of 07:50, 22 January 2008 by Pascale (talk | contribs)


The purpose of this page is to find methods to check the quality of the GO annotations. There are four types of errors that we would like to find easily:


Omission of annotations

A gene has no annotations in one of the three ontologies while the corresponding genes in other organisms do (see Reference_Genome_Database_Reports); this also includes ISS annotations that have no entry in the 'with' column. Possible causes:

  • No experimental evidence in the organism: Should try using ISS. We need ways to make the ISS annotations that can safely be transferred easier to find.
  • Original data is old and difficult to find
  • Original data is from non-RG organisms
  • We propose that IEA annotations would count with respect to 'completeness' of annotation (right?)
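
The omission checks described above could be sketched roughly as follows. This is a minimal illustration, not an existing GO tool: it assumes the annotations are available as tab-separated GAF lines (column 2 = gene ID, column 5 = GO ID, column 7 = evidence code, column 8 = 'with', column 9 = aspect), and the function name find_omissions and the count_iea switch are hypothetical. The switch is there because whether IEA counts towards completeness is still an open question. Genes with no annotations at all never appear in the file, so they would have to be found by comparison with a full gene list.

```python
from collections import defaultdict

# GAF aspect codes: P = biological process, F = molecular function,
# C = cellular component.
ALL_ASPECTS = {"P", "F", "C"}


def find_omissions(gaf_lines, count_iea=True):
    """Scan GAF lines for two kinds of omission (hypothetical helper).

    Returns (missing, iss_without_with):
      missing          -- gene ID -> sorted list of aspects with no annotation
      iss_without_with -- list of (gene ID, GO ID) for ISS rows whose
                          'with' column is empty
    """
    aspects_seen = defaultdict(set)
    iss_without_with = []
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # GAF comment/header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete annotation row
        gene_id, go_id = cols[1], cols[4]
        evidence, with_col, aspect = cols[6], cols[7], cols[8]
        seen = aspects_seen[gene_id]  # registers the gene even if nothing counts
        if count_iea or evidence != "IEA":
            seen.add(aspect)
        if evidence == "ISS" and not with_col.strip():
            iss_without_with.append((gene_id, go_id))
    missing = {
        gene: sorted(ALL_ASPECTS - seen)
        for gene, seen in aspects_seen.items()
        if ALL_ASPECTS - seen
    }
    return missing, iss_without_with
```

Calling find_omissions(open_file, count_iea=False) would treat a gene that has only IEA annotations in an ontology as unannotated there, which is one way to compare the two completeness policies.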



Problems in the ontology

When annotations in different organisms are very different, it may reflect problems in the ontology which makes certain terms unusable when curating genes from certain organisms; or it may be due to a complicated branch of the graph that curators have difficulty selecting from.


Varying granularity of annotations

Possible causes:

  • New (more granular) term was created since the annotation was made.

How to address this: Should we warn curators when a more granular term is created and their database has annotations to the parent term?

  • Curator feels they do not have the expertise to annotate a gene.

How to address this: Better communication: SF annotation tracker, email, wiki
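
The warning proposed above for newly created, more granular terms could look something like the sketch below. It is only an illustration under stated assumptions: the function name annotations_to_review is hypothetical, the newly created terms are assumed to be supplied together with their direct parents, annotations are assumed to be (gene ID, GO ID) pairs, and the GO IDs in the usage example are placeholders rather than real terms.

```python
def annotations_to_review(new_term_parents, annotations):
    """Group existing annotations by parent terms that gained a new child.

    new_term_parents -- dict: new GO ID -> set of its direct parent GO IDs
    annotations      -- iterable of (gene ID, GO ID) pairs from one database
    Returns dict: parent GO ID -> {"new_children": [...], "genes": [...]}
    """
    # Invert the mapping so each parent lists its newly created children.
    children_by_parent = {}
    for child, parents in new_term_parents.items():
        for parent in parents:
            children_by_parent.setdefault(parent, set()).add(child)

    report = {}
    for gene_id, go_id in annotations:
        if go_id in children_by_parent:
            entry = report.setdefault(
                go_id,
                {"new_children": sorted(children_by_parent[go_id]), "genes": []},
            )
            entry["genes"].append(gene_id)
    return report


# Placeholder IDs, for illustration only.
new_terms = {"GO:9000002": {"GO:9000001"}}
existing = [("geneA", "GO:9000001"), ("geneB", "GO:9000003")]
for parent, info in annotations_to_review(new_terms, existing).items():
    print(parent, "has new child term(s)", info["new_children"],
          "- review annotations for", info["genes"])
```

Only direct parents are checked here; annotations to more distant ancestors would need the full ontology graph.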



Incorrect annotations

Possible causes:

  • Errors during annotation. How to address this:
  1. See the graphs, and also the queries in Reference_Genome_Database_Reports, in particular "non-IEA outliers"
  2. See also the list of commonly Misused_terms
  • Different interpretations of results.

Some questions that have come up on the annotation list:

1. Doug: For the Ref. Genomes we are annotating the gene p2rx3, a subunit of an ATP-activated cation channel. A paper I have shows that adding this gene to HEK293 cells results in the generation of an inward current in the presence of ATP.

We know (not from this paper) that P2X receptors are a complex of subunits. Should this be annotated as 'contributes_to' ATP-gated cation channel activity by IDA because it is thought to be part of an ion channel complex, or is it not 'contributes_to' because you get the current by introduction of just the single gene product (even if it is forming a homomeric channel complex)?

Introduction of both zebrafish p2rx3 and rat p2rx5 produces a channel with novel properties. What can be done with that? 'protein heterooligomerization' with the rat p2rx5 by IGI?


2. Rama: We have a question about the use of the 'colocalizes with' qualifier. We are curating PMID: 16713564. In the section titled "Separase-Dependent Downregulation of PP2ACdc55 at Anaphase Onset" the authors say that 'Colocalization with Net1 revealed nucleolar enrichment of Cdc55 in metaphase'....

The figure legend for Fig 5A says 'Cdc55 localization in the nucleolus'. Should Cdc55 be annotated to 'nucleolus' directly or to 'colocalizes with nucleolus'? The documentation on how to use this qualifier should be updated with more examples. http://www.geneontology.org/GO.annotation.conventions.shtml#colocalizes_with

Return to [Reference_Genome_Annotation_Project]