Ref Gen pub draft (Retired): Difference between revisions

Revision as of 10:29, 5 November 2007

The Reference Genome Annotation Project

Introduction

We are experiencing an explosion of genomic information as more and more genomes are sequenced. However, resources for annotating this growing number of genomes are limited, so automatic annotation will be the method of choice for many of them. At the same time, several model organism databases have groups of trained and highly skilled GO curators. To maximize and optimize the GO annotation of a large set of key genomes (referred to from here on as 'the reference genomes'), the GO Consortium has made it a priority goal to completely annotate 12 reference genomes. These reference genomes are:

Purpose

The Reference Genome GO Annotation Team, with representatives from each genome annotation group, will coordinate annotation, facilitate implementation of GO Consortium annotation priorities, and provide metrics to assess progress toward the goal of broad and deep annotation of the reference genomes. This group represents the annotation expertise within the GO Consortium and provides key liaisons to the model organism databases that have primary responsibility for the annotation of the reference genomes.

Priorities for Annotation

Our ultimate aim is to provide comprehensive GO annotation for all gene products in each of the reference genomes. This is a huge task and requires us to prioritise the targets for curation. Our initial annotation efforts (Aug 20076- Sept 2007) focussed on orthologs of human disease genes, but in Oct 2007 we widened our list to four priority areas:

  • Orthologs of human disease genes
  • Topical or ‘hot’ genes
  • Genes conserved from E. coli to human that currently lack GO annotation
  • Genes involved in (metabolic?) pathway

Each month we curate 5 genes from each area as selected by one of the participating databases.

How does this project differ from standard GO annotation?

The reference genome databases have agreed to follow guidelines that are more stringent than those used for standard annotation:

  • Experimental evidence codes (IDA, IMP, IGI, IPI, IEP) should be used where possible
  • Terms inferred from sequence and structural similarity (ISS) should only be used where the terms are supported by experimental evidence for the similar sequence
  • Non-traceable author statements (NAS) should be avoided
  • No new annotations should be based on traceable author statements (TAS); existing terms assigned with TAS should gradually be replaced with the appropriate experimental evidence code based on the primary literature
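
The guidelines above can be read as a small decision rule per evidence code. The following Python sketch is only an illustration of that reading: the function name `check_evidence`, its parameters, and the status labels are hypothetical and are not part of any GO Consortium tool.

```python
# Illustrative sketch only: names and status labels are assumptions,
# not an official GO Consortium implementation.

# Experimental evidence codes named in the guidelines above.
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP"}


def check_evidence(code, is_new=True, similar_has_experimental=False):
    """Classify an evidence code under the reference genome guidelines.

    code: GO evidence code, e.g. "IDA" or "TAS".
    is_new: True if the annotation is being created now.
    similar_has_experimental: for ISS, whether the term is supported by
        experimental evidence for the similar sequence.
    Returns a (status, note) tuple.
    """
    code = code.upper()
    if code in EXPERIMENTAL_CODES:
        return ("preferred", "experimental evidence code")
    if code == "ISS":
        if similar_has_experimental:
            return ("allowed", "similar sequence has experimental support")
        return ("discouraged", "no experimental support for similar sequence")
    if code == "NAS":
        return ("discouraged", "non-traceable author statements should be avoided")
    if code == "TAS":
        if is_new:
            return ("not allowed", "no new annotations based on TAS")
        return ("replace", "replace with experimental code from primary literature")
    # Codes the guidelines do not mention are left undecided.
    return ("unknown", "not covered by these guidelines")


if __name__ == "__main__":
    for args in [("IDA",), ("ISS", True, False), ("TAS", False)]:
        print(args[0], check_evidence(*args))
```

A curator-facing script could run such a check over each annotation line before submission; how the flags `is_new` and `similar_has_experimental` are determined is left open here.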


Graphic Representation

A colorful graphic representation of the reference genome effort can be viewed in full here:
http://www.geneontology.org/images/RefGenomeGraphs/
There is one graph per curated reference gene. In addition to the graph, there are two informative tables per gene: one lists GO annotations by category, and the other lists the full experimental annotations in each organism for the given gene. This makes it easier to compare curation status across the 12 reference genomes and helps curators identify genes that need attention.

Partial Graph of Gene POLA

Activities

  • Monthly Conference Calls
  • First Reference Genome Annotation Meeting, Princeton, NJ, Sept 26-27, 2007