Phylogenetic Annotation Project

The GO Consortium has established the complete annotation of nine reference genomes as a priority goal. These reference genomes are:

Arabidopsis thaliana
Caenorhabditis elegans
Danio rerio (zebrafish)
Dictyostelium discoideum
Drosophila melanogaster
Escherichia coli
Homo sapiens
Saccharomyces cerevisiae
Mus musculus

The Reference Genome GO Annotation Team, with representatives from each genome annotation group, will coordinate annotation of the nine reference genomes, facilitate implementation of GO Consortium annotation priorities, and provide metrics to assess progress toward the goal of broad and deep annotation of the reference genomes. This group represents the annotation expertise within the GO Consortium and provides key liaisons to the model organism databases that have primary responsibility for the annotation of the reference genomes.

Beyond the organisms listed in the GO grant application, three more model organism databases have agreed to contribute annotations for their organisms using the priorities established by the Reference Genome Working Group and to provide metrics that will be used to monitor both depth and breadth of annotation. These are:

Schizosaccharomyces pombe
Gallus gallus
Rattus norvegicus

Reference Genome Annotation Project Summary

The Reference Genome Gene List and Summary

The spreadsheet is located at

http://dcn.spreadsheets.google.com/ccc?id=o16926456948884040128.4584390909151853752.07000735126025259412.442372083524637957

Access requires your email to be added to the system. Email Rex if you would like to be added.

This spreadsheet contains links to separate spreadsheets maintained by each of the reference genome groups.

Reference Genome Wiki Pages

The following Wiki Pages are available to reference genome participants for discussions:

Reference Genome Database Requirements Discussion

The purpose of this page is to discuss features and requirements that would be desirable in a database used to replace the existing Google Spreadsheet system for managing target genes, their annotations and metrics.

Orthology discussion page

The purpose of this page is to discuss general principles and problems with establishing orthology between reference genome genes and human disease genes.

Index of wiki pages for Reference Genome Genes

The purpose of these pages is to allow discussion of annotation issues related to particular genes. Individual gene pages will be created as needed.

Reference Genome Mailing list

The Reference Genome GO Annotation Team uses an email discussion list to facilitate communication. The list is open to curators of the identified reference genomes involved in reference genome annotation.

To join the list send an email to:

  refgenome-request@geneontology.org

In the body of the message add the line:

  subscribe <insert your email address>

To access the email archive for this mailing list:

http://www.geneontology.org/GO.list.refgenome.shtml

Guidelines for Characterization of Reference Genome Descriptions

All descriptions are based on Sequence Ontology terms.

All counts are necessarily estimates; some can be estimated to the ones digit, while others only to the nearest thousand. There is therefore no need to state the precision of each count separately: the significant digits convey it. Different databases are currently able to provide different portions of this information. The goal is for each database to provide numbers for every one of these categories.

Numbers to be presented:

- CDS: count one per genomic occurrence (or per mRNA? This may need to be refined if the group is annotating proteins rather than genes). Required.

- snoRNA: count one per genomic occurrence

- rRNA: count one per type

- snRNA: count one per genomic occurrence

- tRNA: count one per genomic occurrence

- ncRNA: count one per genomic occurrence and do not double count (i.e. if snoRNA count is supplied, don't double count it here)

- transposable_element: count one per genomic occurrence

- transposable_element_gene: count one per unique mRNA occurrence per transposable_element type

- pseudogene: count one per genomic occurrence
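The counting rules above can be sketched in a few lines of Python. This is only an illustration, not a tool used by the project: the record layout (dictionaries with "so_type", "subtype", "te_type" and "mrna" keys), the function name count_features, and the sample data are all assumptions made for the example. It also assumes each feature carries exactly one Sequence Ontology label, its most specific one, so that a snoRNA is never counted again under ncRNA.

```python
from collections import Counter

# Categories counted once per genomic occurrence (per the guidelines above).
PER_OCCURRENCE = {
    "CDS", "snoRNA", "snRNA", "tRNA", "ncRNA",
    "transposable_element", "pseudogene",
}

def count_features(features):
    """Tally features by Sequence Ontology term.

    `features` is a hypothetical list of dicts; each dict has a
    "so_type" key holding its single, most specific SO label.
    """
    counts = Counter()
    rrna_types = set()      # rRNA: one per type, not per copy
    te_gene_keys = set()    # TE genes: unique mRNA per TE type

    for feature in features:
        so_type = feature["so_type"]
        if so_type in PER_OCCURRENCE:
            # Because each record has one label, a snoRNA record lands
            # here as "snoRNA" only and is not double counted as ncRNA.
            counts[so_type] += 1
        elif so_type == "rRNA":
            rrna_types.add(feature["subtype"])
        elif so_type == "transposable_element_gene":
            te_gene_keys.add((feature["te_type"], feature["mrna"]))

    counts["rRNA"] = len(rrna_types)
    counts["transposable_element_gene"] = len(te_gene_keys)
    return dict(counts)

# Made-up sample data, for illustration only.
sample = [
    {"so_type": "CDS"},
    {"so_type": "CDS"},
    {"so_type": "rRNA", "subtype": "18S"},
    {"so_type": "rRNA", "subtype": "18S"},   # second copy, same type
    {"so_type": "rRNA", "subtype": "5S"},
    {"so_type": "snoRNA"},
    {"so_type": "ncRNA"},
    {"so_type": "transposable_element_gene", "te_type": "TE_A", "mrna": "m1"},
    {"so_type": "transposable_element_gene", "te_type": "TE_A", "mrna": "m1"},
    {"so_type": "transposable_element_gene", "te_type": "TE_A", "mrna": "m2"},
]

counts = count_features(sample)
print(counts)
```

In this sample the two 18S copies contribute a single rRNA type, and the repeated (TE_A, m1) pair is counted once, which is how the "one per type" and "one per unique mRNA occurrence per transposable_element type" rules differ from the per-occurrence categories.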

Minutes of Reference Genome Phone Conferences

RefGenome24Jan07_Phone_Conference.doc