Reference Genome Annotation Project Summary
GO Reference Genome Annotation Team Report, 25 April 2006. This report outlines the plans of the GO annotation team for Reference Genomes.
- What is the group’s purpose?
- The GO Consortium has established the complete annotation of nine reference genomes as a priority goal. These reference genomes are Escherichia coli, Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, Danio rerio, Dictyostelium discoideum, and Mus musculus. The Reference Genome GO Annotation Team, with representatives from each genome annotation group, will coordinate annotation, facilitate implementation of GO Consortium annotation priorities, and provide metrics to assess progress toward the goal of broad and deep annotation of the reference genomes.
- What makes this group necessary and unique?
- This group will be responsible for coordinating the annotation of the nine reference genomes. It represents the annotation expertise within the GO Consortium and provides key liaisons to the model organism databases (MODs) that have primary responsibility for the annotation of the reference genomes.
- What is the lifespan?
- The availability of fully annotated genomes is essential for the transfer of GO annotations to other genomes. The reference genome working group will be active as long as the complete annotation of reference genomes remains a priority for the GO Consortium.
The Coordinator of Reference Genome Annotation will coordinate the activities of the group. Rex Chisholm is currently assigned this role. Coordination will occur through regular phone conferences and through working group meetings coincident with other Consortium activities (Annotation Camps, GOC meetings). The working group will also interface with the software development working group to develop tracking tools that aid in monitoring progress in Reference Genome annotation.
- What are the key deliverables of this group?
- Perform broad and deep annotation of the nine reference genomes as described in Section A1.
- Establish metrics that enable monitoring progress toward the goal of broad and deep annotation.
- Collect annotation metrics from reference genome databases.
- Produce reports that document progress toward the goal of complete annotation of reference genomes.
- What criteria are used to set priorities?
- Currently, annotation of human genes involved in disease and their orthologs in reference genomes is the top priority.
- A list of human disease-related genes that represent annotation targets for the reference genomes will be generated.
The working group consists of the GO annotation coordinator and GO curators for each of the reference genomes. Currently this includes Pascale Gaudet (dictyBase), Petra Fey (dictyBase), Susan Tweedie (FlyBase), Tanya Berardini (TAIR), David Hill (MGI), Victoria Petri (RGD), Mary Shimoyama (RGD), Doug Howe (ZFIN), Evelyn Camon (GOA), Emily Dimmer (GOA), Michelle Giglio (Microbial Genomes), Karen Christie (SGD), and Rex Chisholm.
The GO annotation team will establish a regular schedule of phone conferences, most likely monthly. The frequency of phone conferences will be reviewed regularly to ensure that it meets the needs of the group. Communication will also occur via email and through face-to-face meetings, most likely at the GO Annotation Camp and GO meeting.
Metrics of success
The primary metric of success will be progress toward the goal of complete annotation of reference genomes. An important aspect of the activities of this working group is the development of metrics to monitor both breadth and depth of genome annotation. Progress on these metrics will be the major measure of success.
- Fortnightly GO Annotation team phone conferences
- Quarterly reports to GO Directors
- Coordination with GO outreach working group
- Coordination with MODs responsible for annotation of reference genomes
The detailed process will be established by consensus of the working group itself. However, two areas will receive attention first.
The first of these is the establishment of metrics. Starting from the metrics described in the GO grant application, the working group will establish a process to feed the necessary statistics to the GO servers. In addition, a task force from the working group will work with the software developers/systems team to develop a means of regularly updating these statistics, using a process similar to the way GO currently collects annotation statistics.
The second area will be the coordinated annotation of genes important for human disease and their orthologs in the non-human reference genomes. Early discussions with a subset of the group have suggested starting with mouse (MGI), rat (RGD), and human (GOA) annotators producing annotations for papers selected for their relevance to a particular class of human diseases, such as neurological disease. Once genes have been identified in these papers, they would be passed along to the MODs responsible for the reference genomes, which would be expected to coordinate the curation of the orthologs of these genes in their databases. Once this process is complete, a further goal would be to assert that the annotations for these genes are “complete” across all of the reference genomes. An additional metric might be the number of human disease genes that have complete annotations across all reference genomes.
By cycling through this process several times it should be possible to improve the quality of the coordination and the process itself.
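The additional metric suggested above, the number of human disease genes with complete annotations across all reference genomes, could be computed along the following lines. This is only a minimal sketch under stated assumptions: the function name, the genome labels, and the gene identifiers are hypothetical. It assumes each target gene and its orthologs share one identifier, and that each database reports the set of identifiers it considers "complete". How to treat genomes that lack an ortholog is left open here.

```python
from typing import Dict, Iterable, Set


def count_fully_annotated(
    complete_by_genome: Dict[str, Set[str]],
    target_genes: Set[str],
    genomes: Iterable[str],
) -> int:
    """Count target genes marked "complete" in every listed genome.

    complete_by_genome maps a genome label to the set of gene identifiers
    that the responsible database has asserted to be completely annotated.
    A genome with no entry is treated as having no complete genes.
    """
    genome_list = list(genomes)
    return sum(
        1
        for gene in target_genes
        if all(gene in complete_by_genome.get(g, set()) for g in genome_list)
    )


# Hypothetical example data: three genomes and three disease gene identifiers.
example_genomes = ["human", "mouse", "fly"]
example_complete = {
    "human": {"G1", "G2", "G3"},
    "mouse": {"G1", "G2"},
    "fly": {"G1"},
}
example_targets = {"G1", "G2", "G3"}

fully_annotated = count_fully_annotated(
    example_complete, example_targets, example_genomes
)
print(fully_annotated)  # only G1 is complete in all three example genomes
```

Reporting this count at each cycle, alongside the size of the target list, would give one simple way to track the metric over time.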
Question: are we addressing curator consistency anywhere? Or is that another working group's responsibility? It seems to me, by its position at the interface of the nine genomes, this group might want to address inter-database curation consistency.