Goal of the Reference Genome Annotation Project

The GO Consortium has established the complete annotation of twelve reference genomes as a priority goal. These reference genomes are:

Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Gallus gallus, Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Schizosaccharomyces pombe

The Reference Genome GO Annotation Team, with representatives from each genome annotation group, will coordinate the annotation of the twelve reference genomes, facilitate implementation of GO Consortium annotation priorities, and provide metrics to assess progress toward the goal of broad and deep annotation of the reference genomes. This group represents the annotation expertise within the GO Consortium and provides key liaisons to the model organism databases that have primary responsibility for the annotation of the reference genomes.

Reference Genome Annotation Project Summary

Reference Genome Web Page Draft

Progress Reports

2008-09-10 RefGen RefGenProgress_2008-09-10

2008-08-13 RefGen RefGenProgress_2008-08-13

2008-07-18 RefGen RefGenProgress_2008-07-18

2008-06-18 RefGen RefGenProgress_2008-06-18

2008-06-04 RefGen RefGenProgress_2008-06-04

2008-04-20 Second Reference Genome Annotation Meeting, Salt Lake City, UT [Minutes]

2007-09-27 First Reference Genome Annotation Meeting, Princeton, NJ [Minutes]

Communication

Reference Genome Mailing list

Conference Calls

Meetings

Electronic jamborees

Annotation Targets

From May 2008

Target Gene List (May 2008-)

Target Gene List August 2006-April 2008

Access requires your email to be added to the system. Email Pascale if you would like to be added.
This spreadsheet contains links to separate spreadsheets maintained by each of the reference genome groups.

Procedure for selection of target genes

Procedure for filling Genome-Specific spreadsheets

Gene Annotation

Annotation_pipeline

By Judy, Suzi, Michael

Annotation Quality control

Annotation QC
Annotation completion SourceForge tracker [http://sourceforge.net/tracker/?group_id=36855&atid=1040173]
Reference_Genome_Database_Reports. These reports are generated with the GOOSE SQL interface and provide lists of potentially mis-annotated genes.

Annotation Consistency Issues

Annotation consistency: IEA, ISS, IC Usage Discussion: Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly VanAuken
Annotation consistency: IDA or IC for processes: Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly VanAuken, Ruth Lovering
Annotation consistency: HTP Annotation of high-throughput experiments, including microarray data (SGD GO HTP guidelines): Stacia Engel, Emily Dimmer, Val Wood, Ruth Lovering
Annotation consistency: Using IEP, including microarray data, and heat shock protein example: Emily Dimmer, Stacia Engel, Pascale Gaudet, Ruth Lovering, Varsha Khodiyar, Val Wood
Annotation consistency: x protein binding and with : Becky Foulger, David Hill
Annotation consistency: xx binding in the context of gene product: example is 'co-enzyme binding' from electronic jamboree : Pascale Gaudet, David Hill, Victoria Petri
Annotation consistency: 'Response to' terms: Check that all databases use evidence codes correctly for those terms. Tanya Berardini, Emily Dimmer, Pascale Gaudet, Ruth Lovering

Improving GO terms and definitions

Annotation consistency: Clarification of oligomerization, dimerization, protein complex assembly: Debby Siegele
Annotation consistency: chaperone activity definition: clarify the definitions of unfolded/misfolded protein binding and add chaperone activity as a synonym to both of the terms. Also add ‘de novo’ synonym to ‘unfolded protein binding’. Victoria Petri

Misused terms

This page provides a list of often misused terms and (hopefully) an explanation as to how to use the term properly. This information should also be included in the 'comments' of the OBO file.

Variant_annotation

This page describes how each database handles curation of multiple forms of the same gene.

Providing annotations to GOA in taxa other than your MOD's

Please follow these instructions if you encounter a gene not from your database that you need to annotate.

Gene Annotation wiki pages

The purpose of these pages is to allow discussion of annotation and orthology issues related to particular genes. The individual gene pages are to be created as needed.

Annotation Progress

Graphical views of the annotations:

Selected refG target sets

PPOD clusters selected since April 2008
Manually curated target sets selected before April 2008

All PPOD clusters with at least one object from each of the twelve refG organisms
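The selection rule named above (keep only clusters that contain at least one object from each of the twelve reference genome organisms) can be sketched as a simple set-coverage check. This is a minimal illustration, not the P-POD implementation: the input layout (a mapping from cluster ID to a list of species and gene ID pairs), the function name, and the example cluster and gene identifiers are all hypothetical.

```python
# The twelve reference genome species listed at the top of this page.
REFERENCE_SPECIES = frozenset({
    "Arabidopsis thaliana",
    "Caenorhabditis elegans",
    "Danio rerio",
    "Dictyostelium discoideum",
    "Drosophila melanogaster",
    "Escherichia coli",
    "Gallus gallus",
    "Homo sapiens",
    "Mus musculus",
    "Rattus norvegicus",
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
})


def clusters_covering_all_species(clusters, required=REFERENCE_SPECIES):
    """Return the IDs of clusters with at least one member from every required species.

    `clusters` maps a cluster ID to a list of (species, gene_id) pairs.
    This layout is an assumption made for illustration; it is not the
    actual P-POD file format.
    """
    selected = []
    for cluster_id, members in clusters.items():
        species_present = {species for species, _gene_id in members}
        # Keep the cluster only if every required species is represented.
        if required <= species_present:
            selected.append(cluster_id)
    return selected


# Made-up example data: cluster_A has one gene per species, cluster_B does not.
example = {
    "cluster_A": [(s, f"gene_{i}") for i, s in enumerate(sorted(REFERENCE_SPECIES))],
    "cluster_B": [("Homo sapiens", "gene_x"), ("Mus musculus", "gene_y")],
}

print(clusters_covering_all_species(example))  # ['cluster_A']
```

Passing a smaller `required` set would give a relaxed version of the same filter, for example clusters shared by a chosen subset of the reference genomes.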

Reference Genomes Metrics | Metrics: Discussion on annotation progress measurements

Orthology determination

Running P-POD orthology tool on the reference genomes gene set

by Kara Dolinski at Princeton - Nov2007

This page contains a description of the project and the requirements for providing files

List of potentially problematic families for all vs. all BLAST methods of orthology determination

SOP for determining ortholog (by database)

The purpose of this page is to discuss general principles and problems with establishing orthology between reference genome genes and human disease genes.

Tools for orthology determination

A summary of tools available to identify orthologs.

GFF3 sequence files for reference genome MODs

Reference_Genome_sequence_annotation

Software/database development

Reference Genome Database Requirements Discussion
Reference Genome Software
Software group

The purpose of this page is to discuss features and requirements that would be desirable in a database used to replace the existing Google Spreadsheet system for managing target genes, their annotations and metrics.

Phylogenetic Annotation Project
