Reference Genome Progress Report for December 2008

Members

Group       Contact Person
Manager     Pascale Gaudet, Northwestern University
SGD         Stacia Engel
MGI         Li Ni
FlyBase     Susan Tweedie
dictyBase   Pascale Gaudet
E. coli     Jim Hu
TAIR        Tanya Berardini
WormBase    Kimberly Van Auken
S. pombe    Val Wood
RGD         Victoria Petri
Human       Emily Dimmer
Zebrafish   Doug Howe
Chicken     Fiona McCarthy
P-POD       Kara Dolinski
Panther     Paul Thomas


Target genes

  • There are currently 500 gene sets in the Targets list, corresponding to a total of approximately 4,000 genes.
  • Selection of genes: Since April 2008, we have selected genes from lists generated by the P-POD tool (Heinicke S, Livstone MS, Lu C, Oughtred R, Kang F, Angiuoli SV, White O, Botstein D, Dolinski K: The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists. PLoS ONE 2007, 2:e766).
  • Curation priorities: Since November 2007, targets have no longer been limited to disease genes. We select 20 genes, 5 in each of 4 categories: (1) disease genes, (2) 'hot genes', (3) metabolic pathways, (4) conserved but uncharacterized genes.


Annotation Progress on selected genes for 2008

Organism        Number of selected     Number of selected genes     Number of selected genes curated,
                genes with orthologs   curated based on EXP data    total (excluding IEA and ND)
Arabidopsis     333                    174                          174
C. elegans      167                    91                           91
Chicken         123                    3                            9
Human           268                    177                          192
Mouse           148                    123                          134
S. cerevisiae   182                    162                          167
Drosophila      168                    68                           117
Rat             230                    115                          164
Danio rerio     195                    23                           24
Dictyostelium   154                    29                           104
S. pombe        137                    123                          131
E. coli         62                     53                           54
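
The counts above can be turned into per-organism coverage figures. The sketch below is only an illustration: it assumes that "coverage" means curated genes divided by selected genes with orthologs, which is our reading of the table rather than a definition given in this report. The names `progress` and `coverage` are made up for the example.

```python
# Counts copied from the table above, per organism:
# (selected genes with orthologs, curated based on EXP data,
#  curated total excluding IEA and ND)
progress = {
    "Arabidopsis": (333, 174, 174),
    "C. elegans": (167, 91, 91),
    "Chicken": (123, 3, 9),
    "Human": (268, 177, 192),
    "Mouse": (148, 123, 134),
    "S. cerevisiae": (182, 162, 167),
    "Drosophila": (168, 68, 117),
    "Rat": (230, 115, 164),
    "Danio rerio": (195, 23, 24),
    "Dictyostelium": (154, 29, 104),
    "S. pombe": (137, 123, 131),
    "E. coli": (62, 53, 54),
}


def coverage(with_orthologs, curated):
    """Curated genes as a percentage of selected genes with orthologs (assumed definition)."""
    return 100.0 * curated / with_orthologs


for organism, (n_orthologs, n_exp, n_total) in progress.items():
    print(
        f"{organism:15s} "
        f"EXP {coverage(n_orthologs, n_exp):5.1f}%  "
        f"total {coverage(n_orthologs, n_total):5.1f}%"
    )
```

A large gap between the EXP and total percentages (for example in Dictyostelium) indicates that much of the curation for that organism currently rests on non-experimental evidence codes.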


Annotation Quality Control

We are trying to address the issue of quality control of the annotations. Some of the concerns are:

  • Omission of annotations
  • Errors in annotations
  • Absence of a 'with' entry for ISS annotations, or a 'with' object that is not experimentally characterized
  • Overannotation with ISS to process terms
  • Problems in the ontology that can become evident when comparing annotations from different species
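
One of the concerns above, ISS annotations with no 'with' entry, can be checked mechanically. The following is a minimal sketch, not the group's actual QC procedure. It assumes annotations are available as tab-delimited gene association (GAF) lines, in which column 7 holds the evidence code and column 8 the With (or) From value. The function name `iss_missing_with` and all identifiers in the sample data are hypothetical.

```python
def iss_missing_with(gaf_lines):
    """Yield (object_id, go_id) for ISS annotations whose With/From column is empty."""
    for line in gaf_lines:
        # Skip comment/header lines (they start with "!") and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # too short to hold the With/From column
        evidence = cols[6]           # column 7: evidence code
        with_from = cols[7].strip()  # column 8: With (or) From
        if evidence == "ISS" and not with_from:
            yield cols[1], cols[4]   # column 2: object ID, column 5: GO ID


# Hypothetical sample records; none of these identifiers are real.
sample = [
    "!gaf-version: 1.0",
    "\t".join(["DB", "GENE1", "sym1", "", "GO:0000001", "REF:1", "ISS", "", "P"]),
    "\t".join(["DB", "GENE2", "sym2", "", "GO:0000002", "REF:2", "ISS", "DB:GENE9", "P"]),
    "\t".join(["DB", "GENE3", "sym3", "", "GO:0000003", "REF:3", "IDA", "", "F"]),
]

flagged = list(iss_missing_with(sample))
print(flagged)
```

This covers only the missing-'with' case. Checking whether the 'with' object is itself experimentally characterized would require looking up that object's own annotations, which the sketch does not attempt.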

To address annotation consistency issues, we have held two "Electronic Annotation Jamborees", for which each group annotates two genes in advance and the annotations are then discussed. Minutes are at http://wiki.geneontology.org/index.php/Electronic_jamborees


Software development

Currently, the target genes and their annotation status are captured using Google spreadsheets. Target genes and links to every group's annotation status page can be found at http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw

(1) Ortho set curation status: Siddhartha Basu, Chris Mungall, Seth Carbon and Mary Dolan are working on a database and a tool in which target genes (ortho sets) and their curation status will be maintained. A demo version of the tool is available at http://berkeleybop.org/RefG/RefGenome.html

(2) Graphical displays will be in the next version of AmiGO.

(3) Panther (Paul Thomas's group, SRI) is developing a tool to visualize protein families together with the available annotations. The tool will be used to extend predictions of gene function to entire gene families when the experimental evidence is sufficiently strong across a wide range of organisms. Kara Dolinski and Pascale Gaudet will be overseeing the tree-based annotation.


Generating Ortholog sets

We have started to work on a new method for generating ortholog sets based on phylogenetic trees using the Panther software (Mi H, Guo N, Kejariwal A, Thomas PD: PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 2007, 35:D247-252).

We continue to use P-POD as a quality control measure and to define smaller subfamilies when necessary (Heinicke S, Livstone MS, Lu C, Oughtred R, Kang F, Angiuoli SV, White O, Botstein D, Dolinski K: The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists. PLoS ONE 2007, 2:e766).

Communication

The reference genome group holds a monthly phone conference. Minutes can be found at http://wiki.geneontology.org/index.php/Conference_Calls.

One Reference Genome meeting was held, in April 2008 in Salt Lake City. Minutes: http://wiki.geneontology.org/index.php/Reference_Genome_Meeting_Minutes_April_2008

We have held two "Electronic Annotation Jamborees", for which each group annotates two genes in advance and the annotations are then discussed. Minutes are at http://wiki.geneontology.org/index.php/Electronic_jamborees