Reference Genome Progress Report for December 2008
Members
Group | Contact Person |
---|---|
Manager | Pascale Gaudet, Northwestern University |
SGD | Stacia Engel |
MGI | Li Ni |
FlyBase | Susan Tweedie |
dictyBase | Pascale Gaudet |
E.coli | Jim Hu |
TAIR | Tanya Berardini |
WormBase | Kimberly Van Auken |
S. pombe | Val Wood |
RGD | Victoria Petri |
Human | Emily Dimmer |
Zebrafish | Doug Howe |
Chicken | Fiona McCarthy |
P-POD | Kara Dolinski |
Panther | Paul Thomas |
Target genes
- There are currently 500 gene sets in the Targets list, corresponding to a total of approximately 4,000 genes.
- Selection of genes: Since April 2008, we have selected genes from a list generated by the P-POD tool (Heinicke S, Livstone MS, Lu C, Oughtred R, Kang F, Angiuoli SV, White O, Botstein D, Dolinski K: The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists. PLoS ONE 2007, 2:e766).
- Curation priorities: Since Nov 2007, targets are no longer limited to disease genes. We select 20 genes, 5 in each of 4 categories: (1) disease genes, (2) 'hot genes', (3) metabolic pathways, (4) conserved but uncharacterized genes.
Annotation Progress on selected genes for 2008
Organism | Number of selected genes with orthologs | Number of selected genes curated based on EXP data | Number of selected genes curated total (excluding IEA and ND) |
---|---|---|---|
Arabidopsis | 333 | 174 | 174 |
C. elegans | 167 | 91 | 91 |
Chicken | 123 | 3 | 9 |
Human | 268 | 177 | 192 |
Mouse | 148 | 123 | 134 |
S. cerevisiae | 182 | 162 | 167 |
Drosophila | 168 | 68 | 117 |
Rat | 230 | 115 | 164 |
Danio rerio | 195 | 23 | 24 |
Dictyostelium | 154 | 29 | 104 |
S. pombe | 137 | 123 | 131 |
E. coli | 62 | 53 | 54 |
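The counts in this table are easier to compare across organisms as percentages of the selected genes with orthologs. The following is a minimal sketch of that arithmetic, using the figures from the table; the names `ROWS`, `coverage`, and `COVERAGE` are illustrative and not part of any Reference Genome tool.

```python
# Rows copied from the table above:
# (organism, selected genes with orthologs, curated from EXP data, curated total)
ROWS = [
    ("Arabidopsis", 333, 174, 174),
    ("C. elegans", 167, 91, 91),
    ("Chicken", 123, 3, 9),
    ("Human", 268, 177, 192),
    ("Mouse", 148, 123, 134),
    ("S. cerevisiae", 182, 162, 167),
    ("Drosophila", 168, 68, 117),
    ("Rat", 230, 115, 164),
    ("Danio rerio", 195, 23, 24),
    ("Dictyostelium", 154, 29, 104),
    ("S. pombe", 137, 123, 131),
    ("E. coli", 62, 53, 54),
]


def coverage(rows):
    """Map each organism to (percent curated from EXP, percent curated total)."""
    return {
        name: (round(100 * exp / n, 1), round(100 * total / n, 1))
        for name, n, exp, total in rows
    }


COVERAGE = coverage(ROWS)

if __name__ == "__main__":
    for name, (exp_pct, total_pct) in COVERAGE.items():
        print(f"{name}: {exp_pct}% EXP, {total_pct}% total")
```

Percentages are taken relative to the "with orthologs" column, since only those genes are candidates for curation in a given organism.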
Annotation Quality Control
We are trying to address the issue of quality control of the annotations. Some of the concerns are:
- Omission of annotations
- Errors in annotations
- Absence of a 'with' entry for ISS annotations, or a 'with' object that is not experimentally characterized
- Overannotation with ISS to process terms
- Problems in the ontology that can become evident when comparing annotations from different species
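One of the concerns above, ISS annotations that lack a 'with' entry, lends itself to an automated check. The sketch below assumes annotations are available as tab-delimited GO annotation file (GAF) text, where column 7 holds the evidence code and column 8 the With/From value. The function name `iss_missing_with` and the sample rows (database `ExDB`, identifiers `EX0001` to `EX0003`) are hypothetical and for illustration only. It does not test whether the 'with' object is itself experimentally characterized, which would require looking up that object's own annotations.

```python
def iss_missing_with(gaf_text):
    """Return (object ID, GO ID) pairs for ISS annotations with an empty With/From column."""
    flagged = []
    for line in gaf_text.splitlines():
        # Skip blank lines and GAF comment/header lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # malformed row; not enough columns to check
        object_id, go_id = cols[1], cols[4]
        evidence, with_from = cols[6], cols[7].strip()
        if evidence == "ISS" and not with_from:
            flagged.append((object_id, go_id))
    return flagged


def _row(object_id, evidence, with_from):
    # Build a 15-column GAF-style row; every value here is made up for the example.
    return "\t".join([
        "ExDB", object_id, "geneX", "", "GO:0008150", "PMID:0000000",
        evidence, with_from, "P", "", "", "gene", "taxon:9606",
        "20081201", "ExDB",
    ])


SAMPLE = "\n".join([
    "!gaf-version: 1.0",
    _row("EX0001", "ISS", ""),             # ISS without 'with': should be flagged
    _row("EX0002", "ISS", "ExDB:EX0009"),  # ISS with a 'with' object: fine
    _row("EX0003", "IDA", ""),             # experimental code: 'with' not required
])

FLAGGED = iss_missing_with(SAMPLE)

if __name__ == "__main__":
    print(FLAGGED)
```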
To address annotation consistency issues, we have held two "Electronic Annotation Jamborees", in which two genes are annotated in advance by each group and the annotations are then discussed. Minutes are available at http://wiki.geneontology.org/index.php/Electronic_jamborees
Software development
Currently, the target genes and annotation status are captured using Google spreadsheets. Target genes and links to every group's annotation status page can be found at http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw
- (1) Ortho set curation status: Siddhartha Basu, Chris Mungall, Seth Carbon and Mary Dolan are working on a database and a tool in which target genes (ortho sets) and their curation status will be maintained. A demo version of the tool is available at http://berkeleybop.org/RefG/RefGenome.html
- (2) Graphical displays will be in the next version of AmiGO.
- (3) Panther (Paul Thomas's group, SRI) is developing a tool to allow visualization of protein families, together with the annotations available. The tool will be used to extend predictions of gene function to entire gene families when the experimental evidence is sufficiently strong in a wide range of organisms. Kara Dolinski and Pascale Gaudet will be overseeing the tree-based annotation.
Generating Ortholog sets
We have started to work on a new method for generating ortholog sets based on phylogenetic trees using the Panther software (Mi H, Guo N, Kejariwal A, Thomas PD: PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 2007, 35:D247-252).
We continue to use P-POD as a quality control measure and to define smaller subfamilies when necessary (Heinicke S, Livstone MS, Lu C, Oughtred R, Kang F, Angiuoli SV, White O, Botstein D, Dolinski K: The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists. PLoS ONE 2007, 2:e766).
Communication
The reference genome group holds a monthly phone conference. Minutes can be found at http://wiki.geneontology.org/index.php/Conference_Calls.
One Reference Genome meeting was held in April 2008 in Salt Lake City. Minutes: http://wiki.geneontology.org/index.php/Reference_Genome_Meeting_Minutes_April_2008
We have held two "Electronic Annotation Jamborees", in which two genes are annotated in advance by each group and the annotations are then discussed. Minutes are available at http://wiki.geneontology.org/index.php/Electronic_jamborees