WormBase, September 2009
Progress Report
In Progress: last updated: 09-18-2009
Staff:
WormBase
Juancarlos Chan
Developer, WormBase, Caltech, Pasadena, CA
Ranjana Kishore
Curator, WormBase, Caltech, Pasadena, CA
Paul Sternberg
PI, WormBase, Caltech, Pasadena, CA
Kimberly Van Auken
Curator, WormBase, Caltech, Pasadena, CA
Additional technical support:
Anthony Rogers
WormBase, Sanger Center, Hinxton, UK
Textpresso
Ruihua Fang
Developer, Textpresso, Caltech, Pasadena, CA
Hans Michael Muller
Project Leader, Textpresso, Caltech, Pasadena, CA
Arun Rangarajan
Developer, Textpresso, Caltech, Pasadena, CA
FTE funding from GOC Grant: 1.0
Annotation Progress
Table 1: Number of Genes Annotated to Each GO Ontology
Type of Annotation | Number of Genes Annotated | Number of Unique GO Terms | Total Number of GO Terms |
---|---|---|---|
Manual Annotation | 1684 | 1536 | 10573 |
Phenotype2GO Mappings | 4769 | 53 | 30644 |
IEA/Electronic | 9115 | 418 | 9115 |
Total | 14623 | 1812 | 50332 |
Methods and Strategies for Annotation
Literature Curation
Manual curation of the C. elegans literature remains our highest curation priority, accounting for ~90% of our total curation effort.
We have implemented a GO curation check-out form that gives curators easy visual access to the curation status of all named C. elegans genes, e.g. vha-6 or egl-9. Genes are displayed in a list that includes the current number of published papers (references) indexed to each gene and the most recent date on which annotations to any of the three ontologies were made. Curators can query and sort the list by reference count, gene name, and curation status.
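The sorting behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual check-out form: the `GeneEntry` record, its field names, the `sort_entries` helper, and the placeholder gene names and counts are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GeneEntry:
    """Hypothetical row of the check-out list (field names are assumptions)."""
    name: str
    reference_count: int
    last_annotated: Optional[date]  # None means no GO annotation yet


def sort_entries(entries: List[GeneEntry], key: str) -> List[GeneEntry]:
    """Sort the list by one of the three criteria named in the text."""
    if key == "reference_count":
        # Most-referenced genes first.
        return sorted(entries, key=lambda e: e.reference_count, reverse=True)
    if key == "name":
        return sorted(entries, key=lambda e: e.name)
    if key == "status":
        # Unannotated genes first, then those annotated longest ago.
        return sorted(
            entries,
            key=lambda e: (e.last_annotated is not None,
                           e.last_annotated or date.min),
        )
    raise ValueError(f"unknown sort key: {key}")


# Placeholder data, invented purely for illustration.
entries = [
    GeneEntry("gene-b", 12, date(2009, 6, 1)),
    GeneEntry("gene-a", 40, None),
    GeneEntry("gene-c", 3, date(2008, 1, 15)),
]
by_refs = [e.name for e in sort_entries(entries, "reference_count")]
by_status = [e.name for e in sort_entries(entries, "status")]
```

Putting unannotated genes first under the "status" key is a design choice for this sketch; the real form may order curation status differently.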
Computational Methods
Our computational methods encompass two main approaches: 1) InterPro2GO mappings for IEA annotations and 2) Phenotype2GO mappings for IMP annotations.
InterPro2GO Mappings
These annotations assign GO terms to C. elegans proteins by electronically matching protein motifs/domains to those documented in the InterPro database (http://www.ebi.ac.uk/interpro/) and then applying the InterPro-to-GO mappings provided in the InterPro2GO file generated by the EBI (PMID:12654719, PMID:12520011). All of these annotations use the evidence code 'IEA'; note that they are not reviewed for accuracy by human curators.
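A minimal sketch of this kind of mapping step is shown below. It assumes the mapping file uses lines of the form `InterPro:<id> <name> > GO:<term name> ; <GO id>` with `!` marking comment lines; the function names, the protein name, and the identifiers `IPR999999` and `GO:9999999` are placeholders invented for the example, not real entries or the WormBase pipeline itself.

```python
from typing import Dict, Iterable, List, Tuple


def parse_interpro2go(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build {InterPro id: [GO ids]} from mapping-file lines (format assumed)."""
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comments
            continue
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # not a mapping line
        interpro_id = left.split()[0].split(":", 1)[1]
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping.setdefault(interpro_id, []).append(go_id)
    return mapping


def iea_annotations(
    protein_domains: Dict[str, List[str]],
    mapping: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Attach GO ids to proteins via their domains, all with evidence 'IEA'."""
    annotations = []
    for protein, domains in protein_domains.items():
        go_ids = sorted({g for d in domains for g in mapping.get(d, [])})
        annotations.extend((protein, g, "IEA") for g in go_ids)
    return annotations


# Placeholder input, invented for illustration only.
sample_lines = [
    "!comment line",
    "InterPro:IPR999999 Placeholder domain > GO:placeholder term ; GO:9999999",
]
mapping = parse_interpro2go(sample_lines)
result = iea_annotations({"protein-x": ["IPR999999", "IPR000000"]}, mapping)
```

Domains without an entry in the mapping (here `IPR000000`) simply yield no annotation.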
Phenotype2GO Mappings
These annotations are obtained by a semi-automated method in which WormBase curators map phenotypes to one or more GO terms. A script then uses these mappings to attach GO terms to genes. All of these annotations have the evidence code 'IMP'. Currently, allele phenotypes or phenotypes obtained from large-scale RNA interference screens have been used for the mapping. For example, the phenotype 'STErile' (Ste), which is a specialization of 'post-embryonic defect' and 'reproductive defect', is mapped to the GO term 'reproduction' (GO:0000003).
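The script step might look something like the sketch below. Only the 'STErile' to 'reproduction' (GO:0000003) mapping comes from the text; the dictionary layout, the `annotate_genes` function, and the gene names are hypothetical and do not describe the actual WormBase script.

```python
from typing import Dict, List, Tuple

# Curator-maintained mapping: phenotype name (lowercased) -> GO ids.
# The single entry is the example given in the text; the structure is assumed.
PHENOTYPE_TO_GO: Dict[str, List[str]] = {
    "sterile": ["GO:0000003"],  # 'reproduction'
}


def annotate_genes(
    gene_phenotypes: Dict[str, List[str]],
    phenotype_to_go: Dict[str, List[str]] = PHENOTYPE_TO_GO,
) -> List[Tuple[str, str, str]]:
    """Return (gene, GO id, evidence code) tuples, all with evidence 'IMP'."""
    annotations = []
    for gene, phenotypes in gene_phenotypes.items():
        seen = set()  # avoid duplicate GO ids for the same gene
        for phenotype in phenotypes:
            for go_id in phenotype_to_go.get(phenotype.lower(), []):
                if go_id not in seen:
                    seen.add(go_id)
                    annotations.append((gene, go_id, "IMP"))
    return annotations


# Placeholder genes and phenotype lists, invented for illustration.
observed = {
    "gene-a": ["STErile", "unmapped phenotype"],
    "gene-b": ["unmapped phenotype"],
}
imp = annotate_genes(observed)
```

Phenotypes that curators have not mapped contribute nothing, so `gene-b` receives no annotation in this example.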
Priorities for Annotation
Our annotation priorities are as follows:
1) Reference Genome genes
2) Genes presented for annotation via our Textpresso-based semi-automated Cellular Component curation pipeline
3) Genes from training set papers used for piloting semi-automated Textpresso-based Molecular Function curation
4) Newly described genes for which previous annotation was not available
5) Phenotype2GO and InterPro2GO annotations are updated with each release.
Presentations and Publications
Publications
Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 2009 Jul;5(7):e1000431. Epub 2009 Jul 3.
Van Auken K, Jaffery J, Chan J, Müller HM, Sternberg PW. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics. 2009 Jul 21;10:228.
Presentations including Talks and Tutorials and Teaching
Poster presentations
Other Highlights
Curation Tools
Ontology Annotator (Phenote for GO)