WormBase, September 2009
Progress Report
In progress (last updated: 09-18-2009)
Staff:
WormBase
Juancarlos Chan
Developer, WormBase, Caltech, Pasadena, CA
Ranjana Kishore
Curator, WormBase, Caltech, Pasadena, CA
Paul Sternberg
PI, WormBase, Caltech, Pasadena, CA
Kimberly Van Auken
Curator, WormBase, Caltech, Pasadena, CA
Additional technical support:
Anthony Rogers
WormBase, Sanger Center, Hinxton, UK
Textpresso
Ruihua Fang
Developer, Textpresso, Caltech, Pasadena, CA
Hans Michael Muller
Project Leader, Textpresso, Caltech, Pasadena, CA
Arun Rangarajan
Developer, Textpresso, Caltech, Pasadena, CA
FTE funding from GOC Grant: 1.0
Annotation Progress
Table 1: Number of Genes with GO Annotations, by Annotation Type
Type of Annotation | Number of Genes Annotated | % Change from October 2008 | Number of Unique GO Terms | Total Number of GO Terms |
---|---|---|---|---|
Manual Annotation | 1684 | +10% | 1536 | 10573 |
Phenotype2GO Mappings | 4769 | +2.2% | 53 | 30644 |
IEA/Electronic | 9115 | -27.7% | 418 | 9115 |
Total | 14623 | +2.0% | 1812 | 50332 |
Methods and Strategies for Annotation
Literature Curation
Manual curation of the C. elegans literature remains our highest curation priority, accounting for ~90% of our total curation effort.
We have implemented a GO curation check-out form that gives curators easy visual access to the curation status of all named C. elegans genes, e.g. vha-6 or egl-9. Genes are displayed in a list that includes the current number of published papers (references) indexed to each gene and the most recent date on which annotations to any of the three ontologies were made. Curators can query and sort the list by reference count, gene name, and curation status.
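The sorting and filtering a curator performs on the check-out list can be sketched as follows. This is a minimal illustration, not the form's actual implementation: the GeneStatus record, the checkout_view function, and the sort-key names are hypothetical, and "curation status" is assumed here to mean simply whether, and how recently, a gene was last annotated.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GeneStatus:
    """One row of a hypothetical check-out list."""
    name: str                       # gene name, e.g. "vha-6"
    reference_count: int            # papers indexed to this gene
    last_annotated: Optional[date]  # most recent GO annotation date; None if never annotated

    @property
    def curated(self) -> bool:
        return self.last_annotated is not None


def checkout_view(genes: List[GeneStatus], sort_by: str = "reference_count",
                  only_uncurated: bool = False) -> List[GeneStatus]:
    """Filter and sort the gene list the way a curator might request it."""
    rows = [g for g in genes if not (only_uncurated and g.curated)]
    if sort_by == "reference_count":
        key = lambda g: (-g.reference_count, g.name)  # most-cited genes first
    elif sort_by == "name":
        key = lambda g: g.name
    elif sort_by == "curation_status":
        # Never-annotated genes first, then the longest-unannotated genes.
        key = lambda g: (g.curated, g.last_annotated or date.min, g.name)
    else:
        raise ValueError(f"unknown sort key: {sort_by!r}")
    return sorted(rows, key=key)
```

Sorting by reference count with never-annotated genes surfaced first is one plausible way to point curators at well-studied but uncurated genes; the reference counts and dates used with this sketch would come from the WormBase database.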
Computational Methods
Our computational methods encompass two main approaches: 1) InterPro2GO mappings for IEA annotations and 2) Phenotype2GO mappings for IMP annotations.
InterPro2GO Mappings
These are annotations of C. elegans proteins to GO terms, made by electronically matching protein motifs/domains to those documented in the InterPro database (http://www.ebi.ac.uk/interpro/) and then applying the InterPro-to-GO mappings in the interpro2go file generated by the EBI (PMID:12654719, PMID:12520011). Because these annotations are not reviewed for accuracy by human curators, all of them use the evidence code 'IEA'.
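The two steps described above (parse the InterPro-to-GO mapping, then transfer GO terms to every protein carrying a matched InterPro entry) can be sketched as below. This is an illustrative sketch only: it assumes the interpro2go line layout shown in the docstring, assumes per-protein InterPro hits are already available as a dictionary, and the function names are hypothetical rather than WormBase's actual pipeline.

```python
from collections import defaultdict


def parse_interpro2go(lines):
    """Build {InterPro ID: {GO IDs}} from interpro2go-style lines.

    Assumed line format:
    InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
    """
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and '!' header comments
        left, right = line.split(" > ", 1)
        ipr_id = left.split()[0].split(":", 1)[1]  # "InterPro:IPR000003" -> "IPR000003"
        go_id = right.rsplit(" ; ", 1)[1].strip()  # text after the last " ; "
        mapping[ipr_id].add(go_id)
    return mapping


def annotate_iea(protein_domains, ipr2go):
    """Turn per-protein InterPro hits into (protein, GO ID, 'IEA') triples."""
    annotations = set()
    for protein, ipr_ids in protein_domains.items():
        for ipr_id in ipr_ids:
            for go_id in ipr2go.get(ipr_id, ()):
                annotations.add((protein, go_id, "IEA"))
    return sorted(annotations)
```

Every triple is tagged 'IEA' unconditionally, mirroring the point that no curator reviews these transfers; proteins whose InterPro hits have no GO mapping simply receive no annotation.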
Phenotype2GO Mappings
These annotations are obtained by a semi-automated method in which WormBase curators map phenotypes to one or more GO terms. A script then uses these mappings to attach GO terms to genes. All of these annotations carry the evidence code 'IMP'. Currently, the mappings are applied to allele phenotypes and to phenotypes obtained from large-scale RNA interference (RNAi) screens. For example, the phenotype 'STErile' (Ste), a specialization of 'post-embryonic defect' and 'reproductive defect', is mapped to the GO term 'reproduction' (GO:0000003). A list of the currently used mappings can be found here:
http://www.wormbase.org/wiki/index.php/Phenotype2GO_Mappings_File
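The script step, attaching GO terms to genes through the curated phenotype mappings, might look roughly like this minimal sketch. Only the Ste to reproduction (GO:0000003) mapping comes from this report; the function name, the (gene, phenotype, source) input layout, and the idea of carrying the allele or RNAi source along are assumptions for illustration.

```python
# Phenotype -> GO term mappings. Only the 'Ste' -> reproduction row is taken
# from this report; the real mappings live in the curated mappings file.
PHENOTYPE2GO = {
    "Ste": {"GO:0000003"},  # STErile -> reproduction
}


def phenotype2go_annotations(observations, mappings):
    """Attach GO terms to genes via curated phenotype mappings.

    observations: iterable of (gene, phenotype, source) tuples, where source
    names the allele or RNAi experiment that produced the phenotype.
    Phenotypes with no mapping are skipped.
    """
    annotations = set()
    for gene, phenotype, source in observations:
        for go_id in mappings.get(phenotype, ()):
            annotations.add((gene, go_id, "IMP", source))  # all IMP by design
    return sorted(annotations)
```

Because curators decide each phenotype-to-GO mapping once, the script can regenerate the full set of IMP annotations whenever new allele or RNAi phenotype data arrive, which fits the practice of updating these annotations with each release.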
Priorities for Annotation
Our annotation priorities are as follows:
1) Reference Genome genes
2) Genes presented for annotation via our Textpresso-based semi-automated Cellular Component curation pipeline
3) Genes from training set papers used for piloting semi-automated Textpresso-based Molecular Function curation
4) Newly described genes for which previous annotation was not available
5) C. elegans orthologs of human disease genes
6) Phenotype2GO and InterPro2GO annotations are updated with each release.
Presentations and Publications
Publications
Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 2009 Jul;5(7):e1000431. Epub 2009 Jul 3.
Van Auken K, Jaffery J, Chan J, Müller HM, Sternberg PW. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics. 2009 Jul 21;10:228.
Presentations, Including Talks, Tutorials, and Teaching
None
Poster presentations
None
Other Highlights
A. Ontology Development Contributions
WormBase curators have contributed to ontology discussion and development in the areas of intraflagellar transport, sex determination and dosage compensation, apoptosis, gastrulation, and drug withdrawal.
B. Annotation Outreach and User Advocacy Efforts
Kimberly Van Auken continues to participate in the gohelp rotation. Ranjana Kishore continues to participate in the efforts of the GO News group.
C. Other Highlights
Curation Tools: Ontology Annotator (Phenote for GO)
We are developing the Ontology Annotator, a new web-based curation tool that can be used to annotate genes to any ontology, including the Gene Ontology and the WormBase Phenotype Ontology. The Ontology Annotator incorporates and expands on much of the functionality of the Phenote curation tool; among its most useful features are bulk annotation and autocomplete.
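Autocomplete of ontology term names is commonly implemented as a prefix search over a sorted list. The sketch below shows that general technique with Python's bisect module; it is an assumption about how such a feature could work, not a description of the Ontology Annotator's code, and the class name and sample terms are illustrative.

```python
import bisect


class TermAutocomplete:
    """Case-insensitive prefix search over ontology term names."""

    def __init__(self, terms):
        # terms: mapping of term ID -> term name
        self._entries = sorted((name.lower(), term_id, name) for term_id, name in terms.items())
        self._keys = [entry[0] for entry in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` (term ID, name) pairs whose names start with prefix."""
        p = prefix.lower()
        i = bisect.bisect_left(self._keys, p)  # first name >= prefix
        out = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(out) < limit:
            _, term_id, name = self._entries[i]
            out.append((term_id, name))
            i += 1
        return out
```

For example, with the terms reproduction (GO:0000003), reproductive process (GO:0022414), DNA binding (GO:0003677), and gastrulation (GO:0007369), typing "rep" would suggest the two reproduction terms in alphabetical order. Sorting once up front keeps each keystroke lookup logarithmic in the number of terms.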
Semi-Automated Molecular Function Curation
We continue to explore Textpresso-based GO curation by developing pipelines for semi-automated Molecular Function (MF) curation. Our preliminary plan is a two-tiered approach: 1) document classification using SVMs (Support Vector Machines), and 2) category searches to identify curatable sentences within documents that the SVMs classify as high-confidence for Molecular Function information. Our initial efforts focus on the binding branch of the MF ontology, including protein-nucleic acid interactions.
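The two tiers can be sketched as a filter chain. This is a hypothetical illustration under stated assumptions: tier 1 uses the decision function of an already-trained linear SVM (w·x + b over binary bag-of-words features), with weights, bias, and threshold supplied by the caller rather than learned here; tier 2 approximates a category search by requiring each sentence to contain at least one term from every category. The function names and category contents are illustrative, not the Textpresso implementation.

```python
import re

TOKEN = re.compile(r"[a-z0-9-]+")  # keeps gene names such as "egl-9" intact


def svm_score(text, weights, bias):
    """Decision value w.x + b of a linear SVM over binary bag-of-words features."""
    tokens = set(TOKEN.findall(text.lower()))
    return sum(weights.get(t, 0.0) for t in tokens) + bias


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def category_search(sentence, categories):
    """True if the sentence contains at least one term from every category."""
    tokens = set(TOKEN.findall(sentence.lower()))
    return all(tokens & terms for terms in categories.values())


def curatable_sentences(docs, weights, bias, categories, threshold=0.0):
    hits = {}
    for doc_id, text in docs.items():
        if svm_score(text, weights, bias) < threshold:
            continue  # tier 1: document not high-confidence for MF information
        sentences = [s for s in split_sentences(text) if category_search(s, categories)]
        if sentences:
            hits[doc_id] = sentences  # tier 2: candidate sentences for a curator
    return hits
```

Running the cheap sentence search only on documents that pass the classifier keeps the number of sentences shown to curators small; raising the threshold trades recall for precision.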
Expanded Phenotype2GO Mappings
Working with the WormBase phenotype curators, we have added 146 new mappings to our Phenotype2GO set, which is used to make GO Biological Process annotations with the IMP evidence code. Allele- or RNAi-based phenotypes are annotated to a term from the WormBase Phenotype Ontology, and that term is then mapped to an appropriate GO term. A list of the new mappings can be found here:
http://www.wormbase.org/wiki/index.php/Phenotype2GO_Mappings_Sept._09