WormBase, September 2009

From GO Wiki

Progress Report

In progress; last updated 09-18-2009

Staff:

WormBase

Juancarlos Chan

Developer, WormBase, Caltech, Pasadena, CA

Ranjana Kishore

Curator, WormBase, Caltech, Pasadena, CA

Paul Sternberg

PI, WormBase, Caltech, Pasadena, CA

Kimberly Van Auken

Curator, WormBase, Caltech, Pasadena, CA


Additional technical support:

Anthony Rogers

WormBase, Sanger Center, Hinxton, UK


Textpresso

Ruihua Fang

Developer, Textpresso, Caltech, Pasadena, CA

Hans Michael Muller

Project Leader, Textpresso, Caltech, Pasadena, CA

Arun Rangarajan

Developer, Textpresso, Caltech, Pasadena, CA


FTE funding from GOC Grant: 1.0

Annotation Progress

Table 1: Number of Genes Annotated to Each GO Ontology

Type of Annotation    | Genes Annotated | % Change from October 2008 | Unique GO Terms | Total GO Terms
Manual Annotation     | 1684            |                            | 1536            | 10573
Phenotype2GO Mappings | 4769            |                            | 53              | 30644
IEA/Electronic        | 9115            |                            | 418             | 9115
Total                 | 14623           |                            | 1812            | 50332
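As a quick arithmetic check on Table 1, the per-method values in the Total Number of GO Terms column add up exactly to the Total row, whereas the gene and unique-term columns do not. That pattern is what one would expect if genes and GO terms appearing under more than one annotation type are counted only once in the Total row; the report does not state this, so treat it as an inference.

```latex
\begin{aligned}
10573 + 30644 + 9115 &= 50332 && \text{(Total Number of GO Terms: equals the Total row)}\\
1684 + 4769 + 9115 &= 15568 > 14623 && \text{(Number of Genes Annotated)}\\
1536 + 53 + 418 &= 2007 > 1812 && \text{(Number of Unique GO Terms)}
\end{aligned}
```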

Methods and Strategies for Annotation

Literature Curation

Manual curation of the C. elegans literature remains our highest priority, accounting for ~90% of our total curation effort.

We have implemented a GO curation check-out form that gives curators an at-a-glance view of the curation status of every named C. elegans gene (e.g., vha-6 or egl-9). Genes are displayed in a list showing the number of published papers (references) currently indexed to each gene and the date on which it was last annotated to any of the three ontologies. Curators can query and sort the list by reference count, gene name, and curation status.
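The check-out form is, in essence, a sortable and filterable table of per-gene curation status. The Python sketch below models that idea under assumptions of our own: the GeneCurationStatus class, its field names, and the helper functions are hypothetical and do not describe how the form is actually built.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class GeneCurationStatus:
    """One row of a hypothetical check-out list (field names are illustrative)."""
    name: str                       # gene name
    reference_count: int            # papers currently indexed to the gene
    last_annotated: Optional[date]  # last GO annotation to any ontology; None if never


def sort_by_reference_count(rows: List[GeneCurationStatus]) -> List[GeneCurationStatus]:
    """Most-referenced genes first; ties broken alphabetically by gene name."""
    return sorted(rows, key=lambda r: (-r.reference_count, r.name))


def sort_by_name(rows: List[GeneCurationStatus]) -> List[GeneCurationStatus]:
    """Alphabetical order by gene name."""
    return sorted(rows, key=lambda r: r.name)


def not_yet_curated(rows: List[GeneCurationStatus]) -> List[GeneCurationStatus]:
    """Genes with no GO annotation at all, most-referenced first."""
    return sort_by_reference_count([r for r in rows if r.last_annotated is None])


def last_curated_before(rows: List[GeneCurationStatus], cutoff: date) -> List[GeneCurationStatus]:
    """Genes whose most recent annotation predates `cutoff` (candidates for re-review)."""
    return [r for r in rows if r.last_annotated is not None and r.last_annotated < cutoff]


def search(rows: List[GeneCurationStatus], text: str) -> List[GeneCurationStatus]:
    """Case-insensitive substring query on gene name."""
    needle = text.lower()
    return [r for r in rows if needle in r.name.lower()]
```

For example, not_yet_curated(rows) would yield a worklist of uncurated genes with the most heavily referenced first.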


Computational Methods

Our computational methods encompass two main approaches: 1) InterPro2GO mappings for IEA annotations and 2) Phenotype2GO mappings for IMP annotations.

InterPro2GO Mappings

These are annotations of C. elegans proteins to GO terms, made by electronically matching protein motifs and domains to those documented in the InterPro database (http://www.ebi.ac.uk/interpro/) and then applying the InterPro-to-GO mappings in the interpro2go file generated by the EBI (PMID:12654719, PMID:12520011). Because these annotations are not reviewed for accuracy by human curators, all of them carry the evidence code 'IEA' (Inferred from Electronic Annotation).
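The two-step logic (protein to InterPro entry, InterPro entry to GO term) can be sketched in Python as below. This is a minimal illustration, not the WormBase pipeline. It assumes interpro2go lines of the form "InterPro:IPRnnnnnn entry name > GO:term name ; GO:nnnnnnn" with "!" marking comment lines, which is the usual layout of GO's external2go mapping files, and it assumes the protein-to-InterPro matches have already been computed elsewhere. The function names, protein IDs, and InterPro IDs in the test data are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Assumed external2go-style line, e.g.
#   InterPro:IPR999991 Example domain > GO:DNA binding ; GO:0003677
MAPPING_LINE = re.compile(r"^InterPro:(IPR\d+)\s.*>\s*GO:.*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a map from InterPro ID to the set of GO IDs it is mapped to."""
    mapping: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blank and comment lines
            continue
        match = MAPPING_LINE.match(line)
        if match:
            interpro_id, go_id = match.groups()
            mapping.setdefault(interpro_id, set()).add(go_id)
    return mapping


def iea_annotations(
    protein_domains: Dict[str, Iterable[str]],
    interpro2go: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Transfer GO terms from matched InterPro entries to proteins.

    protein_domains maps a protein ID to the InterPro IDs matched in it.
    Every result carries 'IEA' because no curator reviews it.
    """
    annotations = set()
    for protein, domains in protein_domains.items():
        for interpro_id in domains:
            for go_id in interpro2go.get(interpro_id, ()):
                annotations.add((protein, go_id, "IEA"))
    return sorted(annotations)  # de-duplicated, deterministic order
```

InterPro entries with no GO mapping simply contribute nothing, and a GO term reached through two different domains in the same protein is reported once.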

Phenotype2GO Mappings

These annotations are produced by a semi-automated method: WormBase curators map phenotypes to one or more GO terms, and a script then uses these mappings to attach GO terms to genes. All of these annotations carry the evidence code 'IMP' (Inferred from Mutant Phenotype). The mappings currently use allele phenotypes and phenotypes obtained from large-scale RNA interference (RNAi) screens. For example, the phenotype 'STErile' (Ste), a specialization of both 'post-embryonic defect' and 'reproductive defect', is mapped to the GO term 'reproduction' (GO:0000003).
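The script step can be sketched as a join between curated phenotype-to-GO mappings and observed gene phenotypes. In this minimal Python sketch only the Ste to GO:0000003 mapping comes from the text; the function name, the (gene, phenotype, source) input shape, and the test genes and experiments are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Curator-maintained phenotype -> GO mapping. Only the Ste entry comes from the text;
# a real mapping would cover many phenotypes and may map one phenotype to several terms.
PHENOTYPE2GO: Dict[str, Set[str]] = {
    "Ste": {"GO:0000003"},  # 'STErile' -> reproduction
}


def phenotype2go_annotations(
    observations: Iterable[Tuple[str, str, str]],
    phenotype2go: Dict[str, Set[str]],
) -> List[Tuple[str, str, str, str]]:
    """Attach mapped GO terms to genes.

    observations: (gene, phenotype, source) triples, where source names the
    allele or RNAi experiment that showed the phenotype.
    Returns (gene, GO ID, evidence code, source) tuples, all with 'IMP'.
    Phenotypes with no curated mapping produce no annotation.
    """
    annotations = set()
    for gene, phenotype, source in observations:
        for go_id in phenotype2go.get(phenotype, ()):
            annotations.add((gene, go_id, "IMP", source))
    return sorted(annotations)
```

Keeping the source in each tuple preserves which allele or RNAi experiment supports the IMP annotation.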


Priorities for Annotation

Our annotation priorities are as follows:

1) Reference Genome genes

2) Genes presented for annotation via our Textpresso-based semi-automated Cellular Component curation pipeline

3) Genes from training set papers used for piloting semi-automated Textpresso-based Molecular Function curation

4) Newly described genes for which previous annotation was not available

5) Phenotype2GO and InterPro2GO annotations, which are updated with each release

Presentations and Publications

Publications

Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 2009 Jul;5(7):e1000431. Epub 2009 Jul 3.

Van Auken K, Jaffery J, Chan J, Müller HM, Sternberg PW. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics. 2009 Jul 21;10:228.


Presentations, including Talks, Tutorials, and Teaching

None

Poster presentations

None

Other Highlights

Curation Tools

Ontology Annotator (Phenote for GO)

We are developing a new, web-based curation tool, the Ontology Annotator, that can be used to annotate genes to any ontology, including the Gene Ontology and the WormBase Phenotype Ontology. The Ontology Annotator incorporates and expands upon much of the functionality of the Phenote curation tool. Among its most useful features are bulk annotation and autocomplete.
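Autocomplete and bulk annotation are both simple to illustrate. The Python sketch below is not the Ontology Annotator's code: it assumes a prefix search over term names kept in a sorted list and looked up with the standard bisect module, plus a bulk operation that creates one annotation per selected gene. The TermIndex class, the Annotation fields, the bulk_annotate function, and the test genes and reference are hypothetical; the GO IDs are real terms used only as sample data.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Small illustrative subset of GO terms (ID -> name); a real tool would load the full ontology.
TERMS: Dict[str, str] = {
    "GO:0000003": "reproduction",
    "GO:0003677": "DNA binding",
    "GO:0005515": "protein binding",
    "GO:0005634": "nucleus",
    "GO:0005737": "cytoplasm",
    "GO:0016020": "membrane",
}


class TermIndex:
    """Prefix index over term names, sorted case-insensitively for bisect lookups."""

    def __init__(self, terms: Dict[str, str]) -> None:
        pairs = sorted((name.lower(), term_id) for term_id, name in terms.items())
        self._keys = [key for key, _ in pairs]
        self._ids = [term_id for _, term_id in pairs]

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` term IDs whose names start with `prefix`."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first name >= prefix
        results: List[str] = []
        for i in range(start, len(self._keys)):
            if len(results) >= limit or not self._keys[i].startswith(p):
                break  # sorted order: once a name stops matching, none later will
            results.append(self._ids[i])
        return results


@dataclass(frozen=True)
class Annotation:
    gene: str
    term_id: str
    evidence: str
    reference: str


def bulk_annotate(genes: Iterable[str], term_id: str, evidence: str, reference: str) -> List[Annotation]:
    """Create one annotation per gene, all sharing the same term, evidence code and reference."""
    return [Annotation(gene, term_id, evidence, reference) for gene in genes]
```

Because the names are sorted, each lookup costs one binary search plus a scan over the matches, which keeps per-keystroke suggestions cheap even for a large ontology.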