WB Progress Report December 2008
GO Progress Report
Model Organism Database: WormBase, December 2008
WormBase (www.wormbase.org) is the central repository for genomic and biological data for C. elegans and related nematode species. WormBase annotates genes with Gene Ontology (GO) terms using both manual and computational strategies. WormBase is an active member of the Reference Genome project, which strives to provide broad and comprehensive annotation of the C. elegans genome.
GO Curation, Ontology Development, Outreach:
Ranjana Kishore (funded by GOC NHGRI grant) – manual annotation, management of GO annotation files (gene_association.wb and WormBase .ace files), Phenotype2GO mapping, GO Newsletter
Kimberly Van Auken – manual annotation, development of semi-automated curation methods, management of gp2protein file, Phenotype2GO mapping, gohelp desk rotation
Erich Schwarz – Phenotype2GO mappings
Software Development and Support:
Igor Antoshechkin – software support for gene_association.wb files
Juancarlos Chan – software support for gene_association.wb files, .ace files, gp2protein files, Phenote curation tool development
Automated annotation based on InterPro2GO and Phenotype2GO mappings:
WormBase groups at Caltech and the Wellcome Trust Sanger Institute, UK.
WormBase GO Stats as of December 5, 2008:
Total number of genes with at least one GO annotation: 14,381 (+1.59%)
Total number of genes with manual annotation: 1,551 (+29.25%)
Total number of genes with Phenotype2GO annotation: 4,679 (+16.98%)
Total number of genes with InterPro2GO annotation: 12,665 (-31.20%)
The decrease in the number of genes with InterPro2GO (IEA) annotations reflects a change in how these annotations are attached: we now map them to genes rather than to protein products, and a single gene may encode multiple protein isoforms.
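The shift from product-level to gene-level mapping can be illustrated with a minimal sketch. It assumes a simple isoform-to-gene lookup table; every identifier below (PROT_1a, GENE_1, GO_TERM_A and so on) is a hypothetical placeholder, not a real WormBase or GO ID, and this is not the WormBase build code.

```python
from collections import defaultdict

# Hypothetical IEA annotations attached to protein products (isoforms).
# All identifiers are illustrative placeholders.
protein_annotations = [
    ("PROT_1a", "GO_TERM_A"),
    ("PROT_1b", "GO_TERM_A"),  # second isoform of the same gene, same term
    ("PROT_1b", "GO_TERM_B"),
    ("PROT_2", "GO_TERM_C"),
]

# Hypothetical lookup from each protein isoform to the gene that encodes it.
protein_to_gene = {
    "PROT_1a": "GENE_1",
    "PROT_1b": "GENE_1",
    "PROT_2": "GENE_2",
}


def collapse_to_genes(annotations, isoform_map):
    """Re-key protein-level annotations by gene, merging duplicate terms."""
    gene_annotations = defaultdict(set)
    for protein, go_term in annotations:
        gene_annotations[isoform_map[protein]].add(go_term)
    return dict(gene_annotations)


gene_level = collapse_to_genes(protein_annotations, protein_to_gene)
products = {protein for protein, _ in protein_annotations}
print(f"{len(products)} annotated products -> {len(gene_level)} annotated genes")
# → 3 annotated products -> 2 annotated genes
```

Because two isoforms of one gene collapse into a single gene entry, any statistic that previously counted annotated products will shrink when counted per gene.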
Methods and Strategies for Annotation
Ninety percent of our effort is concentrated on manual and semi-automated literature-based annotation; the remaining 10% is devoted to maintaining automated (IEA) annotations.
a. Literature curation (All evidence codes except IEA):
Priority for literature-based curation is as follows:
1) Reference Genome Project Orthologs
2) Well-studied C. elegans genes that are under-represented in GO annotation
3) Cellular component annotation as part of our semi-automated Textpresso-based curation pipeline (see below)
4) Newly published papers describing previously unannotated gene functions
5) Orthologs of human disease genes
b. Computational annotation strategies:
WormBase has two types of computational annotation:
(i) Phenotype2GO mappings (IMP): Curators manually map phenotypes observed for alleles and in RNAi experiments to GO terms. During the database build, these mappings are used to assign GO terms automatically to genes with the corresponding mutant phenotype. For example, the phenotype ‘egg-laying defective’ (Egl) is mapped to the GO biological process term ‘oviposition’. About 68 distinct phenotypes have been mapped to GO process terms, and WormBase GO and phenotype curators recently updated these mappings jointly.
(ii) Electronic annotation (IEA): This method uses the InterPro2GO mappings provided by the EBI, which assign GO terms to InterPro entries describing conserved protein families and domains (http://www.interpro.ebi.ac.uk), to assign GO terms automatically to gene products based on their protein domains (Mulder NJ et al., 2002; Zdobnov EM et al., 2001; Biswas M et al., 2002).
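Both computational strategies amount to applying a curated lookup table during the build. The sketch below assumes plain in-memory dictionaries; only the Egl to ‘oviposition’ pairing and the IMP/IEA evidence codes come from the text above, while the gene names, the InterPro entry IPR_PLACEHOLDER and GO_TERM_X are hypothetical.

```python
# Hypothetical, simplified mapping tables. Only Egl -> "oviposition" is taken
# from the report; the InterPro entry and its GO term are placeholders.
PHENOTYPE2GO = {
    "Egl": ["oviposition"],  # egg-laying defective
}
INTERPRO2GO = {
    "IPR_PLACEHOLDER": ["GO_TERM_X"],
}


def assign_go_terms(gene_phenotypes, gene_domains):
    """Return (gene, GO term, evidence code) triples from both mapping tables."""
    annotations = set()
    # Phenotype2GO: phenotype-based annotations carry the IMP evidence code.
    for gene, phenotypes in gene_phenotypes.items():
        for phenotype in phenotypes:
            for term in PHENOTYPE2GO.get(phenotype, []):  # unmapped phenotypes are skipped
                annotations.add((gene, term, "IMP"))
    # InterPro2GO: domain-based annotations carry the IEA evidence code.
    for gene, domains in gene_domains.items():
        for domain in domains:
            for term in INTERPRO2GO.get(domain, []):
                annotations.add((gene, term, "IEA"))
    return annotations


result = assign_go_terms(
    {"gene-A": ["Egl", "UnmappedPhenotype"]},  # hypothetical gene and phenotypes
    {"gene-B": ["IPR_PLACEHOLDER"]},           # hypothetical gene and domain
)
for row in sorted(result):
    print(row)
```

Keeping the mapping tables separate from the assignment loop means that when curators revise a mapping, the next build regenerates the affected annotations without any per-gene editing.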
c. Semi-Automated Cellular Component Curation Using Textpresso: To curate cellular component (CC) information from the literature more efficiently, we have implemented a Textpresso-based semi-automated curation pipeline. Each week, new papers added to the Caltech literature curation database are automatically searched using C. elegans gene names and three new Textpresso categories developed solely for GO CC curation. Sentences that match the search are then displayed in a curation form that allows curators to assign GO annotations directly from them.
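The sentence search can be sketched as a simple keyword filter. The report does not name the three Textpresso categories or say how matches are combined, so this sketch assumes a sentence is kept when it mentions at least one gene name and at least one term from every category; the category names, term lists and gene names are all hypothetical, and real Textpresso matching is more sophisticated than token lookup.

```python
import re

# Hypothetical lexicons; the actual Textpresso categories are not listed in the report.
GENE_NAMES = {"gene-a", "gene-b"}
CATEGORIES = {
    "category_1": {"nucleus", "mitochondria"},  # e.g. component words
    "category_2": {"localizes", "localized"},   # e.g. localization verbs
    "category_3": {"gfp", "antibody"},          # e.g. assay words
}


def tokenize(sentence):
    """Lower-case the sentence and split it into word-like tokens."""
    return set(re.findall(r"[a-z0-9-]+", sentence.lower()))


def is_positive(sentence):
    """Assumed rule: a gene name plus at least one term from every category."""
    tokens = tokenize(sentence)
    if not tokens & GENE_NAMES:
        return False
    return all(tokens & terms for terms in CATEGORIES.values())


for s in ["GENE-A::GFP is localized to the nucleus.",
          "GENE-B is required for embryogenesis."]:
    print(is_positive(s), s)
```

Requiring a hit in every category trades recall for precision: fewer sentences reach the curation form, but those that do are more likely to describe a localization.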
The cellular component terms in each sentence, together with the GO term the curator chose for annotation, are then recorded in a relationship index. When a previously curated component term appears in a new sentence, the curation form lists all annotations previously made from that term as suggested annotations. The relationship index thus removes the need to enter the same GO annotation repeatedly and speeds up annotation.
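A relationship index of this kind can be modelled as a mapping from component terms to the set of GO terms curators have chosen for them. This is a minimal in-memory sketch under that assumption; the class name RelationshipIndex and its methods are hypothetical, and the stored data is illustrative, not the WormBase implementation.

```python
from collections import defaultdict


class RelationshipIndex:
    """Hypothetical index: component term seen in text -> GO terms chosen for it."""

    def __init__(self):
        self._index = defaultdict(set)

    def record(self, component_term, go_term):
        # Store the curator's choice, normalising case so "Nuclei" == "nuclei".
        self._index[component_term.lower()].add(go_term)

    def suggest(self, sentence_terms):
        # Collect every GO term previously chosen for any term in the new sentence.
        suggestions = set()
        for term in sentence_terms:
            suggestions |= self._index.get(term.lower(), set())
        return suggestions


index = RelationshipIndex()
index.record("nuclei", "nucleus")             # illustrative curator decisions
index.record("mitochondria", "mitochondrion")
print(sorted(index.suggest(["Nuclei", "ribosome"])))
# → ['nucleus']
```

Terms never seen before (here "ribosome") simply produce no suggestion, so the curator annotates them manually once and the index learns the pairing for future sentences.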
Presentations and Publications
a. Papers with substantial GO content:
Rogers A, et al. (2008) WormBase 2007. Nucleic Acids Res. Jan; 36 (Database issue): D612-7.
Van Auken KM, Jaffery J, Chan J, Mueller HM, Sternberg P (2008) Semi-Automated Curation of Protein Subcellular Localization: A Text Mining-Based Approach to Gene Ontology (GO) Cellular Component Curation. In preparation.
b. Presentations: No GO-related presentations in 2008.
a. Ontology Development Contributions: WormBase continues to contribute to ontology development, most recently with biological process terms for piRNA and 21U-RNA metabolic processes, transdifferentiation, regulation of ovulation, and cell communication processes performed by gap junction proteins.
b. Annotation Outreach and User Advocacy Efforts: WormBase recently ran a pilot experiment soliciting ‘first pass’ curation of newly published papers from users. Users were asked to flag the different data types in their papers for more detailed extraction by WormBase curators. We hope this will speed the identification of gene function and expression data for all data types curated by WormBase, including GO annotations.
c. Phenote for GO, Other Nematode Genomes, GO Display in WormBase:
(i) Development of Phenote as a GO Curation Tool: In collaboration with Mark Gibson, WormBase has adapted the Phenote annotation tool for GO curation. Phenote, a Java-based curation tool originally developed for phenotype and trait annotation using ontologies, has a number of features, such as auto-suggest and bulk annotation, that we felt would be advantageous for GO curation. After several months of development and testing, we have now fully transitioned from our previous web-based GO curation pipeline to Phenote-based curation.
(ii) Annotation of genes from other nematode species: In early 2007, WormBase added InterPro2GO annotations for gene products from the related nematode species C. briggsae. As additional nematode genomes are sequenced and incorporated into WormBase, we will add InterPro2GO annotations for them both to WormBase and to our submissions to the GO Consortium. New species to be added include C. remanei and P. pacificus.
(iii) New GO Display in WormBase: We have revised the display of GO data on the gene summary pages of WormBase. Previously, all annotations were grouped by GO aspect and listed alphabetically. We now group annotations within each aspect according to one of three annotation methods: 1) manual annotation, 2) Phenotype2GO mappings, and 3) InterPro2GO mappings. In addition, the annotation details page is being revised to include more information about the supporting evidence (e.g., RNAi phenotype, variation, InterPro domain) for each annotation.
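The two-level grouping on the gene summary page can be sketched as follows. It assumes each annotation is an (aspect, method, term name) triple, and it assumes terms stay alphabetical within each method group, which the text does not state; the term names in the example are placeholders.

```python
from collections import defaultdict

# Display order of annotation methods, as listed in the report.
METHOD_ORDER = ["manual", "Phenotype2GO", "InterPro2GO"]


def group_for_display(annotations):
    """Group (aspect, method, term) triples by aspect, then by method in fixed order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for aspect, method, term in annotations:
        grouped[aspect][method].append(term)
    display = {}
    for aspect in sorted(grouped):
        display[aspect] = [
            (method, sorted(grouped[aspect][method]))  # assumed: alphabetical within a method
            for method in METHOD_ORDER
            if method in grouped[aspect]               # omit methods with no annotations
        ]
    return display


grouped_view = group_for_display([
    ("biological_process", "InterPro2GO", "term c"),
    ("biological_process", "manual", "term b"),
    ("biological_process", "manual", "term a"),
    ("molecular_function", "InterPro2GO", "term d"),
])
for aspect, groups in grouped_view.items():
    print(aspect, groups)
```

Using a fixed method order rather than alphabetical sorting keeps manual, literature-based annotations at the top of each aspect, ahead of the automated ones.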