WormBase December 2012

WormBase Summary, December 2012


WormBase has two main tasks. The first is work on Aim 4 (Common Annotation Framework), which is described under Software Group. The second is annotation. Over the past year, most of our effort went into software development; the remainder was focused on annotation.



  • Paul Sternberg, PI
  • Juancarlos Chan, software developer
  • Ranjana Kishore, curator
  • Kimberly Van Auken, curator


  • Hans Michael Mueller, PI
  • James Done, software developer
  • Yuling Li, software developer

Annotation Progress

WormBase GO Annotation Statistics as of December 2012

Table 1: Number of Genes Annotated

Type of Annotation    | Genes Annotated, Dec 2012 | % Change from Dec 2011 | Number of Unique GO Terms | % Change from Dec 2011 | Total Number of GO Terms | % Change from Dec 2011
Manual Annotation     | 2,601                     | +9.93                  | 2,310                     | +17.8                  | 13,392                   | +16.9
Phenotype2GO Mappings | 6,423                     | +3.1                   | 116                       | +0.87                  | 44,256                   | +2.9
IEA/Electronic        | 13,739                    | +1.0                   | 1,548                     | +3.4                   | 55,091                   | -6.1
Total                 | 16,493                    | +1.4                   | 3,420                     | +4.8                   | 106,604                  | +2.6

Methods and strategies for annotation

Curation methods

Literature curation:

Curation of the primary literature continues to be the major focus of our manual annotation efforts.

Semi-automated curation using the Textpresso information retrieval system

We also routinely employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.

Computational annotation strategies:

Our computational annotation strategies include mapping genes to GO terms using InterPro domains, and mapping genes to an 'integral to plasma membrane' term based on TMHMM profiling. These methods are performed automatically as part of the WormBase database build.
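The two computational strategies above can be illustrated with a minimal Python sketch. It is not the WormBase build code. The gene names, the input dictionaries, the `min_helices` threshold, and the function name are hypothetical, and the GO identifier used for 'integral to plasma membrane' (GO:0005887) is assumed here rather than taken from this report. The InterPro-to-GO pairs are included only as sample data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inputs. A real build would read these from InterPro domain
# predictions, an InterPro-to-GO mapping, and TMHMM output.
GENE_DOMAINS: Dict[str, List[str]] = {
    "gene-A": ["IPR000719"],   # illustrative InterPro accession
    "gene-B": [],
}
INTERPRO2GO: Dict[str, List[str]] = {
    "IPR000719": ["GO:0004672", "GO:0006468"],  # sample mapping only
}
TM_HELICES: Dict[str, int] = {"gene-A": 0, "gene-B": 7}

# Assumed identifier for the 'integral to plasma membrane' term.
INTEGRAL_TO_PM = "GO:0005887"


def electronic_annotations(
    gene_domains: Dict[str, List[str]],
    interpro2go: Dict[str, List[str]],
    tm_helices: Dict[str, int],
    min_helices: int = 1,  # assumed cutoff; the report does not state one
) -> Set[Tuple[str, str, str]]:
    """Return (gene, GO ID, evidence code) triples for both strategies."""
    annotations: Set[Tuple[str, str, str]] = set()

    # Strategy 1: every GO term mapped to a gene's InterPro domain
    # becomes an electronic (IEA) annotation for that gene.
    for gene, domains in gene_domains.items():
        for domain in domains:
            for go_id in interpro2go.get(domain, []):
                annotations.add((gene, go_id, "IEA"))

    # Strategy 2: genes with enough predicted transmembrane helices
    # receive the membrane term.
    for gene, count in tm_helices.items():
        if count >= min_helices:
            annotations.add((gene, INTEGRAL_TO_PM, "IEA"))

    return annotations


result = electronic_annotations(GENE_DOMAINS, INTERPRO2GO, TM_HELICES)
for gene, go_id, evidence in sorted(result):
    print(gene, go_id, evidence)
```

Collecting the triples in a set means a gene that reaches the same GO term through two different domains is counted only once.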

Curation strategies

Priorities for annotation

Selection of genes for annotation was guided by several criteria:

  • Publication of newly characterized genes
  • C. elegans genes orthologous to human disease genes
  • Genes identified in the Textpresso-based curation pipelines
  • Re-annotation of genes associated with now obsolete GO terms
  • Annotation of gene sets involved in specific biological processes as part of a pilot project on LEGO-style annotation
    • Engulfment of apoptotic cell and phagosome maturation involved in apoptotic cell clearance
    • Regulation of development, heterochronic
    • Innate immune response
    • Wnt receptor signaling pathway, regulating spindle orientation
    • Determination of L/R asymmetry in nervous system
    • Oocyte maturation
    • Sex determination and dosage compensation
    • Vulval development

Presentations and Publications


Van Auken K, Fey P, Berardini TZ, Dodson R, Cooper L, Li D, Chan J, Li Y, Basu S, Muller HM, Chisholm R, Huala E, Sternberg PW; the WormBase Consortium. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR. Database (Oxford). 2012 Nov 17;2012(0):bas040. Print 2012. PubMed PMID: 23160413; PubMed Central PMCID: PMC3500519.


Van Auken K. Natural language processing (NLP) and Gene Ontology (GO) curation: using the Textpresso information system, Support Vector Machines (SVMs) and Hidden Markov Models (HMMs) to improve efficiency of manual GO curation. 5th International Biocuration Conference (BioCreative), Washington, DC, USA, April 2-4, 2012.

Kimberly Van Auken is developing GO-CAT.

Ranjana Kishore participated in the PAINT training workshop in December 2012.

Ranjana Kishore is working on evaluation of website usability.

Caltech hosted the GOC meeting in October 2012.
