WormBase December 2012
WormBase Summary, December 2012
WormBase has two main tasks: work on Aim 4 (Common Annotation Framework), which is described under Software Group, and annotation. Over the past year, most of our effort went to software development; the remainder went to annotation.
- Paul Sternberg, PI
- Juancarlos Chan, software developer
- Ranjana Kishore, curator
- Kimberly Van Auken, curator
- Hans Michael Mueller, PI
- James Done, software developer
- Yuling Li, software developer
WormBase GO Annotation Statistics as of December 2012
Table 1: Number of Genes Annotated
|Type of Annotation||Genes Annotated, Dec 2012||% Change from Dec 2011||Number of Unique GO Terms||% Change from Dec 2011||Total Number of GO Terms||% Change from Dec 2011|
Methods and strategies for annotation
Curation of the primary literature continues to be the major focus of our manual annotation efforts.
Semi-automated curation using the Textpresso information retrieval system
We also routinely employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.
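As a rough illustration of what sentence-level retrieval for Cellular Component curation can look like, the sketch below flags sentences that mention a gene product, a component term, and a localization word together. It is a toy, not Textpresso's implementation or API: the function name, the three term lists, and the names PROT-1 and PROT-2 are all hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def find_candidate_sentences(text, gene_products, component_terms, localization_words):
    """Return sentences mentioning a gene product, a component term, and a localization word."""

    def matches(terms, sentence):
        # Case-insensitive whole-word matches, in the order the terms were supplied.
        return [t for t in terms
                if re.search(r"\b" + re.escape(t) + r"\b", sentence, re.IGNORECASE)]

    hits = []
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        products = matches(gene_products, sentence)
        components = matches(component_terms, sentence)
        if products and components and matches(localization_words, sentence):
            hits.append({
                "sentence": sentence,
                "gene_products": products,
                "components": components,
            })
    return hits


# Hypothetical text; PROT-1 and PROT-2 are made-up gene product names.
text = ("PROT-1 localizes to the nucleus in hypodermal cells. "
        "We thank our colleagues for reagents. "
        "PROT-2 was expressed throughout larval development.")

hits = find_candidate_sentences(
    text,
    gene_products=["PROT-1", "PROT-2"],
    component_terms=["nucleus", "mitochondrion"],
    localization_words=["localizes", "localized", "expressed"],
)
for hit in hits:
    print(hit["gene_products"], hit["components"], hit["sentence"])
```

In this sketch a curator would still review each flagged sentence before making an annotation, which is what makes the approach semi-automated rather than automatic.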
Computational annotation strategies:
Our computational annotation strategies include mapping genes to GO terms using InterPro domains, and mapping genes to an 'integral to plasma membrane' term based on TMHMM profiling. These methods are performed automatically as part of the WormBase database build.
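The two computational strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the WormBase build code: the function names, the gene identifiers, the helix-count threshold, and the use of the IEA evidence code are assumptions, and the input lines are assumed to follow the `InterPro:<id> <name> > GO:<term name> ; GO:<id>` layout of an InterPro-to-GO mapping file. The membrane term is passed by name rather than by GO identifier.

```python
from collections import defaultdict


def parse_interpro2go(lines):
    """Build {InterPro ID: set of GO IDs} from mapping lines in the assumed layout."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and '!' comment lines
        left, _, right = line.partition(" > ")
        if not right:
            continue  # not a mapping line
        ipr_id = left.split()[0].split(":", 1)[1]   # 'InterPro:IPR000719' -> 'IPR000719'
        go_id = right.rsplit(";", 1)[-1].strip()    # text after the last ';' is the GO ID
        mapping[ipr_id].add(go_id)
    return mapping


def annotate(gene_domains, ipr2go, tm_helix_counts, membrane_term, min_helices=1):
    """Return sorted (gene, GO term, evidence code, source) tuples."""
    annotations = set()
    # Strategy 1: every GO term mapped to one of a gene's InterPro domains.
    for gene, domains in gene_domains.items():
        for ipr in domains:
            for go_id in ipr2go.get(ipr, ()):
                annotations.add((gene, go_id, "IEA", "InterPro:" + ipr))
    # Strategy 2: genes with predicted transmembrane helices get the membrane term.
    for gene, helices in tm_helix_counts.items():
        if helices >= min_helices:
            annotations.add((gene, membrane_term, "IEA", "TMHMM"))
    return sorted(annotations)


# Toy inputs; gene-1 and gene-2 are hypothetical identifiers.
mapping_lines = [
    "! example comment line",
    "InterPro:IPR000719 Protein kinase domain > GO:protein kinase activity ; GO:0004672",
    "InterPro:IPR000719 Protein kinase domain > GO:ATP binding ; GO:0005524",
]
ipr2go = parse_interpro2go(mapping_lines)
result = annotate(
    gene_domains={"gene-1": ["IPR000719"], "gene-2": []},
    ipr2go=ipr2go,
    tm_helix_counts={"gene-1": 0, "gene-2": 7},
    membrane_term="integral to plasma membrane",
)
for row in result:
    print(row)
```

Collecting annotations in a set keeps the output free of duplicates when several domains of one gene map to the same GO term.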
Priorities for annotation
Selection of genes for annotation was guided by several criteria:
- Publication of newly characterized genes
- C. elegans genes orthologous to human disease genes
- Genes identified in the Textpresso-based curation pipelines
- Re-annotation of genes associated with now obsolete GO terms
- Annotation of gene sets involved in specific biological processes as part of a pilot project on LEGO-style annotation:
  - Engulfment of apoptotic cell and phagosome maturation involved in apoptotic cell clearance
  - Regulation of development, heterochronic
  - Innate immune response
  - Wnt receptor signaling pathway, regulating spindle orientation
  - Determination of L/R asymmetry in nervous system
  - Oocyte maturation
  - Sex determination and dosage compensation
  - Vulval development
Presentations and Publications
Van Auken K, Fey P, Berardini TZ, Dodson R, Cooper L, Li D, Chan J, Li Y, Basu S, Muller HM, Chisholm R, Huala E, Sternberg PW; the WormBase Consortium. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR. Database (Oxford). 2012 Nov 17;2012(0):bas040. Print 2012. PubMed PMID: 23160413; PubMed Central PMCID: PMC3500519.
Van Auken K. Natural language processing (NLP) and Gene Ontology (GO) curation: using the Textpresso information system, Support Vector Machines (SVMs) and Hidden Markov Models (HMMs) to improve efficiency of manual GO curation. 5th International Biocuration Conference (BioCreative), Washington, D.C., USA, April 2-4, 2012.
Van Auken K. WormBase Literature Curation Workflow. BioCreative Workshop 2012, Washington, D.C., USA, April 4-5, 2012.
Van Auken K. Textpresso Text Mining: Semi-automated Curation of Protein Subcellular Localization Using the Gene Ontology’s Cellular Component Ontology. BioCreative Workshop 2012, Washington, D.C., USA, April 4-5, 2012.
Kimberly Van Auken is developing GO-CAT.
Ranjana Kishore participated in the PAINT training workshop in December 2012.
Ranjana Kishore is working on evaluation of website usability.
Caltech hosted the GOC meeting in October 2012.