WormBase December 2012

From GO Wiki

Revision as of 17:26, 13 December 2012

WormBase Summary, December 2012

Overview

WormBase has two main tasks: work on Aim 4 (Common Annotation Framework), which is described in the Software Group report, and GO annotation. Over the past year, most of our effort went into software development, and the remainder went into annotation.

Staff:

WormBase

  • Paul Sternberg, PI
  • Juancarlos Chan, software developer
  • Ranjana Kishore, curator
  • Kimberly Van Auken, curator

Textpresso

  • Hans Michael Mueller, PI
  • James Done, software developer
  • Yuling Li, software developer


Annotation Progress

WormBase GO Annotation Statistics as of December 2012

Table 1: Number of Genes Annotated

Type of Annotation      Genes Annotated  % Change from  Unique GO  % Change from  Total GO  % Change from
                        (Dec 2012)       Dec 2011       Terms      Dec 2011       Terms     Dec 2011
Manual Annotation       2,601            +9.93          2,310      +17.8          13,392    +16.9
Phenotype2GO Mappings   6,423            +3.1           116        +0.87          44,256    +2.9
IEA/Electronic          13,739           +1.0           1,548      +3.4           55,091    -6.1
Total                   16,493           +1.4           3,420      +4.8           106,604   +2.6


Methods and strategies for annotation

Curation methods

Literature curation:

Curation of the primary literature continues to be the major focus of our manual annotation efforts.

Semi-automated curation using the Textpresso information retrieval system

We also routinely employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.

Computational annotation strategies:

Our computational annotation strategies include mapping genes to GO terms via their InterPro domains (InterPro2GO) and assigning the term 'integral to plasma membrane' to genes whose products are predicted by TMHMM to contain transmembrane helices. Both methods run automatically as part of the WormBase database build.
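Both rules reduce to simple lookups over precomputed results. The following is a minimal Python sketch, assuming an interpro2go-style mapping file (lines of the form `InterPro:IPR... name > GO:term name ; GO:...`) and precomputed per-gene InterPro hits and TMHMM helix counts. The function names, the example IDs, and the `min_helices` threshold are hypothetical and are not taken from the WormBase build code.

```python
from collections import defaultdict

# GO ID of "integral to plasma membrane" (the term name used in 2012).
INTEGRAL_TO_PLASMA_MEMBRANE = "GO:0005887"


def parse_interpro2go(lines):
    """Parse interpro2go-style lines into {InterPro ID: set of GO IDs}.

    Assumed format; lines starting with '!' are comments:
    InterPro:IPR000001 Domain name > GO:term name ; GO:0000001
    """
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, sep, go_part = line.partition(" > ")
        if not sep:
            continue  # skip malformed lines without a '>' separator
        ipr_id = left.split()[0].split(":", 1)[1]  # "InterPro:IPR..." -> "IPR..."
        go_id = go_part.rsplit(";", 1)[-1].strip()  # GO ID follows the last ';'
        mapping[ipr_id].add(go_id)
    return dict(mapping)


def annotate(gene_domains, interpro2go, tm_helix_counts, min_helices=1):
    """Return {gene: set of GO IDs} from InterPro hits plus a TMHMM rule.

    gene_domains: {gene: iterable of InterPro IDs}
    tm_helix_counts: {gene: number of TMHMM-predicted helices}
    min_helices is a hypothetical threshold, not WormBase's actual rule.
    """
    annotations = defaultdict(set)
    for gene, domains in gene_domains.items():
        for ipr in domains:
            annotations[gene] |= interpro2go.get(ipr, set())
    for gene, n_helices in tm_helix_counts.items():
        if n_helices >= min_helices:
            annotations[gene].add(INTEGRAL_TO_PLASMA_MEMBRANE)
    # Drop genes whose domains had no GO mapping and no TM prediction.
    return {gene: terms for gene, terms in annotations.items() if terms}


if __name__ == "__main__":
    # Hypothetical placeholder IDs, not real InterPro or GO entries.
    example = [
        "! hypothetical header line",
        "InterPro:IPR999991 Example domain A > GO:example function ; GO:9999991",
        "InterPro:IPR999991 Example domain A > GO:example process ; GO:9999992",
        "InterPro:IPR999992 Example domain B > GO:example component ; GO:9999993",
    ]
    ipr2go = parse_interpro2go(example)
    print(annotate({"geneA": ["IPR999991"], "geneB": ["IPR999993"]},
                   ipr2go, {"geneB": 7, "geneC": 0}))
```

In this example, `geneA` receives both GO IDs mapped to its domain, `geneB` (whose domain has no mapping) receives only the membrane term from its helix count, and `geneC` receives nothing. Keeping the mapping as a plain dictionary means a corrected InterPro2GO entry takes effect on the next build without any code change.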

Curation strategies

Priorities for annotation

Selection of genes for annotation was guided by several criteria:

  • Publication of newly characterized genes
  • C. elegans genes orthologous to human disease genes
  • Genes identified in the Textpresso-based curation pipelines
  • Re-annotation of genes associated with now-obsolete GO terms
  • Annotation of gene sets involved in specific biological processes as part of a pilot project on LEGO-style annotation
    • Engulfment of apoptotic cell and phagosome maturation involved in apoptotic cell clearance
    • Regulation of development, heterochronic
    • Innate immune response
    • Wnt receptor signaling pathway, regulating spindle orientation
    • Determination of left/right asymmetry in nervous system
    • Oocyte maturation
    • Sex determination and dosage compensation
    • Vulval development


Presentations and Publications

Publications

Van Auken K, Fey P, Berardini TZ, Dodson R, Cooper L, Li D, Chan J, Li Y, Basu S, Muller HM, Chisholm R, Huala E, Sternberg PW; the WormBase Consortium. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR. Database (Oxford). 2012 Nov 17;2012(0):bas040. Print 2012. PubMed PMID: 23160413; PubMed Central PMCID: PMC3500519.

Fang R, Schindelman G, Van Auken K, Fernandes J, Chen W, Wang X, Davis P, Tuli MA, Marygold SJ, Millburn G, Matthews B, Zhang H, Brown N, Gelbart WM, Sternberg PW. Automatic categorization of diverse experimental information in the bioscience literature. BMC Bioinformatics 2012;13:16. PubMed PMID: 22280404; PubMed Central PMCID: PMC3305665.

Presentations

Van Auken K. Natural language processing (NLP) and Gene Ontology (GO) curation: using the Textpresso information system, Support Vector Machines (SVMs) and Hidden Markov Models (HMMs) to improve efficiency of manual GO curation. 5th International Biocuration Conference (BioCreative), Washington, D.C., USA, April 2-4, 2012.

Van Auken K. WormBase Literature Curation Workflow. BioCreative Workshop 2012, Washington, D.C., USA, April 4-5, 2012.

Van Auken K. Textpresso Text Mining: Semi-automated Curation of Protein Subcellular Localization Using the Gene Ontology’s Cellular Component Ontology. BioCreative Workshop 2012, Washington, D.C., USA, April 4-5, 2012.


Ontology Development Contributions

WormBase continues to contribute to ontology development through creation and definition of new GO terms as well as revision of existing terms.

New term requests and revisions of existing terms included:

  • nematode larval development, heterochronic
  • regulation of nematode larval development, heterochronic
  • transforming growth factor receptor signaling pathway involved in multicellular organism growth
  • insulin receptor signaling pathway involved in determination of adult lifespan
  • positive and negative regulation of oviposition
  • pairing center
  • muscle projection, muscle projection membrane (narrow synonyms: myopodia, muscle arm)
  • regulation of synaptic plasticity by receptor localization to synapse
  • regulation of basement membrane organization
  • regulation of RNA interference
  • regulation of oocyte maturation (including positive and negative regulation)
  • correction of an incorrect InterPro2GO mapping for IPR003131
  • dishabituation
  • double-stranded DNA-dependent ATPase activity
  • new representation of tail tip morphogenesis
  • new synonym for nuclear inner membrane
  • revised definition of apical junction complex
  • regulation of serine-type endopeptidase activity (including positive and negative regulation)
  • regulation of neuromuscular synaptic transmission (including positive and negative regulation)


Annotation Outreach and User Advocacy Efforts

Ranjana Kishore is evaluating website usability.

Kimberly Van Auken serves on the GO helpdesk rotation.


Other Highlights

Common Annotation Tool (GO-CAT)

Kimberly Van Auken is developing GO-CAT.

In June, Caltech hosted a small working-group meeting attended by Suzi Lewis, Chris Mungall, and Heiko Dietze (LBL), Tony Sawford (EBI), Paul Thomas (USC), and Paul Sternberg, Hans Michael Mueller, Kimberly Van Auken, and Juancarlos Chan (Caltech). Participants discussed the feasibility of adopting the Protein2GO annotation tool as a common annotation tool and strategies for migrating curation groups to Protein2GO. Further work at the meeting focused on implementing additional validation checks in Protein2GO and on modeling LEGO-style annotations.

Please see the Software Working Group report for a more detailed discussion of CAT development.

PAINT

Ranjana Kishore participated in the PAINT training workshop at Stanford in December 2012.

Gene Ontology Consortium Meeting

Caltech hosted the GOC meeting in October 2012.

