WormBase December 2013

From GO Wiki
Revision as of 19:14, 6 March 2020 by Pascale



[Please include the FTEs working on GOC tasks, designating how many FTEs are funded by the GOC NIHGRI grant.]

Paul Sternberg, PI, WormBase, GO [8%; 0% funded by GOC]

Juancarlos Chan, Developer, WormBase [25%; 25% funded by GOC]

James Done, Developer, Textpresso [40%; 40% funded by GOC]

Ranjana Kishore, Curator [25%; 10% funded by GOC]

Yuling Li, Developer, Textpresso [30%; 20% funded by GOC]

Hans Michael Mueller, PI, Textpresso [75%; 50% funded by GOC]

Daniela Raciti, Curator [10%; 0% funded by GOC]

Kimberly Van Auken, Curator [100%; 75% funded by GOC]

* Funded entirely or partially by GO

Annotation Progress

WormBase GO Annotation Statistics as of December 2013

Table 1: Number of Genes Annotated

Type of Annotation    | Genes Annotated, Dec 2013 | % Change from Dec 2012 | Number of Unique GO Terms | % Change from Dec 2012 | Total Number of Annotations | % Change from Dec 2012
Manual Annotation     | 2,767                     | +6.4%                  | 2,324                     | +0.6%                  | 13,739                      | +2.6%
Phenotype2GO Mappings | 5,541                     | -13.7%                 | 118                       | +1.7%                  | 40,703                      | -8.0%
IEA/Electronic        | 13,946                    | +1.5%                  | 1,644                     | +6.2%                  | 46,424                      | -15.7%
Total                 | 15,677                    | -4.9%                  | 3,436                     | +3.8%                  | 100,866                     | -5.4%
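In the table, not every column of the Total row is simply the sum of the rows above it. The short Python sketch below uses only the figures in Table 1 to check this: the annotation counts add up exactly, while the gene and GO term counts do not. That pattern is consistent with the Total row counting distinct genes and terms, since one gene can receive manual, Phenotype2GO, and IEA annotations, although the report does not say so explicitly. The `approx_previous` helper is an illustrative addition (not part of the report) that back-calculates approximate December 2012 figures from the rounded percentage changes.

```python
# Figures copied from Table 1 (WormBase GO annotations, December 2013).
ROWS = {
    "Manual Annotation": {"genes": 2767, "terms": 2324, "annotations": 13739},
    "Phenotype2GO Mappings": {"genes": 5541, "terms": 118, "annotations": 40703},
    "IEA/Electronic": {"genes": 13946, "terms": 1644, "annotations": 46424},
}
TOTAL = {"genes": 15677, "terms": 3436, "annotations": 100866}


def column_sum(key):
    """Sum one column over the three annotation-type rows."""
    return sum(row[key] for row in ROWS.values())


def approx_previous(current, pct_change):
    """Back-calculate an approximate earlier value from a rounded % change."""
    return round(current / (1 + pct_change / 100))


# Annotations are additive: each annotation belongs to exactly one row.
annotation_sum = column_sum("annotations")

# Genes and terms are not additive. Assuming the Total row counts distinct
# items, the surplus is how many extra times items were counted because
# they appear in more than one row.
gene_surplus = column_sum("genes") - TOTAL["genes"]
term_surplus = column_sum("terms") - TOTAL["terms"]

print(annotation_sum == TOTAL["annotations"])  # True
print(gene_surplus, term_surplus)  # 6577 650
print(approx_previous(TOTAL["annotations"], -5.4))  # 106624
```

Because the percentages are rounded to one decimal place, any back-calculated December 2012 figure is only approximate.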

Methods and strategies for annotation

Literature curation:

Curation of the primary literature continues to be the major focus of our manual annotation efforts.

Semi-automated curation using the Textpresso information retrieval system

We also routinely employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.
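At its simplest, semi-automated curation of this kind means retrieving sentences in which a gene name co-occurs with words that suggest a subcellular location, so that a curator reviews a short list instead of whole papers. The Python sketch below is a toy co-occurrence filter under that assumption only; the gene names and cue words are hypothetical, and Textpresso's real categories and pipelines are far richer than this.

```python
import re

# Hypothetical gene names and cellular-component cue words, for illustration only.
GENE_NAMES = ["abc-1", "xyz-2"]
CC_CUES = ["localizes", "localized", "nucleus", "nuclear", "membrane", "mitochondria"]


def word_pattern(words):
    """Compile a case-insensitive pattern matching any of the words as whole tokens."""
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


GENE_RE = word_pattern(GENE_NAMES)
CUE_RE = word_pattern(CC_CUES)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_sentences(text):
    """Return (sentence, genes) pairs in which a gene name and a CC cue co-occur."""
    hits = []
    for sentence in split_sentences(text):
        genes = sorted({m.group(0).lower() for m in GENE_RE.finditer(sentence)})
        if genes and CUE_RE.search(sentence):
            hits.append((sentence, genes))
    return hits


text = (
    "ABC-1::GFP localizes to the nucleus in intestinal cells. "
    "xyz-2 mutants grow slowly. The membrane was stained with a dye."
)
for sentence, genes in candidate_sentences(text):
    print(genes, sentence)
# ['abc-1'] ABC-1::GFP localizes to the nucleus in intestinal cells.
```

Only the first sentence is returned: the second names a gene but has no location cue, and the third has a cue but no gene.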

Computational annotation strategies:

Our computational annotation strategies include mapping genes to GO terms using InterPro domains and mapping genes to Biological Process terms based upon parallel annotations to the Worm Phenotype Ontology (Phenotype2GO). These methods are performed automatically as part of the WormBase database build.
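Both computational strategies transfer GO terms to genes through a mapping table: from InterPro entries to GO terms, or from Worm Phenotype Ontology terms to Biological Process terms. Below is a minimal Python sketch, assuming the mapping and each gene's source terms are already loaded into dictionaries. The function, data class, and all identifiers are hypothetical, and this is not the WormBase build code. The source identifier is kept so it can populate the With/From column of the resulting annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_id: str
    with_from: str  # the source identifier that justified the mapping


def transfer(gene_to_sources, source_to_go):
    """Map each gene's source terms (e.g. InterPro entries or phenotype terms)
    to GO terms, keeping the source ID for the With/From column."""
    result = set()
    for gene, sources in gene_to_sources.items():
        for source in sources:
            for go_id in source_to_go.get(source, ()):
                result.add(Annotation(gene, go_id, source))
    return sorted(result, key=lambda a: (a.gene, a.go_id, a.with_from))


# Hypothetical identifiers, not real WormBase, InterPro or GO records.
gene_to_sources = {
    "gene-A": ["IPR_EXAMPLE_1"],
    "gene-B": ["IPR_EXAMPLE_1", "IPR_EXAMPLE_2"],
    "gene-C": ["IPR_UNMAPPED"],
}
source_to_go = {
    "IPR_EXAMPLE_1": ["GO:EXAMPLE_A"],
    "IPR_EXAMPLE_2": ["GO:EXAMPLE_A", "GO:EXAMPLE_B"],
}
for ann in transfer(gene_to_sources, source_to_go):
    print(ann.gene, ann.go_id, ann.with_from)
# gene-A GO:EXAMPLE_A IPR_EXAMPLE_1
# gene-B GO:EXAMPLE_A IPR_EXAMPLE_1
# gene-B GO:EXAMPLE_A IPR_EXAMPLE_2
# gene-B GO:EXAMPLE_B IPR_EXAMPLE_2
```

A gene whose source term has no mapping (gene-C here) simply receives no annotation. A gene reached through two different sources keeps one row per source, so each row's With/From value stays traceable.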

During the past year, we stopped using an automated pipeline that annotated genes to GO:0016021, integral to membrane, based on the results of the transmembrane prediction algorithm TMHMM. These IEA annotations had no external database identifier in the With/From column and were therefore not consistent with GO annotation practices.
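The problem with the TMHMM-based annotations can be expressed as a simple check over annotation files: flag IEA rows whose With/From column is empty. The Python sketch below is a hypothetical check written for illustration, not the GO Consortium's validation code. The column positions follow the tab-separated GAF 2.x format (column 7 evidence code, column 8 With/From), and the example rows use placeholder gene and reference identifiers.

```python
# 0-based indices into a tab-separated GAF 2.x row.
GO_ID_COL = 4      # column 5: GO ID
EVIDENCE_COL = 6   # column 7: evidence code
WITH_FROM_COL = 7  # column 8: With (or) From


def iea_without_with_from(gaf_lines):
    """Yield (line_number, go_id) for IEA rows whose With/From column is empty."""
    for number, line in enumerate(gaf_lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= WITH_FROM_COL:
            continue  # too few columns; a real validator would report this
        if cols[EVIDENCE_COL] == "IEA" and not cols[WITH_FROM_COL].strip():
            yield number, cols[GO_ID_COL]


def gaf_row(go_id, evidence, with_from):
    """Build a hypothetical 17-column GAF 2.x row for a placeholder gene."""
    return "\t".join([
        "WB", "WBGene_EXAMPLE", "abc-1", "", go_id, "GO_REF:EXAMPLE",
        evidence, with_from, "C", "", "", "gene", "taxon:6239",
        "20131201", "WB", "", "",
    ])


lines = [
    "!gaf-version: 2.0",
    gaf_row("GO:0016021", "IEA", ""),                      # no With/From: flagged
    gaf_row("GO:0016021", "IEA", "InterPro:IPR_EXAMPLE"),  # has With/From
    gaf_row("GO:0005737", "IDA", ""),                      # not IEA: not flagged
]
print(list(iea_without_with_from(lines)))  # [(2, 'GO:0016021')]
```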

Curation strategies

Priorities for annotation

Selection of genes for annotation is guided by several criteria:

  • Publication of newly characterized genes
  • C. elegans genes orthologous to human disease genes
  • Genes identified in the Textpresso-based curation pipelines
  • Re-annotation of genes associated with now-obsolete GO terms or affected by new ontology developments
  • Annotation of gene sets involved in specific biological processes as part of a pilot project at WormBase to coordinate topic-based curation across all data types.
    • The first topic annotated in this manner was the endoplasmic reticulum unfolded protein response.

Presentations and Publications

a. Papers with substantial GO content

  • Balakrishnan R, Harris MA, Huntley R, Van Auken K, Cherry JM. A guide to best practices for Gene Ontology (GO) manual annotation. Database (Oxford). 2013 Jul 9;2013:bat054. doi: 10.1093/database/bat054. PMID:23842463

b. Presentations, including Talks, Tutorials, and Teaching

c. Poster presentations

Other Highlights:

A. Migration to UniProt's Protein2GO Tool

  • As part of the migration to a common annotation framework, WormBase completed its round-trip data migration from UniProt's Protein2GO annotation tool to the WormBase database.

B. Natural Language Processing Tools for GO Curation

  • WormBase completed development of a new Textpresso for Cellular Component Curation (CCC) tool with new features such as autocompletion of GO terms, mapping of gene names and synonyms in text to MOD and UniProtKB IDs, and enhanced searching of sentence source files and annotations. Most importantly, the new CCC tool and Protein2GO are now fully integrated: annotations made with the CCC tool are sent automatically to Protein2GO via web services.
  • The Protein2GO tool now contains a Literature Search link that allows curators to perform keyword searches on nine different Textpresso corpora from within Protein2GO.
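Two of the CCC tool features, GO term autocompletion and mapping gene names or synonyms in text to database identifiers, can be pictured as two small lookups: a prefix search over sorted GO term names and a synonym table. The Python sketch below assumes exactly that and nothing about how the CCC tool is actually implemented. The GO terms are a handful of real cellular component terms, while the gene synonym table and its MOD and UniProtKB identifiers are hypothetical.

```python
import bisect

# A few real GO cellular component terms; the full ontology has thousands.
GO_TERMS = {
    "cytoplasm": "GO:0005737",
    "mitochondrion": "GO:0005739",
    "nuclear envelope": "GO:0005635",
    "nucleolus": "GO:0005730",
    "nucleus": "GO:0005634",
}
SORTED_NAMES = sorted(GO_TERMS)

# Hypothetical synonym table: name seen in text -> (MOD ID, UniProtKB ID).
GENE_SYNONYMS = {
    "abc-1": ("WBGene_EXAMPLE1", "UniProtKB:EXAMPLE1"),
    "alias-1": ("WBGene_EXAMPLE1", "UniProtKB:EXAMPLE1"),
}


def autocomplete(prefix, limit=10):
    """Return up to `limit` (name, GO ID) pairs whose name starts with prefix."""
    prefix = prefix.lower()
    # Binary search finds the first name >= prefix; matches are contiguous after it.
    i = bisect.bisect_left(SORTED_NAMES, prefix)
    matches = []
    while i < len(SORTED_NAMES) and SORTED_NAMES[i].startswith(prefix):
        if len(matches) == limit:
            break
        matches.append((SORTED_NAMES[i], GO_TERMS[SORTED_NAMES[i]]))
        i += 1
    return matches


def resolve_gene(name):
    """Map a gene name or synonym found in text to (MOD ID, UniProtKB ID), or None."""
    return GENE_SYNONYMS.get(name.lower())


print(autocomplete("nucle"))
# [('nuclear envelope', 'GO:0005635'), ('nucleolus', 'GO:0005730'), ('nucleus', 'GO:0005634')]
print(resolve_gene("ALIAS-1"))  # ('WBGene_EXAMPLE1', 'UniProtKB:EXAMPLE1')
```

Keeping the names sorted means each keystroke costs one binary search plus the number of matches returned, rather than a scan of the whole ontology.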

C. Ontology Development Contributions:

  • WormBase contributions to ontology development:
    • Cellular Component
      • lysosome-related organelle
      • gut granule
      • gut granule lumen
      • gut granule membrane
      • amino acid transport complex
    • Biological Process
      • receptor localization to nonmotile primary cilium
      • stress response to copper ion
      • stress response to cadmium ion
      • positive regulation of transcription from RNA polymerase II promoter in response to reactive oxygen species
      • positive regulation of transcription from RNA polymerase II promoter in response to superoxide
      • L-lysine transport
      • L-arginine transport
      • L-histidine transport
      • dense core granule transport
      • early endosome to recycling endosome transport

D. Annotation Outreach and User Advocacy Efforts:

  • Ranjana Kishore serves on the AmiGO2 working group.
  • Kimberly Van Auken continues to serve on the GO-help rota.

E. Other Highlights:

  • WormBase GO Annotation Model - We have been developing a new GO annotation model for WormBase that will incorporate the annotation extensions we have been adding to our annotations. The new model should be incorporated into WormBase in early 2014.
  • BioCreative - WormBase participated in the BioCreative IV GO task (Track 4), which involved identifying GO evidence sentences and GO annotations in the full text of publications. Using a GO Annotation Tool (GOAT), developed by the Textpresso team for highlighting sentences and associating them with GO annotations, a WormBase curator provided training and test data for the full text of 22 papers. The curator then helped perform error analysis on the results submitted by the participating teams. Other participating curation groups included FlyBase, MaizeGDB, RGD, and TAIR.
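In GO annotation files, annotation extensions are exchanged in column 16 of GAF 2.x as relation(ID) units. Commas join units that all apply, and pipes separate alternative sets. The Python sketch below parses that syntax. It says nothing about the internal WormBase annotation model, which the report does not describe, and the example identifiers are placeholders.

```python
import re

# One extension unit looks like relation(ID), e.g. occurs_in(WBbt:EXAMPLE).
UNIT_RE = re.compile(r"^([A-Za-z_]+)\(([^()]+)\)$")


def parse_extension(field):
    """Parse a GAF 2.x annotation-extension field.

    '|' separates independent alternatives; ',' joins units that must all hold.
    Assumes IDs contain no commas, pipes or parentheses.
    """
    if not field.strip():
        return []
    alternatives = []
    for group in field.split("|"):
        units = []
        for unit in group.split(","):
            match = UNIT_RE.match(unit.strip())
            if match is None:
                raise ValueError(f"malformed extension unit: {unit!r}")
            units.append((match.group(1), match.group(2)))
        alternatives.append(units)
    return alternatives


# Placeholder identifiers, not real annotations.
example = "occurs_in(WBbt:EXAMPLE1),happens_during(GO:EXAMPLE2)|has_input(WB:WBGene_EXAMPLE3)"
for units in parse_extension(example):
    print(units)
# [('occurs_in', 'WBbt:EXAMPLE1'), ('happens_during', 'GO:EXAMPLE2')]
# [('has_input', 'WB:WBGene_EXAMPLE3')]
```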
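Error analysis of the BioCreative submissions starts from comparing system output with the curator-provided annotations. Below is a minimal Python sketch of that comparison, using exact-match precision, recall, and F1 over (gene, GO ID) pairs and listing the mismatches for review. This is a simplification: the official BioCreative IV evaluation used its own measures, and the genes and annotations here are hypothetical.

```python
def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 over (gene, GO ID) pairs,
    plus the false positives and false negatives for error analysis."""
    predicted, gold = set(predicted), set(gold)
    true_pos = predicted & gold
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    recall = len(true_pos) / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": sorted(predicted - gold),
        "false_negatives": sorted(gold - predicted),
    }


# Hypothetical curator gold standard and system output for one paper.
gold = {("abc-1", "GO:0005634"), ("abc-1", "GO:0005737"), ("xyz-2", "GO:0005739")}
predicted = {("abc-1", "GO:0005634"), ("xyz-2", "GO:0005634")}

result = evaluate(predicted, gold)
print(round(result["precision"], 3), round(result["recall"], 3), round(result["f1"], 3))
# 0.5 0.333 0.4
print(result["false_positives"])  # [('xyz-2', 'GO:0005634')]
```

The false-positive and false-negative lists, rather than the summary scores, are what a curator would inspect to see why a system went wrong.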
