WormBase December 2016

From GO Wiki

Revision as of 15:50, 20 December 2016

The data are currently for 2015. The report is in progress.

Overview:

Staff

Person, Group [Effort, Funding]

Paul Sternberg, PI, WormBase, GO [8%; 0% funded by GOC]

Juancarlos Chan, Developer, WormBase [25%; 25% funded by GOC]

Sibyl Gao, Developer, WormBase [5%; 0% funded by GOC]

Kevin Howe, Project Lead, WormBase - EBI [5%; 0% funded by GOC]

Raymond Lee, Curator, WormBase [10%; 0% funded by GOC]

Yuling Li, Developer, Textpresso [30%; 25% funded by GOC]

Jane Lomax, Curator, WormBase ParaSite [10%; 0% funded by GOC]

Hans-Michael Mueller, Project Lead, Textpresso [75%; 50% funded by GOC]

Daniela Raciti, Curator [10%; 0% funded by GOC]

Kimberly Van Auken, Curator, Co-Manager, Annotation Working Group [100%; 75% funded by GOC]

Annotation Progress

WormBase GO Annotation Statistics as of December 20, 2016

Manual annotation statistics are summarized in Tables 1 - 3.

Total number of unique manual annotations: 42747 (+8.8% from 2015)

Total number of genes with manual annotations: 7596 (+12.3% from 2015)

Table 1: Summary of C. elegans Manual Biological Process Annotations

Numbers refer to total number of annotations; annotations in parentheses represent annotations with extensions.

Annotation Group  IMP         IGI         IDA        ISS      TAS      IEP       IPI     IC       NAS  ISM  ND   IBA   IRD
WormBase          7623 (426)  3141 (90)   1106 (24)  327 (1)  109      292 (56)  51      52 (10)  32   2    2    0     0
UniProt           1530 (552)  976 (390)   165 (15)   197      26 (3)   14        2 (2)   5        104  0    65   0     0
CACAO             20          1           3          0        0        0         0       0        0    0    0    0     0
BHF-UCL           11          0           0          2        0        4         0       0        0    0    0    0     0
MGI               4           0           6          0        0        0         0       0        0    0    0    0     0
HGNC              0           0           0          4        0        0         0       0        0    0    0    0     0
GO_Central        2           0           0          4        0        0         0       0        0    0    0    7945  1
ParkinsonsUK-UCL  10 (4)      6 (3)       11         2 (1)    0        0         0       0        0    0    0    0     0
Totals            9200 (982)  4124 (483)  1291 (39)  536 (2)  135 (3)  310 (56)  53 (2)  57 (10)  136  2    67   7945  1
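Each cell in Tables 1-3 packs two numbers, so a short script is a convenient way to check that a Totals row matches the group rows. The following is a minimal illustrative sketch, not part of any WormBase pipeline. It parses the Table 1 rows (copied verbatim), treats a missing parenthesized value as zero extensions, and sums each evidence-code column. The names `parse_row` and `column_totals` are hypothetical.

```python
import re

# Evidence-code columns of Table 1, in table order.
CODES = ["IMP", "IGI", "IDA", "ISS", "TAS", "IEP", "IPI", "IC", "NAS", "ISM", "ND", "IBA", "IRD"]

# Per-group rows copied from Table 1. A cell "N (M)" means N annotations, M of which carry extensions.
ROWS = {
    "WormBase": "7623 (426) 3141 (90) 1106 (24) 327 (1) 109 292 (56) 51 52 (10) 32 2 2 0 0",
    "UniProt": "1530 (552) 976 (390) 165 (15) 197 26 (3) 14 2 (2) 5 104 0 65 0 0",
    "CACAO": "20 1 3 0 0 0 0 0 0 0 0 0 0",
    "BHF-UCL": "11 0 0 2 0 4 0 0 0 0 0 0 0",
    "MGI": "4 0 6 0 0 0 0 0 0 0 0 0 0",
    "HGNC": "0 0 0 4 0 0 0 0 0 0 0 0 0",
    "GO_Central": "2 0 0 4 0 0 0 0 0 0 0 7945 1",
    "ParkinsonsUK-UCL": "10 (4) 6 (3) 11 2 (1) 0 0 0 0 0 0 0 0 0",
}

# One cell: a count, optionally followed by a parenthesized count of annotations with extensions.
CELL = re.compile(r"(\d+)(?:\s+\((\d+)\))?")


def parse_row(text):
    """Return one (count, count_with_extensions) pair per evidence code."""
    cells = [(int(n), int(ext) if ext else 0) for n, ext in CELL.findall(text)]
    if len(cells) != len(CODES):
        raise ValueError(f"expected {len(CODES)} cells, got {len(cells)}")
    return cells


def column_totals(rows):
    """Sum annotation and extension counts per evidence code across all groups."""
    totals = {code: (0, 0) for code in CODES}
    for text in rows.values():
        for code, (n, ext) in zip(CODES, parse_row(text)):
            total_n, total_ext = totals[code]
            totals[code] = (total_n + n, total_ext + ext)
    return totals


if __name__ == "__main__":
    for code, (n, ext) in column_totals(ROWS).items():
        print(f"{code:<4} {n:>5} ({ext})")
```

To apply the same check to Table 2 or Table 3, swap in that table's evidence-code columns and rows.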


Table 2: Summary of C. elegans Molecular Function Annotations

Numbers refer to total number of annotations; annotations in parentheses represent annotations with extensions.

Annotation Group  IMP       IGI  IDA         ISS      TAS     IPI        IC      NAS  ISM  ND   IBA   ISO  IKR  IRD
WormBase          161 (11)  32   1688 (209)  647 (4)  45      1348 (5)   21 (1)  4    3    73   0     2    0    0
IntAct            0         0    0           0        0       2085 (52)  0       0    0    0    0     0    0    0
UniProt           57 (2)    17   139 (3)     194      23 (1)  321 (3)    4       51   0    126  0     0    0    0
CACAO             1         0    7           0        0       0          0       0    0    0    0     0    0    0
GO_Central        0         0    0           0        0       0          0       0    0    0    6538  0    1    1
HGNC              0         0    0           2        0       0          0       0    0    0    0     0    0    0
ParkinsonsUK-UCL  0         0    0           0        0       2 (2)      0       0    0    0    0     0    0    0
Totals            219 (13)  49   1834 (212)  843 (4)  68 (1)  3754 (60)  25 (4)  55   4    199  6538  2    1    1


Table 3: Summary of C. elegans Cellular Component Annotations

Numbers refer to total number of annotations; annotations in parentheses represent annotations with extensions.

Annotation Group  IMP      IGI  IDA         ISS  TAS  IPI      IC   NAS  ISM  ND   IBA
WormBase          9        0    5863 (784)  382  27   141 (3)  50   7    4    10   0
GO_Central        0        0    0           0    0    0        0    0    0    0    6326
UniProt           29 (10)  1    379 (73)    203  18   0        19   50   0    118  0
MGI               0        0    16          0    0    0        0    0    0    0    0
HGNC              0        0    0           8    0    0        0    0    0    0    0
BHF-UCL           0        0    7           0    0    0        0    0    0    0    0
CACAO             0        0    3           0    0    0        0    0    0    0    0
Totals            38 (10)  1    6268 (857)  593  45   141 (3)  69   57   4    128  6326


Table 4: Summary of C. elegans Computational Annotations

Summary Statistics Based on WormBase Release WS256

Genes Stats:

 Genes with GO_term connections  15047 
   Non-IEA-only annotation              640
   IEA-only annotation                 7830
   Both IEA and non-IEA annotations    6577

GO_term Stats:

 Distinct GO_terms connected to Genes   5679
   Associated by non-IEA only               3123
   Associated by IEA only                    825
   Associated by both IEA and non-IEA       1731

Type of Annotation                IEA
Phenotype2GO Mappings - WormBase  37,714
IEA/InterPro2GO - WormBase        22,660
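In Table 4, the three gene categories partition the 15047 genes (640 + 7830 + 6577 = 15047), and the three GO term categories partition the 5679 terms (3123 + 825 + 1731 = 5679). The following minimal sketch shows one way such categories could be computed from (gene, GO term, evidence code) triples. It assumes a gene or term is classified only by whether IEA appears among its evidence codes. The triples are hypothetical placeholders, not WormBase data, and the sketch is not the code that produced the WS256 statistics.

```python
from collections import Counter


def bucket(codes):
    """Classify a non-empty set of evidence codes as IEA-only, non-IEA-only, or both."""
    has_iea = "IEA" in codes
    has_other = bool(codes - {"IEA"})
    if has_iea and has_other:
        return "both"
    return "IEA-only" if has_iea else "non-IEA-only"


def summarize(triples):
    """Count genes and GO terms per category from (gene, go_term, evidence_code) triples."""
    by_gene, by_term = {}, {}
    for gene, term, code in triples:
        by_gene.setdefault(gene, set()).add(code)
        by_term.setdefault(term, set()).add(code)
    genes = Counter(bucket(codes) for codes in by_gene.values())
    terms = Counter(bucket(codes) for codes in by_term.values())
    return genes, terms


# Hypothetical toy triples, not WormBase data.
TRIPLES = [
    ("gene_a", "term_1", "IMP"),
    ("gene_a", "term_2", "IEA"),
    ("gene_b", "term_2", "IEA"),
    ("gene_c", "term_3", "IDA"),
]

if __name__ == "__main__":
    genes, terms = summarize(TRIPLES)
    print(dict(genes))
    print(dict(terms))
```

Because each gene lands in exactly one category, the category counts always sum to the number of annotated genes. The same holds for GO terms.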

Methods and strategies for annotation

Curation methods

Literature curation

Curation of the primary literature continues to be the major focus of our manual annotation efforts.

Over the past year, WormBase curation efforts were focused largely on developing preliminary pathway models using the Noctua curation tool. To this end, literature curation involved reviewing C. elegans pathways, the biological entities that participate in those pathways, and the annotations, particularly Molecular Function annotations, associated with those entities. Pathways reviewed include apoptosis, asymmetric cell division, defense response, insulin signaling, neuronal cell fate specification, mRNA decay, semaphorin/plexin signaling, thermosensory transduction, and TOR signaling.

Curation using the Textpresso information retrieval system

We also employ the Textpresso information retrieval system for curation of GO Cellular Component and Molecular Function annotations.
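This report does not describe how Textpresso indexes papers. As a rough illustration of category-based sentence retrieval, the sketch below returns only those sentences that mention terms from several categories at once, for example a gene together with a cellular component. The category lists, sentences, and function names are all hypothetical. Simple whole-word regular-expression matching stands in for whatever matching Textpresso actually performs.

```python
import re


def mentions(sentence, terms):
    """True if any term occurs in the sentence as a whole word or phrase (case-insensitive)."""
    text = sentence.lower()
    return any(re.search(r"\b" + re.escape(term.lower()) + r"\b", text) for term in terms)


def category_search(sentences, categories, required):
    """Return the sentences that mention at least one term from every required category."""
    return [s for s in sentences if all(mentions(s, categories[name]) for name in required)]


# Hypothetical category lists and sentences, for illustration only.
CATEGORIES = {
    "gene": {"unc-104", "daf-16"},
    "cellular_component": {"nucleus", "plasma membrane", "axon"},
}
SENTENCES = [
    "DAF-16::GFP accumulates in the nucleus after heat stress.",
    "unc-104 mutants show locomotion defects.",
    "The receptor localizes to the plasma membrane.",
]

if __name__ == "__main__":
    for hit in category_search(SENTENCES, CATEGORIES, ["gene", "cellular_component"]):
        print(hit)
```

Requiring hits from more than one category narrows the results to sentences that are more likely to state a localization for a specific gene.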

Computational annotation strategies

Our computational annotation strategies include mapping genes to GO terms using InterPro domains, performed as part of the WormBase build cycle, as well as computational predictions made via the UniProtKB pipeline, including keyword mappings and UniRule mappings. Also as part of the WormBase build cycle, we map genes to GO Biological Process terms based on mappings between Worm Phenotype Ontology (WPO) terms and GO terms.
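Both the InterPro-based and the phenotype-based annotations transfer GO terms to genes through an intermediate mapping table (InterPro2GO, or WPO-to-GO). The following minimal sketch covers that transfer step. It assumes each gene comes with a list of features (InterPro entries or WPO terms) and that each feature maps to zero or more GO terms. All identifiers are hypothetical placeholders, and the function is not the WormBase build code.

```python
def transfer_go_terms(gene_features, feature_to_go, evidence="IEA"):
    """Annotate each gene with every GO term mapped from any of its features.

    Returns sorted (gene, go_term, evidence, source_feature) tuples; the source
    feature plays the role a 'with/from' value would in a GAF-style file.
    """
    annotations = set()
    for gene, features in gene_features.items():
        for feature in features:
            for term in feature_to_go.get(feature, ()):
                annotations.add((gene, term, evidence, feature))
    return sorted(annotations)


def unique_pairs(annotations):
    """Collapse annotations that reach the same gene-term pair through different features."""
    return sorted({(gene, term) for gene, term, _evidence, _source in annotations})


# Hypothetical identifiers: the features could stand for InterPro entries or WPO phenotype terms.
GENE_FEATURES = {
    "gene_a": ["domain_1"],
    "gene_b": ["domain_1", "domain_2"],
    "gene_c": ["domain_3"],  # domain_3 has no GO mapping, so gene_c gets no annotation
}
FEATURE_TO_GO = {
    "domain_1": ["go_x"],
    "domain_2": ["go_x", "go_y"],
}

if __name__ == "__main__":
    annotations = transfer_go_terms(GENE_FEATURES, FEATURE_TO_GO)
    for row in annotations:
        print(*row, sep="\t")
    print(unique_pairs(annotations))
```

Keeping the source feature on each row separates annotations that reach the same gene-term pair through different domains. `unique_pairs` collapses them when only gene-term counts are needed.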

Curation strategies

Priorities for annotation

Selection of genes for annotation is guided by several criteria:

  • Annotation of gene sets involved in specific biological processes as part of the LEGO working group
  • Genes identified in Textpresso-based curation pipelines, for example genes described in papers that a Support Vector Machine (SVM) classifier flags, with high confidence, as reporting Molecular Function experiments such as enzymatic assays
  • Re-annotation of genes affected by changes to the ontology, e.g. cilia biology, ubiquitination, enzyme regulator activities, and obsoleted annotation extensions
  • Publication of newly characterized genes for which no previous biological data was available

Presentations and Publications

Papers with substantial GO content

  • Expansion of the Gene Ontology knowledgebase and resources. Gene Ontology Consortium. Nucleic Acids Res. 2016. pii:gkw1108. PMID:27899567.
  • Guidelines for the functional annotation of microRNAs using the Gene Ontology. Huntley RP, Sitnikov D, Orlic-Milacic M, Balakrishnan R, D'Eustachio P, Gillespie ME, Howe D, Kalea AZ, Maegdefessel L, Osumi-Sutherland D, Petri V, Smith JR, Van Auken K, Wood V, Zampetaki A, Mayr M, Lovering RC. RNA. 2016 May;22(5):667-76. doi:10.1261/rna.055301.115. PMID:26917558.

Presentations including Talks and Tutorials and Teaching

  • TextpressoCentral: A System for Integrating Full Text Literature Curation with Diverse Curation Platforms including the Gene Ontology Consortium's Common Annotation Framework - Kimberly Van Auken, Yuling Li, Seth Carbon, Christopher Mungall, Suzanna Lewis, Hans-Michael Muller and Paul Sternberg

Poster presentations

  • TextpressoCentral: A System for Integrating Full Text Literature Curation with Diverse Curation Platforms including the Gene Ontology Consortium's Common Annotation Framework. Kimberly Van Auken, Yuling Li, Seth Carbon, Christopher Mungall, Suzanna Lewis, Hans-Michael Muller and Paul Sternberg. ISB 2016 Geneva, Switzerland. https://www.sib.swiss/events/biocuration2016/oral-presentations

Other Highlights

Annotation Outreach and User Advocacy Efforts

  • Kimberly Van Auken continues to serve on the GO-help rota.
  • Kimberly Van Auken served on the Data Capture Working Group.

Annotation Advocacy

  • Kimberly Van Auken and David Hill (MGI) continue to serve as Annotation Working Group Co-Managers.
  • Kimberly Van Auken continued to participate in the LEGO working group as an alpha tester of the Noctua software and helped to train GO curators in LEGO curation and the Noctua annotation tool at the Geneva LEGO workshop (April 2016), an MGI workshop (June 2016), an EBI workshop (September 2016), and the USC workshop (November 2016).

Text Mining and Textpresso Central

  • Monica McAndrews (MGI), Kimberly Van Auken, Hans-Michael Mueller, and Yuling Li (through August 2016) are collaborating on a document classification pipeline to help MGI identify papers suitable for curation. Using training and testing papers supplied by MGI, we have developed an SVM classifier to distinguish mouse from non-mouse papers. We are beginning to put this pipeline into production.
  • Hans-Michael Muller, Kimberly Van Auken, and Seth Carbon continued development of the TextpressoCentral (TPC) curation system and its integration with the Noctua annotation tool. TPC enables curators to perform full text literature searches, view the search results in the context of the paper, annotate text, and send those annotations to an external database. Over the past year, we have worked on developing a curation interface for GO annotation, as well as the protocol for communication between TPC and Noctua.
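This report does not describe the features, kernel, or training procedure of the mouse versus non-mouse classifier. As an illustration only, a minimal linear SVM over binary bag-of-words features can be written in plain Python and trained by subgradient descent on the L2-regularized hinge loss. The class name, hyperparameters (`lam`, `lr`, `epochs`), and the four training sentences are all hypothetical. In practice an established implementation such as scikit-learn's `LinearSVC` would be the more usual choice. To flag only high-confidence papers, compare `decision()` against a threshold instead of taking its sign.

```python
import re


def tokens(text):
    """Binary bag-of-words: the set of lower-cased alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


class LinearSVM:
    """Linear SVM fit by full-batch subgradient descent on the L2-regularized hinge loss:
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w . x + b)))."""

    def __init__(self, lam=0.01, lr=0.1, epochs=200):
        self.lam, self.lr, self.epochs = lam, lr, epochs
        self.w = {}  # token -> weight
        self.b = 0.0

    def decision(self, text):
        """Signed score; tokens not seen in training contribute nothing."""
        return sum(self.w.get(t, 0.0) for t in tokens(text)) + self.b

    def fit(self, docs, labels):
        feats = [tokens(d) for d in docs]
        n = len(feats)
        for f in feats:
            for t in f:
                self.w.setdefault(t, 0.0)
        for _ in range(self.epochs):
            # Regularization term of the subgradient.
            grad_w = {t: self.lam * wt for t, wt in self.w.items()}
            grad_b = 0.0
            for f, y in zip(feats, labels):
                margin = y * (sum(self.w[t] for t in f) + self.b)
                if margin < 1:  # hinge loss is active for this document
                    for t in f:
                        grad_w[t] -= y / n
                    grad_b -= y / n
            for t in self.w:
                self.w[t] -= self.lr * grad_w[t]
            self.b -= self.lr * grad_b
        return self

    def predict(self, text):
        """+1 for the positive class (here: mouse paper), -1 otherwise."""
        return 1 if self.decision(text) > 0 else -1


# Invented toy training set: +1 = mouse paper, -1 = non-mouse paper.
TRAIN_DOCS = [
    "targeted knockout mice were generated",
    "mouse embryos from heterozygous crosses",
    "rnai knockdown in worm larvae",
    "yeast mutants grown on plates",
]
TRAIN_LABELS = [1, 1, -1, -1]

if __name__ == "__main__":
    model = LinearSVM().fit(TRAIN_DOCS, TRAIN_LABELS)
    for doc in ["knockout mice", "worm larvae"]:
        print(doc, round(model.decision(doc), 3), model.predict(doc))
```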
