WormBase December 2015
Latest revision as of 15:36, 15 December 2015
Overview:
Staff
Person, Group [Effort, Funding]
Paul Sternberg, PI, WormBase, GO [8%; 0% funded by GOC]
Juancarlos Chan, Developer, WormBase [25%; 25% funded by GOC]
James Done, Developer, Textpresso [5%; 0% funded by GOC]
Sibyl Gao, Developer, WormBase [5%; 0% funded by GOC]
Kevin Howe, Project Manager, WormBase - EBI [5%; 0% funded by GOC]
Ranjana Kishore, Curator [10%; 0% funded by GOC]
Yuling Li, Developer, Textpresso [30%; 25% funded by GOC]
Jane Lomax, Curator, WormBase ParaSite [10%; 0% funded by GOC]
Hans Michael Mueller, PI, Textpresso [75%; 50% funded by GOC]
Daniela Raciti, Curator [10%; 0% funded by GOC]
Kimberly Van Auken, Curator [100%; 75% funded by GOC]
Annotation Progress
WormBase GO Annotation Statistics as of December 1, 2015
Manual annotation statistics are summarized in Tables 1 - 3.
Total number of unique manual annotations: 39290 (+43.3% from 2014)
Total number of genes with manual annotations: 6762 (+44.2% from 2014)
Table 1: Summary of C. elegans Manual Biological Process Annotations
Numbers are total annotation counts; numbers in parentheses are the subset of those annotations that have extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA |
---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 7649 (367) | 3104 (78) | 1105 (23) | 324 (1) | 111 | 285 (58) | 51 | 51 (10) | 32 | 2 | 3 | 0 |
UniProt | 801 (229) | 155 (94) | 126 (9) | 190 | 25 (2) | 14 | 2 (2) | 5 | 104 | 0 | 65 | 0 |
CACAO | 18 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GOC | 59 | 14 | 261 | 122 | 17 | 0 | 4 | 2 | 10 | 0 | 0 | 379 |
BHF-UCL | 11 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
MGI | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
HGNC | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GO_Central | 2 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8119 |
ParkinsonsUK-UCL | 10 (4) | 5 (2) | 9 | 2 (1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 8555 (600) | 3279 (174) | 1504 (32) | 646 (2) | 153 (2) | 303 (58) | 57 (2) | 58 (10) | 146 | 2 | 68 | 8498 |
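Each Totals row is the column-wise sum over the contributing groups, with the parenthesized extension counts summed separately. The sketch below recomputes the Totals row of Table 1 from its group rows; it assumes that a cell without parentheses has zero annotations with extensions. Tables 2 and 3 use the same cell layout with their own evidence-code headers.

```python
import re

# Evidence-code columns of Table 1, in table order.
HEADER = "IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA"

# Group rows of Table 1, copied verbatim (Totals row excluded).
ROWS = """\
WormBase | 7649 (367) | 3104 (78) | 1105 (23) | 324 (1) | 111 | 285 (58) | 51 | 51 (10) | 32 | 2 | 3 | 0
UniProt | 801 (229) | 155 (94) | 126 (9) | 190 | 25 (2) | 14 | 2 (2) | 5 | 104 | 0 | 65 | 0
CACAO | 18 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
GOC | 59 | 14 | 261 | 122 | 17 | 0 | 4 | 2 | 10 | 0 | 0 | 379
BHF-UCL | 11 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0
MGI | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
HGNC | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
GO_Central | 2 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8119
ParkinsonsUK-UCL | 10 (4) | 5 (2) | 9 | 2 (1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0"""

# A cell is a count, optionally followed by "(n)" annotations with extensions.
CELL = re.compile(r"^(\d+)(?: \((\d+)\))?$")


def parse_cell(text):
    """Split a cell such as '7649 (367)' into (count, count_with_extensions)."""
    match = CELL.match(text.strip())
    if match is None:
        raise ValueError(f"unparseable cell: {text!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def column_totals(header, rows):
    """Sum every evidence-code column over all group rows."""
    codes = [code.strip() for code in header.split("|")]
    totals = {code: [0, 0] for code in codes}
    for line in rows.splitlines():
        _group, *cells = line.split("|")
        for code, cell in zip(codes, cells):
            count, extended = parse_cell(cell)
            totals[code][0] += count
            totals[code][1] += extended
    return {code: tuple(pair) for code, pair in totals.items()}


totals = column_totals(HEADER, ROWS)
print(totals["IMP"], totals["IBA"])  # (8555, 600) (8498, 0)
```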
Table 2: Summary of C. elegans Molecular Function Annotations
Numbers are total annotation counts; numbers in parentheses are the subset of those annotations that have extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IPI | IC | NAS | ISM | ND | IBA | ISO |
---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 152 (6) | 33 | 1658 (188) | 650 (1) | 47 | 1298 (2) | 15 | 5 | 4 | 63 | 0 | 2 |
IntAct | 0 | 0 | 0 | 0 | 0 | 1989 (52) | 0 | 0 | 0 | 0 | 0 | 0 |
UniProt | 37 (1) | 2 | 119 (2) | 181 | 21 | 272 (3) | 4 | 51 | 0 | 126 | 0 | 0 |
CACAO | 1 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GO_Central | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5531 | 0 |
HGNC | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ParkinsonsUK-UCL | 0 | 0 | 4 | 0 | 0 | 2 (2) | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 190 (7) | 35 | 1784 (190) | 833 (1) | 68 | 3561 (59) | 19 | 56 | 4 | 189 | 5531 | 2 |
Table 3: Summary of C. elegans Cellular Component Annotations
Numbers are total annotation counts; numbers in parentheses are the subset of those annotations that have extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IPI | IC | NAS | ISM | ND | IBA |
---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 9 (3) | 0 | 5818 (738) | 345 | 26 | 140 | 46 | 6 | 4 | 8 | 0 |
GO_Central | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4927 |
UniProt | 24 (9) | 1 | 246 (31) | 196 | 16 | 0 | 19 | 50 | 0 | 118 | 0 |
GOC | 36 | 11 | 22 | 6 | 0 | 0 | 3 | 0 | 0 | 0 | 47 |
MGI | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
HGNC | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
BHF-UCL | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Reactome | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
CACAO | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 69 (12) | 12 | 6110 (769) | 555 | 47 | 140 | 68 | 56 | 4 | 126 | 4974 |
Table 4: Summary of C. elegans Computational Annotations
Based on WormBase Release WS250
Total number of genes with Phenotype2GO-based Annotation: 5,628
Total number of genes with IEA-based Annotation: 11,342
Total number of genes with only IEA-based Annotation: 6,075
Total number of genes with only non-IEA-based Annotation: 1,182
Type of Annotation | IEA |
---|---|
Phenotype2GO Mappings - WormBase | 36,708 |
IEA/InterPro2GO - WormBase | 25,011 |
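Read as sets of genes, the WS250 gene counts above can be combined by simple inclusion-exclusion. The sketch below assumes that "IEA-based" and "non-IEA-based" are complementary evidence classes counted over the same gene set; the derived figures are arithmetic on the reported counts, not separately reported statistics.

```python
# Gene counts for WormBase Release WS250, copied from the list above.
genes_with_iea = 11_342
genes_with_only_iea = 6_075
genes_with_only_non_iea = 1_182

# Genes annotated by both IEA and non-IEA methods.
genes_with_both = genes_with_iea - genes_with_only_iea

# Genes with at least one non-IEA annotation.
genes_with_non_iea = genes_with_both + genes_with_only_non_iea

# Genes with at least one annotation of either kind.
genes_with_any = genes_with_iea + genes_with_only_non_iea

print(f"both: {genes_with_both:,}")             # both: 5,267
print(f"any non-IEA: {genes_with_non_iea:,}")   # any non-IEA: 6,449
print(f"any annotation: {genes_with_any:,}")    # any annotation: 12,524
```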
Methods and strategies for annotation
Curation methods
Literature curation
Curation of the primary literature continues to be the major focus of our manual annotation efforts.
Over the past year, WormBase has adopted a topic-based approach to curation, in which curators concentrate on one or more biological topics, or processes, during each release cycle. Topics so far have included the endoplasmic reticulum and mitochondrial unfolded protein responses, innate immunity and defense response, and Wnt signaling pathways (see below).
Semi-automated curation using the Textpresso information retrieval system
We also employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.
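Textpresso searches can combine keywords with curated term categories at the sentence level. As a hedged illustration of how such a search narrows the literature for Cellular Component curation, the sketch below flags a sentence only when it mentions a gene, an assay term, and a component term. The category word lists, gene names, and example sentences are hypothetical and are not Textpresso's actual category files or scoring.

```python
import re

# Hypothetical category lexicons; Textpresso's real categories are larger and
# curated separately. Terms are matched case-insensitively as whole words.
CATEGORIES = {
    "gene": {"daf-16", "pgl-1", "unc-54"},
    "assay": {"gfp", "antibody", "fusion", "immunostaining"},
    "component": {"nucleus", "cytoplasm", "mitochondria", "p granules"},
}


def matched_categories(sentence):
    """Return the names of all categories with at least one term in the sentence."""
    text = sentence.lower()
    hits = set()
    for category, terms in CATEGORIES.items():
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
            hits.add(category)
    return hits


def candidate_sentences(sentences, required=("gene", "assay", "component")):
    """Keep sentences matching every required category, for curator review."""
    needed = set(required)
    return [s for s in sentences if needed <= matched_categories(s)]


sentences = [
    "A DAF-16::GFP fusion accumulates in the nucleus after heat stress.",
    "daf-16 mutants are short-lived.",
    "Antibody staining labels the cytoplasm of intestinal cells.",
]
print(candidate_sentences(sentences))
# ['A DAF-16::GFP fusion accumulates in the nucleus after heat stress.']
```

Requiring all three categories trades recall for precision: the second sentence names a gene but reports no localization, and the third reports a localization without naming a gene.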
Computational annotation strategies
Our computational annotation strategies include mapping genes to GO terms via InterPro domains as part of the WormBase build cycle, as well as computational predictions made via the UniProtKB pipeline, including keyword mappings and UniRule mappings. Also as part of the WormBase build cycle, we map genes to Biological Process terms using mappings between terms in the Worm Phenotype Ontology (WPO) and GO Biological Process terms. Beginning with WormBase release WS246, these Phenotype2GO-based annotations draw on phenotypes observed for genetic variations as well as in RNAi experiments. Results from automated methods are regenerated with each WormBase database build to reflect any changes in the underlying reference genome sequence and/or gene models.
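The Phenotype2GO step described above amounts to a join between gene-phenotype annotations and a phenotype-to-GO mapping table. The sketch below illustrates that join under assumed inputs; all identifiers (PHENOTYPE_A, GO_BP_X, GENE_1) and the function name are hypothetical, and the actual WormBase build code is not described in this report.

```python
from collections import defaultdict

# Hypothetical phenotype-to-GO mapping table; real entries would pair Worm
# Phenotype Ontology (WBPhenotype:...) IDs with GO Biological Process IDs.
PHENOTYPE_TO_GO = {
    "PHENOTYPE_A": {"GO_BP_X"},
    "PHENOTYPE_B": {"GO_BP_X", "GO_BP_Y"},
}


def phenotype2go(phenotype_annotations, mapping):
    """Derive IEA GO annotations from (gene, phenotype, source) records.

    `source` records how the phenotype was observed, e.g. "RNAi" or
    "variation"; phenotypes without a mapping produce no GO annotation.
    """
    derived = defaultdict(set)
    for gene, phenotype, source in phenotype_annotations:
        for go_id in mapping.get(phenotype, ()):
            derived[gene].add((go_id, "IEA", source))
    return dict(derived)


records = [
    ("GENE_1", "PHENOTYPE_A", "RNAi"),
    ("GENE_2", "PHENOTYPE_B", "variation"),
    ("GENE_3", "PHENOTYPE_C", "RNAi"),  # unmapped phenotype: skipped
]
print(phenotype2go(records, PHENOTYPE_TO_GO))
```

Because the output depends only on the current phenotype annotations and the mapping table, recomputing it from scratch at every build keeps the derived annotations in step with the underlying data.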
Curation strategies
Priorities for annotation
Selection of genes for annotation is guided by several criteria:
- Annotation of gene sets involved in specific biological processes as part of the LEGO working group and WormBase's coordinated topic-based approach to curation
  - Topics annotated to date:
    - Unfolded Protein Response (ER and mitochondrial)
    - innate immune response
    - defense response to pathogen (fungal as well as Gram-negative and Gram-positive bacteria)
    - Wnt signaling
    - RNAi-mediated behavioral response to odor
    - anchor cell invasion (in progress)
- Genes identified in Textpresso-based curation pipelines
- Re-annotation of genes affected by changes to the ontology, e.g. cilia biology, ubiquitination, enzyme regulator activities
- Newly characterized genes reported in recent publications
Presentations and Publications
Papers with substantial GO content
- Gene Ontology Consortium: going forward. Gene Ontology Consortium. Nucleic Acids Research 2015 Jan;43(Database issue):D1049-56. doi: 10.1093/nar/gku1179, PMID:25428369
Presentations including Talks and Tutorials and Teaching
- Kimberly Van Auken: Gene Ontology (GO): Finding GO annotations and performing enrichment analysis. 2015 International C. elegans Meeting, UCLA, Los Angeles, CA, June 25 and 27, 2015.
Poster presentations
- Textpresso Central: A System for Integrating Full Text Literature Curation with Diverse Curation Platforms. Kimberly Van Auken, Yuling Li, Hans-Michael Muller, and Paul Sternberg. BioCreative Workshop V, September 9-11, 2015. cicCartuja Research Center, Seville, Spain.
Other Highlights
Ontology Development Contributions
- Ontology Contributions and Discussions in 2015:
  - amino acid transport and transporter terms
  - ascaroside binding
  - chitin-based cuticle extracellular matrix
  - hemidesmosome
  - modulation of age-related behavioral decline
  - posttranscriptional regulation of synapse organization
  - numerous TermGenie requests
Annotation Outreach and User Advocacy Efforts
- Kimberly Van Auken continues to serve on the GO-help rota.
- Kimberly Van Auken and Dmitry Snitnikov (MGI) are working with a group at Peking University to incorporate human lncRNA annotations into the GOC.
Annotation Advocacy
- Kimberly Van Auken participated in the LEGO working group as an alpha tester of the Noctua software and attended the Geneva LEGO workshop, December 8-10, 2015.
- Since October 2015, Kimberly Van Auken and David Hill (MGI) have served as Annotation Advocacy Co-Managers.
Other Highlights
WormBase Data Models and Software
- WormBase GO Annotation Model - Starting with WS248, we have incorporated a new GO annotation model into WormBase. The model allows for full incorporation of annotation extension data into WormBase, as well as additional annotation details and new IEA annotations from the UniProt-GOA group.
- WormBase GO Annotation Display - To support the new GO annotation model, we revised the GO annotation web display on WormBase gene pages. Users can now choose between two views: Summary and Full. The Summary view shows the GO ID, GO term, and annotation extension. The Full view additionally shows the evidence code, reference, contributor, and the supporting evidence from the With/From column of the gene association file.
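The summary and full views described above can be thought of as column projections of a gene association file (GAF) row. The sketch below assumes the 17-column GAF 2.0 layout, maps the display's "contributor" to the Assigned By column, and takes GO term names from a caller-supplied dictionary, since a GAF row carries only the GO ID. The function names, example row, and identifiers are hypothetical; this is not the WormBase display code.

```python
# Column names of the 17-column GAF 2.0 format, in file order.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def parse_gaf_line(line):
    """Split one tab-separated GAF 2.0 annotation line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    return dict(zip(GAF_COLUMNS, fields))


def summary_view(row, term_names):
    """GO ID, GO term name (looked up separately), and annotation extension."""
    return {
        "GO ID": row["go_id"],
        "GO term": term_names.get(row["go_id"], ""),
        "Extension": row["annotation_extension"],
    }


def full_view(row, term_names):
    """Summary fields plus evidence, reference, contributor, and With/From."""
    view = summary_view(row, term_names)
    view.update({
        "Evidence": row["evidence_code"],
        "Reference": row["reference"],
        "Contributor": row["assigned_by"],
        "With/From": row["with_from"],
    })
    return view


# Hypothetical annotation row; all identifiers are placeholders.
example = "\t".join([
    "WB", "WBGene_EXAMPLE", "gene-1", "", "GO:EXAMPLE", "WB_REF:EXAMPLE",
    "IDA", "", "C", "", "", "gene", "taxon:6239", "20151201", "WB",
    "part_of(WBbt:EXAMPLE)", "",
])
row = parse_gaf_line(example)
print(full_view(row, {"GO:EXAMPLE": "example component"}))
```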
Text Mining and Textpresso Central
- Monica McAndrews (MGI), Kimberly Van Auken, and Yuling Li are collaborating on a document classification pipeline to help MGI identify papers suitable for curation. Using training and testing papers supplied by MGI, we have developed an SVM classifier to distinguish mouse from non-mouse papers. We are now taking the first steps toward putting this pipeline into production.
- Hans-Michael Muller, Yuling Li, and Kimberly Van Auken have developed Textpresso Central, a system that enables curators to perform full-text literature searches, view the search results in the context of the paper, annotate text, and send those annotations to the Protein2GO tool hosted by the UniProt group at the EBI. The system is designed to let users perform as many operations as possible on a literature corpus or on a particular paper. It uses state-of-the-art software packages and frameworks such as the Unstructured Information Management Architecture (http://uima.apache.org), Lucene (http://lucene.apache.org), and Wt (http://www.webtoolkit.eu/wt). The corpus can be built from full-text articles available in PDF format (http://en.wikipedia.org/wiki/Portable_Document_Format) or NXML (http://dtd.nlm.nih.gov/). An extension for articles published in HTML (http://en.wikipedia.org/wiki/HTML) is planned.
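The report does not say which features or SVM implementation the mouse/non-mouse classifier uses. As a minimal sketch of the underlying technique only, the code below trains a linear SVM (hinge loss with an L2 penalty, no bias term) by the Pegasos stochastic sub-gradient method over binary bag-of-words features. The function names, toy titles, and parameter values are hypothetical; labels are +1 for mouse and -1 for non-mouse.

```python
import random
import re


def features(text):
    """Binary bag-of-words features: each lower-cased token maps to 1.0."""
    return {token: 1.0 for token in re.findall(r"[a-z0-9]+", text.lower())}


def dot(weights, x):
    return sum(weights.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(docs, labels, lam=0.1, iterations=500, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty, no bias) with Pegasos SGD."""
    rng = random.Random(seed)
    data = [(features(d), y) for d, y in zip(docs, labels)]
    weights = {}
    for t in range(1, iterations + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        # The hinge-loss sub-gradient is non-zero only when the margin is below 1.
        violated = y * dot(weights, x) < 1.0
        # Gradient step on the L2 regularizer shrinks every weight.
        weights = {k: v * (1.0 - eta * lam) for k, v in weights.items()}
        if violated:
            for k, v in x.items():
                weights[k] = weights.get(k, 0.0) + eta * y * v
    return weights


def predict(weights, text):
    return 1 if dot(weights, features(text)) > 0 else -1


# Hypothetical toy training titles (+1 = mouse paper, -1 = non-mouse paper).
DOCS = [
    "Targeted mouse knockout alleles",
    "Mouse models of hearing loss",
    "Conditional mouse mutants show defects",
    "Dauer formation in C. elegans larvae",
    "Worm neurons regulate C. elegans behavior",
    "RNAi screen in the worm germline",
]
LABELS = [1, 1, 1, -1, -1, -1]

weights = train_pegasos(DOCS, LABELS)
print(predict(weights, "A mouse knockout study"))    # 1
print(predict(weights, "C. elegans worm genetics"))  # -1
```

A production pipeline would more typically use a library implementation (for example scikit-learn's LinearSVC) with richer features such as TF-IDF over full text, and would report accuracy on the held-out test papers.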