WormBase December 2016
Latest revision as of 15:52, 20 December 2016
Overview:
Staff
Person, Group [Effort, Funding]
Paul Sternberg, PI, WormBase, GO [8%; 0% funded by GOC]
Juancarlos Chan, Developer, WormBase [25%; 25% funded by GOC]
Sibyl Gao, Developer, WormBase [5%; 0% funded by GOC]
Kevin Howe, Project Lead, WormBase - EBI [5%; 0% funded by GOC]
Raymond Lee, Curator, WormBase [10%; 0% funded by GOC]
Yuling Li (thru August 2016), Developer, Textpresso [30%; 25% funded by GOC]
Jane Lomax (thru July 2016), Curator, WormBase ParaSite [10%; 0% funded by GOC]
Hans Michael Mueller, Project Lead, Textpresso [75%; 50% funded by GOC]
Daniela Raciti, Curator [10%; 0% funded by GOC]
Kimberly Van Auken, Curator, Co-Manager, Annotation Working Group [100%; 75% funded by GOC]
Annotation Progress
WormBase GO Annotation Statistics as of December 20, 2016
Manual annotation statistics are summarized in Tables 1 - 3.
Total number of unique manual annotations: 42747 (+8.8% from 2015)
Total number of genes with manual annotations: 7596 (+12.3% from 2015)
Table 1: Summary of C. elegans Manual Biological Process Annotations
Numbers refer to total number of annotations; annotations in parentheses represent annotations with extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 7623 (426) | 3141 (90) | 1106 (24) | 327 (1) | 109 | 292 (56) | 51 | 52 (10) | 32 | 2 | 2 | 0 | 0 |
UniProt | 1530 (552) | 976 (390) | 165 (15) | 197 | 26 (3) | 14 | 2 (2) | 5 | 104 | 0 | 65 | 0 | 0 |
CACAO | 20 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
BHF-UCL | 11 | 0 | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
MGI | 4 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
HGNC | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GO_Central | 2 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7945 | 1 |
ParkinsonsUK-UCL | 10 (4) | 6 (3) | 11 | 2 (1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 9200 (982) | 4124 (483) | 1291 (39) | 536 (2) | 135 (3) | 310 (56) | 53 (2) | 57 (10) | 136 | 2 | 67 | 7945 | 1 |
Table 2: Summary of C. elegans Molecular Function Annotations
Numbers refer to total number of annotations; annotations in parentheses represent annotations with extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IPI | IC | NAS | ISM | ND | IBA | ISO | IKR | IRD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 161 (11) | 32 | 1688 (209) | 647 (4) | 45 | 1348 (5) | 21 (1) | 4 | 3 | 73 | 0 | 2 | 0 | 0 |
IntAct | 0 | 0 | 0 | 0 | 0 | 2085 (52) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
UniProt | 57 (2) | 17 | 139 (3) | 194 | 23 (1) | 321 (3) | 4 | 51 | 0 | 126 | 0 | 0 | 0 | 0 |
CACAO | 1 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GO_Central | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6538 | 0 | 1 | 1 |
HGNC | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ParkinsonsUK-UCL | 0 | 0 | 0 | 0 | 0 | 2 (2) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 219 (13) | 49 | 1834 (212) | 843 (4) | 68 (1) | 3754 (60) | 25 (4) | 55 | 4 | 199 | 6538 | 2 | 1 | 1 |
Table 3: Summary of C. elegans Cellular Component Annotations
Numbers refer to total number of annotations; annotations in parentheses represent annotations with extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IPI | IC | NAS | ISM | ND | IBA |
---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 9 | 0 | 5863 (784) | 382 | 27 | 141 (3) | 50 | 7 | 4 | 10 | 0 |
GO_Central | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6326 |
UniProt | 29 (10) | 1 | 379 (73) | 203 | 18 | 0 | 19 | 50 | 0 | 118 | 0 |
MGI | 0 | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
HGNC | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
BHF-UCL | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
CACAO | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 38 (10) | 1 | 6268 (857) | 593 | 45 | 141 (3) | 69 | 57 | 4 | 128 | 6326 |
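The cells in Tables 1 - 3 combine a total annotation count with an optional count, in parentheses, of annotations that carry extensions. As a minimal sketch of how such a column could be re-tallied (the function names below are illustrative and not part of any WormBase or GO tooling), the IDA column of Table 3 can be parsed and summed like this:

```python
import re

# Matches a table cell such as "5863 (784)" or "16".
CELL = re.compile(r"^\s*(\d+)(?:\s*\((\d+)\))?\s*$")


def parse_cell(cell):
    """Return (total_annotations, annotations_with_extensions) for one cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    total = int(match.group(1))
    # A cell without parentheses reports no annotations with extensions.
    with_extensions = int(match.group(2)) if match.group(2) else 0
    return total, with_extensions


def column_total(cells):
    """Sum a column of cells, keeping the extension count separate."""
    parsed = [parse_cell(cell) for cell in cells]
    return sum(t for t, _ in parsed), sum(e for _, e in parsed)


# IDA column of Table 3, in row order (WormBase through CACAO).
ida_cells = ["5863 (784)", "0", "379 (73)", "16", "0", "7", "3"]
ida_total, ida_with_extensions = column_total(ida_cells)
print(ida_total, ida_with_extensions)  # 6268 857
```

The result agrees with the Totals row of Table 3 for IDA, 6268 (857).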
Table 4: Summary of C. elegans Computational Annotations
Summary Statistics Based on WormBase Release WS256
Genes Stats:
- Genes with GO_term connections: 15047
- Non-IEA-only annotation: 640
- IEA-only annotation: 7830
- Both IEA and non-IEA annotations: 6577
GO_term Stats:
- Distinct GO_terms connected to Genes: 5679
- Associated by non-IEA only: 3123
- Associated by IEA only: 825
- Associated by both IEA and non-IEA: 1731
Type of Annotation | IEA |
---|---|
Phenotype2GO Mappings - WormBase | 37,714 |
IEA/InterPro2GO - WormBase | 22,660 |
Methods and strategies for annotation
Curation methods
Literature curation
Curation of the primary literature continues to be the major focus of our manual annotation efforts.
Over the past year, WormBase curation efforts were focused largely on developing preliminary pathway models using the Noctua curation tool. To this end, literature curation involved reviewing C. elegans pathways, the biological entities that participate in those pathways, and the annotations, particularly Molecular Function annotations, associated with those entities. Pathways reviewed include apoptosis, asymmetric cell division, defense response, insulin signaling, neuronal cell fate specification, mRNA decay, semaphorin/plexin signaling, thermosensory transduction, and TOR signaling.
Curation using the Textpresso information retrieval system
We also employ the Textpresso information retrieval system for curation of GO Cellular Component and Molecular Function annotations.
Computational annotation strategies
Our computational annotation strategies include mapping genes to GO terms based on InterPro domains, performed as part of the WormBase build cycle, as well as computational predictions made via the UniProtKB pipeline, including keyword mappings and UniRule mappings. Also as part of the WormBase build cycle, we map genes to Biological Process terms using mappings between terms in the Worm Phenotype Ontology (WPO) and GO terms (Phenotype2GO).
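The mapping-based strategies described above can be illustrated with a small sketch. This is not the WormBase build code: the function name, the gene names, and every identifier in the mapping tables below are hypothetical placeholders, chosen only to show how a feature-to-GO lookup table turns gene features (domains or phenotypes) into IEA annotations.

```python
# Hypothetical mapping tables; all identifiers below are placeholders,
# not real InterPro, WPO, or GO accessions.
INTERPRO2GO = {
    "IPR_DOMAIN_A": ["GO:MF_TERM_1", "GO:BP_TERM_1"],
    "IPR_DOMAIN_B": ["GO:CC_TERM_1"],
}
PHENOTYPE2GO = {
    "WBPhenotype:EXAMPLE_1": ["GO:BP_TERM_2"],
}


def infer_iea(gene_features, mapping, source):
    """Yield (gene, go_term, evidence_code, source) for each mapped feature.

    Features with no entry in the mapping are skipped, and each
    (gene, go_term) pair is emitted at most once per call.
    """
    seen = set()
    for gene, features in gene_features.items():
        for feature in features:
            for go_term in mapping.get(feature, []):
                if (gene, go_term) not in seen:
                    seen.add((gene, go_term))
                    yield (gene, go_term, "IEA", source)


# Hypothetical inputs: domains and phenotypes associated with each gene.
gene_domains = {
    "gene-1": ["IPR_DOMAIN_A", "IPR_DOMAIN_B"],
    "gene-2": ["IPR_UNMAPPED"],
}
gene_phenotypes = {"gene-1": ["WBPhenotype:EXAMPLE_1"]}

annotations = list(infer_iea(gene_domains, INTERPRO2GO, "InterPro2GO"))
annotations += list(infer_iea(gene_phenotypes, PHENOTYPE2GO, "Phenotype2GO"))
for annotation in annotations:
    print(annotation)
```

In this sketch, gene-2 receives no annotation because its only domain has no mapping entry.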
Curation strategies
Priorities for annotation
Selection of genes for annotation is guided by several criteria:
- Annotation of gene sets involved in specific biological processes as part of the LEGO working group
- Genes identified in Textpresso-based curation pipelines, for example genes described in papers that an SVM (Support Vector Machine) classification algorithm flags with high confidence as reporting Molecular Function experiments such as enzymatic assays
- Re-annotation of genes affected by changes to the ontology, e.g. cilia biology, ubiquitination, enzyme regulator activities, and obsoleted annotation extensions
- Newly published characterizations of genes for which no previous biological data were available
Presentations and Publications
Papers with substantial GO content
- Expansion of the Gene Ontology knowledgebase and resources. Gene Ontology Consortium. Nucleic Acids Research (2016) pii:gkw1108. PMID:27899567
- Guidelines for the functional annotation of microRNAs using the Gene Ontology. Huntley RP, Sitnikov D, Orlic-Milacic M, Balakrishnan R, D'Eustachio P, Gillespie ME, Howe D, Kalea AZ, Maegdefessel L, Osumi-Sutherland D, Petri V, Smith JR, Van Auken K, Wood V, Zampetaki A, Mayr M, Lovering RC. RNA. 2016 May;22(5):667-76. doi:10.1261/rna.055301.115. PMID:26917558.
Presentations including Talks and Tutorials and Teaching
- TextpressoCentral: A System for Integrating Full Text Literature Curation with Diverse Curation Platforms including the Gene Ontology Consortium's Common Annotation Framework. Kimberly Van Auken, Yuling Li, Seth Carbon, Christopher Mungall, Suzanna Lewis, Hans-Michael Muller and Paul Sternberg. ISB 2016 Geneva, Switzerland. https://www.sib.swiss/events/biocuration2016/oral-presentations
Other Highlights
Annotation Outreach and User Advocacy Efforts
- Kimberly Van Auken continues to serve on the GO-help rota.
- Kimberly Van Auken served on the Data Capture Working Group.
Annotation Advocacy
- Kimberly Van Auken and David Hill (MGI) continue to serve as Annotation Working Group Co-Managers.
- Kimberly Van Auken continued to participate in the LEGO working group as an alpha tester of the Noctua software and helped to train GO curators in LEGO curation and the Noctua annotation tool at the Geneva LEGO workshop (April 2016), an MGI workshop (June 2016), an EBI workshop (September 2016), and the USC workshop (November 2016).
Text Mining and Textpresso Central
- Monica McAndrews (MGI), Kimberly Van Auken, Hans-Michael Mueller, and Yuling Li (thru August 2016) are collaborating on a document classification pipeline to help MGI identify papers suitable for curation. Using training and testing papers supplied by MGI, we have developed an SVM classifier to distinguish mouse from non-mouse papers. We are beginning to put this pipeline into production.
- Hans-Michael Muller, Kimberly Van Auken, and Seth Carbon continued development of the TextpressoCentral (TPC) curation system and its integration with the Noctua annotation tool. TPC enables curators to perform full text literature searches, view the search results in the context of the paper, annotate text, and send those annotations to an external database. Over the past year, we have worked on developing a curation interface for GO annotation, as well as the protocol for communication between TPC and Noctua.
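
For readers unfamiliar with SVM document classification of the kind mentioned above for separating mouse from non-mouse papers, the following is a minimal, dependency-free sketch. It is not the Textpresso or MGI pipeline: the toy documents, the bag-of-words features, the hyperparameters, and all function names are assumptions made for illustration. It trains a linear SVM by subgradient descent on the hinge loss with a small L2 penalty.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Assign each distinct token a column index, in order of first use."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on hinge loss plus an L2 penalty (labels are +1/-1)."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # L2 regularization shrinks the weights on every step.
            weights = [w * (1.0 - lr * lam) for w in weights]
            # The hinge loss contributes only when the margin is violated.
            if margin < 1.0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(text, vocab, weights, bias):
    x = featurize(text, vocab)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "mouse" if score >= 0 else "non-mouse"


# Toy training set (invented for this sketch): +1 = mouse, -1 = non-mouse.
train_docs = [
    "mouse knockout phenotype",
    "mouse embryo development",
    "zebrafish embryo development",
    "yeast knockout phenotype",
]
train_labels = [1, 1, -1, -1]

vocab = build_vocabulary(train_docs)
vectors = [featurize(doc, vocab) for doc in train_docs]
weights, bias = train_linear_svm(vectors, train_labels)

print(predict("mouse embryo phenotype", vocab, weights, bias))
print(predict("yeast embryo development", vocab, weights, bias))
```

A production classifier would normally rely on an established SVM library, richer features drawn from full text, and held-out test papers for evaluation, as the description of training and testing papers above implies.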