WormBase December 2014
Revision as of 17:45, 4 December 2014
Overview:
Staff:
Paul Sternberg, PI, WormBase, GO [8%; 0% funded by GOC]
Juancarlos Chan, Developer, WormBase [25%; 25% funded by GOC]
James Done, Developer, Textpresso [40%; 40% funded by GOC]
Ranjana Kishore, Curator [25%; 10% funded by GOC]
Yuling Li, Developer, Textpresso [30%; 20% funded by GOC]
Hans Michael Mueller, PI, Textpresso [75%; 50% funded by GOC]
Daniela Raciti, Curator [10%; 0% funded by GOC]
Kimberly Van Auken, Curator [100%; 75% funded by GOC]
Annotation Progress
WormBase GO Annotation Statistics as of December 1, 2014
Manual annotation statistics are summarized in Tables 1 - 3.
Total number of unique manual annotations: 27422
Total number of genes with manual annotations: 4690
Table 1: Summary of C. elegans Manual Biological Process Annotations
Numbers are total annotation counts; numbers in parentheses are the subset of those annotations that have annotation extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD | RCA | ISO | IKR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 7461 (244) | 2990 (53) | 1107 (19) | 315 (1) | 115 | 275 (58) | 60 | 50 (10) | 32 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
UniProt | 466 (2) | 28 | 115 (1) | 170 | 22 | 13 | 0 | 5 | 104 | 0 | 65 | 0 | 0 | 2 | 0 | 0 |
GOC | 59 | 10 | 309 | 329 | 22 | 0 | 4 | 7 | 14 | 0 | 0 | 331 | 0 | 2 | 2 | 0 |
BHF-UCL | 11 | 0 | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
MGI | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
HGNC | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GO_Central | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2810 | 3 | 0 | 0 | 1 |
ParkinsonsUK-UCL | 2 | 2 (2) | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 8004 (246) | 3030 (55) | 1537 (20) | 804 (1) | 159 | 292 (58) | 64 | 62 (10) | 150 | 2 | 65 | 3141 | 3 | 4 | 2 | 1 |
Table 2: Summary of C. elegans Molecular Function Annotations
Numbers are total annotation counts; numbers in parentheses are the subset of those annotations that have annotation extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD | RCA | ISO | IKR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 151 (5) | 35 | 1617 (133) | 658 (1) | 49 | 0 | 1211 | 11 | 7 | 4 | 35 | 0 | 0 | 0 | 2 | 0 |
IntAct | 0 | 0 | 0 | 0 | 0 | 0 | 1987 (54) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
UniProt | 33 | 2 | 99 (1) | 172 | 19 | 0 | 231 | 3 | 53 | 0 | 127 | 0 | 0 | 19 | 0 | 0 |
GO_Central | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2096 | 2 | 0 | 0 | 1 |
HGNC | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ParkinsonsUK-UCL | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 184 (5) | 37 | 1720 (134) | 832 (1) | 68 | 0 | 3429 (54) | 13 | 60 | 4 | 162 | 2096 | 2 | 19 | 2 | 1 |
Table 3: Summary of C. elegans Cellular Component Annotations
Numbers are total annotation counts; numbers in parentheses are the subset of those annotations that have annotation extensions.
Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD | RCA | ISO | IKR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WormBase | 9 | 0 | 5625 (683) | 322 | 26 | 0 | 142 (3) | 43 | 6 | 4 | 4 | 0 | 0 | 1 | 0 | 0 |
GO_Central | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2001 | 3 | 0 | 0 | 1 |
UniProt | 14 | 1 | 208 | 186 | 16 | 0 | 0 | 19 | 50 | 0 | 119 | 0 | 0 | 18 | 0 | 0 |
MGI | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
BHF-UCL | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Reactome | 0 | 0 | 0 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
HGNC | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Totals | 23 | 1 | 5854 (683) | 519 | 46 | 0 | 142 (3) | 62 | 56 | 4 | 123 | 2001 | 3 | 19 | 0 | 1 |
Table 4: Summary of C. elegans Computational Annotations
Based on WormBase Release WS246
Total number of genes with Phenotype2GO-based Annotation: 6,809
Total number of genes with InterPro2GO-based Annotation: 11,282
Type of Annotation | IEA |
---|---|
Phenotype2GO Mappings - WormBase | 42,666 |
IEA/InterPro2GO - WormBase | 35,082 |
Methods and strategies for annotation
Curation methods
Literature curation:
Curation of the primary literature continues to be the major focus of our manual annotation efforts.
Over the past year, WormBase has adopted a topic-based approach to curation, in which curators concentrate on one or more biological topics, or processes, in each release cycle. Topics covered over the past year include the endoplasmic reticulum and mitochondrial unfolded protein responses, innate immunity and defense response, and Wnt signaling pathways (see below).
Semi-automated curation using the Textpresso information retrieval system:
We also routinely employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.
Computational annotation strategies:
Our computational annotation strategies include mapping genes to GO terms using InterPro domains (InterPro2GO) and mapping genes to Biological Process terms using mappings from terms in the Worm Phenotype Ontology (WPO) to GO terms (Phenotype2GO). Beginning with the WS246 WormBase release, these Phenotype2GO-based annotations will include phenotypes based upon genetic variations as well as RNAi experiments. Results from automated methods are regenerated with each WormBase database build to reflect any changes in the underlying reference genome sequence and/or gene models.
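The phenotype-based strategy described above can be sketched roughly as follows. This is a minimal illustration, not the WormBase pipeline: the function name `phenotype2go`, the data layout, and every identifier in the two dictionaries are hypothetical placeholders, and the real mapping files and build scripts are not shown in this report.

```python
from collections import defaultdict

# Hypothetical phenotype -> GO Biological Process mapping.
# All identifiers below are placeholders, not real WPO or GO terms.
PHENOTYPE2GO = {
    "WBPhenotype:EXAMPLE1": {"GO:EXAMPLE_A"},
    "WBPhenotype:EXAMPLE2": {"GO:EXAMPLE_A", "GO:EXAMPLE_B"},
}

# Hypothetical gene -> observed phenotypes, each tagged with the kind of
# experiment that produced it (RNAi or genetic variation).
GENE_PHENOTYPES = {
    "gene-1": [("WBPhenotype:EXAMPLE1", "RNAi")],
    "gene-2": [
        ("WBPhenotype:EXAMPLE2", "variation"),
        ("WBPhenotype:EXAMPLE3", "RNAi"),  # no mapping exists, so it is ignored
    ],
}


def phenotype2go(gene_phenotypes, mapping):
    """Return {gene: sorted GO terms} inferred from mapped phenotypes."""
    inferred = defaultdict(set)
    for gene, observations in gene_phenotypes.items():
        for phenotype, _experiment in observations:
            # Phenotypes without a mapping contribute no GO terms.
            inferred[gene].update(mapping.get(phenotype, ()))
    # Drop genes for which no phenotype could be mapped.
    return {gene: sorted(terms) for gene, terms in inferred.items() if terms}


annotations = phenotype2go(GENE_PHENOTYPES, PHENOTYPE2GO)
```

Because the result is derived entirely from the current inputs, rerunning such a function at every database build would pick up changes in gene models or mappings, which matches the regeneration policy stated above.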
Curation strategies
Priorities for annotation
Selection of genes for annotation is guided by several criteria:
- Annotation of gene sets involved in specific biological processes as part of WormBase's coordinated topic-based curation process
  - Topics annotated to date: Unfolded Protein Response (ER and mitochondrial), innate immune response, defense response to pathogen, and Wnt signaling
- Genes identified in Textpresso-based curation pipelines
- Re-annotation of genes affected by changes to the ontology, e.g. cilia biology, ubiquitination, enzyme regulator activities
- Publication of newly characterized genes
- C. elegans genes orthologous to human disease genes
Presentations and Publications
a. Papers with substantial GO content
- Van Auken K, Schaeffer ML, McQuilton P, Laulederkind SJF, Li D, Wang SJ, Hayman GT, Tweedie S, Arighi CN, Done J, Müller HM, Sternberg PW, Mao Y, Wei CH, Lu Z. BC4GO: A Full-Text Corpus for the BioCreative IV GO Task. Database (Oxford). 2014 Jul 28;2014. pii: bau074. doi: 10.1093/database/bau074. Print 2014. PMID: 25070993; PMCID: PMC4112614.
- Mao Y, Van Auken K, Li D, Arighi CN, McQuilton P, Hayman GT, Tweedie S, Schaeffer ML, Laulederkind SJF, Wang SJ, Gobeill J, Ruch P, Luu AT, Kim JJ, Chiang JH, Chen YD, Yang CJ, Liu H, Zhu D, Li Y, Yu H, Emadzadeh E, Gonzalez G, Chen JM, Dai HJ, Lu Z. Overview of the Gene Ontology Task at BioCreative IV. Database (Oxford). 2014 Aug 25;2014. pii: bau086. doi: 10.1093/database/bau086. Print 2014. PMID: 25157073; PMCID: PMC4142793.
- Huntley RP, Harris MA, Alam-Faruque Y, Blake JA, Carbon S, Dietze H, Dimmer EC, Foulger RE, Hill DP, Khodiyar VK, Lock A, Lomax J, Lovering RC, Mutowo-Meullenet P, Sawford T, Van Auken K, Wood V, Mungall CJ. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. BMC Bioinformatics. 2014 May 21;15:155. doi: 10.1186/1471-2105-15-155. PMID: 24885854; PMCID: PMC4039540.
b. Presentations including Talks and Tutorials and Teaching
c. Poster presentations
Other Highlights:
A. Ontology Development Contributions:
- Pending Term Requests:
  - lysosome-related organelle
  - gut granule
  - gut granule lumen
  - gut granule membrane
  - peptidyl-proline 4-dioxygenase binding
B. Annotation Outreach and User Advocacy Efforts:
- Kimberly Van Auken continues to serve on the GO-help rota.
- Kimberly Van Auken assisted with migration of content to the new GO website.
C. Annotation Advocacy:
- Kimberly Van Auken is participating in bi-weekly calls on development of the LEGO curation model and accompanying curation tool, Noctua.
D. Other Highlights:
- We have written a new script for reporting our manual annotations statistics. This script reports the number of annotations per contributing group according to evidence code and also reports the number of annotations with annotation extensions.
- WormBase GO Annotation Model - We have completed development and testing of a new GO annotation model for WormBase. The model will allow full incorporation of annotation extension data into WormBase. The new data will be included in WormBase Release WS237.
- BioCreative - WormBase participated in the BioCreative Track 4 task (http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/2013) of identifying GO evidence sentences and GO annotations from the full text of publications. Using a GO Annotation Tool (GOAT), developed by the Textpresso team, that allowed for highlighting sentences and associating GO annotations, a WormBase curator provided training and test data for the full text of 22 papers and then helped to perform error analysis on the results submitted by the participating teams. Other participating curation groups included FlyBase, MaizeDB, RGD, and TAIR. Two papers describing this work were submitted to Database and one has been accepted with minor revision.
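The statistics script mentioned in the highlights above is not reproduced in this report. As a rough sketch of what such a tally might look like, the following assumes the input is a tab-delimited GO Annotation File (GAF 2.x), in which column 7 holds the evidence code, column 15 the assigning group, and column 16 the annotation extension. The names `tally_gaf`, `format_cell`, and `example_row`, and all sample values, are hypothetical.

```python
from collections import Counter


def tally_gaf(lines):
    """Count annotations per (assigned_by, evidence_code) in GAF 2.x lines.

    Returns two Counters: all annotations, and the subset that carries
    an annotation extension.
    """
    totals, with_extension = Counter(), Counter()
    for line in lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # malformed row: too few columns to hold "assigned by"
        key = (cols[14], cols[6])  # (assigned_by, evidence_code)
        totals[key] += 1
        if len(cols) > 15 and cols[15].strip():
            with_extension[key] += 1
    return totals, with_extension


def format_cell(total, extended):
    """Render a table cell as 'total (extended)', omitting a zero count."""
    return f"{total} ({extended})" if extended else str(total)


def example_row(assigned_by, evidence, extension=""):
    """Build a 17-column GAF row filled with placeholder values."""
    cols = ["WB", "GENE_EXAMPLE", "sym", "", "GO:EXAMPLE", "REF:EXAMPLE",
            evidence, "", "P", "", "", "gene", "taxon:6239", "20141201",
            assigned_by, extension, ""]
    return "\t".join(cols)


sample = [
    "!gaf-version: 2.0",
    example_row("WB", "IMP"),
    example_row("WB", "IMP", "part_of(EXAMPLE)"),
    example_row("UniProt", "IDA"),
]
totals, with_extension = tally_gaf(sample)
```

The two counters can then be combined per cell, for example `format_cell(totals[("WB", "IMP")], with_extension[("WB", "IMP")])`, to produce entries in the style used in Tables 1 - 3.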