WormBase, September 2009
Progress Report
In Progress: last updated: 09-28-2009
Staff:
WormBase
Juancarlos Chan
Developer, WormBase, Caltech, Pasadena, CA
Ranjana Kishore
Curator, WormBase, Caltech, Pasadena, CA
Paul Sternberg
PI, WormBase, Caltech, Pasadena, CA
Kimberly Van Auken
Curator, WormBase, Caltech, Pasadena, CA
Additional technical support:
Anthony Rogers
WormBase, Sanger Center, Hinxton, UK
Gary Williams
WormBase, Sanger Center, Hinxton, UK
Textpresso
Ruihua Fang
Developer, Textpresso, Caltech, Pasadena, CA
Hans Michael Muller
Project Leader, Textpresso, Caltech, Pasadena, CA
Arun Rangarajan
Developer, Textpresso, Caltech, Pasadena, CA
Annotation Progress
Table 1: Number of Genes Annotated to Each GO Ontology
Type of Annotation | Number of Genes Annotated | % Change from October 2008 | Number of Unique GO Terms | Total Number of GO Terms |
---|---|---|---|---|
Manual Annotation | 1684 | +10% | 1536 | 10573 |
Phenotype2GO Mappings | 4769 | +2.2% | 53 | 30644 |
IEA/Electronic | 9115 | -27.7% (see Note) | 418 | 9115 |
Total | 14623 | +2.0% | 1812 | 50332 |
Note: The decrease in IEA annotations reflects changes to our InterPro2GO pipeline, which now uses improved parameters when running each of the InterPro member database prediction algorithms on the C. elegans proteome. These changes reduced the number of low-confidence domain predictions and, consequently, the number of low-confidence IEA annotations.
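As a quick consistency check on Table 1, the "Total Number of GO Terms" column sums exactly to the Total row. The gene and unique-term columns do not. This is presumably because many genes and GO terms appear under more than one annotation type and are counted only once in the total. That overlap reading is our assumption; the report does not state it. The sketch below simply redoes the arithmetic with the figures from the table.

```python
# Figures copied from Table 1: (genes annotated, unique GO terms, total GO terms)
rows = {
    "Manual Annotation": (1684, 1536, 10573),
    "Phenotype2GO Mappings": (4769, 53, 30644),
    "IEA/Electronic": (9115, 418, 9115),
}
reported_total = (14623, 1812, 50332)


def column_sums(table):
    """Sum each numeric column across the annotation types."""
    return tuple(sum(values[i] for values in table.values()) for i in range(3))


sums = column_sums(rows)
labels = ("genes", "unique GO terms", "total GO terms")
for label, summed, reported in zip(labels, sums, reported_total):
    # A positive difference would mean items shared between annotation types
    # are counted once in the reported total (assumed interpretation).
    print(f"{label}: sum of rows = {summed}, reported total = {reported}, "
          f"difference = {summed - reported}")
```

The differences come out to 945 genes, 195 unique terms, and 0 total terms.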
Methods and Strategies for Annotation
Literature Curation
Manual curation of the C. elegans literature remains our highest curation priority, accounting for ~90% of our total curation effort.
We have implemented a GO curation check-out form that gives curators easy visual access to the curation status of all named C. elegans genes, e.g. vha-6 or egl-9. Genes are displayed in a list that includes the current number of published papers (references) indexed to each gene and the date of its most recent annotation to any of the three ontologies. Curators can query and sort the list by reference count, gene name, and curation status.
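The check-out list can be pictured as a set of per-gene records that curators filter and sort. The sketch below is a hypothetical model of that behavior, not the form's actual code. The `GeneStatus` fields, the use of `None` for "never annotated", the example reference counts and dates, and the gene name `hyp-1` are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GeneStatus:
    """One row of a hypothetical GO curation check-out list."""
    name: str                       # gene name, e.g. "vha-6"
    reference_count: int            # papers indexed to the gene
    last_annotated: Optional[date]  # latest GO annotation date (any ontology); None if never

    @property
    def curated(self) -> bool:
        return self.last_annotated is not None


def query(genes: List[GeneStatus], curated: Optional[bool] = None,
          name_prefix: str = "") -> List[GeneStatus]:
    """Filter rows by curation status and/or gene-name prefix."""
    return [g for g in genes
            if g.name.startswith(name_prefix)
            and (curated is None or g.curated == curated)]


def sort_rows(genes: List[GeneStatus], key: str = "reference_count",
              descending: bool = True) -> List[GeneStatus]:
    """Sort rows by 'reference_count', 'name', or 'curated'."""
    return sorted(genes, key=lambda g: getattr(g, key), reverse=descending)


# Invented example rows: counts and dates are placeholders, "hyp-1" is a made-up gene.
genes = [
    GeneStatus("vha-6", 12, date(2009, 3, 2)),
    GeneStatus("egl-9", 40, None),
    GeneStatus("hyp-1", 25, date(2008, 11, 17)),
]

# Triage view: genes never annotated, most-referenced first.
print([g.name for g in sort_rows(query(genes, curated=False))])
# Alphabetical view of all genes.
print([g.name for g in sort_rows(genes, key="name", descending=False)])
```

Sorting uncurated genes by reference count is one natural way to choose the next gene to check out. The real form may rank them differently.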
Computational Methods
Our computational methods encompass two main approaches: 1) InterPro2GO mappings for IEA annotations and 2) Phenotype2GO mappings for IMP annotations.
InterPro2GO Mappings
These annotations link C. elegans proteins to GO terms in two electronic steps. First, protein motifs/domains are matched to entries in the InterPro database (http://www.ebi.ac.uk/interpro/). Second, the matched InterPro entries are mapped to GO terms using the interpro2go file generated by the EBI (PMID:12654719, PMID:12520011). Human curators do not review these annotations for accuracy, so all of them carry the evidence code 'IEA' (Inferred from Electronic Annotation).
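The second step, turning per-protein InterPro hits into GO annotations, can be sketched as a lookup join. This is a minimal sketch under stated assumptions:

- The interpro2go file is assumed to follow the EBI's external2go line layout (`InterPro:<IPR id> <description> > GO:<term name> ; <GO id>`, with `!` comment lines).
- The IPR identifiers and protein names below are placeholders, not real InterPro entries or WormBase proteins.
- The domain-matching step itself (running the member database algorithms) is not shown; its hits are given as input.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed external2go layout: InterPro:<IPR id> <description> > GO:<term name> ; <GO id>
LINE_RE = re.compile(r"^InterPro:(IPR\d+)\s.*>\s*GO:(.+?)\s*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Return {IPR id: set of GO ids}, skipping comment lines that start with '!'."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!"):
            continue
        m = LINE_RE.match(line.strip())
        if m:
            ipr_id, _term_name, go_id = m.groups()
            mapping[ipr_id].add(go_id)
    return mapping


def iea_annotations(domain_hits: Dict[str, Set[str]],
                    mapping: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Turn per-protein InterPro hits into (protein, GO id, 'IEA') triples."""
    annotations = set()
    for protein, ipr_ids in domain_hits.items():
        for ipr_id in ipr_ids:
            for go_id in mapping.get(ipr_id, ()):  # unmapped entries yield nothing
                annotations.add((protein, go_id, "IEA"))
    return sorted(annotations)


# Illustrative lines only: IPR999991/IPR999992 are placeholders, not real entries.
sample = [
    "!date: placeholder header",
    "InterPro:IPR999991 Example DNA-binding domain > GO:DNA binding ; GO:0003677",
    "InterPro:IPR999992 Example kinase domain > GO:ATP binding ; GO:0005524",
]
hits = {"PROT_A": {"IPR999991"}, "PROT_B": {"IPR999992", "IPR999993"}}  # hypothetical hits

for row in iea_annotations(hits, parse_interpro2go(sample)):
    print(row)
```

Because no curator judgment enters this join, every triple it emits carries the same 'IEA' code.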
Phenotype2GO Mappings
These annotations are generated by a semi-automated method. WormBase curators map phenotypes to one or more GO terms, and a script then uses these mappings to attach GO terms to genes. All of these annotations carry the evidence code 'IMP' (Inferred from Mutant Phenotype). Currently, the mappings are applied to allele phenotypes and to phenotypes obtained from large-scale RNA interference (RNAi) screens. For example, the phenotype 'STErile' (Ste) is a specialization of both 'post-embryonic defect' and 'reproductive defect', and it is mapped to the GO term 'reproduction' (GO:0000003). A list of the currently used mappings can be found here:
http://www.wormbase.org/wiki/index.php/Phenotype2GO_Mappings_File
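The script step, which combines curated phenotype-to-GO mappings with observed phenotypes to produce IMP annotations, can be sketched as a simple join. Only the 'STErile' to GO:0000003 pair comes from the text above. The gene names and phenotype observations are hypothetical, and the sketch makes no assumption about the format of the real mappings file.

```python
from typing import Dict, List, Set, Tuple

# Curated phenotype -> GO mapping. Only this entry is taken from the report.
PHENOTYPE2GO: Dict[str, Set[str]] = {
    "STErile": {"GO:0000003"},  # 'reproduction'
}


def imp_annotations(observations: List[Tuple[str, str, str]],
                    mapping: Dict[str, Set[str]]) -> List[Tuple[str, str, str, str]]:
    """Join (gene, phenotype, source) observations against the mapping.

    `source` is 'allele' or 'RNAi'. Each match yields a
    (gene, GO id, 'IMP', source) annotation; unmapped phenotypes yield nothing.
    """
    annotations = set()
    for gene, phenotype, source in observations:
        for go_id in mapping.get(phenotype, set()):
            annotations.add((gene, go_id, "IMP", source))
    return sorted(annotations)


# Hypothetical observations (gene names are placeholders).
observations = [
    ("gene-a", "STErile", "allele"),
    ("gene-b", "STErile", "RNAi"),
    ("gene-b", "unmapped_phenotype", "RNAi"),
]
for row in imp_annotations(observations, PHENOTYPE2GO):
    print(row)
```

Keeping the mapping in one table separate from the observations means that adding new mappings changes the output on the next run without touching the script.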
Priorities for Annotation
Our annotation priorities are as follows:
1) Reference Genome genes
2) Genes presented for annotation via our Textpresso-based semi-automated Cellular Component curation pipeline
3) Genes from training set papers used for piloting semi-automated Textpresso-based Molecular Function curation
4) Newly described genes for which previous annotation was not available
5) C. elegans orthologs of human disease genes
6) Phenotype2GO and InterPro2GO annotations are updated with each release.
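One way to turn the ranked list into a work queue is to give each gene the number of the highest-priority category it belongs to and sort on that number. This is a hypothetical illustration: the category keys, the gene names, and the alphabetical tie-break are our assumptions. Item 6 is left out because it describes an automated per-release update rather than a manual curation category.

```python
from typing import Dict, List, Set

# Categories 1-5 from the priority list above, in rank order.
PRIORITY_ORDER = [
    "reference_genome",        # 1) Reference Genome genes
    "cc_pipeline",             # 2) Textpresso-based Cellular Component pipeline
    "mf_training_set",         # 3) MF pilot training-set papers
    "newly_described",         # 4) no previous annotation available
    "human_disease_ortholog",  # 5) orthologs of human disease genes
]
RANK = {category: i + 1 for i, category in enumerate(PRIORITY_ORDER)}


def priority(categories: Set[str]) -> int:
    """Best (lowest) rank among a gene's categories; len+1 if none apply."""
    ranks = [RANK[c] for c in categories if c in RANK]
    return min(ranks, default=len(PRIORITY_ORDER) + 1)


def work_queue(genes: Dict[str, Set[str]]) -> List[str]:
    """Order genes by priority, breaking ties alphabetically by name."""
    return sorted(genes, key=lambda name: (priority(genes[name]), name))


# Hypothetical genes and category memberships.
genes = {
    "gene-c": {"human_disease_ortholog"},
    "gene-a": {"newly_described", "reference_genome"},
    "gene-b": {"cc_pipeline"},
    "gene-d": set(),
}
print(work_queue(genes))
```

A gene in several categories takes its best rank, so gene-a counts as a Reference Genome gene even though it is also newly described.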
Presentations and Publications
Publications
Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 2009 Jul;5(7):e1000431. Epub 2009 Jul 3.
Van Auken K, Jaffery J, Chan J, Müller HM, Sternberg PW. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics. 2009 Jul 21;10:228.
Presentations including Talks and Tutorials and Teaching
Yook K, Van Auken KM, Sternberg P, and the WormBase Consortium. Using Textpresso for Information Retrieval, Fact Extraction, and Database Entry. Third International Biocuration Conference, April 16-19, 2009, Berlin, Germany. Available from Nature Precedings: http://precedings.nature.com/documents/3302/version/1
Poster presentations
None
Other Highlights
A. Ontology Development Contributions
WormBase curators have contributed to ontology discussion and development in the areas of intraflagellar transport, sex determination and dosage compensation, apoptosis, gastrulation, and drug withdrawal.
B. Annotation Outreach and User Advocacy Efforts
Kimberly Van Auken continues to participate in the gohelp rotation. Ranjana Kishore continues to participate in the efforts of the GO News group.
C. Other Highlights
Curation Tools: Ontology Annotator
We are developing a new web-based curation tool, the Ontology Annotator, that can be used to annotate genes to any ontology, including the Gene Ontology and the WormBase Phenotype Ontology. The Ontology Annotator incorporates and expands upon much of the functionality of the Phenote curation tool. Some of its more useful features include bulk annotation, autocomplete functions, and data retrieval with filtering of the retrieved data for editing.
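One common way to provide autocompletion in a curation tool is a prefix search over a sorted list of term names. The sketch below shows that idea using Python's standard `bisect` module. It is not the Ontology Annotator's implementation, and the short term list is just a handful of GO term names and IDs chosen for illustration.

```python
import bisect
from typing import List, Tuple


class TermIndex:
    """Case-insensitive prefix lookup over (term name, term id) pairs."""

    def __init__(self, terms: List[Tuple[str, str]]):
        # Sort by lower-cased name so all matches for a prefix form one contiguous run.
        self._entries = sorted((name.lower(), name, term_id) for name, term_id in terms)
        self._keys = [entry[0] for entry in self._entries]

    def complete(self, prefix: str, limit: int = 10) -> List[Tuple[str, str]]:
        """Return up to `limit` (name, id) pairs whose name starts with `prefix`."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        results: List[Tuple[str, str]] = []
        for key, name, term_id in self._entries[start:]:
            if not key.startswith(p) or len(results) == limit:
                break
            results.append((name, term_id))
        return results


index = TermIndex([
    ("DNA binding", "GO:0003677"),
    ("RNA binding", "GO:0003723"),
    ("protein binding", "GO:0005515"),
    ("ATP binding", "GO:0005524"),
    ("nucleus", "GO:0005634"),
    ("reproduction", "GO:0000003"),
])
print(index.complete("r"))
```

Binary search keeps each keystroke's lookup logarithmic in the number of terms, which matters for ontologies with tens of thousands of entries.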
Semi-Automated Molecular Function Curation
We continue to explore Textpresso-based GO curation by developing pipelines for semi-automated Molecular Function (MF) curation. Our preliminary plan is a two-tiered approach: 1) document classification using SVMs (Support Vector Machines) and 2) category searches to identify curatable sentences within the documents that the SVMs classify as high-confidence for Molecular Function information. Our initial efforts focus on the binding branch of the MF ontology, including protein-nucleic acid interactions.
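The two tiers can be sketched as a document filter followed by a sentence search. This is a hypothetical sketch with two tiers:

- **Tier 1** applies the decision function of an already trained linear SVM (score = w·x + b over bag-of-words counts). The weights, bias, and threshold are invented placeholders rather than a trained model, and training is omitted.
- **Tier 2** imitates a Textpresso-style category search by keeping sentences that contain at least one term from every requested category. The category word lists and example documents are also invented.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9-]+", text.lower())


def svm_score(text: str, weights: Dict[str, float], bias: float) -> float:
    """Linear SVM decision function w.x + b on bag-of-words counts."""
    counts = Counter(tokenize(text))
    return sum(weights.get(tok, 0.0) * n for tok, n in counts.items()) + bias


def category_hits(sentence: str, categories: Dict[str, Set[str]]) -> bool:
    """True if the sentence contains at least one term from every category."""
    tokens = set(tokenize(sentence))
    return all(tokens & terms for terms in categories.values())


def curatable_sentences(documents: Dict[str, str], weights: Dict[str, float],
                        bias: float, categories: Dict[str, Set[str]],
                        threshold: float = 0.0) -> List[Tuple[str, str]]:
    """Tier 1: keep documents scoring above threshold. Tier 2: keep matching sentences."""
    results = []
    for doc_id, text in documents.items():
        if svm_score(text, weights, bias) <= threshold:
            continue  # classified as unlikely to contain MF information
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if category_hits(sentence, categories):
                results.append((doc_id, sentence))
    return results


# Placeholder model and categories (not trained, not real Textpresso categories).
weights = {"binds": 1.5, "binding": 1.0, "dna": 0.8, "rna": 0.8, "behavior": -1.0}
bias = -1.0
categories = {
    "binding_verb": {"binds", "bound", "binding"},
    "nucleic_acid": {"dna", "rna"},
}
documents = {
    "paper1": "The protein binds DNA in vitro. Mutants are small.",
    "paper2": "Worms show altered behavior on food.",
}
print(curatable_sentences(documents, weights, bias, categories))
```

Running the cheap sentence search only on documents that pass the classifier keeps tier 2 focused on papers likely to contain MF data.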
Expanded Phenotype2GO Mappings
Working with the WormBase phenotype curators, we have added 146 new mappings to our Phenotype2GO mappings, which are used to make GO Biological Process annotations with the IMP evidence code. Allele- or RNAi-based phenotypes are annotated to a term from the WormBase phenotype ontology, which is then mapped to an appropriate GO term. A list of the new mappings can be found here:
http://www.wormbase.org/wiki/index.php/Phenotype2GO_Mappings_Sept._09