WB Mar 3 to June 5

From GO Wiki
Latest revision as of 15:12, 13 June 2014

Overview:

Staff:

Paul Sternberg, PI, WormBase, GO [8%; 0% funded by GOC]

Juancarlos Chan, Developer, WormBase [25%; 25% funded by GOC]

James Done, Developer, Textpresso [40%; 40% funded by GOC]

Ranjana Kishore, Curator [25%; 10% funded by GOC]

Yuling Li, Developer, Textpresso [30%; 20% funded by GOC]

Hans Michael Mueller, PI, Textpresso [75%; 50% funded by GOC]

Daniela Raciti, Curator [10%; 0% funded by GOC]

Kimberly Van Auken, Curator [100%; 75% funded by GOC]

Annotation Progress

WormBase GO Annotation Statistics from March 3, 2014 - June 5, 2014

Manual Annotation Summary

Total number of unique manual annotations: 21453 (865 new annotations) (+4.2% increase)

Total number of genes with manual annotations: 3844 (214 additional genes with manual annotation) (+5.9% increase)
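The percentage increases above match dividing the number of new items by the previous quarter's total (the current total minus the new items). The following minimal sketch shows that arithmetic, assuming this is how the figures were derived. `percent_increase` is a hypothetical helper written for illustration.

```python
def percent_increase(current_total: int, new_items: int) -> float:
    """Return the increase as a percentage of the previous quarter's total.

    Assumes the previous total is the current total minus the new items.
    """
    previous_total = current_total - new_items
    return 100.0 * new_items / previous_total


annotations = percent_increase(21453, 865)  # previous total: 20588
genes = percent_increase(3844, 214)         # previous total: 3630
print(f"annotations: +{annotations:.1f}%")
print(f"genes: +{genes:.1f}%")
```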


Detailed Manual Annotation Statistics

Manual annotation statistics are detailed in Tables 1 - 3 below.


Table 1: Summary of C. elegans Manual Biological Process Annotations

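Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD | RCA | ISO | IKR
WormBase | 7149,186 | 2887,42 | 1030,10 | 296,1 | 115 | 234,55 | 61 | 44,10 | 32 | 2 | 0 | 0 | 0 | 0 | 0 | 0
UniProt | 456,2 | 28,1 | 109,1 | 163 | 22 | 13 | 0 | 5 | 94 | 0 | 65 | 0 | 0 | 2 | 0 | 0
GOC | 53 | 10 | 313 | 184 | 20 | 0 | 5 | 7 | 12 | 0 | 0 | 207 | 0 | 2 | 2 | 0
BHF-UCL | 10 | 0 | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
MGI | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
HGNC | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
RefGenome | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1078 | 3 | 0 | 0 | 0
Totals | 7450,131 | 2867,29 | 1434,11 | 635,1 | 155 | 96,56 | 66 | 55,10 | 152 | 2 | 54 | 1008 | 3 | 4 | 2 | 1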

Numbers refer to total number of annotations; number after comma represents annotations with extensions.

Numbers in bold represent changes from previous quarterly report.
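Each cell packs two counts into one field: the total number of annotations and, after the comma, the number with annotation extensions. A short parser makes these tables easier to check programmatically. The sketch below uses hypothetical helpers (`parse_cell`, `column_total`), not the script used to produce this report. It tolerates minor formatting variants, such as a space before the comma or a second count given in parentheses. It then sums one evidence-code column over the group rows, which is one way to cross-check a Totals entry.

```python
import re
from typing import Dict, Tuple

# "7149,186" -> (7149, 186); also tolerates "1980 ,56" and "7450 (131)".
# First number: annotation count; optional second number: annotations with extensions.
CELL = re.compile(r"^\s*(\d+)\s*(?:,\s*(\d+)|\((\d+)\))?\s*$")


def parse_cell(cell: str) -> Tuple[int, int]:
    """Split a table cell into (annotations, annotations with extensions)."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    extensions = match.group(2) or match.group(3) or "0"
    return int(match.group(1)), int(extensions)


def column_total(cells: Dict[str, str]) -> Tuple[int, int]:
    """Sum both counts over the group rows of one evidence-code column."""
    parsed = [parse_cell(c) for c in cells.values()]
    return sum(n for n, _ in parsed), sum(e for _, e in parsed)


# IMP column of Table 1, group rows only (Totals row excluded).
imp = {
    "WormBase": "7149,186", "UniProt": "456,2", "GOC": "53",
    "BHF-UCL": "10", "MGI": "5", "HGNC": "0", "RefGenome": "0",
}
print(column_total(imp))
```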


Table 2: Summary of C. elegans Molecular Function Annotations

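Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD | RCA | ISO | IKR
WormBase | 143,1 | 35 | 1490,76 | 637,1 | 49 | 0 | 1131 | 7 | 7 | 4 | 21 | 0 | 0 | 0 | 2 | 0
IntAct | 0 | 0 | 0 | 0 | 0 | 0 | 1980,56 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
UniProt | 33 | 2 | 95,1 | 161 | 18 | 0 | 232 | 3 | 52 | 0 | 126 | 0 | 0 | 19 | 0 | 0
RefGenome | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 795 | 2 | 0 | 0 | 0
HGNC | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Totals | 173 | 37 | 1558,67 | 801,1 | 68 | 0 | 3317,56 | 10 | 59 | 4 | 143 | 769 | 2 | 19 | 2 | 1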

Numbers refer to total number of annotations; number after comma represents annotations with extensions.

Numbers in bold represent changes from previous quarterly report.


Table 3: Summary of C. elegans Cellular Component Annotations

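Annotation Group | IMP | IGI | IDA | ISS | TAS | IEP | IPI | IC | NAS | ISM | ND | IBA | IRD | RCA | ISO | IKR
WormBase | 9 | 0 | 5533,639 | 281 | 0 | 0 | 129,3 | 39 | 6 | 4 | 4 | 0 | 0 | 1 | 0 | 0
RefGenome | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 713 | 3 | 0 | 0 | 1
UniProt | 14 | 1 | 205 | 173 | 18 | 0 | 0 | 18 | 52 | 0 | 117 | 0 | 0 | 18 | 0 | 0
MGI | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
HGNC | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
BHF-UCL | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Reactome | 0 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Totals | 22 | 1 | 5615,592 | 445 | 47 | 0 | 128,3 | 53 | 58 | 4 | 120 | 672 | 11 | 19 | 0 | 1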

Numbers refer to total number of annotations; number after comma represents annotations with extensions.

Numbers in bold represent changes from previous quarterly report.


Table 4: Summary of C. elegans IEA and Phenotype2GO Annotations

Numbers below are based on WormBase release WS243.

Total number of genes with Phenotype2GO-based Annotation: n/a

Total number of genes with InterPro2GO-based Annotation: 9758

Type of Annotation | IMP | IEA
Phenotype2GO Mappings - WormBase | 17,088 | 0
IEA/InterPro2GO - WormBase | 0 | 27,016

Methods and strategies for annotation

Curation methods

Literature curation:

Curation of the primary literature continues to be the major focus of our manual annotation efforts.

Semi-automated curation using the Textpresso information retrieval system

We also routinely employ the Textpresso information retrieval system for semi-automated curation of GO Cellular Component and Molecular Function annotations.
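Textpresso marks up sentences with term categories so that curators can search for sentences that combine several categories. The following is a minimal sketch of that idea applied to Cellular Component curation. It retrieves sentences that mention a gene, a localization verb, and a cellular component term. The lexicons, the gene-name regular expression, the function names, and the example sentences are all hypothetical. They are much simpler than Textpresso's actual categories and query machinery.

```python
import re
from typing import Iterable, Iterator, Set

# Hypothetical, heavily abbreviated category lexicons (illustration only).
CATEGORIES = {
    "localization_verb": {"localizes", "localized", "accumulates", "enriched"},
    "cellular_component": {"nucleus", "mitochondria", "cytoplasm", "membrane"},
}

# Simplified pattern for C. elegans-style gene/protein names such as "xyz-1";
# real gene-name recognition is far more involved.
GENE_NAME = re.compile(r"\b[a-z]{3,4}-\d+\b", re.IGNORECASE)

# Categories a sentence must contain to be passed to a curator.
REQUIRED = {"gene", "localization_verb", "cellular_component"}


def categories_in(sentence: str) -> Set[str]:
    """Return the names of all categories with at least one match in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    found = {name for name, terms in CATEGORIES.items() if words & terms}
    if GENE_NAME.search(sentence):
        found.add("gene")
    return found


def candidate_sentences(sentences: Iterable[str]) -> Iterator[str]:
    """Yield sentences that contain every category in REQUIRED."""
    for sentence in sentences:
        if REQUIRED <= categories_in(sentence):
            yield sentence


# Invented example sentences; "xyz-1" is a hypothetical gene name.
sentences = [
    "XYZ-1::GFP localizes to the nucleus in starved animals.",
    "xyz-1 mutants are viable and fertile.",
]
hits = list(candidate_sentences(sentences))
print(hits)
```

In this sketch, only the first sentence is kept. The second names a gene but contains neither a localization verb nor a cellular component term.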

Computational annotation strategies:

Our computational annotation strategies include mapping genes to GO terms using InterPro domains and mapping genes to Biological Process terms based upon parallel annotations to the Worm Phenotype Ontology (Phenotype2GO). These methods are performed automatically as part of the WormBase database build.

Note that during the past year we stopped using an automated pipeline that mapped genes to GO:0016021 (integral to plasma membrane) based on the output of the transmembrane prediction algorithm TMHMM. These IEA annotations had no external database identifier in the With/From column and were therefore not consistent with GO annotation practice.
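Both computational pipelines can be viewed as joins. Genes are linked to an intermediate term (an InterPro domain, or a Worm Phenotype Ontology term), and the intermediate term is linked to GO. The sketch below shows the InterPro2GO case, recording the matched domain in the With/From column. It also applies the With/From check that the TMHMM-based IEA annotations failed. This is a hypothetical illustration, not the WormBase build code. The class and function names and all gene, domain, and GO identifiers other than GO:0016021 are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_id: str
    evidence: str   # GO evidence code, e.g. "IEA"
    with_from: str  # With/From column; "" when no supporting identifier exists


def interpro2go_annotations(gene_domains: Dict[str, List[str]],
                            interpro2go: Dict[str, List[str]]) -> Iterator[Annotation]:
    """Join genes to GO terms through their InterPro domains (IEA)."""
    for gene, domains in gene_domains.items():
        for ipr in domains:
            for go_id in interpro2go.get(ipr, []):
                # The matched domain is the external identifier for With/From.
                yield Annotation(gene, go_id, "IEA", f"InterPro:{ipr}")


def consistent_with_gaf_practice(ann: Annotation) -> bool:
    """IEA rows must cite an external identifier in With/From."""
    return ann.evidence != "IEA" or ann.with_from != ""


# Placeholder identifiers only; not real WormBase or InterPro data.
genes = {"gene-A": ["IPR_EXAMPLE"], "gene-B": []}
mapping = {"IPR_EXAMPLE": ["GO:EXAMPLE1", "GO:EXAMPLE2"]}

rows = list(interpro2go_annotations(genes, mapping))
# A TMHMM-style call with no supporting identifier, as described above.
rows.append(Annotation("gene-B", "GO:0016021", "IEA", ""))

kept = [r for r in rows if consistent_with_gaf_practice(r)]
for r in kept:
    print(r.gene, r.go_id, r.evidence, r.with_from)
```

A Phenotype2GO join could follow the same shape, with phenotype terms in place of domains. How WormBase populates With/From for those IMP annotations is not stated here, so the sketch leaves that case out.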


Curation strategies

Priorities for annotation

Selection of genes for annotation is guided by several criteria:

  • Annotation of gene sets involved in specific biological processes as part of WormBase's coordinated topic-based curation process
    • Topics annotated to date: Unfolded Protein Response (ER and mitochondrial), innate immune response, and defense response to pathogen
  • Genes identified in Textpresso-based curation pipelines
  • Re-annotation of genes associated with now obsolete GO terms or new ontology terms
  • Publication of newly characterized genes
  • C. elegans genes orthologous to human disease genes

Presentations and Publications

a. Papers with substantial GO content

The following papers have been accepted for publication:

  • "BC4GO: A Full-Text Corpus for the BioCreative IV GO Task." Kimberly Van Auken, Mary L. Schaeffer, Peter McQuilton, Stanley J. F. Laulederkind, Donghui Li, Shur-Jen Wang, G. Thomas Hayman, Susan Tweedie, Cecilia N. Arighi, James Done, Hans-Michael Müller, Paul W. Sternberg, Yuqing Mao, Chih-Hsuan Wei, Zhiyong Lu. Submitted to Database, currently under revision.
  • "Overview of the Gene Ontology Task at BioCreative IV." Yuqing Mao, Kimberly Van Auken, Donghui Li, Cecilia N. Arighi, Peter McQuilton, G. Thomas Hayman, Susan Tweedie, Mary L. Schaeffer, Stanley J. F. Laulederkind, Shur-Jen Wang, Gobeill Julien, Ruch Patrick, Luu Anh Tuan, Jung-jae Kim, Jung-Hsien Chiang, Yu-De Chen, Chia-Jung Yang, Hongfang Liu, Dongqing Zhu, Yanpeng Li, Hong Yu, Ehsan Emadzadeh, Graciela Gonzalez, Jian-Ming Chen, Hong-Jie Dai, Zhiyong Lu. Submitted to Database.
  • "A method for increasing expressivity of Gene Ontology annotations using a compositional approach." Rachael P Huntley, Midori A Harris, Yasmin Alam-Faruque, Judith A Blake, Seth Carbon, Heiko Dietze, Emily C Dimmer, Rebecca E Foulger, David P Hill, Varsha K Khodiyar, Antonia Lock, Jane Lomax, Ruth C Lovering, Prudence Mutowo-Meullenet, Tony Sawford, Kimberly Van Auken, Valerie Wood and Christopher J Mungall. Submitted to BMC Bioinformatics.

b. Presentations including Talks and Tutorials and Teaching

c. Poster presentations

Other Highlights:

A. Ontology Development Contributions:

  • Term Requests:
    • meiotic cell cycle phase terms for annotation extension
    • response to nematocide
    • pri-miRNA transcription from RNA polymerase II promoter
    • terminal web
    • regulation of membrane permeability


B. Annotation Outreach and User Advocacy Efforts:

  • Kimberly Van Auken continues to serve on the GO-help rota.


C. Other Highlights:

  • WormBase GO Annotation Model - We continue to work on a new GO annotation model for WormBase and to test it with sample data.
  • See also the Progress Report for the Common Annotation Framework for an update on integration of Textpresso Central into the GO's Common Annotation Framework.
