Latest revision as of 05:47, 26 November 2015


PomBase, December 2015

PomBase is the Model Organism Database for the fission yeast Schizosaccharomyces pombe (www.pombase.org)

1. Staff working on GOC tasks

PomBase GO curators: Valerie Wood, Midori Harris, Antonia Lock

Developers associated with GO-related projects at PomBase (online curation tool, pipelines, website, automated QC, query tools): Mark McDowall, Kim Rutherford

NO STAFF FUNDED BY GO OR GOC GRANTS

2. Annotation progress

2.a Annotation Breadth
Annotation Type                                       Nov 2014    Nov 24, 2015
Total Number of Genes (2014 * protein coding only)    5052*       5451
Total Genes with at least one (non root) GO           5279        5289
Total Genes with at least one (non root) BP           4316*       4668
Total Genes with at least one (non root) CC           4894*       5191
Total Genes with at least one (non root) MF           3682*       3958
Total Annotations                                     39233       40718

Annotation by Experiment
Total genes with experimental annotation              4572        4558
Total annotations by experiment                       18568       20912

Annotation by Orthology
Total Genes Annotated by Orthology                    3196        3484
Total Annotations by orthology                        9624        9346
Annotation by PomBase                                 8994        9134
Annotation by GOA-UniProt                             119         128
Annotation by GOC                                     468         4

IEA Annotation
Total Annotations                                     5342        4290
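The year-on-year change implied by the table can be worked out directly from the two columns. The following is a minimal sketch of that arithmetic only; the `counts` dictionary and the `change` helper are illustrative names, and the figures are copied from the table above.

```python
# Counts copied from the table above: (Nov 2014, Nov 24, 2015).
counts = {
    "Total annotations": (39233, 40718),
    "Annotations by experiment": (18568, 20912),
    "Annotations by orthology": (9624, 9346),
    "IEA annotations": (5342, 4290),
}


def change(old, new):
    """Return (absolute change, percent change relative to the old count)."""
    delta = new - old
    return delta, round(100.0 * delta / old, 2)


summary = {label: change(old, new) for label, (old, new) in counts.items()}

for label, (delta, pct) in summary.items():
    # Explicit sign so that increases and decreases are easy to tell apart.
    print(f"{label}: {delta:+d} ({pct:+.2f}%)")
```

Percentages are taken relative to the 2014 count, so a negative value (as for the IEA annotations) indicates a reduction over the year.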

2.b Annotation Depth

We ensure annotation depth is maintained by identifying terms for which it should always be possible to make a more specific annotation. The current list contains 1220 restricted terms, with only 56 violations (across all annotation sources).
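A check of this kind could be sketched as follows. This is only an illustration under stated assumptions, not the PomBase QC pipeline: the term IDs, gene names, source labels, and the `find_violations` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical restricted-term list (made-up IDs, not real GO terms).
restricted_terms = {"EX:0000001", "EX:0000002"}

# Hypothetical annotations from several sources.
annotations = [
    {"gene": "geneA", "term": "EX:0000001", "source": "sourceX"},
    {"gene": "geneB", "term": "EX:0000105", "source": "sourceX"},
    {"gene": "geneC", "term": "EX:0000002", "source": "sourceY"},
]


def find_violations(annotations, restricted_terms):
    """Return annotations made directly to a restricted (too general) term."""
    return [a for a in annotations if a["term"] in restricted_terms]


violations = find_violations(annotations, restricted_terms)

# Tally violations per source so each can be followed up separately.
violations_by_source = Counter(a["source"] for a in violations)
print(len(violations), dict(violations_by_source))
```

Reporting violations per source reflects the fact that the count above covers all annotation sources, not only annotations made in-house.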

2.c Annotation extensions

4223 extensions on 799 gene products

3. Methods and strategies for annotation

A. Literature curation:

PomBase curation focuses on the literature. PomBase does not have dedicated GO curators; instead, curators capture all aspects of each paper, including GO, phenotypes (single and multi gene, allele names and descriptions, conditions), modifications, modifiers, upstream and downstream targets, genetic and physical interactions, complementation, orthology, disease associations, protein features, gene coordinates, and other sequence features.

Community curation

B. Computational annotation strategies:

C. Priorities for annotation:

  • i. Literature describing the characterization of previously unknown genes, or genes previously unstudied in fission yeast
  • ii. Approval of community curation submissions (new publications)
  • iii. Legacy papers related to well studied processes, especially those providing 'direct substrate' annotation extensions to build high quality networks/pathways based on GO data (see "regulation of mitotic cell cycle": http://tinyurl.com/osp4nnq)

Priority processes
  • mitotic chromosome segregation (including spindle checkpoint) (vw)
  • G2/M and metaphase/anaphase cell cycle transitions (vw)
  • cytokinesis (vw)
  • DNA metabolism (mah)
  • signalling pathways (al)
  • conjugation (al)

(No gene with available literature is left without GO annotation, or with only IEA/ISO annotation, as these are processed immediately.)


4. Presentations and publications

a. Papers with substantial GO content

  • An Ancient Yeast for Young Geneticists: A Primer on the Schizosaccharomyces pombe Model System. Hoffman CS, Wood V, Fantes PA. Genetics. 2015 Oct;201(2):403-23. doi: 10.1534/genetics.115.181503. PMID: 26447128

  • PomBase 2015: updates to the fission yeast database. McDowall MD, Harris MA, Lock A, Rutherford K, Staines DM, Bähler J, Kersey PJ, Oliver SG, Wood V. Nucleic Acids Res. 2015 Jan;43(Database issue):D656-61. doi: 10.1093/nar/gku1040. Epub 2014 Oct 31. PMID: 25361970

b. Presentations including Talks and Tutorials and Teaching

pombe2015 - Eighth International Fission Yeast Meeting, Kobe, Japan, 21-26 June 2015

  • How to get more from PomBase - a brief tour of PomBase gene pages, curated data types, querying, tools, and external links (VW)
  • Community curation using Canto - how to create, edit and submit annotations based on published experiments (MAH)
  • Hidden in plain sight: The eukaryotically conserved unstudied proteins and a framework for their classification and characterisation

Cambridge University, teaching

  • Curation workshop: Genome Annotation, Cambridge University Part III Biochemistry (VW, MAH, AL)
  • Lecture: Databases and Genome Annotation (Semantic Systems Biology), Cambridge University Part II Systems Biology (VW)

c. Poster presentations with GO content

pombe2015 - Eighth International Fission Yeast Meeting, Kobe, Japan, 21-26 June 2015

  • Hidden in plain sight: The eukaryotically conserved unstudied proteins and a framework for their classification and characterisation

27th International Conference on Yeast Genetics and Molecular Biology (ICYGMB), Levico Terme, Italy, 6-12 September 2015

  • Hidden in plain sight: The eukaryotically conserved unstudied proteins and a framework for their classification and characterisation


5. Other Highlights

A. GO terms and related contributions by PomBase

Submitted 2228 term or ontology update requests http://tinyurl.com/qzz7kn9

Requested 185 new terms via TermGenie

Submitted 960 electronic mapping update requests http://tinyurl.com/oh45ouh


B. Annotation outreach and user advocacy efforts

See workshops and presentations above.

We answer many GO-related queries via our helpdesk, and this is reflected in the large number of PomBase FAQ items referring to GO: http://www.pombase.org/faq

Solicited 78 responses to the GO survey via PomBase 'collector'


C. Other highlights

1. We have piloted a system to generate high quality physical networks directly from Gene Ontology annotation data. We are currently using these networks to target literature curation gaps.
Dataset: http://www.pombase.org/documentation/high-confidence-physical-interaction-network
Access to process based high confidence interaction networks: http://www.pombase.org/browse-curation/fission-yeast-go-slim-terms


2. Increasing GO slim breadth: http://www.pombase.org/browse-curation/fission-yeast-go-slim-terms

  • Proteins with a biological process annotation not covered by the slim (gene count: 115)
  • Proteins with no GO slim or biological process annotation (gene count: 756)


3. Creating inventories of conserved and non-conserved unstudied genes: http://www.pombase.org/status/priority-unstudied-genes
In fission yeast the "unknown" inventory is around 831 entries, many of which are apparently species-specific. However, 511 are conserved, and 183 of these have orthologs in vertebrates.


4. Developing curation rules to improve curation depth and accuracy: http://www.slideshare.net/ValerieWood/pombase-internal-rules-for-curation-using-ontologies
Includes:

  • Removing IEA redundancy. PomBase filters out >87% of available IEA annotations because they are already covered by manual annotation. Redundant experimental annotations are not filtered.
  • Flagging high level GO terms as "not for direct annotation". PomBase excludes 1120 GO terms from direct annotation.
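The IEA redundancy rule can be illustrated with a small sketch. It assumes that "covered by manual annotation" means the same gene already has a manual annotation to the same term or to a more specific (descendant) term; that reading, the toy ontology, the gene names, and the helper functions are all assumptions made for illustration rather than the actual PomBase filter.

```python
# Hypothetical toy ontology: each term maps to its direct parents.
parents = {
    "EX:catalytic_activity": set(),
    "EX:kinase_activity": {"EX:catalytic_activity"},
    "EX:protein_kinase_activity": {"EX:kinase_activity"},
    "EX:binding": set(),
}


def ancestors_and_self(term):
    """Return the term together with all of its ancestors in the toy ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_redundant_iea(manual, iea):
    """Keep only IEA annotations not implied by a manual annotation of the same gene."""
    covered = {}
    for gene, term in manual:
        covered.setdefault(gene, set()).update(ancestors_and_self(term))
    return [(gene, term) for gene, term in iea if term not in covered.get(gene, set())]


# Hypothetical (gene, term) annotation pairs.
manual = [("geneA", "EX:protein_kinase_activity")]
iea = [
    ("geneA", "EX:kinase_activity"),  # implied by the manual annotation: dropped
    ("geneA", "EX:binding"),          # unrelated term: kept
    ("geneB", "EX:kinase_activity"),  # no manual annotation for geneB: kept
]

kept = filter_redundant_iea(manual, iea)
print(kept)
```

Only the IEA list is filtered here, in line with the statement above that redundant experimental annotations are left in place.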


5. We have developed Canto, an intuitive web-based interface to support literature curation using ontologies, together with a literature management environment. Canto supports GO curation, is a generic component of the GMOD project, and can be easily configured for use with other organisms and ontologies. Canto is currently being extended to fully support annotation extensions by community curators, by restricting curator options based on the domain and range of terms applicable to a given relationship. We are also working to use taxon constraints to limit available term choices by species.
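The idea of restricting extension options by domain and range can be sketched as below. This does not show Canto's configuration format or code: the relation names, term IDs, value types, and both helper functions are hypothetical, chosen only to show how a domain/range table can narrow what a curator is offered.

```python
# Hypothetical relation table: "domain" is the ontology branch of the annotated
# term the relation may extend; "range" is the kind of value it accepts.
RELATIONS = {
    "has_substrate": {"domain": "EX:molecular_function", "range": "gene_product"},
    "occurs_in": {"domain": "EX:biological_process", "range": "cellular_component_term"},
}

# Hypothetical lookup from an annotated term to its ontology branch.
TERM_BRANCH = {
    "EX:kinase_activity": "EX:molecular_function",
    "EX:cytokinesis": "EX:biological_process",
}


def allowed_relations(term):
    """Return the relations a curator may choose for the given annotated term."""
    branch = TERM_BRANCH.get(term)
    return sorted(name for name, spec in RELATIONS.items() if spec["domain"] == branch)


def allowed_range(relation):
    """Return the kind of value the curator may enter for the relation."""
    return RELATIONS[relation]["range"]


options = allowed_relations("EX:kinase_activity")
print(options, [allowed_range(r) for r in options])
```

A term outside every listed domain yields an empty list, so the interface would offer no extension relations for it rather than an unconstrained choice.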

6. Community curation. Over 300 papers have been fully or partially curated by the fission yeast community (covering GO and other data types: phenotypes, modifications, genetic and physical interactions).

7. Our phenotype ontology FYPO now has over 4000 terms with logical definitions that refer to GO terms (using over 900 different GO terms)