GOA December 2015: Difference between revisions

From GO Wiki
 
(11 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Category:GOA]] [[Category:Reports]]
[[Category:Reports - GOA]]
=UniProt Gene Ontology Annotation Summary 2015=
=UniProt Gene Ontology Annotation (GOA) Project Summary 2015=


=Overview=
=Overview=


The UniProt GO Annotation project (UniProtGOA) has been a member of the GO Consortium since 2001. All UniProt curators are actively involved in curating UniProtKB entries with Gene Ontology terms during the UniProt literature curation process, providing both high-quality manual GO annotations in addition to their contributions to electronic GO annotation pipelines. The multi-species nature of UniProtKB means that the GO Annotation project is able to assist in the GO curation of proteins from over 430,000 taxonomic groups.  
EMBL-EBI has been a member of the GO Consortium since 2001. One of the major activities is the UniProt Gene Ontology Annotation project which is delivered by staff from the Protein Function Content and Development teams. The core UniProt-GOA project staff are primarily responsible for supplying the GO Consortium with manual and electronic GO annotations to the human proteome. UniProt-GOA staff not only create manual annotations, but coordinate and check the integration of GO annotations from other curation efforts at the EMBL-EBI (including from InterPro, IntAct and Reactome). The UniProt-GOA dataset is supplemented with manual annotations from 35 annotating groups, including all members of the GO Consortium, as well as a number of external groups which produce relevant functional data. Nine electronic annotation pipelines are incorporated into the UniProt-GOA dataset, which provide the vast majority of annotations for non-model organism species. UniProt-GOA is therefore able to consolidate multiple sources of specialised knowledge, ensuring the UniProt-GOA resource remains a key up-to-date reference for a large number of research communities.  


The core UniProt-GOA project staff are primarily responsible for supplying the GO Consortium with manual and electronic GO annotations to the human proteome. UniProt-GOA staff not only create manual annotations, but coordinate and check the integration of GO annotations from other curation efforts at the EBI (including from InterPro, IntAct and Reactome). The UniProt-GOA dataset is supplemented with manual annotations from 35 annotating groups, including all members of the GO Consortium, as well as a number of external groups which produce relevant functional data. Nine electronic annotation pipelines are incorporated into the UniProt-GOA dataset, which provide the vast majority of annotations for non-model organism species. UniProt-GOA is therefore able to consolidate multiple sources of specialised knowledge, ensuring the UniProt-GOA resource remains a key up-to-date reference for a large number of research communities.  
In addition, all UniProt Knowledgebase (UniProtKB) curators in the Protein Function Content team at EMBL-EBI, at the SIB Swiss Institute of Bioinformatics (SIB) and at the Protein Information Resource (PIR) are actively involved in curating UniProtKB entries with Gene Ontology terms during the UniProt literature curation process, providing both high-quality manual GO annotations and contributions to electronic GO annotation pipelines. The multi-species nature of UniProtKB means that the GO Annotation project is able to assist in the GO curation of proteins from over 430,000 taxonomic groups.


= Staff: =


Claire O'Donovan


Maria Martin
= Staff from the Protein Function Content and Development teams at EMBL-EBI who deliver the GOA project: =


Melanie Courtot (new hire)
Claire O'Donovan, Protein Function Content Team Leader (Consortium PI)


Tony Sawford*
Maria J. Martin, Protein Function Development Team Leader (Senior Personnel)


Aleksandra Shypitsyna
Melanie Courtot, GO/GOA Project Leader


Elena Speretta  (new hire)
Alexander Holmes, GOA curator


Alexander Holmes  (new hire)
Tony Sawford*, GOA programmer


Penelope Garmiri (maternity leave)
Aleksandra Shypitsyna*, GOA curator


UniProt contributors (EBI, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC): Ioannis Xenarios, Lydie Bougueleret
Tony Wardell, GOA programmer


Ghislaine Argoud-Puy, Andrea Auchinchloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Gayatri Chavali, Elena Cibrian-Uhalte, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Reija Hieta, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Guillaume Keller, Kati Laiho, Duncan Legge, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli  
UniProt contributors (EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC): Ioannis Xenarios, Lydie Bougueleret, Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Penelope Garmiri, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Kati Laiho, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Elena Speretta, Andre Stutz, Shyamala Sundaram, Michael Tognolli


<nowiki>*</nowiki> Funded entirely or partially by GO.
<nowiki>*</nowiki> Funded partially by GOC.


=Annotation Progress=
=Annotation Progress=


X annotation files were released by the GOA project between January 2015 and November 2015. These included non-redundant sets of GO annotations to 13 specific proteomes as well as data releases for annotations of all proteins in UniProtKB.
Eleven sets of UniProt-GOA release files were produced by the GOA project between January 2015 and November 2015. These included non-redundant sets of GO annotations to 13 specific proteomes as well as data releases for annotations of all proteins in UniProtKB.


Manual annotations originating from other GO Consortium members and affiliates are incorporated into UniProt and displayed in the relevant UniProtKB entries. The UniProt-GO Annotation project currently provides GO annotations for 65% of UniProt entries. Altogether, UniProt-GOA now provides almost 396 million GO annotations for almost 34 million proteins in over 452,000 different taxonomic groups. UniProt-GOA provides 249 563 annotations for the 43, 693 proteins in the human reference proteome.  
The UniProt-GOA project currently provides GO annotations for 65% of UniProtKB entries. Altogether, UniProt-GOA now provides almost 396 million GO annotations for almost 34 million proteins in over 452,000 different taxonomic groups. UniProt-GOA provides 249,563 annotations for the 43,693 proteins in the human reference proteome. The numbers presented below appear to show a decrease in annotations. This is mainly due to the redundancy removal in UniProtKB release 2015_04, when the Knowledgebase shrank from 92 million to 46 million entries (http://www.uniprot.org/help/2015/04/01/release), leaving fewer entries for the electronic pipelines to annotate.
   
   
<center>'''UniProt-GOA UniProt gene association file release stats (comparison of January 2015 and November 2015 releases)'''</center>
<center>'''UniProt-GOA UniProt gene association file release stats (comparison of January 2015 and November 2015 releases)'''</center>


To be updated
[[File:GOA_Release_Stats_November2015.png]]


'''Key'''


<nowiki>*</nowiki>Reduction in Electronic annotations due to enforcing taxon constraints as hard checks during migration into UniProt-GOA database
==Methods and strategies for annotation==


<nowiki>**</nowiki> Reduction in the number of annotations that are assigned by 'GOC' due to further improvements to the pipeline that creates the GO Consortium 'inferred' annotations to reduce redundancy.
'''''Expert curation priorities:'''''


<nowiki>***</nowiki>New sources of annotation after January 2014
1. Proteins associated with the human exosome (all GOA curators)


2. Moonlighting proteins (all GOA curators)


3. Requests from user community (all GOA curators)
4. Proteins annotated during UniProt curation duties (all UniProtKB curators at the EMBL-EBI, PIR and SIB)


==Methods and strategies for annotation==
5. Annotation corrections based on quality control reports (all curators)
 
'''''Literature curation:'''''
 


'''''Computational annotation:'''''
'''''Computational annotation:'''''
Line 66: Line 63:
# UniProt Keyword 2GO (SPKW2GO)<sup>1,2</sup>
# UniProt Keyword 2GO (SPKW2GO)<sup>1,2</sup>
# UniProt Subcellular Locations2GO (SPSL2GO)<sup>1,2</sup>
# UniProt Subcellular Locations2GO (SPSL2GO)<sup>1,2</sup>
# Unipathway2GO<sup>1,2</sup>
# UniPathway2GO<sup>1,2</sup>
# HAMAP2GO<sup>1,2</sup>
# HAMAP2GO<sup>1,2</sup>
# UniRule2GO<sup>1,2</sup>
# InterPro2GO
# InterPro2GO
# Ensembl Compara (vertebrates)
# Ensembl Compara (vertebrates)
Line 78: Line 76:
<b><sup>2</sup></b>: electronic annotations generated by UniProt
<b><sup>2</sup></b>: electronic annotations generated by UniProt


UniProt curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide 244 million annotations to almost 34 million proteins.  
UniProtKB curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO, UniRule2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide 244 million annotations to almost 34 million proteins.  


'''''Priorities for annotation'''''
=Presentations and Publications=


1. Proteins associated with the exosome (Prudence, Aleksandra)
''a. Publications''


2. Requests from user community (all curators)
The GOA database: Gene Ontology annotation updates for 2015. (PMID:25378336 PMCID:PMC4383930)
Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, O'Donovan C
3. Proteins annotated during Swiss-Prot curation duties (all Swiss-Prot/UniProtKB curators at the EBI and SIB)
Nucleic Acids Res [2015, 43(Database issue):D1057-63]


4. Annotation corrections based on quality control reports (all curators)


=Presentations and Publications=


''a. Publications''
''b. Presentations including Talks, Tutorials and Teaching''
Claire O'Donovan, Manual and Automatic annotation of plants and animals in UniProtKB and GO, Plant & Animal Genome Conference 2015, 11th January 2015, San Diego, USA (talk)


Claire O'Donovan, International collaboration in biocuration: projects & data/expertise sharing, 25th April 2015, Biocuration Conference 2015, Beijing, China (seminar)


Melanie Courtot, EMBL-EBI training courses - Introduction to ontologies, 4th November 2015, Cambridge, UK (Full day workshop + hands-on tutorial).


''b. Presentations including Talks, Tutorials and Teaching''
Klemens Pichler, EMBL-EBI training courses - Standards and Ontologies, 11-12th November 2015, University of Umeå, Sweden (Two full day workshops and hands-on tutorials).
Melanie Courtot. EMBL-EBI training courses - Introduction to ontologies  4 November 2015, Cambridge, UK (Full day workshop + hands-on tutorial).  




Line 111: Line 109:
=Annotation Outreach and User Advocacy Efforts=
=Annotation Outreach and User Advocacy Efforts=


* Prudence Mutowo-Meullenet trained two new SyScilia Consortium curators in GO annotation  
* Aleksandra Shypitsyna trained three new curators in GO annotation
* Rachael Huntley and Prudence Mutowo-Meullenet continue to answer queries sent to the GO Consortium helpdesk  
*     Melanie Courtot is on the rota for the GO Consortium helpdesk  
* Rachael Huntley, Prudence Mutowo-Meullenet and Aleksandra Shypitsyna continue to answer user queries sent to the UniProt-GOA project  
* Melanie Courtot and Aleksandra Shypitsyna are on the rota for the UniProt-GOA project helpdesk
* UniProt is continuing to support external annotation groups, such as AgBase, BHF-UCL, DFLAT at Tuft's University, SIB and PIR by providing use of the Protein2GO curation tool.  
* The Protein Function teams support external annotation groups, such as AgBase, BHF-UCL, DFLAT at Tufts University, SIB and PIR, by providing use of the Protein2GO curation tool; this year these groups also include WormBase and SGD.
* UniProt is continuing to assist GO Consortium groups with migration of their annotations into the UniProt database, as well as providing access and training for the UniProt curation tool    Protein2GO.  
* The Protein Function teams assist GO Consortium groups with migration of their annotations into the GOA files and UniProtKB, as well as providing access and training for the UniProt curation tool Protein2GO.
* Access and training for the Protein2GO curation tool has been given to curators from the Syscilia consortium.
* Access and training for the Protein2GO curation tool have been given to curators from the Synapse project.


=Other Highlights=
=Other Highlights=
Line 122: Line 120:
''i. Improvements to the QuickGO user interface''
''i. Improvements to the QuickGO user interface''


Work to improve the QuickGO user interface has continued throughout 2014. This work also involves extending the range of features currently provided by QuickGO.
Work to improve the QuickGO user interface has continued throughout 2015. This work also involves extending the range of features currently provided by QuickGO, as well as extensive testing for the new version of QuickGO and contributions to the user interface design.


''ii. Improvements to the Protein2GO curation tool''
''ii. Improvements to the Protein2GO curation tool''
Line 137: Line 135:
A number of usability enhancements have been made to Protein2GO this year, but one of the most useful is the highlighting of annotations that are linked in some way. For example, if the user selects an annotation that is part of a set of reciprocal annotations, then the other annotations in the set are highlighted; this makes the task of checking annotations, for example, much easier.  
A number of usability enhancements have been made to Protein2GO this year, but one of the most useful is the highlighting of annotations that are linked in some way. For example, if the user selects an annotation that is part of a set of reciprocal annotations, then the other annotations in the set are highlighted; this makes the task of checking annotations, for example, much easier.  
In terms of number of users, we currently have 125 active users, with 25 different affiliations.
In terms of number of users, we currently have 125 active users, with 25 different affiliations.
'''November 2014'''
All annotation files that we provide, in both GAF and GPAD format, now contain a GO-version tag in their header which gives the IRI of the version of the GO that was current when the files were published, for example:
!GO-version: http://purl.obolibrary.org/obo/go/releases/2014-11-13/go.owl
This allows consumers of our annotation files to link a specific set of annotations to a specific version of the ontology.
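As a minimal sketch of how a consumer might use this tag, the following Python reads the header of an annotation file and returns the GO-version IRI. It assumes that header lines start with `!` and come before any data line; the function name `read_go_version` and the sample data line are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Optional

GO_VERSION_PREFIX = "!GO-version:"


def read_go_version(lines: Iterable[str]) -> Optional[str]:
    """Return the IRI given in the GO-version header line, or None if absent."""
    for line in lines:
        if not line.startswith("!"):
            # Assumption: header comments end at the first non-'!' line.
            break
        if line.startswith(GO_VERSION_PREFIX):
            return line[len(GO_VERSION_PREFIX):].strip()
    return None


# Illustrative header; the data line below is a made-up placeholder.
example_lines = [
    "!gaf-version: 2.0",
    "!GO-version: http://purl.obolibrary.org/obo/go/releases/2014-11-13/go.owl",
    "UniProtKB\tEXAMPLE_ID\thypothetical data line",
]

go_version_iri = read_go_version(example_lines)
print(go_version_iri)
```

In practice the same function could be given an open file handle, since it only iterates over lines and stops at the first data line.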
'''October 2014'''
We now incorporate GO annotations to IntAct Complex Portal identifiers.
The IntAct Complex Portal (http://www.ebi.ac.uk/intact/complex/) is a manually curated resource of macromolecular complexes. These annotations are currently visible in our annotation files, except those that are based on UniProt reference proteomes as these contain annotations only to UniProtKB entries. The annotations are not visible in the current version of QuickGO (www.ebi.ac.uk/QuickGO), but will be available from the new version, which is due for release in the near future.
'''July 2014'''
We have improved the accuracy of automatic annotations by removing those annotations that violate taxon constraints. Some GO terms are applicable only to certain taxa and this is encoded in the GO taxon constraints. For example, if a GO term that is valid for use only with eukaryotes, e.g. GO:0000165 'MAPK cascade', is applied to a bacterial protein, the annotation would be incorrect and it would be deleted.
This process resulted in the deletion of approximately 106,000 incorrect electronic annotations.
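The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the UniProt-GOA pipeline: the lineage table, the `ONLY_IN_TAXON` mapping and all function names are hypothetical, and only the single constraint mentioned in the text (GO:0000165 being valid only for eukaryotes) is modelled.

```python
from typing import Dict, List, NamedTuple, Set


class Annotation(NamedTuple):
    protein: str
    go_id: str
    taxon: str


# Hypothetical toy lineage table: taxon -> itself plus its ancestor groups.
LINEAGE: Dict[str, Set[str]] = {
    "Homo sapiens": {"Homo sapiens", "Eukaryota"},
    "Escherichia coli": {"Escherichia coli", "Bacteria"},
}

# Hypothetical "only in taxon" constraints: GO term -> group it is restricted to.
ONLY_IN_TAXON: Dict[str, str] = {"GO:0000165": "Eukaryota"}  # MAPK cascade


def violates_taxon_constraint(annotation: Annotation) -> bool:
    """True if the GO term is restricted to a group the taxon does not belong to."""
    required_group = ONLY_IN_TAXON.get(annotation.go_id)
    if required_group is None:
        return False  # no constraint recorded for this term
    lineage = LINEAGE.get(annotation.taxon)
    if lineage is None:
        return False  # unknown taxon: keep the annotation rather than guess
    return required_group not in lineage


def remove_violations(annotations: List[Annotation]) -> List[Annotation]:
    """Keep only the annotations that pass the taxon-constraint check."""
    return [a for a in annotations if not violates_taxon_constraint(a)]


sample = [
    Annotation("human_protein", "GO:0000165", "Homo sapiens"),
    Annotation("bacterial_protein", "GO:0000165", "Escherichia coli"),
]
kept = remove_violations(sample)
print([a.protein for a in kept])
```

Keeping annotations for taxa missing from the lineage table is a deliberate, conservative choice in this sketch; a real pipeline would resolve lineages from the full taxonomy.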
'''March 2014'''
We are now incorporating manual annotations from the Alzheimer's Project at the University of Toronto. Further information on this project can be found at http://wiki.geneontology.org/index.php/Alzheimer%27s_Disease_Annotation_Project,
'''February 2014'''
We made further improvements to the pipeline that creates the GO Consortium 'inferred' annotations to reduce redundancy. This has caused a large decrease in the number of annotations that are assigned by 'GOC'.
Since the beginning of 2014 January we have included annotations from a new project "Parkinson's UK-UCL", which is a project led by Dr. Ruth Lovering at University College London to annotate proteins involved in Parkinson's disease. Further information on this project can be found at http://www.ucl.ac.uk/cardiovasculargeneontology/cardiovascular/newsletters
'''January 2014'''
We have suspended submission to the GO Consortium (GOC) of species-specific Gene Association Files if another group is responsible for the provision of GO annotations to that species. The following files were affected:
gene_association.goa_arabidopsis
gene_association.goa_mouse
gene_association.goa_rat
gene_association.goa_zebrafish
These files will no longer be available from the GOC annotation download webpage (http://www.geneontology.org/GO.downloads.annotations.shtml) nor the
GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/submission/).
Users will still be able to get annotations for all of these species from the UniProt multispecies file on the GOC website (http://www.geneontology.org/GO.downloads.annotations.shtml#unfilter).
The above species-specific files will continue to be made available from the UniProt-GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/).
As of the UniProt-GOA release in February 2014, we will remove all of the archived species-specific files mentioned above from the GOC CVS repository. These archived files will still be available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/
We noted a substantial increase in GO Consortium 'inferred' annotations. These annotations are automatically created based on inter-ontology links between Molecular Function and Biological Process terms and between Biological Process and Cellular Component terms. The increase is due to enhancements to the pipeline to take account of the GO hierarchy.
The UniProt-GOA gene association files now include manual annotations for Trypanosoma brucei and Leishmania major that have been created by the GeneDB project. Details of GeneDB can be found at: http://www.genedb.org/Homepage

Latest revision as of 05:33, 16 April 2019

UniProt Gene Ontology Annotation (GOA) Project Summary 2015

Overview

EMBL-EBI has been a member of the GO Consortium since 2001. One of the major activities is the UniProt Gene Ontology Annotation project which is delivered by staff from the Protein Function Content and Development teams. The core UniProt-GOA project staff are primarily responsible for supplying the GO Consortium with manual and electronic GO annotations to the human proteome. UniProt-GOA staff not only create manual annotations, but coordinate and check the integration of GO annotations from other curation efforts at the EMBL-EBI (including from InterPro, IntAct and Reactome). The UniProt-GOA dataset is supplemented with manual annotations from 35 annotating groups, including all members of the GO Consortium, as well as a number of external groups which produce relevant functional data. Nine electronic annotation pipelines are incorporated into the UniProt-GOA dataset, which provide the vast majority of annotations for non-model organism species. UniProt-GOA is therefore able to consolidate multiple sources of specialised knowledge, ensuring the UniProt-GOA resource remains a key up-to-date reference for a large number of research communities.

In addition, all UniProt Knowledgebase (UniProtKB) curators in the Protein Function Content team at EMBL-EBI, at the SIB Swiss Institute of Bioinformatics (SIB) and at the Protein Information Resource (PIR) are actively involved in curating UniProtKB entries with Gene Ontology terms during the UniProt literature curation process, providing both high-quality manual GO annotations and contributions to electronic GO annotation pipelines. The multi-species nature of UniProtKB means that the GO Annotation project is able to assist in the GO curation of proteins from over 430,000 taxonomic groups.



Staff from the Protein Function Content and Development teams at EMBL-EBI who deliver the GOA project:

Claire O'Donovan, Protein Function Content Team Leader (Consortium PI)

Maria J. Martin, Protein Function Development Team Leader (Senior Personnel)

Melanie Courtot, GO/GOA Project Leader

Alexander Holmes, GOA curator

Tony Sawford*, GOA programmer

Aleksandra Shypitsyna*, GOA curator

Tony Wardell, GOA programmer

UniProt contributors (EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC): Ioannis Xenarios, Lydie Bougueleret, Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Penelope Garmiri, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Kati Laiho, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Elena Speretta, Andre Stutz, Shyamala Sundaram, Michael Tognolli

* Funded partially by GOC.

Annotation Progress

Eleven sets of UniProt-GOA release files were produced by the GOA project between January 2015 and November 2015. These included non-redundant sets of GO annotations to 13 specific proteomes as well as data releases for annotations of all proteins in UniProtKB.

The UniProt-GOA project currently provides GO annotations for 65% of UniProtKB entries. Altogether, UniProt-GOA now provides almost 396 million GO annotations for almost 34 million proteins in over 452,000 different taxonomic groups. UniProt-GOA provides 249,563 annotations for the 43,693 proteins in the human reference proteome. The numbers presented below appear to show a decrease in annotations. This is mainly due to the redundancy removal in UniProtKB release 2015_04, when the Knowledgebase shrank from 92 million to 46 million entries (http://www.uniprot.org/help/2015/04/01/release), leaving fewer entries for the electronic pipelines to annotate.
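To put these figures in proportion, the short calculation below derives the average number of annotations per protein and the relative size of the UniProtKB reduction. It is only a back-of-the-envelope sketch: the totals are the rounded values quoted above ("almost" 396 million and 34 million), so the overall ratio is approximate, and the variable names are chosen for illustration.

```python
# Figures quoted in the paragraph above; the two totals are rounded.
total_annotations = 396_000_000
annotated_proteins = 34_000_000
human_annotations = 249_563
human_proteins = 43_693
uniprotkb_entries_before = 92_000_000
uniprotkb_entries_after = 46_000_000

# Average annotations per annotated protein across all of UniProt-GOA.
overall_per_protein = total_annotations / annotated_proteins

# Average annotations per protein in the human reference proteome.
human_per_protein = human_annotations / human_proteins

# Fraction of UniProtKB entries removed by the 2015_04 redundancy removal.
removed_fraction = 1 - uniprotkb_entries_after / uniprotkb_entries_before

print(f"Overall annotations per protein: {overall_per_protein:.2f}")
print(f"Human annotations per protein:   {human_per_protein:.2f}")
print(f"UniProtKB entries removed:       {removed_fraction:.0%}")
```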

UniProt-GOA UniProt gene association file release stats (comparison of January 2015 and November 2015 releases)


Methods and strategies for annotation

Expert curation priorities:

1. Proteins associated with the human exosome (all GOA curators)

2. Moonlighting proteins (all GOA curators)

3. Requests from user community (all GOA curators)

4. Proteins annotated during UniProt curation duties (all UniProtKB curators at the EMBL-EBI, PIR and SIB)

5. Annotation corrections based on quality control reports (all curators)

Computational annotation:

UniProt-GOA provides IEA annotations from the following methods:


  1. UniProt Keyword 2GO (SPKW2GO) [1,2]
  2. UniProt Subcellular Locations2GO (SPSL2GO) [1,2]
  3. UniPathway2GO [1,2]
  4. HAMAP2GO [1,2]
  5. UniRule2GO [1,2]
  6. InterPro2GO
  7. Ensembl Compara (vertebrates)
  8. Ensembl Genomes Compara (plants, fungi)

Key

1: mapping tables created and maintained by UniProt

2: electronic annotations generated by UniProt

UniProtKB curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO, UniRule2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide 244 million annotations to almost 34 million proteins.

Presentations and Publications

a. Publications

The GOA database: Gene Ontology annotation updates for 2015. (PMID:25378336 PMCID:PMC4383930) Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, O'Donovan C. Nucleic Acids Res [2015, 43(Database issue):D1057-63]


b. Presentations including Talks, Tutorials and Teaching

Claire O'Donovan, Manual and Automatic annotation of plants and animals in UniProtKB and GO, Plant & Animal Genome Conference 2015, 11th January 2015, San Diego, USA (talk)

Claire O'Donovan, International collaboration in biocuration: projects & data/expertise sharing, 25th April 2015, Biocuration Conference 2015, Beijing, China (seminar)

Melanie Courtot, EMBL-EBI training courses - Introduction to ontologies, 4th November 2015, Cambridge, UK (Full day workshop + hands-on tutorial).

Klemens Pichler, EMBL-EBI training courses - Standards and Ontologies, 11-12th November 2015, University of Umeå, Sweden (Two full day workshops and hands-on tutorials).


c. Posters

UniProt-GOA: A central resource for data integration and GO annotation. Melanie Courtot, SWAT4LS, Cambridge UK December 2015

Ontology Development Contributions

  • All curators continue to request new GO terms or updates to the ontology where necessary, using either TermGenie or the SourceForge tracker.

Annotation Outreach and User Advocacy Efforts

  • Aleksandra Shypitsyna trained three new curators in GO annotation
  • Melanie Courtot is on the rota for the GO Consortium helpdesk
  • Melanie Courtot and Aleksandra Shypitsyna are on the rota for the UniProt-GOA project helpdesk
  • The Protein Function teams support external annotation groups, such as AgBase, BHF-UCL, DFLAT at Tufts University, SIB and PIR, by providing use of the Protein2GO curation tool; this year these groups also include WormBase and SGD.
  • The Protein Function teams assist GO Consortium groups with migration of their annotations into the GOA files and UniProtKB, as well as providing access and training for the UniProt curation tool Protein2GO.
  • Access and training for the Protein2GO curation tool have been given to curators from the Synapse project.

Other Highlights

i. Improvements to the QuickGO user interface

Work to improve the QuickGO user interface has continued throughout 2015. This work also involves extending the range of features currently provided by QuickGO, as well as extensive testing for the new version of QuickGO and contributions to the user interface design.

ii. Improvements to the Protein2GO curation tool

As more GO Consortium curation groups migrate their annotations into the UniProt database and move to using Protein2GO as their sole curation tool for protein GO annotation, we continue to add more functionality to the tool.

  • support for new with_string format, plus all of the ECO-code-specific usage constraints

At the 2014 GOC meeting in Barcelona, a change to the format of the with/from annotation column ("with_string") was agreed, which allows components of the with_string to be separated by both pipes and commas. In addition, a new set of rules was agreed to govern the usage, and acceptable format, of with/from for each GO evidence code. Protein2GO now fully supports this enhanced format and the usage rules.
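A with_string that mixes both separators could be split along these lines. This is a sketch, not Protein2GO code: it assumes that pipes separate alternative groups and that commas join identifiers that belong together within one group, which is an interpretation not spelled out in the text above. The function name `parse_with_string` and the identifiers in the example are hypothetical.

```python
from typing import List


def parse_with_string(with_string: str) -> List[List[str]]:
    """Split a with/from value into pipe-separated groups of comma-separated IDs."""
    groups: List[List[str]] = []
    for group_text in with_string.split("|"):
        # Drop empty fragments so stray separators do not create empty IDs.
        identifiers = [part.strip() for part in group_text.split(",") if part.strip()]
        if identifiers:
            groups.append(identifiers)
    return groups


# Made-up identifiers, used only to show the shape of the result.
example = "DB:ID_A,DB:ID_B|DB:ID_C"
parsed = parse_with_string(example)
print(parsed)
```

Returning a list of lists keeps the two levels of separation distinct, so evidence-code-specific rules (for example, a limit on how many identifiers a group may hold) can be checked per group afterwards.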

  • annotation to RNAcentral IDs

Following on from 2014's extension of the scope of GO annotation to allow annotations to be made to IntAct Complex Portal identifiers, the scope has been further extended this year to support annotations to (taxon-specific) RNAcentral identifiers.

  • highlighting of linked annotations

A number of usability enhancements have been made to Protein2GO this year, but one of the most useful is the highlighting of annotations that are linked in some way. For example, if the user selects an annotation that is part of a set of reciprocal annotations, then the other annotations in the set are highlighted; this makes the task of checking annotations, for example, much easier.

In terms of number of users, Protein2GO currently has 125 active users, with 25 different affiliations.