GOA December 2015

UniProt Gene Ontology Annotation (GOA) Project Summary 2015


EMBL-EBI has been a member of the GO Consortium since 2001. One of the major activities is the UniProt Gene Ontology Annotation project which is delivered by staff from the Protein Function Content and Development teams. The core UniProt-GOA project staff are primarily responsible for supplying the GO Consortium with manual and electronic GO annotations to the human proteome. UniProt-GOA staff not only create manual annotations, but coordinate and check the integration of GO annotations from other curation efforts at the EMBL-EBI (including from InterPro, IntAct and Reactome). The UniProt-GOA dataset is supplemented with manual annotations from 35 annotating groups, including all members of the GO Consortium, as well as a number of external groups which produce relevant functional data. Nine electronic annotation pipelines are incorporated into the UniProt-GOA dataset, which provide the vast majority of annotations for non-model organism species. UniProt-GOA is therefore able to consolidate multiple sources of specialised knowledge, ensuring the UniProt-GOA resource remains a key up-to-date reference for a large number of research communities.

In addition, all UniProt Knowledgebase (UniProtKB) curators in the Protein Function Content team at EMBL-EBI, the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR) are actively involved in curating UniProtKB entries with Gene Ontology terms during the UniProt literature curation process, providing both high-quality manual GO annotations and contributions to the electronic GO annotation pipelines. The multi-species nature of UniProtKB means that the GO Annotation project is able to assist in the GO curation of proteins from over 430,000 taxonomic groups.

Staff from the Protein Function Content and Development teams at EMBL-EBI who deliver the GOA project:

Claire O'Donovan, Protein Function Content Team Leader (Consortium PI)

Maria J. Martin, Protein Function Development Team Leader (Senior Personnel)

Melanie Courtot, GO/GOA Project Leader

Alexander Holmes, GOA curator

Tony Sawford*, GOA programmer

Aleksandra Shypitsyna*, GOA curator

Tony Wardell, GOA programmer

UniProt contributors (EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC): Ioannis Xenarios, Lydie Bougueleret, Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Penelope Garmiri, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Kati Laiho, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Elena Speretta, Andre Stutz, Shyamala Sundaram, Michael Tognolli

* Funded partially by GOC.

Annotation Progress

Eleven sets of UniProt-GOA release files were produced by the GOA project between January 2015 and November 2015. These included non-redundant sets of GO annotations to 13 specific proteomes as well as data releases for annotations of all proteins in UniProtKB.

The UniProt-GOA project currently provides GO annotations for 65% of UniProtKB entries. Altogether, UniProt-GOA now provides almost 396 million GO annotations for almost 34 million proteins in over 452,000 different taxonomic groups. UniProt-GOA provides 249,563 annotations for the 43,693 proteins in the human reference proteome. The numbers presented below appear to show a decrease in annotations. This is mainly due to the removal of redundancy from the UniProt Knowledgebase in release 2015_04, when it shrank from 92 million to 46 million entries (http://www.uniprot.org/help/2015/04/01/release), leaving fewer entries for the electronic pipelines to annotate.

UniProt-GOA UniProt gene association file release stats (comparison of January 2015 and November 2015 releases)

[Figure: GOA Release Stats November2015.png]
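
Headline figures of this kind (annotation count, distinct proteins, taxonomic groups, evidence codes) can in principle be recomputed from a gene association file. The following is a minimal sketch rather than the GOA release procedure. It assumes a tab-separated GAF 2.x file in which comment lines start with "!", column 2 holds the object identifier, column 7 the evidence code and column 13 the taxon. The names summarise_gaf and make_row, and the accessions X00001 and X00002, are invented for illustration.

```python
from collections import Counter


def summarise_gaf(lines):
    """Count annotations, distinct proteins, taxa and evidence codes in GAF rows."""
    proteins, taxa = set(), set()
    evidence = Counter()
    total = 0
    for line in lines:
        # Skip blank lines and header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation row
        total += 1
        proteins.add((cols[0], cols[1]))   # database + object identifier
        evidence[cols[6]] += 1             # evidence code, e.g. IEA
        taxa.add(cols[12].split("|")[0])   # first taxon listed in the taxon column
    return {
        "annotations": total,
        "proteins": len(proteins),
        "taxa": len(taxa),
        "evidence": dict(evidence),
    }


def make_row(accession, go_id, evidence_code, taxon):
    """Build a 17-column GAF row with placeholder values (illustration only)."""
    cols = [""] * 17
    cols[0], cols[1], cols[4] = "UniProtKB", accession, go_id
    cols[6], cols[12] = evidence_code, taxon
    return "\t".join(cols)


sample = [
    "!gaf-version: 2.0",
    make_row("X00001", "GO:0005634", "IEA", "taxon:9606"),
    make_row("X00001", "GO:0016020", "IDA", "taxon:9606"),
    make_row("X00002", "GO:0005634", "IEA", "taxon:10090"),
]
summary = summarise_gaf(sample)
print(summary)
```

For a real release file, the same function could be passed an open file handle instead of the sample list, since it only iterates over lines.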

Methods and strategies for annotation

Expert curation priorities:

1. Proteins associated with the human exosome (all GOA curators)

2. Moonlighting proteins (all GOA curators)

3. Requests from user community (all GOA curators)

4. Proteins annotated during UniProt curation duties (all UniProtKB curators at the EMBL-EBI, PIR and SIB)

5. Annotation corrections based on quality control reports (all curators)

Computational annotation:

UniProt-GOA provides IEA annotations from the following methods:

  1. UniProt Keyword2GO (SPKW2GO) [1,2]
  2. UniProt Subcellular Locations2GO (SPSL2GO) [1,2]
  3. UniPathway2GO [1,2]
  4. HAMAP2GO [1,2]
  5. UniRule2GO [1,2]
  6. InterPro2GO
  7. Ensembl Compara (vertebrates)
  8. Ensembl Genomes Compara (plants, fungi)


[1] Mapping tables created and maintained by UniProt.

[2] Electronic annotations generated by UniProt.

UniProtKB curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO, UniRule2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide 244 million annotations to almost 34 million proteins.
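
As a rough illustration of how a mapping-based pipeline such as keywords2GO turns curated entry content into electronic annotations, the sketch below looks up each keyword of an entry in a mapping table and emits an IEA annotation for every hit. It is a simplified assumption about the principle, not the UniProt implementation. The function name electronic_annotations, the accessions and the small mapping table are invented for the example, and the keyword-to-term pairs are illustrative and have not been checked against the current mapping files.

```python
# Illustrative mapping table: UniProtKB keyword identifier -> GO term identifier.
KEYWORD2GO = {
    "KW-0539": "GO:0005634",  # Nucleus -> nucleus (illustrative pair)
    "KW-0472": "GO:0016020",  # Membrane -> membrane (illustrative pair)
}


def electronic_annotations(entry_keywords, mapping=KEYWORD2GO):
    """Return (accession, GO id, evidence code, source keyword) tuples.

    Keywords without a mapping are skipped, so they produce no annotation.
    """
    annotations = []
    for accession, keywords in entry_keywords.items():
        for keyword in keywords:
            go_id = mapping.get(keyword)
            if go_id is not None:
                annotations.append((accession, go_id, "IEA", keyword))
    return annotations


# Hypothetical entries; KW-9999 stands for a keyword with no GO mapping.
entries = {
    "X00001": ["KW-0539", "KW-9999"],
    "X00002": ["KW-0472"],
}
result = electronic_annotations(entries)
for row in result:
    print(row)
```

Because the annotations are derived from the keywords, a curator correcting a keyword in an entry changes the resulting electronic annotation the next time the pipeline runs.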

Presentations and Publications

a. Publications

The GOA database: Gene Ontology annotation updates for 2015. Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, O'Donovan C. Nucleic Acids Res 2015, 43(Database issue):D1057-63 (PMID:25378336, PMCID:PMC4383930)

b. Presentations including Talks, Tutorials and Teaching

Claire O'Donovan, Manual and Automatic annotation of plants and animals in UniProtKB and GO, Plant & Animal Genome Conference 2015, 11th January 2015, San Diego, USA (talk)

Claire O'Donovan, International collaboration in biocuration: projects & data/expertise sharing, 25th April 2015, Biocuration Conference 2015, Beijing, China (seminar)

Melanie Courtot, EMBL-EBI training courses - Introduction to ontologies, 4th November 2015, Cambridge, UK (Full day workshop + hands-on tutorial).

Klemens Pichler, EMBL-EBI training courses - Standards and Ontologies, 11-12th November 2015, University of Umeå, Sweden (Two full day workshops and hands-on tutorials).

c. Posters

UniProt-GOA: A central resource for data integration and GO annotation. Melanie Courtot, SWAT4LS, Cambridge, UK, December 2015

Ontology Development Contributions

  • All curators continue to request new GO terms or updates to the ontology where necessary, using either TermGenie or the SourceForge tracker.

Annotation Outreach and User Advocacy Efforts

  • Aleksandra Shypitsyna trained 3 new curators in GO annotation
  • Melanie Courtot is on the rota for the GO Consortium helpdesk
  • Melanie Courtot and Aleksandra Shypitsyna are on the rota for UniProt-GOA project helpdesk
  • The Protein Function teams support external annotation groups, such as AgBase, BHF-UCL, DFLAT at Tufts University, SIB and PIR, and this year also WormBase and SGD, by providing use of the Protein2GO curation tool.
  • The Protein Function teams assist GO Consortium groups with migration of their annotations into the GOA files and UniProtKB, as well as providing access and training for the UniProt curation tool Protein2GO.
  • Access and training for the Protein2GO curation tool have been given to curators from the Synapse project.

Other Highlights

i. Improvements to the QuickGO user interface

Work to improve the QuickGO user interface has continued throughout 2015. This work also involves extending the range of features currently provided by QuickGO, as well as extensive testing for the new version of QuickGO and contributions to the user interface design.

ii. Improvements to the Protein2GO curation tool

As more GO Consortium curation groups migrate their annotations into the UniProt database and move to using Protein2GO as their sole curation tool for protein GO annotation, we continue to add more functionality to the tool.

  • support for new with_string format, plus all of the ECO-code-specific usage constraints

At the 2014 GOC meeting in Barcelona a change to the format of the with/from annotation column ("with_string") was agreed, which allows components of the with_string to be separated by both pipes and commas. In addition, a new set of rules was agreed that govern the usage, and acceptable format, of with/from with the GO evidence codes. Protein2GO now fully supports this enhanced format and the usage rules.
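
To make the enhanced format concrete, the sketch below splits a with_string into groups on pipes and then into individual identifiers on commas. It is a minimal illustration and not the Protein2GO code. Reading pipe-separated groups as alternatives and comma-separated members as applying together is an assumption based on general GO annotation conventions, not something stated above. The ECO-code-specific usage rules are not modelled, and the function name parse_with_string and the identifiers are placeholders.

```python
def parse_with_string(with_string):
    """Split a with/from value into groups of identifiers.

    Pipes separate groups; commas separate the members within a group.
    Empty groups and empty members are dropped.
    """
    groups = []
    for group in with_string.split("|"):
        members = [member.strip() for member in group.split(",") if member.strip()]
        if members:
            groups.append(members)
    return groups


# Placeholder identifiers, used only to show the structure of the result.
example = "UniProtKB:X00001,UniProtKB:X00002|UniProtKB:X00003"
parsed = parse_with_string(example)
print(parsed)
```

Keeping the result as a list of lists preserves the distinction between the two separators, so a validation step could apply evidence-code rules to each group separately.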

  • annotation to RNAcentral IDs

Following on from 2014's extension of the scope of GO annotation to allow annotations to be made to IntAct Complex Portal identifiers, the scope has been further extended this year to support annotations to (taxon-specific) RNAcentral identifiers.

  • highlighting of linked annotations

A number of usability enhancements have been made to Protein2GO this year, but one of the most useful is the highlighting of annotations that are linked in some way. For example, if the user selects an annotation that is part of a set of reciprocal annotations, then the other annotations in the set are highlighted, which makes tasks such as checking annotations much easier. Protein2GO currently has 125 active users with 25 different affiliations.