GOA December 2014

From GO Wiki
Latest revision as of 05:33, 16 April 2019

UniProt Gene Ontology Annotation Summary 2014

Overview

The UniProt GO Annotation project (UniProt-GOA) has been a member of the GO Consortium since 2001. All UniProt curators are actively involved in curating UniProtKB entries with Gene Ontology terms during the UniProt literature curation process, providing high-quality manual GO annotations in addition to their contributions to electronic GO annotation pipelines. The multi-species nature of UniProtKB means that the GO Annotation project is able to assist in the GO curation of proteins from over 430,000 taxonomic groups.

The core UniProt-GOA project staff are primarily responsible for supplying the GO Consortium with manual and electronic GO annotations to the human proteome. UniProt-GOA staff not only create manual annotations, but coordinate and check the integration of GO annotations from other curation efforts at the EBI (including from InterPro, IntAct and Reactome). The UniProt-GOA dataset is supplemented with manual annotations from 35 annotating groups, including all members of the GO Consortium, as well as a number of external groups which produce relevant functional data. Nine electronic annotation pipelines are incorporated into the UniProt-GOA dataset, which provide the vast majority of annotations for non-model organism species. UniProt-GOA is therefore able to consolidate multiple sources of specialised knowledge, ensuring the UniProt-GOA resource remains a key up-to-date reference for a large number of research communities.


Staff:

Claire O'Donovan

Maria Martin

Rachael Huntley*

Prudence Mutowo-Meullenet

Tony Sawford*

Aleksandra Shypitsyna

Carlos Bonilla

Penelope Garmiri

UniProt contributors (EBI, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC): Ioannis Xenarios, Lydie Bougueleret

Ghislaine Argoud-Puy, Andrea Auchinchloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Gayatri Chavali, Elena Cibrian-Uhalte, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Reija Hieta, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Guillaume Keller, Kati Laiho, Duncan Legge, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli

* Funded entirely or partially by GO.

Annotation Progress

11 annotation files were released by the GOA project between January 2014 and November 2014. These included non-redundant sets of GO annotations to 13 specific proteomes as well as data releases for annotations of all proteins in UniProtKB.

Manual annotations originating from other GO Consortium members and affiliates are incorporated into UniProt and displayed in the relevant UniProtKB entries. The UniProt-GO Annotation project currently provides GO annotations for 65% of UniProt entries. Altogether, UniProt-GOA now provides almost 396 million GO annotations for almost 34 million proteins in over 452,000 different taxonomic groups. UniProt-GOA provides 249,563 annotations for the 43,693 proteins in the human reference proteome.


UniProt-GOA UniProt gene association file release stats (comparison of January 2014 and November 2014 releases)

Key

*Reduction in Electronic annotations due to enforcing taxon constraints as hard checks during migration into UniProt-GOA database

** Reduction in the number of annotations that are assigned by 'GOC' due to further improvements to the pipeline that creates the GO Consortium 'inferred' annotations to reduce redundancy.

***New sources of annotation after January 2014


Methods and strategies for annotation

Literature curation:

During 2014 we continued annotating proteins that are experimentally determined to be located in the extracellular vesicular exosome using annotation extensions to capture contextual information for each protein.

We completed annotating a list of approximately 400 human proteins that are targets of the Critical Assessment of Functional Annotation (CAFA) competition. We curated the primary functions and processes of these proteins in order to populate these targets with functional annotations, which assisted in the assessment of the CAFA competition.


Computational annotation:

UniProt-GOA provides IEA annotations from the following methods:


  1. UniProt Keyword2GO (SPKW2GO) [1,2]
  2. UniProt Subcellular Locations2GO (SPSL2GO) [1,2]
  3. Unipathway2GO [1,2]
  4. HAMAP2GO [1,2]
  5. InterPro2GO
  6. Ensembl Compara (vertebrates)
  7. Ensembl Genomes Compara (plants, fungi)

Key

1: mapping tables created and maintained by UniProt

2: electronic annotations generated by UniProt

UniProt curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide 244 million annotations to almost 34 million proteins.
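As a rough illustration of how a keyword-based pipeline of this kind turns curated entry information into electronic annotations, the sketch below applies a small mapping table to the keywords of an entry and emits IEA annotations. The mapping entries, the accession and the function name are assumptions made for the example; they are not taken from the actual UniProt mapping files or pipeline code.

```python
from typing import Dict, List, Tuple

# Illustrative mapping table (assumed for this example, not the real file):
# UniProtKB keyword -> GO term identifiers.
KEYWORD2GO: Dict[str, List[str]] = {
    "ATP-binding": ["GO:0005524"],  # ATP binding
    "Nucleus": ["GO:0005634"],      # nucleus
}

Annotation = Tuple[str, str, str]  # (protein accession, GO ID, evidence code)


def annotate_from_keywords(accession: str, keywords: List[str]) -> List[Annotation]:
    """Create one IEA annotation per GO term mapped from the entry's keywords."""
    annotations: List[Annotation] = []
    for keyword in keywords:
        # Keywords without a mapping simply produce no annotation.
        for go_id in KEYWORD2GO.get(keyword, []):
            annotations.append((accession, go_id, "IEA"))
    return annotations


# "EXAMPLE1" is a placeholder accession, not a real UniProtKB entry.
for row in annotate_from_keywords("EXAMPLE1", ["Nucleus", "ATP-binding", "Unmapped"]):
    print(row)
```

Because the annotations are derived from the mapping table at release time, a correction to a keyword in an entry, or to the table itself, propagates to every affected protein in the next release.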

Priorities for annotation

1. Proteins associated with the exosome (Prudence, Aleksandra)

2. Requests from user community (all curators)

3. Proteins annotated during Swiss-Prot curation duties (all Swiss-Prot/UniProtKB curators at the EBI and SIB)

4. Annotation corrections based on quality control reports (all curators)

Presentations and Publications

a. Publications


Alam-Faruque Y, Hill DP, Dimmer EC, Harris MA, Foulger RE, Tweedie S, Attrill H, Howe DG, Thomas SR, Davidson D, Woolf AS, Blake JA, Mungall CJ, O'Donovan C, Apweiler R, Huntley RP. (2014) Representing kidney development using the gene ontology. PLoS One. 2014;9(6):e99864. PMC4062467

Huntley RP, Sawford T, Martin MJ, O'Donovan C. (2014) Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt. Gigascience. 2014;3(1):4. PMC3995153

The UniProt Consortium 2014. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 42: D191-D198 (2014). PMID:24253303

Huntley RP, Harris MA, Alam-Faruque Y, Blake JA, Carbon S, Dietze H, Dimmer EC, Foulger RE, Hill DP, Khodiyar VK, Lock A, Lomax J, Lovering RC, Mutowo-Meullenet P, Sawford T, Van Auken K, Wood V, Mungall CJ. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. BMC Bioinformatics Volume 15 (2014) p.155 PMC4039540

The UniProt Consortium 2014. UniProt: a hub for protein information. Nucleic Acids Res [2014] PMID:25348405


b. Presentations including Talks, Tutorials and Teaching

Rachael Huntley. Automated function prediction SIG at ISMB/ECCB 11-15 July 2014, Boston, USA. (talk).

Rachael Huntley. Plant and Animal Genome Conference 11-15 January 2014. San Diego, USA (Hands-on tutorial).

Prudence Mutowo. Gene Ontology annotation training 25-27 October 2014 Belo Horizonte Brazil. Depto. Bioquímica e Imunologia, ICB, UFMG (Hands-on tutorial).

Prudence Mutowo. Using Gene Ontology to understand peroxisome function in human 28 October 2014 Belo Horizonte Brazil. Depto. Bioquímica e Imunologia, ICB, UFMG (Hands-on tutorial).


c. Posters

UniProt and the CAFA challenge. ISMB/ECCB 11-15 July 2014, Boston, USA. Rachael Huntley.

Ontology Development Contributions

  • All curators continue to request new GO terms or updates to the ontology where necessary, using either Term Genie or the SourceForge tracker.

Annotation Outreach and User Advocacy Efforts

  • Prudence Mutowo-Meullenet trained two new SyScilia Consortium curators in GO annotation
  • Rachael Huntley and Prudence Mutowo-Meullenet continue to answer queries sent to the GO Consortium helpdesk
  • Rachael Huntley, Prudence Mutowo-Meullenet and Aleksandra Shypitsyna continue to answer user queries sent to the UniProt-GOA project
  • UniProt is continuing to support external annotation groups, such as AgBase, BHF-UCL, DFLAT at Tufts University, SIB and PIR, by providing use of the Protein2GO curation tool.
  • UniProt is continuing to assist GO Consortium groups with migration of their annotations into the UniProt database, as well as providing access and training for the UniProt curation tool Protein2GO.
  • Access and training for the Protein2GO curation tool have been given to curators from the Syscilia consortium.

Other Highlights

i. Improvements to the QuickGO user interface

Work to improve the QuickGO user interface has continued throughout 2014. This work also involves extending the range of features currently provided by QuickGO.

ii. Improvements to the Protein2GO curation tool

As more GO Consortium curation groups migrate their annotations into the UniProt database and move to using Protein2GO as their sole curation tool for protein GO annotation, we continue to add more functionality to the tool.

Protein2GO now allows for the creation of annotations to protein complexes. Protein complex identifiers obtained from the IntAct protein complex portal at the EBI can now be used to make GO annotations to complexes. Annotation guidance for complex annotation will evolve as more annotations are created.

Protein2GO now has an ‘Author Contact’ feature. This allows curators to email corresponding authors after they have curated their paper. The emails are sent out at release time and invite authors to view the annotations created from their publication and to give feedback, should they have any. Since the introduction of this feature, we have received positive emails from authors regarding the annotations created and the usefulness of GO in capturing information from their publications.

November 2014

All annotation files that we provide, in both GAF and GPAD format, now contain a GO-version tag in their header which gives the IRI of the version of the GO that was current when the files were published, for example:

!GO-version: http://purl.obolibrary.org/obo/go/releases/2014-11-13/go.owl

This allows consumers of our annotation files to link a specific set of annotations to a specific version of the ontology.
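For consumers who want to record which ontology release a file was built against, a minimal sketch of reading the tag is given below. It assumes that header lines begin with "!" and appear before the first annotation row; the function name and the sample rows are hypothetical.

```python
from typing import Iterable, Optional

GO_VERSION_PREFIX = "!GO-version:"


def read_go_version(lines: Iterable[str]) -> Optional[str]:
    """Return the IRI given in the GO-version header line, or None if absent."""
    for line in lines:
        if not line.startswith("!"):
            break  # assumed: header lines precede all annotation rows
        if line.startswith(GO_VERSION_PREFIX):
            return line[len(GO_VERSION_PREFIX):].strip()
    return None


# Sample content; the data row is a truncated placeholder, not a real annotation.
sample = [
    "!gaf-version: 2.0",
    "!GO-version: http://purl.obolibrary.org/obo/go/releases/2014-11-13/go.owl",
    "UniProtKB\tEXAMPLE1\t...",
]
print(read_go_version(sample))
```

The same function can be given an open file handle, since it stops reading at the first non-header line.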


October 2014

We now incorporate GO annotations to IntAct Complex Portal identifiers. The IntAct Complex Portal (http://www.ebi.ac.uk/intact/complex/) is a manually curated resource of macromolecular complexes. These annotations are currently visible in our annotation files, except those that are based on UniProt reference proteomes as these contain annotations only to UniProtKB entries. The annotations are not visible in the current version of QuickGO (www.ebi.ac.uk/QuickGO), but will be available from the new version, which is due for release in the near future.


July 2014

We have improved the accuracy of automatic annotations by removing those annotations that violate taxon constraints. Some GO terms are applicable only to certain taxa and this is encoded in the GO taxon constraints. For example, if a GO term that is valid for use only with eukaryotes, e.g. GO:0000165 'MAPK cascade', is applied to a bacterial protein, the annotation would be incorrect and it would be deleted. This process resulted in the deletion of approximately 106,000 incorrect electronic annotations.
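The check described above can be sketched as follows. This is a toy illustration under stated assumptions, not the UniProt-GOA implementation: the constraint table holds a single "only in taxon" rule for the MAPK cascade example, the lineages are abbreviated, and the protein accessions are placeholders.

```python
from typing import Dict, List, Set, Tuple

# Assumed toy constraint table: GO term -> NCBI taxon the term is restricted to.
ONLY_IN_TAXON: Dict[str, int] = {
    "GO:0000165": 2759,  # 'MAPK cascade' is valid only within Eukaryota
}

# Assumed, abbreviated lineages: taxon -> itself plus its ancestor taxa.
LINEAGE: Dict[int, Set[int]] = {
    9606: {9606, 2759, 131567},      # Homo sapiens
    83333: {83333, 562, 2, 131567},  # Escherichia coli K-12
}

Annotation = Tuple[str, str, int]  # (protein accession, GO ID, taxon ID)


def violates_constraint(annotation: Annotation) -> bool:
    """True if the GO term is restricted to a taxon outside the protein's lineage."""
    _, go_id, taxon_id = annotation
    required = ONLY_IN_TAXON.get(go_id)
    if required is None:
        return False  # no constraint recorded for this term
    return required not in LINEAGE.get(taxon_id, {taxon_id})


def split_annotations(
    annotations: List[Annotation],
) -> Tuple[List[Annotation], List[Annotation]]:
    """Separate annotations into (kept, removed) according to the constraints."""
    kept = [a for a in annotations if not violates_constraint(a)]
    removed = [a for a in annotations if violates_constraint(a)]
    return kept, removed


kept, removed = split_annotations([
    ("EUK_PROTEIN", "GO:0000165", 9606),
    ("BAC_PROTEIN", "GO:0000165", 83333),
])
print("kept:", kept)
print("removed:", removed)
```

Only the bacterial annotation is removed here, because Eukaryota (taxon 2759) does not appear in the lineage of the E. coli protein.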

March 2014

We are now incorporating manual annotations from the Alzheimer's Project at the University of Toronto. Further information on this project can be found at http://wiki.geneontology.org/index.php/Alzheimer%27s_Disease_Annotation_Project

February 2014

We made further improvements to the pipeline that creates the GO Consortium 'inferred' annotations to reduce redundancy. This has caused a large decrease in the number of annotations that are assigned by 'GOC'.

Since the beginning of January 2014 we have included annotations from a new project, "Parkinson's UK-UCL", led by Dr. Ruth Lovering at University College London to annotate proteins involved in Parkinson's disease. Further information on this project can be found at http://www.ucl.ac.uk/cardiovasculargeneontology/cardiovascular/newsletters


January 2014

We have suspended submission to the GO Consortium (GOC) of species-specific Gene Association Files if another group is responsible for the provision of GO annotations to that species. The following files were affected:

gene_association.goa_arabidopsis
gene_association.goa_mouse
gene_association.goa_rat
gene_association.goa_zebrafish

These files will no longer be available from the GOC annotation download webpage (http://www.geneontology.org/GO.downloads.annotations.shtml) nor the GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/submission/). Users will still be able to get annotations for all of these species from the UniProt multispecies file on the GOC website (http://www.geneontology.org/GO.downloads.annotations.shtml#unfilter).

The above species-specific files will continue to be made available from the UniProt-GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/).

As of the UniProt-GOA release in February 2014, we will remove all of the archived species-specific files mentioned above from the GOC CVS repository. These archived files will still be available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/

We noted a substantial increase in GO Consortium 'inferred' annotations. These annotations are automatically created based on inter-ontology links between Molecular Function and Biological Process terms and between Biological Process and Cellular Component terms. The increase is due to enhancements to the pipeline to take account of the GO hierarchy.

The UniProt-GOA gene association files now include manual annotations for Trypanosoma brucei and Leishmania major that have been created by the GeneDB project. Details of GeneDB can be found at: http://www.genedb.org/Homepage