UniProt-GOA Mar 3 to June 5: Difference between revisions

From GO Wiki
mNo edit summary
 
(8 intermediate revisions by one other user not shown)
Line 1: Line 1:
==In Progress!!==
 
=Staff=
=Staff=


Claire O'Donovan
UniProt contributors (EBI, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC).


Maria Martin
'''UniProt-EBI:'''


Rachael Huntley*
Claire O'Donovan, Maria Martin, Rachael Huntley*, Prudence Mutowo-Meullenet, Tony Sawford*, Aleksandra Shypitsyna, Carlos Bonilla, Joanna Argasinska, Elena Cibrian-Uhalte, Penelope Garmiri, Emma Hatton-Ellis, Reija Hieta, Duncan Legge, Michele Magrane, Klemens Pichler.


Prudence Mutowo-Meullenet
<nowiki>*</nowiki> Funded entirely or partially by GO.


Tony Sawford*
'''UniProt-SIB:'''


Aleksandra Shypitsyna
Ghislaine Argoud-Puy, Andrea Auchinchloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge,  Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Philippe Lemercier, Damien Lieberherr, Patrick Masson, Ivo Pedruzzi,  Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli.


Carlos Bonilla
'''UniProt-PIR:'''


UniProt contributors (EBI, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC): Ioannis Xenarios, Lydie Bougueleret
Kati Laiho
 
Ghislaine Argoud-Puy, Andrea Auchinchloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Gayatri Chavali, Elena Cibrian-Uhalte, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Reija Hieta, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Guillaume Keller, Kati Laiho, Duncan Legge, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli
 
<nowiki>*</nowiki> Funded entirely or partially by GO.


=Annotation progress=
=Annotation progress=
Line 43: Line 39:
==a. Literature curation==
==a. Literature curation==


In May 2013 we initiated the annotation of a list of approximately 400 human proteins that are targets of the Critical Assessment of Functional Annotation (CAFA) competition. We are curating the primary functions and processes of these proteins in order to populate these targets with functional annotations, which will assist in the assessment of the CAFA competition. We have now completed this annotation project; in total we have curated 4,067 proteins with 12,293 annotations. Of these, 384 proteins are suitable for CAFA assessment on Molecular Function or Biological Process terms.  "Suitable" means that the protein has at least one new annotation (with evidence code IDA, IMP, IGI, IEP or IPI) in MF or BP that is not the same as any existing manual annotations.
In May 2013 the UniProt-EBI curators initiated the annotation of a list of approximately 400 human proteins that are targets of the Critical Assessment of Functional Annotation (CAFA) competition. We are curating the primary functions and processes of these proteins in order to populate these targets with functional annotations, which will assist in the assessment of the CAFA competition. We have now completed this annotation project; in total we have curated 4,067 proteins with 12,293 annotations. Of these, 384 proteins are suitable for CAFA assessment on Molecular Function or Biological Process terms.  "Suitable" means that the protein has at least one new annotation (with evidence code IDA, IMP, IGI, IEP or IPI) in MF or BP that is not the same as any existing manual annotations.


We continue to annotate proteins that are experimentally determined to be located in the extracellular vesicular exosome.
The UniProt-EBI curators continue to annotate proteins that are experimentally determined to be located in the extracellular vesicular exosome. We have identified approximately 2,500 proteins using data from the Exocarta database and a subset of these have been annotated as part of the CAFA project detailed above. We will progress by identifying groups of related proteins, e.g. keratins, immune-related proteins, to form discrete annotation projects.


==b. Computational annotation strategies==
==b. Computational annotation strategies==
Line 67: Line 63:




UniProt curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide almost 262 million annotations to almost 37 million proteins.
All UniProt curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide almost 262 million annotations to almost 37 million proteins.


==c. Priorities for annotation==
==c. Priorities for annotation==
Line 73: Line 69:
1. Proteins associated with the exosome (Prudence, Aleksandra)
1. Proteins associated with the exosome (Prudence, Aleksandra)


2. Proteins from the CAFA target list (all curators)
2. Proteins from the CAFA target list (UniProt-EBI curators)


3. Requests from user community (all curators)
3. Requests from user community (UniProt-EBI curators)


4. Proteins annotated during Swiss-Prot curation duties (all Swiss-Prot/UniProtKB curators at the EBI and SIB)
4. Proteins annotated during Swiss-Prot curation duties (all UniProtKB/Swiss-Prot curators at the EBI, SIB and PIR)


5. Annotation corrections based on quality control reports (all curators)  
5. Annotation corrections based on quality control reports (all curators)


=Presentations and Publications=
=Presentations and Publications=
Line 113: Line 109:
* Rachael Huntley, Prudence Mutowo-Meullenet and Aleksandra Shypitsyna continue to answer user queries sent to the UniProt-GOA project   
* Rachael Huntley, Prudence Mutowo-Meullenet and Aleksandra Shypitsyna continue to answer user queries sent to the UniProt-GOA project   


* UniProt is continuing to support external annotation groups, such as AgBase, BHF-UCL, Parkinson's UK-UCL, DictyBase, SGD, CamCellNet, WormBase, DFLAT at Tufts University, SIB, PIR, the Alzheimer's Project at the University of Toronto and the GO Consortium PAINT curators, by providing use of the Protein2GO curation tool.
* UniProt-EBI is continuing to support external annotation groups, such as AgBase, BHF-UCL, Parkinson's UK-UCL, DictyBase, SGD, CamCellNet, WormBase, DFLAT at Tufts University, SIB, PIR, the Alzheimer's Project at the University of Toronto and the GO Consortium PAINT curators, by providing use of the Protein2GO curation tool.


* UniProt is continuing to assist GO Consortium groups with migration of their annotations into the UniProt database, as well as providing access and training for the UniProt curation tool Protein2GO.  
* UniProt-EBI is continuing to assist GO Consortium groups with migration of their annotations into the UniProt database, as well as providing access and training for the UniProt curation tool Protein2GO.  


* Rachael Huntley is involved in a GO Consortium collaboration with a team at the Norwegian University of Science and Technology to assist them in making annotations for transcription factors and their target genes.  
* Rachael Huntley is involved in a GO Consortium collaboration with a team at the Norwegian University of Science and Technology to assist them in making annotations for transcription factors and their target genes.  
Line 133: Line 129:
Work is continuing on a new user interface for the UniProt GO browser QuickGO. New features will include the ability to view the Evidence Code Ontology in the ancestor chart view and also to display annotation extensions. This work is being carried out by Carlos Bonilla with support from Tony Sawford.
Work is continuing on a new user interface for the UniProt GO browser QuickGO. New features will include the ability to view the Evidence Code Ontology in the ancestor chart view and also to display annotation extensions. This work is being carried out by Carlos Bonilla with support from Tony Sawford.


''ii. Improvements to the Protein2GO curation tool by UniProt-EBI''


''ii. Improvements to the Protein2GO curation tool''
We are continuing to migrate annotations from GO Consortium curation groups into the UniProt database as needed. This also involves each group annotating to proteins using Protein2GO as their sole curation tool.


We are continuing to migrate annotations from GO Consortium curation groups into the UniProt database as needed. This also involves each group annotating to proteins using Protein2GO as their sole curation tool.
We continue to include quality control checks within the Protein2GO interface as requested by users or in response to guidelines from the GO Consortium. These have included the ability for each curator to view deleted annotations from all sources and flagging annotations that violate taxon constraints.


UniProt-GOA is moving towards using ECO codes in the database and curation tool. To this end, Protein2GO now displays ECO codes alongside the equivalent GO evidence codes. The next step will be to allow curators to use more granular ECO codes when creating an annotation. Tony Sawford is responsible for the development and maintenance of Protein2GO and the UniProt-GOA database.
Protein2GO is currently being developed to support annotation to entities other than proteins, such as protein complexes (using IntAct Complex IDs) and RNAs (using RNA Central IDs). Tony Sawford is responsible for the development and maintenance of Protein2GO and the UniProt-GOA database.


''iii. Annotation file changes''
''iii. Annotation file changes''


'''February 2014'''
Annotation files are released every 4 weeks by UniProt-GOA at EBI.
 
1. We have made further improvements to the pipeline that creates the GO Consortium 'inferred' annotations to reduce redundancy. This has caused a large decrease in the number of annotations that are assigned by 'GOC'.
 
2. Since January we have included annotations from a new project "Parkinson's UK-UCL", which is a project led by Dr. Ruth Lovering at University College London to annotate proteins involved in Parkinson's disease. Further information on this project can be found at http://www.ucl.ac.uk/cardiovasculargeneontology/cardiovascular/newsletters.
 
'''January 2014'''
 
1. We have suspended submission to the GO Consortium (GOC) of species-specific Gene Association Files if another
group is responsible for the provision of GO annotations to that species. This affects the following files:
 
gene_association.goa_arabidopsis
gene_association.goa_mouse
gene_association.goa_rat
gene_association.goa_zebrafish
 
These files will no longer be available from the GOC annotation download webpage (http://www.geneontology.org/GO.downloads.annotations.shtml) nor the
GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/submission/).
Users will still be able to get annotations for all of these species from the UniProt multispecies file on the GOC website (http://www.geneontology.org/GO.downloads.annotations.shtml#unfilter).
 
The above species-specific files will continue to be made available from the UniProt-GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/).
 
2. All of the archived species-specific files mentioned above have been removed from the GOC CVS repository. These archived files will still be available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/
 
3. In the January release there was a substantial increase in GO Consortium 'inferred' annotations. These annotations are automatically created based on inter-ontology links between Molecular Function and Biological Process terms and between Biological Process and Cellular Component terms. The increase is due to enhancements to the pipeline to take account of the GO hierarchy.
 
4. Manual annotations for Trypanosoma brucei and Leishmania major, created by the GeneDB project, are now included in the UniProt multi-species annotation files.
 
'''December 2013'''
 
Changes to the provision of UniProt GO annotation files to the GO Consortium.


1. As of the December release we are additionally supplying the GO Consortium (GOC) with a set of species-specific annotation files for human, dog, pig, cow and chicken that are based on UniProt reference proteomes and provide one protein per gene. The protein accessions included in these files are the protein sequences annotated in Swiss-Prot or the longest TrEMBL transcript if there is no Swiss-Prot record.
'''March 2014'''
The files will be available in both GAF2.0 and GPAD1.1 format and can be identified by the inclusion of "ref" in the file name, e.g. gp_association.goa_ref_human. The GAF2.0 files will be available from the GOC annotation downloads page (http://www.geneontology.org/GO.downloads.annotations.shtml) and the GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/); the locations of the GPAD files will be announced at a later date.


These files are already available from the UniProt-GOA ftp site: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
1. We are now incorporating manual annotations from the Alzheimer's Project at the University of Toronto. Sejal Patel, an MSc student from the University of Toronto, is focusing on curating genes associated with Alzheimer’s disease that have been significant in previous genome-wide association studies. For further information on this project, a brief project plan can be found at http://wiki.geneontology.org/index.php/Alzheimer%27s_Disease_Annotation_Project


[[Category:GOA]] [[Category:Reports]]
[[Category:Reports - GOA]]

Latest revision as of 19:11, 6 March 2020

Staff

UniProt contributors (EBI, Hinxton, UK; SIB, Geneva, Switzerland; and PIR, Washington DC).

UniProt-EBI:

Claire O'Donovan, Maria Martin, Rachael Huntley*, Prudence Mutowo-Meullenet, Tony Sawford*, Aleksandra Shypitsyna, Carlos Bonilla, Joanna Argasinska, Elena Cibrian-Uhalte, Penelope Garmiri, Emma Hatton-Ellis, Reija Hieta, Duncan Legge, Michele Magrane, Klemens Pichler.

* Funded entirely or partially by GO.

UniProt-SIB:

Ghislaine Argoud-Puy, Andrea Auchinchloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Philippe Lemercier, Damien Lieberherr, Patrick Masson, Ivo Pedruzzi, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli.

UniProt-PIR:

Kati Laiho

Annotation progress

Between 3 March and 5 June 2014, the UniProt-GOA project provided the GO Consortium with 3 annotation file releases, including non-redundant sets of GO annotations to 13 specific proteomes, as well as data releases for annotations of all proteins in UniProtKB.

UniProt incorporates manual annotations from other GO Consortium members and affiliates and displays these annotations in the relevant UniProtKB entries. Currently, the UniProt-GO Annotation project provides GO annotations for 65% of UniProt entries. Altogether, UniProt-GOA now provides over 263 million GO annotations for almost 37 million proteins in over 462,000 different taxonomic groups. UniProt-GOA provides 287,407 annotations for 18,411 proteins (out of a total of 20,670 proteins) in the human reference proteome.
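The human reference proteome figures quoted above can be turned into a coverage percentage and an average annotation density. The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative and are not part of any UniProt-GOA tool.

```python
def coverage(annotated: int, total: int) -> float:
    """Fraction of proteins that carry at least one GO annotation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return annotated / total


def mean_annotations(annotations: int, annotated: int) -> float:
    """Average number of annotations per annotated protein."""
    if annotated <= 0:
        raise ValueError("annotated must be positive")
    return annotations / annotated


# Figures quoted in this report for the human reference proteome.
HUMAN_ANNOTATIONS = 287_407
HUMAN_ANNOTATED = 18_411
HUMAN_TOTAL = 20_670

print(f"coverage: {coverage(HUMAN_ANNOTATED, HUMAN_TOTAL):.1%}")
print(f"annotations per annotated protein: "
      f"{mean_annotations(HUMAN_ANNOTATIONS, HUMAN_ANNOTATED):.1f}")
```

With the numbers given in this report, this works out to roughly 89.1% coverage of the human reference proteome and about 15.6 annotations per annotated protein.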

UniProt-GOA UniProt gene association file release stats (comparison of February 2014 and May 2014 releases)

Key

*New sources of annotation after March 2014

**Decrease in MTBbase annotations due to a large number of UniProt accessions becoming secondary; manual intervention is in progress to rectify this.

Methods and strategies for annotation

a. Literature curation

In May 2013 the UniProt-EBI curators initiated the annotation of a list of approximately 400 human proteins that are targets of the Critical Assessment of Functional Annotation (CAFA) competition. We are curating the primary functions and processes of these proteins in order to populate these targets with functional annotations, which will assist in the assessment of the CAFA competition. We have now completed this annotation project; in total we have curated 4,067 proteins with 12,293 annotations. Of these, 384 proteins are suitable for CAFA assessment on Molecular Function or Biological Process terms. "Suitable" means that the protein has at least one new annotation (with evidence code IDA, IMP, IGI, IEP or IPI) in MF or BP that is not the same as any existing manual annotations.
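The "suitable" criterion described above can be sketched as a simple filter. This is an illustration only, not the procedure the curators actually used: the `Annotation` class and `is_cafa_suitable` function are hypothetical, the accession in the usage example is made up, and "not the same as an existing manual annotation" is assumed here to mean "a different GO term on the same protein".

```python
from dataclasses import dataclass

# Evidence codes named in the report as qualifying for CAFA assessment.
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IEP", "IPI"}
# GAF aspect letters: F = Molecular Function, P = Biological Process.
CAFA_ASPECTS = {"F", "P"}


@dataclass(frozen=True)
class Annotation:
    protein: str   # protein accession
    go_id: str     # GO term identifier
    aspect: str    # "F", "P" or "C"
    evidence: str  # GO evidence code


def is_cafa_suitable(new_annotations, existing_manual):
    """True if at least one new MF/BP annotation with a qualifying
    evidence code is not already present as a manual annotation."""
    # Assumption: two annotations are "the same" when they assign
    # the same GO term to the same protein.
    existing = {(a.protein, a.go_id) for a in existing_manual}
    return any(
        a.aspect in CAFA_ASPECTS
        and a.evidence in EXPERIMENTAL_CODES
        and (a.protein, a.go_id) not in existing
        for a in new_annotations
    )
```

For example, with a made-up accession `P00001`, a new IDA annotation to a Molecular Function term qualifies unless that same term is already manually annotated to the protein, and a new Cellular Component annotation never qualifies on its own:

```python
old = [Annotation("P00001", "GO:0005515", "F", "IPI")]
new_mf = [Annotation("P00001", "GO:0003677", "F", "IDA")]
new_cc = [Annotation("P00001", "GO:0005634", "C", "IDA")]
print(is_cafa_suitable(new_mf, old))  # True
print(is_cafa_suitable(new_cc, old))  # False
```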

The UniProt-EBI curators continue to annotate proteins that are experimentally determined to be located in the extracellular vesicular exosome. We have identified approximately 2,500 proteins using data from the Exocarta database and a subset of these have been annotated as part of the CAFA project detailed above. We will progress by identifying groups of related proteins, e.g. keratins, immune-related proteins, to form discrete annotation projects.

b. Computational annotation strategies

UniProt-GOA provides IEA annotations from the following methods:


  1. UniProt Keyword 2GO (SPKW2GO) [1,2]
  2. UniProt Subcellular Locations2GO (SPSL2GO) [1,2]
  3. Unipathway2GO [1,2]
  4. HAMAP2GO [1,2]
  5. InterPro2GO
  6. Ensembl Compara (vertebrates)
  7. Ensembl Genomes Compara (plants, fungi)

Key

1: mapping tables created and maintained by UniProt

2: electronic annotations generated by UniProt


All UniProt curators supply information to entries that is subsequently used in electronic GO annotation pipelines such as UniProtKB keywords2GO, UniProtKB subcellular location2GO and HAMAP2GO. Altogether, automatic annotation pipelines provide almost 262 million annotations to almost 37 million proteins.
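To see how electronic annotations break down by method in a released annotation file, one option is to tally IEA lines by their reference column. The sketch below assumes the tab-separated GAF 2.0 layout, in which column 6 holds the DB:Reference and column 7 the evidence code, and lines starting with `!` are comments. The function name and the sample rows are invented for illustration, and the mapping from reference identifiers to the methods listed above is not shown here.

```python
from collections import Counter


def count_iea_by_reference(gaf_lines):
    """Tally IEA annotations by the DB:Reference column of GAF 2.0 lines."""
    counts = Counter()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a well-formed annotation line
        reference, evidence = cols[5], cols[6]
        if evidence == "IEA":
            counts[reference] += 1
    return counts


def make_row(accession, go_id, reference, evidence, aspect):
    """Build an invented 17-column GAF 2.0 line for demonstration."""
    cols = ["UniProtKB", accession, "GENE", "", go_id, reference, evidence,
            "", aspect, "", "", "protein", "taxon:9606", "20140501",
            "UniProt", "", ""]
    return "\t".join(cols)


# Invented sample data; accessions and references are placeholders.
sample = [
    "!gaf-version: 2.0",
    make_row("P00001", "GO:0005515", "GO_REF:EXAMPLE_A", "IEA", "F"),
    make_row("P00002", "GO:0005634", "GO_REF:EXAMPLE_A", "IEA", "C"),
    make_row("P00003", "GO:0006915", "GO_REF:EXAMPLE_B", "IEA", "P"),
    make_row("P00004", "GO:0003677", "PMID:0", "IDA", "F"),
]
print(count_iea_by_reference(sample))
```

Only the three IEA rows are counted; the comment line and the IDA row are ignored.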

c. Priorities for annotation

1. Proteins associated with the exosome (Prudence, Aleksandra)

2. Proteins from the CAFA target list (UniProt-EBI curators)

3. Requests from user community (UniProt-EBI curators)

4. Proteins annotated during Swiss-Prot curation duties (all UniProtKB/Swiss-Prot curators at the EBI, SIB and PIR)

5. Annotation corrections based on quality control reports (all curators)

Presentations and Publications

a. Publications

"Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt." Rachael P Huntley, Tony Sawford, Maria J Martin and Claire O'Donovan. 2014 GigaScience 3(1):4 doi: 10.1186/2047-217X-3-4. PMCID: PMC3995153

"A method for increasing expressivity of Gene Ontology annotations using a compositional approach." Rachael P Huntley, Midori A Harris, Yasmin Alam-Faruque, Judith A Blake, Seth Carbon, Heiko Dietze, Emily C Dimmer, Rebecca E Foulger, David P Hill, Varsha K Khodiyar, Antonia Lock, Jane Lomax, Ruth C Lovering, Prudence Mutowo-Meullenet, Tony Sawford, Kimberly Van Auken, Valerie Wood and Christopher J Mungall. 2014 BMC Bioinformatics 15(1):155. doi: 10.1186/1471-2105-15-155. PMCID: PMC4039540

The following have been accepted for publication:

"Representing Kidney Development Using The Gene Ontology." Yasmin Alam-Faruque; David P. Hill; Emily C. Dimmer; Midori A. Harris; Rebecca E. Foulger; Susan Tweedie; Helen Attrill; Douglas G. Howe; Stephen Randall Thomas; Duncan Davidson; Adrian S. Woolf; Judith A. Blake; Christopher J. Mungall; Claire O'Donovan; Rolf Apweiler; Rachael P. Huntley. 2014 PLoS ONE

"Standardized description of scientific evidence using the Evidence Ontology (ECO)." Marcus C. Chibucos, Christopher J. Mungall, Rama Balakrishnan, Karen R. Christie, Rachael P. Huntley, Owen White, Judith A. Blake, Suzanna E. Lewis, Michelle Giglio. 2014 Database

b. Presentations including Talks, Tutorials and Teaching


c. Posters

Other highlights

A. Ontology development contributions

  • All curators continue to request new GO terms or updates to the ontology where necessary, using either Term Genie or the SourceForge tracker

B. Annotation outreach and user advocacy efforts

  • Rachael Huntley and Prudence Mutowo-Meullenet continue to answer queries sent to the GO Consortium helpdesk
  • Rachael Huntley, Prudence Mutowo-Meullenet and Aleksandra Shypitsyna continue to answer user queries sent to the UniProt-GOA project
  • UniProt-EBI is continuing to support external annotation groups, such as AgBase, BHF-UCL, Parkinson's UK-UCL, DictyBase, SGD, CamCellNet, WormBase, DFLAT at Tufts University, SIB, PIR, the Alzheimer's Project at the University of Toronto and the GO Consortium PAINT curators, by providing use of the Protein2GO curation tool.
  • UniProt-EBI is continuing to assist GO Consortium groups with migration of their annotations into the UniProt database, as well as providing access and training for the UniProt curation tool Protein2GO.
  • Rachael Huntley is involved in a GO Consortium collaboration with a team at the Norwegian University of Science and Technology to assist them in making annotations for transcription factors and their target genes.
  • Together with Rama Balakrishnan from SGD, Rachael Huntley is a manager for the GO Consortium's Annotation Advocacy and Coordination group. The aims of the group are to:
  * educate GO Consortium curators about best annotation practice
  * enforce the annotation rules and policies within the GOC
  * maintain the annotation and evidence code documentation
  * educate and keep all the annotating groups up-to-date with changes in GAF format and ontology development 
  * assist new groups with annotations

C. Other highlights

i. Improvements to the QuickGO user interface

Work is continuing on a new user interface for the UniProt GO browser QuickGO. New features will include the ability to view the Evidence Code Ontology in the ancestor chart view and also to display annotation extensions. This work is being carried out by Carlos Bonilla with support from Tony Sawford.

ii. Improvements to the Protein2GO curation tool by UniProt-EBI

We are continuing to migrate annotations from GO Consortium curation groups into the UniProt database as needed. This also involves each group annotating to proteins using Protein2GO as their sole curation tool.

We continue to include quality control checks within the Protein2GO interface as requested by users or in response to guidelines from the GO Consortium. These have included the ability for each curator to view deleted annotations from all sources and flagging annotations that violate taxon constraints.

Protein2GO is currently being developed to support annotation to entities other than proteins, such as protein complexes (using IntAct Complex IDs) and RNAs (using RNA Central IDs). Tony Sawford is responsible for the development and maintenance of Protein2GO and the UniProt-GOA database.

iii. Annotation file changes

Annotation files are released every 4 weeks by UniProt-GOA at EBI.

March 2014

1. We are now incorporating manual annotations from the Alzheimer's Project at the University of Toronto. Sejal Patel, an MSc student from the University of Toronto, is focusing on curating genes associated with Alzheimer’s disease that have been significant in previous genome-wide association studies. For further information on this project, a brief project plan can be found at http://wiki.geneontology.org/index.php/Alzheimer%27s_Disease_Annotation_Project