GOA, September 2009

From GO Wiki
Latest revision as of 05:34, 16 April 2019

Gene Ontology Annotation at UniProtKB, 2009

Report on the GOA team's activities between March 2009 and September 2009.

Staff:

Rolf Apweiler

Claire O'Donovan

Emily Dimmer

Rachael Huntley

Yasmin Alam-Faruque

Daniel Barrell

David Binns

Tony Sawford

Swiss-Prot contributors (EBI, Hinxton, UK and SIB, Geneva, Switzerland)

Ioannis Xenarios, Amos Bairoch, Lydie Bougueleret, Serenella Ferro-Rojas

Andrea Auchincloss, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Paul Browne, Wei Mun Chan, Elizabeth Coudert, Louise Daugherty, Ruth Eberhardt, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Rebecca Foulger, Nadine Gruaz-Gumowski, Ursula Hinz, Silvia Jimenez, Florence Jungo, Guillaume Keller, Kati Laiho, Duncan Legge, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Ivo Pedruzzi, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Eleanor Stanley, Andre Stutz, Shyamala Sundaram, Michael Tognolli

Annotation Progress

We continue to emphasise annotation of the genes selected for the Reference Genome Project.

With the newly started kidney-centric annotation project, additional emphasis has also been placed on certain genes associated with renal development and disease.

Curators from the GOA and BHF-UCL projects have so far completely annotated 77.6% (609/785) of the supplied Reference Genome targets.


GOA UniProt gene association file release statistics (comparison of the March 2009 and September 2009 releases)

Methods and strategies for annotation

  1. Literature curation:

Literature curation continues to be the major focus of our annotation efforts, with an emphasis on the use of experimental evidence codes.


  2. Computational annotation strategies:

GOA provides IEA annotations from the following methods:

  1. Swiss-Prot Keyword2GO (SPKW2GO) [1,2]
  2. Swiss-Prot Subcellular Location2GO (SPSL2GO) [1,2]
  3. HAMAP2GO [2]
  4. InterPro2GO [2]
  5. EC2GO [2]
  6. Ensembl Compara

Legend

1: mapping tables created and maintained by the GOA group

2: electronic annotations generated by the GOA group, using UniProtKB.
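The mapping-based methods listed above (for example SPKW2GO or InterPro2GO) work by looking up each feature already assigned to a protein in a table of external-ID-to-GO mappings and emitting an IEA annotation for every hit. The sketch below is illustrative only, not GOA's production pipeline: it assumes an external2go-style line layout (`DB:ID name > GO:term name ; GO:ID`), and the function names, example mapping lines and accession `P00001` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed external2go-style layout (illustrative, not a guaranteed spec):
#   "<DB:ID> [name] > GO:<term name> ; GO:<7 digits>"
# Lines starting with "!" are treated as comments.
LINE_RE = re.compile(r"^(\S+)\s+.*?>\s*GO:(.+?)\s*;\s*(GO:\d{7})\s*$")


def parse_external2go(lines):
    """Return {external_id: [GO IDs]} from external2go-style mapping lines."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank and comment lines
        match = LINE_RE.match(line)
        if match:
            ext_id, _term_name, go_id = match.groups()
            mapping[ext_id].append(go_id)
    return dict(mapping)


def electronic_annotations(protein_features, mapping):
    """Yield (accession, GO ID, 'IEA') for each mapped feature of each protein.

    protein_features: {accession: [external IDs such as keyword IDs]}
    Features with no mapping are silently skipped.
    """
    for accession, features in protein_features.items():
        for feature in features:
            for go_id in mapping.get(feature, []):
                yield accession, go_id, "IEA"
```

Keeping the mapping as a plain dictionary means that revising a mapping table (as described under "Changes to SPKW2GO and SPSL2GO") only requires re-parsing the file and regenerating the annotations.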


  3. Priorities for annotation:
     1. Genes assigned by the Reference Genome Project (Rachael, Emily)
     2. Genes associated with renal processes (Yasmin)
     3. Requests from the user community (all curators)
     4. Proteins annotated during Swiss-Prot curation duties (all Swiss-Prot curators at the EBI and SIB)

Presentations and Publications

a. Papers with substantial GO content

Lovering RC, Dimmer EC, Talmud PJ. 2009 Improvements to cardiovascular gene ontology. Atherosclerosis. 205(1):9-14.

Binns D, Dimmer E, Huntley R, Barrell D, O'Donovan C, Apweiler R. 2009 QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics. [Epub ahead of print]

Reference Genome Group of the Gene Ontology Consortium. 2009 The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 5(7):e1000431.


b. Presentations, including talks, tutorials and teaching

16-19th April 2009 - Poster: 'The GO Annotation (GOA) Project', Biocurator conference, Berlin. Rachael Huntley

22nd June 2009 - Presentation of GOA and the Renal Annotation Initiative to the GUDMAP Consortium at Edinburgh University. Title of talk: 'Introduction to Gene Ontology Annotation (GOA) and the Renal GOA Initiative'. Yasmin Alam-Faruque

29th July 2009 - Presentation of the Renal GOA Initiative to curators across the Wellcome Trust Genome Campus. Title of talk: 'Introduction to the Renal Gene Ontology Annotation (GOA) Initiative'. Yasmin Alam-Faruque

7-8th September 2009 - Poster and presentation on the Renal Gene Ontology Annotation Initiative, Kidney Research Fellows Day. Yasmin Alam-Faruque

23rd April, 1st July and 2nd September 2009 - GO annotation training of Swiss-Prot curators at the Swiss Institute of Bioinformatics, Geneva, Switzerland. Emily Dimmer and Rachael Huntley

17-18th September 2009 - British Atherosclerosis Society Autumn Meeting, 'Genetics of Complex Diseases': Introduction to GO. Emily Dimmer and Ruth Lovering

Other Highlights

New staff:

Yasmin Alam-Faruque

Tony Sawford

A. Ontology Development Contributions:

Since March 2009, curators associated with the GOA group have submitted 100 SourceForge items requesting changes to GO.


B. Annotation Outreach and User Advocacy Efforts:

The GOA group has been training Swiss-Prot curators at the Swiss Institute of Bioinformatics (SIB) in Geneva over the last year. GOA curators train small groups of curators and then, for the first two months after training, evaluate every GO annotation each curator makes and give individual feedback. After this period, once a curator is judged confident in annotating to GO, their annotations are spot-checked. This has been, and continues to be, a very large amount of work for our group. GOA has travelled to Geneva five times since January, training five or six curators on average during each visit.

The Swiss-Prot team in Geneva has so far generated 13,158 manual GO annotations to 2,958 UniProtKB proteins. Annotations are created in GOA's protein2go tool and released in the group's gene association files. These annotations carry the existing source 'UniProtKB' in column 15 (Assigned By) of the gene association file. GOA will continue to train and mentor SIB curators during 2009 and 2010.
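Because these annotations carry 'UniProtKB' in column 15, counts of annotations and annotated proteins per source can be approximated by filtering a gene association file on that column. A minimal sketch, assuming tab-separated gene association lines with `!` comment lines; the function name is hypothetical. Note that column 15 alone cannot separate annotations made in Geneva from other UniProtKB-sourced annotations, so this counts all of them together.

```python
def summarise_source(gaf_lines, source):
    """Return (number of annotations, number of distinct proteins) for one source.

    A line is counted when its column 15 (Assigned By) equals `source`.
    Proteins are identified by the (column 1, column 2) pair, i.e. DB and accession.
    """
    n_annotations = 0
    proteins = set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # skip malformed lines that lack column 15
        if cols[14] == source:
            n_annotations += 1
            proteins.add((cols[0], cols[1]))
    return n_annotations, len(proteins)
```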


C. Other


Changes to SPKW2GO and SPSL2GO

With the aid of Serenella Ferro-Rojas at SIB, the SPKW2GO and SPSL2GO mappings have been revised: since March 2009, 70 mappings have been added to SPSL2GO and 24 existing SPKW2GO mappings have been changed.

Please note that GOA intends to stop using the SPKW2GO mapping to produce cellular component (CC) GO annotations. In future, only the Swiss-Prot Subcellular Location2GO (SPSL2GO) mapping will be applied to generate CC annotations from UniProtKB. This change will improve the accuracy of GOA-supplied cellular component annotations, because SPSL2GO includes 'host x' subcellular location terms, whereas Swiss-Prot keywords do not. It is therefore important that any group which integrates SPKW2GO annotations from the GOA gene association files also includes annotations originating from SPSL2GO.


Renal GO annotation initiative funded by Kidney Research UK.

This grant started on 1st April 2009 and is managed by Yasmin Alam-Faruque, who joined GOA from the Swiss-Prot team at the EBI.

Aim 1: Promote wider scientific engagement in GO annotation through a variety of outreach efforts
• Generated web pages providing further information about the Renal GOA Initiative: http://www.ebi.ac.uk/GOA/kidney/ and http://www.geneontology.org/GO.renal.shtml.
• Set up the Renal Interest Group mailing list on the Gene Ontology Consortium web pages (http://www.geneontology.org/GO.interests.shtml?all#renal), to encourage researchers to contribute suggestions, advice and discussion on renal gene- and protein-related issues.
• Circulated two quarterly newsletters reporting on the project's progress: http://www.ebi.ac.uk/GOA/kidney/newsletter/.

Aim 2: Annotate gene products associated with kidney development and disease
• In collaboration with members of the Scientific Advisory Panel, a list of renal-related proteins has been compiled that serves as an initial set of curation targets, available from http://www.ebi.ac.uk/GOA/kidney/. The initial targets were well-studied proteins involved in the proton pump process and ammonium transport, both of which are important for maintaining acid-base homeostasis in the kidney. The GUDMAP Consortium has provided additional targets that currently have no GO annotations.
• To date, this initiative has created 1,885 GO annotations to 387 UniProtKB protein accessions, exceeding the annotation targets stated in the grant proposal.
• Annotations to this list are publicly available from the GOA group's QuickGO browser (http://www.ebi.ac.uk/QuickGO/GAnnotation?protein=KRUK), enabling members of the biomedical research community to search and view the dataset easily.
• A collaboration with the Reactome group has resulted in further members of the solute carrier transmembrane transporter superfamily being added to the Reactome database. This work will be extended to other transmembrane transporter families, including the proton pump, ion channels and aquaporins. Adding more proteins involved in renal-specific processes to Reactome will provide a uniquely comprehensive and detailed functional dataset for mammalian gene products implicated in renal function and development.
• A co-annotation project has been initiated with other curators to improve the annotation of proteins involved in the development and function of the loop of Henle in mammalian and non-mammalian organisms. This will not only provide biological insight into the similarities and differences between orthologous genes in distinct species, but also demonstrate to users the value of focused, collaborative cross-species GO annotation. The effort will also drive focused development of GO terms that more accurately describe renal processes. A planned publication describing this study would publicise the Renal GO Annotation Initiative and the other databases involved.


Aim 3: Improve the Gene Ontology with respect to renal processes
• A collaboration has been initiated with the GUDMAP Consortium. This effort is reviewing the renal GO terms that currently exist in the ontology (initially those relating to nephrogenesis), in order to develop and improve GO terms in line with the kidney anatomy work carried out by the renal community.

Verification of mappings to UniProtKB accessions in GO Consortium gp2protein files

The GOA group has recently begun providing GO Consortium groups with checks of the UniProtKB accessions used in their gp2protein mapping files. Each annotation group receives an email indicating where in its file a secondary or deleted UniProtKB accession has been used and, where possible, suggesting a suitable replacement accession. These checks will be run, and the results emailed to annotation groups, on the first of each month.
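A check of this kind amounts to looking up every UniProtKB accession in a gp2protein file against lists of secondary and deleted accessions. A minimal sketch, not GOA's actual checking code: it assumes a gp2protein layout of two tab-separated columns (gene product ID, then one or more `UniProtKB:ACCESSION` entries separated by `;`), and assumes you have already loaded a secondary-to-primary accession dictionary and a set of deleted accessions. The function name and all example identifiers are hypothetical.

```python
def check_gp2protein(lines, secondary_to_primary, deleted):
    """Report secondary or deleted UniProtKB accessions in gp2protein-style lines.

    secondary_to_primary: {secondary accession: suggested primary accession}
    deleted: set of deleted accessions (no replacement can be suggested)
    Returns a list of (line number, gene product, accession, status, replacement).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no protein column on this line
        gene_product, proteins = fields[0], fields[1]
        for ref in proteins.split(";"):
            db, _, accession = ref.strip().partition(":")
            if db != "UniProtKB" or not accession:
                continue  # only UniProtKB accessions are checked
            if accession in secondary_to_primary:
                problems.append((number, gene_product, accession, "secondary",
                                 secondary_to_primary[accession]))
            elif accession in deleted:
                problems.append((number, gene_product, accession, "deleted", None))
    return problems
```

Reporting the line number alongside each accession mirrors the emails described above, which point each group to the place in its own file that needs attention.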

Gene Association File changes


July 2009: Reference field (column 6). Previously, annotation lines produced by electronic methods (indicated by the 'IEA' evidence code in column 7) contained two identifiers piped together in the reference column: an internal GOA keyword (e.g. GOA:interpro) and a GO reference identifier (e.g. GO_REF:0000002). As of the July 2009 release, the internal GOA reference (e.g. GOA:interpro) and the pipe have been removed from this field, so that only a GO_REF identifier is provided.
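Tools that still read older files containing the two-part reference can normalise them by treating the reference column as a pipe-separated list and dropping the GOA-internal entry. A minimal sketch, assuming tab-separated gene association lines; it reproduces the July 2009 change only on IEA lines, and the function names are hypothetical.

```python
def clean_reference(ref_field):
    """Drop GOA-internal 'GOA:' entries from a pipe-separated reference field."""
    kept = [ref for ref in ref_field.split("|") if not ref.startswith("GOA:")]
    return "|".join(kept)


def rewrite_iea_line(line):
    """Clean column 6 of a tab-separated line whose evidence code (column 7) is IEA.

    Lines with any other evidence code are returned unchanged
    (apart from a stripped trailing newline).
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 7 and cols[6] == "IEA":
        cols[5] = clean_reference(cols[5])  # column 6 is index 5
    return "\t".join(cols)
```

For example, `clean_reference("GOA:interpro|GO_REF:0000002")` leaves just the GO_REF identifier, matching the format of current releases.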

May 2009: DB field (column 1). Column 1 of the gene association file identifies the database that supplied the sequence identifier in column 2; its value is very often 'UniProtKB'. However, recent changes to other fields in the GOA gene association files have made it difficult for users to tell whether a UniProtKB accession originates from UniProtKB/Swiss-Prot or UniProtKB/TrEMBL. It is therefore intended that, when a UniProtKB accession is provided in column 2, column 1 will display either 'UniProtKB/Swiss-Prot' or 'UniProtKB/TrEMBL'.
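The intended column 1 values could be derived by checking each accession against the set of reviewed (Swiss-Prot) entries. A minimal sketch, assuming you already hold that set of Swiss-Prot accessions and that every other UniProtKB accession is a TrEMBL entry; the function names are hypothetical, and the labels follow the intention stated above rather than a confirmed released file format.

```python
def db_label(accession, swissprot_accessions):
    """Return the proposed column 1 label for a UniProtKB accession."""
    if accession in swissprot_accessions:
        return "UniProtKB/Swiss-Prot"
    return "UniProtKB/TrEMBL"  # assumption: anything not reviewed is TrEMBL


def relabel_line(line, swissprot_accessions):
    """Replace a plain 'UniProtKB' in column 1 with the Swiss-Prot/TrEMBL label.

    Lines from other databases are returned unchanged
    (apart from a stripped trailing newline).
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 2 and cols[0] == "UniProtKB":
        cols[0] = db_label(cols[1], swissprot_accessions)
    return "\t".join(cols)
```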