GOA December 2010

From GO Wiki

!!Report In Progress!!

Gene Ontology Annotation at UniProtKB, 2010

Staff:

Rolf Apweiler

Claire O'Donovan

Emily Dimmer

Rachael Huntley

Yasmin Alam-Faruque

Daniel Barrell

David Binns

Tony Sawford

Swiss-Prot contributors (EBI, Hinxton, UK and SIB, Geneva, Switzerland): Ioannis Xenarios, Amos Bairoch, Lydie Bougueleret, Serenella Ferro-Rojas

Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Silvia Braconi Quintaje, Lionel Breuza, Alan Bridge, Paul Browne, Wei Mun Chan, Elizabeth Coudert, Isabelle Cusin, Louise Daugherty, Paula Duek Roggli, Ruth Eberhardt, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Rebecca Foulger, Michael Gardner, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Silvia Jimenez, Florence Jungo, Guillaume Keller, Kati Laiho, Duncan Legge, Philippe Lemercier, Damien Lieberherr, Michele Magrane, Patrick Masson, Madelaine Moinat, Ivo Pedruzzi, Klemens Pichler, Diego Poggioli, Sylvain Poux, Catherine Rivoire, Bernd Roechert, Michel Schneider, Harminder Sehra, Eleanor Stanley, Andre Stutz, Shyamala Sundaram, Michael Tognolli

Annotation Progress

We continue to prioritize the annotation of the genes selected for the Reference Genome Project.

Proteins associated with kidney development and disease are the focus of the GOA Renal Annotation Initiative.

Curators from the GOA and BHF-UCL projects have together fully annotated 65% (719 of 1,111) of the Reference Genome targets supplied so far.

Between January and December 2010, the GOA project provided the GO Consortium with ?? annotation file releases. These included non-redundant sets of GO annotations for the human, mouse, rat, zebrafish, Arabidopsis, chicken and cow proteomes, as well as releases covering all proteins in UniProtKB. Since 12 July 2010, GOA has also provided an interim release of the human and chicken gene association files so that the Reference Genome PAINT project can collect the most up-to-date annotations for use in tree curation. The human and chicken files are therefore now released every two weeks: once as part of the main monthly GOA file release and again two weeks later.

GOA now provides over ?? million GO annotations for ?? million proteins in over 2??,000 different taxonomic groups. GOA provides ?? annotations for the human proteome (giving ??% of the human proteome at least one GO annotation). Over the last year, the number of manual annotations has increased by ??% in the UniProtKB file and by ??% in the human file.

Between January and December 2010, GOA continued training, checking and supporting 35 curators in the Swiss-Prot team at the Swiss Institute of Bioinformatics, who have created over 37,500 manual GO annotations for UniProtKB entries from a range of species.

Methods and strategies for annotation

1. Literature curation:

Literature curation continues to be the major focus of our annotation efforts, with an emphasis on the use of experimental evidence codes.

2. Computational annotation strategies:

GOA provides IEA annotations from the following methods:

  1. Swiss-Prot Keyword2GO (SPKW2GO) [1,2]
  2. Swiss-Prot Subcellular Locations2GO (SPSL2GO) [1,2]
  3. HAMAP2GO [2]
  4. InterPro2GO [2]
  5. Ensembl Compara


Legend

1: mapping tables created and maintained by the GOA group

2: electronic annotations generated by the GOA group, using UniProtKB.
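The mapping-table approach behind methods such as SPKW2GO can be illustrated with a minimal sketch. It assumes mapping lines in the GO "external2go" style shown in the docstring; the `SP_KW:` prefix, the example line, the placeholder accession `P00000` and both function names are illustrative assumptions, not the GOA group's actual pipeline. Every protein carrying a mapped keyword receives the corresponding GO term with the IEA evidence code.

```python
from collections import defaultdict


def parse_external2go(lines):
    """Parse external2go-style lines into {external ID: [GO IDs]}.

    Assumed (illustrative) line layout:
    'SP_KW:KW-0067 ATP-binding > GO:ATP binding ; GO:0005524'
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and '!' comment headers
        left, _, go_part = line.partition(" > ")
        if not go_part:
            continue  # not a mapping line
        # 'SP_KW:KW-0067 ATP-binding' -> 'KW-0067' (drop the database prefix)
        external_id = left.split(" ", 1)[0].split(":", 1)[-1]
        # 'GO:ATP binding ; GO:0005524' -> 'GO:0005524'
        go_id = go_part.rsplit(" ; ", 1)[-1].strip()
        mapping[external_id].append(go_id)
    return dict(mapping)


def electronic_annotations(protein_keywords, mapping):
    """Return sorted, de-duplicated (accession, GO ID, 'IEA') triples."""
    annotations = set()
    for accession, keywords in protein_keywords.items():
        for keyword in keywords:
            for go_id in mapping.get(keyword, []):
                annotations.add((accession, go_id, "IEA"))
    return sorted(annotations)


# Usage with a placeholder accession: KW-9999 has no mapping and is ignored.
mapping = parse_external2go(
    ["! comment", "SP_KW:KW-0067 ATP-binding > GO:ATP binding ; GO:0005524"]
)
print(electronic_annotations({"P00000": ["KW-0067", "KW-9999"]}, mapping))
# -> [('P00000', 'GO:0005524', 'IEA')]
```

Keeping the mapping as a separate table, rather than hard-coding it, is what lets curators maintain it independently of the code that applies it.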


3. Priorities for annotation

  1. Genes assigned by the Reference Genome Project (Rachael, Emily)
  2. Genes associated with renal processes (Yasmin)
  3. Requests from user community (all curators)
  4. Proteins annotated during Swiss-Prot curation duties (all Swiss-Prot/UniProtKB curators at the EBI and SIB)

Presentations and Publications

Publications, Talks, Posters 2010-

Other Highlights

A. Ontology Development Contributions:

  • Since January 2010, curators associated with the GOA group have submitted 90 SourceForge requests for changes to the GO (as of 5/10/10).
  • Yasmin Alam-Faruque and the GOA group hosted a kidney-related ontology development meeting in ?? 2010, during which renal experts, ontology editors and curators discussed new renal-related terms. As a result of this meeting, 462 new GO terms have so far been created, allowing curators to choose much more specific terms when annotating kidney functions and processes.

B. Annotation Outreach and User Advocacy Efforts:

  • In September, GOA was contacted by a researcher who asked for annotations he had made to M. tuberculosis proteins to be included in the GOA database. The annotations were reviewed by Rachael Huntley (GOA) and Rama Balakrishnan (SGD) and were judged to be of high quality. They were subsequently incorporated into the September release of the GOA-UniProt file.
  • SIB training

C. Other

Changes to SPKW2GO and SPSL2GO


Renal GO annotation initiative funded by Kidney Research UK.


Gene Association File changes

April 2010

1. To avoid processing problems for GO Consortium tools, we changed the contents of column 1 in the GOA-UniProt gene association file. Column 1 previously displayed either 'UniProtKB/TrEMBL' or 'UniProtKB/Swiss-Prot' to indicate which section of UniProtKB an accession belongs to; it now consistently displays 'UniProtKB' for all UniProtKB accessions. The section information is instead provided in a tab-separated, supplementary gene product information file (the gp_information file, available at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/), which is released alongside the GOA-UniProt gene association files. For more information on this file, see the readme: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_information_readme.
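As a rough illustration of the column 1 change, the sketch below rewrites an old-style GAF row and returns the accession/section pair that column 1 used to encode. The function name, the `OLD_COLUMN1_VALUES` table, the short 'Swiss-Prot'/'TrEMBL' labels and the placeholder accession are hypothetical; the actual layout of the gp_information file is defined in its readme and is not reproduced here.

```python
# Hypothetical short labels for the two old column 1 values.
OLD_COLUMN1_VALUES = {
    "UniProtKB/Swiss-Prot": "Swiss-Prot",
    "UniProtKB/TrEMBL": "TrEMBL",
}


def normalize_column1(gaf_line):
    """Rewrite an old-style column 1 to 'UniProtKB'.

    Returns (new_line, (accession, section)) when column 1 held a section
    name, or (line, None) when there was nothing to change.
    """
    line = gaf_line.rstrip("\n")
    if line.startswith("!"):
        return line, None  # header/comment lines pass through unchanged
    cols = line.split("\t")
    section = OLD_COLUMN1_VALUES.get(cols[0])
    if section is None:
        return line, None  # already 'UniProtKB' (or another database)
    cols[0] = "UniProtKB"
    # cols[1] is column 2, the accession (DB Object ID)
    return "\t".join(cols), (cols[1], section)


new_line, section_info = normalize_column1("UniProtKB/TrEMBL\tP00000\tABC1\n")
print(new_line.split("\t")[0], section_info)  # -> UniProtKB ('P00000', 'TrEMBL')
```

Collecting the returned pairs across a file gives the per-accession section lookup that downstream tools would now take from the supplementary file instead of column 1.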

2. UniProtKB-GOA files are now supplied in GAF 2.0 format (http://www.geneontology.org/GO.format.gaf-2_0.shtml).
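For readers processing these files, a minimal GAF 2.0 reader might look like the following sketch. It assumes the 17 tab-separated columns of the GAF 2.0 specification and skips '!' header lines; the column labels are descriptive names chosen here for readability and are not quoted verbatim from the specification.

```python
# The 17 GAF 2.0 columns, in order (descriptive names chosen for this sketch).
GAF2_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_Code", "With_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Date", "Assigned_By", "Annotation_Extension", "Gene_Product_Form_ID",
]


def read_gaf2(lines):
    """Return one dict per annotation row, keyed by GAF2_COLUMNS."""
    records = []
    for line in lines:
        line = line.rstrip("\n")  # keep trailing tabs: empty columns still count
        if not line or line.startswith("!"):
            continue  # skip blank lines and headers such as '!gaf-version: 2.0'
        fields = line.split("\t")
        if len(fields) != len(GAF2_COLUMNS):
            raise ValueError(
                f"expected {len(GAF2_COLUMNS)} columns, got {len(fields)}"
            )
        records.append(dict(zip(GAF2_COLUMNS, fields)))
    return records
```

Stripping only the newline, rather than all whitespace, matters because the last two GAF 2.0 columns are often empty and would otherwise be lost. For the full GOA-UniProt file, a generator version (using `yield`) would avoid holding every row in memory.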

3. New PDB gene association file. This file is the result of a collaboration between the InterPro, PDB and UniProtKB-GOA teams and once again offers annotations to PDB chain identifiers. Additional sources of GO annotations are now also associated with PDB chains, making this a more comprehensive PDB GO annotation resource. Manual and electronic GO annotations are provided in this file from two sources:

a. Where an InterPro entry matches a PDB chain, annotations supplied by the InterPro2GO electronic method are assigned to the chain identifier (for further details on this method, see: http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:0000002).

b. When a PDB chain maps with at least 90% identity to a UniProtKB accession (more specifically, to the CHAIN feature of the UniProtKB entry), the manual and electronic GO annotations of that accession, excluding InterPro2GO annotations, are also transferred to the PDB chain identifier.
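The two routes can be combined as in the sketch below, which assumes simple in-memory dictionaries for InterPro matches, InterPro2GO mappings, chain-to-UniProtKB identity hits and UniProtKB annotations. All identifiers, data structures and the function name are hypothetical. InterPro2GO annotations are recognized by their GO_REF:0000002 reference, so they reach a chain only through route (a).

```python
INTERPRO2GO_REF = "GO_REF:0000002"


def pdb_chain_annotations(chain_id, interpro_matches, interpro2go,
                          uniprot_hits, uniprot_annotations, min_identity=90.0):
    """Return sorted (chain, GO ID, evidence, reference) tuples for one chain.

    interpro_matches:    {chain: [InterPro IDs matching the chain]}
    interpro2go:         {InterPro ID: [GO IDs]}
    uniprot_hits:        {chain: [(accession, percent identity to CHAIN feature)]}
    uniprot_annotations: {accession: [(GO ID, evidence code, reference)]}
    """
    annotations = set()
    # Route (a): every InterPro entry matching the chain contributes its InterPro2GO terms.
    for ipr in interpro_matches.get(chain_id, []):
        for go_id in interpro2go.get(ipr, []):
            annotations.add((chain_id, go_id, "IEA", INTERPRO2GO_REF))
    # Route (b): transfer annotations from accessions mapped at >= min_identity.
    for accession, identity in uniprot_hits.get(chain_id, []):
        if identity < min_identity:
            continue  # below the 90% identity threshold
        for go_id, evidence, reference in uniprot_annotations.get(accession, []):
            if reference == INTERPRO2GO_REF:
                continue  # InterPro2GO annotations are supplied only via route (a)
            annotations.add((chain_id, go_id, evidence, reference))
    return sorted(annotations)
```

Using a set means that a GO term reached by both routes with the same evidence and reference is reported once.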

May 2010

The UniProtKB-GOA gene association files have changed to correctly attribute the InterPro group as the source of annotations generated by the InterPro2GO electronic annotation pipeline. The value in column 15 (Assigned_By) has therefore changed from 'UniProtKB' to 'InterPro' wherever column 6 (DB:Reference) displays the reference 'GO_REF:0000002'.
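The May 2010 rule is a simple condition on two columns. This sketch applies it to a single tab-separated GAF row; the function name is hypothetical, and it assumes that column 6 may hold several pipe-separated references.

```python
REFERENCE_COL = 5     # column 6 (DB:Reference), zero-based index
ASSIGNED_BY_COL = 14  # column 15 (Assigned_By), zero-based index
INTERPRO2GO_REF = "GO_REF:0000002"


def fix_assigned_by(gaf_line):
    """Set Assigned_By to 'InterPro' on rows whose reference is GO_REF:0000002."""
    if gaf_line.startswith("!"):
        return gaf_line  # header/comment lines are not annotation rows
    cols = gaf_line.split("\t")
    references = cols[REFERENCE_COL].split("|")  # column 6 may list several references
    if INTERPRO2GO_REF in references and cols[ASSIGNED_BY_COL] == "UniProtKB":
        cols[ASSIGNED_BY_COL] = "InterPro"
    return "\t".join(cols)
```

Rows with any other reference, or already attributed to another group, are returned unchanged.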

June 2010