GOA May 2011


Gene Ontology Annotation at UniProtKB, 2010

Report on the GOA team's activities between September 2009 and March 2010.

Staff:

GOA

Yasmin Alam-Faruque

Emily Dimmer

Rachael Huntley

Tony Sawford

UniProtKB EBI

Rolf Apweiler

Maria Jesus-Martin

Claire O’Donovan

Ben Bely

Gayatri Chavali

Michael Gardner

Reija Hieta

Duncan Legge

Michele Magrane

Wei Mun Chan

Sandra Orchard

Klemens Pichler

Diego Poggioli

Harminder Sehra

Eleanor Stanley

UniProtKB SIB

Ioannis Xenarios

Lydie Bougueleret

Alan Bridge

Sylvain Poux

Ghislaine Argoud-Puy

Andrea Auchincloss Damay

Kristian Axelsen

Marie-Claude Blatter

Emmanuel Boutet

Lionel Breuza

Elizabeth Coudert

Isabelle Cusin

Paula Duek Roggli

Anne Estreicher

Livia Famiglietti

Marc Feuermann

Arnaud Gos

Nadine Gruaz-Gumowski

Ursula Hinz

Chantal Hulo

Janet James

Florence Jungo

Guillaume Keller

Philippe Lemercier

Damien Lieberherr

Patrick Masson

Ivo Pedruzzi

Catherine Rivoire

Bernd Roechert

Michel Schneider

Andre Stutz

Shyamala Sundaram

Michael Tognolli


Annotation Progress

All curators from the different UniProtKB teams (based at both the EBI and SIB) use the web-based Protein2GO editor, which is maintained and developed by the UniProtKB-GOA team.

In total the UniProt group has provided 1,654 taxonomic groups with manual GO annotation.

Currently the curators from the GOA and BHF-UCL projects have completely annotated x of supplied Reference Genome Targets.

Contributions from the UniProtKB curation group:


Methods and strategies for annotation

  1. Literature curation:

Literature curation continues to be the major focus of our annotation efforts, with an emphasis on the use of experimental evidence codes.


  2. Computational annotation strategies:

GOA provides IEA annotations from the following methods:

  1. Swiss-Prot Keyword2GO (SPKW2GO) [1, 2]
  2. Swiss-Prot Subcellular Locations2GO (SPSL2GO) [1, 2]
  3. HAMAP2GO [2]
  4. InterPro2GO [2]
  5. EC2GO [2]
  6. Ensembl Compara

Legend

[1]: mapping tables created and maintained by the GOA group

[2]: electronic annotations generated by the GOA group, using UniProtKB.
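
As a rough illustration of how a mapping table can drive electronic annotation, the sketch below applies a keyword-to-GO table to a set of protein entries and emits IEA annotations. It is not the GOA pipeline: the function name, the in-memory dictionaries and the example identifiers are assumptions made for illustration only.

```python
def electronic_annotations(protein_keywords, keyword_to_go):
    """Apply a keyword -> GO term mapping table to protein entries.

    protein_keywords: dict of accession -> list of keyword identifiers
    keyword_to_go:    dict of keyword identifier -> list of GO identifiers
    Both structures are hypothetical stand-ins for the real mapping files.
    """
    annotations = []
    for accession, keywords in protein_keywords.items():
        for keyword in keywords:
            # Keywords without a mapping simply produce no annotation.
            for go_id in keyword_to_go.get(keyword, ()):
                annotations.append((accession, go_id, "IEA"))
    return annotations


# Illustrative input only; the accession and mapping entry are examples.
example = electronic_annotations(
    {"P12345": ["KW-0067", "KW-9999"]},
    {"KW-0067": ["GO:0005524"]},
)
```

The same pattern would apply to the other mapping-based methods in the list, with the keyword identifiers swapped for subcellular location terms, InterPro entries or EC numbers.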


  3. Priorities for annotation

UniProtKB curators annotate in line with UniProtKB priorities and curate to GO while carrying out UniProtKB annotation work.

The table showing the species prioritised for annotation is displayed below. This is in addition to the annotation projects involving animal toxins, submissions, proteins with 3D structures, enzymes and post-translational modifications. A number of these projects have required substantial work to develop appropriate GO terms.

[Table: species prioritised for annotation]

The curators in the UniProtKB-GOA team continue to prioritise the annotation of genes selected for the Reference Genome Project, annotation requested through user feedback, and annotations for the grant deliverables of the British Heart Foundation and Kidney Research UK funding.

Presentations and Publications

a. Papers with substantial GO content

Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. Deegan née Clark JI, Dimmer EC, Mungall CJ. BMC Bioinformatics. 2010 Oct 25;11:530.

b. Presentations, including talks, tutorials and teaching


Other Highlights

A. Ontology Development Contributions:

Apoptosis

B. Annotation Outreach and User Advocacy Efforts:

Tufts University; Human Fetal Development annotation collaboration. The UniProtKB-GOA group is providing annotation support to Heather Wick, a curator from Tufts University, who is working as part of an NIH grant investigating proteins implicated in human fetal development (PI: Donna Slonim). Heather will use the UniProtKB-GOA Protein2GO curation tool, and her manual annotations will be released via the UniProtKB-GOA release pipelines into the UniProtKB and Human gene association files.

NTNU: Annotations to gastrin genes were submitted by the systems biology group at NTNU.

Bacteriophage proteins: Contact was made with a Mexican group investigating the possibility of using Protein2GO to annotate bacteriophage proteins. No final decision has been reached.

APO-SYS: Work was carried out with the APO-SYS EU Consortium, with a view to improving the annotations available for apoptotic proteins.

C. Other

Renal GO annotation initiative funded by Kidney Research UK. Requires a short list of the activities carried out by Yasmin.

Verification of mappings to UniProtKB accessions in GO Consortium gp2protein files

The GOA group continues to check the UniProtKB accessions used in the gp2protein mapping files of groups in the GO Consortium. Each annotation group receives an email indicating where in its file a secondary or deleted UniProtKB accession has been used. Where possible, this email also suggests suitable replacement UniProtKB accessions. The checks are run, and the results emailed to annotation groups, on the first of each month.
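
The kind of check described above can be sketched as follows. This is a minimal illustration rather than the GOA group's actual script: the function name, the lookup structures `secondary_to_primary` and `deleted`, and the simplified file layout (a database object identifier, a tab, then one or more `UniProtKB:` accessions separated by semicolons) are all assumptions.

```python
def check_gp2protein(lines, secondary_to_primary, deleted):
    """Flag secondary or deleted UniProtKB accessions in gp2protein lines.

    secondary_to_primary: dict of secondary accession -> suggested replacement
    deleted:              set of deleted accessions
    Returns tuples of (line_number, accession, problem, suggestion).
    """
    report = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no mapped accession on this line
        for xref in fields[1].split(";"):
            if not xref.startswith("UniProtKB:"):
                continue
            accession = xref.split(":", 1)[1]
            if accession in deleted:
                # No replacement can be suggested for a deleted entry here.
                report.append((line_number, accession, "deleted", None))
            elif accession in secondary_to_primary:
                report.append(
                    (line_number, accession, "secondary",
                     secondary_to_primary[accession])
                )
    return report
```

Each tuple carries the line number, so a report of this shape could be turned directly into the per-group email listing where each problem accession occurs.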


QuickGO browser: The browser continues to be developed.

Changes to UniProtKB-GOA gene association files

May 2011

UniProtKB-GOA now incorporates annotations from external groups whose reference field contains either a GO reference (GO_REF) or a MOD-specific reference that can be converted to an equivalent GO_REF using the mappings defined in http://www.geneontology.org/doc/GO.references. Previously, UniProtKB-GOA only accepted annotations that used a PubMed identifier in this field.

An example of a GO_REF that we are now accepting is GO_REF:0000015 'Use of the 'No biological data' (ND) evidence code for Gene Ontology terms'. A description and complete list of the GO_REFs available can be found at http://www.geneontology.org/cgi-bin/references.cgi.
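
The acceptance rule for the reference field can be sketched as below. This is an illustration under stated assumptions, not the UniProtKB-GOA import code: the function name is hypothetical, `mod_to_go_ref` stands in for the mappings held in the GO.references file, and the field is assumed to hold one or more pipe-separated references.

```python
def accepted_reference(reference_field, mod_to_go_ref):
    """Return a usable reference for an external annotation, or None.

    PubMed identifiers and GO_REFs are accepted as they are; a MOD-specific
    reference is accepted only if the (hypothetical) mapping table can
    convert it to an equivalent GO_REF.
    """
    for reference in reference_field.split("|"):
        if reference.startswith(("PMID:", "GO_REF:")):
            return reference
        if reference in mod_to_go_ref:
            return mod_to_go_ref[reference]
    return None  # nothing acceptable: the annotation is not integrated


# "EXAMPLE_MOD:ref1" is a made-up identifier used only for illustration.
mapping = {"EXAMPLE_MOD:ref1": "GO_REF:0000015"}
converted = accepted_reference("EXAMPLE_MOD:ref1", mapping)
```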

April 2011

A greater diversity of identifiers in the 'with' field (column 8) of manual annotations and a rise in the number of Reference Genome annotations integrated.

Over the last month we have been working to provide a more complete display of the manual annotations that we integrate into the UniProtKB-GOA dataset from external annotation groups. Previously, the 'with' field (column 8) in our annotation file was left empty if a manual annotation did not include either a UniProtKB or a GO identifier. Our files now display 43 different gene, protein and chemical identifier types (such as WormBase, CHEBI and EcoCyc identifiers) in this field. This development ensures that integrated manual GO annotations are displayed with the full set of information that curation groups used when translating experimental data into a GO annotation.
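
Users who want to see which identifier types occur in column 8 of a gene association file could tally them along the following lines. This is a sketch only: the function name is hypothetical, and it assumes tab-separated annotation lines, comment lines starting with `!`, and pipe-separated identifiers of the form `PREFIX:local_id` in the 'with' field.

```python
from collections import Counter


def with_field_prefixes(gaf_lines):
    """Count database prefixes found in the 'with' field (column 8)."""
    counts = Counter()
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # header or comment line
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8 or not columns[7]:
            continue  # column 8 missing or empty
        for identifier in columns[7].split("|"):
            prefix = identifier.split(":", 1)[0]
            counts[prefix] += 1
    return counts
```

Passing the lines of an open file to `with_field_prefixes` returns a `Counter` keyed by prefix, whose length gives the number of distinct identifier types present.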

The UniProtKB-GOA files also now contain a larger set of the manual annotations supplied by the GO Consortium's Reference Genome project (source: RefGenome). This project has generated inferred annotations for 47 species using GO Consortium manual annotations and phylogenetic trees from gene families. The Reference Genome project is fully described here: http://www.geneontology.org/GO.refgenome.shtml.


February 2011

Inferred Biological Process GO annotations now included in the UniProtKB-GOA annotation set.

We are pleased to announce an additional set of GO annotations available in this release that have been automatically generated from the Molecular Function (MF) -> Biological Process (BP) inter-ontology relationships present in the GO OBO v1.2 format.

As many GO users do not currently reason over the GO inter-ontology relationships, a set of inferred annotations has been generated to improve the consistency of the Biological Process annotation set. These GO annotations are produced when an annotation has been made (either manually or electronically) to a Molecular Function term that, either directly or via one of its parent terms, has a relationship to a Biological Process term, and where the Process term (or one of its children) has not already been used in the annotation set for the same gene product identifier. Each inferred annotation applies the same gene product identifier, reference and evidence code as the asserted function annotation. Inferred annotations are generated from all sources of GO annotations; only 'NOT'-qualified annotations are excluded. All such inferred GO annotations can be identified by the 'GOC' value in the 'assigned_by' field (column 15).
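
The inference rule described above can be sketched in a few lines. This is an illustrative reading of the description, not the code used to build the release: the function names, the dictionary-based ontology structures (`mf_parents`, `mf_to_bp`, `bp_children`) and the annotation record layout are assumptions. The sketch also assumes that any existing annotation to the Process term or its descendants, whatever its qualifier, suppresses the inference.

```python
def reachable(term, edges):
    """Return the term plus every term reachable through `edges`."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def infer_bp_annotations(annotations, mf_parents, mf_to_bp, bp_children):
    """Derive Biological Process annotations from Molecular Function ones.

    annotations: list of dicts with keys gene, term, reference, evidence,
                 qualifier (pipe-separated string, possibly empty)
    """
    used_terms = {}
    for annotation in annotations:
        used_terms.setdefault(annotation["gene"], set()).add(annotation["term"])

    inferred, emitted = [], set()
    for annotation in annotations:
        if "NOT" in annotation["qualifier"].split("|"):
            continue  # 'NOT'-qualified annotations are excluded
        # Consider the asserted term and all of its parent terms.
        for mf_term in sorted(reachable(annotation["term"], mf_parents)):
            for bp_term in sorted(mf_to_bp.get(mf_term, ())):
                # Skip if the Process term or a child is already used.
                covered = reachable(bp_term, bp_children)
                if used_terms[annotation["gene"]] & covered:
                    continue
                key = (annotation["gene"], bp_term,
                       annotation["reference"], annotation["evidence"])
                if key in emitted:
                    continue  # avoid emitting the same inference twice
                emitted.add(key)
                inferred.append({
                    "gene": annotation["gene"],
                    "term": bp_term,
                    "reference": annotation["reference"],
                    "evidence": annotation["evidence"],
                    "assigned_by": "GOC",
                })
    return inferred
```

Each inferred record keeps the gene product, reference and evidence code of the asserted function annotation and differs only in the term and the 'GOC' value for 'assigned_by'.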