UniProtKB-GOA May 2011
Gene Ontology Annotation at UniProtKB, 2011
Report on the GOA team's activities between September 2010 and May 2011
PI and Team Leaders: Rolf Apweiler, Maria Jesus-Martin, Claire O’Donovan
Ben Bely, Gayatri Chavali, Michael Gardner, Reija Hieta, Duncan Legge, Michele Magrane, Wei Mun Chan, Sandra Orchard, Klemens Pichler, Diego Poggioli, Harminder Sehra, Eleanor Stanley
Yasmin Alam-Faruque, Emily Dimmer, Rachael Huntley, Tony Sawford
PI and Team Leaders: Ioannis Xenarios, Lydie Bougueleret, Alan Bridge, Sylvain Poux
Ghislaine Argoud-Puy, Andrea Auchincloss Damay, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Guillaume Keller , Philippe Lemercier, Damien Lieberherr, Patrick Masson, Ivo Pedruzzi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli
All curators from the different UniProtKB teams (both based at the EBI and SIB) use the web-based Protein2GO editor maintained and developed by the UniProtKB-GOA team.
In total the UniProt group has provided 1,654 taxonomic groups with manual GO annotation. Since September 2010, UniProtKB curators at the EBI and SIB have together contributed 21,106 GO annotations (this figure excludes the UniProtKB-GOA annotation effort, which supplied 13,062 annotations during the same period).
Currently the curators from the GOA and BHF-UCL projects have together completely annotated 76% of supplied human Reference Genome Targets.
Changes to the contents of the UniProtKB GO annotation file between September 2010 and May 2011:
N.B. 1. starred sources indicate those new to the file in the specified period.
2. Changes to annotations reflect the increasing number of QC checks, including regular expressions that check the format of identifiers contributed by external annotation groups in column 8 ('with').
3. The number of annotations from external groups has increased due to improved integration of annotations whose reference can be mapped to a publicly described GO_REF reference.
4. HAMAP2GO. Current issues with this pipeline have caused it to over-predict annotations to unexpected species. These errors are being investigated.
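The kind of regular-expression check mentioned in note 2 could look something like the following Python sketch. The patterns, the list of database prefixes and the function name check_with_field are assumptions made for illustration; the report does not give the actual rules used by UniProtKB-GOA. The sketch also assumes that multiple values in column 8 are separated by '|' or ','.

```python
import re

# Illustrative patterns only; these are NOT the actual UniProtKB-GOA QC rules.
WITH_PATTERNS = {
    "UniProtKB": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "GO": re.compile(r"\d{7}"),
    "CHEBI": re.compile(r"\d+"),
    "InterPro": re.compile(r"IPR\d{6}"),
}


def check_with_field(value):
    """Return a list of problems found in a column 8 ('with') value."""
    problems = []
    if not value:
        return problems  # an empty 'with' field is accepted in this sketch
    for item in re.split(r"[|,]", value):
        db, sep, local_id = item.partition(":")
        if not sep:
            problems.append(f"{item!r}: missing 'DB:ID' separator")
        elif db not in WITH_PATTERNS:
            problems.append(f"{item!r}: unknown database prefix {db!r}")
        elif not WITH_PATTERNS[db].fullmatch(local_id):
            problems.append(f"{item!r}: malformed {db} identifier")
    return problems
```

A value such as "UniProtKB:P12345|GO:0005515" passes all checks in this sketch, whereas "GO:5515" is reported as a malformed GO identifier.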
Methods and strategies for annotation
- Literature curation:
Literature curation continues to be the major focus of our annotation efforts, with an emphasis on the use of experimental evidence codes.
- Computational annotation strategies:
UniProtKB provides IEA annotations using the following methods that use information extracted from UniProtKB, InterPro, the GO ontology and Ensembl:
- Swiss-Prot Keyword2GO (SPKW2GO) 1,2
- Swiss-Prot Subcellular Locations2GO (SPSL2GO) 1,2
- Ensembl Compara
1: mapping tables created and maintained by the UniProtKB group
2: electronic annotations generated by the UniProtKB group, using external resources combined with UniProtKB annotations.
- Priorities for annotation
UniProtKB curators annotate in line with UniProtKB priorities and curate to GO while carrying out UniProtKB annotation work.
The table showing the species prioritised for annotation is displayed below. This is in addition to the annotation projects involving animal toxins, submissions, proteins with 3D structures, enzymes, post-translational modifications. A number of these projects have required substantial work into developing appropriate GO Terms.
The curators in the UniProtKB-GOA team continue to put emphasis on annotating the genes selected for the Reference Genome Project, on responding to user feedback, and on annotations for the grant deliverables from British Heart Foundation and Kidney Research UK funding.
Presentations and Publications
a. Papers with substantial GO content
Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. Deegan née Clark JI, Dimmer EC, Mungall CJ. BMC Bioinformatics. 2010 Oct 25;11:530.
b. Book Chapters
Practical Applications of the Gene Ontology Resource. Rachael P. Huntley, Emily C. Dimmer and Rolf Apweiler. In Problem Solving Handbook in Computational Biology and Bioinformatics. Heath, Lenwood S.; Ramakrishnan, Naren (Eds.) 2011, Part 5, 319-339, DOI: 10.1007/978-0-387-09760-2_15. ISBN 978-0-387-09759-6
c. Presentations including Talks and Tutorials and Teaching
3rd September 2010 GO poster at Kidney Research UK Fellows Day Meeting in Coventry (Yasmin)
13th October 2010 GOA annotation presentation at the Industry Ontology and Engineering Workshop (Yasmin)
2nd November 2010 GO poster at EBI Open Day (Yasmin and Rachael)
10th December 2010 GO presentation at the Wellcome Trust Proteomics Course (Rachael)
26th January 2011 GO annotation presentation at the SysKid Annual Meeting, Innsbruck (Yasmin)
15th March 2011 GO poster at EBI Open Day (Yasmin and Rachael)
10th May 2011 GO annotation presentation at an InterPro tutorial, EBI (Rachael)
A. Ontology Development Contributions:
Terms and content meeting organisation to improve the Apoptosis node.
The creation of a new external2GO mapping file based on the UniPathway resource requires GO content development, as 366 UniPathway terms currently have no equivalent in GO, e.g. palmatine biosynthesis.
The development of new terms relating to aspects of kidney development is ongoing; a further 23 terms were created earlier this year, bringing the total number of kidney development terms to 470 (from 446 last year). This represents ~1.4% of all the current terms in the Gene Ontology. The total list of terms can be viewed at . Improvements were also made to the renal physiology terms describing diuresis and natriuresis, in response to a SourceForge request made by Cynthia Smith. This was done in close collaboration with renal experts via a webex conference call and a face-to-face meeting. It was decided to delete the existing terms diuresis ; GO:0030146 ('The process of renal water excretion') and natriuresis ; GO:0030147 ('The process of renal sodium excretion') and to replace them with positive regulation of urine volume ; GO:NEW to describe diuresis and positive regulation of renal sodium excretion ; GO:NEW to describe natriuresis. Further details can be found at .
B. Annotation Outreach and User Advocacy Efforts:
dictyBase: Curators manually annotating to GO in the dictyBase group have been moved across to UniProtKB's GO curation tool, Protein2GO. UniProtKB-GOA will generate a gene association file for Dictyostelium that dictyBase will modify and enhance for their user community.
Tufts University; Human Fetal Development annotation collaboration. The UniProtKB-GOA group is providing annotation support to Heather Wick, a curator from Tufts University, who is working as part of an NIH grant investigating proteins implicated in human fetal development (PI: Donna Slonim). Heather will use the UniProtKB-GOA Protein2GO curation tool and will have her manual annotations released via the UniProtKB-GOA release pipelines into the UniProtKB and Human gene association files.
NTNU - Trondheim, Norwegian University of Science and Technology: Annotations to gastrin genes were submitted by the systems biology group at NTNU.
Bacteriophage proteins: Contact was made with a Mexican group investigating the possibility of using Protein2GO for the annotation of bacteriophage proteins. No final decision has been reached.
APO-SYS: Work was carried out with the Apo-Sys EU Consortium, with a view to improving the annotations available for proteins involved in the apoptotic pathways.
Renal GO annotation initiative funded by Kidney Research UK.
Yasmin presented the Renal GOA Initiative at the EBI's Industry Ontology and Engineering Workshop in October 2010 and at the SysKid Consortium Meeting held in Innsbruck in January 2011. A poster was also presented at the KRUK Fellows Day in September 2010. The progress of the Renal Initiative has been reported to the Scientific Advisory Panel and members of the renal community in 3 quarterly newsletters (October 2010, January 2011 and April 2011, currently visible at ). The KRUK target list has increased to 2359 genes/proteins, which includes human, mouse and canine orthologues. Annotation has been ongoing for priority target genes involved in kidney development, podocyte function and the Loop of Henle, and for a list of target canine kidney genes showing differential expression in response to neuropilin (provided by Prof Herbert Shramekar from Innsbruck). Annotation of the list of target genes provided by the GUDMAP Consortium that had no GO annotation has been completed. Yasmin has been closely involved with curators, ontology editors and new field experts in the development of the insect- and Xenopus-related kidney development GO terms. She organized and attended a webex meeting with renal experts to discuss aspects of renal physiology relating to diuresis and natriuresis. An interim grant report was written and submitted to KRUK in October 2010. Yasmin has also answered various GOA help emails and has been involved in the apoptosis target gene annotation and the associated phone conference meetings. She has also been working on manuscripts with Ruth Lovering and David Hill.
Verification of mappings to UniProtKB accessions in GO Consortium gp2protein files
The GOA group continues to provide groups in the GO Consortium with checks of the UniProtKB accessions applied in gp2protein mapping files. Annotation groups receive an email indicating where in their file a secondary or deleted UniProtKB accession has been used. Where possible, this email also indicates suitable replacement UniProtKB accessions. These checks are run, and the results emailed to annotation groups, on the first of each month.
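As a rough illustration of such a check, the Python sketch below scans gp2protein lines and reports secondary or deleted accessions. It assumes a tab-separated file with the group's own identifier in the first column and one or more ';'-separated 'UniProtKB:accession' values in the second; the function name check_gp2protein and the two lookup structures are hypothetical and are not the GOA group's actual pipeline.

```python
def check_gp2protein(lines, secondary, deleted):
    """Report secondary or deleted UniProtKB accessions in gp2protein lines.

    secondary: dict mapping a secondary accession to a list of current accessions
    deleted:   set of deleted accessions
    Returns a list of (line_number, accession, message) tuples.
    """
    report = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no mapping on this line
        for target in fields[1].split(";"):
            prefix, _, accession = target.partition(":")
            if prefix != "UniProtKB":
                continue  # only UniProtKB accessions are checked here
            if accession in deleted:
                report.append((number, accession, "deleted"))
            elif accession in secondary:
                suggestion = ", ".join(secondary[accession])
                report.append(
                    (number, accession, f"secondary; suggested replacement: {suggestion}")
                )
    return report
```

The returned tuples carry the line number, so they could be formatted directly into the body of a notification email.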
The QuickGO browser continues to be developed:
- Improvements to the co-occurring terms table, including further explanation
- Improvements to the term search results display (obsolete term, synonym and definition views)
- Improvements to support developing user-defined GO slims via the 'Your Terms' basket.
- There is now the option, within the Ancestor Chart, to show the direct child terms of the term being viewed. When viewing your chosen term in the Ancestor Chart display, click on 'Display' on the right of the page and change the 'show children' option from the default 'Hide' to 'Show', click on OK and the Ancestor Chart will display all of the direct child terms for the term you are viewing.
- The display of the has_part relationship in the Ancestor Chart has been adjusted to highlight that this relationship should be read in the opposite direction to the other relationships.
- QuickGO now provides links to the IntAct interaction database from GO protein complex term pages. For example, see 'septin complex' which links out to the curated protein complex in IntAct.
- The ancestor table view from the GO term page has been removed as we felt this is a confusing view of the ancestry of a GO term and can sometimes be inaccurate in its representation. The ancestor chart view will remain.
- The 'goslim_goa' predefined GO term set has been removed. It was felt that this term set now provides an inadequate top-level representation of the GO ontologies that could provide users with misleading results. A good alternative GO term set to use is the 'goslim_generic', which has recently been updated and refined to give a good coverage of general biological features within the Biological Process and Cellular Component ontologies.
Changes to UniProtKB GOA gene association files
UniProtKB-GOA now incorporates annotations from external groups whose reference field contains either a GO reference (GO_REF) or a MOD-specific reference that can be converted to an equivalent GO_REF using the mappings defined in http://www.geneontology.org/doc/GO.references. Previously, UniProtKB-GOA only accepted annotations that used a PubMed identifier in this field.
An example of a GO_REF that we are now accepting is GO_REF:0000015 'Use of the 'No biological data' (ND) evidence code for Gene Ontology terms'. A description and complete list of the GO_REFs available can be found at http://www.geneontology.org/cgi-bin/references.cgi.
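The acceptance rule described above can be sketched in Python as follows. The mapping entry, the identifier 'EXAMPLE_MOD:ref001' and the function name normalise_reference are hypothetical; in practice the mappings would be read from the GO.references file, whose parsing is not shown here.

```python
# Hypothetical mapping entry for illustration; real mappings come from GO.references.
MOD_TO_GO_REF = {"EXAMPLE_MOD:ref001": "GO_REF:0000015"}


def normalise_reference(reference, mod_to_go_ref=MOD_TO_GO_REF):
    """Return the reference to store, or None if the annotation is not accepted."""
    # PubMed identifiers and GO_REFs are accepted unchanged.
    if reference.startswith(("PMID:", "GO_REF:")):
        return reference
    # MOD-specific references are accepted only if they map to a GO_REF.
    return mod_to_go_ref.get(reference)
```

In this sketch an annotation whose reference cannot be normalised returns None and would simply be skipped during integration.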
The Annotation Extension field (column 16) has been populated with MGI data describing the cellular context of a GO annotation (using the Cell Type ontology).
Over the last month we have been working to provide a more complete display of the manual annotations that we integrate into the UniProtKB-GOA dataset from external annotation groups. Whereas previously the 'with' field (column 8) in our annotation file was left empty if a manual annotation did not include either a UniProtKB or a GO identifier, our files now display 43 different gene, protein and chemical identifier types (such as WormBase, CHEBI and EcoCyc identifiers) in this field. This development ensures that integrated manual GO annotations display the full set of information that curation groups have used when translating experimental data into a GO annotation.
The UniProtKB-GOA files also now contain a larger set of the manual annotations supplied by the GO Consortium's Reference Genome project (source: RefGenome). This project has generated inferred annotations for 47 species using GO Consortium manual annotations and phylogenetic trees from gene families. The Reference Genome project is fully described here: http://www.geneontology.org/GO.refgenome.shtml.
Inferred Biological Process GO annotations now included in the UniProtKB-GOA annotation set.
We are pleased to announce an additional set of GO annotations available in this release that have been automatically generated from the Molecular Function (MF) -> Biological Process (BP) inter-ontology relationships present in the GO OBO v1.2 format.
As many GO users do not currently reason over the GO inter-ontology relationships, a set of inferred annotations has been generated to improve the consistency of the Biological Process annotation set. These GO annotations are produced when an annotation has been made (either manually or electronically) to a Molecular Function term that, either directly or via one of its parent terms, has a relationship to a Biological Process term, and where the Process term (or one of its children) has not already been used in the annotation set for the same gene product identifier. Each inferred annotation applies the same gene product identifier, reference and evidence code as the asserted function annotation. Inferred annotations are generated from all sources of GO annotations, with only 'NOT'-qualified annotations being excluded. All such inferred GO annotations can be identified by the 'GOC' value in the 'assigned_by' field (column 15).
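The inference rule described above can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the production pipeline: the ontology is passed in as plain dictionaries, the names Annotation, closure and infer_process_annotations are hypothetical, 'children' is interpreted as all descendants of the Process term, and only non-'NOT' annotations count as having already used a term.

```python
from collections import namedtuple

Annotation = namedtuple(
    "Annotation", "gene_product qualifier go_id reference evidence assigned_by"
)


def closure(term, edges):
    """Return term plus every term reachable from it via edges (term -> set of terms)."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def infer_process_annotations(annotations, mf_parents, mf_to_bp, bp_children):
    """Infer Process annotations from Function annotations via MF -> BP links."""
    # Terms already used per gene product ('NOT' annotations are ignored: an assumption).
    used = {}
    for a in annotations:
        if "NOT" not in a.qualifier:
            used.setdefault(a.gene_product, set()).add(a.go_id)

    inferred, emitted = [], set()
    for a in annotations:
        if "NOT" in a.qualifier:
            continue  # 'NOT'-qualified annotations never generate inferences
        # Follow the Function term itself and all of its parent terms.
        for mf in closure(a.go_id, mf_parents):
            for bp in mf_to_bp.get(mf, ()):
                # Skip if the Process term or one of its descendants is already used.
                if closure(bp, bp_children) & used.get(a.gene_product, set()):
                    continue
                key = (a.gene_product, bp, a.reference, a.evidence)
                if key not in emitted:
                    emitted.add(key)
                    inferred.append(
                        Annotation(a.gene_product, "", bp, a.reference, a.evidence, "GOC")
                    )
    return inferred
```

Each inferred annotation keeps the gene product identifier, reference and evidence code of the asserted Function annotation and differs only in the GO term and the 'GOC' value of assigned_by, matching the description above.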
New automatic annotation pipeline: UniPathway
A UniPathway2GO mapping file is being developed in collaboration with Anne Morgat.