- 1 Gene Ontology Annotation at UniProtKB, 2011
- 2 Staff:
- 3 Annotation Progress
- 4 Methods and strategies for annotation
- 5 Presentations and Publications
- 6 Other Highlights
Gene Ontology Annotation at UniProtKB, 2011
Report on the GOA team's activities between May and November 2011
PI and Team Leaders: Rolf Apweiler, Maria Jesus Martin, Claire O’Donovan
Ben Bely, Gayatri Chavali, Reija Hieta, Duncan Legge, Michele Magrane, Wei Mun Chan, Sandra Orchard, Klemens Pichler, Diego Poggioli, Harminder Sehra, Eleanor Stanley
Yasmin Alam-Faruque, Emily Dimmer, Rachael Huntley, Prudence Mutowo, Tony Sawford
PI and Team Leaders: Ioannis Xenarios, Lydie Bougueleret, Alan Bridge, Sylvain Poux
Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Elizabeth Coudert, Isabelle Cusin, Paula Duek Roggli, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Guillaume Keller, Philippe Lemercier, Damien Lieberherr, Patrick Masson, Ivo Pedruzzi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Andre Stutz, Shyamala Sundaram, Michael Tognolli
All curators from the different UniProt teams (based at the EBI and SIB) use the web-based Protein2GO editor maintained and developed by the UniProtKB-GOA team.
In total, the UniProt group has provided 387,667 taxonomic groups with GO annotation. Since May 2011, UniProtKB curators at the EBI and SIB have together contributed 21,954 manual GO annotations.
Currently the curators from the GOA and BHF-UCL projects have together completely annotated 76% of supplied human Reference Genome Targets.
Methods and strategies for annotation
- Literature curation:
Literature curation continues to be the major focus of our annotation efforts, with an emphasis on the use of experimental evidence codes.
- Computational annotation strategies:
UniProtKB provides IEA annotations using the following methods that use information extracted from UniProtKB, InterPro, the GO ontology, Ensembl and EnsemblCompara:
- UniProtKB Keyword 2GO (SPKW2GO)1,2
- UniProtKB Subcellular Locations2GO (SPSL2GO) 1,2
- Ensembl Compara
- EnsemblPlants Gramene Compara
1: mapping tables created and maintained by the UniProtKB group
2: electronic annotations generated by the UniProtKB group, using external resources combined with UniProtKB annotations.
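A minimal sketch of how a mapping table of this kind can be turned into IEA annotations. The two table entries are illustrative only, and the function name and tuple layout are assumptions rather than the group's actual pipeline.

```python
# Illustrative keyword-to-GO table; the real SPKW2GO mapping is far larger.
KEYWORD2GO = {
    "ATP-binding": ["GO:0005524"],  # ATP binding
    "Nucleus": ["GO:0005634"],      # nucleus
}


def keywords_to_iea(accession, keywords, mapping=KEYWORD2GO):
    """Return (accession, GO ID, evidence code) tuples for mapped keywords."""
    annotations = []
    for keyword in keywords:
        # Keywords without a mapping simply produce no annotation.
        for go_id in mapping.get(keyword, []):
            annotations.append((accession, go_id, "IEA"))
    return annotations
```

Keywords that have no GO equivalent are skipped silently here; a production pipeline would more likely log them so that the mapping table can be extended.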
- Priorities for annotation
UniProtKB curators annotate in line with UniProtKB priorities and curate to GO while carrying out UniProtKB annotation work.
The table showing the species prioritised for annotation is displayed below. This is in addition to the annotation projects involving animal toxins, submissions, proteins with 3D structures, enzymes and post-translational modifications. A number of these projects have required substantial work on developing appropriate GO terms.
The curators in the UniProtKB-GOA team continue to put emphasis on annotating the genes selected for the Reference Genome Project, on responding to user feedback, and on producing annotations for the grant deliverables from British Heart Foundation and Kidney Research UK funding.
Presentations and Publications
a. Papers with substantial GO content
1. Dimmer, E.C., Huntley R.P., Alam-Faruque Y., Sawford T., O’Donovan C., Martin M.J., Auchincloss A., Axelsen K., Argoud-Puy G., Bely B., Blatter M-C., Boutet E., Braconi-Quintaje S., Breuza L., Bridge A., Browne, P., Chan, W.M., Coudert, E., Cusin, I., Duek-Roggli P., Eberhardt E., Estreicher A., Famiglietti L., Ferro-Rojas S., Feuermann M., Gardner M., Gos A., Gruaz-Gumowski N., Hinz U., Hulo C., James J., Jimenez S., Jungo F., Keller G., Laiho K., Legge D., Lemercier P., Lieberherr D., Magrane M., Masson P., Moinat M., Pedruzzi I., Pichler K., Poggioli D., Poux S., Rivoire C., Roechert B., Schneider M., Sehra H., Stanley E., Stutz A., Sundaram S., Tognolli M., Bougueleret L., Xenarios I. and Apweiler, R. (2011) The UniProt-GO Annotation database in 2011. Nucleic Acids Res. [Manuscript accepted for publication]
2. Alam-Faruque, Y., Huntley, R.P, Khodiyar, V.K., Camon, E.B., Dimmer, E.C., Sawford, T., O’Donovan, C., Martin, M.J., Talmud, P.J., Scambler, P., Apweiler, R. and Lovering, R.L. (2011) The impact of focused Gene Ontology curation of specific mammalian systems. PLoS One [Manuscript accepted for publication]
b. Book Chapters
c. Presentations including Talks and Tutorials and Teaching
6-8th June: Joint British Renal Society/ Renal Association; Birmingham: Yasmin presented a poster.
June: Joint EBI-Wellcome Trust Summer School in Bioinformatics: Gene Ontology talk: Emily
2nd November: EBI Open Day: Life as a scientific database curator talk (UniProtKB/ UniProt-GOA): Yasmin
Prudence Mutowo joined UniProt as a full-time UniProt-GOA curator
A. Ontology Development Contributions:
Terms and content meeting organisation to improve the Apoptosis node. An annotation collaboration is underway with Pablo Porras Millan from the EBI Proteomics group towards improving annotations for the apoptosis pathway (Apo-Sys Consortium), which involved organising an Apoptosis GO Content Meeting during the Apo-Sys Consortium meeting at the EBI.
Creation of a new external2GO mapping file that uses the UniPathway resource is continuing. It requires GO content development effort, as 366 UniPathway terms currently have no equivalent in GO (e.g. palmatine biosynthesis).
The development of new terms relating to aspects of kidney development is ongoing. A further 33 terms were created earlier this year, bringing the total number of kidney development terms to 479 (from 446 last year). The total list of terms can be viewed at .
B. Annotation Outreach and User Advocacy Efforts:
dictyBase: curators manually annotating to GO in the dictyBase group have moved across to UniProtKB's GO curation tool, Protein2GO. UniProtKB-GOA now generates a gene association file for Dictyostelium that dictyBase will modify and enhance for its user community.
Tufts University: human fetal development annotation collaboration. The UniProtKB-GOA group continues to provide annotation support to Heather Wick, a curator from Tufts University, who is working as part of an NIH grant investigating proteins implicated in human fetal development (PI: Donna Slonim). Heather will use the UniProtKB-GOA Protein2GO curation tool and will have her manual annotations released via the UniProtKB-GOA release pipelines into the UniProtKB and Human gene association files.
NTNU - Trondheim, Norwegian University of Science and Technology: annotations to gastrin genes were submitted by the systems biology group at NTNU.
Renal GO annotation initiative funded by Kidney Research UK.
Two newsletters have been sent to the renal community in July 2011 and October 2011 highlighting the progress of the Renal GOA Initiative.
Annotation is now complete for the list of gene products provided by the GUDMAP Consortium Edinburgh team.
The current annotation priority is a microarray-specific renal target list (61/88 genes with little experimental GO annotation) taken from "Gene Expression Profiling in Glomeruli From Human Kidneys With Diabetic Nephropathy", Hans J. Baelde et al., PMID 15042541. These proteins are being curated by Yasmin and Rachael.
Prudence is working on annotation of peroxisomes (also called microbodies), which are organelles found in virtually all eukaryotic cells. They are involved in the catabolism of very long chain fatty acids, branched chain fatty acids, D-amino acids and polyamines, and in the biosynthesis of plasmalogens (ether phospholipids).
Yasmin attended the Kidney Research Fellows Day in September 2011 held at the Royal Society in Edinburgh.
Verification of mappings to UniProtKB accessions in GO Consortium gp2protein files
The GOA group continues to provide groups in the GO Consortium with checks of the UniProtKB accessions applied in gp2protein mapping files. Annotation groups receive an email indicating where in their file a secondary or deleted UniProtKB accession has been used. Where possible, this email also indicates suitable replacement UniProtKB accessions. The checks are run, and the results emailed to annotation groups, on the first of each month.
The QuickGO browser continues to be developed, with a current focus on improving the usability of the tool. In line with this, the layouts of some pages have been improved and documentation for certain features has been added.
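The kind of check described above could look roughly like the sketch below. It assumes a tab-separated gp2protein layout with semicolon-separated cross-references in the second column, and it takes the secondary-to-primary lookup and the set of deleted accessions as inputs; the function name and report tuple are hypothetical, not the GOA group's actual script.

```python
def check_gp2protein(lines, secondary_to_primary, deleted):
    """Yield (line number, accession, problem, suggested replacement)."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        for xref in fields[1].split(";"):
            database, _, accession = xref.partition(":")
            if database != "UniProtKB":
                continue
            if accession in secondary_to_primary:
                # A replacement can be suggested for secondary accessions.
                yield (line_number, accession, "secondary",
                       secondary_to_primary[accession])
            elif accession in deleted:
                yield (line_number, accession, "deleted", None)
```

Reporting the line number alongside each flagged accession mirrors the emails described above, which tell each group where in its file the problem occurs.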
Protein2GO curation tool
A new version of the Protein2GO tool was released to all curators in October, with a number of changes that improve the layout of the annotations displayed, improve the display of controls for updating or transferring annotations, and provide greater integration of term-specific link-outs to the QuickGO Ancestor Chart.
Link-outs to the IntAct editor are available to IntAct-trained curators.
New sanity checks have been included to reduce redundancy in annotation and to warn curators when inferred GO annotations could be added to an annotation set.
In addition, the tool now provides the ability for full-time GO curators to contribute to the annotation extension field (column 16); this is optionally displayed and will be automatically suppressed for all UniProt curators unless there is specific interest.
Changes to UniProtKB GOA gene association files
Annotation Extension field data from the UniProtKB source
Data in the annotation extension field (column 16) has started to be supplied in released manual GO annotations from the UniProtKB group.
Decrease in IntAct annotation set
From the August release, the UniProt GO annotation files contained a greatly reduced number of protein binding GO annotations from the IntAct database.
A subset of presumed reliable interactions is now extracted from the IntAct dataset with export determined using a simple scoring system developed by IntAct, coupled to a score threshold that has been deliberately chosen to exclude interactions supported by only one experimental observation. Further details of how interactions are scored can be found at the IntAct website (http://www.ebi.ac.uk/intact/pages/faq/faq.xhtml#4). This simple score-based filter is used in combination with a set of defined rules that excludes certain types of data, such as interactions that have been inferred but not experimentally proven.
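As a rough illustration of combining a score threshold with rule-based exclusions, consider the sketch below. The record fields, the `>=` comparison and the "inferred" label are assumptions made for the example; the report does not state the actual threshold value or the exact rule set.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    protein_a: str
    protein_b: str
    score: float           # score assigned by the (assumed) scoring system
    interaction_type: str  # hypothetical label, e.g. "experimental" or "inferred"


def select_for_export(interactions, threshold,
                      excluded_types=frozenset({"inferred"})):
    """Keep interactions that meet the threshold and pass the exclusion rules."""
    return [
        interaction
        for interaction in interactions
        if interaction.score >= threshold
        and interaction.interaction_type not in excluded_types
    ]
```

Keeping the threshold as a parameter reflects the fact that it is a deliberate policy choice, here one intended to drop interactions backed by a single observation.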
New IEA annotation pipeline
UniProtKB-GOA are pleased to announce the inclusion in their database of electronic GO annotations created by EnsemblPlants/Gramene. The annotations are created by projecting GO annotations from Arabidopsis thaliana or Oryza sativa proteins onto proteins from one or more target species, based on gene orthology obtained from Ensembl Compara. This first release contains almost 230,000 annotations to over 50,000 proteins covering 16 taxa, including poplar, maize, sorghum, grape and Physcomitrella. We hope this will be a valuable resource for the non-model plant species community. The annotations can be viewed and downloaded from the QuickGO browser.
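The projection step can be sketched as follows, assuming the source annotations and the orthologue pairs are already available as simple Python structures. The function name and data shapes are hypothetical, and any further restrictions the real pipeline applies (for example on orthology type or source evidence) are not modelled.

```python
def project_annotations(source_annotations, orthologs):
    """Copy GO IDs from annotated source proteins to their orthologues.

    source_annotations: dict mapping a source protein to a set of GO IDs.
    orthologs: iterable of (source protein, target protein) pairs.
    """
    projected = {}
    for source, target in orthologs:
        for go_id in source_annotations.get(source, ()):
            # A target with several annotated orthologues collects all terms.
            projected.setdefault(target, set()).add(go_id)
    return projected
```

Targets whose orthologues carry no annotation do not appear in the result at all, so the output contains only proteins that actually gained a projected term.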
Move to UniProt Complete Proteome sets
With the imminent closure of the International Protein Index (IPI), the Human, Mouse, Rat, Zebrafish, Chicken and Cow UniProt GO annotation files (files named gene_association.goa_[species], e.g. gene_association.goa_human) now use UniProt Complete Proteome sets to determine the protein composition of these files.
This change has affected all files, in particular the gene_association.goa_human file, whose annotation count has increased by 43.7% because the file now includes GO annotations to both reviewed (Swiss-Prot) and unreviewed (TrEMBL) UniProtKB accessions. Users wishing to identify only the reviewed (Swiss-Prot) UniProt protein annotation subset will be able to continue to do so using the information supplied in the gp_information.goa_uniprot file. Alternatively, users can download the reviewed UniProtKB human GO annotation set from the UniProt QuickGO browser.
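One way a user might recover the reviewed subset is sketched below. It assumes the set of reviewed accessions has already been extracted from the gp_information.goa_uniprot file (the layout of that file is not described here, so that step is left out) and that the gene association file is tab-separated with the accession in column 2; the function name is hypothetical.

```python
def reviewed_only(gaf_lines, reviewed_accessions):
    """Yield header lines plus annotation lines for reviewed accessions."""
    for line in gaf_lines:
        if line.startswith("!"):
            yield line  # keep header comments unchanged
            continue
        fields = line.split("\t")
        # Column 2 (index 1) holds the annotated object's accession.
        if len(fields) > 1 and fields[1] in reviewed_accessions:
            yield line
```

Because the function is a generator, it can be applied to a large annotation file line by line without loading the whole file into memory.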
New Species-specific, non-redundant UniProt files
Species-specific UniProt GO annotation gene association files that include a filtering step to remove redundant electronic GO annotation predictions are now available from the GOA ftp site for the UniProt Complete Proteome sets of Dictyostelium discoideum (gene_association.goa_dicty.gz), Canis familiaris (gene_association.goa_dog.gz), Drosophila melanogaster (gene_association.goa_fly.gz), Caenorhabditis elegans (gene_association.goa_worm.gz), Saccharomyces cerevisiae (gene_association.goa_yeast) and Sus scrofa (gene_association.goa_pig.gz).
Inferred Cellular Component GO annotations now included in the UniProtKB-GOA annotation set.
We are pleased to announce an additional set of Cellular Component GO annotations available in this release that have been automatically generated from the 'occurs_in' relationship, made available as intersection_of tags in Biological Process terms in the GO OBO v1.2 format, for example:
[Term]
id: GO:0033579 ! protein amino acid galactosylation in endoplasmic reticulum
intersection_of: GO:0042125 ! protein amino acid galactosylation
intersection_of: occurs_in GO:0005783 ! endoplasmic reticulum
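A small sketch of how a stanza like the one above can drive the inference, assuming the ontology text is available as a string. The parser handles only the tags needed here, the function names are hypothetical, and propagation to descendant terms (which a real pipeline may perform) is not modelled.

```python
def occurs_in_map(obo_text):
    """Map each term ID to the GO IDs in its occurs_in intersection_of tags."""
    mapping = {}
    current_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.split("!", 1)[0].strip()  # drop trailing "! comment"
        if line.startswith("["):
            current_id = None  # new stanza; wait for its id line
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif line.startswith("intersection_of:") and current_id:
            parts = line[len("intersection_of:"):].split()
            if len(parts) == 2 and parts[0] == "occurs_in":
                mapping.setdefault(current_id, []).append(parts[1])
    return mapping


def infer_component_annotations(process_annotations, mapping):
    """Turn (protein, process GO ID) pairs into (protein, component GO ID)."""
    return [
        (protein, component_id)
        for protein, process_id in process_annotations
        for component_id in mapping.get(process_id, [])
    ]
```

With the example stanza, a protein annotated to GO:0033579 would gain an inferred annotation to GO:0005783 (endoplasmic reticulum).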
Annotations Included from the PAINT GO annotation Project
Annotations that apply the new manual GO evidence codes IBA, IBD, IKR and IRD are now available in the UniProtKB-GOA annotation set. For further information on these recently created manual evidence codes, please consult the GO website for code definitions: http://www.geneontology.org/GO.evidence.shtml. These types of evidenced GO annotations are currently being created by the GO Consortium's Reference Genome project, identified by the 'RefGenome' value in the assigned_by field (column 15).
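Users who want to pull these annotations out of a gene association file could filter on the evidence code (column 7) and the assigned_by field (column 15), as in the sketch below. It assumes a tab-separated GAF file; the function name is hypothetical.

```python
PAINT_CODES = {"IBA", "IBD", "IKR", "IRD"}


def paint_annotations(gaf_lines, assigned_by="RefGenome"):
    """Yield the fields of annotations with a PAINT code and matching source."""
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        # Evidence code is column 7 (index 6); assigned_by is column 15 (index 14).
        if (len(fields) >= 15
                and fields[6] in PAINT_CODES
                and fields[14] == assigned_by):
            yield fields
```

Checking both columns guards against picking up annotations that use one of these evidence codes but come from a different source.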
New automatic annotation pipeline: UniPathway
A UniPathway2GO mapping file is being developed in collaboration with Anne Morgat.