BHF-UCL Progress Report December 2008
The aim of the Cardiovascular GO Annotation Initiative (BHF-UCL, British Heart Foundation – University College London) is to provide GO annotation to human cardiovascular-associated genes. This project represents a successful collaboration between University College London (UCL) and the European Bioinformatics Institute (EBI); the annotations created by the UCL-based curators are made directly into the GOA database at the EBI. 4000 human genes have been identified as associated with cardiovascular processes and annotation priorities are agreed on an annual basis in consultation with the Co-Grant holders, the International Scientific Advisory Committee and the UCL-based GO curators. The Initiative aims to comprehensively annotate 2500 genes in 5 years. BHF-UCL has been a GOC member since July 2008.
- Dr Ruth Lovering, 1 FTE – Curator, BHF grant to November 2012
- Dr Varsha Khodiyar, 0.8 FTE – Curator, BHF grant to May 2013
No funding from the GOC NHGRI grant
The annotation progress reflects the priority of this project to annotate human genes, with 4317 GO terms associated with 438 human proteins (as of 29th September 2008). Across all species BHF-UCL have annotated 631 proteins with over 5000 GO terms.
Methods and strategies for annotation
(please note % effort on literature curation vs. computational annotation methods)
The aim of this Initiative is to provide complete and deep annotation of 300 human proteins per year. The following approaches are taken to achieve this:
- A protein-centric approach is taken to GO annotation.
- To ensure a rapid improvement in the annotations available for cardiovascular-associated proteins, the curators spend a maximum of one day researching the literature associated with each protein.
- If complete annotation cannot be achieved in a day, the protein record is marked as 'first pass complete'. The intention is to revisit these first-pass proteins in the following year, ideally with input from an expert scientist.
- The approved gene symbol (together with relevant gene and protein aliases) is used to query a variety of biomedical search engines, including NCBI PubMed, iHOP and GOPubMed, to identify suitable papers for the GO annotation of each target protein (for highly researched genes the search is usually limited to human entries only).
- The curators will usually associate GO terms to all of the human proteins mentioned in each paper read, depending on the experimental evidence available (occasionally GO terms are associated with non-human proteins too).
- Preference is given to the use of experimental evidence codes; however, these are only used when the curator is completely confident of the identity of the protein and the species it is derived from.
- Reviews are also used to provide an overview of the characteristics of a protein and an insight into the complete set of GO terms required.
- Experimental data relating to model organism proteins may be included in our GO annotation process, through the direct annotation of the model organism protein and the use of the ‘inferred by sequence similarity’ evidence code to transfer the information to the orthologous human protein.
- When experimentally supported literature is unobtainable, due to insufficient information about the species the protein is derived from, the lack of access to a referenced paper, or simply because the knowledge is considered so well accepted that references are not supplied, author statements are used.
- When possible we associate the chronologically first paper that provides experimental evidence for the characteristic features of a given human protein.
- We aim to capture the knowledge about each protein using a limited number of papers, with experimental evidence.
- We do not annotate every relevant paper if doing so would only duplicate GO terms already associated with the protein.
- GO terms are chosen by querying the GO files with QuickGO or AmiGO.
- Before assigning a GO term, its definition and position within the ontology are checked to ensure its suitability.
- The GO editorial office is contacted, via SourceForge, when a new GO term is required, or modifications are needed to an existing GO term.
Computational annotation strategies:
Priorities for annotation:
Human genes involved in cardiovascular-related processes, as agreed by the International Scientific Advisory Committee.
Presentations and Publications
Papers with substantial GO content:
- Cardiovascular GO Annotation Initiative Year 1 Report: Why Cardiovascular GO?, Ruth C Lovering, Emily Dimmer, Varsha K Khodiyar, Daniel G Barrell, Peter Scambler, Mike Hubank, Rolf Apweiler, Philippa J Talmud. Proteomics 2008 May; 8(10): 1950-3. PMID: 18491309.
- Immunology on the GO, Alexander D Diehl, Evelyn B Camon, Ruth C Lovering. Immunology News 2008 May; 15-21.
- The Gene Ontology - Providing a functional role in Proteomic Studies, Emily C Dimmer, Rachael P Huntley, Daniel G Barrell, David Binns, Sorin Draghici, Evelyn B Camon, Mike Hubank, Philippa J Talmud, Rolf Apweiler, Ruth C Lovering. Proteomics 2008 July Epub. PMID: 18634107.
- Access to immunology through the Gene Ontology, Ruth C Lovering, Evelyn B Camon, Judith A Blake, Alexander D Diehl. Immunology 2008 Oct;125(2):154-60. PMID: 18798919.
- Improvements to Cardiovascular Gene Ontology, Ruth C Lovering, Emily C Dimmer and Philippa J Talmud. Atherosclerosis 2008, Nov 1. Epub. PMID: 19046747.
PDFs are available at www.cardiovasculargeneontology.com
Presentations including Talks and Tutorials and Teaching:
- Invited presentation (20 min) entitled: The Cardiovascular Gene Ontology Annotation Initiative, Cardiovascular Initiative Workshop, Human Proteome Organisation 7th Annual World Congress, August 2008 Amsterdam, Netherlands.
- Plenary Speaker (15 min) entitled: Immunology's time to GO, 21st Century Diseases of the Western World, British Society for Immunology Annual Congress, November 2008, Glasgow, UK.
- The Cardiovascular Gene Ontology Initiative, Ruth Lovering, Varsha Khodiyar, Daniel Barrell, Emily Dimmer, Peter Scambler, Mike Hubank, Rolf Apweiler, Philippa Talmud. 77th European Atherosclerosis Society Meeting, April 2008, Istanbul, Turkey.
- The Cardiovascular Gene Ontology Initiative, Ruth Lovering, Varsha Khodiyar, Daniel Barrell, Emily Dimmer, Peter Scambler, Mike Hubank, Rolf Apweiler, Philippa Talmud. UCL Cardiovascular Science and Medicine day, June 2008, London, UK.
- Participate in the Cardiovascular Gene Ontology Annotation Initiative, Ruth Lovering, Varsha Khodiyar, Emily Dimmer, Daniel Barrell, Peter Scambler, Mike Hubank, Rolf Apweiler, Philippa Talmud. Human Proteome Organisation 7th Annual World Congress, August 2008 Amsterdam, Netherlands.
- Gene Ontology - a way forwards, Ruth Lovering, Varsha Khodiyar, Peter Scambler, Mike Hubank, Rolf Apweiler, Philippa Talmud. British Society for Immunology Annual Congress, November 2008, Glasgow, UK. (Listed as an additional activity in program).
Ontology Development Contributions:
The BHF-UCL team have made 161 SourceForge requests (up to 10/12/08), which have led to the creation of 290 new GO terms. Many of these requests were specific to cardiovascular processes; for example, several led to substantial improvements in the terms available to describe plasma lipoprotein particles and the processes involved in the assembly, disassembly, modification and transport of these particles. (Note that plasma lipoprotein particles are often indicators of cardiovascular disease risk.)
Annotation Outreach and User Advocacy Efforts:
The UCL GO curators are closely associated with the Cardiovascular Genetics group at UCL and have given 2 presentations at their group meetings. During one of these group meetings the annotation of the APOA family was discussed and modifications to the annotation of these genes were made to reflect this discussion.
Six cardiovascular-related societies or groups have given their support to this project and now provide a direct link to our website and help with the promotion of our Newsletter. To date the Initiative has circulated three newsletters, in April 2008, July 2008 and October 2008. These were distributed by direct email to the International Advisory Committee and to individuals who have expressed an interest in this project; by indirect email, through the mailing lists of several cardiovascular-related societies; as hard copies at meetings; and through our website.