BHF-UCL December 2012
BHF-UCL Summary, December 2012
The aim of the Cardiovascular GO Annotation Initiative (BHF-UCL, British Heart Foundation – University College London) is to provide GO annotations for human cardiovascular-associated genes. This project is a successful collaboration between University College London (UCL) and the European Bioinformatics Institute (EBI): the UCL-based curators enter their annotations directly into the GOA database at the EBI. Four thousand human genes have been identified as associated with cardiovascular processes, and annotation priorities are agreed annually in consultation with the co-grant holders, the International Scientific Advisory Committee and the UCL-based GO curators. BHF-UCL has been a GOC member since July 2008.
- Dr Ruth Lovering, 1 FTE – UCL-based curator, BHF grant to November 2012, BHF scholarship to February 2014.
- Dr Varsha Khodiyar, 0.6 FTE – UCL-based curator, BHF grant to February 2013.
- Tony Sawford, 0.25 FTE – EBI-based software engineer, BHF grant to November 2012.
* No funding via GOC NIHGRI grant
The annotation progress reflects the priority this project gives to annotating human genes: 19,028 GO terms have been associated with 2,190 human proteins (1st November 2007 to 24th November 2012). Across all species, BHF-UCL has annotated 3,781 proteins with 27,548 GO terms.
Methods and strategies for annotation
Literature curation (100%): The aim of this Initiative is to provide complete and deep annotation of 300 human proteins per year. Proteins are targeted for annotation using both protein-centric and process-centric approaches. Process-centric annotation enables the curators to gain a better understanding of the targeted process. Protein-centric annotation is undertaken when annotating proteins on a specific cardiovascular-relevant list, such as the genes identified by a genome-wide association study. In addition, we annotate proteins following requests from cardiovascular scientists, and proteins are also annotated by attendees of our MSc module or 2-day annotation workshops. The following approaches are taken to achieve this:
- To ensure a rapid improvement in the annotations available for a large number of cardiovascular-associated proteins, the curators spend a maximum of one day researching the literature associated with each protein.
- A protein is marked as ‘complete’ if the curator judges that there are no further terms to add.
- If complete annotation cannot be achieved in a day, the protein record is marked as ‘first pass complete’. The intention is to revisit these first-pass proteins in the following year, ideally with input from expert scientists.
- The approved gene symbol (and relevant gene and protein aliases) is used to query a variety of biomedical search engines, including NCBI PubMed, iHOP and GOPubMed, to identify suitable papers for the GO annotation of each target protein (for highly researched genes the search is usually limited to human entries only).
- The curators usually associate GO terms with all of the human proteins mentioned in each paper read, depending on the experimental evidence available (occasionally GO terms are also associated with non-human proteins).
- Experimental evidence codes are preferred; however, they are only used when the curator is completely confident of the identity of the protein and the species from which it is derived.
- Reviews are also used to provide an overview of the characteristics of a protein and an insight into the complete set of GO terms required.
- Experimental data relating to model organism proteins may be included in our GO annotation process, through the direct annotation of the model organism protein and the use of the ‘inferred from sequence or structural similarity’ (ISS) evidence code to transfer the information to the orthologous human protein.
- When experimentally supported literature is unobtainable, whether because there is insufficient information about the species from which the protein is derived, because a referenced paper cannot be accessed, or simply because the knowledge is so well accepted that references are not supplied, author statements are used.
- Where possible, we cite the chronologically first paper that provides experimental evidence for the characteristic features of a given human protein.
- We aim to capture the knowledge about each protein using a limited number of papers with experimental evidence.
- We do not annotate every relevant paper if doing so would only duplicate GO terms already associated with the protein.
- GO terms are chosen by querying the GO files with QuickGO or AmiGO.
- Before assigning a GO term, its definition and position within the ontology are checked to ensure its suitability.
- The GO editorial office is contacted, via SourceForge, when a new GO term is required, or modifications are needed to an existing GO term.
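The model organism transfer step in the list above can be sketched as follows. This is a minimal illustration only, not the GOA curation tool: the `Annotation` record, the `transfer_to_ortholog` helper, the placeholder identifiers and the choice to keep the original reference on the transferred annotation are all hypothetical assumptions. It shows the core idea: an experimentally supported annotation made directly to a model organism protein is copied to the human ortholog with the ISS evidence code, and the source protein is recorded in the ‘with/from’ field. The sketch does not check orthology; that judgement stays with the curator.

```python
from dataclasses import dataclass, replace

# GO experimental evidence codes: only annotations with these codes are transferred.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    """A simplified, hypothetical GO annotation record."""
    protein: str         # protein identifier (placeholders in the example)
    go_id: str           # GO term identifier
    evidence: str        # GO evidence code
    reference: str       # supporting reference (kept unchanged here: an assumption)
    with_from: str = ""  # for ISS annotations: the protein the inference came from


def transfer_to_ortholog(source_annotations, human_protein):
    """Copy experimentally supported annotations to a human ortholog as ISS.

    Assumes the curator has already confirmed the orthology relationship.
    Non-experimental annotations (e.g. TAS) are not propagated.
    """
    return [
        replace(ann, protein=human_protein, evidence="ISS", with_from=ann.protein)
        for ann in source_annotations
        if ann.evidence in EXPERIMENTAL_CODES
    ]


if __name__ == "__main__":
    # Hypothetical placeholder data, not real curated annotations.
    mouse = [
        Annotation("MOUSE_PROT_1", "GO:term_A", "IMP", "PMID:placeholder"),
        Annotation("MOUSE_PROT_1", "GO:term_B", "TAS", "PMID:placeholder"),
    ]
    for ann in transfer_to_ortholog(mouse, "HUMAN_PROT_1"):
        print(ann)
```

Restricting the transfer to experimental codes mirrors the preference for experimentally supported evidence described above, and avoids chaining one inference on top of another.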
Computational annotation strategies
Priorities for annotation
Our priority is human genes involved in cardiovascular-related processes, as agreed with the International Scientific Advisory Panel. During the past year we have focused on annotating genes associated with cardiac conduction and the human orthologs of genes associated with the process of heart jogging in zebrafish.
Presentations and Publications
Papers with substantial GO content
- The Impact of Focused Gene Ontology Curation of Specific Mammalian Systems. Yasmin Alam-Faruque, Rachael P. Huntley, Varsha K. Khodiyar, Evelyn B. Camon, Emily C. Dimmer, Tony Sawford, Maria J. Martin, Claire O'Donovan, Philippa J. Talmud, Peter Scambler, Rolf Apweiler, Ruth C. Lovering. 2011 PLoS ONE 6(12): e27541. doi:10.1371/journal.pone
Presentations including Talks and Tutorials and Teaching
- Ruth presented a 15 minute talk entitled ‘HVP and GO annotation’ during the Cardiac Genetics Interest Satellite Group meeting, within the Human Variome Project Biennial Meeting, Paris, June 2012.
- The BHF-UCL GO curators are closely associated with the Cardiovascular Genetics group at UCL and have given 6 presentations at their group meetings.
- Ruth presented a poster entitled ‘Can HVP GO further? Expanding human Gene Ontology…’ at the Human Variome Project Biennial Meeting, Paris, June 2012.
- Ruth and Varsha presented a poster entitled ‘The impact of focused Gene Ontology annotation efforts on high-throughput data analysis’ at the UCL Computational Life and Medical Sciences Symposium, June 2012.
Ontology Development Contributions
During the past year Ruth has been involved in the cardiac conduction development working group. The working group first met at UCL in November 2011, bringing together GO curators, GO editors and several UCL-based expert scientists. To date, 110 GO terms have been created as part of this ontology effort. In addition, between 1st December 2011 and 13/12/12 the BHF-UCL team made 58 SourceForge requests, the majority of which were relevant to cardiovascular processes, for example ‘endocardial cushion cell differentiation’, ‘atrioventricular canal development’ and ‘protein localization to T-tubule’. This brings the total number of GO terms created or requested by the BHF-UCL team to almost 1,700.
- Newsletters: this year the BHF-UCL team has circulated four newsletters (January, April, July and October). They are distributed directly by email to the International Advisory Committee and to individuals who have expressed an interest in this project; indirectly through the mailing lists of several cardiovascular-related societies and the UCL Department of Medicine mailing list; and through our website.
- MSc Project Student, Greg Rowe: the focus of his project was to capture the role of NOTCH signalling pathway genes in heart development. His curation of 32 papers created 588 annotations to 60 proteins. Most of the experimentally supported annotations were to mouse proteins; however, this information has been transferred to the human orthologs wherever possible.
- Website redesign: Varsha has been working with Sonja Van Pragg, from Information Systems, to create a new look for our website in keeping with UCL's corporate web identity.