UCL December 2014



Overview

The aim of the University College London (UCL)-based annotation team is to provide GO annotation for human genes relevant to cardiovascular or Parkinson's disease, as well as to submit protein-protein interaction data to IntAct. These projects are funded by the British Heart Foundation (BHF) and Parkinson’s UK (with the attribution BHF-UCL and Parkinsons UK-UCL, respectively). We have a successful collaboration between several UCL-based research groups, the European Bioinformatics Institute (EBI) and King's College, London. The annotations created by the UCL-based curators are made directly into the GOA database or the IntAct database at the EBI. 4,000 human genes have been prioritised due to their association with cardiovascular processes. The priority gene list relevant to Parkinson's disease (PD) currently contains 214 genes and is being actively developed. Annotation priorities are agreed on a regular basis in consultation with the Co-Grant holders, our International Scientific Advisory Committees and the UCL-based GO curators. The UCL annotation team has been a GOC member since July 2008.


Staff

  • Dr Ruth Lovering, 1 FTE – UCL-based curator, BHF scholarship to February 2014, UCL funding to July 2018
  • Dr Nancy Campbell, 1 FTE – UCL-based IntAct and GO curator, BHF grant to July 2018 (returned from maternity leave in September 2014)
  • Dr Anna Melidoni, 1 FTE – UCL-based IntAct and GO curator, BHF grant (to 30 April 2014)
  • Dr Milagros Rodríguez-López, 1 FTE – UCL-based IntAct curator, BHF grant (to 31 August 2014)
  • Dr Paul Denny, 0.5 FTE – UCL-based curator and Parkinson's project co-ordinator, Parkinson's UK grant to December 2016 (joined 1 January 2014)
  • Dr Rebecca Foulger, 0.8 FTE – UCL-based curator, Parkinson's UK grant to October 2016 (joined 1 May 2014)
  • Dr Rachael Huntley, 1 FTE – UCL-based curator, BHF grant to July 2018 (joined 10 November 2014)
  • Tony Sawford, 0.35 FTE – EBI-based Software engineer, BHF grant to July 2018, Parkinson's UK grant to December 2016


No funding via GOC NHGRI grant


Annotation Progress

Ruth has trained 3 curators in GO annotation, and the annotation progress reflects the priority of this project to annotate human genes.

To 22 November 2014, across all species BHF-UCL have annotated 4,395 proteins with 31,794 GO terms (including 21,954 terms manually associated with 2,438 human proteins), and Parkinson’s UK-UCL have annotated 480 proteins with 2,329 GO terms (including 1,355 terms manually associated with 256 human proteins).
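
As a rough illustration of how these totals reflect the focus on human genes, the share of each project's annotated proteins and GO terms that are manual human annotations can be worked out from the figures above. This is only a sketch: the dictionary layout and the `human_share` function are made up for the example, and the percentages are simple ratios of the reported counts, not figures taken from the GOA database.

```python
# Counts reported above (to 22 November 2014), across all species and for human only.
projects = {
    "BHF-UCL": {
        "proteins": 4395, "terms": 31794,
        "human_proteins": 2438, "human_terms": 21954,
    },
    "Parkinson's UK-UCL": {
        "proteins": 480, "terms": 2329,
        "human_proteins": 256, "human_terms": 1355,
    },
}


def human_share(counts):
    """Return (fraction of proteins that are human, fraction of terms on human proteins)."""
    protein_share = counts["human_proteins"] / counts["proteins"]
    term_share = counts["human_terms"] / counts["terms"]
    return protein_share, term_share


for name, counts in projects.items():
    protein_share, term_share = human_share(counts)
    print(f"{name}: {protein_share:.1%} of proteins, {term_share:.1%} of GO terms are human")
```

On these counts, roughly 69% of the BHF-UCL terms and roughly 58% of the Parkinson’s UK-UCL terms are manually associated with human proteins.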


Methods and strategies for annotation

Literature curation (100%):

- We annotate with both a protein-centric and process-centric focus. The process-centric annotation enables the curators to gain a better understanding of the process under focus.

- The protein-centric annotation is undertaken when annotating proteins on a specific cardiovascular or Parkinson’s-relevant list, such as loci identified by Genome-Wide Association Studies (GWAS). In addition, we annotate proteins following requests or suggestions from scientists working on cardiovascular disease or Parkinson’s Disease, or when checking annotations by attendees of our MSc module or 2-day annotation workshops.


Both the cardiovascular and Parkinson’s projects focus on the annotation of human proteins, and the following approaches are taken:

  • The approved gene symbol (and relevant gene and protein aliases) is used to query a variety of biomedical search engines (including PubMed and iHOP) and databases (e.g. UniProtKB, IntAct) to identify suitable papers for the GO annotation of each target protein (for highly researched genes the search is usually limited to human entries only). Papers are also identified through communications with researchers in the field and from papers highlighted at relevant conferences and workshops.
  • The curators will usually associate GO terms with all of the human proteins mentioned in each paper, depending on the experimental evidence available (occasionally GO terms are associated with non-human proteins too).
  • To ensure a rapid improvement in the annotations available for a large number of human proteins the curators aim to spend a maximum of one day researching the literature associated with each protein.
  • The protein is marked as ‘complete’ if the curator feels that a comprehensive annotation of the protein has been achieved.
  • Preference is given to the use of experimental evidence codes; however, these are only used when the curator is completely confident of the identity of the protein and the species from which it is derived.
  • Experimental data relating to model organism proteins may be included in the annotation of human proteins, through the direct annotation of the model organism protein and the use of the ‘inferred by sequence similarity’ evidence code to transfer the information to the orthologous human proteins. This is only undertaken when there is clear evidence for orthology but this orthology is not identified by the Ensembl-compara annotation pipeline.
  • When experimentally supported literature is unobtainable, due to insufficient information in the paper about the species the protein is derived from, the lack of access to a referenced paper, or simply because the knowledge is considered so well accepted that references are not supplied, author statements are applied.
  • Reviews are also used to provide an overview of the characteristics of a protein and an insight into the complete set of GO terms required, when the volume of literature is too extensive to interrogate within the time available.
  • We aim to capture the knowledge about each protein using a limited number of papers with experimental evidence; we do not annotate every relevant paper if this would only duplicate GO terms already associated with the protein.
  • GO terms are chosen by querying the GO files with QuickGO, AmiGO or OBO-Edit.
  • Before assigning a GO term, its definition and position within the ontology are checked to ensure its suitability.
  • When a new GO term is required or modifications are needed to an existing GO term, the GO editorial office is contacted via SourceForge, or the term is requested using TermGenie. Curators with write-access to the ontologies generally commit the changes themselves.
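
The transfer of experimental annotations from a model organism protein to its human ortholog, described in the list above, can be sketched as follows. This is a minimal illustration rather than the curation tooling itself: the `Annotation` class, the `transfer_by_similarity` function, the ortholog mapping and every identifier in the usage example are hypothetical. Only the evidence code abbreviations (EXP, IDA, IPI, IMP, IGI, IEP and ISS) are standard GO evidence codes.

```python
from dataclasses import dataclass

# Standard GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    protein_id: str      # annotated protein
    go_id: str           # GO term identifier
    evidence: str        # GO evidence code
    reference: str       # supporting paper
    with_from: str = ""  # source protein for a similarity-based annotation


def transfer_by_similarity(annotations, orthologs):
    """Copy experimental annotations to human orthologs using the ISS code.

    `orthologs` maps a model organism protein ID to a human protein ID and is
    assumed to contain only pairs a curator has judged to be clear orthologs.
    """
    transferred = []
    for ann in annotations:
        human_id = orthologs.get(ann.protein_id)
        # Skip proteins without a curated ortholog and non-experimental evidence.
        if human_id is None or ann.evidence not in EXPERIMENTAL_CODES:
            continue
        transferred.append(
            Annotation(human_id, ann.go_id, "ISS", ann.reference,
                       with_from=ann.protein_id)
        )
    return transferred


# Hypothetical example data: placeholder protein IDs and reference.
mouse_annotations = [
    Annotation("UniProtKB:MOUSE_X", "GO:0006979", "IMP", "PMID:0000000"),
    Annotation("UniProtKB:MOUSE_X", "GO:0006979", "TAS", "PMID:0000000"),
    Annotation("UniProtKB:MOUSE_Y", "GO:0006979", "IDA", "PMID:0000000"),
]
curated_orthologs = {"UniProtKB:MOUSE_X": "UniProtKB:HUMAN_X"}

for ann in transfer_by_similarity(mouse_annotations, curated_orthologs):
    print(ann)
```

Recording the source protein in a separate field keeps the inferred human annotation traceable to the experimental annotation it came from.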

For the Parkinson’s Disease project, the following approaches are also taken:

  • Curators spend approximately three months on a prioritised topic; in 2014, the topics curated include ‘response to oxidative stress’, ‘synaptic transmission’, ‘response to endoplasmic reticulum stress’ and ‘mitophagy’.
  • Where experimental evidence does not exist for a prioritised human protein, annotations are made to orthologous mouse, rat or fly proteins, and the annotations transferred to the relevant human protein, based on sequence similarity. This approach is undertaken following consultations with our scientific advisory committee.
  • To ensure comprehensive annotation of the biological process is achieved, the curator spends two weeks at the end of each time period curating reviews on the topic, capturing annotations using the author statement evidence codes.


Computational annotation strategies:

UCL curators contribute to the improvement of automatic annotation pipelines by communicating any errors we identify to the relevant annotation provider.


Priorities for annotation

  • BHF funded project: Human genes involved in cardiovascular-related processes, as agreed by the International Scientific Advisory Committee. During the past year we have been focusing on the GO annotation of cardiac conduction associated proteins. In addition we have been capturing protein-protein interactions (PPIs) through the submission of PPIs to IntAct. The majority of PPIs submitted include proteins associated with GWAS lipid traits and we have revisited previously annotated papers to capture PPIs that had been submitted to GO.
  • Parkinson's UK funded project: Human genes involved in neurological-related processes, as agreed by the grant co-applicants, our International Scientific Advisory Committee and additional expert scientists.
  • CAFA project: We have assisted the GOA team with this project by curating the primary functions and processes of the proteins that are included on both the CAFA priority list and the UCL-priority lists. This has helped populate these targets with functional annotations to assist in the assessment of the CAFA competition.


Presentations and Publications

a. Papers with substantial GO content

  • “From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development.” Khodiyar V. K., Howe D., Talmud P., Breckenridge R., Lovering R. C. (2013). F1000Res. 2:242. PMID:24627794.
  • "A method for increasing expressivity of Gene Ontology annotations using a compositional approach." Rachael P Huntley, Midori A Harris, Yasmin Alam-Faruque, Judith A Blake, Seth Carbon, Heiko Dietze, Emily C Dimmer, Rebecca E Foulger, David P Hill, Varsha K Khodiyar, Antonia Lock, Jane Lomax, Ruth C Lovering, Prudence Mutowo-Meullenet, Tony Sawford, Kimberly Van Auken, Valerie Wood and Christopher J Mungall. 2014 BMC Bioinformatics 15(1):155. PMID:24885854.


b. Presentations including Talks and Tutorials and Teaching

  • The UCL GO curators are closely associated with the Cardiovascular Genetics and Molecular Neuroscience groups at UCL, and the UCL-London-School-Edinburgh-Bristol (UCLEB) consortium of population-based prospective studies and have given several presentations at their group meetings:
  1. Paul Denny: "Focusing the Gene Ontology on Parkinson's Disease-relevant Proteins", Abstract: An introduction to the Gene Ontology (GO) and description of our plans for applying GO annotation to proteins relevant to Parkinson's Disease. 40 minute presentation to Molecular Neuroscience group, UCL 24 January 2014 (talk).
  2. Ruth Lovering: "Review of MSc autism annotation project", 30 minute presentation to Cardiovascular Genetics, UCL, 11 February 2014 (talk).
  3. Paul Denny: "Parkinson's Gene Annotation". 20 minute presentation to the Cardiovascular Genetics group, UCL, 24 April 2014 (talk).
  4. Mila Rodriguez: “Cardiovascular Gene Annotation”. 20 minute presentation to the Cardiovascular Genetics group, UCL, 10 June 2014 (talk).
  5. Rebecca Foulger: “Using Gene Ontology to Characterise Key Players in Parkinson’s Disease”. 20 minute presentation to the Cardiovascular Genetics group, UCL, 28 October 2014 (talk).
  6. Mila Rodriguez: “Cardiovascular Gene Annotation”. 20 minute presentation to the Cardiovascular Genetics group, UCL, 25 November 2014 (talk).
  • In addition we have participated in national meetings:
  1. Rebecca Foulger: “Using Gene Ontology to Characterise Key Players in Parkinson’s Disease”. 3 November 2014, Parkinson’s UK Research Conference, York, UK. (talk).
  2. Paul Denny: “Making big data a reality for Parkinson’s”. 3 November 2014, Parkinson’s UK Research Conference, York, UK. (panel discussion).
  • In May 2014 the UCL team ran a 2-day GO annotation workshop at UCL. This workshop was attended by 30 UCL PhD students, post-docs, lecturers and professors, who learnt how to use some of the freely available biological databases, how to analyse high-throughput datasets and also contributed GO annotations to the GOA database. Several of the UCL scientists contributed annotations during the month after the workshop. The workshop was run by Ruth Lovering, and supported by Paul Denny and Rebecca Foulger (tutorial).
  • The UCL team taught a ‘bioinformatics’ module for Genetics of Human Disease MSc students this year. By focusing on the review of a GWAS risk-associated SNP the students constructively apply their newly acquired knowledge of a variety of online biological resources, including Ensembl, EntrezGene, IntAct, Cytoscape, UniProt, QuickGO, AmiGO and functional analysis tools. In addition, the students learn the importance of including full experimental detail in scientific publications.


c. Poster presentations

  • Rebecca Foulger: “Using Gene Ontology to Characterise Key Players in Parkinson’s Disease”. 19 June 2014, UCL Neuroscience Symposium.
  • Paul Denny and Rebecca Foulger: “Using Gene Ontology to Characterise Key Players in Parkinson’s Disease”. 23 July 2014, UK Parkinson’s Disease Consortium Celebration Day.
  • Ruth Lovering: “The Cardiovascular Gene Annotation Initiative: Impact on data analysis”. 25-26 September 2014 British Atherosclerosis Society meeting, Cambridge, UK.
  • Rebecca Foulger: “Using Gene Ontology to Characterise Key Players in Parkinson’s Disease”. 3-4 October 2014, 2nd International Parkinson's Disease Symposium, Luxembourg.
  • Rebecca Foulger and Paul Denny: “Using Gene Ontology to Characterise Key Players in Parkinson’s Disease”. 3-4 November 2014, Parkinson’s UK Research Conference, York, UK.

Other Highlights:

A. Ontology Development Contributions:

This year, the BHF-UCL team has made 26 SourceForge requests and the PARL-UCL (Parkinson's UK) team has made 31 SourceForge requests. Through both the creation of terms via TermGenie and direct editing of the ontologies, 151 terms were requested or created by PARL-UCL this year. Additional new requests from the BHF-UCL team bring the total number of GO terms requested and/or created by the BHF-UCL team over the course of the project to 1,885.

B. Annotation Outreach and User Advocacy Efforts:

  • Our 2-day GO annotation workshop at UCL, in May, provided 30 UCL researchers (see teaching listed above) with detailed information about GO, the use of GO within functional analysis tools and how to create GO annotations, as well as information about other bioinformatics resources.
  • Ruth Lovering participated in a workshop in September at the Genome Analysis Centre, Norwich, to develop bioinformatics training for clinicians. This workshop set out a roadmap toward developing a bioinformatics training programme designed for the needs of health and medical researchers, based on the concept that if clinicians understand bioinformatics, a plethora of genetic data becomes available to them.
  • The UCL team undertook a full term of teaching MSc students (see teaching listed above) and informed 23 UCL MSc students about Gene Ontology and other bioinformatic resources.
  • Rebecca Foulger attended a workshop for the Parkinson’s Disease map (http://minerva.uni.lu/MapViewer/?id=pdmap) at the University of Luxembourg on October 3rd, taking part in a focus group on the oxidative stress response. We are continuing to collaborate with PD-map and Reactome in curating proteins relevant to Parkinson’s Disease.

C. Other Highlights:

  • The UCL annotation teams continue to circulate quarterly newsletters for both projects. In summer 2014 a Twitter account (@UCLGene) was also set up for researchers and users to follow project news.
  • Paul Denny joined the group on 1 January 2014 to co-ordinate the Parkinson’s Disease project.
  • Rebecca Foulger joined the group on 1 May 2014 as a Parkinson’s Disease project curator.
  • Nancy Campbell returned from maternity leave on 17 September 2014, and is annotating telomere-associated proteins as part of the cardiovascular project.
  • Rachael Huntley joined the group on 10 November 2014 to annotate micro-RNAs using GO as part of the cardiovascular project.