UCL December 2015


UCL functional annotation December, 2015

Overview

The aim of the University College London (UCL)-based functional annotation team is to provide GO annotation for human genes relevant to cardiovascular or Parkinson's disease, as well as to submit protein-protein interaction data to IntAct. These projects are funded by the British Heart Foundation (BHF) and Parkinson’s UK (with the attribution BHF-UCL and Parkinsons UK-UCL, respectively). We have a successful collaboration between several UCL-based research groups, the European Bioinformatics Institute (EBI) and King's College London. The annotations created by the UCL-based curators are made directly into the GOA database or the IntAct database at the EBI. 4,000 human genes have been prioritised due to their association with cardiovascular processes. The priority gene list relevant to Parkinson's disease (PD) currently contains 214 genes and is being actively developed. Annotation priorities are agreed on a regular basis in consultation with the Co-Grant holders, our International Scientific Advisory Committees and the UCL-based GO curators. The UCL-annotation team has been a GOC member since July 2008.

Staff

  • Dr Ruth Lovering, 1 FTE – UCL-based project manager and PI, UCL funding to July 2018
  • Dr Nancy Campbell, 1 FTE – UCL-based BHF IntAct and GO curator, BHF grant to July 2018
  • Dr Paul Denny, 0.5 FTE – UCL-based Parkinson's curator and project co-ordinator, Parkinson's UK grant to December 2016
  • Dr Rebecca Foulger, 0.8 FTE – UCL-based Parkinson's curator, Parkinson's UK grant to November 2016
  • Dr Rachael Huntley, 1 FTE – UCL-based BHF curator, BHF grant to July 2018
  • Tony Sawford, 0.35 FTE – EBI-based software engineer, BHF grant to July 2018, Parkinson's UK grant to December 2016

No funding via GOC NHGRI grant

Annotation Progress

To 7 November 2015, across all species, BHF-UCL have annotated 4,864 proteins with 34,941 GO terms (including 24,129 terms manually associated with 2,700 human proteins), and Parkinson’s UK-UCL have annotated 1,118 proteins with 4,675 GO terms (including 3,027 terms manually associated with 697 human proteins). In addition, we are actively annotating microRNAs and protein complexes and have associated 740 GO terms with 110 RNAs and 104 GO terms with 31 protein complexes.

These figures underestimate the true number of annotations that have been created, because the majority of annotation rows include complex annotation extension data. This is particularly true for the annotation of microRNAs, where multiple annotation statements are included to describe the gene regulation targets of a single microRNA. Consequently, while we have associated 647 GO annotations with 88 microRNAs, unrolling the annotation extensions identifies 930 unique annotations.
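
The idea of "unrolling" annotation extensions can be illustrated with a minimal Python sketch. It assumes the GAF convention in which pipe-separated entries in the annotation extension field are independent statements, while comma-separated entries belong to a single combined statement. The identifiers, the relation name and the function names below are hypothetical placeholders chosen for illustration; this is not the script used to produce the figures above.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_product: str
    go_id: str
    extension: str  # one comma-joined (AND) group, or "" when there is none


def unroll(gene_product: str, go_id: str, extension_field: str) -> list[Annotation]:
    """Split one annotation row into one annotation per pipe-separated group."""
    groups = [g.strip() for g in extension_field.split("|") if g.strip()]
    if not groups:
        # A row without extensions still counts as a single annotation.
        return [Annotation(gene_product, go_id, "")]
    return [Annotation(gene_product, go_id, group) for group in groups]


def count_unique(rows: list[tuple[str, str, str]]) -> int:
    """Count distinct annotations after unrolling every row."""
    unique: set[Annotation] = set()
    for gene_product, go_id, extension_field in rows:
        unique.update(unroll(gene_product, go_id, extension_field))
    return len(unique)


# Hypothetical rows: (gene product, GO term, annotation extension field).
rows = [
    ("miRNA:EXAMPLE_1", "GO:EXAMPLE",
     "regulates_expression_of(TARGET_A)|regulates_expression_of(TARGET_B)"
     "|regulates_expression_of(TARGET_C)"),
    ("miRNA:EXAMPLE_1", "GO:EXAMPLE", "regulates_expression_of(TARGET_B)"),
    ("miRNA:EXAMPLE_2", "GO:EXAMPLE", ""),
]

print(len(rows), "rows unroll to", count_unique(rows), "unique annotations")
```

In this toy input the first row lists three targets of one microRNA, so a count of rows understates the number of distinct statements, which is the effect described above.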

Methods and strategies for annotation

Literature curation (100%):

We annotate with both a gene product-centric and process-centric focus.

  • The process-centric annotation enables the curators to gain a better understanding of the process under focus.
  • The gene product-centric annotation approach is undertaken when annotating proteins on a specific cardiovascular or Parkinson’s-relevant list, such as loci identified by Genome-Wide Association Studies (GWAS). In addition, we annotate gene products following requests or suggestions from scientists working on cardiovascular disease or Parkinson’s Disease, or when checking annotations by attendees of our MSc module or 2-day annotation workshops.

Both the cardiovascular and Parkinson’s projects focus on the annotation of human gene products, and the following approaches are taken:

  • The approved gene symbol (and relevant gene and protein aliases) is used to query a variety of biomedical search engines (including PubMed and iHOP) and databases (e.g. UniProtKB, IntAct, RNAcentral) to identify suitable papers for the GO annotation of each target gene product (for highly researched genes the search is usually limited to human entries only). Papers are also identified through communications with researchers in the field and from papers highlighted at relevant conferences and workshops.
  • The curators will usually associate GO terms with all of the human gene products mentioned in each paper, depending on the experimental evidence available (occasionally GO terms are associated with non-human proteins too).
  • To ensure a rapid improvement in the annotations available for a large number of human proteins the curators aim to spend a maximum of one day researching the literature associated with each protein.
  • The protein is marked as ‘complete’ if the curator feels that a comprehensive annotation of the protein has been achieved.
  • Preference is given to the use of experimental-based evidence codes; however, these are only used when the curator is completely confident of the identity of the gene product and its derivative species.
  • Experimental data relating to model organism proteins may be included in the annotation of human gene products, through the direct annotation of the model organism gene product and the use of the ‘inferred by sequence similarity’ evidence code to transfer the information to the orthologous human gene products. This is only undertaken when there is clear evidence for orthology but this orthology is not identified by the Ensembl Compara annotation pipeline.
  • When experimentally supported literature is unobtainable, due to insufficient information in the paper about the species the gene product is derived from, the lack of access to a referenced paper, or simply because the knowledge is considered so well accepted that references are not supplied, author statements are applied.
  • Reviews are also used to provide an overview of the characteristics of a gene product and an insight into the complete set of GO terms required, when the volume of literature is too extensive to interrogate within the time available.
  • We aim to capture the knowledge about each gene product using a limited number of papers, with experimental evidence; we do not annotate all relevant papers, if this will lead to repeated duplication of GO terms associated to the gene product.
  • GO terms are chosen by querying the GO files with QuickGO, AmiGO or OBO-Edit.
  • Before assigning a GO term, its definition and position within the ontology are checked to ensure its suitability.
  • When a new GO term is required or modifications are needed to an existing GO term, the GO editorial office is contacted via GitHub, or the term is requested using TermGenie. Curators with write-access to the ontologies generally commit the changes themselves.
  • We also use the annotation extension field to provide the most specific and descriptive information in each GO annotation that we can to represent the experimental data available.
  • To ensure comprehensive annotation of the proteins involved in a specific biological process is achieved, the curator spends two weeks at the end of each time period curating reviews on the topic, capturing annotations using the author statement evidence codes.
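
The orthology-based transfer described in the list above can be sketched as a small data transformation. The sketch assumes that an ‘inferred by sequence similarity’ annotation uses the ISS evidence code, records the model organism gene product in the With/From field, and reuses the source reference. The class, the function and all identifiers are hypothetical and do not represent the curation tool's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

# GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence: str
    reference: str
    with_from: str = ""


def transfer_by_iss(source: Annotation, human_ortholog: str) -> Optional[Annotation]:
    """Build an ISS annotation for the human ortholog of an annotated gene product.

    Returns None when the source annotation is not experimentally supported,
    because only experimental data are transferred in this sketch.
    """
    if source.evidence not in EXPERIMENTAL_CODES:
        return None
    return Annotation(
        gene_product=human_ortholog,
        go_id=source.go_id,
        evidence="ISS",
        reference=source.reference,      # assumption: same reference is reused
        with_from=source.gene_product,   # points back to the model organism entry
    )


# Hypothetical identifiers, for illustration only.
mouse = Annotation("UniProtKB:MOUSE_EXAMPLE", "GO:EXAMPLE", "IDA", "PMID:EXAMPLE")
human = transfer_by_iss(mouse, "UniProtKB:HUMAN_EXAMPLE")
print(human)
```

Whether the orthology is sufficiently clear to justify the transfer remains a curator decision, which this sketch does not attempt to model.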

Computational annotation strategies:

UCL curators contribute to the improvement of automatic annotation pipelines by communicating any errors we identify to the relevant annotation provider.

Priorities for annotation:

  • BHF funded project: Human gene products involved in cardiovascular-related processes, as agreed by the International Scientific Advisory Committee. During the past year we have been focusing on the GO annotation of proteins associated with cardiac conduction, folic acid metabolism, hereditary hemochromatosis and telomeres. We have also prioritised annotation of cardiovascular-relevant miRNAs that are in clinical trials. In addition, we have been capturing protein-protein interactions (PPIs) through the submission of PPIs to IntAct. We have developed a pipeline to facilitate the creation of a PSICQUIC dataset containing all protein interaction data available from the GOC website. Although this data is not IMEx standard, it is higher quality than text-mined data and is now readily available for Cytoscape users.
  • Parkinson's UK funded project: Human genes involved in neurological-related processes, as agreed by the grant co-applicants, our International Scientific Advisory Committee and additional expert scientists. Curators spend approximately three months on a prioritised topic; in 2015, the topics curated include ‘WNT signaling’, ‘ERAD (endoplasmic reticulum-associated protein degradation) pathway’, ‘unfolded protein response’, ‘autophagy’ and high-throughput imaging data generated by Parkinson’s funded projects.

Presentations and Publications

a. Papers with substantial GO content

  1. Manzoni, C., Denny, P., Lovering, R. C., & Lewis, P. A. (2015). Computational analysis of the LRRK2 interactome. PeerJ, 2015 (2). doi:10.7717/peerj.778
  2. Kalea AZ, Hoteit R, Suvan J, Lovering RC, Palmen J, Cooper JA, Khodiyar VK, Harrington Z, Humphries SE, D'Aiuto F. (2015). Upregulation of gingival tissue miR-200b in obese periodontitis subjects. Journal of Dental Research, 94, 59S-69S. doi:10.1177/0022034514568197
  3. Patel, S., Roncaglia, P., & Lovering, R. C. (2015). Using Gene Ontology to describe the role of the neurexin-neuroligin-SHANK complex in human, mouse and rat and its relevance to autism. BMC Bioinformatics, 16, 186. doi:10.1186/s12859-015-0622-0
  4. Blake JA, Christie KR, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov D, Burgess S, Buza T, Gresham C, McCarthy F, Pillai L, Wang H, Carbon S, Dietze H, Lewis SE, Mungall CJ, Munoz-Torres MC, Feuermann M, Gaudet P, Basu S, Chisholm RL, Dodson RJ, Fey P, Mi H, Thomas PD, Muruganujan A, Poudel S, Hu JC, Aleksander SA, McIntosh BK, Renfro DP, Siegele DA, Attrill H, Brown NH, Tweedie S, Lomax J, Osumi-Sutherland D, Parkinson H, Roncaglia P, Lovering RC, Talmud PJ, Humphries SE, Denny P, Campbell NH, Foulger RE, Chibucos MC, Gwinn Giglio M, Chang HY, Finn R, Fraser M, Mitchell A, Nuka G, Pesseat S, Sangrador A, Scheremetjew M, Young SY, Stephan R, Harris MA, Oliver SG, Rutherford K, Wood V, Bahler J, Lock A, Kersey PJ, McDowall MD, Staines DM, Dwinell M, Shimoyama M, Laulederkind S, Hayman GT, Wang SJ, Petri V, D'Eustachio P, Matthews L, Balakrishnan R, Binkley G, Cherry JM, Costanzo MC, Demeter J, Dwight SS, Engel SR, Hitz BC, Inglis DO, Lloyd P, Miyasato SR, Paskov K, Roe G, Simison M, Nash RS, Skrzypek MS, Weng S, Wong ED, Berardini TZ, Li D, Huala E, Argasinska J, Arighi C, Auchincloss A, Axelsen K, Argoud-Puy G, Bateman A, Bely B, Blatter MC, Bonilla C, Bougueleret L, Boutet E, Breuza L, Bridge A, Britto R, Casals C, Cibrian-Uhalte E, Coudert E, Cusin I, Duek-Roggli P, Estreicher A, Famiglietti L, Gane P, Garmiri P, Gos A, Gruaz-Gumowski N, Hatton-Ellis E, Hinz U, Hulo C, Huntley R, Jungo F, Keller G, Laiho K, Lemercier P, Lieberherr D, MacDougall A, Magrane M, Martin M, Masson P, Mutowo P, O'Donovan C, Pedruzzi I, Pichler K, Poggioli D, Poux S, Rivoire C, Roechert B, Sawford T, Schneider M, Shypitsyna A, Stutz A, Sundaram S, Tognolli M, Wu C, Xenarios I, Chan J, Kishore R, Sternberg PW, Van Auken K, Muller HM, Done J, Li Y, Howe D, Westerfield M. (2015). Gene Ontology Consortium: Going forward. Nucleic Acids Research, 43 (D1), D1049-D1056. doi:10.1093/nar/gku1179.

b. Presentations including Talks and Tutorials and Teaching

  • Talks at international meetings:
      • International Parkinson's Disease Genomics Consortium Meeting. UCL, London, 11th-12th September 2015. Paul gave a 20 minute presentation entitled: Parkinson’s Disease and Gene Ontology Annotation.
      • 10th International Meeting on MicroRNAs, non-coding RNAs and Genome Editing. Peterhouse College, University of Cambridge, 2-3 November 2015. Rachael gave a 20 minute presentation entitled: MicroRNA functional annotation.
  • Talks at UK meetings:
      • RNAcentral Consortium and Scientific Advisory Board Meeting. European Bioinformatics Institute (EMBL-EBI), Hinxton, 30 November-1 December 2015. Rachael gave a 15 minute presentation entitled: MicroRNA functional annotation.
  • 1 week teaching GO annotation to VU curators.
  • The UCL GO curators are closely associated with the Cardiovascular Genetics and Molecular Neuroscience groups at UCL, and the UCL-London-School-Edinburgh-Bristol (UCLEB) consortium of population-based prospective studies and have given several presentations at their group meetings.
  • In April 2015 the UCL team ran a 2-day GO annotation workshop at UCL. This workshop was attended by 30 UCL PhD students, post-docs, lecturers and professors, who learnt how to use some of the freely available biological databases, how to analyse high-throughput datasets and also contributed GO annotations to the GOA database. Several of the UCL scientists contributed annotations during the month after the workshop.
  • The UCL team taught a ‘bioinformatics’ module for Genetics of Human Disease MSc students this year. By focusing on the review of a GWAS risk-associated SNP, the students constructively apply their newly acquired knowledge of a variety of online biological resources, including Ensembl, NCBI Gene, IntAct, Cytoscape, UniProt, QuickGO, AmiGO and functional analysis tools. In addition, the students learn the importance of including full experimental detail in scientific publications.
  • We trained 2 Genetics of Human Disease MSc students to annotate. Each submitted around 300 annotations and received a distinction for both their project and their MSc.

c. Poster presentations

  • 83rd European Atherosclerosis Society Congress, Glasgow, 22nd-25th March 2015. Poster presented by Ruth (during moderated poster session) entitled: The Cardiovascular Gene Annotation Initiative: Current and Future Aims.
  • First UK Autophagy Network Meeting. Warwick, 6th-7th May 2015. Poster presented by Paul entitled: Autophagy to GO.
  • UCL Neuroscience Symposium. UCL, 19th June 2015. Poster presented by Paul entitled: Using Gene Ontology to characterise key participants in Parkinson's disease.

Other Highlights:

A. Ontology Development Contributions:

  • Through both the creation of terms via TermGenie, and direct editing of the ontologies by the GO editors, 390 terms were requested/created by the UCL team this year.
  • Rachael has spent considerable time working on the microRNA GO annotation guidelines, with a paper submitted to RNA recently.
  • Ruth and Rachael have been working on the documentation for the annotation extension relations: identifying relations that need deprecation and annotations that need revision, and improving the guidelines for the application of annotation extension information.

B. Annotation Outreach and User Advocacy Efforts:

  • Our 2-day GO annotation workshop at UCL, in May, provided 30 UCL researchers (see teaching listed above) with detailed information about GO, the use of GO within functional analysis tools and how to create GO annotations, as well as information about other bioinformatics resources.
  • The UCL team undertook a full term of teaching MSc students (see teaching listed above) and informed 25 UCL MSc students about Gene Ontology and other bioinformatics resources.

C. Other Highlights:

  • The UCL annotation teams continue to circulate quarterly newsletters for both projects and create regular tweets about our project news @UCLGene.