MGI December 2016
[Please include FTEs working on GOC tasks, also designating how many FTEs are funded by the GOC NHGRI grant]
Karen R Christie*
Mary E Dolan
Harold J Drabkin*
* Funded entirely or partially by GO
|Annotation category||Current||Previous||Change||% Change|
|Total Genes annotated with at least one GO term of any kind||24213||24224||-11*||-0.05|
|Total non-IEA Annotation|
|Total Number of Genes:||24032||23979||53||0.22|
|Annotation by Direct Experiment|
|MGI Curated Mouse Genes||12624||12433||191||1.54|
|MGI Curated Annotations||89907||87159||2748||3.15|
|GOA Curated Mouse Genes:||5424||5075||349||6.88|
|GOA Curated Annotations:||33530||30177||3353||11.11|
|Annotation by Orthology|
|Total Genes Annotated by Orthology||12067||11866||201||1.69|
|Total Orthology Annotation||106607||102212||4395||4.30|
|Genes Annotated by Human Orthology Load (GOA)||10942||10701||241||2.25|
|Total Annotation by Human Orthology Load||71680||68129||3551||5.21|
|Genes Annotated by Rat Orthology Load (RGD)||4849||4696||153||3.26|
|Total Annotations by Rat Orthology Load||31405||30769||636||2.07|
|Genes Annotated by Phylogeny||8153||6400||1753||27.39|
|Total Annotations by Phylogeny||29434||22522||6912||30.69|
|Total Genes with IEA Annotations||14815||14724||91||0.62|
|Total IEA Annotations||82481||100509||-18028||-17.94|
|Total Genes with SwissProt to GO Annotations||14440||14337||103||0.72|
|Total SwissProt to GO Annotations||57420||56888||532||0.94|
|Total Genes with InterPro to GO Annotations||10103||9966||137||1.37|
|Total InterPro to GO Annotations||24074||24408||-334||-1.37|
|Total Genes with EC to GO Annotations*||817||1709||-892||-52.19|
|Total EC to GO Annotations*||987||19213||-18226||-94.86|
* Loss due to EC2GO refactoring (no annotations to EC root terms).
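The last two columns of the table above appear to be derived from the two count columns: the change is the current count minus the previous count, and the percent change is that difference relative to the previous count. A minimal Python sketch under that assumption follows; the function name `change_and_percent` is illustrative and not part of any MGI tooling.

```python
def change_and_percent(current, previous):
    """Return (absolute change, percent change relative to the previous count)."""
    change = current - previous
    percent = round(100.0 * change / previous, 2)
    return change, percent


# Rows taken from the table above: (label, current, previous)
rows = [
    ("MGI Curated Mouse Genes", 12624, 12433),
    ("Total IEA Annotations", 82481, 100509),
]

for label, current, previous in rows:
    change, percent = change_and_percent(current, previous)
    print(f"{label}: {change} ({percent}%)")
```

Running the same calculation over every row is a quick way to catch transcription errors in the reported figures.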
Methods and strategies for annotation
Literature curation continues to be the major focus of our annotation efforts.
Computational annotation strategies:
As always, current strategies involve the use of translation tables to mine SwissProt keywords, InterPro domains, and EC numbers for IEA annotation. These run automatically on a nightly basis and require little human intervention.
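The translation-table approach described above can be sketched as follows. This is only an illustration, not the MGI pipeline: it assumes mapping lines of the form `external_id > GO:term name ; GO:id` with `!` comment lines, and the function names, gene symbol, and identifiers are hypothetical placeholders.

```python
def parse_translation_table(lines):
    """Map each external identifier to the list of GO IDs it translates to."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comment lines
            continue
        external, _, go_part = line.partition(" > ")
        if not go_part:  # line does not follow the assumed format
            continue
        # The GO ID is the text after the last ';' on the line.
        go_id = go_part.rsplit(";", 1)[-1].strip()
        mapping.setdefault(external.strip(), []).append(go_id)
    return mapping


def iea_annotations(gene_xrefs, mapping):
    """Yield (gene, GO ID, evidence code) for each cross-reference with a mapping."""
    for gene, xrefs in gene_xrefs.items():
        for xref in xrefs:
            for go_id in mapping.get(xref, []):
                yield gene, go_id, "IEA"


# Hypothetical sample data; the identifiers are placeholders, not real mappings.
table = [
    "! comment line",
    "EC:9.9.9.9 > GO:placeholder activity ; GO:9999999",
]
genes = {"GeneA": ["EC:9.9.9.9", "EC:8.8.8.8"]}

annotations = list(iea_annotations(genes, parse_translation_table(table)))
print(annotations)
```

Cross-references without an entry in the table (here `EC:8.8.8.8`) simply produce no annotation, which is consistent with the process needing little human intervention.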
Harold Drabkin monitors weekly QC reports on manual and automatic annotation stats, and responds to questions about specific annotations as required.
Priorities for annotation
- Isoform curation (Harold, Karen, Protein Ontology project): focusing on genes that have isoforms or whose products are modified, and coordinating with the Protein Ontology Protein Complex project.
- Genes with no GO annotation but with literature (Li and Dmitry)
- Genes with only IEA annotation but with literature (Li)
- Genes marked as having GO annotation completed, but now having new literature (Dmitry)
- Genes that have an annotation to one of the three root nodes of GO, but have new literature (David)
- Dmitry has been focused on annotation of miRNAs in MGI
- Annotation of ciliary genes (Karen)
- Annotation of metabolic genes: glycolysis, pyruvate metabolism, and carbohydrate catabolism in general (David)
- Autophagy genes
Presentations and Publications
a. Papers with substantial GO content
- High-throughput discovery of novel developmental phenotypes. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ, Westerberg H, Adissu H, Baker CN, Bower L, Brown JM, Caddle LB, Chiani F, Clary D, Cleak J, Daly MJ, Denegre JM, Doe B, Dolan ME, Edie SM, Fuchs H, Gailus-Durner V, Galli A, Gambadoro A, Gallegos J, Guo S, Horner NR, Hsu CW, Johnson SJ, Kalaga S, Keith LC, Lanoue L, Lawson TN, Lek M, Mark M, Marschall S, Mason J, McElwee ML, Newbigging S, Nutter LM, Peterson KA, Ramirez-Solis R, Rowland DJ, Ryder E, Samocha KE, Seavitt JR, Selloum M, Szoke-Kovacs Z, Tamura M, Trainor AG, Tudose I, Wakana S, Warren J, Wendling O, West DB, Wong L, Yoshiki A; International Mouse Phenotyping Consortium; Jackson Laboratory; Infrastructure Nationale PHENOMIN, Institut Clinique de la Souris (ICS); Charles River Laboratories; MRC Harwell; Toronto Centre for Phenogenomics; Wellcome Trust Sanger Institute; RIKEN BioResource Center, MacArthur DG, Tocchini-Valentini GP, Gao X, Flicek P, Bradley A, Skarnes WC, Justice MJ, Parkinson HE, Moore M, Wells S, Braun RE, Svenson KL, de Angelis MH, Herault Y, Mohun T, Mallon AM, Henkelman RM, Brown SD, Adams DJ, Lloyd KC, McKerlie C, Beaudet AL, Bucan M, Murray SA. Nature. 2016 Sep 14;537(7621):508-514. doi: 10.1038/nature19356.
- Hill DP, D'Eustachio P, Berardini TZ, Mungall CJ, Renedo N, Blake JA. Modeling biochemical pathways in the gene ontology. Database (Oxford). 2016 Sep 1;2016. pii: baw126. doi: 10.1093/database/baw126. PMID: 27589964
c. Poster presentations
- Judith A. Blake. Genes, Orthologs, and Human Diseases: How Model Organism Databases and the Gene Ontology Empower Knowledge Discovery. The Allied Genetics Conference (TAGC) 2016, Orlando
- Karen R. Christie. Phylogenetically based Gene Ontology (GO) Annotations using the Phylogenetic Annotation and INference Tool (PAINT). The Allied Genetics Conference (TAGC) 2016, Orlando
- Harold J. Drabkin. Functional annotation of proteoforms in the Mouse Genome Database using the Protein Ontology. The Allied Genetics Conference (TAGC) 2016, Orlando
- Li Ni. ‘What Does This Gene Do’: Data presentation in Mouse Genome Informatics for the scientific community including neuroscientists. Neuroscience 2016, San Diego, CA
A. Ontology Development Contributions:
- David Hill co-led the ontology development group with Melanie Courtot, co-organizing the weekly ontology development calls. He addressed GH requests for ontology terms and improvements, and revamped the autophagy part of the ontology with Ruth's group, Marc Feuermann, and Paola Roncaglia.
- Harold Drabkin is adding new molecular function terms to aid in mapping MetaCyc identifiers to GO terms.
B. Annotation Outreach and User Advocacy Efforts:
- The Protein Ontology project continues to provide a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) whereby functional annotation using the GO can be applied to PRO submissions.
- Harold Drabkin continues to serve on the GO-help rota.
- David Hill is now co-managing the annotation group with Kimberly Van Auken (WormBase).
C. Other Highlights:
- Karen Christie serves as the MGI representative on the PAINT curation team. As a member of this team, Karen curates PANTHER families in PAINT to propagate annotations based on evolutionary relationships. She also files bug reports on PAINT and contributes to the improvement of the PAINT software.
- Mary Dolan: Data analysis and visualization for the MGI GO group. QC for GO-related projects: PRO to MGI mapping, mouse reference proteome, production of GPI and GPA files.*
D. Noctua annotation tool.
- David Hill tested the new Noctua tool for LEGO modeling in GO, including development and design of standards for modeling, and began creating production LEGO models. He is now training new annotators in the use of Noctua at 4 international workshops.
- David worked with GOC software engineers to generate annotation files that can be loaded into model organism databases and with MGI software engineers to load Noctua-derived annotations into MGI. This included standardization of identifiers that are used as annotation objects in Noctua. He also worked on the standardization of GPAD and GPI files as the new exchange format for traditional GO annotations. MGI is now the first database using Noctua in a production environment.