MGI December 2016: Difference between revisions

From GO Wiki
Latest revision as of 14:07, 14 December 2016

Overview:

Staff:

[please include FTEs working on GOC tasks, designating as well how many FTEs are funded by the GOC NHGRI grant]

Judith Blake*

Karen R Christie*

Mary E Dolan

Harold J Drabkin*

David Hill*

Li Ni

Dmitry Sitnikov

* Funded entirely or partially by GO

Annotation Progress

{| class="wikitable"
! Annotation Type !! Dec 5, 2016 !! Dec 5, 2015 !! Change !! % change
|-
| Total Genes annotated with at least one GO term of any kind || 24213 || 24224 || -11* || -0.05
|-
| Total Annotations: || 360758 || 362727 || -1969 || -0.54
|-
| colspan="5" | '''Total non-IEA Annotation'''
|-
| Total Number of Genes: || 24032 || 23979 || 53 || 0.22
|-
| Total Annotations: || 278277 || 262218 || 16059 || 6.12
|-
| colspan="5" | '''Annotation by Direct Experiment'''
|-
| MGI Curated Mouse Genes || 12624 || 12433 || 191 || 1.54
|-
| MGI Curated Annotations || 89907 || 87159 || 2748 || 3.15
|-
| GOA Curated Mouse Genes: || 5424 || 5075 || 349 || 6.88
|-
| GOA Curated Annotations: || 33530 || 30177 || 3353 || 11.11
|-
| colspan="5" | '''Annotation by Orthology'''
|-
| Total Genes Annotated by Orthology || 12067 || 11866 || 201 || 1.69
|-
| Total Orthology Annotation || 106607 || 102212 || 4395 || 4.30
|-
| Genes Annotated by Human Orthology Load (GOA) || 10942 || 10701 || 241 || 2.25
|-
| Total Annotation by Human Orthology Load || 71680 || 68129 || 3551 || 5.21
|-
| Genes Annotated by Rat Orthology Load (RGD) || 4849 || 4696 || 153 || 3.26
|-
| Total Annotations by Rat Orthology Load || 31405 || 30769 || 636 || 2.07
|-
| Genes Annotated by Phylogeny || 8153 || 6400 || 1753 || 27.39
|-
| Total Annotations by Phylogeny || 29434 || 22522 || 6912 || 30.69
|-
| colspan="5" | '''IEA Annotation'''
|-
| Total Genes with IEA Annotations || 14815 || 14724 || 91 || 0.62
|-
| Total IEA Annotations || 82481 || 100509 || -18028 || -17.94
|-
| Total Genes with SwissProt to GO Annotations || 14440 || 14337 || 103 || 0.72
|-
| Total SwissProt to GO Annotations || 57420 || 56888 || 532 || 0.94
|-
| Total Genes with Interpro to GO Annotations || 10103 || 9966 || 137 || 1.37
|-
| Total Interpro to GO Annotations || 24074 || 24408 || -334 || -1.37
|-
| Total Genes with EC to GO Annotations* || 817 || 1709 || -892 || -52.19
|-
| Total EC to GO Annotations* || 987 || 19213 || -18226 || -94.86
|}

* Loss due to EC2GO refactoring (no annotations to EC root terms).
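
The Change and % change columns appear to be derived from the two yearly counts. The following is a minimal sketch of that arithmetic, assuming the percentage is the difference relative to the Dec 5, 2015 count, rounded to two decimals. The function name `change_and_percent` is hypothetical and is not part of any MGI tooling.

```python
from typing import Tuple


def change_and_percent(count_2016: int, count_2015: int) -> Tuple[int, float]:
    """Return (absolute change, percent change relative to the 2015 count)."""
    change = count_2016 - count_2015
    # Assumption: percent change uses the earlier (2015) count as the baseline.
    percent = round(100.0 * change / count_2015, 2)
    return change, percent


# Example row: "Total Genes annotated with at least one GO term of any kind"
print(change_and_percent(24213, 24224))  # → (-11, -0.05)
```

Recomputing each row this way is a quick check that the reported differences and percentages agree with the underlying counts.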

Methods and strategies for annotation

Literature curation:

Literature curation continues to be the major focus of our annotation efforts.

Computational annotation strategies:

As always, current strategies involve the use of translation tables to mine SwissProt keywords, InterPro domains, and EC numbers for IEA annotation. These loads run automatically on a nightly basis and require little human intervention.
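
As a rough illustration of how a translation table can drive IEA annotation, the sketch below maps external identifiers attached to a gene onto GO term IDs. It is only a sketch under stated assumptions: the table contents, the identifier prefixes, the gene names, and the function `infer_iea_annotations` are all hypothetical placeholders, not the actual MGI load or its mapping data.

```python
from collections import defaultdict

# Hypothetical translation table: external identifier -> GO term IDs.
# Entries are illustrative placeholders, not real mapping data from the loads.
TRANSLATION_TABLE = {
    "EC:1.1.1.1": ["GO:0004022"],
    "InterPro:IPR_EXAMPLE": ["GO:0003677"],
    "SP_KW:KW-EXAMPLE": ["GO:0005634", "GO:0003677"],
}


def infer_iea_annotations(gene_xrefs):
    """Translate each gene's external identifiers into (GO ID, evidence, source) rows."""
    annotations = defaultdict(set)
    for gene, xrefs in gene_xrefs.items():
        for xref in xrefs:
            # Identifiers absent from the table simply yield no annotation.
            for go_id in TRANSLATION_TABLE.get(xref, []):
                annotations[gene].add((go_id, "IEA", xref))
    # Sort rows so the output is deterministic.
    return {gene: sorted(rows) for gene, rows in annotations.items()}


# Hypothetical input: gene symbol -> identifiers mined from its protein records.
print(infer_iea_annotations({"GeneA": ["EC:1.1.1.1", "SP_KW:KW-EXAMPLE"]}))
```

Keeping the source identifier alongside each inferred annotation makes it possible to trace an IEA annotation back to the mapping that produced it.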

Harold Drabkin monitors weekly QC reports on manual and automatic annotation stats, and responds to questions about specific annotations as required.

Priorities for annotation

  • Isoform curation (Harold, Karen, Protein Ontology project), focusing on genes that have isoforms or whose products are modified, in coordination with the Protein Ontology Protein Complex project.
  • Genes with no GO annotation but with literature (Li and Dmitry)
  • Genes with only IEA annotation but with literature (Li)
  • Genes marked as having GO annotation completed, but now having new literature (Dmitry)
  • Genes that have an annotation to one of the three root nodes of GO, but have new literature (David)
  • Dmitry has been focused on annotation of miRNAs in MGI
  • Annotation of ciliary genes (Karen)
  • Annotation of metabolic genes: glycolysis, pyruvate metabolism, and carbohydrate catabolism in general (David)
  • Autophagy genes
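
Several of the priorities above amount to selection rules over a gene's existing annotations and its literature. The sketch below shows one way such rules could be expressed. It assumes a hypothetical `Gene` record with a list of (GO ID, evidence code) pairs and two boolean flags. None of these names come from MGI software, and the order in which the rules are checked is an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The three GO root terms: biological_process, molecular_function, cellular_component.
GO_ROOTS = {"GO:0008150", "GO:0003674", "GO:0005575"}


@dataclass
class Gene:
    symbol: str
    annotations: List[Tuple[str, str]] = field(default_factory=list)  # (GO ID, evidence code)
    has_new_literature: bool = False   # hypothetical flag: uncurated papers exist
    marked_complete: bool = False      # hypothetical flag: annotation marked as completed


def priority_bucket(gene: Gene) -> Optional[str]:
    """Return the first matching priority category, or None if no rule applies."""
    if not gene.has_new_literature:
        return None
    if not gene.annotations:
        return "no GO annotation but with literature"
    if all(evidence == "IEA" for _, evidence in gene.annotations):
        return "only IEA annotation but with literature"
    if gene.marked_complete:
        return "marked complete but with new literature"
    if any(go_id in GO_ROOTS for go_id, _ in gene.annotations):
        return "root-node annotation but with new literature"
    return None


# Hypothetical example gene carrying only an IEA annotation.
print(priority_bucket(Gene("GeneA", [("GO:0003677", "IEA")], has_new_literature=True)))
```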

Presentations and Publications

a. Papers with substantial GO content

  • High-throughput discovery of novel developmental phenotypes. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ, Westerberg H, Adissu H, Baker CN, Bower L, Brown JM, Caddle LB, Chiani F, Clary D, Cleak J, Daly MJ, Denegre JM, Doe B, Dolan ME, Edie SM, Fuchs H, Gailus-Durner V, Galli A, Gambadoro A, Gallegos J, Guo S, Horner NR, Hsu CW, Johnson SJ, Kalaga S, Keith LC, Lanoue L, Lawson TN, Lek M, Mark M, Marschall S, Mason J, McElwee ML, Newbigging S, Nutter LM, Peterson KA, Ramirez-Solis R, Rowland DJ, Ryder E, Samocha KE, Seavitt JR, Selloum M, Szoke-Kovacs Z, Tamura M, Trainor AG, Tudose I, Wakana S, Warren J, Wendling O, West DB, Wong L, Yoshiki A; International Mouse Phenotyping Consortium; Jackson Laboratory; Infrastructure Nationale PHENOMIN, Institut Clinique de la Souris (ICS); Charles River Laboratories; MRC Harwell; Toronto Centre for Phenogenomics; Wellcome Trust Sanger Institute; RIKEN BioResource Center, MacArthur DG, Tocchini-Valentini GP, Gao X, Flicek P, Bradley A, Skarnes WC, Justice MJ, Parkinson HE, Moore M, Wells S, Braun RE, Svenson KL, de Angelis MH, Herault Y, Mohun T, Mallon AM, Henkelman RM, Brown SD, Adams DJ, Lloyd KC, McKerlie C, Beaudet AL, Bucan M, Murray SA. Nature. 2016 Sep 14;537(7621):508-514. doi: 10.1038/nature19356.
  • Hill DP, D'Eustachio P, Berardini TZ, Mungall CJ, Renedo N, Blake JA. Modeling biochemical pathways in the gene ontology. Database (Oxford). 2016 Sep 1;2016. pii: baw126. doi: 10.1093/database/baw126. PMID: 27589964

c. Poster presentations

  • Judith A. Blake. Genes, Orthologs, and Human Diseases: How Model Organism Databases and the Gene Ontology Empower Knowledge Discovery. The Allied Genetics Conference (TAGC) 2016, Orlando
  • Karen R. Christie. Phylogenetically based Gene Ontology (GO) Annotations using the Phylogenetic Annotation and INference Tool (PAINT). The Allied Genetics Conference (TAGC) 2016, Orlando
  • Harold J. Drabkin. Functional annotation of proteoforms in the Mouse Genome Database using the Protein Ontology. The Allied Genetics Conference (TAGC) 2016, Orlando
  • Li Ni. ‘What Does This Gene Do’: Data presentation in Mouse Genome Informatics for the scientific community including neuroscientists. Neuroscience 2016, San Diego, CA

Other Highlights:

A. Ontology Development Contributions:

  • David Hill co-led the ontology development group with Melanie Courtot, co-organizing the weekly ontology development calls. He addressed GitHub requests for ontology terms and improvements, and revamped the autophagy part of the ontology with Ruth's group, Marc Feuermann, and Paola Roncaglia.
  • Harold Drabkin is adding new molecular function terms to aid in mapping MetaCyc identifiers to GO terms.


B. Annotation Outreach and User Advocacy Efforts:

  • The Protein Ontology project continues to provide a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) whereby functional annotation using the GO can be applied to PRO submissions.
  • Harold Drabkin continues to serve on the GO-help rota.
  • David Hill is now co-managing the annotation group with Kimberly Van Auken (WormBase).


C. Other Highlights:

  • Karen Christie serves as the MGI representative on the PAINT curation team. As a member of this team, Karen curates Panther families in PAINT to propagate annotations based on evolutionary relationships. She also files bug reports on PAINT and contributes to the improvement of the PAINT software.
  • Mary Dolan: Data analysis and visualization for the MGI GO group. QC for GO-related projects: PRO to MGI mapping, mouse reference proteome, production of GPI and GPA files.*

D. Noctua annotation tool.

  • David Hill tested the new Noctua tool for LEGO modeling in GO, including development and design of standards for modeling, and began creating production LEGO models. He is now training new annotators in the use of Noctua at four international workshops.
  • David worked with GOC software engineers to generate annotation files that can be loaded into model organism databases and with MGI software engineers to load Noctua-derived annotations into MGI. This included standardization of identifiers that are used as annotation objects in Noctua. He also worked on the standardization of GPAD and GPI files as the new exchange format for traditional GO annotations. MGI is now the first database using Noctua in a production environment.