MGI December 2017

From GO Wiki
Latest revision as of 11:50, 13 December 2017

Overview:

Staff:


Judith Blake*

Karen R Christie*

Mary E Dolan

Harold J Drabkin*

David Hill*

Li Ni

Dmitry Sitnikov

* Funded entirely or partially by GO

Annotation Progress

{| class="wikitable"
! Annotation Type !! Dec 5 2016 !! Dec 5 2017 !! Change !! % change
|-
| Total Genes annotated with at least one GO term of any kind || 24213 || 24529 || 316 || 1.3
|-
| Total Annotations || 360758 || 375767 || 15009 || 4.2
|-
! colspan="5" | Total non-IEA Annotation
|-
| Total Number of Genes || 24032 || 24344 || 312 || 1.3
|-
| Total Annotations || 278277 || 295718 || 17441 || 6.3
|-
! colspan="5" | Annotation by Direct Experiment
|-
| MGI Curated Mouse Genes || 12624 || 12836 || 212 || 1.7
|-
| MGI Curated Annotations || 89907 || 91935 || 2028 || 2.3
|-
| GOA Curated Mouse Genes || 5424 || 5996 || 572 || 10.5
|-
| GOA Curated Annotations || 33530 || 38212 || 4682 || 14.0
|-
! colspan="5" | Annotation by Orthology
|-
| Total Genes Annotated by Orthology || 12067 || 12654 || 587 || 4.9
|-
| Total Orthology Annotations || 106607 || 117460 || 10853 || 10.2
|-
| Genes Annotated by Human Orthology Load (GOA) || 10942 || 11653 || 711 || 6.5
|-
| Total Annotations by Human Orthology Load || 71680 || 80897 || 9217 || 12.9
|-
| Genes Annotated by Rat Orthology Load (RGD) || 4849 || 4980 || 131 || 2.7
|-
| Total Annotations by Rat Orthology Load || 31405 || 33117 || 1712 || 5.5
|-
| Genes Annotated by Phylogeny || 8153 || 8447 || 294 || 3.6
|-
| Total Annotations by Phylogeny || 29434 || 30476 || 1042 || 3.5
|-
! colspan="5" | IEA Annotation
|-
| Total Genes with IEA Annotations || 14815 || 14975 || 160 || 1.1
|-
| Total IEA Annotations || 82481 || 80049 || -2432 || -2.9
|-
| Total Genes with SwissProt to GO Annotations || 14440 || 14606 || 166 || 1.1
|-
| Total SwissProt to GO Annotations || 57420 || 56515 || -905 || -1.6
|-
| Total Genes with InterPro to GO Annotations || 10103 || 9879 || -224 || -2.2
|-
| Total InterPro to GO Annotations || 24074 || 22564 || -1510 || -6.3
|-
| Total Genes with EC to GO Annotations || 817 || 789 || -28 || -3.4
|-
| Total EC to GO Annotations || 987 || 970 || -17 || -1.7
|}
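The Change and % change columns can be checked using only the two snapshot counts. The Python sketch below assumes Change = (Dec 5 2017 count) - (Dec 5 2016 count) and % change = 100 × Change / (Dec 5 2016 count), rounded to one decimal place. Under that assumption it reproduces the table's values for the rows it includes. It also checks that, for each date, the non-IEA and IEA annotation totals add up to the overall annotation total.

```python
"""Recompute the Change and % change columns of the annotation progress table.

Assumption: % change = 100 * (new - old) / old, rounded to one decimal place.
Counts are copied from the table (Dec 5 2016, Dec 5 2017).
"""

ROWS = {
    "Total Genes annotated with at least one GO term": (24213, 24529),
    "Total Annotations": (360758, 375767),
    # The table labels this row "Total Annotations" under "Total non-IEA Annotation".
    "Total non-IEA Annotations": (278277, 295718),
    "Total IEA Annotations": (82481, 80049),
    "Total EC to GO Annotations": (987, 970),
}


def change(old: int, new: int) -> tuple[int, float]:
    """Return (absolute change, percent change rounded to 0.1)."""
    delta = new - old
    return delta, round(100 * delta / old, 1)


if __name__ == "__main__":
    for label, (old, new) in ROWS.items():
        delta, pct = change(old, new)
        print(f"{label}: {delta:+d} ({pct:+.1f}%)")

    # Consistency check: non-IEA + IEA should equal the overall total on each date.
    for i, date in enumerate(("Dec 5 2016", "Dec 5 2017")):
        parts = ROWS["Total non-IEA Annotations"][i] + ROWS["Total IEA Annotations"][i]
        assert parts == ROWS["Total Annotations"][i], date
```

The same two-line definition can be applied to any other row of the table.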

Methods and strategies for annotation

Priorities for annotation

  • Isoform curation (Harold, Li, Protein Ontology project), focusing on genes that have isoforms or whose products are modified, in coordination with the Protein Ontology Protein Complex project.
  • Genes with no GO annotation but with literature (Li and Dmitry)
  • Genes with only IEA annotation but with literature (Li)
  • Genes marked as having GO annotation completed, but now having new literature (Dmitry)
  • Genes that have an annotation to one of the three root nodes of GO, but have new literature (Karen, David)
  • Dmitry has been focused on annotation of miRNAs in MGI
  • Annotation of ciliary genes (Karen)
  • Annotation of metabolic genes involved in glycolysis, pyruvate metabolism, and carbohydrate catabolism in general (David)
  • Autophagy genes
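Several of these priorities are rules applied to each gene's existing annotations and literature. The Python sketch below expresses those rules as a triage function. The `Gene` and `Annotation` records, their fields, the bucket names, and the example gene symbols are hypothetical illustrations. They do not represent MGI's database schema or its actual selection queries. The three GO root terms are GO:0003674 (molecular_function), GO:0008150 (biological_process), and GO:0005575 (cellular_component).

```python
from dataclasses import dataclass, field

# The three GO root terms (molecular_function, biological_process, cellular_component).
GO_ROOTS = {"GO:0003674", "GO:0008150", "GO:0005575"}


@dataclass
class Annotation:
    go_id: str
    evidence: str  # GO evidence code, e.g. "IDA", "IEA", "ND"


@dataclass
class Gene:
    """Hypothetical per-gene summary used only for this illustration."""
    symbol: str
    annotations: list[Annotation] = field(default_factory=list)
    has_literature: bool = False      # any curatable papers linked to the gene
    has_new_literature: bool = False  # papers added since the last GO review
    go_marked_complete: bool = False  # curator flagged GO annotation as complete


def triage(gene: Gene) -> list[str]:
    """Return the priority buckets a gene falls into (possibly none)."""
    buckets = []
    evidence = {a.evidence for a in gene.annotations}
    if gene.has_literature and not gene.annotations:
        buckets.append("no GO annotation, has literature")
    if gene.has_literature and evidence == {"IEA"}:
        buckets.append("IEA-only, has literature")
    if gene.go_marked_complete and gene.has_new_literature:
        buckets.append("marked complete, new literature")
    if gene.has_new_literature and any(a.go_id in GO_ROOTS for a in gene.annotations):
        buckets.append("root-term annotation, new literature")
    return buckets


if __name__ == "__main__":
    # Hypothetical gene annotated only to the biological_process root, with new papers.
    g = Gene("Examplegene1", [Annotation("GO:0008150", "ND")],
             has_literature=True, has_new_literature=True)
    print(g.symbol, triage(g))
```

A gene can fall into more than one bucket. For that reason the function returns a list instead of a single category.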

Literature curation:

Literature curation continues to be the major focus of our annotation efforts. Recently, the MGI team implemented new literature mining and curation support mechanisms. These make it more efficient to identify relevant literature and allow us to mark up and track data curation efforts. An evaluation of the literature used across MGI shows that the number of papers in PubMed associated with 'mouse' as the experimental organism has increased, while the number of papers relevant to genetics and genomics research has remained roughly steady. However, the amount of data reported per paper has increased, leading to greater curation effort per paper overall.

Computational annotation strategies:

As always, current strategies use translation tables to mine SwissProt keywords, InterPro domains, and EC numbers for IEA annotations. These loads run automatically every night and require little human intervention.
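The translation-table approach can be illustrated with GO Consortium external2go-style mapping files, such as interpro2go and ec2go. Their lines have the form `<DB>:<ID> <description> > GO:<term name> ; GO:<7-digit ID>`, and lines starting with `!` are comments. The Python sketch below parses such lines and turns the external identifiers assigned to each gene into candidate IEA annotations. It is a minimal illustration under those assumptions, not MGI's nightly load. The gene-to-identifier input and the gene names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed external2go line shape:
#   <DB>:<ID> <description> > GO:<term name> ; GO:<7 digits>
LINE_RE = re.compile(r"^(\S+)\s.*>\s*GO:(.+?)\s*;\s*(GO:\d{7})\s*$")


def load_mapping(lines):
    """Map each external identifier (InterPro, EC, keyword...) to its GO IDs."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comment lines
            continue
        m = LINE_RE.match(line)
        if m:
            external_id, _term_name, go_id = m.groups()
            mapping[external_id].add(go_id)
    return mapping


def iea_annotations(gene_to_external, mapping):
    """Yield (gene, GO ID, external ID) triples, one per translated hit.

    Each triple would become an IEA annotation whose supporting "with/from"
    value is the external identifier that triggered it.
    """
    for gene, external_ids in gene_to_external.items():
        for ext in sorted(external_ids):
            for go_id in sorted(mapping.get(ext, ())):
                yield gene, go_id, ext


if __name__ == "__main__":
    sample = [
        "! illustrative mapping lines",
        "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
        "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
    ]
    table = load_mapping(sample)
    genes = {"GeneA": {"InterPro:IPR000003"}, "GeneB": {"EC:1.1.1.1"}}  # hypothetical
    for triple in iea_annotations(genes, table):
        print(triple)
```

Because the mapping is a fixed lookup, re-running it each night picks up both new domain or EC assignments and mapping-file updates without curator involvement.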

Harold Drabkin monitors weekly QC reports on manual and automatic annotation stats, and responds to questions about specific annotations as required.

Presentations and Publications

a. Papers with substantial GO content

  • Roncaglia P, van Dam TJP, Christie KR, Nacheva L, Toedt G, Huynen MA, Huntley RP, Gibson TJ, Lomax J. The Gene Ontology of eukaryotic cilia and flagella. Cilia. 2017 Nov 16;6:10. doi: 10.1186/s13630-017-0054-8. eCollection 2017. PMID:29177046

b. Poster presentations

  • Christie KR, Roncaglia P, van Dam TJP, Gibson TJ, Lomax J, Blake JA. Comprehensive Gene Ontology annotation of ciliary genes in the laboratory mouse. Biocuration 2017, Stanford, March 26-29, 2017
  • Dolan ME, Blake JA. Constructing a multi-species GO Slim. GOC Meeting, October 2-4, 2017
  • Ni L, Drabkin HJ, Christie KR, Arighi CN, Wu CH, Blake JA. Functional annotation of proteoforms in the Mouse Genome Database using the Protein Ontology. ISMB 2017, Prague, Czech Republic, July 21-25, 2017

Other Highlights:

A. Ontology Development Contributions:

  • David Hill led the ontology development group with help from Kimberly Van Auken. The ontology developers and the annotation group are now much more closely associated; David and Kimberly meet at least once a week.
  • Harold Drabkin is adding new molecular function terms to aid in mapping MetaCyc identifiers to GO terms for plant metabolic enzymes.
  • David Hill, Karen Christie and Harold Drabkin are core members of the GitHub ontology ticket rota. They are joined by Pascale Gaudet and Kimberly Van Auken.
  • David Hill, Karen Christie and Harold Drabkin attended ontology developers' training workshops twice this calendar year. As a result of those workshops, the ontology group switched to GitHub- and Protege-based maintenance of the ontology.
  • Karen Christie, as part of the cilia ontology working group, completed work (now published) on the cilia ontology development project.
  • David Hill completed work on the autophagy annotation-ontology development project.
  • David Hill attended the Protein Ontology workshop as a GO representative.


B. Annotation Outreach and User Advocacy Efforts:

  • The Protein Ontology project continues to provide a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) through which GO functional annotation can be applied to PRO submissions.
  • Harold Drabkin continues to serve on the GO-help rota.


C. Noctua annotation tool:

  • David Hill, Karen Christie, and Harold Drabkin attend bimonthly meetings on Noctua development. These meetings aim to improve the tool's functionality so that it becomes a production-level tool for curating GO annotations throughout the GO Consortium.
  • All MGI curators continue to test the new Noctua tool for GO-CAM modeling in GO. They also contribute to developing and designing modeling standards, and they have begun creating production GO-CAM models.


D. Other Highlights:

  • Karen Christie serves as the MGI representative on the PAINT curation team. As a member of this team, Karen curates PANTHER families in PAINT to propagate annotations based on evolutionary relationships. She also files bug reports on PAINT and contributes to the improvement of the PAINT software.
  • Mary Dolan: Data analysis and visualization for the MGI GO group. QC for GO-related projects: PRO to MGI mapping (attended PRO meeting October 2017), resolving mouse reference proteome (ongoing work as part of MGI-AGR).
  • Mary Dolan applied the MGI GOSlim methodology for use by the AGR. Poster "Constructing a multi-species GO Slim" presented at GOC Meeting October 2017.
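A GO slim summarizes detailed annotations by mapping each annotated term up the ontology to the slim terms that subsume it, and then counting genes per slim term. The Python sketch below shows that general idea on a toy ontology. The term IDs ("T:1" and so on), the parent links, and the slim set are made-up placeholders. This is not the MGI GO Slim methodology or the multi-species slim itself, since this report does not describe their details.

```python
from collections import Counter, deque


def ancestors(term, parents):
    """Return the term plus all its ancestors via parent links (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def map_to_slim(gene_terms, parents, slim):
    """Map each gene's annotated terms to the slim terms that subsume them."""
    return {
        gene: set().union(*(ancestors(t, parents) & slim for t in terms))
        for gene, terms in gene_terms.items()
    }


def genes_per_slim_term(slim_map):
    """Count how many genes map to each slim term."""
    counts = Counter()
    for terms in slim_map.values():
        counts.update(terms)
    return counts


if __name__ == "__main__":
    # Toy DAG (child -> parents); T:1 stands in for a root term.
    parents = {"T:2": ["T:1"], "T:3": ["T:1"], "T:4": ["T:2"], "T:5": ["T:2", "T:3"]}
    slim = {"T:2", "T:3"}
    genes = {"geneA": {"T:4"}, "geneB": {"T:5"}, "geneC": set()}  # hypothetical genes
    slim_map = map_to_slim(genes, parents, slim)
    print(slim_map)
    print(genes_per_slim_term(slim_map))
```

A term with several parent paths (T:5 here) can map to more than one slim term. Per-slim counts can therefore sum to more than the number of genes.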