MGI, March 2010

From GO Wiki
Latest revision as of 15:45, 4 June 2014

Mouse Genome Informatics, March 2010

Overview

Staff:

Judith Blake

Alexander Diehl

Harold J Drabkin

David Hill

Terry Meehan

Li Ni

Dmitry Sitnikov

Mary Dolan

Annotation Progress

We continue to emphasize the genes selected for the Reference Genome Project, with additional emphasis on certain genes associated with lung development.


MGI GO STATS as of March 22, 2010


Annotation Type                                                01_Sept_09  11_Mar_10  Change    % Change
Total genes annotated (with at least one GO term of any kind)  18188       35191*     16993     93.43
Total manual annotation: number of genes                       11177       33228**    22051     197.29
Orthology                                                      708         4006       3298***   465.82
IEA annotation: SwissProt to GO                                16145       15942      -203      -1.26
IEA annotation: InterPro to GO                                 10533       10592      59        0.56
IEA annotation: EC to GO                                       1491        1248       -243      -16.30
* 100% of current gene models

** We have now annotated every MGI gene that has no annotation, or only IEA annotation, and no literature selected for GO to one or more of the three Gene Ontology root terms, as needed. These genes were then marked as “annotation complete,” so that we are notified when GO-appropriate literature is indexed to them as it comes in through our normal triage process. (HJD)
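The tagging rule in the ** note can be sketched as follows. This is a minimal illustration, not MGI's implementation: the `Gene` record and the `tag_unannotated` function are hypothetical, we assume a root term is "needed" for any aspect that has no annotation at all (IEA included), and we assume the ND (no biological data available) evidence code, the usual code for root-term annotations. The three root IDs are the standard GO roots.

```python
from dataclasses import dataclass, field

# Standard GO root terms, keyed by aspect code (P, F, C as in GAF column 9).
ROOT_TERMS = {
    "P": "GO:0008150",  # biological_process
    "F": "GO:0003674",  # molecular_function
    "C": "GO:0005575",  # cellular_component
}


@dataclass
class Gene:
    """Hypothetical, simplified gene record (not MGI's schema)."""
    gene_id: str
    annotations: list = field(default_factory=list)    # (aspect, go_id, evidence) tuples
    go_literature: list = field(default_factory=list)  # references selected for GO
    annotation_complete: bool = False


def tag_unannotated(gene):
    """Add ND root-term annotations to a qualifying gene and mark it complete.

    A gene qualifies if it has no manual (non-IEA) annotation and no
    literature selected for GO. Returns the root annotations that were added.
    """
    if gene.go_literature:
        return []  # has GO literature: leave it for curators
    if any(evidence != "IEA" for _, _, evidence in gene.annotations):
        return []  # already has manual annotation
    # Assumption: an aspect needs a root term only if it has no annotation at all.
    covered = {aspect for aspect, _, _ in gene.annotations}
    added = [(aspect, go_id, "ND")
             for aspect, go_id in ROOT_TERMS.items() if aspect not in covered]
    gene.annotations.extend(added)
    gene.annotation_complete = True  # newly indexed literature can now be flagged
    return added


# Example: a gene with only an IEA molecular function annotation (hypothetical ID).
gene = Gene("MGI:0000001", annotations=[("F", "GO:0005524", "IEA")])
print(tag_unannotated(gene))  # [('P', 'GO:0008150', 'ND'), ('C', 'GO:0005575', 'ND')]
```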

*** We have added a curator-monitored pipeline whereby annotations that RGD curators made to rat genes from experimental evidence are added to MGI as annotations to the orthologous mouse genes, based on sequence orthology. This has helped fill in annotation for certain under-represented metabolic processes that are rarely studied in mouse because of the animal's small size. (HJD)
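The rat-to-mouse pipeline in the *** note amounts to a filter-and-map step. The sketch below is an illustration under assumptions, not MGI's code: the experimental set is the standard group of GO experimental evidence codes, the ISO (inferred from sequence orthology) code for the transferred annotations is assumed, and the orthology map, function name and gene IDs are hypothetical. Because the pipeline is curator-monitored, the sketch only returns candidates for review instead of loading them.

```python
from typing import NamedTuple

# Standard GO experimental evidence codes.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


class Annotation(NamedTuple):
    gene_id: str
    go_id: str
    evidence: str
    with_from: str = ""  # source gene for an inferred annotation


def propagate_rat_to_mouse(rat_annotations, rat_to_mouse):
    """Return candidate mouse annotations for curator review.

    rat_annotations: iterable of Annotation made by RGD curators
    rat_to_mouse: dict rat gene ID -> mouse ortholog ID (hypothetical)
    Only experimentally supported rat annotations are transferred; the
    copies use the ISO evidence code (an assumption) and record the rat
    gene in with_from.
    """
    candidates, seen = [], set()
    for ann in rat_annotations:
        if ann.evidence not in EXPERIMENTAL:
            continue  # computational or non-experimental evidence: skip
        mouse_id = rat_to_mouse.get(ann.gene_id)
        if mouse_id is None:
            continue  # no mouse ortholog, nothing to transfer
        key = (mouse_id, ann.go_id, ann.gene_id)
        if key in seen:
            continue  # same rat gene/term pair supported by another experiment
        seen.add(key)
        candidates.append(Annotation(mouse_id, ann.go_id, "ISO", ann.gene_id))
    return candidates


# Hypothetical IDs for illustration only.
rat = [
    Annotation("RGD:1", "GO:0006629", "IDA"),  # lipid metabolic process
    Annotation("RGD:1", "GO:0006629", "IMP"),  # duplicate term: collapsed
    Annotation("RGD:2", "GO:0008152", "IEA"),  # not experimental: skipped
    Annotation("RGD:3", "GO:0008152", "IDA"),  # no mouse ortholog: skipped
]
for candidate in propagate_rat_to_mouse(rat, {"RGD:1": "MGI:100", "RGD:2": "MGI:200"}):
    print(candidate)
```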


Methods and strategies for annotation

Literature curation:

Literature curation continues to be the major focus of our annotation efforts. We are currently using ProMiner to improve the association of incoming literature with genes in MGI (indexing). These associations drive various QC reports that we use to identify annotation priorities.
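ProMiner is a separate text-mining system and is not modelled here. As a hedged illustration of the kind of QC report that gene-to-paper associations can drive, the sketch below implements the "annotation complete, but new literature" check behind priority 5 below. The data structures, the `new_literature_report` name and all IDs are hypothetical.

```python
from datetime import date


def new_literature_report(indexed_refs, completed_on):
    """List genes marked "annotation complete" that have literature indexed afterwards.

    indexed_refs: dict gene_id -> list of (reference_id, date_indexed)
    completed_on: dict gene_id -> date the gene was marked complete
    Returns a dict gene_id -> sorted list of the newer reference IDs.
    """
    report = {}
    for gene_id, done in sorted(completed_on.items()):
        newer = [ref for ref, indexed in indexed_refs.get(gene_id, []) if indexed > done]
        if newer:
            report[gene_id] = sorted(newer)
    return report


# Hypothetical gene and reference IDs.
refs = {
    "MGI:1": [("J:100", date(2009, 12, 1)), ("J:200", date(2010, 3, 1))],
    "MGI:2": [("J:150", date(2009, 11, 1))],
}
completed = {"MGI:1": date(2010, 1, 15), "MGI:2": date(2010, 1, 15)}
print(new_literature_report(refs, completed))  # {'MGI:1': ['J:200']}
```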


Computational annotation strategies:

As before, our current strategies use translation tables to map SwissProt keywords and InterPro domains to GO terms for IEA annotation. These mappings are run automatically every night and require little human intervention.
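A translation table is essentially a lookup from an external identifier (keyword or domain) to GO IDs. The sketch below assumes the one-mapping-per-line layout used by GO's external2go files such as interpro2go, `<ID> <name> > GO:<term name> ; GO:<id>`; the function names and every table entry are invented for illustration.

```python
def parse_external2go(lines):
    """Parse translation-table lines of the form
    '<ID> <name> > GO:<term name> ; GO:<id>'.

    Blank lines and '!' comment lines are skipped.
    Returns a dict: external ID -> set of GO IDs.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        left, right = line.split(" > ", 1)
        external_id = left.split(" ", 1)[0]
        go_id = right.rsplit(" ; ", 1)[1].strip()
        mapping.setdefault(external_id, set()).add(go_id)
    return mapping


def iea_annotations(gene_id, external_ids, mapping):
    """Translate a gene's keywords/domains into sorted IEA annotation tuples.

    Each tuple is (gene_id, go_id, "IEA", external_id); the external ID is
    kept so the source of the inference stays visible.
    """
    return sorted(
        (gene_id, go_id, "IEA", ext)
        for ext in external_ids
        for go_id in mapping.get(ext, ())
    )


# Hypothetical table entries in the translation-table layout.
TABLE = """\
! example entries, not copied from the real mapping files
InterPro:IPR900001 Example DNA-binding domain > GO:DNA binding ; GO:0003677
InterPro:IPR900002 Example kinase domain > GO:ATP binding ; GO:0005524
UniProtKB-KW:KW-9001 Example keyword > GO:DNA binding ; GO:0003677
"""

mapping = parse_external2go(TABLE.splitlines())
print(iea_annotations("MGI:0000001", ["InterPro:IPR900002", "UniProtKB-KW:KW-9001"], mapping))
```

Because the mapping is a plain dictionary, rerunning it nightly over all genes is cheap, which fits the largely hands-off process described above.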


Priorities for annotation

  1. Genes assigned by Reference Genome Project (everyone)
  2. Isoform curation (Harold, Protein Ontology project); now coordinated with priority 1 by focusing on reference genes that have isoforms.
  3. Genes with no GO annotation but with literature (Li and Dmitry) (see ** above)
  4. Genes identified as being important in lung development (Dmitry)
  5. Genes marked as GO annotation complete that now have new literature (Dmitry)

Presentations and Publications

Alterovitz G, Xiang M, Hill DP, Lomax J, Liu J, Cherkassky M, Mungall C, Harris MA, Dolan ME, Blake JA, Ramoni MF. (2010). Ontology Engineering. Nat Biotechnol. Feb;28(2):128-30.

Diehl AD, Augustine AD, Blake JA, Cowell LG, Gold ES, Gondré-Lewis TA, Masci AM, Meehan TF, Morel PA, Nijnik A, Peters B, Pulendran B, Scheuermann RH, Yao QA, Zand MS, Mungall CJ. (2010). Hematopoietic cell types: Prototype for a revised cell ontology. J Biomed Inform. doi:10.1016/j.jbi.2010.01.006.

Dowell KG, McAndrews-Hill MS, Hill DP, Drabkin HJ, Blake JA. (2009). Integrating Text Mining into the MGI Biocuration Workflow. Database (Oxford). bap019.

Hill DP, Berardini TZ, Howe DG, Van Auken KM. (2009). Representing Ontogeny Through Ontology: A Developmental Biologist’s Guide to The Gene Ontology. Mol Reprod Dev. 77(4):314-29.

Mungall CJ, Bada M, Berardini TZ, Deegan J, Ireland A, Harris MA, Hill DP, Lomax J. (2010). Cross-Product Extensions of the Gene Ontology. J Biomed Inform. doi:10.1016/j.jbi.2010.02.002.

The Gene Ontology Consortium. (2010). The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. 38:D331-5.


b. Presentations including Talks and Tutorials and Teaching

A. Ontology Development Contributions:

1. David Hill has worked on a team with Tanya Berardini, Chris Mungall, Midori Harris, Jen Deegan and Jane Lomax to develop cross-products within the three GO namespaces. The regulation cross-products have been released in the extended GO. David and Tanya are now quality checking the internal biological process cross-products.

2. David Hill has worked with Tanya Berardini to continue adding interontology links between molecular function (MF) and biological process (BP) terms.

3. David Hill has worked with Varsha Khodiyar, Tanya Berardini, Doug Howe, Susan Tweedie, Ruth Lovering and community experts to expand the heart development portion of the ontology.

4. David Hill has worked with Yasmin Alam-Faruque, Midori Harris, Becky Foulger, Doug Howe, Emily Dimmer, Rachel Huntley and community experts to expand the kidney development portion of the ontology.

5. David Hill and Tanya Berardini are maintaining the automated quality checks of the ontology on an ongoing basis.

6. David Hill and Harold Drabkin are working with Jane Lomax, Midori Harris and Tanya Berardini to align the representation of biochemicals in GO.

7. David Hill and Tanya Berardini added or modified 231 terms as a result of attending the American Society for Cell Biology Meeting in December.

8. Alexander Diehl is part of the GO Signaling Working Group led by Jen Deegan that has recently implemented improvements in the representation of signaling in the GO.

9. Alexander Diehl is project leader for the Cell Ontology (with Chris Mungall) and Terry Meehan is a full-time curator working on the Cell Ontology. Terry is currently focusing on implementing cross-products for hematopoietic cell ontology terms, while Alex is working on general improvements to the CL and organizing a Cell Ontology Workshop in May 2010.
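Items 1 and 5 above mention regulation cross-products and automated quality checks of the ontology. The sketch below shows a few structural checks of that general kind, plus a naming check for regulation cross-products, where a term such as "regulation of X" is defined as biological regulation (GO:0065007) that regulates X. It is an illustration under assumptions, not the GO QC suite: the in-memory dictionaries, the `ontology_qc` name and the `T:` term IDs are hypothetical.

```python
def ontology_qc(names, is_a, regulates_xp, roots):
    """Run a few structural checks over an ontology and return problem strings.

    names: dict term_id -> term name
    is_a: dict term_id -> set of is_a parent term_ids
    regulates_xp: dict term_id -> target term_id for terms defined as the
        cross-product "biological regulation (GO:0065007) that regulates X"
    roots: set of term_ids that are allowed to have no parent
    """
    problems = []

    # 1. Every non-root term needs at least one is_a parent.
    for term in sorted(names):
        if term not in roots and not is_a.get(term):
            problems.append(f"{term}: no is_a parent")

    # 2. Every parent must be a known term.
    for term, parents in sorted(is_a.items()):
        for parent in sorted(parents):
            if parent not in names:
                problems.append(f"{term}: unknown parent {parent}")

    # 3. Term names must be unique.
    first_with_name = {}
    for term in sorted(names):
        first = first_with_name.setdefault(names[term], term)
        if first != term:
            problems.append(f"{term}: duplicate name of {first}")

    # 4. The is_a graph must be acyclic (iterative depth-first search).
    state = dict.fromkeys(names, "new")  # new -> open -> done
    for start in sorted(names):
        if state[start] != "new":
            continue
        state[start] = "open"
        stack = [(start, iter(sorted(is_a.get(start, ()))))]
        while stack:
            node, parents = stack[-1]
            parent = next(parents, None)
            if parent is None:
                state[node] = "done"
                stack.pop()
            elif state.get(parent) == "open":
                problems.append(f"{node}: is_a cycle through {parent}")
            elif state.get(parent) == "new":
                state[parent] = "open"
                stack.append((parent, iter(sorted(is_a.get(parent, ())))))

    # 5. A regulation cross-product's name should match its definition.
    for term, target in sorted(regulates_xp.items()):
        expected = f"regulation of {names.get(target, '?')}"
        if names.get(term) != expected:
            problems.append(f"{term}: name does not match cross-product, expected {expected!r}")

    return problems


# Hypothetical mini-ontology: T:4 is defined as regulating T:2 but named differently.
names = {"T:1": "root", "T:2": "apoptotic process",
         "T:3": "regulation of apoptotic process", "T:4": "regulation of cell death"}
is_a = {"T:2": {"T:1"}, "T:3": {"T:1"}, "T:4": {"T:1"}}
for problem in ontology_qc(names, is_a, {"T:3": "T:2", "T:4": "T:2"}, {"T:1"}):
    print(problem)
```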

Annotation Outreach and User Advocacy Efforts:

The Protein Ontology (PRO) project is providing a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) through which PRO submissions can be functionally annotated with GO terms. These annotations are reviewed by Cecilia Arighi of Georgetown. At present, only the PRO curators (Georgetown and MGI) are using the tool, but it is available to anyone.

Harold and David, along with Emily Dimmer, are mentoring Heather Wick of Tufts University on the annotation of human fetal lung development.

Other Highlights:

We are now supplying a GAF 2.0 format file to the GOC with data filled in for columns 16 and 17 (HJD). This file is also available directly from our own FTP site. Currently, column 16 uses Cell Ontology terms to record the cell type in which the experiment supporting the annotation was carried out. MGI curators have been using several ontologies for many years to detail specific annotations. The ontologies used include Cell, Evidence Code, Adult Mouse Anatomy, Embryonic Mouse Anatomy, and PSI-MOD.
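In GAF 2.0, column 16 holds annotation extensions written as `relation(ID)` units, with commas joining units and pipes separating alternatives, and column 17 holds the gene product form ID. A minimal reader for these two columns is sketched below; the function names, the example line and its IDs, and the `occurs_in` relation used to attach the Cell Ontology term are illustrative assumptions, not copied from MGI's file.

```python
import re

# One extension unit, e.g. occurs_in(CL:0000057): relation name, then an ID.
UNIT_RE = re.compile(r"^([A-Za-z_]+)\(([^()]+)\)$")


def parse_gaf2_line(line):
    """Parse one GAF 2.0 data line (17 tab-separated columns).

    Returns (object_id, go_id, evidence, extensions, product_form), where
    extensions is the parsed column 16: a list of alternative groups
    (separated by '|'), each a list of (relation, id) pairs (separated by ',').
    product_form is column 17, the gene product form ID (may be empty).
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 17:
        raise ValueError(f"expected 17 columns, got {len(cols)}")
    extensions = []
    if cols[15]:
        for group in cols[15].split("|"):
            pairs = []
            for unit in group.split(","):
                match = UNIT_RE.match(unit.strip())
                if match is None:
                    raise ValueError(f"malformed extension unit: {unit!r}")
                pairs.append((match.group(1), match.group(2)))
            extensions.append(pairs)
    return cols[1], cols[4], cols[6], extensions, cols[16]


def cell_types(extensions):
    """Collect the Cell Ontology (CL:) IDs named anywhere in column 16."""
    return sorted({target for group in extensions for _, target in group
                   if target.startswith("CL:")})


# Illustrative line; the IDs, date and occurs_in relation are assumptions.
example = "\t".join([
    "MGI", "MGI:0000001", "GeneX", "", "GO:0006915", "PMID:0000001", "IDA", "",
    "P", "", "", "gene", "taxon:10090", "20100311", "MGI",
    "occurs_in(CL:0000057)", "",
])
obj, go_id, evidence, ext, form = parse_gaf2_line(example)
print(obj, go_id, evidence, cell_types(ext))  # MGI:0000001 GO:0006915 IDA ['CL:0000057']
```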