MGI May 2011

From GO Wiki
Latest revision as of 15:48, 4 June 2014

Mouse Genome Informatics Summary, May 2011

Overview

Staff:

Judith Blake*

Alexander Diehl*

Mary Dolan*

Harold J Drabkin*

David Hill*

Li Ni

Dmitry Sitnikov

* Funded entirely or partially by GO

Annotation Progress

We continue to put emphasis on those genes selected for the Reference Genome Project. Additional emphasis has been placed on certain genes associated with lung development.


MGI GO STATS as of May 2011


Annotation Type                                                 01_Dec_10   05_May_11   Change   % Change
Total Genes annotated (with at least one GO term of any kind)   33906*      33693**     -213     -0.63%
Total Manual Annotation
Number of Genes                                                 32670       32611       -59      -0.18%
Orthology                                                       8187        8739        552      6.74%
IEA Annotation
SwissProt to GO                                                 15690       15684       -6       -0.04%
Interpro to GO                                                  10530       10333       -197     -1.87%
EC to GO                                                        1033        985         -48      -4.65%

* 100% of current gene models
** Drops reflect changes in gene number (vs. pseudogenes, etc.).

The total number of annotated genes has dropped, mostly because markers were reclassified from class gene to a non-gene type such as pseudogene or cluster. When this happens, any root annotations and annotations by orthology are automatically removed.
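The Change and % Change columns in the table above can be reproduced from the two counts in each row. The following is a minimal sketch, assuming that % Change is measured relative to the December 2010 count; the row labels and the function name are ours, not MGI's.

```python
# Counts copied from the statistics table: (label, 01 Dec 2010, 05 May 2011).
ROWS = [
    ("Total genes annotated", 33906, 33693),
    ("Manual: number of genes", 32670, 32611),
    ("Manual: orthology", 8187, 8739),
    ("IEA: SwissProt to GO", 15690, 15684),
    ("IEA: InterPro to GO", 10530, 10333),
    ("IEA: EC to GO", 1033, 985),
]


def change(before, after):
    """Return (absolute change, percent change relative to the earlier count)."""
    delta = after - before
    return delta, round(100.0 * delta / before, 2)


if __name__ == "__main__":
    for label, before, after in ROWS:
        delta, pct = change(before, after)
        print(f"{label:<26} {delta:>5} {pct:>7.2f}%")
```

Only the orthology row grows over the period; every other row shrinks.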

Methods and strategies for annotation

Literature curation:

Literature curation continues to be the major focus of our annotation efforts.

Computational annotation strategies:

As always, current strategies use translation tables to map SwissProt keywords, InterPro domains, and EC numbers to GO terms for IEA annotation. These mappings are applied automatically every night and require little human intervention.
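The translation-table approach can be pictured roughly as follows. This is only a sketch under stated assumptions: the table entries, the gene names, and the function `infer_iea` are illustrative and hypothetical, and they do not reflect the actual MGI load scripts or data.

```python
from collections import defaultdict

# Hypothetical translation table: external identifier -> GO term IDs.
# The entries are illustrative only and are not taken from MGI's tables.
TRANSLATION_TABLE = {
    "EC:1.1.1.1": ["GO:0004022"],
    "InterPro:IPR000719": ["GO:0004672", "GO:0005524"],
}

# Hypothetical genes with the external identifiers attached to each of them.
GENE_XREFS = {
    "GeneA": ["EC:1.1.1.1"],
    "GeneB": ["InterPro:IPR000719", "EC:9.9.9.9"],  # second xref has no mapping
}


def infer_iea(gene_xrefs, table):
    """Return {gene: sorted list of (GO ID, evidence code, source identifier)}."""
    annotations = defaultdict(set)
    for gene, xrefs in gene_xrefs.items():
        for xref in xrefs:
            # Identifiers missing from the translation table yield no annotation.
            for go_id in table.get(xref, []):
                annotations[gene].add((go_id, "IEA", xref))
    return {gene: sorted(items) for gene, items in annotations.items()}


if __name__ == "__main__":
    for gene, items in infer_iea(GENE_XREFS, TRANSLATION_TABLE).items():
        for go_id, evidence, source in items:
            print(gene, go_id, evidence, source)
```

Because every inferred annotation records the identifier it came from, a nightly rerun can rebuild the whole set from scratch whenever a translation table changes.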

Priorities for annotation

  • Genes assigned by Reference Genome Project (everyone)
  • Isoform curation (Harold, Protein Ontology project); now coordinated with the Reference Genome Project priority above by focusing on reference genes that have isoforms, and coordinated with the Protein Ontology Protein Complex project.
  • Genes with no GO annotation but with literature (Li and Dmitry)
  • Genes with only IEA annotation but with literature (Li)
  • Genes identified as being important in lung development (Dmitry)
  • Genes marked as having GO annotation completed, but now having new literature (Dmitry)
  • Dmitry has been focused on annotation of miRNAs in MGI

Presentations and Publications

Ontology Development Contributions:

  • 1. David Hill has worked on a team with Tanya Berardini, Chris Mungall, Midori Harris and Jane Lomax to develop cross-products within and among the three GO namespaces.
  • 2. David Hill continues to work with Chris Mungall and Tanya Berardini to add inter-ontology links in GO.
  • 3. David Hill has worked with Yasmin Alam-Faruque, Doug Howe, Midori Harris, Susan Tweedie, Becky Foulger and community experts to expand the kidney development portion of GO.
  • 4. Alexander Diehl is project leader for the Cell Ontology (with Chris Mungall), and Terry Meehan is a full-time curator working on the Cell Ontology. Terry has finished implementing cross-products for hematopoietic cell ontology terms, is working on general improvements to the CL, and has moved on to importing FMA cell types, among others. Alex is focusing on improvements to the representation of neurons in the CL, part of an ongoing collaboration with the International Neuroinformatics Coordinating Facility (INCF). In May 2010 we held a very successful Cell Ontology Workshop at The Jackson Laboratory, where many issues regarding the long-term development of the ontology were settled. We will hold another workshop on neurons in the first quarter of 2011. (See the separate progress report for the Cell Ontology.) In summer 2010 Alex mentored Morgan V. Hightshoe, a returning member of the Jackson Laboratory Summer Student Program, and Wade Valleau, a local high school intern, in linked projects to revise the representation of nervous system cell types in the Cell Ontology. Wade received support from INCF for his work. Alex continues to act as the GOC liaison to the Infectious Disease Ontology and Vaccine Ontology groups and to act on term requests for the GO from those groups, and is active in the GO Signaling and Virus content development groups.
  • 5. David Hill and Jane Lomax oversee the biological content development of GO. In particular, all new developmental biology-related terms submitted to SourceForge are handled by David Hill.
  • 6. David Hill and Harold Drabkin have been working with Tanya Berardini, Chris Mungall, Midori Harris, Jane Lomax and ChEBI curators to align GO with ChEBI. This will result in the first set of cross-products with GO and an external ontology.
  • 7. David Hill and Karen Christie have been collaborating to revise and update the transcription area of GO.

Annotation Outreach and User Advocacy Efforts:

  • The Protein Ontology project continues to provide a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) through which functional annotation using the GO can be applied to PRO submissions. These are reviewed by Cecilia Arighi of Georgetown. At present, only the PRO curators (Georgetown and MGI) are using the tool, but it is available to anyone.
  • David Hill continues to serve on the GO-help rota.
  • Terry Meehan, Chris Mungall, and Alex Diehl (Univ. of Buffalo) continue their editing of the Cell Ontology (CL), including cross-products to the GO. Chris is helping David Hill 'mine' notes in MGI's GO annotations that reference CL identifiers.

Other Highlights:

  • We are developing an extended Editorial Interface (EI) for GO annotation that adds fields for cross-referencing other ontologies (Cell, Anatomy, etc.). Currently, data of this nature is kept in a Private Structured Notes field. As such, it is not easily amenable to retrieval and QC. Moving each item to its own field will allow us to more readily supply data for columns 16 and 17 of the GAF files.

will put in image here.

  • Dmitry Sitnikov is finishing up a collaboration with Larry Hunter's group (Dr. Mike Bada) to establish a large, high-quality corpus of full-text publications, expertly annotated with expressive knowledge representations, to improve the performance of a wide variety of biochemical text mining systems and to create new approaches to text mining. This involves systematically training and evaluating a broad sample of information extraction methods for key tasks, including concept and relationship identification. This effort may also help improve GO synonyms by relating GO terms to the biological concepts and terminology used in the literature. A total of 98 articles have been annotated for molecular function and process. Additionally, Dmitry is working to improve the usefulness of MGI internal GO QC reports.
  • As the designated coordinator of the MGI/GO project with the GO Reference Genome project, Li Ni participates in annotation of genes assigned by the Reference Genome Project, maintains the mouse Reference Genome list on the MGI GO wiki and Google spreadsheet, maintains the Reference Genome status table on the GO wiki, and oversees the curation of Reference Genome genes for the mouse group. Li responds to and resolves questions about MGI GO annotations for Reference Genome Project genes, especially questions from the lead PAINT curator (see the Reference Genome Project report for a description of PAINT).
  • Mary Dolan has been collaborating with Carol Bult at MGI on aligning Gene Ontology annotations for mouse genes assigned to MouseCyc pathways (see http://www.informatics.jax.org/pathways.shtml) and on exploring computational methods for associating functional, pathway, and phenotypic data. Mary also provides various files for the Reference Genome Project, for example, a report that assesses the GO annotation status of PANTHER families and subfamilies based on annotations for all reference genome organism genes in the groups.
  • Collaboration with BioGRID: We continue a collaboration with BioGRID (Biological General Repository for Interaction Datasets) on protein binding and genetic interaction annotation. Harold has recently supplied custom SQL data to Rose Oughtred, Senior Scientific Curator for the Genome Databases Group at the Lewis-Sigler Institute for Integrative Genomics, Princeton, to flag literature in MGI dealing with high-throughput analysis.
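
For readers unfamiliar with the GAF columns mentioned in the Editorial Interface item above: a GAF 2.0 annotation line has 17 tab-separated columns, with column 16 holding the annotation extension and column 17 the gene product form ID. The sketch below shows one way to read those two columns from a single line. The example line is entirely hypothetical (placeholder identifiers, not real MGI data), and `extension_and_isoform` is our own helper name.

```python
GAF_COLUMN_COUNT = 17  # GAF 2.0 annotation lines have 17 tab-separated columns

# Hypothetical annotation line; every identifier below is a placeholder.
EXAMPLE_LINE = "\t".join([
    "MGI", "MGI:0000000", "GeneA", "", "GO:0004022", "PMID:0000000",
    "IDA", "", "F", "hypothetical gene", "", "protein",
    "taxon:10090", "20110505", "MGI",
    "part_of(CL:0000000)",  # column 16: annotation extension (e.g. cell type)
    "PR:000000000",         # column 17: gene product form ID (isoform)
])


def extension_and_isoform(gaf_line):
    """Return (column 16, column 17) from one GAF 2.0 annotation line."""
    if gaf_line.startswith("!"):
        raise ValueError("comment or header line, not an annotation")
    fields = gaf_line.rstrip("\n").split("\t")
    if len(fields) != GAF_COLUMN_COUNT:
        raise ValueError(f"expected {GAF_COLUMN_COUNT} columns, got {len(fields)}")
    # Columns are numbered from 1 in the format description, so subtract 1.
    return fields[15], fields[16]


if __name__ == "__main__":
    print(extension_and_isoform(EXAMPLE_LINE))
```

Storing each cross-reference in its own field, as the extended EI is meant to do, makes these two columns a direct export rather than something parsed out of free-text notes.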