MGI December 2012: Difference between revisions

Latest revision as of 15:46, 4 June 2014

Mouse Genome Informatics Summary, December 2012

Overview

Staff:

Judith Blake*

Karen R Christie*

Mary E Dolan*

Harold J Drabkin*

David Hill*

Li Ni

Dmitry Sitnikov

* Funded entirely or partially by GO

Annotation Progress

MGI GO STATS as of December 2012

This year our manual annotation effort has focused on obtaining annotations for genes that had no annotations, and on obtaining deeper manual annotations for genes that were previously annotated with IEA or to one of the root nodes of GO.

Annotation Type                                               05_Dec_2011  05_Dec_2012   Change  % Change
Total Genes annotated with at least one GO term of any kind         25109        25452      343       1.4
Total annotations:                                                 272241       286957    14716       5.4

Total non-IEA Annotation
Total Number of Genes                                               24029        24550      521       2.2
Total Annotations                                                  178496       193088    14592       8.2

Annotation by Direct Experiment
MGI Curated Genes                                                   10332        11104      772       7.5
MGI Curated Annotations                                             64165        70615     6450      10.1
GOA Curated Genes                                                    2910         3475      565      19.4
GOA Curated Annotations                                             14048        18055     4007      28.5

Annotation by Orthology
Genes Annotated by Orthology Total                                   8912         9451      539       6.0
Total Orthology Annotations                                         62606        68621     6015       9.6
Genes Annotated by Human Orthology Load (GOA)                        7171         7756      585       8.2
Total Annotation by Human Orthology Load                            36796        42237     5441      14.8
Genes annotated by Rat Orthology Load (RGD)                          3963         4083      120       3.0
Total Annotations by Rat Orthology Load                             22390        23323      933       4.2

IEA Annotation
Total Genes with IEA                                                16319        16372       53       0.3
Total IEA annotations                                               93745        93869      124       0.1
Total Genes with SwissProt to GO                                    15803        15971      168       1.1
Total SwissProt to GO Annotations                                   62283        64733     2450       3.9
Total Genes with Interpro to GO                                      9781         9792       11       0.1
Total Interpro to GO Annotations                                    29954        27804    -2150      -7.2
Total Genes with EC to GO Annotations                                 930          944       14       1.5
Total EC to GO annotations                                           1508         1332     -176     -11.7
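
The Change and % Change figures follow directly from the two snapshot counts: the change is the December 2012 count minus the December 2011 count, and the percentage expresses that difference relative to the 2011 count, rounded to one decimal place. A minimal Python sketch of that arithmetic, using three rows from the statistics above; the function name change_and_percent is an illustrative choice and not part of any MGI tool:

```python
def change_and_percent(previous, current):
    """Return (absolute change, percent change relative to the earlier count)."""
    change = current - previous
    percent = round(100.0 * change / previous, 1)
    return change, percent


# Counts copied from the statistics above: (05_Dec_2011, 05_Dec_2012)
rows = {
    "Total annotations": (272241, 286957),
    "MGI Curated Annotations": (64165, 70615),
    "Total Interpro to GO Annotations": (29954, 27804),
}

for label, (previous, current) in rows.items():
    change, percent = change_and_percent(previous, current)
    print(f"{label}: {change:+d} ({percent:+.1f}%)")
```

A negative result, as for the InterPro to GO annotations, simply means the 2012 snapshot holds fewer annotations of that type than the 2011 snapshot.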



Methods and strategies for annotation

Literature curation:

Literature curation continues to be the major focus of our annotation efforts.

Computational annotation strategies:

As always, current strategies involve the use of translation tables to mine SwissProt keywords, InterPro domains, and EC numbers for IEA annotation. These loads run automatically on a nightly basis and require little human intervention.
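
The translation-table approach can be pictured as a simple lookup: each external identifier attached to a gene is looked up in a table that maps it to GO terms, and every hit becomes an IEA annotation. The Python sketch below illustrates only the idea. The gene identifiers are hypothetical, the table entries are examples chosen for this sketch rather than an excerpt of the real mapping files, and the function name derive_iea_annotations is an assumption; the actual nightly load is not described in this report. SwissProt keywords would be handled by the same kind of lookup.

```python
# Illustrative translation table: external identifier -> GO term IDs.
# Entries are examples for this sketch, not an excerpt of the real mapping files.
TRANSLATION_TABLE = {
    "EC:1.1.1.1": ["GO:0004022"],
    "InterPro:IPR000719": ["GO:0004672", "GO:0005524"],
}

# Hypothetical gene identifiers with the external identifiers attached to each.
GENE_XREFS = {
    "GENE:A": ["EC:1.1.1.1"],
    "GENE:B": ["InterPro:IPR000719", "EC:9.9.9.9"],  # second ID has no mapping
}


def derive_iea_annotations(gene_xrefs, translation_table):
    """Return sorted (gene, GO ID, evidence code, source ID) tuples."""
    annotations = set()
    for gene_id, xrefs in gene_xrefs.items():
        for xref in xrefs:
            # Identifiers missing from the table simply yield no annotation.
            for go_id in translation_table.get(xref, []):
                annotations.add((gene_id, go_id, "IEA", xref))
    return sorted(annotations)


for row in derive_iea_annotations(GENE_XREFS, TRANSLATION_TABLE):
    print("\t".join(row))
```

Collecting the results in a set keeps the output free of duplicates when two external identifiers on the same gene map to the same GO term through the same source.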

Priorities for annotation

  • Isoform curation (Harold, Karen, Protein Ontology project); focusing on genes that have isoforms or whose products are modified, and co-ordinating with the Protein Ontology Protein Complex project.
  • Genes with no GO annotation but with literature (Li and Dmitry)
  • Genes with only IEA annotation but with literature (Li)
  • Genes marked as having GO annotation completed, but now having new literature (Dmitry)
  • Genes that have an annotation to one of the three root nodes of GO, but have new literature (David, Karen, Dmitry)
  • Dmitry has been focused on annotation of miRNAs in MGI

Presentations and Publications

Drabkin HJ, Blake JA; for the Mouse Genome Informatics Database. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database. Database (Oxford). 2012 Oct 29;2012(0):bas045. Print 2012. PubMed PMID: 23110975; PubMed Central PMCID: PMC3483533.

Taşan M, Drabkin HJ, Beaver JE, Chua HN, Dunham J, Tian W, Blake JA, Roth FP. A Resource of Quantitative Functional Annotation for Homo sapiens Genes. G3 (Bethesda). 2012 Feb;2(2):223-33. Epub 2012 Feb 1. PubMed PMID: 22384401; PubMed Central PMCID: PMC3284330.

Thomas PD, Wood V, Mungall CJ, Lewis SE, Blake JA; Gene Ontology Consortium. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. PLoS Comput Biol. 2012;8(2):e1002386. Epub 2012 Feb 16. PubMed PMID: 22359495; PubMed Central PMCID: PMC3280971.

Bada M, Eckert M, Evans D, Garcia K, Shipley K, Sitnikov D, Baumgartner WA, Cohen KB, Verspoor K, Blake JA and Hunter LE. Concept annotation in the CRAFT corpus. BMC Bioinformatics 2012, 13:161 doi:10.1186/1471-2105-13-161. PMID: 22776079 [PubMed - in process] PMCID: PMC3476437.

Dolan ME, Mungall CJ, Dietze H and Blake JA. A Simplified Method for Creating a Cell Cycle Ontology for the Laboratory Mouse. Poster presented at CSHALS 2012 http://www.iscb.org/cms_addon/conferences/cshals2012/posters/MDolan_MouseCCO_CSHALS2012.pdf

Ontology Development Contributions:

  • 1. David Hill continues working with Tanya Berardini, Chris Mungall, Paola Roncaglia and Jane Lomax to develop cross-products within and among the three GO namespaces.
  • 3. David Hill and Jane Lomax oversee the biological content development of GO. David now has a regular spot in the rotation to address GO Sourceforge items.
  • 4. David Hill and Harold Drabkin have worked with Tanya Berardini, Chris Mungall, Paola Roncaglia, Jane Lomax and ChEBI curators to align GO with ChEBI. GO-ChEBI cross-products are now in use.
  • 5. David Hill has worked with the GO ontology development and software groups to develop a web-based tool for requesting new terms.
  • 6. Harold Drabkin is working on improving the representation of tRNA modification.
  • 7. David Hill has worked with the GO editorial office and the BHF-UCL group to represent cardiac conduction in the ontology.

Annotation Outreach and User Advocacy Efforts:

  • The Protein Ontology project continues to provide a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) whereby functional annotation using the GO can be applied to PRO submissions.
  • Harold Drabkin serves on the GO-help rota.
  • Judith Blake and Karen Christie are working with Rama Balakrishnan to coordinate with Astrid Laegrid of the Norwegian University of Science and Technology in Trondheim to determine how to incorporate GO annotations for transcription factors made by this group.

Other Highlights:

  • As the designated coordinator of the MGI/GO project with the GO Reference Genome project, Li Ni participates in the annotation of genes assigned by the Reference Genome Project and oversees the curation of Reference Genome genes for the mouse group. Li responds to and resolves questions about MGI GO annotations for Reference Genome Project genes, especially questions from the lead PAINT curator (see Reference Genome Project report for a description of PAINT). Li is also part of the PAINT curation team.
  • Mary Dolan has begun working with other members of the GO software team on Galaxy for GO. In the past, most groups have used Galaxy for sequence analysis. The focus of the GO Galaxy initiative will be functional analysis, implementing GO tools in Galaxy. Mary also provides various files for the Reference Genome Project, for example, a report to assess the GO annotation status of PANTHER families and subfamilies based on annotations for all reference genome organism genes in the groups.