MGI Progress Report for October 2008


Staff:

Judith Blake

Alexander Diehl

Harold J Drabkin

David Hill

Li Ni

Dmitry Sitnikov

Mary Dolan

Annotation Progress

Two items of significance: First, we have archived and removed all IEA annotations made using GO Fish. These annotations, which were based on gene names, were used to add initial annotations to MGI genes when the GO was first implemented; they are now deemed outdated. Second, we have removed the Fantom annotations (which used the RCA code) from our system, as there is no easy way to update them and they were growing stale. Furthermore, a great many of them are duplicated by our IEA annotation process.


MGI GO STATS as of Dec. 09, 2008


Annotation Type                                           09_Dec_08  30_Nov_07  Change  % Change
Total genes annotated (at least one GO term of any kind)      18083      18318    -235     -1.29
Total Hand Annotation
  Number of Genes                                             10855      10058     797      7.92
Orthology                                                       682        575     107     18.61
"IEA"
  SwissProt to GO                                             16043      14776    1267      8.57
  Interpro to GO                                              10631       8979    1652     18.40
  EC to GO                                                     1510       1379     131      9.50
  GO Fish                                                         0       1832   -1832   -100.00
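
The report does not state how the Change and % Change columns were derived. A minimal sketch, assuming Change is the 09_Dec_08 count minus the 30_Nov_07 count and % Change is that difference relative to the 30_Nov_07 count (the function name and the choice of rows are illustrative):

```python
def percent_change(current, previous):
    """Percent change from `previous` to `current` (assumed formula)."""
    if previous == 0:
        raise ValueError("percent change is undefined when the previous count is 0")
    return (current - previous) / previous * 100


# A few rows copied from the statistics above: (09_Dec_08, 30_Nov_07).
rows = {
    "Hand annotation (number of genes)": (10855, 10058),
    "Orthology": (682, 575),
    "GO Fish": (0, 1832),
}

for label, (current, previous) in rows.items():
    change = current - previous
    print(f"{label}: {change:+d} ({percent_change(current, previous):+.2f}%)")
```

Guarding against a zero baseline matters for categories that did not exist in the earlier snapshot, where a percentage would be meaningless.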

Methods and strategies for annotation

Literature curation:

Literature curation continues to be the main focus of our annotation efforts. We have streamlined our literature triage process so that we can spend more time curating papers and less time getting them into the MGI system.

Priorities for annotation

  1. Genes assigned by Reference Genome Project (everyone)
  2. Channel proteins (Harold, Protein Ontology project)
  3. Genes previously annotated by GO Fish and Fantom that now have no GO annotations but do have literature (soon to be completed)
  4. Genes with no GO annotation but with literature (Li and Dmitry)
  5. Genes marked as having GO annotation completed, but now having new literature (Dmitry)

Computational annotation strategies:

Current strategies use translation tables to mine SwissProt keywords and InterPro domains for IEA annotation. These annotations are generated automatically on a nightly basis and require little human intervention.
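
The translation-table approach can be pictured as a simple lookup from external identifiers to GO terms. The following is only a sketch under stated assumptions: the table entries, gene names, and function name are illustrative and are not taken from MGI's actual mapping files or pipeline.

```python
from collections import defaultdict

# Illustrative translation table: external identifier -> GO term IDs.
# These entries are examples for this sketch, not MGI's real tables.
TRANSLATION_TABLE = {
    "InterPro:IPR000719": ["GO:0004672", "GO:0005524"],
    "SP_KW:KW-0238": ["GO:0003677"],
}


def infer_iea_annotations(gene_features, table):
    """Map each gene's features to GO terms via the translation table.

    Returns {gene: sorted list of (go_id, source_feature)}; every pair
    would carry the IEA evidence code. Genes with no mapped feature are
    omitted from the result.
    """
    annotations = defaultdict(set)
    for gene, features in gene_features.items():
        for feature in features:
            for go_id in table.get(feature, []):
                annotations[gene].add((go_id, feature))
    return {gene: sorted(pairs) for gene, pairs in annotations.items()}


# Hypothetical genes and their SwissProt keyword / InterPro domain hits.
gene_features = {
    "GeneA": ["InterPro:IPR000719", "SP_KW:KW-0238"],
    "GeneB": ["InterPro:IPR999999"],  # no entry in the table, so no annotation
}

result = infer_iea_annotations(gene_features, TRANSLATION_TABLE)
for gene, pairs in result.items():
    for go_id, source in pairs:
        print(gene, go_id, "IEA", source)
```

Keeping the source feature next to each GO term makes it possible to trace an automatic annotation back to the keyword or domain that produced it.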


Presentations and Publications

a. Papers with substantial GO content

Alterovitz, G., Xiang, M., Hill, D., Lomax, J., Liu, J., Mungall, C., Harris, M., Dolan, M.E., Blake, J.A., Ramoni, M.F. Engineering Biomedical Ontologies: The Gene Ontology. [In Revision].

Dimmer, E.C., Huntley, R.P., Barrell, D.G., Binns, D., Camon, E., Hubank, M., Blake, J.A., Apweiler, R., Talmud, P.J., Lovering, R.C. The Gene Ontology; Providing a Functional Role in Proteomic Studies. [Submitted: Proteomics].

Lovering, R.C., Camon, E.B., Blake, J.A., Diehl, A.D. Access to Immunology through the Gene Ontology. [In Press: Immunology].

Dolan, M.E., and Blake, J.A. Using ontology visualization to facilitate access to knowledge about human disease genes. [In Press: Applied Ontology].

Blake, J.A. and Harris, M. “The Gene Ontology (GO) Project: Structured Vocabularies for Molecular Biology and Their Application to Genome and Expression Analysis.” In Current Protocols in Bioinformatics. Wiley Inc. [In Press].

Tasan M, Tian W, Hill DP, Gibbons FD, Blake JA, Roth FP. (2008) An en masse phenotype and function prediction system for Mus musculus. Genome Biol. 9 Suppl 1:S8.

Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP. (2008) A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol. 9 Suppl 1:S2.

Hill DP, Smith B, McAndrews-Hill MS, Blake JA. (2008) Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics. Apr 29;9 Suppl 5:S2.

Drabkin, HJ., Arighi CN, Wu, CH, and Blake, JA. 2008. Functional Annotation of Protein Isoforms and Modified Forms. Proceedings of the 2008 International Conference on Bioinformatics & Computational Biology Volume 2, 701-707.


b. Presentations including Talks and Tutorials and Teaching

Drabkin, HJ. 2008 Avian Gene Ontology Workshop, Starkville, MS, May 21-22. "What is the GO?" and "What is a GO Annotation"

c. Poster presentations


Berardini, Hill, Rhee and Blake. Society for Developmental Biology, Homeodomain proteins in mice and plants: What we know and what we don't.


Mary E. Dolan, Chris Mungall, Tanya Z. Berardini, David P. Hill, John Day-Richter, Jane Lomax for the Gene Ontology Consortium. "Describing Biological Regulation In The Gene Ontology" at Biology of Genomes, Cold Spring Harbor Lab, May 08


Li Ni, Mary E Dolan, Alex D Diehl, Harold Drabkin, David P Hill, Dmitry Sitnikov and Judith A Blake. The Comprehensive Functional Annotation Of Mouse Genes And Gene Products Using The Gene Ontology (GO). Genome Informatics, 1-5 November 2007, Cold Spring Harbor, NY


Li Ni, Carol J. Bult, Jim A. Kadin, Joel E. Richardson, Martin Ringwald, Janan T. Eppig, Judith A. Blake, and the Mouse Genome Informatics Group. Data Management and Biological Knowledge Representation in the Mouse. ISMB 2008 conference, 19-23 July 2008, Toronto


Other Highlights:

A. Ontology Development Contributions:


1. Expansion of macromolecular complex terms in Cellular Component.

In an effort to expand the representation of macromolecular complexes in the GO, William Kornahrens, a summer intern at MGI, created a list of potential complex terms to be added, based on scanning several textbook sources as well as the CORUM resource. CORUM is a database of experimentally verified complexes in mammals, focusing on human and mouse. Over 1200 macromolecular complexes were identified, most of which were missing from the GO; others were added to the GO as synonyms of pre-existing complex terms. As a result of this study, over 61 new complex terms have been added to the GO.


2. Function-Process links

Harold has continued exploring and creating function-process links concerning various metabolic pathways such as purine and pyrimidine biosynthesis, fatty acid synthesis and oxidation, and several others.


3. David Hill co-managed ontology development with Midori Harris. He worked on a team with Tanya Berardini, Chris Mungall and Jane Lomax that implemented the regulates relationship in the Biological Process ontology. He also represented the GO at the relationship ontology and OBO Foundry meetings of the NCBO, and was instrumental in defining the algebra of relationships that can be used to infer relationships through the GO graph. In addition, he worked with Tanya Berardini and Chris Mungall to address the quality control reports that can now be generated through use of the reasoner.
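
The report does not list the rules of this algebra. As a rough illustration of the idea only, the sketch below assumes a small subset of composition rules for is_a, part_of and regulates, and uses placeholder term names (P, Q); it is not the GO reasoner and makes no claim to cover the full rule set.

```python
# Assumed subset of composition rules: if A -first-> B and B -second-> C,
# then A -result-> C. Pairs not listed here yield no inference in this sketch.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
    ("is_a", "regulates"): "regulates",
    ("regulates", "is_a"): "regulates",
}


def infer(edges):
    """Close a set of (subject, relation, object) triples under COMPOSE."""
    inferred = set(edges)
    changed = True
    while changed:
        changed = False
        snapshot = list(inferred)
        for a, r1, b in snapshot:
            for b2, r2, c in snapshot:
                if b != b2:
                    continue
                r = COMPOSE.get((r1, r2))
                if r is not None and (a, r, c) not in inferred:
                    inferred.add((a, r, c))
                    changed = True
    return inferred


# Placeholder terms: P is a kind of Q, and two regulation terms refer to P.
asserted = {
    ("positive regulation of P", "is_a", "regulation of P"),
    ("regulation of P", "regulates", "P"),
    ("P", "is_a", "Q"),
}

closure = infer(asserted)
for triple in sorted(closure - asserted):
    print("inferred:", triple)
```

Inferred links of this kind are what make graph-wide consistency checks feasible, since a missing or contradictory relationship shows up once the closure is computed.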


B. Annotation Outreach and User Advocacy Efforts:

Harold participated in a workshop at the AgBase conference in May 2008.

C. Other:

Dmitry Sitnikov is involved in an NLP project in collaboration with Dr. Larry Hunter's group (Dr. Mike Bada, Amanda Howard). The goal is to establish a "gold standard" for annotating GO Molecular Function and Biological Process terms, as well as syntactic context, in biomedical literature. Creating a large, high-quality corpus of full-text publications, expertly annotated with expressive knowledge representations, will lead to significant improvements in the performance of a wide variety of biochemical text mining systems and to the creation of new approaches to text mining. The project involves systematically training and evaluating a broad sample of information extraction methods for key tasks, including concept and relationship identification.