Mouse Genome Informatics Summary, 2009
Harold J Drabkin*
- Funded entirely or partially by GO
We continue to put emphasis on those genes selected for the Reference Genome Project. Additional emphasis has been placed on certain genes associated with lung development.
|Annotation Type||09_Dec_08||09_Dec_09||Change||% Change|
|Total Genes annotated (with at least one GO term of any kind):||
|Total Manual Annotation|
|Number of Genes||
|SwissProt to GO||
|Interpro to GO||
|EC to GO||
|* 62% of current gene models|
Methods and strategies for annotation
Literature curation continues to be the major focus of our annotation efforts. We continue to explore natural language processing tools to aid in identifying papers that are primarily focused on aspects of lung development, with the aid of Karen Dowell.
Computational annotation strategies:
As always, current strategies use translation tables to mine SwissProt keywords and InterPro domains for IEA annotation. These are run automatically on a nightly basis and require little human intervention.
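As a rough illustration of how a translation table can drive IEA annotation, the sketch below parses mapping lines and applies them to gene features. It assumes the tables follow the GO "external2go" layout (`db:id name > GO:term name ; GO:id`, with `!` comment lines). The function names, the sample rows, and the gene names are hypothetical and are not taken from the MGI pipeline.

```python
import re
from collections import defaultdict

# A GO identifier at the end of a mapping line, e.g. "... ; GO:0003677"
GO_ID_AT_END = re.compile(r"(GO:\d{7})\s*$")


def parse_translation_table(lines):
    """Map each external identifier to the set of GO IDs it translates to."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '!' comment lines.
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        match = GO_ID_AT_END.search(right)
        if not sep or match is None:
            continue  # not a usable mapping line
        external_id = left.split(" ", 1)[0]
        mapping[external_id].add(match.group(1))
    return mapping


def iea_annotations(gene_features, mapping):
    """Return sorted (gene, GO ID, source ID) triples inferred from features."""
    triples = set()
    for gene, features in gene_features.items():
        for feature in features:
            for go_id in mapping.get(feature, ()):
                triples.add((gene, go_id, feature))
    return sorted(triples)


# Illustrative rows only (assumed format, not copied from MGI files).
SAMPLE_TABLE = [
    "! hypothetical excerpt for illustration",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
    "SP_KW:KW-0238 DNA-binding > GO:DNA binding ; GO:0003677",
]
# Hypothetical genes with the keywords/domains assigned to them.
SAMPLE_GENES = {
    "GeneA": ["InterPro:IPR000003", "SP_KW:KW-0238"],
    "GeneB": ["InterPro:IPR999999"],  # no mapping, so no annotation
}

if __name__ == "__main__":
    table = parse_translation_table(SAMPLE_TABLE)
    for triple in iea_annotations(SAMPLE_GENES, table):
        print(triple)
```

Keeping the source identifier in each triple mirrors the practice of recording what an electronic annotation was inferred from, which makes nightly reruns easy to audit.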
Priorities for annotation
- Genes assigned by Reference Genome Project (everyone)
- Isoform curation (Harold, Protein Ontology project); now coordinated with the Reference Genome priority above by focusing on reference genes that have isoforms.
- Genes with no GO annotation but with literature (Li and Dmitry)
- Genes with only IEA annotation but with literature (Li)
- Genes identified as being important in lung development (Dmitry)
- Genes marked as having GO annotation completed, but now having new literature (Dmitry)
Presentations and Publications
Alterovitz G, Xiang M, Hill DP, Lomax J, Liu J, Cherkassky M, Mungall C, Harris MA, Dolan ME, Blake JA, Ramoni MF. Ontology Engineering. Nature Biotech [Accepted, In Press].
Hill DP, Berardini TZ, Howe DG, Van Auken KM. 2009. Representing Ontogeny Through Ontology: A Developmental Biologist’s Guide to The Gene Ontology. Mol Reprod & Dev. (In Press)
Dowell, KG, McAndrews-Hill MS, Hill DP, Drabkin HJ, Blake JA. Integrating Text Mining into the MGI Biocuration Workflow (2009) Database (In Press)
Diehl AD, Deckhut Augustine A, Blake JA, Cowell LG, Gold ES, Gondré-Lewis TA, Masci AM, Meehan TF, Morel PA, NIAID Cell Ontology Working Group, Nijnik A, Peters B, Pulendran B, Scheuermann RH, Yao QA, Zand MS, Mungall CJ. 2009. Hematopoietic Cell Types: Prototype for a Revised Cell Ontology. Proceedings of the International Conference on Biomedical Ontology, July 24-26, 2009, University of Buffalo, NY.
Masci AM, Arighi CN, Diehl AD, Lieberman AE, Mungall C, Scheuermann RH, Smith B, Cowell LG. 2009. An improved ontological representation of dendritic cells as a paradigm for all cell types. BMC Bioinformatics, 10:70.
Feltrin E, Campanaro S, Diehl AD, Ehler E, Faulkner G, Fordham J, Gardin C, Harris M, Hill D, Knoell R, Laveder P, Mittempergher L, Nori A, Reggiani C, Sorrentino V, Volpe P, Zara I, Valle G, Deegan J Nee Clark. 2009. Muscle Research and Gene Ontology: New standards for improved data integration. BMC Medical Genomics, 2:6.
Dolan ME, Blake JA. 2009. Using ontology visualization to facilitate access to knowledge about human disease genes. Applied Ontology 4(1):35-49.
Joslyn C, Baddeley B, Blake J, Bult C, Dolan M, Riensche R, Rodland K, Sanfilippo A, White A. 2009. Automated Annotation-Based Bio-Ontology Alignment with Structural Validation, submitted as conference paper to the International Conference on Biomedical Ontology, July 24-26, 2009, University of Buffalo, NY. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2009.3518.1> (2009).
Dolan ME, Evsikov AV, Blake JA, Bult CJ. MouseCyc: a pathways approach to integration of mouse functional, phenotype and expression data. Poster presented at ISMB2009.
Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species, PLoS Comput Biol. 2009 Jul;5(7):e1000431. Epub 2009 Jul 3
Sam LT, Mendonca EA, Li J, Blake J, Friedman C, Lussier YA. 2009. PhenoGO: an integrated resource for the multiscale mining of clinical and biological data. BMC Bioinformatics 10(Suppl 2):S8.
Sitnikov DM, Bada M, Blake JA, Hunter L. Biocurational text mining research using Gene Ontology (GO). Poster presented at Genome Informatics 2009.
Hill DP, Sitnikov D, Blake JA. 2009. Using gene ontology to study branching morphogenesis in mice. Developmental Biology 331: 454.
b. Presentations including Talks and Tutorials and Teaching
Blake, JB. 2009. Plenary Speaker: "Biomedical Literature in the Clouds: Ontologies, Data, and the Semantic Web" North Atlantic Health Sciences Libraries Meeting, Oct 25-27.
Blake, JB. 2009. Invited Speaker: "Digital Annotation for Production Bioinformatics Systems" BioCreative II.5 Workshop, Madrid, Spain. Oct 7-9.
Blake, JB. 2009. Invited Participant/Speaker: "Orthologs, Paralogs, and Functional Annotations using the Gene Ontology" Quest for Orthologs Meeting, Cambridge-Hinxton Genome Center, U.K. July 3 – 5.
Blake, JB. 2009. Invited Speaker: "Evidence and Inference: Comparative Biology in the Age of Genomics" University of Chicago, Chicago, Illinois. February 5.
Alexander D. Diehl. 2009. Invited Speaker, “Hematopoietic Cell Types: Prototype for a Revised Cell Ontology,” at the International Conference on Biomedical Ontology, July 24-26, Buffalo, NY.
Alexander D. Diehl. 2009. Invited Speaker, “Introduction to the Gene Ontology,” at the Cell Behavior Ontology Workshop, May 4-6, National Institutes of Health, Bethesda, MD.
David Hill. 2009. Invited Speaker: "Gene ontology recapitulates ontogeny: Using the gene ontology to study development" MR&D meeting, Brown University.
David Hill. 2009. "Model Organism Databases: did you know you supply the data?", The Jackson Laboratory Graduate Student Association, Bar Harbor.
A. Ontology Development Contributions:
1. David Hill has worked on a team with Tanya Berardini, Chris Mungall, Midori Harris, Jen Deegan and Jane Lomax to develop cross-products within and among the three GO namespaces. These have not been officially released.
2. David Hill has worked with Chris Mungall and Tanya Berardini to introduce the first inter-ontology links in GO. These include regulates links between BP and MF and part_of links between MF and BP.
3. Dmitry Sitnikov and David Hill are extending the lung development branch of Biological Process to adequately annotate genes identified as being important in the development of lung cancers.
4. David Hill has expanded the representation of salivary gland, prostate gland, placenta and mammary gland development in the BP ontology.
5. Approximately 150 terms were added to the BP ontology as a result of David Hill and Tanya Berardini attending the Society for Developmental Biology meeting.
6. Approximately 50 terms were added to the BP ontology as a result of David Hill attending the MR&D meeting.
7. Approximately 250 terms were added to the BP ontology as a result of David Hill, Tanya Berardini and Doug Howe attending the heart development meeting hosted by University College London.
8. Harold Drabkin and Alexander Diehl are active in the Signaling GO content development group.
9. Alexander Diehl is active in the Virus Term GO content development group.
10. Alexander Diehl continues to act as the GOC liaison to the Infectious Disease Ontology and Vaccine Ontology groups and to act on term requests for the GO from those groups.
11. Alexander Diehl and Chris Mungall are leading a project to revise the Cell Ontology and increase its utility as a source for cell type-specific GO terms and in co-annotation with GO terms. This project is funded by an ARRA Competitive Revision to the main GO Consortium grant; funding began on September 30, 2009. Terrence Meehan at MGI was recruited to work as a full-time curator on the Cell Ontology under the direction of Alexander Diehl. Our first large project on the CL is developing the hematopoietic cell type terms into a full logical-definition/cross-product format.
12. Alexander Diehl and Judith Blake recently co-mentored Morgan V. Hightshoe, a member of the 2009 Jackson Laboratory Summer Student Program, in a project to revise the representation of nervous system cell types in the Cell Ontology. This work is being continued by a high school intern, Wade Valleau, as part of our larger work on the Cell Ontology.
13. David Hill and Midori Harris continue to oversee the biological content development of GO. In particular, all new developmental biology-related terms are handled by David Hill and all new 'regulation' terms are handled by David Hill and Tanya Berardini.
Annotation Outreach and User Advocacy Efforts:
The Protein Ontology project is providing a web interface (http://pir.georgetown.edu/cgi-bin/pro/race_pro) whereby functional annotation using the GO can be applied to PRO submissions. These are reviewed by Cecilia Arighi of Georgetown. At present, only the PRO curators (Georgetown and MGI) are using the tool, but it is available to anyone.
We are now supplying a GAF 2.0 format file to the GOC with the column 17 isoform data filled in. This file is also available directly from our own FTP site. Filling in column 16, starting with cell type, will be implemented shortly.
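For readers who want to consume these columns, here is a minimal sketch of reading one GAF 2.0 line. It relies on the GAF 2.0 layout of 17 tab-separated columns, in which column 16 holds the annotation extension and column 17 the gene product form ID. The function name and the sample line, including all identifiers in it, are placeholders for illustration and are not real MGI data.

```python
GAF2_COLUMNS = 17  # GAF 2.0 lines have 17 tab-separated columns


def parse_gaf2_line(line):
    """Return selected fields of one GAF 2.0 line, or None for comments/blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("!"):
        return None
    fields = line.split("\t")
    if len(fields) != GAF2_COLUMNS:
        raise ValueError(f"expected {GAF2_COLUMNS} columns, got {len(fields)}")
    return {
        "db_object_id": fields[1],                     # column 2
        "go_id": fields[4],                            # column 5
        "evidence": fields[6],                         # column 7
        "annotation_extension": fields[15] or None,    # column 16
        "gene_product_form_id": fields[16] or None,    # column 17 (isoform)
    }


# Placeholder line: every identifier below is made up for illustration.
SAMPLE_LINE = "\t".join([
    "MGI", "MGI:0000000", "GeneA", "", "GO:0003677", "PMID:0000000",
    "IDA", "", "F", "hypothetical gene A", "", "gene",
    "taxon:10090", "20091209", "MGI", "", "PR:000000000",
])

if __name__ == "__main__":
    record = parse_gaf2_line(SAMPLE_LINE)
    print(record["db_object_id"], record["gene_product_form_id"])
```

Empty columns are returned as `None`, so a consumer can tell "no isoform specified" apart from an actual identifier without string comparisons.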
Alexander Diehl co-wrote an ARRA Competitive Revision extension to the Gene Ontology Consortium grant HG000273, along with Chris Mungall and Judith Blake for the purpose of revising the Cell Ontology, and using the CL in conjunction with the GO in annotation and in cross-product term formation. This grant was funded on September 30, 2009.
Dmitry Sitnikov continues to collaborate with Larry Hunter's group (Dr. Mike Bada) to establish a large, high-quality corpus of full-text publications, expertly annotated with expressive knowledge representations, to improve the performance of a wide variety of biochemical text mining systems and to create new approaches to text mining. This involves systematically training and evaluating a broad sample of information extraction methods for key tasks, including concept and relationship identification. The effort may also help improve GO synonyms by linking GO terms to the biological concepts and terminology used in the literature. So far, more than 80 of the 98 chosen articles have been annotated for molecular function and process. Additionally, Dmitry is working to improve the usefulness of MGI internal GO QC reports.
As the designated coordinator of the MGI/GO project with the GO Reference Genome project, Li Ni participates in the annotation of genes assigned by the Reference Genome Project, maintains the mouse Reference Genome list on the MGI GO wiki and Google spreadsheet, and oversees the curation of Reference Genome genes for the mouse group.
Mary Dolan has been involved in a collaboration with Carol Bult at MGI on aligning gene ontology annotations for mouse genes assigned to MouseCyc pathways. See