MGI, March 2009
Mouse Genome Informatics, March 2009
Staff:
Judith Blake
Alexander Diehl
Harold J Drabkin
David Hill
Li Ni
Dmitry Sitnikov
Mary Dolan
Annotation Progress
We continue to put emphasis on those genes selected for the Reference Genome Project. Additional emphasis has been placed on certain genes associated with lung development.
Annotation Type | 09_Dec_08 | 26_Mar_09 | Change | % Change |
Total Genes annotated (with at least one GO term of any kind): | 18083 | 18159* | 76 | 0.42 |
Total Manual Annotation | | | | |
Number of Genes | 10855 | 11045 | 190 | 1.75 |
Orthology: | 682 | 685 | 3 | 0.44 |
IEA Annotation | | | | |
SwissProt to GO | 16043 | 16083 | 40 | 0.25 |
InterPro to GO | 10631 | 10536 | -95 | -0.89 |
EC to GO | 1510 | 1478 | -32 | -2.12 |
* 62% of current gene models
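As a cross-check on the Change and % Change columns, the short sketch below recomputes both from the two snapshot counts. It assumes that % Change is measured relative to the 09_Dec_08 count and rounded to two decimals; the function name and dictionary labels are illustrative only.

```python
# Counts taken from the table: (09_Dec_08, 26_Mar_09)
counts = {
    "Total genes annotated": (18083, 18159),
    "Manual: number of genes": (10855, 11045),
    "Orthology": (682, 685),
    "SwissProt to GO": (16043, 16083),
    "InterPro to GO": (10631, 10536),
    "EC to GO": (1510, 1478),
}


def change_and_percent(before, after):
    """Return (absolute change, percent change relative to the earlier count)."""
    delta = after - before
    return delta, round(100.0 * delta / before, 2)


for label, (before, after) in counts.items():
    delta, pct = change_and_percent(before, after)
    print(f"{label}: {delta:+d} ({pct:+.2f}%)")
```

Under that assumption, a count that falls between snapshots (InterPro to GO, EC to GO) yields a negative change and a negative percentage.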
Methods and strategies for annotation
- Literature curation:
Literature curation continues to be the major focus of our annotation efforts. With the help of Karen Dowell, we are currently exploring natural language processing tools to identify papers that focus primarily on aspects of lung development.
- Computational annotation strategies:
As before, our current strategies use translation tables to mine SwissProt keywords and InterPro domains for IEA annotation. These mappings are run automatically every night and require little human intervention.
- Priorities for annotation
- Genes assigned by Reference Genome Project (everyone)
- Isoform curation (Harold, Protein Ontology project)
- Genes with no GO annotation but with literature (Li and Dmitry)
- Genes identified as being important in lung development (Dmitry)
- Genes marked as having GO annotation completed, but now having new literature (Dmitry)
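The translation-table approach described under "Computational annotation strategies" can be pictured with the minimal sketch below. It is not the MGI pipeline: the table entries, gene records, and function name are assumptions chosen for illustration, and the real mapping files are much larger and maintained separately.

```python
from collections import defaultdict

# Illustrative translation tables (assumed entries, not the real mapping files).
KEYWORD_TO_GO = {
    "Kinase": ["GO:0016301"],       # kinase activity
    "Nucleus": ["GO:0005634"],      # nucleus
    "DNA-binding": ["GO:0003677"],  # DNA binding
}
INTERPRO_TO_GO = {
    "IPR000719": ["GO:0004672", "GO:0005524"],  # protein kinase domain
}


def iea_annotations(genes):
    """Translate each gene's keywords and domains into GO IDs tagged as IEA."""
    annotations = defaultdict(set)
    for gene in genes:
        for keyword in gene.get("keywords", []):
            for go_id in KEYWORD_TO_GO.get(keyword, []):
                annotations[gene["symbol"]].add((go_id, "IEA", "keyword:" + keyword))
        for domain in gene.get("interpro", []):
            for go_id in INTERPRO_TO_GO.get(domain, []):
                annotations[gene["symbol"]].add((go_id, "IEA", "interpro:" + domain))
    # Genes with no mapped keyword or domain receive no IEA annotation.
    return {symbol: sorted(rows) for symbol, rows in annotations.items()}


# Hypothetical input records, for illustration only.
genes = [
    {"symbol": "GeneA", "keywords": ["Kinase", "Nucleus"], "interpro": ["IPR000719"]},
    {"symbol": "GeneB", "keywords": ["Unmapped keyword"], "interpro": []},
]
result = iea_annotations(genes)
for symbol, rows in result.items():
    print(symbol, rows)
```

Because the lookup is a pure table translation, it can be rerun unattended whenever the source keywords, domains, or mapping tables change, which is consistent with the nightly schedule described above.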
Presentations and Publications
a. Papers with substantial GO content
Feltrin E, Campanaro S, Diehl AD, Ehler E, Faulkner G, Fordham J, Gardin C, Harris M, Hill D, Knoell R, Laveder P, Mittempergher L, Nori A, Reggiani C, Sorrentino V, Volpe P, Zara I, Valle G, Deegan J Nee Clark, 2009, “Muscle Research and Gene Ontology: New standards for improved data integration,” BMC Medical Genomics, 2:6.
Masci AM, Arighi CN, Diehl AD, Lieberman AE, Mungall C, Scheuermann RH, Smith B, Cowell LG, 2009, “An improved ontological representation of dendritic cells as a paradigm for all cell types,” BMC Bioinformatics, 10:70.
Diehl AD, Deckhut Augustine A, Blake JA, Cowell LG, Gold ES, Gondré-Lewis TA, Masci AM, Meehan TF, Morel PA, NIAID Cell Ontology Working Group, Nijnik A, Peters B, Pulendran B, Scheuermann RH, Yao QA, Zand MS, Mungall CJ, 2009, “Hematopoietic Cell Types: Prototype for a Revised Cell Ontology,” submitted as conference paper to the International Conference on Biomedical Ontology, July 24-26, 2009, University of Buffalo, NY.
He Y, Cowell L, Diehl AD, Mobley H, Peters B, Ruttenberg A, Scheuermann RH, Brinkman R, Courtot M, Mungall C, Xiang Z, Chen F, Todd T, Colby L, Rush H, Whetzel T, Musen MA, Athey BD, Omenn GS, Smith B, 2009, “VO: Vaccine Ontology,” submitted as conference paper to the International Conference on Biomedical Ontology, July 24-26, 2009, University of Buffalo, NY.
Lovering RC, Camon EB, Blake JA, Diehl AD, 2008, “Access to immunology through the Gene Ontology,” Immunology, 125:154-60.
b. Presentations including Talks and Tutorials and Teaching
Harold gave a presentation at the 2nd Protein Ontology Meeting Annotation Jamboree (Nov 18-20, 2008) on how to annotate using the Gene Ontology.
Other Highlights
A. Ontology Development Contributions:
- David Hill has worked on a team with Tanya Berardini, Chris Mungall and Jane Lomax to develop cross product links between the three GO namespaces.
- The lung development branch of Biological Process needs to be expanded to adequately annotate genes identified as being important in the development of lung cancers.
B. Annotation Outreach and User Advocacy Efforts:
Harold gave a presentation at the 2nd Protein Ontology Meeting Annotation Jamboree (Nov 18-20, 2008) on how to annotate using the Gene Ontology.
The Gene Ontology section of MGI Gene Detail pages now mentions when a gene has been selected for the Reference Genome Project. For an example, see http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=markerDetail&key=13516
C. Other
Dmitry Sitnikov continues his collaboration with Larry Hunter's group (Dr. Mike Bada, Amanda Howard) to establish a large, high-quality corpus of full-text publications, expertly annotated with expressive knowledge representations. The goals are to improve the performance of a wide variety of biochemical text mining systems and to create new approaches to text mining. The work involves systematically training and evaluating a broad sample of information extraction methods for key tasks, including concept and relationship identification. It may also help improve GO synonyms by linking GO terms to the biological concepts and terminology used in the literature.
Mary Dolan has been collaborating with Carol Bult at MGI on aligning Gene Ontology annotations for mouse genes assigned to MouseCyc pathways. See
ftp://ftp.informatics.jax.org/pub/curatorwork/MouseCyc_Graphs/index.html
U. Maine graduate student Karen Dowell and high school intern Daniel Hale are using the NCBO Open Biomedical Annotator Web Service tools to rapidly identify papers on lung-related topics that are most likely to contain data that can be annotated using the Gene Ontology. See http://obs.bioontology.org/oba/OBA_v1.1_rest.html for more details.