Ontology Development Progress Report December 2015
Prepared and Submitted by Melanie Courtot and David Hill on behalf of the ontology working group
- David Hill (MGI)
- Melanie Courtot (EBI)
- Tanya Berardini (TAIR)
- Heiko Dietze (LBL)
- Harold Drabkin (MGI)
- Chris Mungall (LBL)
- David Osumi-Sutherland (EBI)
- Paola Roncaglia (EBI)
Ticket status as of December 14th 2015. For tickets created in 2015 only (created:>2015-01-01): 142 open, 467 closed.
Note: we imported tickets from the SourceForge tracker, which was retired. For completeness, counting all tickets that were ported to the GitHub tracker (starting January 2014), we have 250 open tickets and 1168 closed tickets.
Total number of GO terms added Jan 2015 to Nov 1, 2015: 1560
Total number of GO terms added manually Jan 2015 to Nov 1, 2015: 310
Total number of GO terms added via TermGenie Jan 2015 to Nov 1, 2015: 1250
Total number of GO terms obsoleted Jan 2015 to Nov 1, 2015: 68 obsoleted, 54 merged
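The term counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative only, and the figures are copied from the totals reported for Jan 2015 to Nov 1, 2015.

```python
# Figures copied from the report (Jan 2015 to Nov 1, 2015).
manual_terms = 310
termgenie_terms = 1250
total_terms = 1560


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The manual and TermGenie counts should add up to the reported total.
assert manual_terms + termgenie_terms == total_terms

termgenie_share = share(termgenie_terms, total_terms)
manual_share = share(manual_terms, total_terms)
print(f"TermGenie: {termgenie_share}%  manual: {manual_share}%")
```

Over this period, roughly four out of five new terms came through TermGenie.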
Software and infrastructure
- We have continued to develop TermGenie, adding new features (e.g. GitHub support) and templates (including templates for anatomy using Uberon). TermGenie now accounts for 80% (1362 of 1710) of new terms added to the GO.
- Seven new templates have been added to the GO TermGenie instance, which now has 48 templates in total.
- We have also implemented TermGenie instances for the cell type ontology and other ontologies.
These templates utilize both classes within GO and classes from external ontologies, shown in parentheses. PO = Plant Ontology, CL = Cell Ontology, CHEBI = Chemical Entities of Biological Interest ontology, UBER = Uber anatomy ontology.
Ontology Release Tooling
- We have developed a new tool for managing the release process of complex ontologies such as the GO, incorporating reasoning, classification, validation, and file format conversion. The tool is openly available from https://github.com/ontodev/robot/ and was demonstrated at the 2015 International Conference on Biomedical Ontologies (http://ceur-ws.org/Vol-1515/demo6.pdf).
Ontology tracking and infrastructure
- The GO had hosted all of its issue trackers on SourceForge since 2000. These trackers are vital both to the continued operation of the GO and as a provenance trail, providing a knowledge base of decisions made regarding the structure of the ontology. Due to ongoing issues at SourceForge, we migrated our trackers, including their complete history, to GitHub. We developed a framework that has since been used by multiple groups migrating from SourceForge to GitHub (https://github.com/cmungall/gosf2github/).
- We also took this opportunity to centralize multiple pieces of metadata in YAML on our GitHub site, improving the efficiency of multiple operations within GO. This includes metadata on all GO editors and curators, which is used by the common annotation framework and TermGenie.
Integration with external ontologies
- We have integrated the GO with external ontologies, including the cell-type ontology and Uberon.
- We have also extended and improved the cell type ontology in multiple areas and are working with external groups to apply this for functional data annotation and interpretation (FANTOM5, ENCODE).
- A proposal for a major refactoring of molecular function (MF) was completed; it groups high-level and intermediate MF classes by biological context. In support of these revisions, which are currently in progress, we are developing unit tests for the logical definitions of these terms. Code for testing improvements to MF axiomatisation, and for defining design patterns to use when constructing compound MF terms, is in development. This work includes specifying the strategies and patterns for improving the axiomatisation of simple MFs, such as enzyme activities, using Rhea and its mappings to ChEBI.
Ontology editors are routinely using the OWL version of GO to check for logical consistency in the ontology and to create terms with logical definitions. Ontology editors are now using Protege 5.0. OWL is also used as the underlying format for creating new terms via the TermGenie tool.
The ontology group has been working in coordination with the annotation group to refine and clarify the use of relations in annotation extensions and to align those relations with those used in the ontology. These relations are also being used in the initial implementation of the common annotation framework.
This work included cleaning up gorel-edit.owl, integrating the relations (many were duplicates), correcting errors in the relations hierarchy, improving definitions and adding OWL axiomatisation to improve inference. A way to automatically update documentation of relations on the wiki when they are edited in the source file has been implemented.
The group has successfully migrated from SourceForge to GitHub and now uses GitHub as the primary resource for non-TermGenie ontology requests.
A GitHub site was developed for documentation, term requests, and automated report and documentation generation. It replaces manually edited pages on the GO wiki. This site was used to document the GO release pipeline and the use of relation metadata by GOA/QuickGO/Protein2GO. A Gene Ontology organization has been established to manage the different GO repositories (GO website, GO ontology, GO annotations). Training has been provided to external users (e.g., InterPro) requiring help with the switch.
Linking to external ontologies and groups
The ‘regulation of quality’ terms are now logically defined using OBA (Ontology of Biological Attributes).
We worked with ChEBI to identify critical GO dependencies and to implement a ChEBI ontology build plan, thereby improving GO’s integration with ChEBI.
We collaborated with F. Hoffmann-La Roche AG to deliver improvements to an existing in-house mapping to GO. A transferable and extendable mapping strategy has been devised, and over 200 terms were added or revised in the ontology. Errors and omissions in the GO were also identified as part of this work and fixed.
We engaged with the Wikidata group to add GO terms and properties (in progress). This will allow us to link existing Wikidata entries to GO terms by defining appropriate properties. Wikidata is an open, Semantic Web-compatible database tightly coupled to Wikipedia. It allows multiple datasets to be linked and exposes the result as RDF, which is amenable to querying across those datasets. Linkage to GO terms will also be shown in the Wikipedia infoboxes.
We worked with NCI Thesaurus editors and developers on a plan for swapping GO Biological Process into their ontology.
We engaged with immuneXpresso developers to replace their custom-made list of cell functions with GO biological processes.
Improved term names and definitions
Over 100 complex term names that describe transcription processes have been converted to simpler names using new naming patterns. This improves term readability for users and gene annotators.
Over 100 textual definitions of terms describing membrane components have been modified as a result of gene annotator feedback. The definitions are now more practically relevant and better reflect community usage.
Improved logical representations of terms
Design patterns were developed, documented and implemented for a number of areas of the ontology including patterns for: membrane regions; organelle membranes; envenomation; import and export across membranes.
Improved Biological Representation
We developed design patterns for transport processes that specify the start and end locations, the barriers crossed, and the nature of the entity transported. Newly added OWL axioms allow inference of start and end location over part relations in the GO. Eleven different TermGenie templates for transport processes are now available to users.
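To illustrate the idea of inferring start and end locations over part relations, here is a minimal sketch in Python. It is not the GO's OWL axiomatisation, and no reasoner is involved: the part_of edges, the location names, and the helper function are hypothetical and chosen only for illustration. The sketch assumes that if a process starts (or ends) in a location, it also starts (or ends) in everything that location is part of.

```python
from typing import Dict, Set

# Hypothetical part_of edges between locations (child -> parent).
# A real ontology allows several parents; one parent keeps the sketch short.
PART_OF: Dict[str, str] = {
    "cytosol": "cytoplasm",
    "cytoplasm": "cell",
}


def locations_including(location: str, part_of: Dict[str, str]) -> Set[str]:
    """Return the location plus everything it is transitively part of."""
    found = {location}
    while location in part_of:
        location = part_of[location]
        if location in found:  # guard against an accidental cycle
            break
        found.add(location)
    return found


# Hypothetical transport process with asserted start and end locations.
asserted = {"start": "cytosol", "end": "extracellular region"}

# Propagate each asserted location up the part_of hierarchy.
inferred = {
    role: locations_including(loc, PART_OF) for role, loc in asserted.items()
}
print(inferred)
```

In this toy example, a process asserted to start in the cytosol is also inferred to start in the cytoplasm and in the cell, while the end location has no recorded parents and stays as asserted.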
We developed design patterns for classes used to record the relationship of proteins and protein complexes to membranes (integral, anchored, peripheral, etc.).
We revised the ontology and annotations for apoptosis, in consultation with field experts, especially with regard to 'death' and 'caspase' terms. This project is nearly complete and involved significant ontology development as well as large-scale annotation efforts. Users can now benefit from thousands of new or revised apoptosis annotations, stemming from >300 newly added or revised ontology terms. These additions will be valuable to the research community for data analysis (e.g. enrichment analysis).
Ciliary dysfunction is heavily involved in several human diseases and conditions. Known ciliopathies include polycystic kidney disease and several others, including some cases of early embryonic death and retinal degeneration (see PMID:16722803). Our cilia ontology development project is run in collaboration with the SYSCILIA Consortium (http://syscilia.org). We have completed our work in the cellular component branch of the ontology and are continuing to work on better representing biological processes that occur in cilia. We have started discussions with Reactome to that effect.
We are expanding and improving the GO to better represent autophagy, a process that is well conserved across species and was recently identified as involved in several pathophysiological processes relevant to human health. These include neurodegenerative processes such as Parkinson’s disease as well as cancer, metabolic disorders, and cardiovascular and pulmonary diseases (see PMID:23656658 for a review). Autophagy is also implicated in response to aging and to exercise.
We have standardized protein complex annotation based on consultation with experts from the IntAct resource to improve the ontology based on current scientific knowledge.
Following feedback from electrophysiologists, we worked with ontology editors and gene annotation curators to reach a consensus on representing selective vs. non-selective ion channels and implemented the resulting proposal in the ontology.
Using the existing model of glycolysis, we have extended the carbohydrate catabolic process modeling to include several types of glycolytic fermentation and additional carbohydrate catabolic processes up until the TCA cycle.
A large project has been undertaken with a group of external experts to improve the representation of synapses and synapse function. The modification of synapse cellular components is almost complete and the modification of molecular function and biological process is ongoing. A model for signaling at the synapse is currently under discussion.
Gene annotation curators specializing in synapse-related annotations have been trained.
We have expanded and improved the GO to better represent extracellular vesicles, in response to community need in a fast-growing field of research. This ensures that the GO is up to date and that knowledge can be captured by annotators.
Ontology editors continue to monitor TermGenie requests for new templates and non-templated terms, GitHub requests for ontology evaluation and GOHELP issues from external users.
Plans for coming year
- Complete the refactoring of Molecular Function
- Implement a pattern-based term generation system
- Continue to work on integration between CL, UBERON, and other ontologies with GO
- Provide support to the NCI for replacement of NCIt with GO
- Provide ongoing support to curators