Ontology Development Progress Report December 2015

Prepared and Submitted by Melanie Courtot and David Hill on behalf of the ontology working group

Personnel

  • David Hill (MGI)
  • Melanie Courtot (EBI)
  • Tanya Berardini (TAIR)
  • Heiko Dietze (LBL)
  • Harold Drabkin (MGI)
  • Chris Mungall (LBL)
  • David Osumi-Sutherland (EBI)
  • Paola Roncaglia (EBI)

Ontology Editing

GitHub Requests

As of December 14, 2015, for tickets created in 2015 only (created:>2015-01-01): 142 open, 467 closed.

Note: we imported tickets from the SourceForge tracker, which has been retired. For completeness, counting all tickets that were ported to the GitHub tracker (starting January 2014), we have 250 open tickets and 1168 closed tickets.
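
The 2015-only counts above rely on GitHub's issue search filter `created:>2015-01-01`. As a minimal sketch of how such counts could be reproduced, the snippet below builds the corresponding search API URLs. The repository name is an assumption made for illustration (the report does not name the tracker repository), and the URLs are only constructed, not fetched.

```python
from urllib.parse import urlencode

# Assumption: illustrative repository name, not taken from the report.
REPO = "geneontology/go-ontology"
SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def issue_count_url(state, created_after="2015-01-01", repo=REPO):
    """Build a search URL; the JSON response carries the count in `total_count`."""
    query = f"repo:{repo} is:issue is:{state} created:>{created_after}"
    params = urlencode({"q": query})
    return f"{SEARCH_ENDPOINT}?{params}"


open_url = issue_count_url("open")
closed_url = issue_count_url("closed")
print(open_url)
print(closed_url)
```

Fetching either URL and reading `total_count` from the response would give the open or closed figure at the time of the request, so the numbers will differ from the December 2015 snapshot reported here.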

Term Statistics

  • Total number of GO terms added Jan 2015 to Nov 1, 2015: 1560
  • Total number of GO terms added manually Jan 2015 to Nov 1, 2015: 310
  • Total number of GO terms added via TermGenie Jan 2015 to Nov 1, 2015: 1250
  • Total number of GO terms obsoleted Jan 2015 to Nov 1, 2015: 68 obsoleted, 54 merged
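
As a quick consistency check on the Term Statistics figures above, the following sketch confirms that the manual and TermGenie counts sum to the total and works out the TermGenie share. Treating "obsoleted" and "merged" as two separate groups that can be summed is an assumption about how those numbers were reported.

```python
# Figures copied from the Term Statistics above (Jan 2015 to Nov 1, 2015).
added_total = 1560
added_manually = 310
added_via_termgenie = 1250
obsoleted = 68
merged = 54

# The two routes for adding terms should account for every new term.
assert added_manually + added_via_termgenie == added_total

termgenie_share = added_via_termgenie / added_total
# Assumption: obsoleted and merged terms are disjoint groups.
retired_total = obsoleted + merged

print(f"TermGenie share of new terms: {termgenie_share:.1%}")
print(f"Terms obsoleted or merged: {retired_total}")
```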

Software and infrastructure

TermGenie

    • We have continued to develop TermGenie, adding new features (e.g. GitHub support) and templates (including templates for anatomy using Uberon). TermGenie (TG) now accounts for 80% (1362 of 1710) of new terms added to the GO.
    • Seven new templates have been added to the GO TermGenie instance, which now has 48 templates in total.
    • We have also implemented TermGenie instances for the cell type ontology and other ontologies.

These templates use both classes within GO and classes from external ontologies, shown in parentheses. PO = Plant Ontology, CL = Cell Ontology, CHEBI = Chemical Entities of Biological Interest, UBERON = Uber-anatomy ontology.

Ontology Release Tooling

    • We have developed a new tool for managing the release process of complex ontologies such as the GO, incorporating reasoning, classification, validation, and file format conversion. The tool is openly available from https://github.com/ontodev/robot/ and was demonstrated at the 2015 International Conference on Biomedical Ontologies (http://ceur-ws.org/Vol-1515/demo6.pdf).

Ontology tracking and infrastructure

    • The GO had hosted all of its issue trackers on SourceForge since 2000. These trackers are vital both to the continued operation of the GO and as a provenance trail, providing a knowledge base of decisions made regarding the structure of the ontology. Due to ongoing issues at SourceForge, we migrated our trackers, including their complete history, to GitHub. We developed a framework that has since been used by multiple groups migrating from SourceForge to GitHub (https://github.com/cmungall/gosf2github/).
    • We also took this opportunity to centralize multiple pieces of metadata in YAML on our GitHub site, improving the efficiency of multiple operations within GO. This includes metadata on all GO editors and curators, which is used by the common annotation framework and TermGenie.

Integration with external ontologies

    • We have integrated the GO with external ontologies, including the cell-type ontology and Uberon.
    • We have also extended and improved the cell type ontology in multiple areas and are working with external groups to apply this for functional data annotation and interpretation (FANTOM5, ENCODE).

Molecular Function

    • A proposal for a major refactoring of Molecular Function (MF) was completed; it groups high-level and intermediate MF classes by biological context. The revisions themselves are currently in progress and are supported by the development of unit tests for the logical definitions of these terms. Code for testing improvements to MF axiomatisation, and design patterns for constructing compound MF terms, are in development. These include specification of the strategies and patterns for improving the axiomatisation of simple MFs, such as enzyme activities, using Rhea and its mappings to ChEBI.

Major Projects

Using OWL

Ontology editors are routinely using the OWL version of GO to check for logical consistency in the ontology and to create terms with logical definitions. Ontology editors are now using Protege 5.0. OWL is also used as the underlying format for creating new terms via the TermGenie tool.


GO relations

The ontology group has been working in coordination with the annotation group to refine and clarify the use of relations in annotation extensions and to align those relations with those used in the ontology. These relations are also being used in the initial implementation of the common annotation framework.

This work included cleaning up gorel-edit.owl, integrating the relations (many were duplicates), correcting errors in the relations hierarchy, improving definitions and adding OWL axiomatisation to improve inference. A way to automatically update documentation of relations on the wiki when they are edited in the source file has been implemented.

Github Migration

The group has successfully migrated from SourceForge to GitHub and now uses GitHub as the primary resource for non-TermGenie ontology requests.

A GitHub site was developed for documentation, term requests, and automated report and documentation generation. It replaces manually-edited pages on GO wiki. This was used to document the GO release pipeline and the use of relation metadata by GOA/QuickGO/Protein2GO. A Gene Ontology organization has been established to manage the different GO repositories (GO website, GO ontology, GO annotations). Training has been provided to external users (e.g., InterPro) requiring help with the switch.

Linking to external ontologies and groups

The ‘regulation of quality’ terms are now logically defined using OBA (Ontology of Biological Attributes).

We worked with ChEBI to identify critical GO dependencies and to implement a ChEBI ontology build plan, thereby improving GO’s integration with ChEBI.

We collaborated with F. Hoffmann-La Roche AG to improve an existing in-house mapping to GO. A transferable and extendable mapping strategy has been devised, and over 200 terms were added or revised in the ontology. Errors and omissions in the GO were also identified as part of this work and fixed.

We engaged with the Wikidata group to add GO terms and properties (in progress). This will allow us to link existing Wikidata entries to GO terms by defining appropriate properties. Wikidata is an open, Semantic Web-compatible database tightly coupled to Wikipedia. It allows multiple datasets to be linked and exposes the result as RDF, which can be queried across those datasets. Links to GO terms will also be shown in Wikipedia infoboxes.

We worked with NCI Thesaurus editors and developers on a plan for swapping GO Biological Process into their ontology.

We engaged with immuneXpresso developers to replace their custom-made list of cell functions with GO biological processes.


Improved term names and definitions

Over 100 complex term names that describe transcription processes have been converted to simpler names using new naming patterns. This improves term readability for users and gene annotators.

Over 100 textual definitions of terms describing membrane components have been modified as a result of gene annotator feedback. The definitions are now more practically relevant and better reflect community usage.

Improved logical representations of terms

Design patterns were developed, documented and implemented for a number of areas of the ontology including patterns for: membrane regions; organelle membranes; envenomation; import and export across membranes.

Improved Biological Representation

Transport

We developed design patterns for transport processes that specify the start and end locations, the barriers crossed, and the nature of the entity transported. Newly added OWL axioms allow inference of start and end location over part relations in the GO. Eleven different TermGenie templates for transport processes are now available to users.
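
To illustrate what inferring a location over part relations could look like, here is a toy sketch in plain Python rather than OWL. It is not the GO axiomatisation itself: the dictionary, the function name, and the choice of example process are hypothetical, and the only idea shown is that an asserted end location can be propagated to every component it is transitively part of.

```python
# Toy part_of links between cellular components (part -> whole).
PART_OF = {
    "mitochondrial matrix": "mitochondrion",
    "mitochondrion": "cytoplasm",
}


def locations_including_wholes(component, part_of=PART_OF):
    """Return the component followed by every whole it is transitively part of."""
    chain = [component]
    seen = {component}
    while chain[-1] in part_of:
        whole = part_of[chain[-1]]
        if whole in seen:  # guard against accidental cycles in the toy data
            break
        chain.append(whole)
        seen.add(whole)
    return chain


# Hypothetical transport process with one asserted end location.
asserted_end = "mitochondrial matrix"
inferred_ends = locations_including_wholes(asserted_end)
print(inferred_ends)
```

In an OWL setting this kind of propagation would be handled by a reasoner working from the axioms, so editors assert only the most specific location and the broader ones follow.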

Membrane proteins

We developed design patterns for classes used to record the relationship of proteins and protein complexes to membranes (integral, anchored, peripheral, etc.).

Apoptosis

We revised the ontology and annotations in this area, in consultation with field experts, especially with regard to 'death' and 'caspase' terms. This project is nearly complete and involved significant ontology development as well as large-scale annotation efforts. Users can now benefit from thousands of new or revised apoptosis annotations, stemming from more than 300 newly added or revised ontology terms. These additions will be valuable to the research community for data analysis (e.g. enrichment analysis).

Cilia

Ciliary dysfunction is heavily involved in several human diseases and conditions. Known ciliopathies include polycystic kidney disease and several others, including some cases of early embryonic death and retinal degeneration (see PMID:16722803). This ontology development project is run in collaboration with the SYSCILIA Consortium (http://syscilia.org). We have completed our work in the cellular component branch of the ontology and are continuing to work on better representing biological processes that occur in cilia. We have started discussions with Reactome to that effect.

Autophagy

We are expanding and improving the GO to better represent autophagy, a process that is well conserved across species and has recently been identified as involved in several pathophysiological processes relevant to human health. These include neurodegenerative processes such as Parkinson’s disease, as well as cancer, metabolic disorders, and cardiovascular and pulmonary diseases (see PMID:23656658 for a review). Autophagy is also implicated in the response to aging and to exercise.

Protein Complexes

In consultation with experts from the IntAct resource, we have standardized protein complex annotation and improved the ontology to reflect current scientific knowledge.

Ion Channels

Following feedback from electrophysiologists, we worked with ontology editors and gene annotation curators to reach a consensus on representing selective vs. non-selective ion channels, and implemented the resulting proposal in the ontology.

Metabolic Pathways

Using the existing model of glycolysis, we have extended the carbohydrate catabolic process modeling to include several types of glycolytic fermentation and additional carbohydrate catabolic processes up until the TCA cycle.

Synapse

A large project has been undertaken with a group of external experts to improve the representation of synapses and synapse function. The modification of synapse cellular components is almost complete and the modification of molecular function and biological process is ongoing. A model for signaling at the synapse is currently under discussion.

Gene annotation curators specializing in synapse-related annotations have been trained.

Extracellular vesicle

We have expanded and improved the GO to better represent extracellular vesicles, in response to community need in this fast-growing field of research. This ensures that the GO is up to date and that knowledge can be captured by annotators.

Core Duties

Ontology editors continue to monitor TermGenie requests for new templates and non-templated terms, GitHub requests for ontology evaluation, and GOHELP issues from external users.

Plans for coming year

  • Complete the refactoring of Molecular Function
  • Implement a pattern-based term generation system
  • Continue to work on integration of CL, UBERON, and other ontologies with GO
  • Provide support to the NCI for replacement of NCIt with GO
  • Provide ongoing support to curators

Publications