Ontology Release Files Proposal

From GO Wiki

Date: 2011-08-02

Background

Originally the GO was distributed as a single monolithic file, which was consumed by all users. Over time, the GO was released in a number of different formats (obo, obo-xml, simple rdf, owl-rdf/xml, mysql dumps), as well as different subsets, in order to accommodate different requirements of different users. At first these subsets were simple GO-slims - but as the GO advanced and started to include links between the 3 hierarchies, as well as links that did not fit into the original DAG-paradigm of the GO (e.g. has part), there was a need to release a "relationship subset" which excluded these.

The GO will soon start incorporating references to external ontologies (see Cross Products and PMID:20152934). These are essential for internal maintenance of the GO, and are in strong demand for the next generation of applications capable of making use of them. At the same time, we remain committed to the majority of users and databases who require simple subsets that retain a simple DAG-like ontology without dependencies on external ontologies.

We are taking this opportunity to reorganize how we distribute our ontology files, to align distribution across the whole suite of OBO ontologies that are connected to the GO, and to encourage adoption of the OWL version of the GO. For more details see OBO Foundry ID Policy.

Current Ontology Files

See http://geneontology.org/GO.downloads.ontology.shtml

File/URL structure:

 go/
     ontology/
         gene_ontology.obo
         gene_ontology_edit.obo
         editors/
             gene_ontology_write.obo
         obo_format_1_0/
             gene_ontology.1_0.obo
         obo_format_1_2/
             gene_ontology.1_2.obo
             gene_ontology_ext.obo

The OWL is not available in the current CVS structure or the main geneontology URL, only from:

http://archive.geneontology.org/latest-termdb/go_daily-termdb.owl.gz

This also uses the legacy translation to OWL.

Note also that OBOF 1.0 differs from 1.2 only in minor ways, so there is no reason to continue to provide the 1.0 translation.

New Directory Layout

The new layout can be browsed on the prototype svn server here:

 go/
     ontology/
         gene_ontology.obo
         gene_ontology_edit.obo
         editors/
             gene_ontology_write.obo
         obo_format_1_0/
             gene_ontology.1_0.obo
         obo_format_1_2/
             gene_ontology.1_2.obo
             gene_ontology_ext.obo
         go.obo
         go.owl
         go-simple.obo
         go-simple.owl
         extensions/
             x-cell.{obo,owl}
             x-chemical.{obo,owl}
         subsets/
             gosubset_prok.obo
             gosubset_prok.owl
             goslim_generic.obo
             goslim_generic.owl

Key:

  • deprecated
  • new
  • keep

Formats: OBOF and OWL

All cuts of the ontology will be available in both OBO Format (OBOF) and OWL2. The OWL serialization will be OWL2 RDF/XML, although we may opt to also include other serializations, such as OWL2-XML, if there is demand.

Note that OBO Format is now officially a subset of OWL2. The semantics of the OBOF and OWL versions will at first be identical, but in future the OWL may contain additional axioms not expressible in OBOF. For this reason, we recommend that all new software and infrastructure consume the OWL version. Existing software can continue to use the OBOF files, but it will be harder to evolve this software to take advantage of new features of the GO.

URLs

The different cuts and versions will be available from either of two base URLs:

The former would redirect to the latter

E.g.


Note that the PURLs are considered the formal ontology IRIs.

Ontologies

There will be at least two cuts available of the GO:

  • go
  • go-simple

In addition, there will be sub-ontologies for each subset, e.g.

  • go/subsets/gosubset_prok

Combining these gives 4 URLs for each base URL (shown with redirects):

  1. http://purl.obolibrary.org/obo/go.obo → http://geneontology.org/release/go.obo
  2. http://purl.obolibrary.org/obo/go.owl → http://geneontology.org/release/go.owl
  3. http://purl.obolibrary.org/obo/go/go-simple.obo → http://geneontology.org/release/go/go-simple.obo
  4. http://purl.obolibrary.org/obo/go/go-simple.owl → http://geneontology.org/release/go/go-simple.owl

(In each case, the standard obolibrary URL can be substituted for the geneontology.org one - the former will redirect to the latter. Use of the obolibrary URLs is encouraged.)
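The URL pattern in the list above can be sketched in a few lines of Python. This is only an illustration of the naming scheme: the helper name release_urls and the two constants are hypothetical, and the rule that every cut other than go sits under a go/ path segment is inferred from the four URLs listed rather than stated as policy.

```python
from typing import Tuple

# Base URLs taken from the list of PURLs and their redirect targets.
PURL_BASE = "http://purl.obolibrary.org/obo/"
RELEASE_BASE = "http://geneontology.org/release/"


def release_urls(cut: str, fmt: str) -> Tuple[str, str]:
    """Return (purl, redirect_target) for a cut such as 'go' or 'go-simple'.

    Hypothetical helper: the path rule is an assumption read off the
    four example URLs, not an official specification.
    """
    if fmt not in ("obo", "owl"):
        raise ValueError("format must be 'obo' or 'owl', got %r" % fmt)
    # The main cut sits directly under the base; other cuts sit under go/.
    path = "%s.%s" % (cut, fmt) if cut == "go" else "go/%s.%s" % (cut, fmt)
    return PURL_BASE + path, RELEASE_BASE + path
```

Calling release_urls("go-simple", "obo") yields the pair shown as item 3 in the list.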

We expect that the majority of people will consume either go-simple.obo or go.owl (the tooling and infrastructure associated with OWL is mature enough to handle the more advanced features, whereas the legacy tooling associated with obo tends to make outdated assumptions regarding the structure or content of the ontology).


go

The following will be included in go from the outset:

  • simple compositional logical definitions. See Category:Cross Products. In OWL these are equivalence axioms between a named class in the GO, and an intersection of class expressions.
  • Relationships between the 3 GO hierarchies. At first these include links based on the relations: part of, regulates, occurs in (BP to CC)
  • Relationships that introduce cycles into the graph structure of the GO
  • Disjoint from axioms between terms

In the future this ontology could potentially include:

  • Logical relationships and logical definitions that reference other OBO Foundry and OBO Foundry candidate ontologies
  • imports directives, either to entire ontologies, or minimal modules extracted from these ontologies
  • portions of other ontologies merged in to the main GO ontology - for example, following the MIREOT specification
  • Taxon constraints
  • Logical axioms that cannot be expressed in obo-format (and thus only available in the owl version)

It is recommended that the full go be consumed using the owl file (go.owl) and associated APIs (for example, the OWL API or Jena). Such software does not make incorrect assumptions about the structure of the ontology, and OWL reasoners are guaranteed to provide answers to queries that are valid.

go-simple

go-simple contains a subset of the information in go. Some parts of the ontology are filtered out to simplify things for legacy tools.

The following guarantees are made about go-simple:

  • The set of edges formed by links in the ontology will always form a DAG
  • The entire simple DAG structure can be obtained by following is_a tags and relationship tags (in the obof version)
  • The ontology will only contain terms (classes) from GO. It will not import other ontologies. It will not contain logical relationships to other ontologies.
  • The ontology will not include links between the 3 GO hierarchies

In other words, the structural characteristics will remain identical to what has been in the GO for the last 5 years. go-simple.obo corresponds to [[1]]
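A consumer who wants to check the first two guarantees locally could do so along the following lines. This is a minimal sketch, not the release tooling: parse_edges and is_dag are hypothetical names, the parser handles only the id, is_a and relationship tags of [Term] stanzas, and it assumes well-formed OBOF input.

```python
from collections import defaultdict
from typing import Dict, Set


def parse_edges(obo_text: str) -> Dict[str, Set[str]]:
    """Collect child -> parents edges from is_a and relationship tags."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    current = None      # id of the [Term] stanza being read, if any
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! comment"
        if line.startswith("["):
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            edges[current]  # register the node even if it has no parents
        elif in_term and current and line.startswith("is_a:"):
            parts = line[len("is_a:"):].split()
            if parts:
                edges[current].add(parts[0])
        elif in_term and current and line.startswith("relationship:"):
            parts = line[len("relationship:"):].split()
            if len(parts) >= 2:  # "<relation> <target>"
                edges[current].add(parts[1])
    return edges


def is_dag(edges: Dict[str, Set[str]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while queue:
        node = queue.pop()
        removed += 1
        for t in edges.get(node, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return removed == len(nodes)
```

Running is_dag(parse_edges(text)) on the contents of go-simple.obo should return True if the guarantees hold; the same check on the full go file may return False, since that cut is allowed to contain cycles.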

Subsets

Note that go-simple is essentially a subset - it includes the full set of terms (classes), but only a subset of the relationships and logical axioms.

In addition, we will make all the standard subsets or "slims" in GO available as subsets. These will be available in a "subsets" directory.
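For readers working from the main OBOF file rather than the files in the "subsets" directory, subset membership can also be read from the subset tag on each term. The sketch below is illustrative only: terms_in_subset is a hypothetical helper, and it assumes that members of a slim carry a tag such as "subset: gosubset_prok" in their [Term] stanza.

```python
from typing import Set


def terms_in_subset(obo_text: str, subset: str) -> Set[str]:
    """Collect the ids of [Term] stanzas tagged with 'subset: <subset>'."""
    members: Set[str] = set()
    current = None      # id of the [Term] stanza being read, if any
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! comment"
        if line.startswith("["):
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
        elif in_term and current and line.startswith("subset:"):
            if line[len("subset:"):].strip() == subset:
                members.add(current)
    return members
```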

Example:

Versioning

The ontology version is stored in the data-version tag in OBOF. In OWL it is stored as the VersionInfo in the ontology header.

Currently the version number follows a numeric major-minor structure (e.g. 1.12345). We propose to switch to an ISO 8601 YYYY-MM-DD date structure, to be in sync with the rest of the OBO Library.
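Software that depends on the version string may want to detect which of the two schemes a given file uses. The following is a minimal sketch under the assumption that the data-version tag appears in the OBOF header before the first stanza; the function names are hypothetical.

```python
import datetime
import re
from typing import Optional


def read_data_version(obo_text: str) -> Optional[str]:
    """Return the value of the data-version header tag, or None if absent."""
    for line in obo_text.splitlines():
        if line.startswith("["):  # the first stanza ends the header
            break
        if line.startswith("data-version:"):
            return line.split(":", 1)[1].strip()
    return None


def is_date_version(version: str) -> bool:
    """True if the version is a valid, zero-padded YYYY-MM-DD date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        return False
    try:
        datetime.datetime.strptime(version, "%Y-%m-%d")
    except ValueError:  # e.g. month 13
        return False
    return True
```

A numeric version such as 1.12345 fails the date check, so a consumer can fall back to the old major-minor handling during the transition.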


Note that if we adopt Migration_of_GO_to_SVN_(Proposal) we can manage releases more easily in svn:

 go/
   trunk/
     release/
       go.obo
       ...
   branches/
   releases/
     2011-11-01
     2011-11-08
     ...

The versioned IRIs would look like:

  • http://purl.obolibrary.org/obo/go/releases/YYYY-MM-DD/go.obo

Served from this directory:

Editors File

The editors file is likely to be extended to include additional artefacts required for the ontology development process. For example, axiom annotations of the form:

 is_a: GO:nnnnn {is_inferred="true"} ! ....

We will at first exclude these from all releases.
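One way such annotations could be filtered out when producing a release file is sketched below. This is not how the release tool does it; it assumes that "excluding" means dropping the trailing {...} block while keeping the axiom itself, and strip_qualifiers is a hypothetical helper name.

```python
import re

# A trailing {...} block, optionally followed by a "! comment" or end of line.
_QUALIFIER = re.compile(r"\s*\{[^{}]*\}(?=\s*(?:!|$))")


def strip_qualifiers(line: str) -> str:
    """Remove a trailing qualifier block from a single OBOF tag line."""
    return _QUALIFIER.sub("", line)
```

Lines without a trailing qualifier block pass through unchanged, so the function can be applied to every line of the editors file.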

Mapping to current structure

OBO-XML

Simple RDF

Relationship to release of annotations and GO database

Software

Software used to generate releases

In collaboration with other groups involved in the OBO Foundry initiative, we have developed The Oort, the OBO Ontology Release Tool. This is based on the Java OWL API and the OBO Library obo to owl converter. The chief developers are Heiko Dietze and Shahid Manzoor.

The Oort runs as a command-line utility or as a desktop GUI.

More details can be found on the Oort wiki

See making a release pipeline

Recommendations for software developers using the ontology

In order to take full advantage of the GO, and to remain compatible with upcoming changes, we recommend consuming the OWL versions of the ontology, and using a standard OWL-level application programming interface or tooling, such as Jena or the OWL API. The GO software team has experience of the OWL API, and can give support to software developers. In addition, there is a large community of bio-ontology software developers well-versed in the OWL API.

The GO software group, in collaboration with other groups, has developed a wrapper for the OWL API called OWLTools. This provides convenience methods for GO and other OBOF ontologies.

Other programmatic means of accessing the ontologies

This proposal aims primarily to address the organization and contents of ontology files available over the web.

We will continue to provide programmatic access to both the ontology and associated data through a variety of means. We will also continue to work with collaborators and 3rd party websites and web service providers to ensure optimal programmatic access to data. Examples of alternate means of access include:

  • relational-level access, through database dumps, web-based SQL access via GOOSE, and programmatic APIs such as the GOLD Hibernate Layer
  • semantic web access, through SPARQL endpoints such as the obofoundry (http://sparql.obofoundry.org), neurocommons (http://sparql.obo.neurocommons.org/) and bio2rdf (http://bio2rdf.org/)
  • Web-services, including the AmiGO web services layer, NCBO web services, OLS web services, QuickGO web services
  • Access through BioMart, GO-InterMine and R libraries such as Bioconductor
  • Use of the OWL API and other ontology APIs

References