Ontology Release Files Proposal

From GO Wiki

Revision as of 16:04, 2 August 2011

Date: 2011-08-02

Background

Originally the GO was distributed as a single monolithic file that was consumed by all users. Over time, the GO came to be released in a number of different formats (obo, obo-xml, simple rdf, owl-rdf/xml, mysql dumps) and as different subsets, to accommodate the differing requirements of its users. At first these subsets were simple GO-slims, but as the GO advanced and began to include links between its 3 hierarchies, as well as links that did not fit the original DAG paradigm of the GO (e.g. has part), there was a need to release a "relationship subset" that excluded these links.

The GO will soon start incorporating references to external ontologies (see Cross Products). These references are essential for internal maintenance of the GO and are in strong demand from the next generation of applications that can make use of them. At the same time, we remain committed to the majority of users and databases who require simple subsets that retain a simple DAG-like ontology without dependencies on external ontologies.

We are taking this opportunity to reorganize how we distribute our ontology files, aligning the new scheme across the whole suite of OBO ontologies that are connected to the GO. For more details, see the OBO Foundry ID Policy.

Proposal

Formats

All cuts of the ontology will be available in both OBO Format (OBOF) and OWL. The OWL serialization will be OWL2 RDF/XML, although we may opt to also include other serializations, such as OWL2-XML, if there is demand.

Note that OBO Format is now officially a subset of OWL2. The semantics of the OBOF and OWL versions will at first be identical, but in future the OWL version may contain additional axioms not expressible in OBOF. For this reason, we recommend that all new software and infrastructure consume the OWL version. Existing software can continue to use the OBOF files, but it will be harder to evolve that software to take advantage of new features of the GO.
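Because OBOF is a subset of OWL2, each OBOF tag has an OWL reading. The sketch below illustrates this for the two tags that carry the graph structure: is_a becomes a SubClassOf axiom, and "relationship: R X" becomes SubClassOf(C ObjectSomeValuesFrom(R X)), printed in OWL 2 functional syntax. It is a simplified illustration, not the official OBO-to-OWL translator, which handles many more tags. The helper names and the RELATION_IDS table are hypothetical; only part_of is mapped (to BFO:0000050, the OBO "part of" relation), and any other relation raises a KeyError.

```python
# Hypothetical relation-name -> OBO ID table; only part_of is filled in.
RELATION_IDS = {"part_of": "BFO:0000050"}


def obo_id_to_iri(obo_id: str) -> str:
    """Expand a prefixed ID such as GO:0005739 into an OBO Library PURL."""
    prefix, local = obo_id.split(":", 1)
    return f"<http://purl.obolibrary.org/obo/{prefix}_{local}>"


def stanza_to_owl(stanza: str) -> list[str]:
    """Translate the is_a and relationship tags of one [Term] stanza into
    OWL 2 functional-syntax SubClassOf axioms. Assumes 'id' comes first."""
    term = None
    axioms = []
    for raw in stanza.splitlines():
        # Drop trailing "! label" comments; lines without a colon
        # (e.g. the [Term] header) are skipped.
        line = raw.split("!", 1)[0].strip()
        if ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            term = obo_id_to_iri(value)
        elif tag == "is_a":
            axioms.append(f"SubClassOf({term} {obo_id_to_iri(value)})")
        elif tag == "relationship":
            rel, target = value.split()[:2]
            prop = obo_id_to_iri(RELATION_IDS[rel])  # KeyError if unmapped
            axioms.append(
                f"SubClassOf({term} ObjectSomeValuesFrom({prop} {obo_id_to_iri(target)}))"
            )
    return axioms


EXAMPLE_STANZA = """\
[Term]
id: GO:0005739
name: mitochondrion
is_a: GO:0043231 ! intracellular membrane-bounded organelle
relationship: part_of GO:0005737 ! cytoplasm
"""

for axiom in stanza_to_owl(EXAMPLE_STANZA):
    print(axiom)
```

Tags that have no OWL counterpart in this sketch (name, def, and so on) are simply ignored.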

URLs

The different cuts and versions will be available from either of two base URLs: the standard OBO Library base (http://purl.obolibrary.org/obo/) and a geneontology.org base.

At least two cuts of the GO will be available:

  • go
  • go-simple

Combining these two cuts with the two formats gives 4 URLs for each base URL:

  1. http://purl.obolibrary.org/obo/go.obo
  2. http://purl.obolibrary.org/obo/go.owl
  3. http://purl.obolibrary.org/obo/go-simple.obo
  4. http://purl.obolibrary.org/obo/go-simple.owl

(In each case, a geneontology.org URL is also available, and the obolibrary URL redirects to it. Use of the obolibrary URLs is encouraged.)
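The URL scheme above is simply every cut combined with every format under a base URL. A minimal sketch of that combination follows; the constant names are illustrative, and only the obolibrary base is used because it is the one spelled out in the list.

```python
from itertools import product

# Base URL and cut/format names taken from the list above.
OBO_BASE = "http://purl.obolibrary.org/obo/"
CUTS = ("go", "go-simple")
FORMATS = ("obo", "owl")


def release_urls(base: str = OBO_BASE) -> list[str]:
    """Combine every cut with every format under one base URL."""
    return [f"{base}{cut}.{fmt}" for cut, fmt in product(CUTS, FORMATS)]


for url in release_urls():
    print(url)
```

Adding a further cut later would only mean extending CUTS; the number of URLs per base is always the number of cuts times the number of formats.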

We expect that most people will consume either go-simple.obo or go.owl. The tooling and infrastructure associated with OWL is mature enough to handle the more advanced features, whereas the legacy tooling associated with obo tends to make outdated assumptions about the structure or content of the ontology.

go-simple

The following guarantees are made about go-simple:

  • The set of edges formed by links in the ontology will always form a DAG
  • The entire simple DAG structure can be obtained by following is_a tags and relationship tags (in the OBOF version)
  • The ontology will only contain terms (classes) from GO. It will not import other ontologies. It will not contain logical relationships to other ontologies.
  • The ontology will not include links between the 3 GO hierarchies

In other words, the structural characteristics will remain identical to those the GO has had for the last 5 years. go-simple.obo corresponds to [1].
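A consumer who wants to rely on these guarantees could check them mechanically. The sketch below collects edges from [Term] stanzas by following only is_a and relationship tags, then reports any non-GO targets, any links across hierarchies (approximating "hierarchy" by the namespace tag), and any cycle, using Kahn's topological-sort algorithm. The function names and messages are hypothetical; this is not a validator used by the GO.

```python
from collections import defaultdict, deque


def parse_edges(obo_text):
    """Collect namespaces and (child, relation, parent) edges from [Term]
    stanzas, following only is_a and relationship tags."""
    namespaces, edges = {}, []
    current, in_term = None, False
    for raw in obo_text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop "! label" comments
        if line.startswith("["):
            in_term, current = line == "[Term]", None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current = value
        elif tag == "namespace":
            namespaces[current] = value
        elif tag == "is_a":
            edges.append((current, "is_a", value))
        elif tag == "relationship":
            rel, target = value.split()[:2]
            edges.append((current, rel, target))
    return namespaces, edges


def check_go_simple(namespaces, edges):
    """Return a list of violations of the go-simple guarantees (empty if none)."""
    problems = []
    graph = defaultdict(set)
    for child, rel, parent in edges:
        if not parent.startswith("GO:"):
            problems.append(f"non-GO target: {child} {rel} {parent}")
        elif namespaces.get(child) != namespaces.get(parent):
            problems.append(f"cross-hierarchy link: {child} {rel} {parent}")
        graph[child].add(parent)

    # Kahn's algorithm: nodes that never reach in-degree 0 lie on a cycle.
    nodes = set(graph) | {p for parents in graph.values() for p in parents}
    indegree = dict.fromkeys(nodes, 0)
    for parents in graph.values():
        for p in parents:
            indegree[p] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for p in graph.get(node, ()):
            indegree[p] -= 1
            if indegree[p] == 0:
                queue.append(p)
    if visited < len(nodes):
        problems.append("cycle detected")
    return problems


SAMPLE = """\
[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0009987
name: cellular process
namespace: biological_process
is_a: GO:0008150 ! biological_process
"""

print(check_go_simple(*parse_edges(SAMPLE)))  # → []
```

Run against the full go file, the same check would be expected to report violations, since go deliberately includes cross-hierarchy links and links that introduce cycles.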

go

The following will be included in go from the outset:

  • Simple compositional logical definitions (see Category:Cross Products). In OWL these are equivalence axioms between a named GO class and an intersection of class expressions.
  • Relationships between the 3 GO hierarchies. At first these will include links based on the relations part of, regulates, and occurs in (BP to CC)
  • Relationships that introduce cycles into the graph structure of the GO
  • Disjointness (disjoint_from) axioms between terms
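To make the first item concrete: in OBOF a logical definition is written as intersection_of tags (one genus class plus relation/class differentiae), and in OWL it becomes EquivalentClasses(C ObjectIntersectionOf(...)). The sketch below performs that translation for a single stanza and prints OWL 2 functional syntax. Function names are hypothetical, and the RELATION_IDS table is an assumption that uses modern OBO Library IDs (RO:0002211 for regulates); the identifiers used in the GO release itself may differ.

```python
# Assumed relation-name -> ID table (modern OBO Library IDs).
RELATION_IDS = {"part_of": "BFO:0000050", "regulates": "RO:0002211"}


def iri(obo_id):
    """Expand a prefixed ID such as GO:0050789 into an OBO Library PURL."""
    prefix, local = obo_id.split(":", 1)
    return f"<http://purl.obolibrary.org/obo/{prefix}_{local}>"


def logical_definition(stanza):
    """Turn the intersection_of tags of one [Term] stanza into an OWL 2
    EquivalentClasses axiom, or return None if there is no definition."""
    term, operands = None, []
    for raw in stanza.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop "! label" comments
        if ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            term = iri(value)
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                # Genus: a plain class, e.g. "intersection_of: GO:0065007"
                operands.append(iri(parts[0]))
            else:
                # Differentia: "intersection_of: <relation> <class>"
                rel, filler = parts[:2]
                operands.append(
                    f"ObjectSomeValuesFrom({iri(RELATION_IDS[rel])} {iri(filler)})"
                )
    # ObjectIntersectionOf requires at least two operands in OWL 2.
    if term is None or len(operands) < 2:
        return None
    return f"EquivalentClasses({term} ObjectIntersectionOf({' '.join(operands)}))"


EXAMPLE = """\
[Term]
id: GO:0050789
name: regulation of biological process
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0008150 ! biological_process
"""

print(logical_definition(EXAMPLE))
```

An equivalence axiom, unlike a plain is_a link, lets an OWL reasoner infer the position of the defined class from its genus and differentiae.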

In the future this ontology could potentially include:

  • Logical relationships and logical definitions that reference other OBO Foundry and OBO Foundry candidate ontologies
  • Import directives, either to entire ontologies or to minimal modules extracted from these ontologies
  • Portions of other ontologies merged into the main GO ontology, for example following the MIREOT specification
  • Taxon constraints
  • Logical axioms that cannot be expressed in obo-format (and thus only available in the owl version)

It is recommended that the full go be consumed using the OWL file (go.owl) and associated APIs (for example, the OWL API or Jena). Such software does not make incorrect assumptions about the structure of the ontology, and OWL reasoners are guaranteed to return only valid answers to queries.

Mapping to current structure

OBO-XML

Simple RDF

Relationship to release of annotations and GO database

References