Ontology Release Files Proposal
Originally the GO was distributed as a single monolithic file, which was consumed by all users. Over time, the GO was released in a number of different formats (obo, obo-xml, simple rdf, owl-rdf/xml, mysql dumps), as well as different subsets, to accommodate the requirements of different users. At first these subsets were simple GO slims. However, as the GO advanced and started to include links between the 3 hierarchies, as well as links that did not fit into the original DAG paradigm of the GO (e.g. has part), there was a need to release a "relationship subset" that excluded these links.
The GO will soon start incorporating references to external ontologies (see Cross Products and PMID:20152934). These references are essential for the internal maintenance of the GO, and they are in strong demand from a new generation of applications that can make use of them. At the same time, we remain committed to the majority of users and databases who need simple subsets that retain a DAG-like structure without dependencies on external ontologies.
We are taking this opportunity to reorganize how we distribute our ontology files, to align the distribution across the whole suite of OBO ontologies connected to the GO, and to encourage adoption of the OWL version of the GO. For more details, see the OBO Foundry ID Policy.
Current Ontology Files
go/
  ontology/
    gene_ontology.obo
    gene_ontology_edit.obo
    editors/
      gene_ontology_write.obo
    obo_format_1_0/
      gene_ontology.1_0.obo
    obo_format_1_2/
      gene_ontology.1_2.obo
      gene_ontology_ext.obo
The OWL version is not available in the current CVS structure or from the main geneontology.org URL; it is available only from:
New Directory Layout
The new layout can be browsed on the prototype svn server here:
go/
  ontology/
    gene_ontology.obo
    gene_ontology_edit.obo
    editors/
      gene_ontology_write.obo
    obo_format_1_0/
      gene_ontology.1_0.obo
    obo_format_1_2/
      gene_ontology.1_2.obo
      gene_ontology_ext.obo
  release/
    go.obo
    go.owl
    go-simple.obo
    go-simple.owl
    ext/
      go-relations.obo
      go-relations.owl
    subsets/
      gosubset_prok.obo
      gosubset_prok.owl
      goslim_generic.obo
      goslim_generic.owl
Formats: OBOF and OWL
All cuts of the ontology will be available in both OBO Format (OBOF) and OWL2. The OWL serialization will be OWL2 RDF/XML, although we may opt to also include other serializations, such as OWL2-XML, if there is demand.
Note that OBO Format is now officially a subset of OWL2. At first the OBOF and OWL versions will have identical semantics, but in future the OWL version may contain additional axioms that cannot be expressed in OBOF. For this reason, we recommend that all new software and infrastructure consume the OWL version. Existing software can continue to use the OBOF files, but it will be harder to evolve such software to take advantage of new features of the GO.
The different cuts and versions will be available from either of two base URLs:
The former will redirect to the latter.
Note that the PURLs are considered the formal ontology IRIs.
At least two cuts of the GO will be available:
In addition, there will be sub-ontologies for each subset, e.g.:
Combining these gives 4 URLs for each base URL (shown with redirects):
- http://purl.obolibrary.org/obo/go.obo ⇒ http://geneontology.org/release/go.obo
- http://purl.obolibrary.org/obo/go.owl ⇒ http://geneontology.org/release/go.owl
- http://purl.obolibrary.org/obo/go/go-simple.obo ⇒ http://geneontology.org/release/go/go-simple.obo
- http://purl.obolibrary.org/obo/go/go-simple.owl ⇒ http://geneontology.org/release/go/go-simple.owl
(In each case, the standard obolibrary URL can be used in place of the geneontology.org one; the former will redirect to the latter. Use of the obolibrary URLs is encouraged.)
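The redirect pattern in the four URLs above is regular enough to compute. The following Python sketch builds the PURL for a given product and format, and the geneontology.org URL it redirects to. It assumes that every cut other than the main go file lives under the go/ sub-path, as go-simple does in the examples; the helper names are hypothetical and not part of any GO tooling.

```python
# Sketch only: reproduces the URL pattern of the four examples above.
# Helper names are hypothetical, not part of any GO tooling.
PURL_BASE = "http://purl.obolibrary.org/obo/"
GO_RELEASE_BASE = "http://geneontology.org/release/"
FORMATS = ("obo", "owl")


def purl_for(product: str, fmt: str) -> str:
    """Return the PURL for a release product such as 'go' or 'go-simple'."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if product == "go":
        path = f"go.{fmt}"  # the main ontology sits directly under obo/
    else:
        path = f"go/{product}.{fmt}"  # assumed: other cuts live under obo/go/
    return PURL_BASE + path


def redirect_target(purl: str) -> str:
    """Map a GO PURL to the geneontology.org URL it redirects to."""
    if not purl.startswith(PURL_BASE):
        raise ValueError(f"not an OBO Library PURL: {purl}")
    return GO_RELEASE_BASE + purl[len(PURL_BASE):]


if __name__ == "__main__":
    for product in ("go", "go-simple"):
        for fmt in FORMATS:
            url = purl_for(product, fmt)
            print(url, "=>", redirect_target(url))
```

Run as a script, this prints the same four PURL-to-redirect pairs listed above. Keeping the PURL as the primary value and deriving the redirect from it mirrors the point that the PURLs are the formal ontology IRIs.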
We expect most users to consume either go-simple.obo or go.owl: the tooling and infrastructure associated with OWL is mature enough to handle the more advanced features, whereas the legacy tooling associated with OBO tends to make outdated assumptions about the structure or content of the ontology.
The following will be included in go from the outset:
- simple compositional logical definitions (see Category:Cross Products). In OWL these are equivalence axioms between a named GO class and an intersection of class expressions.
- Relationships between the 3 GO hierarchies. At first these will be links based on the relations part of, regulates and occurs in (BP to CC)
- Relationships that introduce cycles into the graph structure of the GO
- disjoint_from axioms between terms
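To make the OBOF/OWL correspondence for these constructs concrete, here is a minimal Python sketch that translates the is_a, relationship, intersection_of and disjoint_from tags of a single [Term] stanza into OWL 2 functional-syntax axioms. It follows the general shape of the OBO-to-OWL mapping (intersection_of lines become one EquivalentClasses axiom over an ObjectIntersectionOf), but it is a simplification: it keeps CURIEs and relation names as-is instead of expanding them to full IRIs, it ignores qualifiers, and the term IDs in the example are made up.

```python
# Sketch only: a simplified OBOF-to-OWL translation for one [Term] stanza.
# Real mappings expand CURIEs and relation names to full IRIs; this does not.


def stanza_to_owl(stanza: str) -> list[str]:
    """Translate selected OBOF tags into OWL 2 functional-syntax axioms."""
    tags = []
    for raw in stanza.strip().splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! label" comments
        if line and not line.startswith("[") and ":" in line:
            tag, value = line.split(":", 1)
            tags.append((tag.strip(), value.split()))
    term = next(value[0] for tag, value in tags if tag == "id")

    axioms, operands = [], []
    for tag, value in tags:
        if tag == "is_a":
            axioms.append(f"SubClassOf({term} {value[0]})")
        elif tag == "relationship":
            rel, target = value[0], value[1]
            axioms.append(f"SubClassOf({term} ObjectSomeValuesFrom({rel} {target}))")
        elif tag == "disjoint_from":
            axioms.append(f"DisjointClasses({term} {value[0]})")
        elif tag == "intersection_of":
            if len(value) == 1:
                operands.append(value[0])  # genus: a named class
            else:
                # differentia: "relation target" becomes an existential restriction
                operands.append(f"ObjectSomeValuesFrom({value[0]} {value[1]})")
    if operands:
        joined = " ".join(operands)
        axioms.append(f"EquivalentClasses({term} ObjectIntersectionOf({joined}))")
    return axioms


# Invented stanza: a process defined as a kind of GO:9999000 occurring in GO:9999010.
EXAMPLE = """
[Term]
id: GO:9999001
name: hypothetical example process
is_a: GO:9999000
intersection_of: GO:9999000
intersection_of: occurs_in GO:9999010
disjoint_from: GO:9999002
"""

if __name__ == "__main__":
    for axiom in stanza_to_owl(EXAMPLE):
        print(axiom)
```

For the example stanza this prints one SubClassOf, one DisjointClasses and one EquivalentClasses axiom. The logical definition is the only one of the three that states necessary and sufficient conditions, which is what lets a reasoner classify terms automatically.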
In the future this ontology could potentially include:
- Logical relationships and logical definitions that reference other OBO Foundry and OBO Foundry candidate ontologies
- imports directives, either to entire ontologies, or minimal modules extracted from these ontologies
- portions of other ontologies merged into the main GO ontology, for example following the MIREOT specification
- Taxon constraints
- Logical axioms that cannot be expressed in obo-format (and thus only available in the owl version)
We recommend that the full go be consumed from the OWL file (go.owl) using associated APIs (for example, the OWL API or Jena). Such software does not make incorrect assumptions about the structure of the ontology, and OWL reasoners are guaranteed to return only answers that are logically entailed by the ontology.
go-simple contains a subset of the information in go: some parts of the ontology are filtered out so that legacy tools can consume it.
The following guarantees are made about go-simple:
- The edges formed by links in the ontology will always form a directed acyclic graph (DAG)
- The entire simple DAG structure can be obtained by following is_a tags and relationship tags (in the OBOF version)
- The ontology will only contain terms (classes) from GO. It will not import other ontologies. It will not contain logical relationships to other ontologies.
- The ontology will not include links between the 3 GO hierarchies
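Because these guarantees are structural, they can be checked mechanically. The Python sketch below reads is_a and relationship tags from OBOF text and tests three of them: only GO: identifiers appear in links, no link crosses namespaces (the 3 hierarchies), and the links contain no cycles, detected with Kahn's topological-sort algorithm. It is an illustration under simplifying assumptions (id is the first tag of each stanza, obsolete terms and [Typedef] stanzas are ignored), not part of the release tooling, and the sample terms are invented.

```python
# Sketch only: checks three go-simple guarantees on OBOF text.
from collections import defaultdict, deque


def parse_obo(text):
    """Return ({term: namespace}, [(child, relation, parent), ...])."""
    namespaces, edges = {}, []
    current, in_term = None, False
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()
        if line.startswith("["):
            in_term = line == "[Term]"  # only [Term] stanzas contribute edges
            current = None
            continue
        if not in_term or ":" not in line:
            continue  # header lines, blank lines, other stanza types
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current = value
        elif current is None:
            continue
        elif tag == "namespace":
            namespaces[current] = value
        elif tag == "is_a":
            edges.append((current, "is_a", value.split()[0]))
        elif tag == "relationship":
            rel, parent = value.split()[:2]
            edges.append((current, rel, parent))
    return namespaces, edges


def check_simple(namespaces, edges):
    """Return a list of violations; an empty list means all checks passed."""
    problems = []
    for child, rel, parent in edges:
        for node in (child, parent):
            if not node.startswith("GO:"):
                problems.append(f"non-GO term: {node}")
        if namespaces.get(child) != namespaces.get(parent):
            problems.append(f"cross-hierarchy link: {child} {rel} {parent}")

    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    # If some nodes are never removed, the graph contains a cycle.
    nodes = set(namespaces) | {n for child, _, parent in edges for n in (child, parent)}
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for child, _, parent in edges:
        children[parent].append(child)  # edge direction: parent -> child
        indegree[child] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if removed < len(nodes):
        problems.append("cycle detected")
    return problems


SAMPLE = """
format-version: 1.2

[Term]
id: GO:9999001
namespace: biological_process

[Term]
id: GO:9999002
namespace: biological_process
is_a: GO:9999001 ! invented parent

[Term]
id: GO:9999003
namespace: biological_process
relationship: part_of GO:9999002
"""

if __name__ == "__main__":
    ns, links = parse_obo(SAMPLE)
    print(check_simple(ns, links) or "all checks passed")
```

A topological sort was chosen over depth-first search mainly for brevity; either detects cycles in time linear in the number of terms and links.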
In other words, the structural characteristics will remain identical to those the GO has had for the last 5 years. go-simple.obo corresponds to []
Note that go-simple is a subset of the axioms rather than of the terms: it includes the full set of terms (classes), but only a subset of the relationships and logical axioms.
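One way to picture how go-simple relates to go is as an edge filter that keeps every term and drops some links. In the Python sketch below, which is an assumption for illustration only, an edge survives if its relation is in a caller-supplied allowed set, both ends are GO terms, and both ends share a namespace. Which relations GO actually keeps is not specified here, so allowed_relations is a hypothetical parameter; the real release tooling may apply different or additional rules.

```python
# Sketch only: an illustrative edge filter, not the actual release rules.


def simplify(namespaces, edges, allowed_relations):
    """Keep every GO term; keep only edges that satisfy the go-simple rules."""
    kept = []
    for child, rel, parent in edges:
        if rel not in allowed_relations:
            continue  # relation not in the caller-supplied allowed set
        if not (child.startswith("GO:") and parent.startswith("GO:")):
            continue  # link to an external ontology
        if namespaces.get(child) != namespaces.get(parent):
            continue  # link between two of the three GO hierarchies
        kept.append((child, rel, parent))
    terms = {term for term in namespaces if term.startswith("GO:")}
    return terms, kept


# Invented example data; CL:0000000 stands in for any external class.
EXAMPLE_NAMESPACES = {
    "GO:9999001": "biological_process",
    "GO:9999002": "biological_process",
    "GO:9999010": "cellular_component",
}
EXAMPLE_EDGES = [
    ("GO:9999002", "is_a", "GO:9999001"),
    ("GO:9999002", "occurs_in", "GO:9999010"),
    ("GO:9999001", "has_part", "GO:9999002"),
    ("GO:9999002", "part_of", "CL:0000000"),
]

if __name__ == "__main__":
    terms, kept = simplify(EXAMPLE_NAMESPACES, EXAMPLE_EDGES, {"is_a", "part_of"})
    print(sorted(terms))
    print(kept)
```

With is_a and part_of allowed, all three terms survive but only the is_a edge does. Allowing has_part instead would keep the has_part edge too, and together with the is_a edge it forms a cycle, which is why the choice of allowed relations matters for keeping go-simple a DAG.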
In addition, we will make all the standard GO subsets, or "slims", available as separate sub-ontologies in a "subsets" directory.
The ontology version is stored in the data-version tag in OBOF. In OWL it is stored as an owl:versionInfo annotation in the ontology header.
Currently the version number follows a numeric major-minor structure (e.g. 1.12345). We propose to switch to an ISO 8601 YYYY-MM-DD date structure, to be in sync with the rest of the OBO Library.
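A minimal Python sketch of reading the data-version header tag from OBOF text and telling the current major-minor numbers apart from the proposed YYYY-MM-DD dates. The function names are hypothetical; reading owl:versionInfo from the OWL file would need an RDF or OWL library and is not shown.

```python
# Sketch only: reads the OBOF data-version tag and classifies its style.
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # proposed YYYY-MM-DD form
MAJOR_MINOR = re.compile(r"\d+\.\d+")  # current form, e.g. 1.12345


def read_data_version(obo_text):
    """Return the data-version header value, or None if it is absent."""
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # the header ends at the first stanza
        if line.startswith("data-version:"):
            return line.split(":", 1)[1].strip()
    return None


def classify_version(version):
    """Return 'iso-date', 'major-minor' or 'unknown'."""
    if ISO_DATE.fullmatch(version):
        try:
            date.fromisoformat(version)  # rejects impossible dates like 2011-02-30
        except ValueError:
            return "unknown"
        return "iso-date"
    if MAJOR_MINOR.fullmatch(version):
        return "major-minor"
    return "unknown"


if __name__ == "__main__":
    header = "format-version: 1.2\ndata-version: 1.12345\n\n[Term]\nid: GO:9999001\n"
    version = read_data_version(header)
    print(version, classify_version(version))
```

One practical benefit of the date form: zero-padded YYYY-MM-DD strings sort lexicographically in chronological order, so a plain sort of release names such as 2011-11-08 and 2011-11-01 puts them oldest first.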
Note that if we adopt the Migration of GO to SVN (Proposal), we can manage releases more easily in svn:
go/
  trunk/
    release/
      go.obo
      ...
  branches/
    releases/
      2011-11-01
      2011-11-08
      ...
The versioned IRIs would look like:
Served from this directory:
Mapping to current structure
Relationship to release of annotations and GO database
Software used to generate releases
In collaboration with other groups involved in the OBO Foundry initiative, we have developed The Oort, the OBO Ontology Release Tool. It is based on the Java OWL API and the OBO Library OBO-to-OWL converter. The chief developers are Heiko Dietze and Shahid Manzoor.
The Oort runs as a command-line utility or as a desktop GUI.
More details can be found on the Oort wiki.
Recommendations for software developers using the ontology
To take full advantage of the GO and to future-proof software, we recommend consuming the OWL versions of the ontology and using a standard OWL-level application programming interface or tooling, such as Jena or the OWL API. The GO software team has experience with the OWL API and can support software developers. In addition, there is a large community of bio-ontology software developers well-versed in the OWL API.
The GO software group, in collaboration with other groups, has developed a wrapper for the OWL API called OWLTools. It provides convenience methods for working with GO and other OBOF ontologies.
Other programmatic means of accessing the ontologies
This proposal aims primarily to address the organization and contents of ontology files available over the web.
We will continue to provide programmatic access to both the ontology and associated data through a variety of means. We will also continue to work with collaborators and third-party websites and web service providers to ensure optimal programmatic access to data. Examples of alternative means of access include:
- relational-level access, through database dumps, web-based SQL access via GOOSE, and programmatic APIs such as the GOLD Hibernate Layer
- semantic web access, through SPARQL endpoints such as the obofoundry (http://sparql.obofoundry.org), neurocommons (http://sparql.obo.neurocommons.org/) and bio2rdf (http://bio2rdf.org/)
- Web-services, including the AmiGO web services layer, NCBO web services, OLS web services, QuickGO web services
- Access through BioMart, GO-InterMine and R libraries such as Bioconductor
- Use of the OWL API and other ontology APIs
References
- PMID:12603063 - A methodology to migrate the gene ontology to a description logic environment using DAML+OIL
- PMID:20152934 - Cross-product extensions of the Gene Ontology.
- Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development
- OBO Format 1.4 guide
- OBO Format Formal Specification and mapping to OWL