OBO-Edit: Versioning Proposal

From GO Wiki
Revision as of 12:32, 30 June 2014 by Gail

OBO is a difficult format to version. Because an ontology's data can be spread across multiple files, a file-scoped version number cannot safely be applied to an ontology. Large ontologies (particularly ontologies with cross products) are guaranteed to span multiple files, so conflicts between the version numbers of files loaded in the same session are likely. Many ontologies are also maintained by multiple users, so version numbers cannot be permanently assigned until an ontology is "published" in a central repository.

This proposal suggests a way to solve these problems that requires only a small amount of new software and places very little burden on the ontology user.

Concepts: Repositories, Branching, and Publishing

This proposal assumes that every ontology exists in some official repository, where it is available to the public. The act of adding an ontology to a repository is called publishing an ontology. An ontology is given a permanent, official version number only when it is published.

When a user edits an ontology, they first check it out of the repository. A freshly checked-out ontology retains its official version number. As soon as the ontology is edited, however, it is branched and assigned a branch version number (see Branch Version Numbers below). Each time the ontology is edited, the branch version number is incremented to reflect the user's local changes to their copy of the ontology. When the branched ontology is finally published to the repository, the user's local changes are merged into the current official version of the ontology, and the result is assigned a new permanent version number.

The meaning of repository depends on the particular circumstances of an ontology. A repository could be a file repository in a version control system such as CVS or Subversion, an SQL database, or simply a location on a computer's hard drive, in the case of a manually published ontology.

Version Numbers

Version numbers are attached to namespaces, not files. This will require a change to the syntax of the OBO 1.3 data-version tag, allowing it to accept two parameters: a namespace and a version number. (The current one-parameter form could still be used to specify a version number for all namespaces that lack an explicit one.) Because a single namespace may have members defined in several files, two files may specify different version numbers for the same namespace. If this occurs, the higher version number is used for the namespace (see Version Number Ordering below).
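A minimal Python sketch of how client software might collect the proposed two-parameter data-version tags from several files and keep the higher version for each namespace. The function names, the use of None as the key for the one-parameter default, and the simplified simple_key ordering (which handles only official, non-beta numbers) are assumptions for illustration. They are not part of OBO-Edit.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for "data-version: [namespace] version".
DATA_VERSION = re.compile(r"data-version:\s*(\S+)(?:\s+(\S+))?\s*")


def simple_key(version: str) -> Tuple[int, ...]:
    # Simplification: compares official, non-beta numbers such as "1.24330"
    # component by component; the full rules are under Version Number Ordering.
    return tuple(int(part) for part in version.split("."))


def merge_data_versions(headers: List[List[str]]) -> Dict[Optional[str], str]:
    """Collect data-version tags from several files' header lines.

    The two-parameter form sets a namespace-specific version; the
    one-parameter form is stored under the key None as the default.
    If files disagree about a namespace, the higher version wins.
    """
    versions: Dict[Optional[str], str] = {}
    for header in headers:
        for line in header:
            m = DATA_VERSION.fullmatch(line)
            if m is None:
                continue  # not a data-version tag
            if m.group(2) is None:
                namespace, version = None, m.group(1)
            else:
                namespace, version = m.group(1), m.group(2)
            current = versions.get(namespace)
            if current is None or simple_key(version) > simple_key(current):
                versions[namespace] = version
    return versions


file_a = ["default-namespace: gene_ontology",
          "data-version: molecular_function 1.24330"]
file_b = ["data-version: molecular_function 1.24331", "data-version: 1.0"]
merged = merge_data_versions([file_a, file_b])
```

Here merged maps molecular_function to 1.24331, the higher of the two conflicting values, and holds 1.0 as the default for all other namespaces.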

Official Version Numbers

Official version numbers are assigned to published ontologies only, and take the form:

X.YYYY[.ZZZZ]*[-betaBBBB]

Here X, YYYY, ZZZZ and BBBB stand for decimal numbers. The number of digits in each is up to the ontology maintainer, and leading zeroes are legal in any number.

This means that an official version number must contain at least a major version number and a minor version number, but may contain any number of optional sub-version numbers, and one final optional beta identifier. The following are legal version numbers:

  • 1.200-beta10
  • 1.2-beta10
  • 2.0
  • 1.00001
  • 1.00001.30.0044
  • 1.00001.30.0044-beta19
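One way to check the official format mechanically is with a regular expression. This Python sketch reads [.ZZZZ]* as zero or more further dot-separated numbers and reads the beta identifier as the literal -beta followed by digits. That is one interpretation of the pattern above, checked against the listed examples, and the function name is hypothetical.

```python
import re

# Major.minor, then zero or more .sub-version numbers, then an optional -betaN.
OFFICIAL_VERSION = re.compile(r"\d+\.\d+(?:\.\d+)*(?:-beta\d+)?")


def is_official_version(text: str) -> bool:
    """True if text matches X.YYYY[.ZZZZ]*[-betaBBBB] in full."""
    return OFFICIAL_VERSION.fullmatch(text) is not None
```

Under this reading, a bare major number such as 2 and a branch version such as 1.24330-branch_midori_1 are both rejected as official version numbers.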

When an official version number is incremented automatically by software (for example, when an ontology is published), only the least-significant version number is incremented. The software will preserve the number of digits in that number if possible, so 1.0099 would become 1.0100.
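A Python sketch of the increment rule. The function name is hypothetical. The proposal does not say how beta versions are incremented, so this sketch rejects anything with a qualifier. When the width cannot be kept (99 becomes 100), the number simply grows.

```python
def increment_version(version: str) -> str:
    """Increment the least-significant number, preserving its width if possible."""
    if "-" in version:
        # Beta and branch qualifiers are out of scope for this sketch.
        raise ValueError(f"unsupported version: {version!r}")
    *head, last = version.split(".")
    bumped = str(int(last) + 1).zfill(len(last))  # "0044" -> "0045"
    return ".".join(head + [bumped])
```

For example, increment_version("1.00001.30.0044") touches only the final component and keeps its leading zeroes.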

Branch Version Numbers

Branch version numbers are assigned as soon as an ontology with an official version number is edited. A branch version takes the form:

<official_version_number>-branch_<user_name>_<branch_version_number>

The user name is a unique user name within the official ontology repository (for example, an SQL database user ID or a CVS user name).

To illustrate, imagine that user midori checks out a version of the Gene Ontology with version numbers 1.32001 (cellular_component namespace), 1.24330 (molecular_function namespace), and 1.48823 (biological_process namespace). Assuming she edits terms in molecular_function and biological_process (but not cellular_component), the header of her saved file will contain the lines:

data-version: molecular_function 1.24330-branch_midori_1
data-version: biological_process 1.48823-branch_midori_1
data-version: cellular_component 1.32001

If midori does another set of edits that touch terms from cellular_component and biological_process (but not molecular_function) the version numbers will become:

data-version: molecular_function 1.24330-branch_midori_1
data-version: biological_process 1.48823-branch_midori_2
data-version: cellular_component 1.32001-branch_midori_1
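The branching rule can be sketched in Python and checked against midori's two editing sessions. The helper names are hypothetical. The sketch assumes the branch counter is bumped once per saved session for each namespace whose terms were edited. It also refuses a version that a different user has already branched, because the proposal does not cover that case.

```python
import re
from typing import Dict, Set

# <official>-branch_<user>_<n>; the greedy user group lets names contain "_".
BRANCH = re.compile(r"(?P<official>.+?)-branch_(?P<user>.+)_(?P<n>\d+)")


def bump_branch(version: str, user: str) -> str:
    """Return the branch version after `user` edits a namespace at `version`."""
    m = BRANCH.fullmatch(version)
    if m is None:
        # First edit since checkout: branch from the official number.
        return f"{version}-branch_{user}_1"
    if m.group("user") != user:
        raise ValueError(f"{version} is already branched by {m.group('user')}")
    return f"{m.group('official')}-branch_{user}_{int(m.group('n')) + 1}"


def save(versions: Dict[str, str], edited: Set[str], user: str) -> Dict[str, str]:
    """Bump only the namespaces whose terms were edited since the last save."""
    return {ns: bump_branch(v, user) if ns in edited else v
            for ns, v in versions.items()}


checkout = {"molecular_function": "1.24330",
            "biological_process": "1.48823",
            "cellular_component": "1.32001"}
first = save(checkout, {"molecular_function", "biological_process"}, "midori")
second = save(first, {"cellular_component", "biological_process"}, "midori")
```

After these calls, first and second hold the same three data-version values as the two headers shown above.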

When Midori finally publishes her changes (assuming no one else has published changes since she initially checked out the ontology), the version numbers will become:

data-version: molecular_function 1.24331
data-version: biological_process 1.48824
data-version: cellular_component 1.32002
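Publishing can be sketched as stripping the branch suffix and then applying the least-significant increment. Like the example above, this hypothetical publish function assumes nobody else has published since checkout. It also assumes that namespaces without a branch suffix were not edited and keep their version, and it handles only non-beta official numbers.

```python
import re
from typing import Dict

BRANCH_SUFFIX = re.compile(r"-branch_.+_\d+$")


def increment_least_significant(version: str) -> str:
    # Non-beta official numbers only, e.g. "1.24330" -> "1.24331".
    *head, last = version.split(".")
    return ".".join(head + [str(int(last) + 1).zfill(len(last))])


def publish(versions: Dict[str, str]) -> Dict[str, str]:
    """Promote each branched namespace to the next official version."""
    published = {}
    for ns, version in versions.items():
        official = BRANCH_SUFFIX.sub("", version)
        if official == version:
            published[ns] = version  # never branched, so never edited
        else:
            published[ns] = increment_least_significant(official)
    return published
```

Applied to midori's second header, this sketch yields 1.24331, 1.48824 and 1.32002, matching the published lines above.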

Version Number Ordering

Version numbers use the following ordering, where ? stands for any suffix:

  • A.? > A
  • A.X.? > A.X
  • A.X-branch? > A.X
  • A.X > A.X-beta? (this is intentional: a beta version is lower than the same number without a beta qualifier)

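Otherwise, version numbers are compared position by position, starting with the most significant, and the version with the higher number at the first position where they differ is the larger. Numbers are compared numerically rather than as text, so 1.101 > 1.2, but 1.200 > 1.101.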
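The ordering rules can be expressed as a sort key, so that Python's built-in tuple comparison does the work. This is a sketch under several assumptions the proposal does not settle. Numeric parts decide the order before any qualifier is considered. A beta version may carry a branch suffix. Beta and branch counters compare numerically. Branch user names are not ordered at all.

```python
import re
from typing import Tuple

VERSION = re.compile(
    r"(?P<nums>\d+(?:\.\d+)+)"                    # X.YYYY[.ZZZZ]*
    r"(?:-beta(?P<beta>\d+))?"                    # optional -betaBBBB
    r"(?:-branch_(?P<user>.+)_(?P<branch>\d+))?"  # optional branch suffix
)


def version_key(version: str) -> Tuple:
    """Sort key implementing the ordering rules listed above."""
    m = VERSION.fullmatch(version)
    if m is None:
        raise ValueError(f"not a version number: {version!r}")
    nums = tuple(int(n) for n in m.group("nums").split("."))
    beta, branch = m.group("beta"), m.group("branch")
    return (
        nums,                          # numeric, most significant first; A.? > A
        0 if beta else 1,              # A.X > A.X-beta?
        int(beta) if beta else 0,      # assumption: later betas rank higher
        1 if branch else 0,            # A.X-branch? > A.X
        int(branch) if branch else 0,  # assumption: user names are ignored
    )


ordered = sorted(["2.0", "1.2-beta10", "1.200", "1.2", "1.101"], key=version_key)
```

With this key, ordered is ["1.2-beta10", "1.2", "1.101", "1.200", "2.0"]. Passing version_key to max() is one way to pick the higher of two conflicting data-version values.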

Publishing

If possible, repositories should be designed to increment version numbers automatically when an ontology is committed to the repository (an operation I'll call autopublishing; see Server Side Software below). However, some simple ontologies may not have a configurable repository. These ontologies can be published manually using client software such as OBO-Edit. When an ontology is manually published, its version number is promoted to an official version number and the ontology is saved to a local disk. It is then the user's responsibility to make the ontology available to the public.

Software Requirements

OBO-Edit Changes

Datamodel Modifications

OBO-Edit will need to add objects to represent, compare and increment these new version numbers. The org.obo.datamodel.Namespace class will need to be modified so that version numbers can be associated with OBO namespaces.

Parser Modifications

OBO-Edit will need to be modified to read/write the new version information, and automatically increment the current version number when the ontology is saved. The parsers will also need small modifications to enable autopublishing. I suggest that autopublishing be accomplished via a checkbox in the OBO save configuration panel.
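On the writing side, a parser would emit one data-version tag per namespace. This Python sketch is hypothetical. It writes the one-parameter default (stored under the key None) first and then the namespace-specific tags in alphabetical order. That ordering is a choice made here only so that saved files are deterministic.

```python
from typing import Dict, List, Optional


def format_data_versions(versions: Dict[Optional[str], str]) -> List[str]:
    """Render data-version header lines; the key None means the default tag."""
    lines = []
    if None in versions:
        lines.append(f"data-version: {versions[None]}")  # one-parameter form
    for namespace in sorted(ns for ns in versions if ns is not None):
        lines.append(f"data-version: {namespace} {versions[namespace]}")
    return lines
```

A saved file would then carry lines such as data-version: molecular_function 1.24330-branch_midori_1 in its header.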

Server Side Software

Every repository will have its own software requirements. However, I suggest we write some CVS triggers for the Gene Ontology that automatically update the version numbers and tag the CVS revision with the OBO version number when an ontology is committed. These triggers should be made available to the general public for use on their own CVS servers.