Talk:OBO-Edit: Versioning Proposal

From GO Wiki

What are the motivations?

What problem are we solving here?

It is certainly the case that the public needs a better way of referring to versions of GO (particularly GO + GO annotations), and similarly for other ontologies.

However, these requirements may be better served by having a more formal release process for GO:

I wrote this document a few months ago. It should really be called "Release proposal".

Your document addresses "revisions": fine-grained changes of interest to editors and annotators, but not necessarily to general users. The public may be better served by releases. Compare software development, which distinguishes internal revisions (e.g., in a version control system such as CVS) from releases / release versions.

From John: Absolutely. The main difficulty I'm trying to address is the continuing requests for OBO-Edit to "automatically handle versioning", as well as requests from the Phenote community to make version tracking possible at all. The advantage of this proposal is that it doesn't preclude a formal release process at all, but it still makes it possible to keep track of the internal, fine-grained version information that Phenote would like to have (for example, Phenote can't safely use ontology version information to figure out whether to update its local ontology file cache).

Most importantly, I'm trying to create a framework in which the idea of "automatically handling versioning" in OBO-Edit has some sort of stable meaning.

Terminology: data-version, branch-version

I find branch-version confusing, as I think of CVS branches.

Also, the OBI community uses the term "branch" to refer to one of the orthogonal subclass hierarchies in OBI.

Our terminology here should be the same as in the file format.

I am not so keen on data-version either. The contents of an ontology are not traditionally regarded as "data", unless we regard the byte-content as the entity that is being versioned. However, I think we want to version the semantic content, not the byte-content (see below).
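To illustrate the difference between versioning byte-content and versioning semantic content, here is a minimal Python sketch (not OBO-Edit code) that fingerprints a heavily simplified OBO-style file by its unordered set of stanzas and tag/value pairs. The parser, the function names, and the example IDs are hypothetical; a real implementation would need a full OBO parser.

```python
import hashlib
from typing import FrozenSet, Tuple

# Hypothetical representation: ("[Term]", {(tag, value), ...})
Stanza = Tuple[str, FrozenSet[Tuple[str, str]]]


def parse_stanzas(text: str) -> FrozenSet[Stanza]:
    """Parse a simplified OBO-style file into an unordered set of stanzas.

    Simplifications (assumptions for this sketch): header lines before the
    first stanza are skipped, "!" comments are dropped, and escapes and
    qualifiers are ignored.
    """
    stanzas = set()
    kind, pairs = None, []
    for raw in text.splitlines():
        line = raw.split(" !", 1)[0].strip()  # drop trailing "! comment"
        if not line or line.startswith("!"):
            continue
        if line.startswith("[") and line.endswith("]"):
            if kind is not None:
                stanzas.add((kind, frozenset(pairs)))
            kind, pairs = line, []
        elif kind is not None and ":" in line:
            tag, value = line.split(":", 1)
            pairs.append((tag.strip(), value.strip()))
    if kind is not None:
        stanzas.add((kind, frozenset(pairs)))
    return frozenset(stanzas)


def semantic_fingerprint(text: str) -> str:
    """Hash a canonical (sorted) form, so line order and comments do not matter."""
    canonical = sorted((kind, sorted(pairs)) for kind, pairs in parse_stanzas(text))
    return hashlib.sha256(repr(canonical).encode("utf-8")).hexdigest()


a = "[Term]\nid: EX:0000001\nname: example\nis_a: EX:0000002 ! parent\n"
b = "[Term]\nname: example\nid: EX:0000001\n\nis_a: EX:0000002\n"
print(semantic_fingerprint(a) == semantic_fingerprint(b))  # True, although a != b
```

Under this scheme, reordering lines or editing comments would not produce a new version, while renaming a term would.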

In obof1.2, "version" is deprecated and "data-version" is suggested instead. The syntax of data-version is not specified; any value is allowed. No current ontologies use data-version. A few use version, and those that do use it specifically for the CVS revision auto-tag.

I suggest we allow people to continue using "version" (or data-version) as they have been using it; repurposing a field is always bad. Instead, we should introduce an entirely new header tag. I suggest ontology-version or namespace-version, which would have the strict syntax you propose.

Deployed ontologies vs. editors' ontologies

What about ontologies such as SO and GO that have for-public versions and editors' versions?

In the case of SO, the semantic content of the two (so-xp and so) is the same; the latter has the links that a reasoner can infer made explicit.

In the case of GO, the for-public version may be mildly semantically impoverished (e.g., disjoint_from tags are dropped), but it is still essentially the same ontology.

My feeling is that all these 'versions' or 'exports' of the ontology should share the same version. We should therefore strive to call them 'exports' (or something similar) so that they are not confused with true versions.

Software requirements

In general, I worry about introducing additional complexity into the simple CVS management system.

The server-side mechanism you suggest for GO would have to be smart enough to detect per-namespace changes. This would require, at a minimum, running some Java on the CVS server. We have to plan for contingencies when this fails (e.g., an edit introduces a subtle parse error).
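As a rough illustration of what per-namespace change detection involves, the Python sketch below compares two already-parsed snapshots and increments a counter for each namespace that changed. It assumes a hypothetical representation (term id mapped to namespace plus tag/value pairs) and a hypothetical integer namespace-version; the proposal's actual syntax is not reproduced here.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical representation: term id -> (namespace, set of (tag, value) pairs).
Term = Tuple[str, FrozenSet[Tuple[str, str]]]


def changed_namespaces(old: Dict[str, Term], new: Dict[str, Term]) -> Set[str]:
    """Return the namespaces touched by any added, removed, or edited term."""
    changed: Set[str] = set()
    for term_id in old.keys() | new.keys():
        before: Optional[Term] = old.get(term_id)
        after: Optional[Term] = new.get(term_id)
        if before == after:
            continue  # term unchanged
        # A term moved between namespaces marks both the old and the new one.
        for term in (before, after):
            if term is not None:
                changed.add(term[0])
    return changed


def bump_versions(versions: Dict[str, int], changed: Set[str]) -> Dict[str, int]:
    """Increment the hypothetical integer namespace-version of each changed namespace."""
    bumped = dict(versions)
    for namespace in changed:
        bumped[namespace] = bumped.get(namespace, 0) + 1
    return bumped


old = {
    "EX:0000001": ("biological_process", frozenset({("name", "a")})),
    "EX:0000002": ("molecular_function", frozenset({("name", "b")})),
}
new = dict(old)
new["EX:0000001"] = ("biological_process", frozenset({("name", "a renamed")}))

changed = changed_namespaces(old, new)  # {"biological_process"}
versions = bump_versions({"biological_process": 3, "molecular_function": 7}, changed)
```

Note that this only runs once both files parse successfully; if the new file fails to parse, the simplest contingency is to leave all namespace versions untouched and flag the check-in.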

Perhaps we need to go beyond CVS/SVN, but I don't think we can dedicate resources to this yet.

It could all be done within OE. However, this means that once an ontology-version system is adopted for an ontology, all subsequent edits MUST be made with a version of OE that supports incrementing the namespace-version. This is fine, but it will delay the introduction of the plan.

Note that, in addition to the changes you mention, presumably the OBO-Edit change operation model also needs to be extended?

There is also another issue with client-side versioning:

Asynchronous edits

Some ontologies are edited asynchronously by different individuals and checked in separately. This would require some kind of global version server. That is not technically difficult, but it adds significant administrative overhead and has consequences for things like offline editing. It will also hamper adoption for ontologies with few or no resources to spare.
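To make the offline-editing consequence concrete, here is a minimal in-memory sketch of a global version server using compare-and-set: a check-in is accepted only if it was based on the current version. The class and method names are hypothetical, and a real server would also need persistence, authentication, and a network interface.

```python
import threading


class VersionConflict(Exception):
    """Raised when a commit is based on a version that is no longer current."""


class VersionServer:
    """Hypothetical central allocator of ontology version numbers.

    Each commit states which version the editor started from; the server
    accepts it only if no one else has committed in the meantime.
    """

    def __init__(self) -> None:
        self.current = 0
        self._lock = threading.Lock()  # serialize concurrent check-ins

    def commit(self, base_version: int) -> int:
        with self._lock:
            if base_version != self.current:
                raise VersionConflict(
                    f"edit based on version {base_version}, "
                    f"but the server is at {self.current}"
                )
            self.current += 1
            return self.current


server = VersionServer()
print(server.commit(0))  # editor A checks in: prints 1
try:
    server.commit(0)  # editor B edited offline, also starting from version 0
except VersionConflict as err:
    print("rejected:", err)  # B must merge A's changes and retry
print(server.commit(1))  # after merging, B's edit becomes version 2: prints 2
```

An editor working offline cannot obtain a version number until reconnecting, and a stale edit must be merged before it is accepted; that is the overhead described above.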

What is a change?

We need to work out the semantics of what constitutes a change. E.g. is adding a term to a subset a change?
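One way to frame this question is as an explicit, configurable policy. The Python sketch below diffs the tag/value pairs of one term and skips tags that the policy declares non-versioned; whether "subset" belongs in that list is exactly the open question. The function names, the representation, and the example ID are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]  # one (tag, value) line of a term stanza


def changed_tags(
    old: FrozenSet[Pair], new: FrozenSet[Pair], ignored_tags: Iterable[str] = ()
) -> Set[str]:
    """Return the tags whose values differ, skipping tags the policy ignores."""
    ignored = set(ignored_tags)
    differing = old ^ new  # pairs present in only one of the two versions
    return {tag for tag, _ in differing if tag not in ignored}


def is_change(
    old: FrozenSet[Pair], new: FrozenSet[Pair], ignored_tags: Iterable[str] = ()
) -> bool:
    """True if the edit counts as a change under the given policy."""
    return bool(changed_tags(old, new, ignored_tags))


old = frozenset({("id", "EX:0000001"), ("name", "example term")})
new = old | {("subset", "goslim_generic")}  # term added to a subset

print(changed_tags(old, new))  # {'subset'}
print(is_change(old, new, ignored_tags={"subset"}))  # False
```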

Coordinating with NCBO

NCBO will implement its own versioning system. Note that NCBO will not store every fine-grained edit; rather, it will contain releases.