Talk:OBO-Edit: Versioning Proposal: Difference between revisions

From GO Wiki

Revision as of 17:58, 29 November 2007

What are the motivations?

What problem are we solving here?

It is certainly the case that the public needs a better way of referring to versions of GO (particularly GO + GO annotations), and similarly for other ontologies.

However, these requirements may be better served by having a more formal release process for GO:

http://gocwiki.geneontology.org/index.php/Versioning_Proposal

I wrote this document a few months ago. It should really be called "Release proposal".

Your document addresses "revisions": fine-grained changes of interest to editors and annotators, but not necessarily to general users. The public may be better served by releases. Compare software development, which distinguishes internal revisions (e.g. in a version control system like CVS) from releases and release versions.

Terminology: data-version, branch-version

I find branch-version confusing, as I think of CVS branches.

Also, the OBI community uses the term "branch" to refer to one of the orthogonal subclass hierarchies in OBI.

Our terminology here should be the same as in the file format.

I am not so keen on data-version either. The contents of an ontology are not traditionally regarded as "data", unless we regard the byte-content as the entity that is being versioned. However, I think we want to version the semantic content, not the byte-content (see below).

In obof1.2, "version" is deprecated and "data-version" is suggested instead. The syntax of data-version is not specified; any value is allowed. No current ontologies use data-version. A few use version, and those that do use it specifically for the CVS revision auto-tag.

I suggest we allow people to continue using "version" (or data-version) as they have been using it; repurposing a field is always bad. Instead, we introduce an entirely new header tag. I suggest ontology-version or namespace-version. This would have the strict syntax you propose.

Deployed ontologies vs editors ontologies

What about ontologies such as SO and GO that have for-public versions and editors versions?

In the case of SO, the semantic content of the two (so-xp and so) is the same. The latter has the reasoner-inferable links realized.

In the case of GO, the for-public version may be mildly semantically impoverished (e.g. disjoint_from tags are dropped), but it is still essentially the same ontology.

My feeling is that all these 'versions' or 'exports' of the ontology should have the same version. Thus we should strive to call them 'exports' (or something similar) so that they are not confused with true versions.

Software requirements

In general, I worry about introducing additional complexity to the simple CVS management system.

The server-side mechanism you suggest for GO would have to be smart enough to detect per-namespace changes. This would require at least running some Java on the CVS server. We have to consider contingencies for when this fails (e.g. an edit introduces a subtle parse error).
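To make the per-namespace requirement concrete, here is a minimal sketch in Python (used only for brevity; it is not the Java mechanism under discussion). It assumes a heavily simplified reading of OBO stanzas: `[Term]` stanzas made of `tag: value` lines, with no trailing-comment or escape handling. The names `parse_terms` and `changed_namespaces` and the `default_namespace` parameter are hypothetical. A malformed line raises `ValueError`, which is the point where the server would need a fallback.

```python
def parse_terms(obo_text, default_namespace="default"):
    """Map term id -> (namespace, frozenset of (tag, value) pairs).

    Simplified on purpose: only [Term] stanzas are kept, header lines are
    skipped, and trailing '!' comments and escapes are not handled.
    """
    stanzas = []  # list of (stanza_type, [(tag, value), ...])
    for raw in obo_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if line.startswith("[") and line.endswith("]"):
            stanzas.append((line[1:-1], []))
        elif stanzas:  # lines before the first stanza are header lines
            tag, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"unparseable line: {raw!r}")
            stanzas[-1][1].append((tag.strip(), value.strip()))

    terms = {}
    for stanza_type, pairs in stanzas:
        if stanza_type != "Term":
            continue
        fields = dict(pairs)
        if "id" not in fields:
            raise ValueError("Term stanza without an id tag")
        namespace = fields.get("namespace", default_namespace)
        terms[fields["id"]] = (namespace, frozenset(pairs))
    return terms


def changed_namespaces(old_text, new_text):
    """Return the set of namespaces containing an added, removed or edited term."""
    old, new = parse_terms(old_text), parse_terms(new_text)
    changed = set()
    for term_id in old.keys() | new.keys():
        before, after = old.get(term_id), new.get(term_id)
        if before != after:
            # A term that moved between namespaces marks both as changed.
            for entry in (before, after):
                if entry is not None:
                    changed.add(entry[0])
    return changed
```

A commit hook built along these lines would increment only the versions of the namespaces returned by `changed_namespaces`, and would have to decide what to do (reject the commit, or leave all versions untouched) when parsing raises.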

Perhaps we need to go beyond CVS/SVN, but I don't think we can dedicate resources to this yet.

It could all be done within OE. However, this means that once an ontology-version system is instituted for an ontology, all successive edits MUST be done using a version of OE that supports incrementing the namespace-version. This is fine, but it will delay introduction of the plan.

Note that, in addition to the changes you mention, presumably the OBO-Edit change operation model also needs to be extended?

There is also another issue with client-side versioning: asynchronous edits (see the next section).

Asynchronous edits

Some ontologies are edited asynchronously by different individuals and checked in separately. There would need to be some kind of global version server. This is not technically difficult, but it adds a large admin overhead and has consequences for things like off-line editing. It will also hamper adoption for ontologies with few or no resources to spare.
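The collision can be illustrated with a toy sketch. It assumes, purely for illustration, that versions are plain integers and that each client increments the version it checked out; `client_side_increment` and `VersionServer` are hypothetical names, not part of OBO-Edit.

```python
def client_side_increment(checked_out_version):
    """Each editor bumps the version it last saw, without coordination."""
    return checked_out_version + 1


class VersionServer:
    """Toy stand-in for a global version server: a single shared counter."""

    def __init__(self, start=0):
        self._current = start

    def next_version(self):
        # Every request gets a fresh number, so two editors cannot collide,
        # but each editor must be able to reach the server to get one.
        self._current += 1
        return self._current


# Two editors check out version 5 and edit independently (e.g. off-line).
editor_a = client_side_increment(5)
editor_b = client_side_increment(5)
collision = editor_a == editor_b  # both edits now claim the same version

# With a central server the numbers are distinct.
server = VersionServer(start=5)
server_a = server.next_version()
server_b = server.next_version()
```

The server removes the collision at the cost of requiring connectivity at check-in time, which is the off-line editing consequence mentioned above.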

What is a change?

We need to work out the semantics of what constitutes a change. E.g. is adding a term to a subset a change?
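One way to frame the question is as a policy: a set of tags whose edits do not count as a change for versioning purposes. The sketch below assumes a term is represented as a collection of `(tag, value)` pairs; `NON_SEMANTIC_TAGS` and both function names are hypothetical, and putting `subset` in the ignored set is only one possible answer to the question above, not a recommendation.

```python
# Assumption for illustration: edits to these tags do not trigger a new version.
NON_SEMANTIC_TAGS = frozenset({"subset"})


def semantic_view(tag_values, ignored=NON_SEMANTIC_TAGS):
    """Project a term's (tag, value) pairs onto the tags that count."""
    return frozenset((tag, value) for tag, value in tag_values if tag not in ignored)


def is_change(old_pairs, new_pairs, ignored=NON_SEMANTIC_TAGS):
    """True if the two states of a term differ under the chosen policy."""
    return semantic_view(old_pairs, ignored) != semantic_view(new_pairs, ignored)


before = [("id", "X:1"), ("name", "a"), ("is_a", "X:0")]
with_subset = before + [("subset", "slim_example")]
reparented = [("id", "X:1"), ("name", "a"), ("is_a", "X:9")]
```

Under this policy `is_change(before, with_subset)` is false, while passing `ignored=frozenset()` makes the same edit count as a change. Whatever is decided, the policy needs to be written down explicitly so that every tool that increments versions applies the same one.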

Coordinating with NCBO

NCBO will implement its own versioning system. Note that the NCBO will not contain every fine-grained edit; rather it will contain releases.