OBO: 1.3 Whiteboard

From GO Wiki
Revision as of 11:00, 3 October 2007 by Jrichter (talk | contribs) (General Specification Changes)

This page is intended to be a scratch pad for OBO 1.3 features and proposals.

Changes to Relations

The following new Typedef tags should be supported in OBO 1.3...

  1. transitive_under
  2. always_implies_inverse

transitive_under means that if p -transitive_under-> q, X -q-> Y and Y -p-> Z, then X -p-> Z. Note that every relation is automatically both transitive_over and transitive_under is_a.
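As a rough illustration, the transitive_under rule can be sketched in Python, reading it as "from X -q-> Y and Y -p-> Z, infer X -p-> Z". The function name, the triple representation, and the sample term ids A, B and C are assumptions made for this sketch; they are not part of any OBO library.

```python
def infer_transitive_under(links, transitive_under):
    """Return links plus everything implied by the transitive_under rule.

    links: set of (child, relation, parent) triples.
    transitive_under: set of (p, q) pairs, each meaning p -transitive_under-> q.
    """
    inferred = set(links)
    changed = True
    while changed:  # repeat until no new links appear
        changed = False
        new_links = set()
        for (x, q, y) in inferred:
            for (y2, p, z) in inferred:
                # X -q-> Y and Y -p-> Z, with p transitive_under q
                if y2 == y and (p, q) in transitive_under:
                    candidate = (x, p, z)
                    if candidate not in inferred:
                        new_links.add(candidate)
        if new_links:
            inferred |= new_links
            changed = True
    return inferred


# Hypothetical data: part_of is transitive_under is_a.
links = {("A", "is_a", "B"), ("B", "part_of", "C")}
rules = {("part_of", "is_a")}
result = infer_transitive_under(links, rules)
print(("A", "part_of", "C") in result)  # True
```

The nested loop is quadratic in the number of links per pass, which is fine for a sketch but would need indexing by term for a real ontology.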

always_implies_inverse is a boolean property. If always_implies_inverse is true for a relation p, it means that if p -inverse_of-> q and X -p-> Y, then Y -q-> X. This property could be used to define an integral_part_of relationship in OBO_REL, for example:

   [Typedef]
   id: OBO_REL:part_of
   name: part of
   inverse_of: OBO_REL:has_part

   [Typedef]
   id: OBO_REL:has_part
   name: has part

   [Typedef]
   id: OBO_REL:integral_part_of
   is_a: OBO_REL:part_of
   always_implies_inverse: true
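
A minimal Python sketch of how a reasoner might apply always_implies_inverse to the example above. It assumes that a relation with no inverse_of tag of its own picks one up from its is_a parent (so integral_part_of uses the inverse declared on part_of); the whiteboard does not state this explicitly. The function names, the dictionary layout, and the terms finger, hand, nucleus and cell are hypothetical.

```python
def find_inverse(rel, typedefs):
    """Walk up is_a until a relation with an inverse_of tag is found.

    Inheriting inverse_of through is_a is an assumption of this sketch.
    """
    seen = set()
    while rel is not None and rel not in seen:
        seen.add(rel)
        typedef = typedefs.get(rel, {})
        if "inverse_of" in typedef:
            return typedef["inverse_of"]
        rel = typedef.get("is_a")
    return None


def implied_inverse_links(links, typedefs):
    """For each X -p-> Y where p always implies its inverse q, yield Y -q-> X."""
    implied = set()
    for child, rel, parent in links:
        if typedefs.get(rel, {}).get("always_implies_inverse"):
            inverse = find_inverse(rel, typedefs)
            if inverse is not None:
                implied.add((parent, inverse, child))
    return implied


typedefs = {
    "OBO_REL:part_of": {"inverse_of": "OBO_REL:has_part"},
    "OBO_REL:has_part": {},
    "OBO_REL:integral_part_of": {
        "is_a": "OBO_REL:part_of",
        "always_implies_inverse": True,
    },
}
links = {
    ("finger", "OBO_REL:integral_part_of", "hand"),
    ("nucleus", "OBO_REL:part_of", "cell"),  # plain part_of: nothing implied
}
implied = implied_inverse_links(links, typedefs)
print(implied)  # {('hand', 'OBO_REL:has_part', 'finger')}
```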

Further, OBO 1.3 will allow classes to specify relationships to other terms OR to links between other terms. Link identifiers are specified in the following form:

 child_term_id -relation_id-> parent_term_id

This feature may not belong in the main specification; it may instead need to be defined in an ancillary parser extension specification (see below).
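
For illustration, a link identifier in the form shown above could be parsed with a small Python helper. This is only a sketch: it assumes the three parts are separated by whitespace and that ids themselves contain no whitespace, neither of which the whiteboard pins down. The names Link, parse_link_id and format_link_id are hypothetical.

```python
import re
from typing import NamedTuple


class Link(NamedTuple):
    child: str
    relation: str
    parent: str


# child_term_id -relation_id-> parent_term_id (whitespace-separated, assumed)
_LINK_ID = re.compile(r"^(\S+)\s+-(\S+?)->\s+(\S+)$")


def parse_link_id(text: str) -> Link:
    """Split a link identifier into its child, relation and parent ids."""
    match = _LINK_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a link identifier: {text!r}")
    return Link(*match.groups())


def format_link_id(link: Link) -> str:
    """Render a Link back into the textual link identifier form."""
    return f"{link.child} -{link.relation}-> {link.parent}"


link = parse_link_id("flybase_gene:300458382 -occurs_in-> endoplasmic_reticulum")
print(link.relation)  # occurs_in
```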

General Specification Changes

The OBO 1.3 specification needs to discuss the concept of parser extensions. Parser extensions are optional addenda to the basic OBO 1.3 specification that provide additional features to the OBO language.

Parser extensions require that we add a new header tag to OBO files called requires_extension. The requires_extension tag should specify both an identifier for the required extension (so we need to figure out how we specify that) and a minimum version number for that extension.

Extensions

There are at least two new extensions to OBO 1.3:

Postcomp Extension

This extension allows specially formatted post-composition expressions to be substituted for most identifier references in an OBO file. The post-composition expressions have the following format:

genus_term_id^differentia_type_id(differentia_term_id) [^differentia_type_id(differentia_term_id)]*

Any of the term ids may be replaced with another post-comp expression, and parentheses can be used in post-comp expressions to show precedence.
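
The grammar above is small enough for a recursive-descent parser. The Python sketch below is one possible reading of it, not the OBO-Edit implementation: it assumes ids never contain whitespace, ^, ( or ), and it returns plain ids as strings and composite expressions as dictionaries. All function names and the sample ids (X:1, Y:2 and so on) are hypothetical.

```python
import re


def parse_postcomp(text):
    """Parse a post-composition expression into nested Python structures."""
    tokens = re.findall(r"[\^()]|[^\s\^()]+", text)
    expr, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return expr


def _expect(tokens, pos, symbol):
    if pos >= len(tokens) or tokens[pos] != symbol:
        raise ValueError(f"expected {symbol!r} at token {pos}")
    return pos + 1


def _parse_id(tokens, pos):
    if pos >= len(tokens) or tokens[pos] in ("^", "(", ")"):
        raise ValueError(f"expected an identifier at token {pos}")
    return tokens[pos], pos + 1


def _parse_term(tokens, pos):
    # A term is either a plain id or a parenthesised sub-expression.
    if pos < len(tokens) and tokens[pos] == "(":
        expr, pos = _parse_expr(tokens, pos + 1)
        return expr, _expect(tokens, pos, ")")
    return _parse_id(tokens, pos)


def _parse_expr(tokens, pos):
    # genus [^differentia_type(differentia_term)]*
    genus, pos = _parse_term(tokens, pos)
    differentia = []
    while pos < len(tokens) and tokens[pos] == "^":
        relation, pos = _parse_id(tokens, pos + 1)
        pos = _expect(tokens, pos, "(")
        filler, pos = _parse_expr(tokens, pos)  # filler may itself be post-composed
        pos = _expect(tokens, pos, ")")
        differentia.append((relation, filler))
    if not differentia:
        return genus, pos
    return {"genus": genus, "differentia": differentia}, pos


simple = parse_postcomp("X:1^part_of(Y:2)")
nested = parse_postcomp("X:1^part_of(Y:2^located_in(Z:3))^has_quality(Q:4)")
print(simple)  # {'genus': 'X:1', 'differentia': [('part_of', 'Y:2')]}
```

Treating the differentia term as a full expression is what lets any term id be replaced by another post-comp expression, as described above.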

Annotation Extension

This extension specifies a new, compact syntax for describing annotations as ontology instances. Out of laziness, I'll let an email I sent about it act as our temporary specification:

OBO annotations are an extension to the OBO file format that will
give us a succinct, but completely correct, way of representing
annotations as ontology instances.

The idea is that an annotation is an instance that posits some
relationship between other ontology objects. For example, someone
might annotate a gene to a Gene Ontology term in the following way:

	flybase_gene:300458382 -occurs_in-> endoplasmic_reticulum

If we extend OBO format to allow terms and instances to have
relationships TO OTHER RELATIONSHIPS (which will be supported in OBO
1.3), we could correctly model the statement above as an instance:

	[Instance]
	id: my_annotation:1
	instance_of: oban:annotation
	relationship: posits flybase_gene:300458382 -occurs_in-> endoplasmic_reticulum
	relationship: based_on_evidence pubmed:3039942

But this representation is cumbersome and difficult to understand.
The new annotation format introduces a new kind of stanza to
represent our annotation:

	[Annotation]
	id: my_annotation:1
	subject: flybase_gene:300458382
	relation: occurs_in
	object: endoplasmic_reticulum
	evidence: pubmed:3039942


Note that the OBO annotation format simply specifies a mapping
between these new annotation stanzas and instance stanzas. We're not
introducing any new OBO semantics - this is just syntactic sugar.

We're also extending the datamodel libraries in OBO-Edit to provide a
programming API that gives programmers access to the benefits of this
new syntax. For example, the datamodel contains a new Annotation
object that has getSubject(), getObject(), setSubject(), setObject(),
etc. methods. The Annotation object is just an extension of the
OBO-Edit Instance object, so any calls to these new Annotation methods
are automatically mapped into calls to Instance methods.

I'm about to start working with Chris to produce a draft
specification for OBO 1.3, so this will be spelled out in much
greater detail then. I hope this brief introduction was useful -
please let me know if there are any details you'd like filled in.

The specifics of these Annotation stanzas are largely up in the air, but our current prototype supports the following tags (note that it uses relationship where the email above used relation):

  • subject
  • relationship
  • object
  • assigned_by
  • evidence
  • source

Of these, only subject, relationship and object are particularly well defined. For any Annotation with a subject, object and relationship specified, the mapping works like this:

! This annotation...
[Annotation]
id: <id>
subject: <subject_id>
relationship: <relationship_id>
object: <object_id>

! is equivalent to this instance...
[Instance]
id: <id>
instance_of: oban:annotation
relationship: oban:posits <subject_id> -<relationship_id>-> <object_id>

The other tags (assigned_by, evidence, and source) have no well-specified meaning (as far as I know) at this point.
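
The mapping above is mechanical, so it can be sketched as a short Python function. This is an illustration under stated assumptions rather than the OBO-Edit code: the annotation is represented as a plain dictionary of tag values, the function name annotation_to_instance is hypothetical, and the tags without a well-specified meaning are simply ignored.

```python
def annotation_to_instance(annotation):
    """Render an [Annotation] stanza (as a dict of tags) as an [Instance] stanza.

    Only id, subject, relationship and object are mapped; other tags
    (assigned_by, evidence, source) are ignored in this sketch.
    """
    required = ("id", "subject", "relationship", "object")
    missing = [tag for tag in required if tag not in annotation]
    if missing:
        raise ValueError(f"annotation is missing tags: {', '.join(missing)}")

    # Build the link identifier: subject -relationship-> object
    link = (
        f"{annotation['subject']} "
        f"-{annotation['relationship']}-> "
        f"{annotation['object']}"
    )
    return "\n".join([
        "[Instance]",
        f"id: {annotation['id']}",
        "instance_of: oban:annotation",
        f"relationship: oban:posits {link}",
    ])


example = {
    "id": "my_annotation:1",
    "subject": "flybase_gene:300458382",
    "relationship": "occurs_in",
    "object": "endoplasmic_reticulum",
    "evidence": "pubmed:3039942",  # ignored: no well-specified mapping yet
}
stanza = annotation_to_instance(example)
print(stanza)
```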

You'll notice that the mapping relies on some pre-defined ontology objects. Those objects are defined in an ontology file that currently exists as a resource in the OBO-Edit source repository, but will probably be moved to the OBO Foundry soon. The contents of oban.obo are reproduced below:

default-namespace: oban

[Term]
id: oban:annotation
name: Annotation

[Typedef]
id: oban:has_data_source
name: has data source
domain: oban:annotation

[Typedef]
id: oban:has_evidence
name: has evidence
domain: oban:annotation
range: oban:evidence

[Term]
id: oban:evidence
name: Evidence

[Typedef]
id: oban:posits
name: posits
domain: oban:annotation