OBO: 1.3 Whiteboard

From GO Wiki

This page is out of date

Please see: http://www.geneontology.org/GO.format.obo-1_3.shtml


This page is intended to be a scratch pad for OBO 1.3 features and proposals.


Relation Composition

John suggested transitive_under:

transitive_under means that if p -transitive_under-> q and X -q-> Y and Y -p-> Z, then X -p-> Z. Note that every all-some relation is automatically transitive_over and transitive_under IS_A.

We decided against an additional tag. Instead, we will have a more general relation composition operator:

 [Typedef]
 id: R
 holds_over_chain: R1 R2 

Semantics:

R holds_over_chain R1 R2 & X R1 Y & Y R2 Z => X R Z

transitive_over (in obof1.2 and used in GO) then becomes a specialization of this.

Only binary compositions are allowed; however, any longer chain can be constructed from binary compositions. This should be equivalent to OWL2 expressivity.
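The composition rule above can be sketched as a small forward-chaining loop. This is only an illustration, not the OE2 reasoner: the function name apply_chains, the triple representation of links, and the example relation and term names are all assumptions made for this sketch.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str, str]    # (subject, relation, object)
Chain = Tuple[str, str, str]   # (R, R1, R2), meaning: R holds_over_chain R1 R2


def apply_chains(links: Set[Link], chains: List[Chain]) -> Set[Link]:
    """Close a set of links under the rule
    R holds_over_chain R1 R2 & X R1 Y & Y R2 Z => X R Z."""
    closed = set(links)
    changed = True
    while changed:  # repeat until no chain produces a new link
        changed = False
        for r, r1, r2 in chains:
            inferred = {
                (x, r, z)
                for (x, p, y) in closed if p == r1
                for (y2, q, z) in closed if q == r2 and y2 == y
            }
            if not inferred <= closed:
                closed |= inferred
                changed = True
    return closed


# transitive_over as a special case: regulates holds_over_chain regulates part_of
example_links = {("A", "regulates", "B"), ("B", "part_of", "C")}
example_chains = [("regulates", "regulates", "part_of")]
result = apply_chains(example_links, example_chains)  # adds ("A", "regulates", "C")
```

Because each chain is binary, a longer chain would be expressed in this sketch by introducing an intermediate relation and listing two chain entries.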

Do we also allow

R = R1 o R2

?

Status: implemented in OE2 parser and reasoner. Part of obof1.3 spec.

inverses

always_implies_inverse is a boolean property. If always_implies_inverse is true for a relation p, it means that if p -inverse_of-> q and X -p-> Y, then Y -q-> X. This property could be used to define an integral_part_of relationship in OBO_REL, for example:

   [Typedef]
   id: OBO_REL:part_of
   name: part of
   inverse_of: has_part

   [Typedef]
   id: OBO_REL:has_part
   name: has part

   [Typedef]
   id: OBO_REL:integral_part_of
   is_a: OBO_REL:part_of
   always_implies_inverse: true

STATUS: review after instance/class relation issue review

Link IDs

Further, OBO 1.3 will allow classes to specify relationships to other terms OR to links between other terms. Link identifiers are specified in the following form:

 child_term_id -relation_id-> parent_term_id
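As a rough illustration of how a parser might split such a link identifier into its three parts, here is a minimal sketch. The function name parse_link_id is hypothetical, and the sketch assumes that identifiers contain no whitespace, which the text above does not state.

```python
import re
from typing import Tuple

# child_term_id -relation_id-> parent_term_id
# Assumption: none of the three ids contains whitespace.
LINK_ID = re.compile(r"^\s*(\S+)\s+-(\S+?)->\s+(\S+)\s*$")


def parse_link_id(text: str) -> Tuple[str, str, str]:
    """Return (child_term_id, relation_id, parent_term_id) for a link id."""
    match = LINK_ID.match(text)
    if match is None:
        raise ValueError(f"not a link identifier: {text!r}")
    child, relation, parent = match.groups()
    return child, relation, parent
```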

This new feature may not be part of the main specification, but may need to be specified in an ancillary parser extension specification (see below).

STATUS: the main requirements for this are satisfied by annotation stanzas

Parser Extensions

The OBO 1.3 specification needs to discuss the concept of parser extensions. Parser extensions are optional addenda to the basic OBO 1.3 specification that provide additional features to the OBO language.

Parser extensions require that we add a new header tag to OBO files called requires_extension. The requires_extension tag should specify both an identifier for the required extension (so we need to figure out how we specify that) and a minimum version number for that extension.

Time

OBO 1.3 will have some way of dealing with time-dependent relationships. Perhaps Chris can shed some light on how these will work?

Status: n-ary relations are allowed

Extensions

There are at least two new extensions to OBO 1.3:

Postcomp Extension

This extension allows specially formatted post-composition expressions to be substituted for most identifier references in an OBO file. The post-composition expressions have the following format:

genus_term_id^differentia_type_id(differentia_term_id) [^differentia_type_id(differentia_term_id)]*

Where any of the term ids may be replaced with another post-comp expression, and parentheses must be used in postcomp expressions to show precedence.
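To make the expression format concrete, here is a minimal recursive-descent sketch of a postcomp parser. It is not the OBO-Edit implementation. The names PostComp and parse_postcomp are hypothetical, identifiers are assumed to contain no whitespace, '^' or parentheses, and the sketch assumes that the argument of a differentia may itself be a full post-comp expression.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Expr = Union[str, "PostComp"]  # a bare id or a nested post-comp expression


@dataclass
class PostComp:
    genus: Expr
    differentia: List[Tuple[str, Expr]] = field(default_factory=list)


def parse_postcomp(text: str) -> Expr:
    # Tokens are '^', '(', ')' or a run of other non-space characters (an id).
    tokens = re.findall(r"[\^()]|[^\^()\s]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'!r}, got {tok!r}")
        pos += 1
        return tok

    def take_id():
        tok = take()
        if tok in {"^", "(", ")"}:
            raise ValueError(f"expected an id, got {tok!r}")
        return tok

    def parse_term():
        # A term is either an id or a parenthesised expression.
        if peek() == "(":
            take("(")
            inner = parse_expr()
            take(")")
            return inner
        return take_id()

    def parse_expr():
        # genus ( '^' differentia_type '(' expression ')' )*
        genus = parse_term()
        diffs = []
        while peek() == "^":
            take("^")
            relation = take_id()
            take("(")
            filler = parse_expr()
            take(")")
            diffs.append((relation, filler))
        return PostComp(genus, diffs) if diffs else genus

    result = parse_expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return result
```

A bare identifier parses to a plain string, so code that accepts ordinary ids would see no difference in this sketch when no post-composition is used.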

STATUS: implemented in OE, and part of obof1.3 spec

Annotation Extension

This extension specifies a new, compact syntax for describing annotations as ontology instances. Out of laziness, I'll let an email I sent about it act as our temporary specification:

OBO annotations are an extension to the OBO file format that will
give us a succinct, but completely correct, way of representing
annotations as ontology instances.

The idea is that an annotation is an instance that posits some
relationship between other ontology objects. For example, someone
might annotate a gene to a Gene Ontology term in the following way:

	flybase_gene:300458382 -occurs_in-> endoplasmic_reticulum

If we extend OBO format to allow terms and instances to have
relationships TO OTHER RELATIONSHIPS (which will be supported in OBO
1.3), we could correctly model the statement above as an instance:

	[Instance]
	id: my_annotation:1
	instance_of: oban:annotation
	relationship: posits flybase_gene:300458382 -occurs_in-> endoplasmic_reticulum
	relationship: based_on_evidence pubmed:3039942

But this representation is cumbersome and difficult to understand.
The new annotation format introduces a new kind of stanza to
represent our annotation:

	[Annotation]
	id: my_annotation:1
	subject: flybase_gene:300458382
	relation: occurs_in
	object: endoplasmic_reticulum
	evidence: pubmed:3039942

Note that the OBO annotation format simply specifies a mapping
between these new annotation stanzas and instance stanzas. We're not
introducing any new OBO semantics - this is just syntactic sugar.

We're also extending the datamodel libraries in OBO-Edit to provide a
programming API that gives programmers access to the benefits of this
new syntax. For example, the datamodel contains a new Annotation
object that has getSubject(), getObject(), setSubject(), setObject(),
etc methods. The Annotation object is just an extension of the OBO-
Edit Instance object, so any calls to these new Annotation methods
are automatically mapped into calls to Instance methods.

I'm about to start working with Chris to produce a draft
specification for OBO 1.3, so this will be spelled out in much
greater detail then. I hope this brief introduction was useful -
please let me know if there are any details you'd like filled in.

The specifics of these Annotation stanzas are largely up in the air, but our current prototype supports the following tags:

  • subject
  • relationship
  • object
  • assigned_by
  • evidence
  • source
  • is_negated

Of these, only subject, relationship and object are particularly well defined. For any Annotation with a subject, object and relationship specified, the mapping works like this:

! This annotation...
[Annotation]
id: <id>
subject: <subject_id>
relationship: <relationship_id>
object: <object_id>

!is equivalent to this instance...
[Instance]
id: <id>
instance_of: oban:annotation
relationship: oban:posits <subject_id> -<relationship_id>-> <object_id>
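The mapping above is mechanical, so it can be sketched in a few lines. This is an illustration only, not the OBO-Edit datamodel code: the function name annotation_to_instance and the use of a plain dict for the parsed Annotation stanza are assumptions, and only the three well-defined tags are handled.

```python
from typing import Dict


def annotation_to_instance(annotation: Dict[str, str]) -> str:
    """Render the [Instance] stanza equivalent to an [Annotation] stanza.

    `annotation` holds the tag values of one stanza; it must contain
    the keys id, subject, relationship and object.
    """
    link = "{subject} -{relationship}-> {object}".format(**annotation)
    lines = [
        "[Instance]",
        f"id: {annotation['id']}",
        "instance_of: oban:annotation",
        f"relationship: oban:posits {link}",
    ]
    return "\n".join(lines)


example = annotation_to_instance({
    "id": "my_annotation:1",
    "subject": "flybase_gene:300458382",
    "relationship": "occurs_in",
    "object": "endoplasmic_reticulum",
})
```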

The other tags (assigned_by, evidence, and source) have no well-specified meaning (as far as I know) at this point.

You'll notice that the mapping relies on some pre-defined ontology objects. Those objects are defined in an ontology file that currently exists as a resource in the OBO-Edit source repository, but will probably be moved to the OBO foundry soon. The contents of oban.obo are reproduced below:

default-namespace: oban

[Term]
id: oban:annotation
name: Annotation

[Typedef]
id: oban:has_data_source
name: has data source
domain: oban:annotation

[Typedef]
id: oban:has_evidence
name: has evidence
domain: oban:annotation
range: oban:evidence

[Term]
id: oban:evidence
name: Evidence

[Typedef]
id: oban:posits
name: posits
domain: oban:annotation

Status: implemented in OE2, part of obof1.3 spec

http://geneontology.svn.sourceforge.net/viewvc/geneontology/java/obo/trunk/src/org/obo/annotation

(used in Phenote)

See also

http://www.berkeleybop.org/obd

Formal semantics of annotations

TBD. Non-trivial. See

Schulz S, Jansen L 2006 "Lmo-2 interacts with elf-2" On the Meaning of Common Statements in Biomedical Literature, O. Bodenreider, ed., Proceedings of KR-MED, 37-45.

OWL2 may help: e.g. class-class annotations would be treated as subclass-of-restriction axioms, and these could be annotated using special axioms. See: http://www.w3.org/2007/OWL/wiki/Annotation_System

Status: specified in IKL in obof1.3

Additional requirements

the following was moved from the bbop wiki:

THIS DOCUMENT SHOULD BE CONSIDERED PRE-ALPHA

The GO Consortium released obo-format 1.2 in 2006, and moved the primary curator's version of the ontology from 1.0 to 1.2. Documentation here:

The 1.2 format was aimed to correct some deficiencies in 1.0, and also to provide support for a subset of OWL-DL required for OBO ontologies, particularly the ability to state both necessary and sufficient conditions for a class (as genus-differentia definitions).

The mapping to OWL is described here:

Since freezing the format, a few new use cases have come to light that will require minor incremental additions to the format. In addition, new use cases outside the GO are better served by a more formal specification of the OBO format. This document describes some of these use cases and proposals for incremental additions to the format.

Compatibility

All future extensions will be backwards compatible with obof1.2

Partitioning

The obo-format shall be arranged into partitions of increasing complexity; thus we will be able to present a simple core to the majority of users and keep the more advanced features optional. The simplest core will be the set of tags that were present in 1.0; next will be the OWL compliance tags in 1.2; after that, some of the features described below.

Syntax

Create a BNF syntax for OBO-Format. Ian Horrocks has started on this, though we should separate the syntax from the OWL semantics (which we keep as optional rather than the default semantics).

Ideally there should be a limited set of syntax patterns for tags that will make it easy to add new complex tags in an extensible way (there will still be a need for new tags in future versions as ontology metadata standards coalesce - we can always use property_values but this can get ugly with non-atomic tags, eg those that take multiple values, IDs)

Status: part of spec

Datatypes in property_values

  • Extend range of xsd datatypes to include xsd:anyURI

Status: all xsd types allowed

Allowing property_value in classes (Term stanzas)

obo-format currently strikes a good balance between flexibility and standardisation when it comes to term metadata (synonyms, comments, text definitions, obsoletion replacements, subsets). However, there have been requests for new metadata types that will require addition of new tags to the format. This will lead to format churn - undesirable.

We want to be flexible in what metadata we attach to a term; eg

  1. modification time
  2. status
  3. credit to external contributor

(See OBI metadata initiative: RU)

Some of these may justify a new tag in the format - but a more flexible approach is to allow extension with arbitrary property_values

It is important to realise that these apply to the term, and not the instances of the term, and thus they are NOT inherited

property_values may be derived from an external ontology of metadata tags. It is a requirement that loading of this external ontology is NOT forced.

We may define a separate ontology of such metadata tags that are considered "builtin"

Some ontology maintainers may wish to use properties from dublin core, skos, OBI, etc. We should also build in some idspaces for these, just as we consider the xsd idspace builtin.

STATUS: implemented in OE2, part of 1.3 spec

Ontology Metadata

metadata specification in 1.2 is fairly weak. We want a larger set of predefined tags (for instance, predefining the tag for ontology versioning), plus a means of specifying the meaning of user-defined tags (eg via Typedefs/AnnotationProperties)

Slims/subsets/views

  • Can we have subsets that span ontologies?
  • Subset membership is currently extensional; can we allow intensional (views)?

EXAMPLE: currently you have to list explicitly all set members; but we could allow a subset membership to be based on an obo filter query. (see next item)

Currently all species subsets are done manually; if we explicitly link to taxonomy IDs we can then make queries to fetch terms applicable to a species and use these to make the subset

Queries, obo-filters

Can we save obo filters? Need a BNF for filter syntax.

Use case:

  1. filter to specify an intensional slim

Change logs

including: reason for change; see: Versioning_Ceusters

  • changes in reality (unlikely for GO?)
  • changes in scientific understanding
  • reassessment of relevance
  • mistakes

clarification of import statements

this is a little unclear in obof1.2

Also:

  • means of indicating alternate repositories
  • means of indicating alternatives for alternate file formats
  • ways of importing subsets based on some query

New tags:

  • import_subset
  • import_query

Disjoint Sets

An ontology is PD (Pairwise disjoint) if all the is_a children of a term are disjoint. Explicitly declaring this with pairwise disjoint links is not scalable, even with some kind of GUI 'wizard' to automatically populate these. For a class with N children, there are (N*(N-1))/2 combinations.

Instead we want a way of declaring an entire class to only have disjoint children

See how OWL1.1 does this....

Perhaps a "disjoint_union" tag?

  • direct_subclasses_are_disjoint
  • all_subclasses_are_disjoint
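To see why explicit pairwise links do not scale, the following sketch enumerates the pairs that a single "children are disjoint" declaration would stand for. The function name pairwise_disjoint_links and the child ids are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def pairwise_disjoint_links(children: Iterable[str]) -> List[Tuple[str, str]]:
    """Expand 'all children are disjoint' into explicit unordered pairs."""
    return list(combinations(sorted(children), 2))


# For N children there are N*(N-1)/2 pairs.
children = [f"child:{i}" for i in range(10)]
pairs = pairwise_disjoint_links(children)
pair_count = len(pairs)  # 10 * 9 / 2 = 45
```

A class with 100 direct children would already need 4950 explicit pairwise links, which is the motivation for a single class-level declaration.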

Identity and Unique ID Assumption

The UNA makes sense for most OBO users, but not for the wider semantic web and OWL. Have some way of declaring that a collection of classes/ontologies follow the UNA (ie no two distinct primary IDs refer to the same type/universal in reality)

Introduce an identity tag (either first class or in RO) for OWL compatibility

 [Term]
 id: GO:123456
 name: cell
 identical_to: CL:123456 ! cell

Note: this will probably be done using an xref idspace declaration in the header; eg


 treat-xrefs-as-equivalent: CL

...

 [Term]
 id: GO:123456
 name: cell
 xref: CL:123456 ! cell
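One way a consumer might interpret the header declaration is sketched below. This is an assumption-laden illustration rather than specified behaviour: the function name equivalent_pairs is hypothetical, terms are represented as a dict from term id to its list of xref ids, and the idspace is taken to be the part of an id before the first colon.

```python
from typing import Dict, List, Set, Tuple


def equivalent_pairs(
    term_xrefs: Dict[str, List[str]],
    equivalent_idspaces: Set[str],
) -> Set[Tuple[str, str]]:
    """Return (term_id, xref_id) pairs to be treated as identical.

    Only xrefs whose idspace was named in a treat-xrefs-as-equivalent
    header line are promoted; all other xrefs are left as plain xrefs.
    """
    pairs = set()
    for term_id, xrefs in term_xrefs.items():
        for xref in xrefs:
            idspace = xref.split(":", 1)[0]
            if idspace in equivalent_idspaces:
                pairs.add((term_id, xref))
    return pairs


identities = equivalent_pairs({"GO:123456": ["CL:123456"]}, {"CL"})
```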

See http://www.obofoundry.org/wiki/index.php/Mappings#Equivalence_Mappings

Status: UNA holds within an obo namespace unless explicitly relaxed. See spec.

Common Logic Semantics

There will be a specification of obo semantics in Common Logic. In addition, it will be possible to embed CL axioms in any obo file.

Rationale: Clarification of semantics & advanced applications

Affects GO: Not directly

This is a powerful extension, but most applications are free to ignore it.

Status: 1.3 still specified as KIF, but trivial syntactic transform to CL

Class associations

See Waclaw's paper

Clarification of existential vs universal

Both in the definition of the relation:

 [Typedef]
 id: part_of
 quantifier: all-some

And in the link itself:

 [Term]
 id: x
 restriction: foo y {quantifier=all-some}

The latter is required only for compatibility with OWL. Link quantification should not override quantification at the relation level

  • TODO: how does time fit into this. Do we want all-some-all-times? Or is it fine to specify this externally

Status: clarified in obof1.3. Distinct IDs for type and instance level relations

Time-indexed instance level relations

OBO_REL defines class-level relations in terms of instance level relations that are time-indexed. These definitions are embedded in formal but non-machine-readable text. The definitions are actually in conflict with the obo2owl semantics of obo-format, since in OWL relations are binary.

Here is what a sample database represented using obo could look like:

 [Instance]
 id: patient12345
 instance_of: NCBITax:9606   ! Homo sapiens
 property_value: has_part tumor9876 {at=t23}
 
 [Instance]
 id: tumor9876
 instance_of: MPATH:223   ! tumor
 
 [Instance]
 id: t23
 instance_of: bfo:IntervalSpaceTimeRegion
 

This would have a translation to FOL/KIF/CL/OboLog:

 (has_part patient12345 tumor9876 t23)

Instantiation can also be time-indexed

 [Instance]
 id: cell_0001
 instance_of: CL:stem_cell {at=t1}
 instance_of: PATH:cancerous_cell {at=t2}
 
 [Instance]
 id: t1
 instance_of: bfo:IntervalSpaceTimeRegion
 relationship: precedes t2

Which would be translated as:

 (instance_of cell_0001 CL_stem_cell t1)
 (instance_of cell_0001 PATH_cancerous_cell t2)
 (precedes t1 t2)
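The translation of a single tag line can be sketched as follows. This is a minimal illustration, not the specified mapping: the function name to_kif is hypothetical, only the instance_of, relationship and property_value tags are handled, and only the {at=...} trailing modifier is recognised. Colons in ids are replaced with underscores, following the CL_stem_cell example above.

```python
import re
from typing import Optional

AT_MODIFIER = re.compile(r"\{\s*at\s*=\s*([^}\s]+)\s*\}")


def to_kif(instance_id: str, tag_line: str) -> str:
    """Translate one tag line of an [Instance] stanza into a KIF-style atom."""
    body = tag_line.split("!", 1)[0]      # drop a trailing "! comment"
    tag, rest = body.split(":", 1)
    tag = tag.strip()

    time: Optional[str] = None
    modifier = AT_MODIFIER.search(rest)
    if modifier:
        time = modifier.group(1)          # the time index becomes the last argument
        rest = rest[:modifier.start()]

    tokens = rest.split()
    if tag == "instance_of":
        (cls,) = tokens
        args = ["instance_of", instance_id, cls]
    elif tag in ("relationship", "property_value"):
        relation, target = tokens
        args = [relation, instance_id, target]
    else:
        raise ValueError(f"unsupported tag: {tag!r}")

    if time is not None:
        args.append(time)
    return "(" + " ".join(arg.replace(":", "_") for arg in args) + ")"


atom = to_kif("cell_0001", "instance_of: CL:stem_cell {at=t1}")
# atom == "(instance_of cell_0001 CL_stem_cell t1)"
```

Links without an {at=...} modifier stay binary in this sketch, which matches the expectation that class-level consumers are unaffected.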

Translation to OWL would be via CL. Different translations may be desirable depending on the application. Translations possible include:

  1. time-slicing (introducing new instances to represent temporal parts)
  2. treating indexed instance level links as instances of a relation class
  3. something external to OWL such as named graphs
  4. filtering out indexed relations (resulting in a database that is not complete, but still not wrong under the open world assumption)

Note that this is a powerful extension of obof1.2. However, most uses will only be to index at the instance level. Class-level relations may remain binary (this may be a recommendation or even constraint) which means that applications which consume ontologies only should be unaffected

Exact details will be provided when the CL spec is provided

Applications:

  1. referent tracking
  2. phenotype annotation

Some further thoughts: time should be a builtin argument for instance-instance relations and instance-universal relations.

Status: see below

N-ary relations

With time taken care of, there is less need for N-ary relations. If required, some kind of variant of the above could be used

Use cases:

  • Time (taken care of elsewhere - time should be explicit in obof rather than an arbitrary extra argument)
  • "attributed" relations in FMA
  • participation roles (eg X participates in Y in role R - eg chemical in a process as a catalyst - handled now by sub-relations of has_participant)
  • relational qualities (handled now with a 'towards' relation).

e.g.

  (has_quality eye_inst001 sensitivity_inst001 EnvOnt:RedLight)

Status: allowed in 1.3, implemented in OE


Translation to OWL2

The current oboInOwl mapping uses the n-ary relation pattern for terminological metadata, which is ugly and problematic for most tools

OWL2 gives us axiom annotation, which is a superior way of doing this

Irreflexive Relations

Relation attributes: class vs instance level

Current status in obof1.2:

  • inverse_of: instance-level (eg part_of/has_part; but NOT integral_part_of)
  • cyclic: ???
  • is_{transitive,symmetric,anti_symmetric}: class-level

We want to be able to discriminate between these. Eg integral_part_of is a class-level relation ONLY. We want to state its class level inverse has_integral_part.

cyclicity may differ on instance and class levels; eg the precedes relation in cyclic processes can cycle at the type level but not at the instance level

What syntax?

We could introduce trailing modifiers here; eg {applies_to="instance"}

it is probably cleanest if we use new tags.

Direct support for RDF-style Reification

See above

Tracking link status

Use case: reasoner infers links from source ontology; curator approves links; source ontology changes. previously approved links should be flagged (or automatically re-mapped)

 [Term]
 id: x
 relationship:

Status: not needed

Complements / negation

for OWL roundtripping - does not need to be in subset of obo that is used by reasoner

Resolved Items for OBO 1.3 Format

  • Add an optional [Namespace] stanza. This stanza would contain ontology-specific metadata. Supported tags:
    • data-version: <version number>
    • namespace-uri-mapping: <uri> ! This mapping will apply to any id in this namespace. These mappings will be applied before id-mapping tags are considered
  • Allow namespaces to be specified for Synonym Categories, Categories, Synonyms and Dbxrefs in a trailing modifier named "namespace"
  • Create an Annotation extension to the OBO format that includes the Annotation stanza. The Annotation stanza always maps to an Instance of the new built-in Annotation class. Supported tags are:
    • subject: <term id>
    • relationship: <property id>
    • object: <object id>
    • source: <identifier for the source of this annotation, for example a literature reference or experiment id>
    • evidence: <evidence code, perhaps an identifier for some built-in evidence objects>
    • assigned_by: <user id, probably just a string>
  • Add an object-version tag that can be associated with Instances, Properties, Terms and Annotations
  • Deprecate the necessary and inverse_necessary tags.
  • Add inverse_always_true tag to Property stanza
  • Add complement_of tag with similar semantics to union_of tag. Ontologies with this tag can be ignored by the reasoner; this tag exists only for the purpose of round-tripping OWL ontologies.
  • Create a Post-Composition extension to the OBO format that defines the OBO post-composition syntax. The syntax is:
    • <object_identifier_expression>^<object_identifier_expression>^...
    • where object_identifier_expression is:
      • an object ID OR
      • an expression of the form <property_id>(<object_identifier_expression>) OR
      • <open_parens> <object_identifier_expression> <close_parens>