Cross Product Guide

From GO Wiki

This guide uses the cross-products between GO biological process and the OBO Cell Ontology (CL) as its running example.

This page assumes familiarity with the cross-product methodology. It is intended to be a detailed guide for users and curators covering all eventualities.

For more background see:

For mature (partially) vetted cross-products, see

For current progress see (on this wiki):

This is a draft. The final version may go to the public wiki, or it may be split, with the editors guide remaining here.

Overview

Management of files

See also Cross Product Timeline

  • Editors will continue to edit cvs/go/gene_ontology_write.obo
  • Users will continue to download cvs/go/gene_ontology.obo (or access GO via various tools or web interfaces)

The contents of these files will not be changed by the cross-products plan. In a future phase of the project we may consider adding the xp tags directly into the editors' .obo file, as is done for SO. In the current phase, however, we are 'externalising' the xps, making them optional scaffolding that can be used to periodically prop up GO, and an optional add-on for cross-ontology querying.

The cross-product files will live in a new directory:

  • cvs/go/ontology/cross_products

Currently they live in:


This directory will continue to be used for 'experimental' cross-products. These could be considered roughly analogous to IEA annotations, in that they have not been vetted. Once vetted, they move to the cross_products directory. The scratch directory may still be used for suggestions of automated updates, though ideally this will not be necessary, as new xps should be added prospectively, avoiding the need for a retrospective mapping.

The proposed new directory structure would be:

 go/
   ontology/
     gene_ontology.obo      -- public version
     gene_ontology_edit.obo -- editors version
     biological_process_to_molecular_function.obo
     cellular_component_to_molecular_function.obo
     cross_products/
       biological_process_xp_cell.obo
       biological_process_xp_cellular_component.obo
       biological_process_xp_self.obo
       
       ...
   external2go/             -- mapping files
     ec2go
     ...

Users Guide

'Typical' Users

For most users, cross-products are an entirely optional add-on.

Users can continue using gene_ontology.obo in whatever capacity they use the file now (to browse, to load into a database, as input to an analysis program, etc.) and will experience no significant differences.

Advanced Users

Advanced users may want to use the xps for a number of reasons: for browsing, for loading into a database or tool so that others can browse, for reasoning over ontologies or data, or for use in some analysis. Here we provide guidelines based on the mechanics of using the xps rather than on end-user goals (which are varied).

Browsing in OBO-Edit

The following instructions are for browsing in OBO-Edit. For browsing in a web interface like AmiGO, the xps must first be loaded into a database (see the next section). For browsing in Protege-OWL or SWOOP, see the OWL Users section.

The cross-products, and any external ontologies are optional add-ons. This means the main GO file can be loaded in the normal fashion, and the user sees a normal view of GO, as it looks now. The cross-products and external ontologies (if any) must be explicitly loaded by the user.

There are two ways of doing this; both protocols are shown below.

Note that here 'user' means any user of OBO-Edit (or a similar tool), whether editor or end-user.

Protocol I: load individual files separately

In the case of external cross-products (eg to CL), the user will have to load four files:

  • gene_ontology_edit.obo
  • biological_process_xp_cell.obo
  • cell.obo
  • relationship.obo

At present, a fifth file must also be loaded:

  • ro_proposed.obo

Note that this process is simplified if the cross-products are internal; for example, if the user wants to see xps between BP and CC (eg formation of autophagic vacuole) then the following is loaded:

  • gene_ontology_edit.obo
  • biological_process_xp_cellular_component.obo
  • relationship.obo

It may even be desirable to fold in these internal cross-products at a later date; for now we keep them separate.

Protocol II: load an import file

This process can be simplified by loading an existing "import file". This file contains directives saying where to fetch the other files.
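
A minimal Python sketch of generating such an import file from the Protocol I list. It assumes the directives are the OBO 1.2 `import:` header tag; apart from the cell.obo PURL given later in this guide, the locations are placeholders for wherever the files live on your system, and `make_import_file` is a hypothetical helper, not part of any GO tooling.

```python
def make_import_file(locations, format_version="1.2"):
    """Return the text of an OBO header whose import directives list each file."""
    lines = [f"format-version: {format_version}"]
    # One import directive per file, in the order they should be fetched.
    lines.extend(f"import: {location}" for location in locations)
    return "\n".join(lines) + "\n"


# Protocol I file list for biological_process_xp_cell. ro_proposed.obo is only
# needed while the xps are in the experimental phase.
BP_XP_CELL = [
    "gene_ontology_edit.obo",
    "biological_process_xp_cell.obo",
    "http://purl.org/obo/obo-all/cell/cell.obo",
    "relationship.obo",
    "ro_proposed.obo",
]

if __name__ == "__main__":
    print(make_import_file(BP_XP_CELL), end="")
```

Generating the file from a list keeps the Protocol I and Protocol II file sets from drifting apart.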

At the moment, import files are available on:

Simply paste the URL of the import file into the oboedit load dialog.

In the case of bp xp cell, paste in this URL:

Loading into a database

GO is often loaded into a database, which is then queried via some interface. Guidelines on the technical details of loading xps will be published later, on the main GO database page:

OWL Users

Users wishing to consume OWL files should get the cross-products and OWL serializations of external ontologies from either the OBO Foundry or the following pages:

The translation of a cross-product into OWL is simple: the intersectionOf construct is used. In OWL Manchester syntax this is even simpler to illustrate. Each xp is translated into a class frame of the form:

 Class: GOClass
   EquivalentTo:
     Genus that Relation some DifferentiaClass

For example:

 Class: 'B cell differentiation'
   EquivalentTo:
     'cell differentiation' that results_in_acquisition_of_features_of some 'B cell'
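
The translation can be sketched in Python, going from an OBO stanza's intersection_of tags to a Manchester syntax frame. This is an illustration, not the OBO-to-OWL converter GO uses. It assumes the xp stanza holds only an id and intersection_of tags (so names come from a separate label map), and it quotes labels in single quotes the way Protege renders by label; strictly, Manchester syntax refers to classes by IRI. The identifiers in the usage below are hypothetical.

```python
def xp_to_manchester(stanza, labels):
    """Render an OBO cross-product stanza as a Manchester syntax class frame.

    stanza: text of one [Term] stanza holding an id and intersection_of tags.
    labels: maps identifiers to names; unknown ids are printed as-is.
    """
    term_id, genus, differentia = None, None, []
    for raw in stanza.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! name" comments
        tag, _, value = line.partition(": ")
        if tag == "id":
            term_id = value
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                genus = parts[0]  # genus: a bare class
            else:
                differentia.append((parts[0], parts[1]))  # relation + class
    if term_id is None or genus is None or not differentia:
        raise ValueError("stanza needs an id, a genus and at least one differentium")

    def name(identifier):
        return f"'{labels.get(identifier, identifier)}'"

    restrictions = " and ".join(f"{rel} some {name(cls)}" for rel, cls in differentia)
    return (
        f"Class: {name(term_id)}\n"
        "  EquivalentTo:\n"
        f"    {name(genus)} that {restrictions}"
    )
```

For a stanza with `id: GO:9990001`, `intersection_of: GO:9990002` and `intersection_of: results_in_acquisition_of_features_of CL:9990001` (hypothetical ids labelled "B cell differentiation", "cell differentiation" and "B cell"), this yields the B cell differentiation frame shown above.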

Editors Guide

GO Editors Guide

It is important to stress that in the first phase of implementation, cross-products will be used by editors as an optional extension. They can be thought of as scaffolding, to be put up temporarily to fix the structure of GO, rather than being part of the major skeletal structure of GO.

Editors are encouraged to load the relevant cross products for the particular part of the ontology they are editing. However, this is optional. Editors are also encouraged to fix erroneous cross-products as they find them, and also strongly encouraged to add new cross-products for new combinatorial terms.

When to ignore cross-products

Scenario: an editor wishes to make a quick grammatical change to a definition, or they wish to edit a part of the ontology for which no mature cross-products or external ontology exists (eg interleukin binding). In cases like these, there is no compelling reason for an editor to load cross-products. Of course, there is no harm in doing so.

If the editor wishes to make changes such as those above, they can simply edit the ontology as normal. E.g. load gene_ontology_edit.obo, make changes, save.

Using cross-products

The xps are subdivided according to (a) the GO hierarchy to which the xps pertain and (b) the ontology that supplies the differentia of the xps. This differentia ontology may be either internal (part of GO) or external.
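
Assuming every xp file follows the `<hierarchy>_xp_<differentia>.obo` naming in the proposed directory layout, the two-way subdivision can be read straight off the file name. The sketch below does this in Python. Treating "self" and the three GO hierarchy names as internal is an inference from the examples in this guide, and `classify_xp_file` is a hypothetical helper.

```python
GO_HIERARCHIES = ("biological_process", "molecular_function", "cellular_component")


def classify_xp_file(filename):
    """Split an xp file name into (GO hierarchy, differentia source, is_internal).

    Assumes the <hierarchy>_xp_<differentia>.obo naming used in the proposed
    directory layout; "self" and GO hierarchy names are treated as internal.
    """
    stem = filename[:-4] if filename.endswith(".obo") else filename
    hierarchy, sep, differentia = stem.partition("_xp_")
    if not sep or hierarchy not in GO_HIERARCHIES or not differentia:
        raise ValueError(f"not an xp file name: {filename}")
    internal = differentia == "self" or differentia in GO_HIERARCHIES
    return hierarchy, differentia, internal
```

For example, biological_process_xp_cell.obo classifies as external, so cell.obo must be loaded alongside it, whereas biological_process_xp_cellular_component.obo is internal and needs only GO and relationship.obo.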

Loading external cross-products

Here we take biological_process_xp_cell as an example. We assume the GO curator does not have write privileges to the external ontology (although we may want to consider the possibility of joint custodianship of certain parts of some ontologies).

The editor will load 4 files:

  • gene_ontology_edit.obo
  • biological_process_xp_cell.obo
  • cell.obo
  • relationship.obo

At present, the editor must also load a fifth file:

  • ro_proposed.obo

However, this should only be required for xps in the 'experimental' phase; for mature xps, all required relations should be in the main RO.

The RO is also treated as an ontology external to GO.

The editor has two options:

  • create an oboedit profile containing the 4 files above
  • use an 'import' .obo file

The import obo file is simply a collection of directives telling oboedit to load obo files.

The editor has the choice of locating external ontologies on their filesystem, or via some URL.

For example, if they have cell.obo checked out, they would specify the path

   cvs/obo/ontology/anatomy/cell_type/cell.obo

Or they could provide a URL

  http://purl.org/obo/obo-all/cell/cell.obo

This is a matter of curator preference. If they wish to make edits to the external ontology, it is best to use the filesystem path.

At the moment oboedit performs no caching of external URLs. This means that if a URL is chosen over a cvs-downloaded file, the editor must be online when they load the files.

The editor can choose to load multiple cross-products and external ontologies, for example cell and chebi at the same time. However, loading more external ontologies may affect performance. This is not an immediate concern, since there are a limited number of mature xp sets. In future, however, it may become one if editors typically change terms like "cysteine biosynthesis" and "astrocyte differentiation" in a single session. If these kinds of edits can be naturally partitioned into separate editing sessions then this is less of a problem; editors can simply reload.

Creating a new cross-product

Scenario: an editor wishes to make a new GO term "X differentiation". CL has the terms X and Y, with the link X is_a Y. In this scenario, CL xps are considered stable.

The editor has the choice of prospectively making the xp at the time of editing (the new way), or leaving this to be done retrospectively (the old way). Whilst the editor is strongly encouraged to do this the new way (at least for stable xp sets), they have the option of ignoring this recommendation (the xps can be found after the fact, by Obol).

Doing this "the old way", the editor would make the term "X differentiation" a child of "Y differentiation" (perhaps using their own biological knowledge of Xs and Ys, or perhaps by eyeballing the CL ontology). Through discipline they would employ a regular grammar in term construction, and perhaps (but not always) make sure X is present in CL (submitting a request if it is not, but not waiting for an answer before proceeding). Then, at some later time, Obol or another tool may mine the xp for "X differentiation".

Doing this the "new way" the editor would have previously loaded GO + xps + external ontology.

Then they would use the oboedit cross product editor [link to OE docs here], selecting the genus from GO ("cell differentiation"), the relation from RO/ro_proposed ("results_in_acquisition_of_features_of") and the term from CL.

Note that whilst this step may seem to involve lots of choices, in fact the existence of previous cross-products makes it easier. The same pattern is followed as for "A differentiation", "B differentiation".

In fact the requested xp matrix editor should make this trivial:

Once the term and its xp have been created, the term can be placed in the DAG automatically.
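
The automatic placement rests on a simple inference: if "X differentiation" and "Y differentiation" share the genus "cell differentiation" and the relation results_in_acquisition_of_features_of, and CL asserts X is_a Y, then "X differentiation" is_a "Y differentiation" follows. The Python sketch below implements only that one pattern, using plain dictionaries as hypothetical stand-ins for the loaded ontologies; a real OWL reasoner covers far more cases.

```python
def ancestors(is_a, node):
    """node plus every class reachable from it through is_a links."""
    seen, stack = {node}, [node]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_is_a(xps, external_is_a):
    """Infer is_a links between xp-defined GO terms.

    xps maps a GO term to (genus, relation, differentia). Term A is_a term B
    is implied when both share genus and relation and A's differentia is
    subsumed by B's in the external ontology (e.g. CL). All implied links are
    returned, not only the most specific ones.
    """
    inferred = set()
    for a, (genus_a, rel_a, diff_a) in xps.items():
        above = ancestors(external_is_a, diff_a)
        for b, (genus_b, rel_b, diff_b) in xps.items():
            if a != b and (genus_a, rel_a) == (genus_b, rel_b) and diff_b in above:
                inferred.add((a, b))
    return inferred
```

With the scenario above (CL links X is_a Y), the only inferred link is ("X differentiation", "Y differentiation"); remove the CL link and nothing is inferred.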

Note that this also requires the ability to realize reasoner results directly from the oboedit interface. We want to do this as we are using the reasoner in 'repair' mode, rather than becoming completely dependent on it:

Until oboedit has this functionality, the reasoner can still be used to check the placement of the new term.

When the curator saves their session, the term and its links will be saved to the core GO file, as normal. The xp definition (i.e. just the intersection_of tags) will be saved to the appropriate xp file.
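
How a save might divide one term between the two files can be sketched in Python. The sketch assumes the xp file repeats the stanza header and id so that each intersection_of line can be attached to its term, and that every other tag goes to the core GO file. It illustrates the split only; it is not OBO-Edit's save code.

```python
def split_stanza(stanza):
    """Split one OBO [Term] stanza into (core_text, xp_text).

    Assumption for illustration: the xp file keeps the stanza header, the id
    and the intersection_of tags; every other tag stays in the core GO file.
    """
    core, xp = [], []
    for line in stanza.strip().splitlines():
        tag = line.split(":", 1)[0].strip()
        if line.startswith("[") or tag == "id":
            # Both files need the header and id to identify the term.
            core.append(line)
            xp.append(line)
        elif tag == "intersection_of":
            xp.append(line)
        else:
            core.append(line)
    return "\n".join(core) + "\n", "\n".join(xp) + "\n"
```

Run on a stanza for "X differentiation" carrying name, is_a and intersection_of tags, the core text keeps the name and is_a lines, and the xp text keeps only the header, the id and the two intersection_of lines.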

Changing a cross-product

Scenario: the term "X differentiation" is defined using CL term "A". The GO editor realises this is a mistake (perhaps the term name was misleading, which led to an erroneous xp). They wish to rectify this.

They first load GO + xps + CL. They then rectify the mistake in the xp editor. They can turn on the reasoner to re-check the placement in the DAG, then save their results. The save will change the xp file - it will not necessarily change the GO file.

Reconciling changes

As soon as GO + xps + External ontologies are loaded, a number of different things can happen.

  1. The files load without warnings
  2. Some kind of warning or notification is issued

A warning may be issued if, for example, a term in the external ontology has recently been obsoleted and an xp that refers to this term has not been updated. Note that the editor is not compelled to fix this immediately - they are free to fix this at a later date or leave it to someone else.
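
The kind of check that produces such a warning can be sketched in Python. The data structures are simplified, hypothetical stand-ins for parsed OBO files: each xp is reduced to a single differentia id, and each external term to a dict of its tags. The same pass also flags dangling references, which are discussed below.

```python
def check_xp_references(xps, external_terms):
    """Return warnings for xps whose differentia are obsolete or missing.

    xps: GO term name -> differentia id (one differentium per xp, for brevity).
    external_terms: external id -> dict of tags, e.g. {"is_obsolete": True}.
    """
    warnings = []
    for go_term, target in sorted(xps.items()):
        tags = external_terms.get(target)
        if tags is None:
            # The id is absent from the external ontology altogether.
            warnings.append(f"{go_term}: refers to dangling term {target}")
        elif tags.get("is_obsolete"):
            warnings.append(f"{go_term}: refers to obsolete term {target}")
    return warnings
```

As in the text, the check only reports; whether and when to fix each warning is left to the editor.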

Obsoletions in external ontology

Consider the following scenario:

t1: GO has a term "X differentiation", cell has term "X". The two are unlinked.

t2: An editor adds an xp such that "X differentiation" = "differentiation" that results_in_acquisition_of_features_of "X". They turn on the reasoner and see that this is consistent with previously asserted is_a links (eg X is_a Y in CL and "X diff" is_a "Y diff" in GO). The editor saves their session, with the new xp going into the biological_process_xp_cell file.

t3: The CL editor obsoletes "X". They provide consider tags to two new terms, X1 and X2.

t4: A week later a GO editor loads GO + xps + CL. However, they are using a local copy of CL which they have not cvs updated since t3. The obsoletion is not noticed, and the GO editor saves their work to gene_ontology_edit.obo (and possibly additional changes to the xp file).

t5: A week later another GO editor loads GO + xps + a fresh CL. On loading, oboedit gives a warning that the term "X differentiation" refers to an obsolete term.

What next?

At the moment, oboedit provides no help beyond a warning. In the future oboedit will pop up a component that will help automate the mapping forward of the obsolete cross product. For example, in this case, based on the 'consider' tags in CL, it could offer the curator a number of options:

  1. change the xp for "X differentiation" to use the term X1
  2. change the xp for "X differentiation" to use the term X2
  3. obsolete the term "X differentiation" and replace it with either or both of the two terms "X1 differentiation" and "X2 differentiation"
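
The three options can be derived mechanically from the consider tags. Below is a hypothetical Python sketch of that step, not a description of the component requested of oboedit. It returns one retargeting option per consider value, plus an obsolete-and-replace option, tagged so that the caller can tell which options also touch core GO.

```python
def repair_options(go_term, consider):
    """List candidate repairs for an xp whose differentia has been obsoleted.

    consider: the obsolete external term's consider tag values, in order.
    Each option is a (kind, go_term, target) tuple.
    """
    # One retargeting option per suggested replacement; core GO is untouched.
    options = [("retarget_xp", go_term, target) for target in consider]
    if consider:
        # Obsoleting the GO term and adding replacements changes core GO too.
        options.append(("obsolete_and_replace", go_term, tuple(consider)))
    return options
```

With consider tags CL:X1 and CL:X2, this yields the three options listed above, in the same order; with no consider tags it returns nothing, leaving the editor to resolve the issue manually.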

The first two options are equivalent in terms of capabilities required in oboedit, and should be easy to implement. These changes do not affect the core GO, but they affect the xps.

The third option will be harder to implement. The changes affect both core GO and the xps.

This capability has been requested of oboedit:

Until oboedit implements this capability, the curator must manually resolve the issue (or ignore it for the time being: no harm is done by saving the xps at the end of the session, unchanged). Note that the situation is still better than the pre-cross products days, in which such an issue would remain unnoticed altogether.

The editor can make the changes as described above using normal oboedit capabilities.

After the editor has made any changes, they have the option of switching on the reasoner. This will highlight any link repairs required. The editor may have already resolved these manually - they are encouraged to use the reasoner to do this, but in this phase the reasoner is a tool to help editors repair links and is not required.

Splits and merges in external ontology

The situation is analogous to obsoletions. A merge in CL will result in the subsumed term's identifier being used as an alt_id. The resolution for this should be analogous to that for obsoletions.
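
A minimal Python sketch of the merge case, assuming each external term is represented as a dict of tags with an optional "alt_id" list (a hypothetical simplification of a parsed OBO file). It repoints xps at the primary identifier and reports what it changed. Ids that are neither primary ids nor alt_ids are left alone for the dangling-reference check.

```python
def remap_merged(xps, terms):
    """Point xps at the primary id when their differentia id became an alt_id.

    xps: GO term -> differentia id.
    terms: external primary id -> dict of tags, possibly with an "alt_id" list.
    Returns (updated_xps, changes) where changes lists (go_term, old, new).
    """
    alt_index = {
        alt: primary
        for primary, tags in terms.items()
        for alt in tags.get("alt_id", ())
    }
    updated, changes = {}, []
    for go_term, target in xps.items():
        if target not in terms and target in alt_index:
            updated[go_term] = alt_index[target]
            changes.append((go_term, target, alt_index[target]))
        else:
            updated[go_term] = target
    return updated, changes
```

After remapping, two GO terms can end up sharing a differentium. If they also share genus and relation, they become equivalent, and the GO editor then has to decide whether the GO terms should be merged as well.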

Destroyed terms in external ontology

Note that all external ontologies will follow GO/OBO Foundry identifier principles:

This means we should not have to worry about identifiers being destroyed. However, editing should still be robust in the face of unexpected problems.

The situation here is analogous to obsoletions. However, here oboedit will complain that the cross-products refer to dangling terms, and will offer no suggestions as to how to fix this.

This will serve as a signal to the editor to use appropriate channels (email, tracker, phone, walk down the corridor) to resolve this issue.

Whilst the issue is unresolved there are no serious consequences. The dangling xp can still be saved as normal; it just cannot be used by the reasoner. As the editors are using the reasoner in 'repair' mode, this is not a serious consequence. It would be a problem if we adopted the SO xp methodology, whereby we rely on the reasoner to generate the public version of GO deployed to users. This is one reason we are taking things slowly and only using the reasoner in repair mode.

DAG changes in external ontology

Unlike obsoletions, topology changes in the DAG of the external ontology will not generate automatic notifications on loading GO + xps + external ontologies.

However, the editor can load the xps and switch on the reasoner (not necessarily for the whole session, or even during editing), and then visually inspect the reasoner results to see if there are any obvious consequences.

For example, we may start from the following state:


GO has

 Y diff
    ^
    |
    |
 X diff

CL has

    Y
    ^
    |
    |
    X

biological_process_xp_cell has:

 X diff = cell diff THAT results_in_acquisition_of_features_of X
 Y diff = cell diff THAT results_in_acquisition_of_features_of Y

We will now examine different scenarios.

external ontology link removal

In the first scenario, CL removes the link between X and Y.

    Y


    X


Does this affect the following link in GO?

 Y diff
    ^
    |
    |
 X diff

No! We are using the reasoner in repair mode and asserting links, with the aid of the reasoner where possible. The link will remain in GO.

However, the editor will see a visual cue. Whereas before, with the reasoner on, this link would show up red (indicating the reasoner was capable of inferring this), now it will be unhighlighted. This will be in contrast to some of the other surrounding differentiation terms. The editor can investigate, and then decide what to do next:

  1. they may disagree with the CL editors' decision, open a dialog, and retain the link in GO
  2. they may agree with CL, and manually remove the link from GO
  3. the problem may be in the XP, in which case they can change this
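
The visual cue can be sketched in Python: an asserted GO link is marked "red" when the xps plus the CL DAG imply it, and left "plain" otherwise. The colour names follow this guide's description rather than any OBO-Edit API. Only the shared-genus, shared-relation pattern is checked, and the dictionaries are hypothetical stand-ins for the loaded ontologies.

```python
def ancestors(is_a, node):
    """node plus every class reachable from it through is_a links."""
    seen, stack = {node}, [node]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def highlight_asserted(asserted, xps, external_is_a):
    """Colour each asserted GO is_a link "red" if the xps imply it, else "plain".

    xps: GO term -> (genus, relation, differentia).
    """
    colours = {}
    for child, parent in asserted:
        implied = (
            child in xps
            and parent in xps
            and xps[child][:2] == xps[parent][:2]
            and xps[parent][2] in ancestors(external_is_a, xps[child][2])
        )
        colours[(child, parent)] = "red" if implied else "plain"
    return colours
```

Before CL removes the link, "X diff" is_a "Y diff" comes out red; after X is_a Y is removed from CL, the same asserted link comes out plain, which is the cue for the editor to investigate.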
external ontology link change

In fact the above scenario is more likely to involve a link change, perhaps resulting in X and Y becoming siblings:

      A
      ^
      |
      |
    +-+-+
    |   |
    |   |
    V   W
    ^   ^
    |   |
    |   |
    X   Y

Again, this does not change the asserted link in GO:

 Y diff
    ^
    |
    |
 X diff

However, the editor gets the visual cue that something is different.

Actions depend on whether "V diff" and "W diff" and "A diff" are realized in GO.

external ontology intermediate link added

Here CL goes from

    Y
    ^
    |
    |
    X

to:

    Y
    ^
    |
    |
    Z
    ^
    |
    |
    X

Again, this does not change the asserted link in GO:

 Y diff
    ^
    |
    |
 X diff

What should happen next depends on whether there is a term "Z diff" in GO, and if not, whether there should be.

If there is not, there is no compelling reason to add one unless requested. It may not be appropriate for GO to have the same level of granularity as CL.

However, the editor can add the intermediate GO term if they choose to do so. If they follow the protocol for adding new xps (above) then the reasoner should indicate the correct intermediate placement automatically.

If a term "Z diff" exists in GO, AND it has an xp def, and if the editor switches on the reasoner (not necessarily for the whole session: it can be temporarily switched on to examine any repairs required), THEN the reasoner will indicate the repairs required.

In this case it will show:

 Y diff
    [BLUE] Z diff
         [BLUE] X diff
    [RED] X diff

Blue indicates implied links that are NOT asserted in GO. Red indicates asserted links that are redundant.
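
The blue and red sets can be computed as follows. This is a minimal Python sketch under stated assumptions: implied links come only from the shared-genus, shared-relation pattern, no two xp-defined terms are equivalent, and the dictionaries are hypothetical stand-ins for the loaded ontologies. A link is direct when no third term lies between its ends; blue is the direct implied links missing from GO, and red is the asserted links that some intermediate term makes redundant.

```python
def ancestors(is_a, node):
    """node plus every class reachable from it through is_a links."""
    seen, stack = {node}, [node]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_links(xps, external_is_a):
    """Every is_a link between xp-defined GO terms implied by the differentia."""
    links = set()
    for a, (genus_a, rel_a, diff_a) in xps.items():
        above = ancestors(external_is_a, diff_a)
        for b, (genus_b, rel_b, diff_b) in xps.items():
            if a != b and (genus_a, rel_a) == (genus_b, rel_b) and diff_b in above:
                links.add((a, b))
    return links


def repair_report(asserted, xps, external_is_a):
    """Return (blue, red): direct implied links not asserted, redundant asserted links."""
    implied = implied_links(xps, external_is_a)

    def has_intermediate(a, b):
        # a -> b is indirect if some third term c satisfies a -> c -> b.
        return any((a, c) in implied and (c, b) in implied for c in xps if c not in (a, b))

    direct = {link for link in implied if not has_intermediate(*link)}
    blue = direct - set(asserted)
    red = {link for link in asserted if link in implied and has_intermediate(*link)}
    return blue, red
```

For the CL chain X is_a Z is_a Y, with "X diff", "Y diff" and "Z diff" all xp-defined and only "X diff" is_a "Y diff" asserted, blue is {("X diff", "Z diff"), ("Z diff", "Y diff")} and red is {("X diff", "Y diff")}, matching the display above.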

Ideally the curator would hit the "repair" button here, but at the moment the repairs are manual (though at least the reasoner visually indicates what must be done, which is a massive improvement on the pre-cross-products situation).

Of course, the editor may disagree with the new CL DAG. In this case they would leave the GO DAG as is and open a dialog with CL to resolve their disagreement.

External Ontology Editors guide

External ontology editors will already be following OBO Foundry principles (otherwise their ontology would not be selected for cross-products).

There are no strict recommendations beyond this at the moment. However, external ontology editors should be considerate of all users of their ontology, whether for annotation or cross-product references.

It is recommended that external ontology editors periodically load GO and GO xps whilst editing their ontology. For example, CL editors could load biological_process_xp_cell and GO. This way they can see the consequences of their actions. Whilst CL should not be required to announce changes and wait for a response (as is the case, for example, when GO wishes to obsolete terms that have annotations), any large scale changes should be announced in advance.

External ontology editors should respond promptly to requests for xp-related terms.

Note that there may be mutual dependence between organisations. For example, GO may depend on CL for biological_process_xp_cell, and CL may depend on GO for cell_xp_cellular_component (eg nucleate cell). There is no logical circularity here, since the ontologies are different; the only cycle is in the dependence hierarchy between organisations.

Note on internal XPs

Internal xps (eg regulation and modulation terms; formation of autophagic vacuole) are easier, as synchrony problems can be resolved immediately. Much of the above may still apply, but is considerably simplified.