Relation composition

From GO Wiki

This page describes the composition rules for relations used in GO. See the OBO Edit Reasoner paper on Google Docs for background. See also Transitive_closure.

See also the Relation Ontology and accompanying paper

Simple composition rules

rules for is_a and part_of

TODO: fill in examples

Basic transitivity compositions:

  • is_a . is_a → is_a
  • part_of . part_of → part_of

For example:

mitosis is_a cell cycle phase is_a cell cycle process, THEREFORE mitosis is_a cell cycle process

The following rules arise from the definitions given in the OBO Relation Ontology:

  • is_a . part_of → part_of
  • part_of . is_a → part_of

For example, starting with:

mitosis part_of M phase of mitotic cell cycle is_a M phase is_a cell cycle phase is_a cell cycle process part_of cell cycle

We can iteratively reduce this by repeated application of composition rules:

  1. mitosis part_of M phase is_a cell cycle phase is_a cell cycle process part_of cell cycle
  2. mitosis part_of M phase is_a cell cycle process part_of cell cycle
  3. mitosis part_of cell cycle process part_of cell cycle
  4. mitosis part_of cell cycle

Rules can be applied in any order (e.g. in the second reduction we reduced M phase is_a cell cycle phase is_a cell cycle process).
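The reduction above can be sketched in a few lines of Python. This is only an illustration: the names COMPOSE and reduce_chain are made up for this example, and the table assumes the four is_a/part_of compositions (is_a.is_a, part_of.part_of, is_a.part_of and part_of.is_a) that follow from the Relation Ontology definitions.

```python
# Illustrative composition table for is_a and part_of: (R1, R2) -> R1.R2
# (assumed from the Relation Ontology definitions of the two relations)
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("part_of", "part_of"): "part_of",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
}


def reduce_chain(terms, relations):
    """Collapse a path t0 r0 t1 r1 ... tn into a single link (t0, r, tn)."""
    if len(terms) != len(relations) + 1:
        raise ValueError("need exactly one relation between each pair of terms")
    result = relations[0]
    # Compose left to right; any other order gives the same result here.
    for rel in relations[1:]:
        result = COMPOSE[(result, rel)]
    return terms[0], result, terms[-1]


terms = [
    "mitosis",
    "M phase of mitotic cell cycle",
    "M phase",
    "cell cycle phase",
    "cell cycle process",
    "cell cycle",
]
relations = ["part_of", "is_a", "is_a", "is_a", "part_of"]
print(reduce_chain(terms, relations))
```

Running this on the asserted path yields the single link mitosis part_of cell cycle, matching step 4 of the reduction.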

We can also infer the same link from the following asserted links:

  1. mitosis part_of M phase of mitotic cell cycle part_of mitotic cell cycle is_a cell cycle

rules for regulates

With the addition of the regulates relations in GO, the composition rules expand.

First the standard interaction with is_a:

  • is_a . R → R (transitivity under is_a)
  • R . is_a → R (transitivity over is_a)

In the above R stands for any of: regulates, negatively_regulates, positively_regulates

Note that regulates is not itself transitive, but we may wish to include a weaker transitive relation (see below)

Note that positively and negatively regulates are sub-relations of regulates; i.e.

  • IF: X negatively_regulates Y
  • THEN: X regulates Y
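As a minimal sketch of the sub-relation rule, the following Python expands an asserted link into the links it implies. The names SUBRELATION_OF and entailed_links are hypothetical and chosen only for this illustration.

```python
# Sub-relation declarations taken from the text:
# both signed relations are sub-relations of regulates.
SUBRELATION_OF = {
    "negatively_regulates": "regulates",
    "positively_regulates": "regulates",
}


def entailed_links(link):
    """Return the asserted link plus every link implied by sub-relation rules."""
    subject, relation, obj = link
    links = [link]
    # Walk up the relation hierarchy until no parent relation is declared.
    while relation in SUBRELATION_OF:
        relation = SUBRELATION_OF[relation]
        links.append((subject, relation, obj))
    return links


print(entailed_links(("X", "negatively_regulates", "Y")))
```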

The regulates relation is transitive over part_of; i.e.

  • IF: X regulates Y and Y part_of Z
  • THEN: X regulates Z

Slight modification for the negatively and positively regulates relations:

Note that this rule is not hard-coded - it is declared in the gene_ontology .obo file, in the stanza for regulates (see the transitive_over tag)

rules involving gene products

Most of the time we talk of the relation between gene products and GO terms informally as one of "annotated_to". As we expand the relations used in GO (for example, between process and function), we need to be more precise. This will allow us to be consistent in giving recommendations for how tools and databases should handle annotations and the graph.

To formalize annotations we need two further relations, to be defined in RO:

  • has_function_in - between a protein and a MF or BP (as specified in an annotation). Potentially also between a CC and an MF.
  • localized_to - between a protein and a CC.

With the addition of these relations it is simpler to show the compositions in a table:

Here's how it works. If you have two links (annotations or ontology links)

a R1 b R2 c

And you want to know the relation (if any) between a and c, look up the composition R1.R2 in the table: row first, then column.

For example, if you have

a positively_regulates b part_of c

look up (R+,P) in the table. The cell value is R+ (i.e. the regulates relations are transitive_over part_of).

Composition is recursive, e.g. this:

  • a R1 b, b R2 c, c R3 d

can be written as this:

  • a ((R1.R2).R3) d

Which means you look up R1.R2 first, take the result, then plug that in as the row and look under the R3 column.
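The recursive lookup can be sketched as a left fold over the path. The table below is deliberately partial: it contains only the (R+,P) cell quoted in the text plus the is_a/part_of cells, which are assumed here from the Relation Ontology definitions. The names TABLE, compose and compose_path are hypothetical, and None stands for a cell with no named relation.

```python
from functools import reduce

# Partial, illustrative composition table: (row, column) -> cell value.
# I = is_a, P = part_of, R+ = positively_regulates.
TABLE = {
    ("I", "I"): "I",
    ("I", "P"): "P",
    ("P", "I"): "P",
    ("P", "P"): "P",
    ("I", "R+"): "R+",
    ("R+", "I"): "R+",
    ("R+", "P"): "R+",
}


def compose(r1, r2):
    """Look up R1.R2: row first, then column. None means no named relation."""
    if r1 is None:
        return None  # once a composition is unnamed, the rest of the path is too
    return TABLE.get((r1, r2))


def compose_path(relations):
    """Evaluate ((R1.R2).R3)... by feeding each result back in as the row."""
    return reduce(compose, relations)


print(compose_path(["I", "R+", "P"]))   # (I.R+).P
print(compose_path(["R+", "R+", "P"]))  # R+.R+ is not in this partial table
```

Returning None rather than raising an error keeps the "legal but unnamed" case distinct from a malformed path; a fuller table would also need a separate marker for the cells shown with a red X.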

If you get a red X, you know something is wrong (remember we have defined regulates as holding between processes; we can generalize so that we can say a gene product is regulated, though it may be better to introduce a different but similar relation)

If you get a -/? then you have a legal relation, just one we have so far declined to name. There is nothing to stop us naming for example "indirectly_regulates" (SEE BELOW: David and Tanya have provided these)

It's important to name the links between gene products and what is denoted by GO terms: this allows us to give consistent, coherent explanations of why we propagate certain things up the DAG by default. For example, we don't propagate annotations over part_of by intuition; we do so because L.P → L and F.P → F.

Say we have a gene product p directly annotated to a. a is in BP, so the implicit relation is has_function_in (F). The user queries for e (an MF).

If the ontology has:

(this is post BP->MF links)

The full path from the gene product to the query term is:

Should the tool return p? (Here 'tool' can be generalized to amigo queries, map2slim, enrichment calculation etc.)

According to the table there is no name for the relation that holds between p and e. The tool should not include p in the results since there is nothing we can say about how p relates to the query. This is in accord with what we have been saying about how tools should work with the regulates relation. However, there may be circumstances where we want to allow this propagation to occur, but not in an ad-hoc fashion.

If we like, we can name the composition of P.R e.g. "part_of_regulation_of", PR for short. We can also name the composition F.PR - say "functions_as_part_of_regulation_of" or FPR for short (our table starts getting a bit more complex but that's OK). The composition F.I.P.R.I is reduced to FPR.

This means the tool has a concrete basis for offering the user options for how the gene product is propagated. For example, it could say "no gene products are annotated as *having the function* e. Do you want to extend your search to include products that *function as part of the regulation of* e?"

Of course, tools could also just offer checkboxes for the relations to propagate over, but this doesn't take into account the fact that certain orderings have different semantics.

If we name the relations then this makes it easier for people using the table of implied relations in GO: Transitive_closure#Calculating_the_transitive_closure:_the_new_way

(Of course we won't precompute every gene product to every term, just every meaningful term-term relation. The final composition is done without the table.)

David and Tanya proposed the following extension to the table:

  • A (F) B (R) C = A is a regulator of C
  • A (F) B (R+) C = A is a positive regulator of C
  • A (F) B (R-) C = A is a negative regulator of C
  • A (P) B (F) C = A contributes_to C
  • A (P) B (R) C = A (R) C (this assumes that the other parts of B will occur)
  • A (P) B (R+) C = A (R+) C (this assumes that the other parts of B will occur)
  • A (P) B (R-) C = A (R-) C (this assumes that the other parts of B will occur)
  • A (R) B (R) C = A indirectly_regulates C
  • A (R) B (R+) C = A indirectly_regulates C
  • A (R) B (R-) C = A indirectly_regulates C
  • A (R+) B (R) C = A indirectly_regulates C
  • A (R+) B (R+) C = A indirectly_positively_regulates C
  • A (R+) B (R-) C = A indirectly_negatively_regulates C
  • A (R-) B (R) C = A indirectly_regulates C
  • A (R-) B (R+) C = A indirectly_negatively_regulates C
  • A (R-) B (R-) C = A indirectly_positively_regulates C
  • A (L) B (F) C = A may contribute_to C
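The nine regulates-over-regulates entries in this list follow a sign rule: treat R+ as +1, R- as -1 and unsigned R as 0, then multiply. The sketch below is one way to express that observation; SIGN, NAME and compose_regulates are names invented for the example, not part of any GO tool.

```python
# Encode each regulates relation as a sign; plain R carries no sign (0).
SIGN = {"R": 0, "R+": 1, "R-": -1}

# Name of the composed relation for each product of signs.
NAME = {
    0: "indirectly_regulates",
    1: "indirectly_positively_regulates",
    -1: "indirectly_negatively_regulates",
}


def compose_regulates(r1, r2):
    """Compose two regulates links: A r1 B, B r2 C gives the A-to-C relation."""
    # Two negatives make a positive; any unsigned link erases the sign.
    return NAME[SIGN[r1] * SIGN[r2]]


for r1 in ("R", "R+", "R-"):
    for r2 in ("R", "R+", "R-"):
        print(f"A ({r1}) B ({r2}) C = A {compose_regulates(r1, r2)} C")
```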


Updates to relations involving gene products, April 2011

--gwg 14:43, 6 April 2011 (PDT)

This documents the relations used in the GO Moose Perl toolkit.


Two relationships are used for connecting annotated entities to GO terms, depending on the GO terms. These are:

 gene product  == part of ==> [cellular component]

and

 gene product  == capable of ==> [ molecular function | biological process ]


For terms that do not belong to the GO ontologies, the generic relationship "annotated to" is used.


These are the properties of the relations:

 term 1 ontology     | GP --> term 1 | term 1 --> term 2                 | Inferred GP --> term 2
 cellular component  | part of       | part of                           | part of
 function or process | capable of    | is a                              | capable of
 function or process | capable of    | part of                           | capable of part of
 function or process | capable of    | regulates                         | regulator of
 function or process | capable of    | positively/negatively regulates   | positive/negative regulator of
 cellular component  | part of       | has part                          | no inference possible
 function or process | capable of    | has part                          | no inference possible
 cellular component  | part of       | is a                              | part of
 any                 | annotated to  | is a                              | annotated to
 any                 | annotated to  | part of                           | annotated to
 any                 | annotated to  | (positively/negatively) regulates | (positive/negative) regulator of
 any                 | annotated to  | has part                          | no inference
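The properties above amount to a lookup keyed on the gene product's relation to term 1 and the ontology relation from term 1 to term 2. The following Python sketch transcribes the table; it is not the GO Moose implementation (which is Perl), and the names INFERRED and infer are hypothetical. None stands for "no inference possible", and combinations the table does not list are also treated as no inference, which is an assumption of this sketch.

```python
# (GP --> term 1, term 1 --> term 2) -> inferred GP --> term 2
INFERRED = {
    ("part of", "part of"): "part of",
    ("part of", "is a"): "part of",
    ("part of", "has part"): None,
    ("capable of", "is a"): "capable of",
    ("capable of", "part of"): "capable of part of",
    ("capable of", "regulates"): "regulator of",
    ("capable of", "positively regulates"): "positive regulator of",
    ("capable of", "negatively regulates"): "negative regulator of",
    ("capable of", "has part"): None,
    ("annotated to", "is a"): "annotated to",
    ("annotated to", "part of"): "annotated to",
    ("annotated to", "regulates"): "regulator of",
    ("annotated to", "positively regulates"): "positive regulator of",
    ("annotated to", "negatively regulates"): "negative regulator of",
    ("annotated to", "has part"): None,
}


def infer(gp_relation, term_relation):
    """Return the inferred GP --> term 2 relation, or None if nothing follows."""
    return INFERRED.get((gp_relation, term_relation))


print(infer("capable of", "part of"))
print(infer("part of", "has part"))
```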


GP -- capable of --> molecular function
GP -- has function in --> biological process
GP -- localizes to --> cellular component
IF GP capable of MF AND MF part of BP ==> GP has function in BP
integral to: essentially creates a species-specific subclass of a complex; saying GP is integral to a complex means that the complex always has part GP in that species

Has_part

See has_part page

Example of relation composition

This example assumes that amongst our annotations we have:

  • MGI Bcl2 - (direct/asserted) annotation to positive regulation of anti-apoptosis
  • RGD Apoe - (direct/asserted) annotation to anti-apoptosis

(For the sake of the example, we assume that these are the only annotations that were created for these genes. We ignore evidence codes here -- assuming they are trusted annotations)

According to our formalization of what annotations mean, each annotation corresponds to has_function_in.

We can then apply the composition rules to get the implied links to apoptosis:

  • Apoe negative_regulator_of apoptosis
  • Bcl2 indirect_negative_regulator_of apoptosis

This page uses oboedit to illustrate the relationships between the gene products and different kinds of process. It may seem odd to view annotations in OE, but according to our formalism the links between proteins and the processes they participate in are not a different kind of beast from the other kinds of links in GO. Still, we'll hopefully have this in AmiGO too shortly.

You can get the subset of GO (plus annotations in obo format) used to make these screenshots here:

The full transitive closure is here:

Bcl2-graph.jpg

Bcl2-OEP.jpg

It should also be possible to do queries using the OE2 link search box too - e.g. ask for genes that bear some relation to apoptosis and get back "Bcl2 negative_regulator_of GO:apoptosis". However, the link search doesn't appear to be working properly in conjunction with the reasoner - Amina is working on this.

OBO Format

  • The is_transitive tag is the same as a R <- R.R composition
  • The transitive_over tag is the same as a R <- R.R2 composition
  • The holds_over_chain tag allows for arbitrary compositions R <- R1.R2
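To make the correspondence concrete, the sketch below turns the tag-value pairs of a relation stanza into composition rules of the form (result, (R1, R2)). It does not parse a full .obo file; the function name rules_from_typedef and the list-of-pairs input are assumptions made for the example, and holds_over_chain is assumed to carry two space-separated relation IDs.

```python
def rules_from_typedef(rel_id, tags):
    """Map the tags of one relation stanza to composition rules.

    tags is a list of (tag, value) pairs; each rule is (result, (R1, R2)),
    read as result <- R1.R2.
    """
    rules = []
    for tag, value in tags:
        # Drop any trailing "! comment" and surrounding whitespace.
        value = value.split("!")[0].strip()
        if tag == "is_transitive" and value == "true":
            rules.append((rel_id, (rel_id, rel_id)))   # R <- R.R
        elif tag == "transitive_over":
            rules.append((rel_id, (rel_id, value)))    # R <- R.R2
        elif tag == "holds_over_chain":
            r1, r2 = value.split()
            rules.append((rel_id, (r1, r2)))           # R <- R1.R2
    return rules


print(rules_from_typedef("part_of", [("is_transitive", "true")]))
print(rules_from_typedef("regulates", [("transitive_over", "part_of ! part of")]))
```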