Relation composition

* [[Media:go-relation-composition.xls|Initial version of composition table]] (Excel)
* [[Media:go-relation-composition.pdf|Initial version of composition table]] (PDF)
 



Revision as of 17:36, 24 June 2008

Simple composition rules

rules for is_a and part_of

TODO: fill in examples

Basic transitivity compositions:

The following rules arise from the definitions given in the OBO Relation Ontology.

rules for regulates

With the addition of the regulates relations in GO, the composition rules expand.

First the standard interaction with is_a:

  • is_a . R → R (transitivity under is_a)
  • R . is_a → R (transitivity over is_a)

In the above R stands for any of: regulates, negatively_regulates, positively_regulates

Note that regulates is not itself transitive, but we may wish to include a weaker transitive relation (see below)

Note that positively and negatively regulates are sub-relations of regulates; i.e.

  • IF: X negatively_regulates Y
  • THEN: X regulates Y

The regulates relations are transitive over part_of; i.e.

  • regulates . part_of → regulates (transitivity over part_of)
  • negatively_regulates . part_of → negatively_regulates (transitivity over part_of)
  • positively_regulates . part_of → positively_regulates (transitivity over part_of)
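
The regulates rules above can be sketched as a small function. This is only an illustration: the short names (I, P, R, R+, R-) and the function names are shorthand chosen for this sketch, not identifiers used by GO or OBO-Edit.

```python
# Shorthand assumed for this sketch: I = is_a, P = part_of, R = regulates,
# R+ = positively_regulates, R- = negatively_regulates.
REGULATES = ("R", "R+", "R-")

# Sub-relation links: X R+ Y or X R- Y implies X R Y.
SUPER_RELATION = {"R+": "R", "R-": "R"}


def compose(r1, r2):
    """Return the relation implied by r1 . r2, or None if no rule above applies."""
    if r1 == "I" and r2 in REGULATES:   # is_a . R -> R
        return r2
    if r1 in REGULATES and r2 == "I":   # R . is_a -> R
        return r1
    if r1 in REGULATES and r2 == "P":   # R . part_of -> R
        return r1
    return None                         # e.g. regulates . regulates is not covered


def implied_relations(rel):
    """Return rel followed by every relation it is a sub-relation of."""
    out = [rel]
    while out[-1] in SUPER_RELATION:
        out.append(SUPER_RELATION[out[-1]])
    return out
```

For example, `compose("R-", "P")` gives `"R-"`, and `implied_relations("R-")` gives `["R-", "R"]`.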

rules involving gene products

THIS SECTION REQUIRES MORE BACKGROUND INFORMATION


Most of the time we talk of the relation between gene products and GO terms informally as one of "annotated_to". As we expand the relations used in GO, we need to be more precise.

Additional relations required for formalization:

  • has_function_in - between a protein and a MF or BP (as specified in an annotation). Potentially also between a CC and an MF.
  • localized_to - between a protein and a CC.

Here's how it works. If you have two links (annotations or ontology links)

a R1 b, b R2 c

and you want to know the relation (if any) between a and c, look up the composition R1.R2 in the table: row first, then column (this seems most intuitive, but the table could be transposed if required).

For example, if you have

a positively_regulates b, b part_of c

Look up (R+,P) in the table: the cell value is R+ (i.e. the regulates relations are transitive_over part_of).

Composition is recursive, e.g.: a R1 b, b R2 c, c R3 d => a ((R1.R2).R3) d

Which means you look up R1.R2 first, take the result, then plug that in as the row and look under the R3 column.

If you get a red X, you know something is wrong. (Remember that we have defined regulates as holding between processes. We could generalize it so that we can say a gene product is regulated, though it may be better to introduce a different but similar relation.)

If you get a -/? then you have a legal relation, just one we have so far declined to name. There is nothing to stop us naming, for example, "indirectly_regulates" (remember that we have declared regulates to be not transitive).
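
The lookup procedure described above can be sketched as a dictionary keyed by (row, column) and folded left to right. The cells below are only those implied by the text; the real values live in the composition table spreadsheet. The `("F", "L")` cell is an assumption, derived from the stated domains: the target of has_function_in is an MF or BP, while localized_to starts from a protein. The sentinel and function names are hypothetical.

```python
from functools import reduce

UNNAMED = "-/?"  # legal composition that has not been given a name
ILLEGAL = "X"    # the "red X": this pair of links should never occur

# (row, column) -> cell value. Shorthand for this sketch: I = is_a,
# P = part_of, R/R+/R- = the regulates relations, F = has_function_in,
# L = localized_to. Only cells implied by the text are filled in.
TABLE = {
    ("R", "I"): "R", ("R+", "I"): "R+", ("R-", "I"): "R-",
    ("R", "P"): "R", ("R+", "P"): "R+", ("R-", "P"): "R-",
    ("F", "P"): "F", ("L", "P"): "L",
    ("R", "R"): UNNAMED,   # regulates is not transitive
    ("F", "L"): ILLEGAL,   # assumption, see the paragraph above
}


def lookup(row, col):
    """Look up row . col; a sentinel result is carried forward unchanged."""
    if row in (UNNAMED, ILLEGAL):
        return row
    return TABLE[(row, col)]  # KeyError for cells not transcribed here


def compose_path(relations):
    """Left-to-right composition: ((R1.R2).R3)..."""
    return reduce(lookup, relations)
```

For example, `compose_path(["R+", "P", "I"])` looks up (R+,P) to get R+, then (R+,I) to get R+ again.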

It's important to name the links between gene products and what is denoted by GO terms, because this allows us to give consistent, coherent explanations of why we propagate certain things up the DAG by default. For example, we don't propagate over part_of just because it feels warm and fuzzy; we do so because L.P=>L and F.P=>F.

Say we have a gene product p directly annotated to a. a is in BP, so the implicit relation is has_function_in (F). The user queries for e (an MF).

If the ontology has:

a is_a b part_of c regulates d is_a e

(this is post BP->MF links)

The full path is:

p has_function_in a is_a b part_of c regulates d is_a e

Should the tool return p? (Here 'tool' can be generalized to AmiGO queries, map2slim, enrichment calculation etc.)

According to the table there is no name for the relation that holds between p and e. The tool should not include p in the results since there is nothing we can say about how p relates to the query. This is in accord with what we have been saying about how tools should work with the regulates relation. However, there may be circumstances where we want to allow this propagation to occur, but not in an ad-hoc fashion.

If we like, we can name the composition of P.R e.g. "part_of_regulation_of", PR for short. We can also name the composition F.PR - say "functions_as_part_of_regulation_of" or FPR for short (our table starts getting a bit more complex but that's OK). The composition F.I.P.R.I is reduced to FPR.
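
One way to arrive at the reduction of F.I.P.R.I to FPR is sketched below. It assumes that the term-to-term path is composed first and the annotation link is applied last; the text does not fix this grouping. The cell `("I", "P")` is taken from the standard is_a/part_of rules, and `("PR", "I")` is an assumption that PR behaves like the other relations over is_a. All table and function names are hypothetical.

```python
from functools import reduce

# Term-to-term cells. I = is_a, P = part_of, R = regulates.
TERM_TABLE = {
    ("I", "P"): "P",    # is_a . part_of -> part_of
    ("P", "R"): "PR",   # named in the text: part_of_regulation_of
    ("PR", "I"): "PR",  # assumption: PR is transitive over is_a
}

# Cells whose row is the gene product link F = has_function_in.
ANNOTATION_TABLE = {
    ("F", "PR"): "FPR",  # named in the text
}

NAMES = {
    "PR": "part_of_regulation_of",
    "FPR": "functions_as_part_of_regulation_of",
}


def compose_terms(path):
    """Compose the ontology links left to right; None means no named relation."""
    return reduce(lambda r1, r2: TERM_TABLE.get((r1, r2)), path)


def relate(annotation_rel, term_path):
    """Relation between the gene product and the last term, or None."""
    return ANNOTATION_TABLE.get((annotation_rel, compose_terms(term_path)))
```

With these cells, `relate("F", ["I", "P", "R", "I"])` gives `"FPR"`. Without the two named cells it gives `None`, which corresponds to the tool declining to return p.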

This means the tool has a concrete basis for offering the user options for how the gene product is propagated. For example, it could say "no gene products are annotated as *having the function* e. Do you want to extend your search to include products that *function as part of the regulation of* e?"

Of course tools could also just have a checkbox of relations to propagate over, but this doesn't take into account the fact that certain orderings have different semantics.
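
The ordering point can be illustrated with two cells already given on this page: regulates . part_of → regulates, and part_of . regulates → part_of_regulation_of (PR). This is a minimal sketch with hypothetical variable names, not a description of how any existing tool filters.

```python
# Two cells named in the text: R . P -> R, and P . R -> PR.
TABLE = {("R", "P"): "R", ("P", "R"): "PR"}


def compose(r1, r2):
    """Return the cell for r1 . r2, or None if it is not in this fragment."""
    return TABLE.get((r1, r2))


regulates_then_part_of = ["R", "P"]
part_of_then_regulates = ["P", "R"]

# A checkbox filter only sees which relations occur on the path...
same_checkboxes = set(regulates_then_part_of) == set(part_of_then_regulates)
# ...while composition also sees the order in which they occur.
same_composition = (
    compose(*regulates_then_part_of) == compose(*part_of_then_regulates)
)
```

Here `same_checkboxes` is `True` but `same_composition` is `False`: both paths pass the same checkbox filter, yet they imply different relations.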

If we name the relations then this makes it easier for people using the table of implied relations in GO: http://wiki.geneontology.org/index.php/Transitive_closure#Calculating_the_transitive_closure:_the_new_way

(Of course, we won't precompute the relation from every gene product to every term, just every meaningful term-term relation. The final composition is done without the table.)

David and Tanya proposed the following extension to the table:

  • A (F) B (R) C = A is a regulator of C
  • A (F) B (R+) C = A is a positive regulator of C
  • A (F) B (R-) C = A is a negative regulator of C
  • A (P) B (F) C = A contributes_to C
  • A (P) B (R) C = A (R) C (this assumes that the other parts of B will occur)
  • A (P) B (R+) C = A (R+) C (this assumes that the other parts of B will occur)
  • A (P) B (R-) C = A (R-) C (this assumes that the other parts of B will occur)
  • A (R) B (R) C = A indirectly_regulates C
  • A (R) B (R+) C = A indirectly_regulates C
  • A (R) B (R-) C = A indirectly_regulates C
  • A (R+) B (R) C = A indirectly_regulates C
  • A (R+) B (R+) C = A indirectly_positively_regulates C
  • A (R+) B (R-) C = A indirectly_negatively_regulates C
  • A (R-) B (R) C = A indirectly_regulates C
  • A (R-) B (R+) C = A indirectly_negatively_regulates C
  • A (R-) B (R-) C = A indirectly_positively_regulates C
  • A (L) B (F) C = A may contribute_to C
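
The nine regulates-by-regulates rows in the proposal above appear to follow a sign-multiplication pattern: the sign is lost as soon as either link is plain regulates, and otherwise two like signs give positive and two unlike signs give negative. This observation is not stated in the proposal itself; the sketch below encodes it (with hypothetical names) and checks it against the listed rows.

```python
# Sign of each regulates relation; 0 means "direction unknown".
SIGN = {"R": 0, "R+": 1, "R-": -1}

INDIRECT = {
    0: "indirectly_regulates",
    1: "indirectly_positively_regulates",
    -1: "indirectly_negatively_regulates",
}


def compose_regulates(r1, r2):
    """Name for r1 . r2 when both links are regulates relations."""
    return INDIRECT[SIGN[r1] * SIGN[r2]]


# The nine rows as listed in the proposal, for comparison.
PROPOSED = {
    ("R", "R"): "indirectly_regulates",
    ("R", "R+"): "indirectly_regulates",
    ("R", "R-"): "indirectly_regulates",
    ("R+", "R"): "indirectly_regulates",
    ("R+", "R+"): "indirectly_positively_regulates",
    ("R+", "R-"): "indirectly_negatively_regulates",
    ("R-", "R"): "indirectly_regulates",
    ("R-", "R+"): "indirectly_negatively_regulates",
    ("R-", "R-"): "indirectly_positively_regulates",
}

matches_proposal = all(
    compose_regulates(r1, r2) == name for (r1, r2), name in PROPOSED.items()
)
```

If further signed relations were added, a rule of this kind would be easier to keep consistent than a hand-maintained list of rows.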

Example of relation composition

This example assumes that amongst our annotations we have:

  • MGI Bcl2 - (direct/asserted) annotation to positive regulation of anti-apoptosis
  • RGD Apoe - (direct/asserted) annotation to anti-apoptosis

For the sake of the example, we assume that these are the only annotations that were created for these genes. We ignore evidence codes here (assuming they are trusted annotations)

This page uses OBO-Edit (OE) to illustrate the relationships between the gene products and different kinds of process. It may seem odd to view annotations in OE, but according to our formalism the links between proteins and the processes they participate in are not a different kind of beast from the other kinds of links in GO. Still, we'll hopefully have this in AmiGO too shortly.

You can get the subset of GO used to make these screenshots here:

It should also be possible to run queries using the OE2 link search box, e.g. ask for genes that bear some relation to apoptosis and get back "Bcl2 negative_regulator_of GO:apoptosis". However, the link search doesn't appear to be working properly in conjunction with the reasoner. Amina is working on this.