Mining function process links from pathway databases

From GO Wiki

We can use existing pathway databases to mine potential function-process (F-P) links.

For the latest results, see:

Issues

Lack of xrefs in either GO or pathway DB

There are many cases where a pathway ID has no corresponding process ID. We need to do more work to ensure pathways are covered. For example, GO obsoleted "purine metabolism", yet this is a pathway in Reactome.

Similarly, Reactome has purine biosynthesis, not present in GO. Instead GO has:

  • GO:0006164 ! purine nucleotide biosynthetic process
  • GO:0009113 ! purine base biosynthetic process
  • GO:0042451 ! purine nucleoside biosynthetic process
http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=73847&

Automating xref detection

Some can be found by simple name matching of the GO process and the reaction name:
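A minimal sketch of such name matching, assuming GO labels (name plus synonyms) and pathway event names have already been loaded into plain dictionaries. The function names and the identifier `Reactome:EXAMPLE` are hypothetical placeholders, not part of any GO or Reactome tooling; the GO labels are taken from the GO:0006702 stanza on this page.

```python
import re

def normalize(name):
    """Lowercase, turn underscores/hyphens into spaces, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[_\-]", " ", name)).strip().lower()

def match_by_name(go_terms, pathway_events):
    """Return (event_id, go_id) pairs whose normalized labels coincide.

    go_terms: dict of GO id -> list of labels (term name plus synonyms)
    pathway_events: dict of pathway event id -> event name
    """
    index = {}
    for go_id, labels in go_terms.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(go_id)
    matches = []
    for event_id, name in pathway_events.items():
        for go_id in sorted(index.get(normalize(name), ())):
            matches.append((event_id, go_id))
    return matches

go_terms = {
    "GO:0006702": ["androgen biosynthetic process", "androgen biosynthesis"],
    "GO:0043441": ["acetoacetic acid biosynthetic process"],
}
pathway_events = {
    "Reactome:EXAMPLE": "Androgen biosynthesis",        # placeholder id (assumption)
    "Reactome:77110": "Formation_of_Acetoacetic_Acid",
}
matches = match_by_name(go_terms, pathway_events)
```

With this sample data only the androgen event matches, through the GO synonym. Formation_of_Acetoacetic_Acid is missed by plain string comparison, which is the kind of case that needs a different source of evidence.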

We can also use XP:biological_process_xp_chebi to mine xrefs.

For example, Formation_of_Acetoacetic_Acid Reactome:77110 currently has no xref to a GO BP (2008/10/13).

However, from the computable pathway description in Reactome, we know that the output of this process is CHEBI:15344-acetoacetic acid.

From the ChEBI cross-products we know that GO:0043441-acetoacetic acid biosynthetic process outputs the same type of chemical entity. Thus we can infer an xref between the two.
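The inference above can be sketched as a join on the output chemical. This is an illustration only: the function name and the dictionary layout are assumptions, and in practice the two inputs would have to be extracted from the Reactome pathway description and from the biological_process_xp_chebi cross-products.

```python
def infer_xrefs_by_output(pathway_outputs, go_outputs, existing_xrefs):
    """Suggest (event_id, go_id, chebi_id) triples that share an output chemical.

    pathway_outputs: dict of pathway event id -> list of output ChEBI ids
    go_outputs: dict of GO biosynthetic process id -> ChEBI id it outputs
    existing_xrefs: set of event ids that already carry a GO BP xref
    """
    # Index GO processes by the chemical they produce.
    by_chemical = {}
    for go_id, chebi_id in go_outputs.items():
        by_chemical.setdefault(chebi_id, []).append(go_id)

    inferred = []
    for event_id, chebi_ids in pathway_outputs.items():
        if event_id in existing_xrefs:
            continue  # only propose xrefs where none exists yet
        for chebi_id in chebi_ids:
            for go_id in by_chemical.get(chebi_id, []):
                inferred.append((event_id, go_id, chebi_id))
    return inferred

suggestions = infer_xrefs_by_output(
    pathway_outputs={"Reactome:77110": ["CHEBI:15344"]},
    go_outputs={"GO:0043441": "CHEBI:15344"},
    existing_xrefs=set(),
)
```

The triples keep the ChEBI id so that a curator can see why each xref was proposed before accepting it.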

Inconsistent reciprocal xrefs

See reactome-Homo_sapiens.xrefcheck

Androgen biosynthesis in Reactome xrefs to GO:0006702 ! androgen biosynthetic process

However, the GO term carries no reciprocal xref back to Reactome:

[Term]
id: GO:0006702
name: androgen biosynthetic process
namespace: biological_process
def: "The chemical reactions and pathways resulting in the formation of androgens, C19 steroid hormones that can stimulate the development of male sexual characteristics." [ISBN:0198506732 "Oxford Dictionary of Biochemistry and Molecular Biology"]
synonym: "androgen anabolism" EXACT []
synonym: "androgen biosynthesis" EXACT []
synonym: "androgen formation" EXACT []
synonym: "androgen synthesis" EXACT []
is_a: GO:0006694 ! steroid biosynthetic process
is_a: GO:0008209 ! androgen metabolic process
is_a: GO:0042446 ! hormone biosynthetic process

Note: in Mining_Process_Function_Links_from_Reactome I use Reactome xrefs to GO, not vice versa, so this is not a problem here. However, it may be a problem for anyone consuming the xrefs from the GO side.
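A reciprocity check of this kind could look roughly like the following. It is a sketch, not the script that produced the xrefcheck file: the parser handles only the `id:` and `xref:` tags of OBO stanzas, and `Reactome:EXAMPLE` is a placeholder for the Reactome event id, which is not given on this page.

```python
def parse_obo_xrefs(obo_text):
    """Map each stanza id to the set of xref ids listed in that stanza."""
    xrefs = {}
    current = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):          # a new stanza begins, e.g. [Term]
            current = None
        elif line.startswith("id: "):
            current = line[len("id: "):]
            xrefs.setdefault(current, set())
        elif line.startswith("xref: ") and current is not None:
            # keep only the identifier, dropping any trailing description
            xrefs[current].add(line[len("xref: "):].split()[0])
    return xrefs

def missing_reciprocals(pathway_to_go, go_xrefs):
    """Return (event_id, go_id) pairs where GO does not point back at the event."""
    return sorted(
        (event_id, go_id)
        for event_id, go_id in pathway_to_go
        if event_id not in go_xrefs.get(go_id, set())
    )

obo_text = """[Term]
id: GO:0006702
name: androgen biosynthetic process
is_a: GO:0006694 ! steroid biosynthetic process
"""
pathway_to_go = [("Reactome:EXAMPLE", "GO:0006702")]  # placeholder event id
missing = missing_reciprocals(pathway_to_go, parse_obo_xrefs(obo_text))
```

Here `missing` contains the androgen pair, because the stanza has no `xref:` line pointing at the Reactome event.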


Xrefs too general

We have various cases like this:

 Reactome:163765 ChREBP activates metabolic gene expression > GO:positive regulation of transcription ; GO:0045941

Imre: The intended semantics is that the goBiologicalProcess slot of a Reactome Event is filled with the equivalent GO biological process term. As evident from your example above, this is not always the case: quite often it gets filled with the closest or best fitting (least inappropriate?) GO biological process term. Which is a pity, since we now don't know where we have clear equivalency and where we have some sort of approximation.

Peter: No, the intent is that the GO term should be equal in generality to the Reactome event or, if no equal term is available, then a more general GO term should be used. The ChREBP event is not a counterexample, even though its name makes it look like one. In fact, the process we have annotated is the formation of an active complex of ChREBP with a second protein, MLX, and then the workings of that complex to positively regulate transcription of a number of specific genes.

Marc: That would be great, and along with Chris and Imre's suggestions I think that it would be really nice if this autosuggestion feature was available to the curators as they work on modules.

(note: ORB would be useful here)


Media:Reactome_GCM_2006.ppt

IMG

http://img.jgi.doe.gov/

In progress