Align the reaction / catalytic activity terms in GO with those in external databases such as MetaCyc, EC, and KEGG.


The plan is to create GO terms that correspond to the reactions that enzymes catalyse. EC's system is gene product-based and allows an enzyme to catalyse a number of different reactions; rather than follow it, GO would follow a system like that of MetaCyc, KEGG reaction or RHEA, in which a GO term represents a single reaction that may be catalysed by a number of different enzymes. This means that there may be several GO terms with the same EC number and (less frequently) GO terms with several EC numbers. I believe this is the standard way of handling multifunctional or multi-reaction enzymes, or at least it appears so from the various SF entries I've done and from the more recent enzyme activity terms in the ontology.
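
As a rough illustration of the many-to-many relationship described above, the sketch below indexes (GO term, EC number) xref pairs in both directions and reports EC numbers shared by several GO terms and GO terms carrying several EC numbers. All identifiers are made-up placeholders rather than real GO or EC entries, and `index_xrefs` is a hypothetical helper, not part of any GO tooling.

```python
from collections import defaultdict

# Hypothetical (GO term, EC number) xref pairs; placeholder identifiers only.
xrefs = [
    ("GO:A", "EC:1.1.1.x"),
    ("GO:B", "EC:1.1.1.x"),  # two reaction terms share one EC number
    ("GO:C", "EC:2.2.2.x"),
    ("GO:C", "EC:2.2.2.y"),  # one reaction term carries two EC numbers
]


def index_xrefs(pairs):
    """Build lookup tables in both directions from (GO id, EC number) pairs."""
    go_to_ec = defaultdict(set)
    ec_to_go = defaultdict(set)
    for go_id, ec in pairs:
        go_to_ec[go_id].add(ec)
        ec_to_go[ec].add(go_id)
    return go_to_ec, ec_to_go


go_to_ec, ec_to_go = index_xrefs(xrefs)

# EC numbers attached to more than one GO term (expected to be common).
shared_ec = {ec: sorted(gos) for ec, gos in ec_to_go.items() if len(gos) > 1}

# GO terms attached to more than one EC number (expected to be rarer).
multi_ec = {go: sorted(ecs) for go, ecs in go_to_ec.items() if len(ecs) > 1}
```

A report of this kind could help pick out terms that need a manual look, for example a GO term whose several EC numbers suggest it still bundles more than one reaction.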

This is the plan of action:

- clean up current enzyme activity terms in GO -> define any undefined activities -> ensure that as many terms as possible have a reaction associated with them -> split terms with several reactions into separate terms -> convert terms representing a sequence of reactions into processes -> obsolete any 'gene product' activities (e.g. photoreceptor cyclic-nucleotide phosphodiesterase activity) -> wherever possible, link existing reactions that lack xrefs to the corresponding entry in MetaCyc, KEGG, EC or another database

- terms with reactions: use the same system of chemical nomenclature throughout -> I am not sure what stage the GOCHE work has reached or how reliable CHEBI is

- create systematic representations of the reactions, e.g. using has_input: CHEBI:XXX and has_output: CHEBI:YYY, to allow the terms to be monitored more easily.

- cross-check GO terms with those in MetaCyc, KEGG, and EC to achieve catalytic harmony

- monitor updates to these resources and update GO as required

- achieve world domination via enzyme activity.
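
The systematic representation and the cross-check from the plan above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: a reaction is reduced to a set of has_input and a set of has_output chemical identifiers, stoichiometry is ignored, reversed reactions are treated as equivalent, and both resources already use the same chemical identifiers. The `Reaction` class, the `cross_check` function, and every identifier shown are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A reaction reduced to its has_input and has_output chemical ids."""
    inputs: frozenset
    outputs: frozenset

    def same_as(self, other, ignore_direction=True):
        """True if both reactions involve the same inputs and outputs."""
        if self.inputs == other.inputs and self.outputs == other.outputs:
            return True
        # Optionally treat the reverse reaction as the same reaction.
        return (ignore_direction
                and self.inputs == other.outputs
                and self.outputs == other.inputs)


def cross_check(go_terms, external):
    """Map each GO id to the sorted external ids whose reaction matches."""
    return {
        go_id: sorted(ext_id for ext_id, ext_rxn in external.items()
                      if rxn.same_as(ext_rxn))
        for go_id, rxn in go_terms.items()
    }


# Placeholder data: the identifiers below are not real entries.
go_terms = {
    "GO:term1": Reaction(frozenset({"CHEBI:XXX"}), frozenset({"CHEBI:YYY"})),
}
external = {
    "EXT:rxn1": Reaction(frozenset({"CHEBI:YYY"}), frozenset({"CHEBI:XXX"})),
    "EXT:rxn2": Reaction(frozenset({"CHEBI:XXX"}), frozenset({"CHEBI:ZZZ"})),
}

matches = cross_check(go_terms, external)
```

GO terms that come back with an empty list would be candidates for manual review, and rerunning the same check against each new release of an external resource is one way the monitoring step might be approached.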

I am currently using data from IntEnz (an EC mirror with some tasty extra data and xrefs), MetaCyc, and (to a small extent) KEGG. Later this could be expanded to cover UM-BBD reactions (though they don't provide a db download, which is a bit of a nuisance), RHEA (the reaction database at the EBI) and possibly IMG reactions. Quite a lot of reactions need to be cleaned up in some way or other (see the first item of the plan), and that is what I'm working on at present.

MetaCyc, KEGG, and (I believe) UM-BBD all have pathway data (sets of reactions comprising a pathway) which could be integrated into GO fairly easily, especially if some sort of systematic way to represent pathways could be dreamt up.

