Database cross references
TO BE REVIEWED
- 1 Cross-referencing other databases
- 2 Cross-references for enzyme reactions
- 3 General Rules and Things of Note
- 3.1 Adding database cross-references for Definitions versus cross-references for Terms
- 3.2 Example 1: epi-cedrol synthase
- 3.3 Example 2: farnesol kinase
- 3.4 Example 3: phosphomethylethanolamine N-methyltransferase activity
- 3.5 Example 4: updating EC:1.5.3.11, a transferred EC entry
- 3.6 Multi-Step Enzyme Reactions
- 3.7 Review Status
Cross-referencing other databases
General database cross-references, or general dbxrefs, should be used where a GO term is identical to an object in another database. For more information on syntax, please refer to the GO File Format Guide; for a complete list of dbxrefs, see the database cross-references page.
| Ontology | Database | Example xref |
|---|---|---|
| Function | Transport Protein Database | TC:2.A.29.10.1 |
| Function | University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) | UM-BBD_enzymeID:e0310 |
| Function | MetaCyc metabolic pathway database | MetaCyc:XXXX-RXN |
| Process | MetaCyc metabolic pathway database | MetaCyc:2ASDEG-PWY |
| Process | University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) | UM-BBD_pathwayID:dcb |
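Every example in the table has the shape PREFIX:LOCAL_ID. A minimal Python sketch of splitting an xref on its first colon, assuming only that shape (the GO File Format Guide remains the authority on the full syntax); `split_xref` is a hypothetical helper, not part of any GO tool:

```python
def split_xref(xref: str) -> tuple[str, str]:
    """Split an xref such as 'TC:2.A.29.10.1' into (prefix, local ID).

    Only the first colon is treated as the separator, so any later colons
    stay in the local ID.
    """
    prefix, sep, local_id = xref.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:LOCAL_ID cross-reference: {xref!r}")
    return prefix, local_id


if __name__ == "__main__":
    for example in ("TC:2.A.29.10.1", "UM-BBD_enzymeID:e0310", "MetaCyc:2ASDEG-PWY"):
        print(split_xref(example))
```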
Cross-references for enzyme reactions
GO uses Rhea for the definitions of reactions. At some point in the future, we will use the Rhea xrefs to populate definitions automatically and to classify reactions by their participants. In addition, we will use Rhea as the source of information for EC xrefs when a Rhea xref is available.
There are five websites that are particularly useful when adding reaction terms. These are:
Note that those cross-references should only be applied to MF terms.
General Rules and Things of Note
Adding database cross-references for Definitions versus cross-references for Terms
* Add a database cross-reference whenever you can.
* If the database was used to help generate the term Definition, add the database cross-reference to the term Definition directly.
* If you see a case where there is a database cross-reference for the Definition but none for the term, check the xref and add it to the term if appropriate.
* If the database cross-reference in the text Definition is a partial EC number, remove it.
* Since we will use Rhea as the source of term definitions and inferences in the future, Rhea xrefs are special: use them only if the Rhea reaction and the GO term correspond 1:1. If the reaction is generic (bidirectional), use the Rhea identifier for the agnostic reaction, i.e. the one that has '=' between the reactants and products, as opposed to '<=>' or '=>'.
Use the undirected Rhea reaction (=) unless the reaction is known to occur in only one direction under physiological conditions.
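To make the '=' versus '<=>'/'=>' distinction concrete, here is a minimal Python sketch that reads the direction operator out of an equation string and rewrites it as '=' for use in a GO definition. It assumes exactly one operator per equation, surrounded by spaces; treating '<=' as right-to-left is an assumption not stated above, and both function names are hypothetical:

```python
import re

# Direction operators in Rhea-style equations. '=', '=>' and '<=>' are
# described in the rules above; '<=' for right-to-left is an assumption.
_DIRECTION = {
    "=": "undirected",
    "=>": "left-to-right",
    "<=": "right-to-left",
    "<=>": "bidirectional",
}

# Longest alternative first, so '<=>' is not read as '<=' or '='.
_OPERATOR = re.compile(r"\s(<=>|=>|<=|=)\s")


def reaction_direction(equation: str) -> str:
    """Return the direction implied by the single operator in `equation`."""
    found = _OPERATOR.findall(equation)
    if len(found) != 1:
        raise ValueError(f"expected exactly one direction operator: {equation!r}")
    return _DIRECTION[found[0]]


def to_undirected(equation: str) -> str:
    """Rewrite the direction operator as '=' (the agnostic form)."""
    return _OPERATOR.sub(" = ", equation, count=1)


if __name__ == "__main__":
    intenz = "2-trans,6-trans-farnesyl diphosphate + H2O <=> epi-cedrol + diphosphate"
    print(reaction_direction(intenz))
    print(to_undirected(intenz))
```

For example, the IntEnz equation for epi-cedrol synthase is reported as bidirectional, and the rewritten string matches the '=' form used in that term's definition.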
The Enzyme Commission names and classifies enzymes according to the reactions they catalyze. Just as a gene product may participate in several different processes, it may catalyze more than one reaction, and several different EC enzymes may catalyze the same reaction. The ontology should contain each reaction, even if it is enabled by a single gene product. Gene annotators should associate a gene product with every reaction (molecular function) it can catalyze or, in the case of a GO-CAM model, with the specific reaction that happens in the model.
This means that there is not a 1:1 correspondence between EC numbers and GO reaction terms.
There are a number of websites that mirror the EC data; one that is particularly useful is IntEnz, which shows the corresponding Rhea reactions and so makes it easy to check that the two resources agree.
Precise vs. Imprecise EC Numbers
GO has terms that represent the categories used by EC. These have EC xrefs of the form EC:n, EC:n.n and EC:n.n.n (where n is a number).
For reactions where the enzyme has not yet been added to EC, but it can be put into one of the EC categories, the xref should be of the form EC:n.n.n.-, i.e. ending with a dash.
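The forms above can be told apart mechanically. A minimal Python sketch, assuming EC xrefs are written exactly as EC:n, EC:n.n, EC:n.n.n, EC:n.n.n.n or EC:n.n.n.- (preliminary serial numbers such as 'n1' are not handled); `classify_ec` and `ec_categories` are hypothetical helpers. The second function lists the category xrefs above a given EC xref, which is where to start looking for a parent GO term; the same classification could also flag partial EC numbers in Definition xrefs, which the rules above say should be removed.

```python
def classify_ec(xref: str) -> str:
    """Classify an EC xref as 'full', 'category' or 'incomplete' (EC:n.n.n.-)."""
    if not xref.startswith("EC:"):
        raise ValueError(f"not an EC xref: {xref!r}")
    parts = xref[3:].split(".")
    *head, last = parts
    if len(parts) > 4 or not all(p.isdigit() for p in head):
        raise ValueError(f"malformed EC xref: {xref!r}")
    if last == "-":
        if len(parts) != 4:
            raise ValueError(f"a dash is expected only in the fourth position: {xref!r}")
        return "incomplete"
    if not last.isdigit():
        raise ValueError(f"malformed EC xref: {xref!r}")
    return "full" if len(parts) == 4 else "category"


def ec_categories(xref: str) -> list[str]:
    """List the EC category xrefs above `xref`, most specific first."""
    classify_ec(xref)  # validate the input first
    parts = xref[3:].split(".")
    digits = [p for p in parts if p != "-"]
    deepest = min(len(parts) - 1, len(digits))
    return ["EC:" + ".".join(digits[:n]) for n in range(deepest, 0, -1)]


if __name__ == "__main__":
    for example in ("EC:2.7.1.-", "EC:4.2.3"):
        print(example, classify_ec(example), ec_categories(example))
```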
One EC number, multiple reactions
There are a number of cases where an enzyme can catalyse a set of reactions. These may or may not be specified by EC, but KEGG and MetaCyc will often show additional reactions. Similarly, there are often different EC enzymes that will catalyse the same reaction. A good example of this overlap is found in EC:1.5.3.13, 14, 15, 16 and 17. Looking at IntEnz, there are four reactions for EC:126.96.36.199; if we then look at EC:188.8.131.52, we can see that one of the reactions from EC:184.108.40.206 can be catalysed by this enzyme, too. KEGG shows this data more clearly: when viewing all the reactions for EC:220.127.116.11 (click 'Show all' on the enzyme data page), each reaction lists the EC numbers of the enzymes that can catalyse it. MetaCyc also lists a number of reactions for each EC number.
At present, MetaCyc reactions are associated with one EC number, so if two different EC enzymes catalyze the same reaction, there will be two MetaCyc reactions, one for each EC number.
KEGG makes reactions independent of the EC number; you can look up an EC number and see the reactions that the enzyme performs (e.g. EC:18.104.22.168), or you can look up a reaction and see which EC enzymes perform that reaction (e.g. R01036). Nifty!
Reactome provide mappings of their terms to GO terms, so they do the work for us! Whenever a release occurs, GO retrieves the updated mappings and the ontology is updated appropriately.
Here is the view of the IUBMB committee (September 2013), as clarified by Keith Tipton and other members after canvassing by Kristian Axelson and Alan Bridge:
Our classification rules (on both websites) clearly state in rule 18: "Where the enzyme can use either coenzyme, this should be indicated by writing NAD(P)+".
For further info see rule 18 on systematic names at http://www.chem.qmul.ac.uk/iubmb/enzyme/rules.html
So the meaning is really: "the enzyme can use both", rather than "the reaction may contain either".
alditol + NAD(P)+ = aldose + NAD(P)H + H+
means that the enzyme performs
alditol + NAD+ = aldose + NADH + H+
alditol + NADP+ = aldose + NADPH + H+
HOWEVER, this conflicts with the way that GO uses ChEBI. In ChEBI, NAD(P) (CHEBI:25524) is defined as "a coenzyme that may be NAD or NADP"; it therefore refers to either NAD (CHEBI:13389) or NADP (CHEBI:25523). To classify these reactions correctly, the specific participants should be indicated in subclass relations, i.e. one child term using NAD and one using NADP. For a gene product that can use both coenzymes, the information should be captured at the annotation level by annotating to both children.
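Under the IUBMB reading, an NAD(P) equation stands for two concrete reactions, and in GO each of these becomes its own child term. A minimal Python sketch of that expansion, assuming the cofactor is written literally as 'NAD(P)' (as in NAD(P)+ and NAD(P)H); `expand_nadp` is a hypothetical helper:

```python
def expand_nadp(equation: str) -> list[str]:
    """Return the NAD and NADP variants of an equation written with NAD(P).

    Equations that do not mention NAD(P) are returned unchanged.
    """
    if "NAD(P)" not in equation:
        return [equation]
    return [equation.replace("NAD(P)", cofactor) for cofactor in ("NAD", "NADP")]


if __name__ == "__main__":
    for variant in expand_nadp("alditol + NAD(P)+ = aldose + NAD(P)H + H+"):
        print(variant)
    # alditol + NAD+ = aldose + NADH + H+
    # alditol + NADP+ = aldose + NADPH + H+
```

The two printed equations are the NAD-specific and NADP-specific children; a gene product that can use both cofactors is then annotated to both.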
Example 1: epi-cedrol synthase
Add a term for EC 4.2.3.75, epi-cedrol synthase
- Check that the reaction does not already exist in GO by searching on the name, EC number, reactants and Rhea ID. I searched for 'epicedrol' and 'epi-cedrol'.
- Look up the reaction in EC (using IntEnz), MetaCyc and KEGG.
- IntEnz: EC 4.2.3.75
2-trans,6-trans-farnesyl diphosphate + H2O <=> epi-cedrol + diphosphate
- MetaCyc: RXN-10004
(2E,6E)-farnesyl diphosphate + H2O <=> 8-epi-cedrol + diphosphate
- KEGG: R09140
trans,trans-Farnesyl diphosphate + H2O <=> 8-epi-Cedrol + Diphosphate
Check against the RHEA reaction, RHEA:26118 (linked from IntEnz) so that we can be sure we're using the correct nomenclature.
Names and synonyms: KEGG and EC both give us "(2E,6E)-farnesyl-diphosphate diphosphate-lyase (8-epi-cedrol-forming)", which is the systematic name, according to EC. We also have "8-epicedrol synthase" and "epicedrol synthase".
Parentage: find the GO term for the category EC:4.2.3; if any of the children are relevant, use them as the parent.
name: epi-cedrol synthase activity
def: "Catalysis of the reaction: 2-trans,6-trans-farnesyl diphosphate + H2O = epi-cedrol + diphosphate." [RHEA:26118]
synonym: "(2E,6E)-farnesyl-diphosphate diphosphate-lyase (8-epi-cedrol-forming) activity" EXACT systematic_synonym [EC:4.2.3.75]
synonym: "8-epicedrol synthase activity" EXACT []
synonym: "epicedrol synthase activity" EXACT []
xref: EC:4.2.3.75
xref: MetaCyc:RXN-10004
xref: KEGG:R09140
xref: RHEA:26118
is_a: GO:0016838 ! carbon-oxygen lyase activity, acting on phosphates
Example 2: farnesol kinase
definition: farnesol + an NTP = farnesol phosphate + an NDP
EC: 2.7.1.-
One example of a more specific case of this is: MetaCyc RXN-11625
PMID 21395888
PMID 10557276
NARROW synonym: trans,trans-farnesol kinase
NARROW synonym: 2-trans, 6-trans-farnesol kinase
- Look up the MetaCyc reaction, RXN-11625:
2-trans,6-trans-farnesol + CTP = 2-trans,6-trans-farnesyl monophosphate + CDP + H+
- Search GO, EC, KEGG and RHEA for farnesol. No results for reactions of a similar form.
- From the literature references, it is not clear whether the farnesol reactions are limited to the 2-trans,6-trans isomer, so we'll refer to 'farnesol' in the reaction.
- A ChEBI search for farnesol phosphates turns up nothing; however, "farnesyl phosphate" is the parent of "farnesyl diphosphate", so we use the name "farnesyl monophosphate" rather than "farnesol phosphate" for the reaction product.
- Parentage: MetaCyc gives an EC ref of 2.7.1.- for RXN-11625; this corresponds to GO:0016773. The ChEBI hierarchy for "farnesyl phosphate" might hint at generic terms under GO:0016773, but there don't seem to be any. (N.B. a 'prenol kinase' term was added later, which would be a more appropriate parent.)
- Reaction equation: NTP and NDP are referred to in ChEBI as nucleoside triphosphate and nucleoside diphosphate.
name: farnesol kinase activity
def: "Catalysis of the reaction: farnesol + nucleoside triphosphate = farnesyl monophosphate + nucleoside diphosphate." [MetaCyc:RXN-11625]
synonym: "trans,trans-farnesol kinase activity" NARROW []
xref: EC:2.7.1.-
is_a: GO:0016773 ! phosphotransferase activity, alcohol group as acceptor
- Add the cited MetaCyc reaction as a child of this new term; I named it "2-trans,6-trans-farnesol kinase activity" to reflect the specific substrate.
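The rule "if there is a database cross-reference for the Definition and none for the term, check the xref" can be turned into a quick check over a stanza. A minimal Python sketch, assuming one field per line in the name:/def:/xref: layout used on this page, and assuming PMID and GOC references are never expected as term xrefs; the function name is hypothetical:

```python
import re

# Matches: def: "text" [XREF, XREF, ...]
_DEF_LINE = re.compile(r'^def:\s*"(?:[^"\\]|\\.)*"\s*\[([^\]]*)\]')
_XREF_LINE = re.compile(r"^xref:\s*(\S+)")
# Assumption: literature and curator references are not expected as term xrefs.
_NOT_DATABASES = {"PMID", "GOC"}


def def_xrefs_missing_from_term(stanza: str) -> list[str]:
    """Return database xrefs cited in the def line but absent from the xref lines."""
    def_refs: list[str] = []
    term_refs: set[str] = set()
    for raw in stanza.splitlines():
        line = raw.strip()
        if m := _DEF_LINE.match(line):
            def_refs = [r.strip() for r in m.group(1).split(",") if r.strip()]
        elif m := _XREF_LINE.match(line):
            term_refs.add(m.group(1))  # any quoted description is ignored
    return [
        ref for ref in def_refs
        if ref.split(":", 1)[0] not in _NOT_DATABASES and ref not in term_refs
    ]


FARNESOL_KINASE = '''
name: farnesol kinase activity
def: "Catalysis of the reaction: farnesol + nucleoside triphosphate = farnesyl monophosphate + nucleoside diphosphate." [MetaCyc:RXN-11625]
synonym: "trans,trans-farnesol kinase activity" NARROW []
xref: EC:2.7.1.-
is_a: GO:0016773 ! phosphotransferase activity, alcohol group as acceptor
'''

if __name__ == "__main__":
    print(def_xrefs_missing_from_term(FARNESOL_KINASE))  # ['MetaCyc:RXN-11625']
```

Here the flagged xref was deliberately not added to the term, because RXN-11625 belongs on the more specific child created in the last step. The check only surfaces candidates for a curator; whether to add them is still the "if appropriate" judgement from the rules.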
Example 3: phosphomethylethanolamine N-methyltransferase activity
Def: Catalysis of the reaction: phosphomethylethanolamine (PMEA) + AdoMet -> phosphodimethylethanolamine
Ref: GOC:tb PMID 20650897
Searching for the enzyme name brings up no results in GO, EC, MetaCyc and KEGG, so let's look up the reaction instead.
Look up all three compounds mentioned in MetaCyc and KEGG.
Check the reactions for these compounds.
- KEGG: R06868 looks like a match:
S-Adenosyl-L-methionine + N-Methylethanolamine phosphate <=> S-Adenosyl-L-homocysteine + Phosphodimethylethanolamine
- MetaCyc: RXN-5642 looks like a match:
N-methylethanolamine phosphate + S-adenosyl-L-methionine <=> N-dimethylethanolamine phosphate + S-adenosyl-L-homocysteine + H+
- Check that N-dimethylethanolamine phosphate (from the MetaCyc reaction) is also known as phosphodimethylethanolamine
- phosphodimethylethanolamine is a synonym on the MetaCyc compound page; the KEGG compound ID C13482 matches that in the KEGG reaction
- If in doubt, search for the compound in ChEBI and check the synonyms.
- MetaCyc states that the reaction is one of three catalysed by EC:2.1.1.103, so go to IntEnz and look up EC 2.1.1.103. Although the comments mention the subsequent reactions, the reaction list doesn't, so we will use the more generic EC:2.1.1.- as a reference.
- Get the ChEBI names for the substances and generate a balanced equation. Check to see if the reaction is in Rhea. I looked at the automatic xrefs for N-methylethanolamine phosphate in ChEBI and clicked on the Rhea xrefs. RHEA:25322 is a match! Checking the xrefs for the Rhea reaction, they match the reactions in KEGG and MetaCyc that we found earlier.
- Term name: a quick Google search reveals that 'phosphomethylethanolamine N-methyltransferase' appears to be the most common name for this term.
- Synonyms: added the KEGG name for the reaction as an EXACT synonym of type 'systematic_synonym'; also added a synonym that uses the ChEBI name for the chemical instead of phosphomethylethanolamine.
- Term parentage: this term can go under N-methyltransferase activity.
name: phosphomethylethanolamine N-methyltransferase activity
def: "Catalysis of the reaction: N-methylethanolamine phosphate + S-adenosyl-L-methionine = N,N-dimethylethanolamine phosphate + S-adenosyl-L-homocysteine + H(+)." [RHEA:25322, KEGG:R06868, MetaCyc:RXN-5642]
synonym: "N-methylethanolamine phosphate N-methyltransferase activity" EXACT []
synonym: "S-adenosyl-L-methionine:methylethanolamine phosphate N-methyltransferase activity" EXACT systematic_synonym [KEGG:R06868]
xref: EC:2.1.1.-
xref: KEGG:R06868
xref: MetaCyc:RXN-5642
xref: RHEA:25322
is_a: GO:0008170 ! N-methyltransferase activity
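The "generate a balanced equation" step can be checked mechanically by summing atoms and charges on each side. A minimal Python sketch; the species table is hypothetical, with formulas and charges written out by hand for the pH 7.3 forms of the participants in the definition above, and every entry should be verified against ChEBI rather than trusted:

```python
import re
from collections import Counter

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

# Hypothetical species table: name -> (formula, charge), transcribed by hand
# for the pH 7.3 major microspecies; check each entry against ChEBI.
SPECIES = {
    "N-methylethanolamine phosphate": ("C3H9NO4P", -1),
    "S-adenosyl-L-methionine": ("C15H23N6O5S", +1),
    "N,N-dimethylethanolamine phosphate": ("C4H11NO4P", -1),
    "S-adenosyl-L-homocysteine": ("C14H20N6O5S", 0),
    "H(+)": ("H", +1),
}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C3H9NO4P' (no brackets)."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    atoms: Counter = Counter()
    for element, count in _ELEMENT.findall(formula):
        atoms[element] += int(count) if count else 1
    return atoms


def side_totals(side: list[tuple[int, str]]) -> tuple[Counter, int]:
    """Sum atoms and charge over (coefficient, species name) pairs."""
    atoms: Counter = Counter()
    charge = 0
    for coefficient, name in side:
        formula, species_charge = SPECIES[name]
        for element, count in parse_formula(formula).items():
            atoms[element] += coefficient * count
        charge += coefficient * species_charge
    return atoms, charge


def is_balanced(left: list[tuple[int, str]], right: list[tuple[int, str]]) -> bool:
    """True if both sides carry the same atoms and the same net charge."""
    return side_totals(left) == side_totals(right)


LEFT = [(1, "N-methylethanolamine phosphate"), (1, "S-adenosyl-L-methionine")]
RIGHT = [(1, "N,N-dimethylethanolamine phosphate"),
         (1, "S-adenosyl-L-homocysteine"), (1, "H(+)")]

if __name__ == "__main__":
    print(is_balanced(LEFT, RIGHT))       # True
    print(is_balanced(LEFT, RIGHT[:-1]))  # False: without H(+), H and charge differ
```

The second call shows why the H(+) in the Rhea equation matters: dropping it leaves the right-hand side one hydrogen and one positive charge short.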
Example 4: updating EC:1.5.3.11, a transferred EC entry
Transferred entry: polyamine oxidase. Now included with EC 1.5.3.13 N1-acetylpolyamine oxidase, EC 1.5.3.14 polyamine oxidase (propane-1,3-diamine-forming), EC 1.5.3.15 N8-acetylspermidine oxidase (propane-1,3-diamine-forming), EC 1.5.3.16 spermine oxidase and EC 1.5.3.17 non-specific polyamine oxidase
This is a tricky entry as there is a lot of overlap between the reactions that each enzyme catalyses. The best way to handle it is to copy out all the reactions (either from IntEnz or KEGG) and then see which are duplicated. E.g.
EC:22.214.171.124:
[RHEA:25815] N1-acetylspermidine + H2O + O2 <=> 3-acetamidopropanal + H2O2 + putrescine
[RHEA:25803] N1-acetylspermine + H2O + O2 <=> 3-acetamidopropanal + H2O2 + spermidine
[RHEA:25871] N1,N12-diacetylspermine + H2O + O2 <=> 3-acetamidopropanal + N1-acetylspermidine + H2O2
[RHEA:25811] H2O + O2 + spermidine <=> 3-aminopropanal + H2O2 + putrescine
[RHEA:25807] H2O + O2 + spermine <=> 3-aminopropanal + H2O2 + spermidine

EC:126.96.36.199:
[RHEA:25807] H2O + O2 + spermine <=> 3-aminopropanal + H2O2 + spermidine

EC:188.8.131.52:
[RHEA:25803] N1-acetylspermine + H2O + O2 <=> 3-acetamidopropanal + H2O2 + spermidine
[RHEA:25807] H2O + O2 + spermine <=> 3-aminopropanal + H2O2 + spermidine
[RHEA:25811] H2O + O2 + spermidine <=> 3-aminopropanal + H2O2 + putrescine
[RHEA:25815] N1-acetylspermidine + H2O + O2 <=> 3-acetamidopropanal + H2O2 + putrescine
From these lists, we can see that RHEA:25807 will have EC refs 184.108.40.206, 220.127.116.11 and 18.104.22.168; RHEA:25815 will have EC refs 22.214.171.124 and 126.96.36.199; and so on. The KEGG reaction display makes it easier to check which reactions are linked with which EC numbers once you have figured out the correspondence between RHEA IDs and KEGG IDs. KEGG also provides names for the reactions; there was one case where a reaction name clashed with an existing GO MF term, so I made the new term name more specific whilst keeping to the nomenclature conventions used by the other terms.
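The copy-out-and-compare step amounts to inverting the EC-to-reaction listing into a reaction-to-EC view, which is what the KEGG reaction display shows. A minimal Python sketch; the three EC entries are replaced by placeholder labels (EC_FIRST, EC_SECOND, EC_THIRD, in the order of the lists above), and only the Rhea IDs are taken from the lists:

```python
from collections import defaultdict

# Reaction lists copied from the three EC entries above, in the order given.
# EC_FIRST/EC_SECOND/EC_THIRD are placeholder labels, not real EC numbers.
EC_TO_REACTIONS = {
    "EC_FIRST": {"RHEA:25815", "RHEA:25803", "RHEA:25871", "RHEA:25811", "RHEA:25807"},
    "EC_SECOND": {"RHEA:25807"},
    "EC_THIRD": {"RHEA:25803", "RHEA:25807", "RHEA:25811", "RHEA:25815"},
}


def invert(ec_to_reactions: dict[str, set[str]]) -> dict[str, set[str]]:
    """Turn an EC -> reactions listing into reaction -> ECs."""
    reaction_to_ecs: defaultdict[str, set[str]] = defaultdict(set)
    for ec, reactions in ec_to_reactions.items():
        for reaction in reactions:
            reaction_to_ecs[reaction].add(ec)
    return dict(reaction_to_ecs)


if __name__ == "__main__":
    for reaction, ecs in sorted(invert(EC_TO_REACTIONS).items()):
        print(reaction, sorted(ecs))
```

RHEA:25807 comes out with three EC labels and RHEA:25815 with two, matching the counts worked out by hand; each reaction's set becomes the EC xrefs on the corresponding GO term.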
There ended up being a lot of new terms created; here's a sample:
name: spermine:oxygen oxidoreductase (spermidine-forming) activity
def: "Catalysis of the reaction: H(2)O + O(2) + spermine = 3-aminopropanal + H(2)O(2) + spermidine." [RHEA:25807]
xref: EC:188.8.131.52
xref: EC:184.108.40.206
xref: EC:220.127.116.11
xref: KEGG:R09076
xref: MetaCyc:18.104.22.168-RXN
xref: MetaCyc:RXN-9015
xref: RHEA:25807 "H(2)O + O(2) + spermine = 3-aminopropanal + H(2)O(2) + spermidine"

name: spermidine:oxygen oxidoreductase (3-aminopropanal-forming) activity
def: "Catalysis of the reaction: H(2)O + O(2) + spermidine = 3-aminopropanal + H(2)O(2) + putrescine." [RHEA:25811]
xref: EC:22.214.171.124
xref: EC:126.96.36.199
xref: KEGG:R09077
xref: MetaCyc:RXN-10461
xref: MetaCyc:RXN-12089
xref: RHEA:25811 "H(2)O + O(2) + spermidine = 3-aminopropanal + H(2)O(2) + putrescine"

name: N1-acetylspermine:oxygen oxidoreductase (3-acetamidopropanal-forming) activity
def: "Catalysis of the reaction: H(2)O + N(1)-acetylspermine + O(2) = 3-acetamidopropanal + H(2)O(2) + spermidine." [RHEA:25803]
xref: EC:188.8.131.52
xref: EC:184.108.40.206
xref: KEGG:R03899
xref: MetaCyc:RXN-12090
xref: MetaCyc:RXN-9940
xref: RHEA:25803 "H(2)O + N(1)-acetylspermine + O(2) = 3-acetamidopropanal + H(2)O(2) + spermidine"

name: N1-acetylspermidine:oxygen oxidoreductase (3-acetamidopropanal-forming) activity
def: "Catalysis of the reaction: H(2)O + N(1)-acetylspermidine + O(2) = 3-acetamidopropanal + H(2)O(2) + putrescine." [RHEA:25815]
xref: EC:220.127.116.11
xref: EC:18.104.22.168
xref: KEGG:R09074
xref: MetaCyc:RXN-12091
xref: MetaCyc:RXN-9942
xref: RHEA:25815
There were also extra reactions in KEGG and MetaCyc that weren't in the EC listings; whether to add these depends on whether the person requesting the terms has asked for them, and on your own judgement.
Multi-Step Enzyme Reactions
- When an enzyme reaction is multi-step, but with just one EC number, add one GO term for the overall reaction, and add a comment to say the reaction is multi-step (and whether one of the steps is spontaneous).
- Comment: This is a multi-step reaction, with a spontaneous second step.
E.g. EC:1.14.13.109: http://www.chem.qmul.ac.uk/iubmb/enzyme/EC1/14/13/109.html
Accepted name: abieta-7,13-dien-18-ol hydroxylase
Reaction: abieta-7,13-dien-18-ol + NADPH + H+ + O2 = abieta-7,13-dien-18-al + NADP+ + 2 H2O (overall reaction)
(1a) abieta-7,13-dien-18-ol + NADPH + H+ + O2 = abieta-7,13-dien-18,18-diol + NADP+ + H2O
(1b) abieta-7,13-dien-18,18-diol = abieta-7,13-dien-18-al + H2O (spontaneous)
- An update on this issue (from Becky, 16.10.2012):
We decided that if the enzyme catalyzed all steps, and all steps always occurred together, we would add GO terms for the separate steps as HAS_PART children of the overall reaction. The problem came with naming the individual steps: the plan was to take the names directly from KEGG, but KEGG doesn't always name the separate steps. I emailed them about it but haven't heard back yet.
Some of the background is at the top of this SF item: https://sourceforge.net/tracker/?func=detail&aid=3510070&group_id=36855&atid=440764
- Example of this in action with pictures: http://www.ebi.ac.uk/panda/jira/browse/GOHELP-75
So at the moment, most of these GO terms have been added with a definition specifying that it is a 2-step, 3-step or multi-step reaction (e.g. GO:0036209), ready for HAS_PART children to be added once we've sorted out how to name them.