Guidelines for catalytic activity terms

From GO Wiki
Revision as of 11:55, 30 October 2024 by Pascale (talk | contribs)

These are the GO ontology guidelines for catalytic activity and child terms.

When to create or obsolete a catalytic activity term

In scope for catalytic activity terms

  • Granularity of catalytic activity terms: The substrate specificity of MFs should represent the inferred, in vivo biological specificity of the gene product, which may or may not correspond exactly to the substrate used in the experiment.
  • Specific substrate and positional information: Positional information of modifications on proteins and RNA, such as targets of protein kinases, is usually not in the scope of GO, with the following exceptions:
    • histone code, for which the activity of histone methyltransferases on specific substrates is captured in the ontology, for example
      • GO:0042800 histone H3K4 methyltransferase activity
      • GO:0046976 histone H3K27 methyltransferase activity
      • Note that irreversible modifications that are not part of the histone code are not captured (see discussion).
    • non-coding RNA modifications
  • Bi-directional reactions
    • Normally, bi-directional reactions are represented as a single term that describes both directions of the reaction, unless there is a biological justification to separate the two directions into separate terms. An example of this exception is the following pair of terms, which are kept separate because they are part of different biological processes:
      • GO:0046932 sodium-transporting ATP synthase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ADP + phosphate + Na+(out) => ATP + H2O + Na+(in), by a rotational mechanism.
      • GO:0046962 sodium-transporting ATPase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ATP + H2O + Na+(in) -> ADP + phosphate + Na+(out), by a rotational mechanism.
  • Reactions catalyzed by different members of a complex
    • For example, the reactions of the glycine cleavage system, a multienzyme complex involved in the catabolism of glycine, are catalyzed by different subunits and are therefore represented by separate function terms.
    • The overall reaction of the complex is glycine + tetrahydrofolate + NAD = NH3 + 5,10-methylene-THF + CO2 + NADH, but there are two reactions catalyzed by two different enzymes. GO therefore has two separate molecular function terms:
      • GO:0004375 glycine dehydrogenase (decarboxylating) activity: Catalysis of the reaction: glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. (EC:1.4.4.2, RHEA:24304)
      • GO:0004047 aminomethyltransferase activity: Catalysis of the reaction: (6S)-tetrahydrofolate + S-aminomethyldihydrolipoylprotein = (6R)-5,10-methylenetetrahydrofolate + NH3 + dihydrolipoylprotein. (EC:2.1.2.10, RHEA:16945)

Out of scope for catalytic activity terms

  • Gene product names: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
    • Example: GO:0102113 hypoxia-inducible factor-asparagine oxygenase activity was merged into its parent GO:0036140 [protein]-asparagine 3-dioxygenase activity since it represented a gene product. Note that GO:0102113 is in EC and in RHEA, but these are outside the scope of GO.
  • Substrates beyond the specificity of known protein functions: e.g. GO does not include the term protein threonine kinase activity, as there is no known protein kinase that specifically acts on threonine. Hence, 'GO:0004674 protein serine/threonine kinase activity' does not have a descendant for protein threonine kinase activity. Note that RHEA represents this reaction, ATP + L-threonyl-[protein] = ADP + H+ + O-phospho-L-threonyl-[protein] (RHEA:46608), which is out of scope for GO.
  • Sub-reactions/Multi-step reactions are usually represented as a single GO MF, as long as the intermediates are immediately consumed in the reaction and a single product is released.
    • Example with 1 EC and 2 RHEAs:
      • GO:0004503 tyrosinase activity (tyrosinase EC:1.14.18.1) consists of two sub-reactions:
      • L-tyrosine + O2 = dopaquinone + H2O (RHEA:18117) and
      • 2 L-dopa + O2 = 2 dopaquinone + 2 H2O (RHEA:34287). GO only captures the overall reaction, which in this case is the same as EC.
    • Example with 1 RHEA and 2 ECs: GO:0004619 phosphoglyceromutase activity
      • RHEA:15901: (2R)-2-phosphoglycerate = (2R)-3-phosphoglycerate
      • EC:5.4.2.11 and EC:5.4.2.12 both describe the same overall reaction, (2R)-2-phosphoglycerate = (2R)-3-phosphoglycerate. However, EC:5.4.2.12 has intermediate steps, in which the enzyme is phosphorylated on a histidine residue during the reaction. The substrates and products are the same in both cases, therefore GO aligns with RHEA and considers these two reactions identical.
    • The exception to this is situations in which the intermediates are released and used in other biological processes. Note that for multi-step reactions, RHEA creates separate entries for the overall reaction and the individual single-step reactions, with the relationships between them noted in the ‘Comments’ section at the foot of the page. GO only creates a term for the overall reaction, and only adds the overall RHEA as an exactMatch xref.
  • Submolecular events: GO functions are described at the level of molecules rather than atoms. Therefore a reaction would not be split into function terms describing each step of the reaction in atomic or subatomic terms (e.g. electrons attracted to positive charge or formation of an unstable intermediate); GO captures the start and end states with respect to the molecules involved. As a consequence, separate function terms are not created to cover situations in which different reaction mechanisms provide the route between the same set of reactants and products.
  • Spontaneous reactions, in which covalent bonds are made/broken at a biologically relevant rate without an enzyme catalyst. Note that reactions that can occur spontaneously but that are catalyzed by an enzyme are described in GO; for example GO:0050453 obsolete cob(II)alamin reductase activity.

Pointers that indicate that a catalytic activity term should be merged/obsoleted (or not created)

  • The term lacks an EC/RHEA/MetaCyc/KEGG xref.
  • The term lacks any (EXP) annotations or relevant literature (and the term hasn’t just recently been requested).
  • The term only has a MetaCyc xref and the MetaCyc entry has the note: “No enzyme has been identified for this reaction”.
  • The term is the same as an existing term, as indicated by same term/def xref, or same participants written in a different order or using different chemical names.
  • The term describes a substrate-specific version of a more general activity, and no enzyme is known that distinguishes between these substrates among the common organisms that GO annotates. In these cases, the term usually has a RHEA or a MetaCyc as an exactMatch xref, and the proteins annotated to the RHEA terms for the different substrates are the same or greatly overlapping in UniProt.

Catalytic activity label, definition and parentage

Catalytic activity term label

  • By default, the term label is the EC 'accepted name', appended with 'activity'.
    • Exceptions:
      • The EC accepted name is more specific than the activity described. In this case, there is usually a more appropriate 'alternative name' in EC.
      • For consistency in term labels, GO term labels specify 'NADH/NADPH/[NAD(P)H]' even when EC does not. Editors are encouraged to add synonyms without the 'NADH/NADPH/NAD(P)H' to facilitate searching. This information is placed just before 'activity'; for example: 'alcohol dehydrogenase (NAD+) activity', 'D-arabinose 1-dehydrogenase [NAD(P)+] activity'
        • for [substrate] dehydrogenase: use one of 'NAD+/NADP+/NAD(P)+'
        • for [substrate] reductase: use one of 'NADH/NADPH/NAD(P)H' (note that this includes monooxygenase)
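
The labelling rules above can be sketched in a few lines of Python. This is only an illustration: the function name build_label and its parameters are hypothetical and not part of any GO tooling. It assumes the cofactor is passed in separately, and it puts a cofactor that already contains parentheses, such as NAD(P)+, in square brackets.

```python
from typing import Optional


def build_label(ec_accepted_name: str, cofactor: Optional[str] = None) -> str:
    """Assemble a GO-style label: EC accepted name, optional cofactor, 'activity'.

    Hypothetical helper for illustration only.
    """
    label = ec_accepted_name.strip()
    if cofactor:
        if "(" in cofactor:
            # e.g. NAD(P)+ goes in square brackets to avoid double parentheses
            label += f" [{cofactor}]"
        else:
            label += f" ({cofactor})"
    return f"{label} activity"


print(build_label("alcohol dehydrogenase", "NAD+"))
print(build_label("D-arabinose 1-dehydrogenase", "NAD(P)+"))
```

The two calls reproduce the example labels quoted above. Cases where the EC accepted name is too specific still need an editor's judgement.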

Catalytic activity definition

  • Definitions for catalytic activity usually are of the form Catalysis of the reaction: a + b = c + d., but there are exceptions where this pattern is too rigid.
  • If a reaction corresponds exactly to a RHEA or to an EC reaction, then the reaction part of the definition is copied from these sources.
  • GO represents undirected reactions, so = is used in the definition (and not <=>, ->, for example). Note that this is unlike EC, so care must be taken when copying reactions from EC.
  • Chemicals are represented without parentheses, as in RHEA
    • If you copy this from a RHEA page using the ‘copy the textual equation’ icon, then you’ll need to remove the parentheses in ‘H(+)’ etc to follow current GO policy. Hint: you can avoid having to do this by copying the definition text from a RHEA hit-list.
    • Current policy is to change ‘A’ in RHEA definitions to ‘acceptor’ in GO definitions. TBC
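
As a minimal sketch of the definition pattern, the Python function below turns a copied equation into an undirected GO-style definition. The name format_definition is hypothetical. It assumes direction markers are surrounded by whitespace, so that a locant such as (1->4) inside a chemical name is left alone. It does not handle charge parentheses or the 'A' to 'acceptor' change.

```python
import re

# Direction markers that may appear in copied equations; GO uses a plain "=".
# Whitespace is required on both sides so that "(1->4)" in a name is untouched.
_DIRECTION = re.compile(r"\s+(<=>|<->|=>|->)\s+")


def format_definition(equation: str) -> str:
    """Return 'Catalysis of the reaction: <undirected equation>.'"""
    undirected = _DIRECTION.sub(" = ", equation.strip())
    undirected = undirected.rstrip(".")  # avoid a doubled full stop
    return f"Catalysis of the reaction: {undirected}."


print(format_definition("a + b <=> c + d"))
```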

Catalytic activity term hierarchy

  • GO uses the EC hierarchy system as a primary axis of classification for the catalytic activity branch. GO contains terms that correspond to most partial ECs, except rare cases in which the partial EC class has too few members and GO has chosen to group at a higher level.
  • GO terms with EC xref should always have a parent that corresponds to the parent EC class. For example, tetrahydrofolylpolyglutamate synthase activity (GO:0004326) has xref EC:6.3.2.17, so should be a child of acid-amino acid ligase activity (GO:0016881), which has xref EC:6.3.2.-.
  • Other parents are allowed, for example GO:0050333 thiamine triphosphate phosphatase activity has the additional parent GO:0016462 pyrophosphatase activity, which itself is a child of EC:3.6.1.-, but is not represented in RHEA or EC.
    • Note that in this case, GO:0050333 thiamine triphosphate phosphatase activity, EC:3.6.1.28, is not a direct child of GO:0016818 hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides (corresponding to EC:3.6.1.-), but inherits the correct parent via GO:0016462 pyrophosphatase activity.
  • However, note that the GO sometimes has additional/intermediate grouping classes that do not exactly mirror the EC hierarchy, especially at the level of 4-digit ECs and below; some 4-digit EC terms may be parents of other 4-digit EC terms in GO.
  • For terms with a RHEA xref, check whether RHEA have included the reaction within a hierarchy at their site - check the ‘Related reactions’ section near the foot of the RHEA page. If present, the GO parentage should generally follow the RHEA hierarchy.
  • For catalytic activities without a corresponding EC, the closest reaction known should be used to guide GO editors to place the term at the proper place in the GO hierarchy.
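
The EC-based parentage rule can be illustrated with a small Python helper that derives the parent EC class of an EC number. The name parent_ec is hypothetical. The sketch assumes EC identifiers are written with four dot-separated fields and '-' for unspecified levels, as in the examples above.

```python
from typing import Optional


def parent_ec(ec: str) -> Optional[str]:
    """Return the parent EC class, e.g. 'EC:6.3.2.17' -> 'EC:6.3.2.-'.

    Returns None for a top-level class such as 'EC:6.-.-.-'.
    """
    number = ec[3:] if ec.startswith("EC:") else ec
    parts = number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    # Blank out the last specified field, scanning from right to left.
    for i in range(3, -1, -1):
        if parts[i] != "-":
            if i == 0:
                return None
            parts[i] = "-"
            return "EC:" + ".".join(parts)
    raise ValueError(f"no specified field in {ec!r}")


print(parent_ec("EC:6.3.2.17"))
```

For tetrahydrofolylpolyglutamate synthase activity (EC:6.3.2.17) this gives EC:6.3.2.-, the xref carried by acid-amino acid ligase activity (GO:0016881).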

NAD/NADP cofactors - different meaning of NAD(P)H in GO and in IUBMB: implications on GO hierarchy

    • In GO, reactions represented with NAD(P)H indicate that it is not known whether an enzyme uses NAD or NADP.
    • In IUBMB NAD(P)H indicates that the enzyme can use both, as indicated in Rule 18 in the IUBMB classification Rules: "Where the enzyme can use either coenzyme, this should be indicated by writing NAD(P)+".
      • The GO usage corresponds to the definition of NAD(P) in CHEBI:25524: A coenzyme that may be NAD or NADP. Therefore, it refers to either NAD (CHEBI:13389) or NADP (CHEBI:25523). To classify these types of reactions correctly the specific participants should be indicated in subclass relations. If a gene product can use both cofactors, then two annotations are made, one to the 'NAD+' term and one to the 'NADP+' term.
      • If the cofactor (NADH or NADPH) is unknown for a class of reactions, then GO creates a single GO term with (NAD(P)H).
      • If there are different RHEA reactions for the NADH- and NADPH-dependent reactions, we add these as NARROW xrefs to the general (NAD(P)H) term.
      • If the specific cofactor is known, we create one or both relevant reactions. In this case, we also create a general parent grouping class with [NAD(P)H] (or keep it, if it exists). The structure of the ontology is:
        • x [NAD(P)] activity (note that for 'NAD(P)', that expression is in brackets, to avoid double parentheses)
          • x (NAD) activity
          • x (NADP) activity
  • If the cofactor specificity is always known, then we mark the grouping class 'do not annotate'. An example of this is isocitrate dehydrogenase activity.

Catalytic activity definition and general cross-references (xref)

Definition xrefs

Preferred sources for database definition xrefs for catalytic activity terms

  • If a term has a RHEA xref, and that RHEA matches exactly the GO term definition, use that for the definition xref.
  • Otherwise, EC is the second preferred source of definition cross-references for catalytic activity terms.
  • If neither RHEA nor EC is available, it may be appropriate to use MetaCyc, KEGG, UM-BBD (to be confirmed; we have some instances but we may need to remove them), or MEROPS (for proteases).
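
The order of preference above can be expressed as a simple ranking. In this Python sketch the function name pick_definition_xref is hypothetical, and the prefixes other than RHEA and EC are assumed spellings. It also assumes the editor has already confirmed that each candidate matches the GO definition. Candidates outside RHEA and EC are treated as equally ranked, since the guideline does not order them.

```python
from typing import Iterable, Optional

# Lower rank is preferred; RHEA first, EC second, other databases tied.
_RANK = {
    "RHEA": 0,
    "EC": 1,
    "MetaCyc": 2,
    "KEGG_REACTION": 2,
    "UM-BBD": 2,
    "MEROPS": 2,
}


def pick_definition_xref(candidates: Iterable[str]) -> Optional[str]:
    """Pick the preferred database xref; None if no known database is present."""
    usable = [c for c in candidates if c.split(":", 1)[0] in _RANK]
    if not usable:
        return None
    # min() keeps the first candidate when ranks are tied
    return min(usable, key=lambda c: _RANK[c.split(":", 1)[0]])


print(pick_definition_xref(["EC:5.4.2.11", "RHEA:15901"]))
```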

Literature xrefs for definitions

  • As for any other term, any number of literature xrefs may be added to a definition.
  • If a paper is indexed by PubMed, add the PMID. Otherwise it is possible to use a DOI or the ISBN number.

General xrefs

  • GO editors manually maintain xrefs to Rhea, EC, MetaCyc and KEGG within the ontology.

RHEA

  • Ideally, a leaf term should always have a RHEA xref
  • xrefs to RHEAs use the ‘undefined’ reaction direction, except in cases where both reactions are known to occur under normal physiological conditions. For example:
    • GO:0046933 proton-transporting ATP synthase activity, rotational mechanism: RHEA:57722
    • GO:0046961 proton-transporting ATPase activity, rotational mechanism: RHEA:57721

Searching RHEA for a specific reaction (basic/advanced search)

If a GO term does not have a RHEA, or if a new term is requested but the annotator requesting the term was not able to find a RHEA, see the RHEA Decision Tree for adding RHEAs.

Strategies for searching RHEA:

  • For GO terms that do not have a RHEA, check if there are genes associated with the GO term; if there are, check in UniProt to see whether the annotated genes are associated with either a RHEA or an EC.
  • For newly requested terms, check UniProt
  • Search with a EC/MetaCyc/KEGG/Reactome ID
  • Search with a PubMed ID
  • Search with a chemical (full/partial name or ChEBI ID)
  • Search with a UniProt accession

RHEA filtering options

  • Search results can be filtered for the reaction type (by broad EC level), transport reactions, or by substrate (protein, nucleic acid).
  • Transport reactions: On RHEA search results pages, ‘Transport reactions’ can be filtered via the left-hand ‘Reaction types’ menu. RHEA pages featuring transport reactions can be identified by the turquoise tram symbol at the top left of a page.

  • If a GO leaf term doesn’t have a RHEA xref, a new RHEA should normally be requested via the feedback form
  • There are exceptions to this, e.g. children of ‘histone acetyltransferase activity’ (GO:0004402)
  • In these cases, we try to use an EC as a broadMatch for the general xref (see EC section); this way, the term is flagged as having been reviewed to find cross references but not having found an exact match.
  • Non-leaf terms may also have a RHEA xref where RHEA has made generic reactions, e.g. XXX
  • Note that RHEA has many substrate-specific reactions that are outside the scope of GO (e.g.). In these cases, either (i) if present, just add the appropriate generic RHEA and ignore the more specific RHEAs (e.g. XX); or (ii) if not present, add all of the specific RHEAs as narrowMatch xrefs (e.g. XX)
  • Pre-release RHEAs: Note that non-public RHEAs may sometimes be obtained through direct interactions with the RHEA team - these can be added to a GO term.
  • RHEA IDs that are not yet public are filtered out from the GO ontology release.
    • Note that if a not-yet-public RHEA ID is used as a definition cross-reference, there also needs to be another reference (usually a PMID), because if the RHEA gets filtered, the term will no longer have a definition cross-reference, and that will fail the QC check that all terms need a definition cross-reference.

EC

  • Ideally, a GO leaf term should always have an EC xref.
  • Partial ECs (i.e. 1-, 2- or 3-digit ECs) are grouping terms in EC. They should only be used as a term or definition xref on the equivalent GO grouping term, and should never be present on GO leaf terms.
  • Full, 4-digit ECs often correspond to GO leaf terms, but there are lots of exceptions such as where an EC doesn’t exist (new ECs are no longer routinely made), or the required depth of the GO classification goes beyond the limited 4-digit classification. Likewise, a 4-digit EC may correspond to a non-leaf term (because it describes a relatively general activity) and have subclasses that also have 4-digit EC xrefs.
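
The rule that partial ECs belong only on grouping terms could be checked along the following lines. This is a hedged Python sketch with hypothetical function names and toy data. GO:9999999 is an invented identifier used only to trigger the check, and treating GO:0004326 as a leaf is an assumption of the example.

```python
from typing import Dict, List, Set, Tuple


def is_partial_ec(xref: str) -> bool:
    """True for EC classes with an unspecified field, e.g. 'EC:6.3.2.-'."""
    return xref.startswith("EC:") and "-" in xref[3:].split(".")


def partial_ecs_on_leaves(
    xrefs: Dict[str, Set[str]], children: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """List (term, xref) pairs where a term without children has a partial EC."""
    problems = []
    for term in sorted(xrefs):
        if children.get(term):
            continue  # grouping term: a partial EC is expected here
        for xref in sorted(xrefs[term]):
            if is_partial_ec(xref):
                problems.append((term, xref))
    return problems


# Toy data; GO:9999999 is invented for the example.
XREFS = {
    "GO:0016881": {"EC:6.3.2.-"},
    "GO:0004326": {"EC:6.3.2.17"},
    "GO:9999999": {"EC:6.3.2.-"},
}
CHILDREN = {"GO:0016881": {"GO:0004326", "GO:9999999"}}

print(partial_ecs_on_leaves(XREFS, CHILDREN))
```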

MetaCyc

  • Like RHEA, MetaCyc has many substrate-specific reactions that are usually outside the scope of GO (e.g.). These are often identified by the MetaCyc entry containing the text “Note that this reaction equation differs from the official Enzyme Commission reaction equation for this EC entry” (e.g. RXN-10659). However, unlike RHEA, MetaCyc does not usually have generic reactions to group these substrate-specific terms. In these cases, add all of the specific MetaCyc IDs as narrowMatch xrefs (e.g. XX).

While most MetaCyc/KEGG reactions are included in RHEA, this isn’t always the case. Assuming the term is within GO scope, then it’s OK for a catalytic activity term to lack a RHEA xref but to have a MetaCyc and/or KEGG xref, though this should be relatively rare. (eg. XX) Xrefs to MetaCyc ‘pathways’ that comprise only a single reaction should be added as xrefs to GO catalytic activity MF terms.

KEGG

Reactome

Reactome reaction mappings are maintained by the Reactome team (the mapping file is Reactions2GoTerms_human.txt). Ontology editors should alert Reactome to any changes in the ontology that may impact the mappings.

UM-BBD

University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) reaction: These mappings were added to the GO in a single bulk load in 2003 (in the very early days of GO). They are no longer added or maintained by GO editors, but are kept in the ontology when correct.

xref types: exact, broad or narrow

  • If the term in GO and in the external database have the same meaning, the scope of the cross reference is exactMatch.
  • If the term in the external database represents a more specific concept than the GO term, for example if an external database has different entries for specific substrates, then it is added to the GO term as narrowMatch.
  • If the term in the external database represents a broader concept than the GO term, then it is added to the GO term as broadMatch.

TBC:

  • Note that a database xref can be an exactMatch on a single GO term, and that a GO term may only have a single exactMatch cross reference to any given external database.
  • Note that a GO term cannot have multiple exactMatch cross reference to a given external database
  • However, the same narrowMatch and broadMatch xrefs may be present on more than one GO term.

Maintaining xrefs up to date

  • External cross references can become obsolete, sometimes with a suggested replacement ID(s). EC and RHEA websites show the replacement ID, if it exists - e.g. XXX.
  • There is a QC check when making pull requests that flags obsolete and transferred RHEA and EC IDs.
  • Different scenarios:
    • xref is obsolete at source => remove xref on obsolete GO term
    • TBD: xref is valid at source, but GO have decided the corresponding GO term is out of scope => in this case, it would be useful to retain the valid xref within the GO, but need to decide whether this should be by retaining the xref on the obsolete GO term OR by adding the xref as a narrowMatch xref to a valid GO term.


See also https://wiki.geneontology.org/Guidelines_for_database_cross_references and https://wiki.geneontology.org/Obsoleting_an_Existing_Ontology_Term#5._Cross-references

Guidelines on chemical nomenclature

  • GO aligns with RHEA and uses the ‘UniProt-approved’ synonyms for chemicals in ChEBI, which don’t always match the official ChEBI names. Information on this (and other synonyms) can be found by clicking on a chemical in RHEA and following the link to ChEBI.
  • Different groups represent chemicals in different ways; EC uses parentheses for charges (for example, H(+)), while RHEA uses superscripts. GO uses neither superscripts nor parentheses (see #23904).

This is how GO spells common chemicals in textual definitions:

EC or MetaCyc    GO
CO(2)            CO2
dioxygen         O2
H(2)O            H2O
H(+)             H+
Hg(2+)           Hg2+
K(+)             K+
Mg(2+)           Mg2+
Na(+)            Na+
NAD(+)           NAD+
NADP(+)          NADP+
NH(4)(+)         NH4+
NH(3)            NH3
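
The spelling conventions above lend themselves to a small normalization helper. The Python sketch below is an assumption-laden illustration, not an existing GO script: the name to_go_spelling is hypothetical, and the regular expression simply drops parentheses around counts and charges. It has only been reasoned through for simple cases such as water, protons and metal ions; other chemical names would need checking by hand.

```python
import re

# Parenthesised counts or charges, e.g. "(2)", "(+)", "(2+)".
# Groups such as "(P)" in NAD(P)H or "(II)" in cob(II)alamin do not match.
_COUNT_OR_CHARGE = re.compile(r"\((\d+[+-]?|[+-])\)")
_DIOXYGEN = re.compile(r"\bdioxygen\b")


def to_go_spelling(text: str) -> str:
    """Rewrite EC/MetaCyc-style chemical spellings in GO style."""
    text = _COUNT_OR_CHARGE.sub(r"\1", text)
    return _DIOXYGEN.sub("O2", text)


print(to_go_spelling("ATP + H(2)O + Na(+)(in) = ADP + phosphate + Na(+)(out)"))
```

Compartment tags such as (in) and (out) are left untouched because they contain neither digits nor a charge sign.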

Links to enzyme databases

Reporting errors to external databases

  • While very reliable, external databases do sometimes make mistakes (e.g. wrong/missing xrefs, duplicate reactions, missing hierarchy relationships). To report errors or get feedback:

QC checks

RHEA to GO checks

  1. A single exactMatch RHEA xref is allowed per GO term
  2.  A RHEA cannot be shared between multiple GO terms (TBC WRT narrow/broad)
  3. Many narrowMatch RHEAs are allowed per GO term
  4.  can broad be shared ? TO CHECK

EC to GO checks

  1. A single exactMatch EC xref is allowed per GO term
  2.  An EC cannot be shared between multiple GO terms (TBC WRT narrow/broad)
  3. Many narrowMatch ECs are allowed per GO term
  4.  can broad be shared ? TO CHECK
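
The exactMatch rules for RHEA and EC could be checked roughly as follows. This Python sketch is not the actual GO QC implementation: the function name, the tuple layout and the xref scopes in the toy data are assumptions made for illustration, and GO:9999999 is an invented identifier. The narrowMatch and broadMatch rules are left out because they are still marked as to be confirmed.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Xref = Tuple[str, str, str]  # (GO term, external ID, scope)


def exact_match_violations(xrefs: Iterable[Xref]) -> List[str]:
    """Report terms with several exactMatch xrefs to one database,
    and external IDs used as exactMatch on more than one term."""
    per_term_db = defaultdict(set)  # (GO term, database) -> external IDs
    per_xref = defaultdict(set)     # external ID -> GO terms
    for go_id, ext_id, scope in xrefs:
        if scope != "exactMatch":
            continue
        database = ext_id.split(":", 1)[0]
        per_term_db[(go_id, database)].add(ext_id)
        per_xref[ext_id].add(go_id)

    problems = []
    for (go_id, database), ids in sorted(per_term_db.items()):
        if len(ids) > 1:
            problems.append(
                f"{go_id}: multiple exactMatch {database} xrefs: {', '.join(sorted(ids))}"
            )
    for ext_id, terms in sorted(per_xref.items()):
        if len(terms) > 1:
            problems.append(
                f"{ext_id}: exactMatch on multiple terms: {', '.join(sorted(terms))}"
            )
    return problems


# Toy input; the scopes are invented to exercise the check.
EXAMPLE = [
    ("GO:0004619", "RHEA:15901", "exactMatch"),
    ("GO:9999999", "RHEA:15901", "exactMatch"),
]
for line in exact_match_violations(EXAMPLE):
    print(line)
```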

MetaCyc to GO checks

Still fixing violations: https://github.com/geneontology/go-ontology/issues/28146

KEGG to GO checks

Still fixing violations: https://github.com/geneontology/go-ontology/issues/28146


Planned improvements

  • Check obsolete EC and RHEA >> see https://github.com/geneontology/go-ontology/issues/21347
  • Computationally pull in mappings to MetaCyc and KEGG reactions via the RHEA xref(s)
  • Partial ECs should not be allowed on leaf terms - either general xref or def xref
  • Check unicity of database cross reference exactMatches - (now only implemented for RHEA) also:
    • narrow xref can only be on 1 GO term
    • can 1 GO term have multiple narrow xrefs (for eg 2 narrow RHEAs)
    • broad matches are allowed on multiple terms
  • Def xref should ALSO be a general xref
  • The same database def cross-reference cannot be used on multiple terms: EC, RHEA, KEGG_REACTION, MetaCyc, UM-BBD, MEROPs, TCDB (PMIDs , DOIs, etc are OK)
  • We probably need to make a distinction between database cross-references and literature cross reference in some metadata file somewhere.
  • Need a mechanism for exceptions (subset?) for example proteases
  • Check that GO hierarchy follows EC hierarchy, i.e. if a term has a 4-digit EC, its parent must have the same first 3 digits (terms with no ECs are allowed as parents). Ideally this would be recursive for grandparents if the immediate parent doesn't have an EC. Need to define guidelines for when the partial EC is a grandparent; maybe in this case the grandparent linked to the partial EC needs to be asserted to simplify the check.
    • Example: GO:0050333 thiamine triphosphate phosphatase activity is_a GO:0016462 pyrophosphatase activity, which itself is a child of GO:0016818 hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides, but GO:0050333 is not a direct is_a GO:0016818.
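
One possible reading of the recursive check is to walk up the is_a parents of a term until an ancestor carrying the expected partial EC is found. The Python sketch below uses the thiamine triphosphate phosphatase example; the function name and the dictionary layout are hypothetical. It searches all ancestors, which is a simplification of the planned rule.

```python
from typing import Dict, List, Set


def has_ec_ancestor(
    term: str,
    expected_ec: str,
    parents: Dict[str, List[str]],
    ec_xrefs: Dict[str, Set[str]],
) -> bool:
    """True if any is_a ancestor of term carries the expected EC xref."""
    stack = list(parents.get(term, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if expected_ec in ec_xrefs.get(current, ()):
            return True
        # Keep climbing: the EC class may sit on a grandparent.
        stack.extend(parents.get(current, ()))
    return False


# is_a links and EC xrefs taken from the example above.
PARENTS = {
    "GO:0050333": ["GO:0016462"],
    "GO:0016462": ["GO:0016818"],
}
EC_XREFS = {
    "GO:0050333": {"EC:3.6.1.28"},
    "GO:0016818": {"EC:3.6.1.-"},
}

print(has_ec_ancestor("GO:0050333", "EC:3.6.1.-", PARENTS, EC_XREFS))
```

Here the expected class is found on the grandparent GO:0016818, even though the direct parent GO:0016462 has no EC xref.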

Review Status

Last reviewed: July 15, 2024

Reviewed by: Pascale Gaudet


Back to: Ontology editors' manual