Guidelines for new Molecular Functions

From GO Wiki

Molecular Function versus Biological Process

  • A GO Molecular Function is defined as a molecular process that can be carried out by the action of a single macromolecular machine, either a protein, a non-coding RNA, or a complex. Function in this sense denotes an action, or activity, that a gene product (or a complex) performs, and which usually involves direct physical interactions with other molecular entities. In GO, these actions are described from two distinct but related perspectives: (1) biochemical activity, and (2) role as a component in a larger system/process.
  • A GO Biological Process is a specific objective that an organism is genetically programmed to achieve. Biological processes are often described by their outcome or ending state, e.g., the biological process of cell division results in the creation of two daughter cells (a divided cell) from a single parent cell. A biological process is accomplished by a particular set of molecular functions carried out by specific gene products or macromolecular complexes, often in a highly regulated manner and in a particular sequence.
  • Molecular functions and biological processes should be orthogonal; i.e., a specific concept should correspond to either a molecular function or a biological process, but should not be represented in both aspects of the ontology.
    • Example of MF represented as a BP: 'GO:0016569 covalent chromatin modification' corresponds to either DNA methyltransferase activity or to a histone post-translational modification, resulting from a kinase activity, a methyltransferase activity, an acetyltransferase activity, etc., so this term was obsoleted.
    • Example of BP represented as a MF: 'GO:0008189 apoptosis inhibitor activity': this term represents the negative regulation of apoptosis, but not a specific molecular function.

GO Molecular Functions

  • MF substrate/target specificity: The substrate specificity of MFs should represent the inferred, in vivo biological specificity of the gene product, which may or may not correspond exactly to the substrate used in the experiment.
  • Compound functions: Some functions contain two or more inseparable steps. Examples are GO:0015611 ABC-type D-ribose transporter activity, which hydrolyzes ATP to provide energy for the transport of a substrate, and GO:0038023 signaling receptor activity, which binds to a ligand signal and acts to transmit that signal (for example by a conformational change that exposes a new binding site or by a modification of a downstream protein). In these cases, the subfunction is captured by the relation 'has part'; for example ABC transporters have the relation 'has part' some 'ATP hydrolysis activity'.
  • Specific substrate and positional information: Positional information about modifications on proteins and RNA, such as the targets of protein kinases, is usually not in the scope of GO, with the following exceptions:
    • histone code, for which the activity of histone methyltransferases on specific substrates is captured in the ontology, for example
      • GO:0042800 histone H3K4 methyltransferase activity
      • GO:0046976 histone H3K27 methyltransferase activity
      • Note that irreversible modifications that are not part of the histone code are not captured (see discussion).

Appending Terms with 'Activity'

All GO molecular function terms, with the exception of the root 'molecular function' and binding terms, end with the word 'activity'. This helps distinguish between a protein and its activity, for example, nuclease and nuclease activity.
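
The naming rule above can be checked mechanically. The following is a minimal sketch, assuming term labels are available as plain strings. The function name `follows_activity_convention` is hypothetical and not part of any GO tool, and treating any label that ends in 'binding' as a binding term is a simplifying assumption.

```python
# Hypothetical helper for illustration; not part of any GO tooling.
# Assumption: the root may be written with a space or an underscore.
ROOT_LABELS = {"molecular function", "molecular_function"}


def follows_activity_convention(label: str) -> bool:
    """Return True if a molecular function label follows the naming rule.

    The root term and binding terms are exempt; every other label is
    expected to end with the word 'activity'.
    """
    name = label.strip().lower()
    if name in ROOT_LABELS:
        return True
    # Assumption: a binding term is any label ending in the word 'binding'.
    if name == "binding" or name.endswith(" binding"):
        return True
    return name.endswith(" activity")
```

With this sketch, 'nuclease activity' and 'ATP binding' pass the check, while 'nuclease' alone is flagged because it names the protein rather than its activity.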

Enzymatic Reactions

Classifying Enzymatic Reactions

The function ontology has terms representing many of the enzymes in the Enzyme Commission (EC) database, and it uses the EC classification system to group and classify them. To stay in line with EC, proposed enzyme function terms should be checked in the EC database to ensure that the EC recommended name is used. Not all enzyme entries in the EC database are converted directly to single entries in the function ontology, because some enzymes carry out multiple functions.

If an EC number is given, the term can be added under the EC parent term. For example, thiamin-triphosphatase activity, EC:3.6.1.28, should be added under the parent EC:3.6.1.-, hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides; GO:0016818. A check of the proposed sibling terms should reveal similar reactions and EC numbers in the same range (EC:3.6.1.x in this case).
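
As an illustration of how the parent class and the sibling range follow from the EC number itself, here is a minimal sketch. The helper names `ec_parent` and `share_ec_parent` are hypothetical, and the code assumes identifiers written as `EC:a.b.c.d`, with `-` marking an unspecified level.

```python
# Hypothetical helpers for illustration; not part of any GO or EC tooling.
def ec_parent(ec_number: str) -> str:
    """Return the EC class one level above the given EC identifier."""
    prefix, sep, number = ec_number.partition(":")
    levels = number.split(".")
    if prefix != "EC" or sep != ":" or len(levels) != 4:
        raise ValueError(f"not an EC identifier: {ec_number!r}")
    # Blank out the last level that is still specified.
    for i in range(3, -1, -1):
        if levels[i] != "-":
            levels[i] = "-"
            return "EC:" + ".".join(levels)
    raise ValueError("EC:-.-.-.- has no parent class")


def share_ec_parent(ec_a: str, ec_b: str) -> bool:
    """Return True if two EC identifiers fall under the same parent class."""
    return ec_parent(ec_a) == ec_parent(ec_b)
```

For the example above, `ec_parent("EC:3.6.1.28")` gives `EC:3.6.1.-`, and `share_ec_parent` can be used to confirm that proposed sibling terms carry EC numbers in the same range.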

If an EC entry cannot be found for the enzyme, it may be worth checking some other databases for it. BRENDA contains the same enzymes as the EC database, but with a greater number of alternative names. MetaCyc and the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) both contain reactions not covered by EC, to which they have assigned a partial EC number (e.g. EC:2.1.-.-). Both of these databases can be used in the general dbxrefs; examples would be MetaCyc:GLYOXIII-RXN and UM-BBD_enzymeID:e0225 respectively.

Multi-step reaction

A multi-step reaction is represented as a single GO MF, as long as the intermediates are immediately consumed in the reaction and a single product is released. The exception is when the intermediates are released and used in other biological processes.

Bi-directional reactions

Normally, bi-directional reactions are represented as a single term that describes both directions of the reaction, unless there is a biological justification for separating the two directions into separate terms. An example of this exception is the following pair of terms:

  • GO:0046932 sodium-transporting ATP synthase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ADP + phosphate + Na+(out) => ATP + H2O + Na+(in), by a rotational mechanism.

and

  • GO:0046962 sodium-transporting ATPase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ATP + H2O + Na+(in) -> ADP + phosphate + Na+(out), by a rotational mechanism.

NAD/NADP cofactors

  • A reaction represented with NAD(P)H in GO indicates that it is not known whether an enzyme uses NAD or NADP (note that this is different from the IUBMB practice).
    • If the cofactor (NADH or NADPH) is unknown for a class of reactions, then we only create a single GO term with [NAD(P)H].
      • If there are RHEA IDs for both NAD and NADP-dependent reactions, we add these as NARROW xrefs to the general [NAD(P)H] term.
  • If the specific cofactor is known, we create one or both relevant reactions. In this case, we also create (or keep, if it exists) a general parent grouping class with [NAD(P)H]. The structure of the ontology is:
  • x activity NAD(P)
    • x activity NAD
    • x activity NADP
  • If an enzyme uses both, then it is annotated to both NAD and NADP terms.
  • If the cofactor specificity is always known, then we mark the grouping class 'do not annotate'. An example of this is isocitrate dehydrogenase activity.
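
The term structure described in this list can be sketched as a small function. This is an illustration only: the name `plan_nad_terms` is hypothetical, the labels follow the schematic 'x activity NAD(P)' form used above rather than real GO labels, and the 'do not annotate' marking is not modeled.

```python
from __future__ import annotations

# Hypothetical helper for illustration; labels are schematic, not real GO terms.
COFACTORS = ("NAD", "NADP")


def plan_nad_terms(activity: str, known_cofactors: set[str]) -> dict[str, list[str]]:
    """Map the NAD(P) grouping term to the specific child terms to create.

    With no known cofactor, only the general NAD(P) term is created.
    With one or both cofactors known, the matching child terms are
    created under the general grouping term.
    """
    unexpected = set(known_cofactors) - set(COFACTORS)
    if unexpected:
        raise ValueError(f"unexpected cofactor(s): {sorted(unexpected)}")
    parent = f"{activity} NAD(P)"
    children = [f"{activity} {c}" for c in COFACTORS if c in known_cofactors]
    return {parent: children}
```

Calling `plan_nad_terms("x activity", set())` yields only the grouping term with no children, while passing `{"NAD", "NADP"}` yields the grouping term with both specific children, matching the structure listed above.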

Out of scope

  • Gene product names: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
    • Example: GO:0102113 hypoxia-inducible factor-asparagine oxygenase activity was merged into its parent GO:0036140 [protein]-asparagine 3-dioxygenase activity since it represented a gene product. Note that GO:0102113 is in EC and in RHEA, but these are outside the scope of GO.
  • Substrates beyond the specificity of known protein functions: e.g. GO does not include the term protein threonine kinase activity, as there is no known protein kinase that specifically acts on threonine. Hence, 'GO:0004674 protein serine/threonine kinase activity' does not have a descendant for protein threonine kinase activity.
    • Note that RHEA represents this reaction: ATP + L-threonyl-[protein] = ADP + H+ + O-phospho-L-threonyl-[protein] (RHEA:46608).
  • Sub-reactions: If an enzyme catalyzes a multi-step reaction, only the overall reaction is defined as a GO function; for example tyrosinase EC:1.14.18.1 catalyzes two reactions: L-tyrosine + O2 = dopaquinone + H2O and 2 L-dopa + O2 = 2 dopaquinone + 2 H2O
  • Submolecular events: GO functions are described at the level of molecules rather than atoms. Therefore a reaction would not be split into function terms describing each step of the reaction in atomic or subatomic terms (e.g. electrons attracted to positive charge or formation of unstable intermediate); a function term captures the start and end states with respect to the molecules involved. As a consequence, separate function terms are not created to cover situations in which different reaction mechanisms provide the route between the same set of reactants and products.
  • Spontaneous reactions, in which covalent bonds are made/broken at a biologically relevant rate without an enzyme catalyst. Note that reactions that can occur spontaneously but that are catalyzed by an enzyme are described in GO; for example GO:0050453 obsolete cob(II)alamin reductase activity.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate Sequence Ontology (see the Open Biomedical Ontologies website for more information).
  • Protein domains or structural features.
  • Reactions that cannot be precisely described at the molecular level

For binding terms, the precise substrate must be known. For example you wouldn't say 'vesicle binding'; instead you would find out which protein in the vesicle membrane was being bound and use that in the term name. (BY THAT GUIDELINE, WE WOULD NOT CREATE 'PROTEIN COMPLEX BINDING??' membrane binding, etc? although we certainly should not have 'vesicle binding', the reason could be formulated more clearly)


Example: Multi-step reactions captured as multiple GO MFs

(+++ CHECK RHEA) - is this a good example??? Is this catalyzed by the same protein, looks like its a complex

Conversely, the reactions of the glycine cleavage system, a multienzyme complex involved in the catabolism of glycine, should be represented by separate function terms.

The overall reaction of the complex is glycine + tetrahydrofolate + NAD = NH3 + 5,10-methylene-THF + CO2 + NADH

but this can be split into steps, which, by the criteria above, warrant individual function terms.

GO molecular function terms:

GO:0004375 glycine dehydrogenase (decarboxylating) activity

  Catalysis of the reaction: glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. (EC:1.4.4.2, RHEA:24304)

GO:0004047 aminomethyltransferase activity

  Catalysis of the reaction: (6S)-tetrahydrofolate + S-aminomethyldihydrolipoylprotein = (6R)-5,10-methylenetetrahydrofolate + NH3 + dihydrolipoylprotein. (EC:2.1.2.10, RHEA:16945)

TO BE REVIEWED

Avoid Cellular Component Information

Cellular structures are not functions. Many cellular component references have been made obsolete in the function ontology. For example, the activity of a mitochondrial primase need only be represented as primase activity, because annotators can assign location to gene products by annotating with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.


Avoid Binding Relationships

See the binding working group pages on the GO wiki for more on binding and the current thinking on how to annotate binding.

Catalytic activities should not be related in the ontology to binding terms; for example, ATPase activity should not have a relationship to ATP binding hard-coded in the ontology. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.

Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental single-step activities that occur at the macromolecular level. That means that if we were to further subdivide these functions - for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge" - we would be saying that a reaction was actually a series of functions - i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology, as we would be dealing with events on a submolecular or atomic level.

Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?

Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.

Function Grouping Terms

Terms in the function ontology should be grouped on the basis of functional similarity, rather than being involved in the same process. For example, the grouping term monosaccharide transporter activity might have children such as glucose transporter activity and ribose transporter activity, and is a valid function term in its own right. However, the term defense/immunity protein activity, used to group terms such as antigen binding, blood coagulation factor activity and Fc receptor activity, is not a valid function as it represents a protein involved in the defense or immune response (process) of an organism. If a grouping term is not a function itself, or it contains disparate children with no functional similarity, it should be made obsolete.


