Guidelines for new Molecular Functions
Molecular Function versus Biological Process
A GO Molecular Function is defined as a molecular process that can be carried out by the action of a single macromolecular machine, usually via direct physical interactions with other molecular entities. Function in this sense denotes an action, or activity, that a gene product (or a complex) performs. These actions are described from two distinct but related perspectives: (1) biochemical activity, and (2) role as a component in a larger system/process.
A GO Biological Process is a specific objective that an organism is genetically programmed to achieve. Biological processes are often described by their outcome or ending state, e.g., the biological process of cell division results in the creation of two daughter cells (a divided cell) from a single parent cell. A biological process is accomplished by a particular set of molecular functions carried out by specific gene products or macromolecular complexes, often in a highly regulated manner and in a particular sequence.
Molecular functions and biological processes should be orthogonal.
To explain: this is with respect to annotation semantics. Note that this is not the case in the ontology now, but it is what we aim for. Link to additional discussions: (1) phosphorylation vs. kinase activity; (2) regulator vs. BP: regulation of [molecular function X]. In both of these cases, we consider these to be MFs.
For background, read Paul Thomas' chapter on the Meaning of Biological Function in the GO handbook (2016)
GO Molecular Functions
- Molecular Functions (MF) represent an activity mediated by a single macromolecular machine: either a protein, a non-coding RNA, or a complex.
- For complexes, not all members necessarily share the same MF:
- In some complexes, individual subunits have different catalytic activities, and the product of one activity is channeled to the next enzyme, where it is used as a substrate. These are usually represented by an individual MF for each activity.
- Some complexes contain regulatory and catalytic subunits, for example some kinases, heterotrimeric G proteins, etc. These are represented, respectively, by a regulator activity term under the enzyme regulator activity node and an enzyme activity term under the catalytic activity node.
- MF substrate/target specificity: The substrate specificity of MFs should represent the inferred, in vivo biological specificity of the gene product, which may or may not correspond exactly to the substrate used in the experiment.
- Multi-step functions: An activity described by a GO MF can be multi-step, as long as the intermediates are immediately consumed in the reaction and a single product is released. The exception is when the intermediates are released and used in other biological processes.
- Compound functions: Some functions contain two or more inseparable steps, for example GO:0015611 ABC-type D-ribose transporter activity, which hydrolyzes ATP to provide energy for the transport of a substrate, or GO:0038023 signaling receptor activity, which binds a ligand signal and acts to transmit that signal (for example by a conformational change that exposes a new binding site, or by modification of a downstream protein). In these cases, the subfunction is captured by the relation 'has part'; for example, ABC transporters have the relation 'has part' some 'ATP hydrolysis activity'.
- Specific substrate and positional information: Positional information about modifications on proteins and RNA, such as the targets of protein kinases, is usually not in the scope of GO, with the following exceptions:
- histone code, for which the activity of histone methyltransferases on specific substrates is captured in the ontology, for example
GO:0042800 histone methyltransferase activity (H3-K4 specific)
GO:0046976 histone methyltransferase activity (H3-K27 specific)
GO:0046975 histone methyltransferase activity (H3-K36 specific)
- RNA modifications
Justification for this: specificity of the activity? An example that is NOT an exception: snoRNA modification uses 'guides', so the enzymatic reaction itself is not specific; the guiding activity provides the specificity, so we don't create an activity term for the specific modification.
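The 'has part' pattern described under compound functions can be sketched as a small data structure. This is only an illustration under stated assumptions, not the GO data model: the `Term` class and the `all_parts` helper are hypothetical names introduced here, and terms are keyed by label only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A molecular function term with optional 'has part' subfunctions (hypothetical model)."""
    label: str
    has_part: list[Term] = field(default_factory=list)


def all_parts(term: Term) -> set[str]:
    """Collect the labels of every subfunction reachable via 'has part'."""
    found: set[str] = set()
    stack = list(term.has_part)
    while stack:
        part = stack.pop()
        if part.label not in found:
            found.add(part.label)
            stack.extend(part.has_part)
    return found


# The compound function from the text: transport coupled to ATP hydrolysis.
atp_hydrolysis = Term("ATP hydrolysis activity")
ribose_transporter = Term(
    "ABC-type D-ribose transporter activity",  # GO:0015611 in the guideline above
    has_part=[atp_hydrolysis],
)

print(all_parts(ribose_transporter))
```

The subfunction is recorded as a part of the compound term rather than as a parent, so the transporter is not classified as a kind of ATP hydrolysis activity.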
Out of scope
- Gene product names: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
- Substrates beyond the specificity of known protein functions: e.g. GO does not include the term protein threonine kinase activity, because no known protein kinase acts specifically on threonine. Hence, 'GO:0004674 protein serine/threonine kinase activity' has no descendant for protein threonine kinase activity.
- Note that RHEA represents this reaction:
ATP + L-threonyl-[protein] = ADP + H+ + O-phospho-L-threonyl-[protein] (RHEA:46608)
- Sub-reactions: If an enzyme catalyzes a multi-step reaction, only the overall reaction is defined as a GO function; for example, tyrosinase (EC:1.14.18.1) catalyzes two reactions:
L-tyrosine + O2 = dopaquinone + H2O and
2 L-dopa + O2 = 2 dopaquinone + 2 H2O
- Submolecular events: GO functions are described at the level of molecules rather than atoms. Therefore a reaction is not split into function terms describing each step in atomic or subatomic terms (e.g. electrons attracted to a positive charge, or formation of an unstable intermediate); a function term captures the start and end states with respect to the molecules involved. As a consequence, separate function terms are not created for cases in which different reaction mechanisms provide the route between the same set of reactants and products.
- Spontaneous reactions, in which covalent bonds are made/broken at a biologically relevant rate without an enzyme catalyst. Note that reactions that can occur spontaneously but that are catalyzed by an enzyme are described in GO.
- Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate Sequence Ontology (see the Open Biomedical Ontologies website for more information).
- Protein domains or structural features.
- Reactions that cannot be precisely described at the molecular level
For binding terms, the precise substrate must be known. For example you wouldn't say 'vesicle binding'; instead you would find out which protein in the vesicle membrane was being bound and use that in the term name. (BY THAT GUIDELINE, WE WOULD NOT CREATE 'PROTEIN COMPLEX BINDING??' membrane binding, etc? although we certainly should not have 'vesicle binding', the reason could be formulated more clearly)
Example: Multi-step reactions captured as multiple GO MFs
(+++ CHECK RHEA) - is this a good example??? Is this catalyzed by the same protein, looks like its a complex
Conversely, the reactions of the glycine cleavage system, a multienzyme complex involved in the catabolism of glycine, should be represented by separate function terms.
The overall reaction of the complex is glycine + tetrahydrofolate + NAD = NH3 + 5,10-methylene-THF + CO2 + NADH
but this can be split into steps, which, by the criteria above, warrant individual function terms.
GO molecular function terms:
GO:0004375 glycine dehydrogenase (decarboxylating) activity
Catalysis of the reaction: glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. (EC:1.4.4.2, RHEA:24304)
GO:0004047 aminomethyltransferase activity
Catalysis of the reaction: (6S)-tetrahydrofolate + S-aminomethyldihydrolipoylprotein = (6R)-5,10-methylenetetrahydrofolate + NH3 + dihydrolipoylprotein. (EC:2.1.2.10, RHEA:16945)
Avoid Cellular Component Information
Cellular structures are not functions. Many cellular component references have been made obsolete in the function ontology. For example, a mitochondrial primase needs only be primase activity because annotators can assign location to gene products by annotating with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.
Avoid Binding Relationships
See the binding working group pages on the GO wiki for more on binding and the current thinking on how to annotate binding.
Catalytic activities should not be related in the ontology to binding terms; for example, ATPase activity should not have a relationship to ATP binding hard-coded in the ontology. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.
Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental, single-step activities that occur at the macromolecular level. That means that if we were to subdivide these functions further (for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge"), we would be saying that a reaction was actually a series of functions, i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology, as we would be dealing with events at a submolecular or atomic level.
Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?
Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.
Function Grouping Terms
Terms in the function ontology should be grouped on the basis of functional similarity, rather than being involved in the same process. For example, the grouping term monosaccharide transporter activity might have children such as glucose transporter activity and ribose transporter activity, and is a valid function term in its own right. However, the term defense/immunity protein activity, used to group terms such as antigen binding, blood coagulation factor activity and Fc receptor activity, is not a valid function as it represents a protein involved in the defense or immune response (process) of an organism. If a grouping term is not a function itself, or it contains disparate children with no functional similarity, it should be made obsolete.
Appending Terms with 'Activity'
GO molecular function terms are all appended (with the exception of the root 'molecular function' and binding terms) with the word 'activity'. This is because GO molecular functions are what philosophers refer to as 'occurrents', meaning events, processes or activities, rather than 'continuants' which are entities e.g. organisms, cells, or chromosomes. The word activity helps distinguish between a protein and its activity, for example, nuclease and nuclease activity.
In fact, a molecular 'function' is distinct from a molecular 'activity'. A function is the potential to perform an activity, whereas an activity is the realization, the occurrence of that function; so in fact, 'molecular function' might more properly be renamed 'molecular activity'. However, for reasons of consistency and stability, the string 'molecular function' endures.
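The naming rule above can be expressed as a simple check. This is a minimal sketch, assuming terms are given as plain label strings and that the root is labelled 'molecular function' as in the text; the function name `follows_naming_convention` is hypothetical.

```python
ROOT_LABEL = "molecular function"  # root label as written in the guideline


def follows_naming_convention(label: str) -> bool:
    """Return True if a label ends in 'activity', or is the root or a binding term."""
    label = label.strip()
    if label == ROOT_LABEL:
        return True
    if label == "binding" or label.endswith(" binding"):
        return True
    return label.endswith(" activity")


for candidate in ("nuclease activity", "nuclease", "ATP binding", "molecular function"):
    print(candidate, follows_naming_convention(candidate))
```

The check is purely syntactic: a label such as 'actin activity' passes it even though it names a gene product rather than a function, so it cannot replace review of the term itself.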
Classifying Enzymatic Reactions
The function ontology has terms representing many of the enzymes in the Enzyme Commission (EC) database, and it uses the EC classification system to group and classify them. To stay in line with EC, proposed enzyme function terms should be checked in the EC database to ensure that the EC recommended name is used. Not all enzyme entries in the EC database are converted directly to single entries in the function ontology, because some enzymes carry out multiple functions.
If an EC number is given, the term can be added under the EC parent term. For example, thiamin-triphosphatase activity, EC:3.6.1.28, should be added under the parent EC:3.6.1.-, hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides ; GO:0016818. A check of the proposed sibling terms should reveal similar reactions and EC numbers in the same range (EC:3.6.1.x in this case).
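Finding the EC parent class, and checking that proposed siblings fall in the same range, can be sketched as follows. The helper names `ec_parent` and `same_ec_range` are hypothetical, and the EC numbers in the usage lines are illustrative values in the EC:3.6.1.x range discussed above.

```python
def ec_parent(ec_number: str) -> str:
    """Return the parent EC class, e.g. a fourth-level number maps to 'EC:a.b.c.-'."""
    parts = ec_number.removeprefix("EC:").split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec_number!r}")
    # Replace the last specified field (other than the top-level class) with '-'.
    for i in range(3, 0, -1):
        if parts[i] != "-":
            parts[i] = "-"
            return "EC:" + ".".join(parts)
    raise ValueError(f"{ec_number!r} is a top-level class and has no EC parent")


def same_ec_range(a: str, b: str) -> bool:
    """True if two EC numbers share the same parent class."""
    return ec_parent(a) == ec_parent(b)


print(ec_parent("EC:3.6.1.28"))                    # EC:3.6.1.-
print(same_ec_range("EC:3.6.1.28", "EC:3.6.1.5"))  # True
```

This only automates the lookup of the parent class; whether the EC recommended name and the sibling reactions are appropriate still needs to be checked by hand.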
If an EC entry cannot be found for the enzyme, it may be worth checking some other databases for it. BRENDA contains the same enzymes as the EC database, but with a greater number of alternative names; MetaCyc and the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) both contain reactions not covered by EC, to which they have assigned partial EC numbers (e.g. EC:2.1.-.-). Both these databases can be used in the general dbxrefs; examples would be MetaCyc:GLYOXIII-RXN and UM-BBD_enzymeID:e0225 respectively.
Bi-directional Reactions
For bi-directional reactions, we will create a single term that describes both directions of the reaction, unless there is a biological justification for separating the two directions into separate terms.
A single term covering both directions of a reaction
NADP phosphatase activity
Catalysis of the reaction: H2O + NADP = NAD + phosphate.
Two separate terms covering the opposing directions of a single reaction
sodium-transporting ATP synthase activity, rotational mechanism ; GO:0046932
Catalysis of the reaction: ADP + phosphate + Na+(out) = ATP + H2O + Na+(in), by a rotational mechanism.
sodium-transporting ATPase activity, rotational mechanism ; GO:0046962
Catalysis of the reaction: ATP + H2O + Na+(in) = ADP + phosphate + Na+(out), by a rotational mechanism.
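When reviewing a proposed term, it can help to detect that two definitions describe the same reaction written in opposite directions. The sketch below is one possible approach, assuming reactions are written as plain strings with ' + ' between participants and ' = ' between the two sides; `undirected_key` and `parse_side` are hypothetical helpers.

```python
def parse_side(side: str) -> frozenset[str]:
    """Split one side of a reaction into its participants (splitting on ' + ' keeps 'Na+(in)' intact)."""
    return frozenset(p.strip() for p in side.split(" + "))


def undirected_key(reaction: str) -> frozenset[frozenset[str]]:
    """Build a key that is identical for a reaction and its reverse."""
    left, right = reaction.split(" = ")
    return frozenset({parse_side(left), parse_side(right)})


synthase = "ADP + phosphate + Na+(out) = ATP + H2O + Na+(in)"
atpase = "ATP + H2O + Na+(in) = ADP + phosphate + Na+(out)"
nadp_phosphatase = "H2O + NADP = NAD + phosphate"

print(undirected_key(synthase) == undirected_key(atpase))            # True
print(undirected_key(synthase) == undirected_key(nadp_phosphatase))  # False
```

A match only shows that two definitions are opposite directions of one reaction; deciding whether they deserve separate terms remains a biological judgment, as in the sodium-transporting example above.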
<MOVE TO DESIGN PATTERNS>
The following function terms have standard definitions:
x binding
Interacting selectively and non-covalently with x, [brief description of x].
[enzyme] activity
Catalysis of the reaction: [reaction catalyzed by enzyme].
x receptor activity
Combining with x to initiate a change in cell activity.
x transporter activity
Enables the directed movement of x into, out of or within a cell, or between cells.
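The standard definitions above lend themselves to simple text templates. The following is a minimal sketch, not an existing GO tool: the dictionary keys ('binding', 'catalytic', 'receptor', 'transporter') and the function `standard_definition` are names assumed for illustration, while the definition wording is taken from the list above.

```python
DEFINITION_TEMPLATES = {
    "binding": "Interacting selectively and non-covalently with {x}, {description}.",
    "catalytic": "Catalysis of the reaction: {reaction}.",
    "receptor": "Combining with {x} to initiate a change in cell activity.",
    "transporter": (
        "Enables the directed movement of {x} into, out of or within a cell, "
        "or between cells."
    ),
}


def standard_definition(kind: str, **fields: str) -> str:
    """Fill in the standard definition template for the given kind of term."""
    return DEFINITION_TEMPLATES[kind].format(**fields)


print(standard_definition("receptor", x="retinoic acid"))
print(standard_definition("catalytic", reaction="H2O + NADP = NAD + phosphate"))
```

Keeping the wording in one place makes it easier to apply the same phrasing to every new term of a given kind.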
Synonyms are added to aid searching. The following standard synonyms are used in the molecular function ontology.
x receptor binding
x receptor ligand*
x receptor binding
The gene name or gene class of any ligand that can bind the x receptor.*
N.B. The synonym x receptor ligand describes something that acts as a ligand for the x receptor. Similarly, a gene name or gene class could be added if it binds to that receptor. These phrases are not exact synonyms but are useful search tools for biologists in a specific field. In given biological fields, gene names or classes may be used in the literature to refer to a concept that is analogous to a GO function.
<DELETE> The function must be a single reaction step. Anything that requires multiple steps is a process. </DELETE>
<DELETE> Do not confuse the following:
Two things that happen at the same time or that are done by the same molecule. Two things that are dependent on each other and cannot occur independently.
For example, the proposed term actin binding with sliding actually includes the two functions binding and motor activity, so it is not appropriate as a function term.
Following on from this, it is also important not to confuse the case of two interdependent activities with the superficially similar situation where a process and an activity are dependent on each other. For example, cell adhesion receptor activity would not be a function ontology term since it describes the activity of receptor activity coupled to the process of cell adhesion.
It helps to consider the term name. Is it immediately obvious what's going on, or does it sound like a gene product with 'activity' stuck on the end? For example, with transporter activity you know immediately what kind of function is being described, whereas with actin activity it is not really clear. It should be obvious what a function is without in-depth biological knowledge of a certain area. </DELETE>
<DELETE> The Essence of a Function Term
The functions of a gene product are the jobs that it does or the "abilities" that it has. These may include transporting things around, binding to things, holding things together and changing one thing into another. This is different from the biological processes the gene product is involved in, which involve more than one activity. One way to understand this is to consider the analogy of a company or organization. Individuals (gene products) have different abilities or tasks (functions) and they work together to achieve different goals (processes). It is easy to confuse a job title (gene product name) with a function; for example, 'secretarial activity' may seem like a valid function because you have a good conceptual idea of what a secretary does. However, in different companies, secretaries might do different things. One secretary might have the functions 'typing', 'answering phone' and 'making coffee', whilst another might have these functions and additionally 'photocopying'. In the Gene Ontology, a function should be unambiguous and it should mean the same thing no matter what species you are dealing with. If there's any conceptual ambiguity, or you think there could be ambiguity, check to make sure that you're actually talking about a function and not about a gene product. </DELETE>
<DELETE> Parentage and Annotation
- A gene product may have many different functions, but it would be wrong to create a function term that represents multiple functions.
- Gene product information should be captured at the annotation stage, by annotating the gene product to several function terms, rather than by hardwiring the information into the ontology by adding extra parents.
<DELETE> If a term has parentage which isn't immediately obvious from the term name or the definition, and therefore requires you to have background knowledge, then it's probable that the function term has been mistaken for the gene product of the same name and gene product specific information has been incorporated by adding extra parents.
A good example is the term retinoic acid receptor activity, which was wrongly given the parents receptor activity and transcription regulator activity. The ontology structure looked like this:
molecular function
  [i]receptor activity
    [i]retinoic acid receptor activity
  [i]transcription regulator activity
    [i]retinoic acid receptor activity
The gene product retinoic acid receptor alpha could be annotated to retinoic acid receptor activity, and, by reasoning over the GO graph, it would also be annotated to the parent terms receptor activity and transcription regulator activity.
The definition (a standard GO receptor activity definition) is "Combining with retinoic acid to initiate a change in cell activity". With extra background reading, you might find out that retinoic acid receptors can function as transcriptional regulators, but there is nothing in the term name or definition to suggest any relationship with transcriptional regulators. If a relationship isn't obvious from the term name or definition, it's probably referring to a gene product property. Encoding gene product information in the function ontology like this has the added hazard of species specificity: in some organisms, a gene product may have different functions to those it has in another organism.
The correct way to deal with this situation is to remove the link in the ontology between retinoic acid receptor activity and transcription regulator activity and capture the information in the annotation instead.
molecular function
  [i]receptor activity
    [i]retinoic acid receptor activity
  [i]transcription regulator activity
We would annotate retinoic acid receptor alpha to both retinoic acid receptor activity and transcription regulator activity. </DELETE>