Guidelines for new Molecular Functions: Difference between revisions

From GO Wiki
mNo edit summary
mNo edit summary
 
(54 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  From: http://geneontology.org/page/molecular-function-ontology-guidelines
=Molecular Function versus Biological Process=
* '''A GO Molecular Function''' is defined as a molecular process that can be carried out by the action of a single macromolecular machine, either a protein, a non-coding RNA, or a complex. Function in this sense denotes an action, or activity, that a gene product (or a complex) performs, and which usually involves direct physical interactions with other molecular entities. In GO, these actions are described from two distinct but related perspectives: (1) biochemical activity, and (2) role as a component in a larger system/process.
* '''A GO Biological Process''' is a specific objective that an organism is genetically programmed to achieve. Biological processes are often described by their outcome or ending state, e.g., the biological process of cell division results in the creation of two daughter cells (a divided cell) from a single parent cell. A biological process is accomplished by a particular set of molecular functions carried out by specific gene products or macromolecular complexes, often in a highly regulated manner and in a particular sequence.
* '''Molecular functions and biological processes should be orthogonal'''; i.e. a specific concept should correspond to either a molecular function or to a biological process, but should not be represented in both aspects of the ontology.
** Example of MF represented as a BP: 'GO:0016569 covalent chromatin modification' corresponds to either DNA methyltransferase activity or to a histone post-translational modification, resulting from a kinase activity, a methyltransferase activity, an acetyltransferase activity, etc., so this term was obsoleted.
** Example of BP represented as a MF: 'GO:0008189 apoptosis inhibitor activity': this term represents the negative regulation of apoptosis, but not a specific molecular function.


=The Essence of a Function Term=
=GO Molecular Functions=
The functions of a gene product are the jobs that it does or the "abilities" that it has. These may include transporting things around, binding to things, holding things together and changing one thing into another. This is different from the biological processes the gene product is involved in, which involve more than one activity. One way to understand this is to consider the analogy of a company or organization. Individuals (gene products) have different abilities or tasks (functions) and they work together to achieve different goals (processes). It is easy to confuse a job title (gene product name) with a function; for example, 'secretarial activity' may seem like a valid function because you have a good conceptual idea of what a secretary does. However, in different companies, secretaries might do different things. One secretary might have the functions 'typing', 'answering phone' and 'making coffee', whilst another might have these functions and additionally 'photocopying'. In the Gene Ontology, a function should be unambiguous and it should mean the same thing no matter what species you are dealing with. If there's any conceptual ambiguity, or you think there could be ambiguity, check to make sure that you're actually talking about a function and not about a gene product.
* '''MF substrate/target specificity:''' The substrate specificity of MFs should represent the inferred, ''in vivo'' biological specificity of the gene product, which may or may not correspond exactly to the substrate used in the experiment.
* '''Compound functions:''' Some functions contain two or more inseparable steps, for example, GO:0015611 ABC-type D-ribose transporter activity, which hydrolyzes ATP to provide energy for the transport of a substrate, or GO:0038023 signaling receptor activity, which binds to a ligand signal and acts to transmit that signal (for example by a conformational change that exposes a new binding site or by a modification of a downstream protein). In these cases, the subfunction is captured by the relation 'has part'; for example ABC transporters have the relation 'has part' some 'ATP hydrolysis activity'.
* '''Specific substrate and positional information:''' Positional information of modifications on proteins and RNA, such as targets of protein kinases, is usually not in the scope of GO, with the following exceptions:
** histone code, for which the activity of histone methyltransferases on specific substrates is captured in the ontology, for example
***  GO:0042800 histone H3K4 methyltransferase activity
***  GO:0046976 histone H3K27 methyltransferase activity
*** ''Note that irreversible modifications that are not part of the histone code are not captured ([https://github.com/geneontology/go-ontology/issues/24963 see discussion]).''


=Standard Definitions=
===Appending Terms with 'Activity'===
The following function terms have standard definitions:


x binding
GO molecular function terms are all appended (with the exception of the root 'molecular function' and binding terms) with the word 'activity', to help distinguish between a protein and its activity, for example, nuclease and nuclease activity.
    Interacting selectively and non-covalently with x, [brief description of x].
[enzyme] activity
    Catalysis of the reaction: [reaction catalyzed by enzyme].
x receptor activity
    Combining with x to initiate a change in cell activity.
x transporter activity
    Enables the directed movement of x into, out of or within a cell, or between cells.  


=Standard Synonyms=
===Enzymatic Reactions===
Synonyms are added to aid searching. The following standard synonyms are used in the molecular function ontology.
====Classifying Enzymatic Reactions====


x receptor binding
The function ontology has terms representing many of the enzymes in the Enzyme Commission (EC) database, and it uses the EC classification system to group and classify them. To stay in line with EC, proposed enzyme function terms should be checked in the EC database to ensure that the EC recommended name is used. Not all enzyme entries in the EC database are converted directly to single entries in the function ontology, because some enzymes carry out multiple functions.
    x receptor ligand*
x receptor binding
    The gene name or gene class of any ligand that can bind the x receptor.*


*N.B. The synonym x receptor ligand describes something that acts as a ligand for the x receptor. Similarly, a gene name or gene class could be added if it binds to that receptor. These phrases are not exact synonyms but are useful search tools for biologists in a specific field. In given biological fields, gene names or classes may be used in the literature to refer to a concept that is analogous to a GO function.
If an EC number is given, the term can be added under the EC parent term. For example, thiamin-triphosphatase activity, EC:3.6.1.28, should be added under the parent EC:3.6.1.-, hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides ; GO:0016818. A check of the proposed sibling terms should reveal similar reactions and EC numbers in the same range (EC:3.6.1.x in this case).


=Parentage and Annotation=
If an EC entry cannot be found for the enzyme, it may be worth checking some other databases for it. BRENDA contains the same enzymes as the EC database, but with a greater number of alternative names; MetaCyc and the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) both contain reactions not covered by EC, to which they have given a partial EC number (e.g. EC:2.1.-.-). Both these databases can be used in the general dbxrefs; examples would be MetaCyc:GLYOXIII-RXN and UM-BBD_enzymeID:e0225 respectively.


A gene product may have many different functions, but it would be wrong to create a function term that represents multiple functions. Gene product information should be captured at the annotation stage, by annotating the gene product to several function terms, rather than by hardwiring the information into the ontology by adding extra parents. If a term has parentage which isn't immediately obvious from the term name or the definition, and therefore requires you to have background knowledge, then it's probable that the function term has been mistaken for the gene product of the same name and gene product specific information has been incorporated by adding extra parents.
==Multi-step reaction==
A multi-step reaction is represented as a single GO MF, as long as the intermediates are immediately consumed in the reaction and a single product is released. The exception is when the intermediates are released and used in other biological processes.


A good example is the term retinoic acid receptor activity, which was wrongly given the parents receptor activity and transcription regulator activity. The ontology structure looked like this:
==Bi-directional reactions==
Normally, bi-directional reactions are represented as a single term that describes both directions of the reaction, ''unless there is a biological justification to separate the two directions of the reaction into separate terms.'' An example of this exception is


    molecular function
* GO:0046932 sodium-transporting ATP synthase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ADP + phosphate + Na+(out) => ATP + H2O + Na+(in), by a rotational mechanism.
        [i]receptor activity
and
            [i]retinoic acid receptor activity
* GO:0046962 sodium-transporting ATPase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ATP + H2O + Na+(in) -> ADP + phosphate + Na+(out), by a rotational mechanism.


...
==NAD/NADP cofactors==
* Reactions represented with NAD(P) in GO indicate that it is not known whether an enzyme uses NAD or NADP (note that this is different from the IUBMB practice).
** If the cofactor (NAD or NADP) is unknown for a class of reactions, then we only create a single GO term with [NAD(P)].
*** If there are RHEA IDs for both NAD- and NADP-dependent reactions, we add these as NARROW xrefs to the general [NAD(P)] term.


    [i]transcription regulator activity
* If the specific cofactor is known, we create one or both relevant reactions. In this case, we also create a general parent grouping class with [NAD(P)] (or keep it, if it already exists). The structure of the ontology is:
        [i]retinoic acid receptor activity


The gene product retinoic acid receptor alpha could be annotated to retinoic acid receptor activity, and, by reasoning over the GO graph, it would also be annotated to the parent terms receptor activity and transcription regulator activity.
* x activity NAD(P)
** x activity NAD
** x activity NADP


The definition (a standard GO receptor activity definition) is "Combining with retinoic acid to initiate a change in cell activity". With extra background reading, you might find out that retinoic acid receptors can function as transcriptional regulators, but there is nothing in the term name or definition to suggest any relationship with transcriptional regulators. If a relationship isn't obvious from the term name or definition, it's probably referring to a gene product property. Encoding gene product information in the function ontology like this has the added hazard of species specificity: in some organisms, a gene product may have different functions to those it has in another organism.
* If an enzyme uses both, then it is annotated to both NAD and NADP terms.
* If the cofactor specificity is always known, then we mark the grouping class 'do not annotate'. An example of this is [http://amigo.geneontology.org/amigo/term/GO:0004448 isocitrate dehydrogenase activity].


The correct way to deal with this situation is to remove the link in the ontology between retinoic acid receptor activity and transcription regulator activity and capture the information in the annotation instead.
=Out of scope=
* '''Gene product names''': e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
** Example: GO:0102113 hypoxia-inducible factor-asparagine oxygenase activity was merged into its parent GO:0036140 [protein]-asparagine 3-dioxygenase activity since it represented a gene product. Note that GO:0102113 is in [https://enzyme.expasy.org/EC/1.14.11.30 EC] and in [https://www.rhea-db.org/rhea/54268 RHEA], but these are outside the scope of GO.
* '''Substrates beyond the specificity of known protein functions:''' e.g. GO does not include the term ''protein threonine kinase activity'', as there is no known protein kinase that specifically acts on threonine. Hence, 'GO:0004674 protein serine/threonine kinase activity' does not have a descendant for protein threonine kinase activity.
** Note that RHEA represents this reaction, <code>ATP + L-threonyl-<nowiki>[</nowiki>protein<nowiki>]</nowiki> = ADP + H+ + O-phospho-L-threonyl-<nowiki>[</nowiki>protein<nowiki>]</nowiki></code> ([https://www.rhea-db.org/rhea/46608 RHEA:46608])
* '''Sub-reactions''':  If an enzyme catalyzes a multi-step reaction, only the overall reaction is defined as a GO function; for example [https://enzyme.expasy.org/EC/1.14.18.1 tyrosinase EC:1.14.18.1] catalyzes two reactions: <code>L-tyrosine + O2 = dopaquinone + H2O</code> and <code>2 L-dopa + O2 = 2 dopaquinone + 2 H2O</code>
* '''Submolecular events:''' GO functions are described at the level of molecules rather than atoms. Therefore a reaction would not be split into function terms describing each step of the reaction in atomic or subatomic terms (e.g. electrons attracted to positive charge or formation of unstable intermediate); it captures the start and end states with respect to the molecules involved. As a consequence of this, separate function terms are not created to cover situations in which different reaction mechanisms provide the route between the same set of reactants and products.
* '''Spontaneous reactions''', in which covalent bonds are made/broken at a biologically relevant rate without an enzyme catalyst. Note that reactions that ''can'' occur spontaneously but that are catalyzed by an enzyme are described in GO; for example GO:0050453 obsolete cob(II)alamin reductase activity.
* '''Attributes of sequence such as intron/exon parameters''': these are not attributes of gene products and are described in a separate Sequence Ontology (see the Open Biomedical Ontologies website for more information).
* '''Protein domains or structural features.'''
* '''Reactions that cannot be precisely described at the molecular level'''
For binding terms, the precise substrate must be known. For example you wouldn't say 'vesicle binding'; instead you would find out which protein in the vesicle membrane was being bound and use that in the term name. (BY THAT GUIDELINE, WE WOULD NOT CREATE 'PROTEIN COMPLEX BINDING??' membrane binding, etc? although we certainly should not have 'vesicle binding', the reason could be formulated more clearly)


    molecular function
        [i]receptor activity
            [i]retinoic acid receptor activity


...


    [i]transcription regulator activity
====Example: Multi-step reactions captured as multiple GO MFs====
 
(+++ CHECK RHEA) - is this a good example??? Is this catalyzed by the same protein, looks like its a complex
We would annotate retinoic acid receptor alpha to both retinoic acid receptor activity and transcription regulator activity.
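
The effect of removing the ontology link can be illustrated with a minimal Python sketch. It assumes a toy is_a graph containing only the terms named in this example; the dictionary <code>IS_A</code> and the helpers <code>ancestors</code> and <code>implied_terms</code> are hypothetical names used for illustration and are not part of any GO tool.

```python
# Toy is_a graph after the ontology link between 'retinoic acid receptor
# activity' and 'transcription regulator activity' has been removed.
# (Hypothetical, simplified structure for illustration only.)
IS_A = {
    "retinoic acid receptor activity": ["receptor activity"],
    "receptor activity": ["molecular function"],
    "transcription regulator activity": ["molecular function"],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def implied_terms(direct_annotations, is_a=IS_A):
    """Return the direct annotations plus all terms implied by the graph."""
    implied = set(direct_annotations)
    for term in direct_annotations:
        implied |= ancestors(term, is_a)
    return implied


# The gene product is annotated directly to both functions, so the
# transcription regulator information lives in the annotation only.
rar_alpha = implied_terms(
    ["retinoic acid receptor activity", "transcription regulator activity"]
)
```

With this structure, a gene product annotated only to retinoic acid receptor activity is no longer inferred to have transcription regulator activity, while retinoic acid receptor alpha still carries both functions through its two direct annotations.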
 
=Granularity=
 
GO functions describe interactions at the level of molecules, rather than atoms. Therefore a reaction would not be split into function terms describing each step of the reaction in atomic or subatomic terms (e.g. electrons attracted to positive charge or formation of unstable intermediate); it would consider the starting state and the end state in terms of the molecules involved. As a consequence of this, separate function terms would be created to cover the various situations in which different reaction mechanisms provide the route between the same set of reactants and products. In addition to this, GO functions should not cover reactions that always occur spontaneously and without the need for a gene product catalyst. Since there is no gene product involved in such a reaction, the term would never be used for annotation.
 
If you have a reaction where you can pick out molecular intermediates, you should consider whether to make multiple function terms. If the intermediates are known to be released and used in other biological processes, function terms should be added to represent the steps of the reaction, and a biological process term added to represent the sum of these functions. If the separate activities generating and acting on the molecular intermediates can be associated with different subunits of the enzyme, two or more functions should probably be made. If neither of these conditions holds, a single function term can be made.
 
'''Example''': In the reaction catalyzed by magnesium-protoporphyrin IX monomethyl ester (oxidative) cyclase, the intermediates are not released and the three separate catalytic activities are not associated with different subunits of the enzyme, so a single function term is appropriate.
 
Enzyme Commission entry for magnesium-protoporphyrin IX monomethyl ester (oxidative) cyclase, EC 1.14.13.81:
 
    magnesium-protoporphyrin IX 13-monomethyl ester + NADPH + H+ + O2 = 13(1)-hydroxy-magnesium-protoporphyrin IX 13-monomethyl ester + NADP+ + H2O
    13(1)-hydroxy-magnesium-protoporphyrin IX 13-monomethyl ester + NADPH + H+ + O2 = 13(1)-oxo-magnesium-protoporphyrin IX 13-monomethyl ester + NADP+ + 2 H2O
    13(1)-oxo-magnesium-protoporphyrin IX 13-monomethyl ester + NADPH + H+ + O2 = divinylprotochlorophyllide + NADP+ + 2 H2O
 
GO molecular function term:
 
magnesium-protoporphyrin IX monomethyl ester (oxidative) cyclase activity
    Catalysis of the reaction: magnesium-protoporphyrin IX 13-monomethyl ester + 3 NADPH + 3 H+ + 3 O2 = divinylprotochlorophyllide + 3 NADP+ + 5 H2O.
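
The overall reaction above is the sum of the three EC partial reactions with the unreleased intermediates cancelled. The following Python sketch checks that arithmetic; the function name <code>net_reaction</code> and the abbreviated species labels are assumptions made for this illustration.

```python
from collections import Counter


def net_reaction(steps):
    """Sum reaction steps and cancel species that appear on both sides."""
    left, right = Counter(), Counter()
    for reactants, products in steps:
        left.update(reactants)
        right.update(products)
    # Intermediates are produced in one step and consumed in another,
    # so the amount common to both sides is removed from each.
    common = left & right
    return dict(left - common), dict(right - common)


# Abbreviated, hypothetical labels for the four porphyrin species.
MPE = "Mg-protoporphyrin IX 13-monomethyl ester"
HYDROXY = "13(1)-hydroxy-MPE"
OXO = "13(1)-oxo-MPE"
DVP = "divinylprotochlorophyllide"

# Each step consumes one NADPH, one H+ and one O2.
cofactors = {"NADPH": 1, "H+": 1, "O2": 1}
steps = [
    ({MPE: 1, **cofactors}, {HYDROXY: 1, "NADP+": 1, "H2O": 1}),
    ({HYDROXY: 1, **cofactors}, {OXO: 1, "NADP+": 1, "H2O": 2}),
    ({OXO: 1, **cofactors}, {DVP: 1, "NADP+": 1, "H2O": 2}),
]

reactants, products = net_reaction(steps)
```

The result has 3 NADPH, 3 H+ and 3 O2 on the left and 3 NADP+ and 5 H2O on the right, matching the single GO definition; the hydroxy and oxo intermediates do not appear because they are consumed within the reaction.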


Conversely, the reactions of the glycine cleavage system, a multienzyme complex involved in the catabolism of glycine, should be represented by separate function terms.
Line 84: Line 78:
GO molecular function terms:


glycine dehydrogenase (decarboxylating) activity
GO:0004375 glycine dehydrogenase (decarboxylating) activity
    Catalysis of the reaction: glycine + H-protein-lipoyllysine = H-protein-S-aminomethyldihydrolipoyllysine + CO2.
aminomethyltransferase activity
    Catalysis of the reaction: protein-S-aminomethyldihydrolipoyllysine + tetrahydrofolate = protein-dihydrolipoyllysine + 5,10-methylenetetrahydrofolate + NH3.
 
=Enzymatic Reactions=
==Classifying Enzymatic Reactions==
The function ontology has terms representing many of the enzymes in the Enzyme Commission (EC) database, and it uses the EC classification system to group and classify them. To stay in line with EC, proposed enzyme function terms should be checked in the EC database to ensure that the EC recommended name is used. Not all enzyme entries in the EC database are converted directly to single entries in the function ontology, because some enzymes carry out multiple functions.
 
If an EC number is given, the term can be added under the EC parent term. For example, thiamin-triphosphatase activity, EC:3.6.1.28, should be added under the parent EC:3.6.1.-, hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides ; GO:0016818. A check of the proposed sibling terms should reveal similar reactions and EC numbers in the same range (EC:3.6.1.x in this case).
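
As a rough illustration of how the EC parent is derived from an EC number, the sketch below replaces the last specific level with a dash. The function name <code>ec_parent</code> is hypothetical, and the mapping from an EC class to its GO term (for example EC:3.6.1.- to GO:0016818) still has to be looked up in the ontology.

```python
def ec_parent(ec_number):
    """Return the parent EC class, e.g. 'EC:3.6.1.28' -> 'EC:3.6.1.-'.

    Returns None for a top-level class such as 'EC:3.-.-.-'.
    """
    if ec_number.startswith("EC:"):
        ec_number = ec_number[3:]
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated levels: " + ec_number)
    # Walk from the most specific level towards the top-level class and
    # blank out the first level that is still specified.
    for i in range(3, 0, -1):
        if parts[i] != "-":
            parts[i] = "-"
            return "EC:" + ".".join(parts)
    return None


thiamin_triphosphatase_parent = ec_parent("EC:3.6.1.28")
```

For thiamin-triphosphatase activity this gives EC:3.6.1.-, the class under which sibling terms in the EC:3.6.1.x range would be expected.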


If an EC entry cannot be found for the enzyme, it may be worth checking some other databases for it. BRENDA contains the same enzymes as the EC database, but with a greater number of alternative names; MetaCyc and the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) both contain reactions not covered by EC, to which they have given a partial EC number (e.g. EC:2.1.-.-). Both these databases can be used in the general dbxrefs; examples would be MetaCyc:GLYOXIII-RXN and UM-BBD_enzymeID:e0225 respectively.
  Catalysis of the reaction: glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. (EC:1.4.4.2, RHEA:24304)


==Bi-directional Reactions==
GO:0004047 aminomethyltransferase activity
For bi-directional reactions, we will create a single term that describes both directions of the reaction unless there is reason to believe that there is a biological justification to separate the two directions of the reaction into separate terms.


A single term covering both directions of a reaction
  Catalysis of the reaction: (6S)-tetrahydrofolate + S-aminomethyldihydrolipoylprotein = (6R)-5,10-methylenetetrahydrofolate + NH3 + dihydrolipoylprotein. (EC:2.1.2.10, RHEA:16945)


NADP phosphatase activity
=TO BE REVIEWED=
    Catalysis of the reaction: H2O + NADP = NAD + phosphate.


Two separate terms covering the opposing directions of a single reaction
===Avoid Cellular Component Information===


sodium-transporting ATP synthase activity, rotational mechanism ; GO:0046932
    Catalysis of the reaction: ADP + phosphate + Na+(out) = ATP + H2O + Na+(in), by a rotational mechanism.
sodium-transporting ATPase activity, rotational mechanism ; GO:0046962
    Catalysis of the reaction: ATP + H2O + Na+(in) = ADP + phosphate + Na+(out), by a rotational mechanism.
==[[Curator_Guide:_Enzymes_and_Reactions]]==
=Valid Function Terms=
These are some guidelines for deciding whether a term is a valid molecular function or not.
For a function term that considers binding you must know the molecule that is being bound. For example you wouldn't say 'vesicle binding'; instead you would find out which protein in the vesicle membrane was being bound and use that in the term name.
The function must be a single reaction step. Anything that requires multiple steps is a process.
Functions are not restricted to the activities of single gene products; multi-gene product complexes can also have functions.
Do not confuse the following:
    Two things that happen at the same time or that are done by the same molecule.
    Two things that are dependent on each other and cannot occur independently.
For example, the proposed term actin binding with sliding actually includes the two functions binding and motor activity, so it is not appropriate as a function term. However, calcium-transporting ATPase activity represents two activities that are dependent on each other and cannot occur independently; thus calcium-transporting ATPase activity is appropriate as a GO molecular function.
Following on from this, it is also important not to confuse the case of two interdependent activities with the superficially similar situation where a process and an activity are dependent on each other. For example, cell adhesion receptor activity would not be a function ontology term since it describes the activity of receptor activity coupled to the process of cell adhesion.
It helps to consider the term name. Is it immediately obvious what's going on or does it sound like a gene product with 'activity' stuck on the end? For example with transporter activity, you know immediately what kind of function this is describing; whereas with actin activity it is not really clear. It should be obvious what a function is without in-depth biological knowledge of a certain area.
==Avoid Cellular Component Information==
Cellular structures are not functions. Many cellular component references have been made obsolete in the function ontology. For example, a mitochondrial primase needs only be primase activity because annotators can assign location to gene products by annotating with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.


==Avoid Gene Products==
Gene products in themselves are not nodes of the function ontology, although doing something with or to a specific gene product can be one. For example, being hedgehog or a hedgehog receptor are not functions, but hedgehog receptor binding and hedgehog binding are functions. Most GO molecular function terms include the word 'activity' to help differentiate them from the physical gene product. When defining molecular function terms, be careful not to describe them as gene products. For example, the molecular function term kinase activity is defined as 'Catalysis of the transfer of a phosphate group, usually from ATP, to a substrate molecule', not 'an enzyme that catalyzes the transfer of a phosphate group, usually from ATP, to a substrate molecule'.
An activity should not be named after a gene product, as a gene product could potentially have multiple molecular functions and not just the one it's named after. For example, <code>protein phosphatase inhibitor activity</code> means <code>"directly [i.e. via direct physical interaction] inhibits some protein phosphatase activity"</code>, not "directly inhibits some protein named protein phosphatase 2A"-- properly speaking the protein itself is not inhibited, instead, its activity is.
==Avoid Gene Products: Exceptions==
There are cases in which the substrate specificity is captured in the term, for instance terms describing histone modifications:
* GO:0046976 histone methyltransferase activity (H3-K27 specific)
* GO:0046975 histone methyltransferase activity (H3-K36 specific)
* GO:0042800 histone methyltransferase activity (H3-K4 specific)
This is to capture the histone code.
==Function Terms for Subunits==
Regulatory and catalytic subunits of kinases, heterotrimeric G proteins, etc., are represented in the function ontology by a regulator activity term, under the enzyme regulator activity node, and an enzyme activity term, under the catalytic activity node.


Note that GO no longer uses the part of relationship between the enzyme regulator term and the catalytic activity term. A full discussion of this topic can be found under 'Annotation Issues' in the minutes from the September 2003 Bar Harbor GO meeting.


Please see the GO annotation guide for advice on how to annotate subunits of a complex.
===Avoid Binding Relationships===


==Avoid Binding Relationships==
See the binding working group pages on the GO wiki for more on binding and the current thinking on how to annotate binding.


Line 167: Line 105:


Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.
Function Grouping Terms


=Function Grouping Terms=
Terms in the function ontology should be grouped on the basis of functional similarity, rather than being involved in the same process. For example, the grouping term monosaccharide transporter activity might have children such as glucose transporter activity and ribose transporter activity, and is a valid function term in its own right. However, the term defense/immunity protein activity, used to group terms such as antigen binding, blood coagulation factor activity and Fc receptor activity, is not a valid function as it represents a protein involved in the defense or immune response (process) of an organism. If a grouping term is not a function itself, or it contains disparate children with no functional similarity, it should be made obsolete.


=Appending Terms with 'Activity'=
GO molecular function terms are all appended (with the exception of the root 'molecular function', and all binding terms) with the word 'activity'. This is because GO molecular functions are what philosophers would call 'occurrents', meaning events, processes or activities, rather than 'continuants' which are entities e.g. organisms, cells, or chromosomes. The word activity helps distinguish between the protein and the activity of that protein, for example, nuclease and nuclease activity.


In fact, a molecular 'function' is distinct from a molecular 'activity'. A function is the potential to perform an activity, whereas an activity is the realization, the occurrence of that function; so in fact, 'molecular function' might more properly be renamed 'molecular activity'. However, for reasons of consistency and stability, the string 'molecular function' endures.
 
----


[[Ontology_Development#Editing_the_Ontology |Back to: Editing the Ontology]]


[[Category:Ontology]][[Category:GO Editors]]

Latest revision as of 11:34, 9 March 2023

Molecular Function versus Biological Process

  • A GO Molecular Function is defined as a molecular process that can be carried out by the action of a single macromolecular machine, either a protein, a non-coding RNA, or a complex. Function in this sense denotes an action, or activity, that a gene product (or a complex) performs, and which usually involves direct physical interactions with other molecular entities. In GO, these actions are described from two distinct but related perspectives: (1) biochemical activity, and (2) role as a component in a larger system/process.
  • A GO Biological Process is a specific objective that an organism is genetically programmed to achieve. Biological processes are often described by their outcome or ending state, e.g., the biological process of cell division results in the creation of two daughter cells (a divided cell) from a single parent cell. A biological process is accomplished by a particular set of molecular functions carried out by specific gene products or macromolecular complexes, often in a highly regulated manner and in a particular sequence.
  • Molecular functions and biological processes should be orthogonal; i.e. a specific concept should correspond to either a molecular function or a biological process, but should not be represented in both aspects of the ontology.
    • Example of MF represented as a BP: 'GO:0016569 covalent chromatin modification' corresponds to either DNA methyltransferase activity or to a histone post-translational modification, resulting from a kinase activity, a methyltransferase activity, an acetyltransferase activity, etc., so this term was obsoleted.
    • Example of BP represented as a MF: 'GO:0008189 apoptosis inhibitor activity': this term represents the negative regulation of apoptosis, but not a specific molecular function.

GO Molecular Functions

  • MF substrate/target specificity: The substrate specificity of MFs should represent the inferred, in vivo biological specificity of the gene product, which may or may not correspond exactly to the substrate used in the experiment.
  • Compound functions: Some functions contain two or more inseparable steps, for example, GO:0015611 ABC-type D-ribose transporter activity, which hydrolyzes ATP to provide energy for the transport of a substrate, or GO:0038023 signaling receptor activity, which binds to a ligand signal and acts to transmit that signal (for example by a conformational change that exposes a new binding site or by a modification of a downstream protein). In these cases, the subfunction is captured by the relation 'has part'; for example ABC transporters have the relation 'has part' some 'ATP hydrolysis activity'.
  • Specific substrate and positional information: Positional information about modifications on proteins and RNA, such as the target sites of protein kinases, is usually not in the scope of GO, with the following exceptions:
    • histone code, for which the activity of histone methyltransferases on specific substrates is captured in the ontology, for example
      • GO:0042800 histone H3K4 methyltransferase activity
      • GO:0046976 histone H3K27 methyltransferase activity
      • Note that irreversible modifications that are not part of the histone code are not captured (see discussion).
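
The 'has part' relation described under compound functions can be pictured as a small data structure. The following is only an illustrative sketch: the `FunctionTerm` class and the `all_parts` helper are hypothetical names, not part of any GO tooling, and for simplicity the relation is attached directly to the example term rather than to a more general ABC transporter parent.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionTerm:
    """Hypothetical, simplified stand-in for a molecular function term."""
    label: str
    has_part: list = field(default_factory=list)  # subfunctions of this term


def all_parts(term):
    """Collect the labels of every subfunction reachable via 'has part'."""
    found = []
    stack = list(term.has_part)
    while stack:
        part = stack.pop()
        if part.label not in found:
            found.append(part.label)
            stack.extend(part.has_part)
    return found


# Example from the text: an ABC-type transporter has part ATP hydrolysis.
atp_hydrolysis = FunctionTerm("ATP hydrolysis activity")
ribose_transporter = FunctionTerm(
    "ABC-type D-ribose transporter activity",
    has_part=[atp_hydrolysis],
)
```

Calling `all_parts(ribose_transporter)` lists the subfunctions that the compound function implies, without requiring a separate annotation for each step.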

Appending Terms with 'Activity'

GO molecular function terms are all appended (with the exception of the root 'molecular function' and binding terms) with the word 'activity', to help distinguish between a protein and its activity, for example, nuclease and nuclease activity.
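
This naming rule lends itself to a mechanical check. The sketch below assumes that term labels are available as plain strings; the function name `follows_activity_convention` is hypothetical, and accepting both 'molecular function' and 'molecular_function' as the root label is an assumption made for illustration.

```python
# Labels accepted for the root term (assumed spellings).
ROOT_LABELS = {"molecular function", "molecular_function"}


def follows_activity_convention(label: str) -> bool:
    """Return True if a molecular function label follows the naming rule."""
    normalized = label.strip().lower()
    # The root term is exempt from the rule.
    if normalized in ROOT_LABELS:
        return True
    # Binding terms are exempt: they end in 'binding' instead of 'activity'.
    if normalized.endswith("binding"):
        return True
    return normalized.endswith("activity")
```

With this check, 'nuclease activity' passes while 'nuclease' (the protein rather than its activity) does not.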

Enzymatic Reactions

Classifying Enzymatic Reactions

The function ontology has terms representing many of the enzymes in the Enzyme Commission (EC) database, and it uses the EC classification system to group and classify them. To stay in line with EC, proposed enzyme function terms should be checked in the EC database to ensure that the EC recommended name is used. Not all enzyme entries in the EC database are converted directly to single entries in the function ontology, because some enzymes carry out multiple functions.

If an EC number is given, the term can be added under the EC parent term. For example, thiamin-triphosphatase activity, EC:3.6.1.28, should be added under the parent EC:3.6.1.-, hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides; GO:0016818. A check of the proposed sibling terms should reveal similar reactions and EC numbers in the same range (EC:3.6.1.x in this case).
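
Finding the EC parent is a simple string operation on the four-field EC number. The sketch below is an illustration only; the function names `ec_parent` and `same_ec_range` are hypothetical, and the code assumes EC numbers are written as in this page, with an optional 'EC:' prefix and '-' for unspecified fields.

```python
def ec_parent(ec_number: str) -> str:
    """Return the EC class one level above the given EC number."""
    prefix = "EC:"
    body = ec_number[len(prefix):] if ec_number.startswith(prefix) else ec_number
    parts = body.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four EC fields, got {ec_number!r}")
    # Replace the last field that is still specific with '-'.
    for i in range(3, -1, -1):
        if parts[i] != "-":
            if i == 0:
                raise ValueError(f"{ec_number!r} is a top-level class with no parent")
            parts[i] = "-"
            return prefix + ".".join(parts)
    raise ValueError(f"{ec_number!r} has no specific fields")


def same_ec_range(first: str, second: str) -> bool:
    """Return True if two EC numbers share the same parent class."""
    return ec_parent(first) == ec_parent(second)
```

For the example above, `ec_parent("EC:3.6.1.28")` gives `EC:3.6.1.-`, and `same_ec_range` can be used to confirm that proposed sibling terms fall in the same EC:3.6.1.x range.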

If an EC entry cannot be found for the enzyme, it may be worth checking some other databases for it. BRENDA contains the same enzymes as the EC database, but with a greater number of alternative names; MetaCyc and the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) both contain reactions not covered by EC, to which they have given a partial EC number (e.g. EC:2.1.-.-). Both of these databases can be used in the general dbxrefs; examples would be MetaCyc:GLYOXIII-RXN and UM-BBD_enzymeID:e0225 respectively.

Multi-step reaction

A multi-step reaction is represented as a single GO MF, as long as the intermediates are immediately consumed in the reaction and a single product is released. The exception is when the intermediates are released and used in other biological processes.

Bi-directional reactions

Normally, bi-directional reactions are represented as a single term that describes both directions of the reaction, unless there is a biological justification to separate the two directions of the reaction into separate terms. An example of this exception is

  • GO:0046932 sodium-transporting ATP synthase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ADP + phosphate + Na+(out) => ATP + H2O + Na+(in), by a rotational mechanism.

and

  • GO:0046962 sodium-transporting ATPase activity, rotational mechanism: Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: ATP + H2O + Na+(in) -> ADP + phosphate + Na+(out), by a rotational mechanism.

NAD/NADP cofactors

  • A reaction represented with NAD(P) in GO indicates that it is not known whether an enzyme uses NAD or NADP (note that this is different from the IUBMB practice).
    • If the cofactor (NAD or NADP) is unknown for a class of reactions, then we only create a single GO term with [NAD(P)].
      • If there are RHEA IDs for both NAD- and NADP-dependent reactions, we add these as NARROW xrefs to the general [NAD(P)] term.
  • If the specific cofactor is known, we create one or both relevant reactions. In this case, we also create (or keep, if it exists) a general parent grouping class with [NAD(P)]. The structure of the ontology is:
  • x activity NAD(P)
    • x activity NAD
    • x activity NADP
  • If an enzyme uses both, then it is annotated to both NAD and NADP terms.
  • If the cofactor specificity is always known, then we mark the grouping class 'do not annotate'. An example of this is isocitrate dehydrogenase activity.
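
The annotation rules in this list can be summarized as a small decision function. This is a sketch under stated assumptions, not GO tooling: the function name `terms_to_annotate` and its parameters are hypothetical, and the labels follow the schematic 'x activity NAD(P)' pattern used above rather than real GO term names.

```python
def terms_to_annotate(activity, cofactors_used, grouping_annotatable=True):
    """Return the schematic term labels an enzyme would be annotated to.

    cofactors_used: set containing 'NAD' and/or 'NADP'; empty if unknown.
    grouping_annotatable: False if the NAD(P) grouping class is marked
    'do not annotate'.
    """
    unexpected = set(cofactors_used) - {"NAD", "NADP"}
    if unexpected:
        raise ValueError(f"unexpected cofactor(s): {sorted(unexpected)}")
    if not cofactors_used:
        # Cofactor unknown: only the general NAD(P) term is available.
        if not grouping_annotatable:
            raise ValueError("grouping class is marked 'do not annotate'")
        return [f"{activity} NAD(P)"]
    # Cofactor known: annotate to each specific term the enzyme uses.
    return [f"{activity} {cofactor}" for cofactor in sorted(cofactors_used)]
```

An enzyme that uses both cofactors is returned both specific terms, while an enzyme with an unknown cofactor falls back to the grouping term unless that class is marked 'do not annotate'.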

Out of scope

  • Gene product names: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
    • Example: GO:0102113 hypoxia-inducible factor-asparagine oxygenase activity was merged into its parent GO:0036140 [protein]-asparagine 3-dioxygenase activity since it represented a gene product. Note that GO:0102113 is in EC and in RHEA, but these are outside the scope of GO.
  • Substrates beyond the specificity of known protein functions: e.g. GO does not include the term protein threonine kinase activity, as there is no known protein kinase that acts specifically on threonine. Hence, 'GO:0004674 protein serine/threonine kinase activity' does not have a descendant for protein threonine kinase activity.
    • Note that RHEA represents this reaction: ATP + L-threonyl-[protein] = ADP + H+ + O-phospho-L-threonyl-[protein] (RHEA:46608)
  • Sub-reactions: If an enzyme catalyzes a multi-step reaction, only the overall reaction is defined as a GO function; for example tyrosinase EC:1.14.18.1 catalyzes two reactions: L-tyrosine + O2 = dopaquinone + H2O and 2 L-dopa + O2 = 2 dopaquinone + 2 H2O
  • Submolecular events: GO functions are described at the level of molecules rather than atoms. Therefore a reaction would not be split into function terms describing each step of the reaction in atomic or subatomic terms (e.g. electrons attracted to positive charge or formation of unstable intermediate); it captures the start and end states with respect to the molecules involved. As a consequence of this, separate function terms are not created to cover situations in which different reaction mechanisms provide the route between the same set of reactants and products.
  • Spontaneous reactions, in which covalent bonds are made/broken at a biologically relevant rate without an enzyme catalyst. Note that reactions that can occur spontaneously but that are catalyzed by an enzyme are described in GO; for example GO:0050453 obsolete cob(II)alamin reductase activity.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate Sequence Ontology (see the Open Biomedical Ontologies website for more information).
  • Protein domains or structural features.
  • Reactions that cannot be precisely described at the molecular level

For binding terms, the precise substrate must be known. For example you wouldn't say 'vesicle binding'; instead you would find out which protein in the vesicle membrane was being bound and use that in the term name. (BY THAT GUIDELINE, WE WOULD NOT CREATE 'PROTEIN COMPLEX BINDING??' membrane binding, etc? although we certainly should not have 'vesicle binding', the reason could be formulated more clearly)


Example: Multi-step reactions captured as multiple GO MFs

(+++ CHECK RHEA) - is this a good example??? Is this catalyzed by the same protein, looks like its a complex

Conversely, the reactions of the glycine cleavage system, a multienzyme complex involved in the catabolism of glycine, should be represented by separate function terms.

The overall reaction of the complex is glycine + tetrahydrofolate + NAD = NH3 + 5,10-methylene-THF + CO2 + NADH

but this can be split into steps, which, by the criteria above, warrant individual function terms.

GO molecular function terms:

GO:0004375 glycine dehydrogenase (decarboxylating) activity

  Catalysis of the reaction: glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. (EC:1.4.4.2, RHEA:24304)

GO:0004047 aminomethyltransferase activity

  Catalysis of the reaction: (6S)-tetrahydrofolate + S-aminomethyldihydrolipoylprotein = (6R)-5,10-methylenetetrahydrofolate + NH3 + dihydrolipoylprotein. (EC:2.1.2.10, RHEA:16945)

TO BE REVIEWED

Avoid Cellular Component Information

Cellular structures are not functions. Many cellular component references have been made obsolete in the function ontology. For example, a mitochondrial primase needs only to be annotated to primase activity, because annotators can assign location to gene products by annotating with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.


Avoid Binding Relationships

See the binding working group pages on the GO wiki for more on binding and the current thinking on how to annotate binding.

Catalytic activities should not be related in the ontology to binding terms; for example, ATPase activity should not have a relationship to ATP binding hard-coded in the ontology. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.

Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental single-step activities that occur at the macromolecular level. That means that if we were to further subdivide these functions - for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge" - we would be saying that a reaction was actually a series of functions - i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology, as we would be dealing with events at a submolecular or atomic level.

Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?

Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.

Function Grouping Terms

Terms in the function ontology should be grouped on the basis of functional similarity, rather than being involved in the same process. For example, the grouping term monosaccharide transporter activity might have children such as glucose transporter activity and ribose transporter activity, and is a valid function term in its own right. However, the term defense/immunity protein activity, used to group terms such as antigen binding, blood coagulation factor activity and Fc receptor activity, is not a valid function as it represents a protein involved in the defense or immune response (process) of an organism. If a grouping term is not a function itself, or it contains disparate children with no functional similarity, it should be made obsolete.



Back to: Editing the Ontology