Has input: Difference between revisions

Revision as of 15:52, 11 January 2022

The purpose of annotation extensions is described in the main Annotation_Extension documentation page.

Usage guidelines

What to capture with the has_input relation

  • has input is used to capture the specific molecular target of a Molecular Function or a Biological Process.
  • The entity captured must have a unique, resolvable database identifier. As with annotation objects, gene identifiers may be used as a stand-in for a gene product, e.g. an mRNA or a protein. Curators may also use more specific gene product identifiers, e.g. RNACentral, PRO, Complex Portal, in extensions.
  • Specific protein isoforms or post-translationally modified proteins may be captured as inputs.
  • For Molecular Functions, inputs include:
    • Specific substrates for enzymes
    • Interactors for binding and adaptor activities
    • Target genes for transcription factors
  • For Biological Processes that have a high degree of molecular specificity, inputs include:
    • Target genes for 'regulation of transcription' or other 'regulation of gene expression' terms.
  • Use of has input is particularly encouraged when the target of the Molecular Function or Biological Process is more specific than what the GO term describes. A common use case is to specify the exact protein target of an MF, e.g. the substrate of a protein kinase activity, or a more specific chemical entity acted upon by an MF that is not otherwise stated in the term definition.
  • Note that there does NOT need to be evidence for a direct physical interaction in order to capture an input. Direct physical interaction between an enabler and its input may be implied (e.g. an enzymatic activity assay) and does not necessarily need to be shown with a separate physical interaction assay (e.g. Y2H or co-IP).
  • More than one input may be captured for an annotation; this means that there are multiple substrates for a single reaction. However, if an enzyme can act on different substrates, or if a transcription factor has multiple targets, these should be captured as independent annotations, each with its own annotation extension.
  • In standard annotations, different annotations are not linked, so the BP annotation that corresponds to an MF should also have an extension if appropriate.
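
The difference between several inputs on one annotation and several independent annotations can be sketched in Python. This is only an illustration, not a GO tool: the Annotation class and both helper functions are hypothetical names, and EX:target2 is a made-up placeholder identifier, not a documented target of NKX6-3.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record (not an official GO format)."""
    gene_product: str
    relation: str              # e.g. "enables" or "involved in"
    go_term: str
    inputs: Tuple[str, ...]    # identifiers captured with has input


def joint_inputs(gene_product: str, relation: str, go_term: str,
                 inputs: Iterable[str]) -> List[Annotation]:
    """All inputs take part in the same activity: one annotation, many inputs."""
    return [Annotation(gene_product, relation, go_term, tuple(inputs))]


def independent_targets(gene_product: str, relation: str, go_term: str,
                        targets: Iterable[str]) -> List[Annotation]:
    """Each target is acted on separately: one annotation per target."""
    return [Annotation(gene_product, relation, go_term, (t,)) for t in targets]


# TJP2 bridges F11R and AFDN in one adaptor activity (identifiers from this page).
adaptor = joint_inputs(
    "UniProtKB:Q9UDY2", "enables", "GO:0030674",
    ["UniProtKB:Q9Y624", "UniProtKB:P55196"],
)

# A transcription factor with two separate target genes.
# "EX:target2" is a made-up placeholder, not a documented NKX6-3 target.
tf_targets = independent_targets(
    "UniProtKB:A6NJ46", "enables", "GO:0001228",
    ["UniProtKB:Q16611", "EX:target2"],
)

print(len(adaptor), len(tf_targets))
```

Keeping the two cases as separate helpers makes the curator's decision explicit: inputs are grouped only when they belong to the same reaction or activity.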

What not to capture

  • It is not necessary to capture the input if it is already represented by the GO term label or definition. For example, for GO:0004396 hexokinase activity, has input(CHEBI:4194 D-hexose) is redundant with the textual definition of the term and does not need to be captured.
  • 'Currency' substrates that are general to a class of reactions and that do not provide information about substrate specificity should not be captured; for example, for a kinase activity, ATP should not be added as an input in an annotation extension.
  • Chemical analogs and other assay conditions should not be captured. The input should represent the biologically meaningful input.
  • Cofactors should not be captured as inputs. For example, many enzymes use metal ions such as magnesium, copper or zinc as cofactors. These inorganic ions should not be captured as inputs to the corresponding Molecular Function. Likewise, organic cofactors, or coenzymes, such as flavin mononucleotide or coenzyme Q should not be captured.
  • x-dependent activities: for example, for a calcium-dependent protein kinase, do not capture has input calcium, since calcium is not the molecule that is being acted on.
  • Sequence Ontology terms
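
Some of these exclusions can be expressed as a simple pre-check, sketched below in Python. Everything here is hypothetical: the function name, the exclusion sets, and the ChEBI identifiers assumed for ATP and calcium(2+) are chosen for illustration and should be verified against ChEBI before any real use. Only the hexokinase / D-hexose pair is taken from this page.

```python
from typing import Dict, List, Set

# Illustrative exclusion list. The ChEBI identifiers are assumptions made for
# this example and should be verified against ChEBI before any real use.
CURRENCY_OR_COFACTOR: Set[str] = {
    "CHEBI:15422",  # assumed: ATP
    "CHEBI:29108",  # assumed: calcium(2+)
}

# Inputs already stated by the GO term itself (example taken from this page).
IMPLIED_BY_TERM: Dict[str, Set[str]] = {
    "GO:0004396": {"CHEBI:4194"},  # hexokinase activity / D-hexose
}


def reasons_to_skip(go_term: str, input_id: str) -> List[str]:
    """Return the reasons, if any, why input_id should not be captured."""
    reasons: List[str] = []
    if input_id in IMPLIED_BY_TERM.get(go_term, set()):
        reasons.append("already represented by the GO term")
    if input_id in CURRENCY_OR_COFACTOR:
        reasons.append("currency substrate or cofactor")
    if input_id.startswith("SO:"):
        reasons.append("Sequence Ontology term")
    return reasons


print(reasons_to_skip("GO:0004396", "CHEBI:4194"))
print(reasons_to_skip("GO:0004672", "UniProtKB:P49736"))
```

A check of this kind cannot recognize chemical analogs or other assay conditions; deciding whether an input is biologically meaningful remains a curator judgment.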

Scope of use

Domain

Domain refers to the GO terms that can be further specified with the relation.

Range

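Range describes the types of entities that can be used with the relation, including:

  • CHEBI:24431 chemical entity
  • GO:0032991 protein-containing complex
  • Gene or gene product (includes transcripts, ncRNAs, proteins, and modified gene products): UniProtKB, Model Organism Database, RNACentral, PRO (Protein Ontology)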

Most common ontology branches where input is specified

Distinction between 'has input' and 'with/from'

  • with/from is intended to capture the sequence supporting the evidence, while has input represents the physiological input demonstrated by the experiment.
  • For example, for an IPI experiment in which a mouse protein is tested for its ability to bind another protein, using the human ortholog of that protein in the assay, the annotation would be:
IPI annotation example: 'input' versus 'with'
Gene product: mouse protein A
GP2term relation: enables
GO term: GO:0005515 protein binding, has input(mouse protein B)
Evidence: IPI
with/from: human protein B

Usage examples for the has_input extension

1. Specifying the substrate (chemical) of a catalytic activity

Human DGKA phosphorylates 1-O-palmityl-2-acetyl-sn-glycerol, PMID:22627129

Annotation for DGKA - input
Gene product: UniProtKB:P23743 DGKA
GP2term relation: enables
GO term: GO:0004143 diacylglycerol kinase activity, has input(CHEBI:75936 1-O-palmityl-2-acetyl-sn-glycerol)
Evidence: IDA
Reference: PMID:22627129

Note that this reaction has both an input and an output; if this data is available, both the input and the output are captured in the same annotation:

Annotation for DGKA - input & output
Gene product: UniProtKB:P23743 DGKA
GP2term relation: enables
GO term: GO:0004143 diacylglycerol kinase activity, has input(CHEBI:75936 1-O-palmityl-2-acetyl-sn-glycerol), has output(CHEBI:78385 1-palmityl-2-acetyl-sn-glycero-3-phosphate(2−))
Evidence: IDA
Reference: PMID:22627129
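
For illustration, the combined input and output extension can be written as a single extension string. The Python sketch below assumes the GAF 2.x convention for the annotation extension column, in which each extension is written as relation(identifier), extensions that apply to the same annotation are comma-separated, and independent extensions are pipe-separated. The function name format_extensions is hypothetical, and the convention should be checked against the GAF specification before relying on it.

```python
from typing import Sequence, Tuple

Extension = Tuple[str, str]  # (relation, identifier)


def format_extensions(groups: Sequence[Sequence[Extension]]) -> str:
    """Serialize groups of extensions into one string.

    Extensions inside a group apply to the same annotation and are joined
    with ","; separate groups are independent and are joined with "|".
    This separator convention is an assumption based on GAF 2.x.
    """
    return "|".join(
        ",".join(f"{relation}({identifier})" for relation, identifier in group)
        for group in groups
    )


# DGKA: the input and the output belong to the same reaction, so they share a group.
dgka = format_extensions([
    [("has_input", "CHEBI:75936"), ("has_output", "CHEBI:78385")],
])
print(dgka)  # has_input(CHEBI:75936),has_output(CHEBI:78385)
```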

2. Specifying the protein target of a catalytic activity

Human CDC7 phosphorylates MCM2, PMID:15668232

Annotation for CDC7
Gene product: UniProtKB:O00311 CDC7
GP2term relation: enables
GO term: GO:0004672 protein kinase activity, has input(UniProtKB:P49736 MCM2)
Evidence: IMP
Reference: PMID:15668232

3. Specifying the gene target of a DNA binding transcription factor

Human NKX6-3 regulates transcription of BAK1, PMID:26314965

Annotation for NKX6-3
Gene product: UniProtKB:A6NJ46 NKX6-3
GP2term relation: enables
GO term: GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific, has input(UniProtKB:Q16611 BAK1)
Evidence: IDA
Reference: PMID:26314965

The corresponding Biological Process can be annotated with its input:

Annotation for NKX6-3
Gene product: UniProtKB:A6NJ46 NKX6-3
GP2term relation: involved in
GO term: GO:0045944 positive regulation of transcription by RNA polymerase II, has input(UniProtKB:Q16611 BAK1)
Evidence: IDA
Reference: PMID:26314965

4. Specifying an interaction partner

Human DNM1L binds RAB29, PMID:25767741

Annotation for DNM1L
Gene product: UniProtKB:O00429 DNM1L
GP2term relation: enables
GO term: GO:0031267 small GTPase binding, has input(UniProtKB:O14966 RAB29)
Evidence: IDA
Reference: PMID:25767741

5. Specifying the target(s) of a macromolecule adaptor

Human TJP2 is a molecular adaptor for tight junction proteins F11R and AFDN, PMID:23885123

Annotation for TJP2
Gene product: UniProtKB:Q9UDY2 TJP2
GP2term relation: enables
GO term: GO:0030674 protein-macromolecule adaptor activity, has input(UniProtKB:Q9Y624 F11R), has input(UniProtKB:P55196 AFDN)
Evidence: IDA
Reference: PMID:23885123

6. Specifying the input of a catabolic process

Human ACOT7 degrades palmitoyl-CoA, PMID:10578051

Annotation for ACOT7
Gene product: UniProtKB:O00154 ACOT7
GP2term relation: involved in
GO term: GO:0036116 long-chain fatty-acyl-CoA catabolic process, has input(CHEBI:15525 palmitoyl-CoA)
Evidence: IDA
Reference: PMID:10578051

Cross-reference to Relations Ontology (RO) term

RO:0002233 has input

Review Status

Last reviewed: 2021-10-19

Back to Annotation_Extension