Protein complexes: Difference between revisions

*NOTE: This is a work in progress. It needs to be wrapped up, and revised by editors, Becky and Birgit. Also, we need to add examples - what works and what doesn't.
=GO definition of a protein-containing complex=
*Last updated: 30/4/2015, Birgit Meldal
* A protein-containing complex should be composed of more than one subunit (a protein and another protein or an RNA), forming a stable interaction that exists as a functional unit <i>in vivo</i>. All complexes in the component ontology are created under the general term [https://amigo.geneontology.org/amigo/term/GO:0032991 GO:0032991 protein-containing complex].
* Protein-containing complex terms should have 'complex' in the term label to avoid ambiguity. For example, the molecular function term <code>GO:0004738 pyruvate dehydrogenase activity</code> describes the enzyme activity whereas the cellular component term <code>GO:0045254 pyruvate dehydrogenase complex</code> describes the multi-subunit structure in which the enzyme activity resides.
* Complexes should be as species-agnostic as possible; for example, if a homologous complex is present in different species with different subunit compositions, the definition should either be less specific about the number of subunits or explain how the complex differs between species.


=Textual definition for protein-containing complex terms=
* The textual definitions of protein-containing complex terms should start with "A protein-containing complex that", and continue with "catalyzes" (some enzymatic activity), "is capable of" (some molecular function), and/or "consisting of" (followed by a list of the components).
* Term definitions for protein-containing complexes should be generic and species-agnostic as much as possible. To provide guidance, it is possible to add specific components for a (small) number of species, formulated as 'For example, in human this complex contains...' as a definition gloss or term comment.
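The wording conventions above lend themselves to a simple automated check. The following is a minimal sketch only, not part of any GO tooling: the function name <code>check_complex_term</code> and the exact checks are assumptions derived from the guidelines on this page (the label contains 'complex', and the definition starts with the standard opening phrase and uses one of the suggested continuations). The continuation check is a rough heuristic and may flag acceptable definitions that are worded slightly differently.

```python
import re
from typing import List

REQUIRED_OPENING = "A protein-containing complex that"
# Continuations suggested by the guideline; matched case-insensitively.
CONTINUATIONS = ("catalyzes", "is capable of", "consisting of")


def check_complex_term(label: str, definition: str) -> List[str]:
    """Return a list of guideline problems (empty if none were found)."""
    problems = []
    # The label should contain the word 'complex' to avoid ambiguity with MF terms.
    if not re.search(r"\bcomplex\b", label, flags=re.IGNORECASE):
        problems.append("label does not contain 'complex'")
    # The definition should begin with the standard opening phrase.
    if not definition.startswith(REQUIRED_OPENING):
        problems.append("definition does not start with the standard opening")
    # Heuristic: the definition should name an activity and/or the components.
    lowered = definition.lower()
    if not any(phrase in lowered for phrase in CONTINUATIONS):
        problems.append("definition names neither an activity nor the components")
    return problems
```

For instance, calling <code>check_complex_term("pyruvate dehydrogenase complex", "A protein-containing complex that catalyzes the oxidative decarboxylation of pyruvate.")</code> reports no problems. The definition text in this call is illustrative and is not the actual GO definition.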


=In scope=
* Complexes that exist in an ''in vivo'', physiologically relevant context.
* Homomultimeric proteins, e.g. the homodimeric alcohol dehydrogenase, may be included as cellular component terms, as should heteromultimeric proteins, e.g. hemoglobin with alpha and beta chains.
* Enzyme/substrate or receptor/ligand pairs in which the interaction is a critical part of the complex assembly (e.g. PDGF receptors only become 'dimeric' when linked by the dimeric ligand, forming a tetramer).


=Out of scope=
* Complexes of one gene product with a cofactor, e.g. heme, chlorophyll, magnesium.
* Enzyme/substrate, receptor/ligand or any similar transient interactions unless these are a critical part of the complex assembly. These unstable interactions should be captured with 'GO:0005488 binding' or 'GO:0005515 protein binding'.
* Putative complexes where the only evidence is based on genetic interaction data.
* Proteins associated in a pulldown/coimmunoprecipitation assay with no functional link or any evidence that this is a defined biological entity rather than a loose affinity complex. In other words, a <i>bona fide</i> complex should form under physiological conditions as part of an evolved function; things formed <i>in vitro</i> as part of an experimental procedure are assays.
* Partial complexes and subcomplexes. Note that crystallization experiments often use partial complexes, for technical reasons: some subunits (e.g. transmembrane subunits) cannot be expressed as recombinant proteins and are 'left out' of detailed studies. More reading is often necessary to find out what the full complex is thought to be.
* Complexes differentiated from their parent by the cell type in which they are present.
* Complexes should NOT be defined by their stoichiometry, though this may be mentioned in the definition as a definition gloss, or in a comment. The rationale behind this recommendation is that, as knowledge advances and more examples are found, definitions mentioning stoichiometry would have to be updated, causing a lot of work. Also, stoichiometry can vary in different organisms; it is better to keep the definition more general. It is perfectly fine though to mention something like 'usually consists of a catalytic and a regulatory subunit and possibly further accessory subunits...'.


== Background and rationale ==
Recently, GO and IntAct have started to work together to improve the 'protein complex' branch in GO, making it less flat and more informative, and to provide species-agnostic GO terms that IntAct can reference in its species-specific curation projects. (At the time of writing, the focus is on human, mouse, yeast and E. coli, but we take direct curation requests as well, and we would like more MODs to collaborate directly.) Here we collect current guidelines on protein complex terms, to help GO curators decide whether a protein complex belongs in GO and, if so, to include all necessary information when requesting a new protein complex term.

=Specific complexes ("instances")=
* GO describes general classes of concepts, not specific ones. Specific complexes, defined by their exact subunits in a specific organism, can be submitted to [https://www.ebi.ac.uk/about/contact/support/complexportal Complex Portal] and/or [https://github.com/PROconsortium/PRoteinOntology/issues/new?assignees=nataled&labels=Term+Request&projects=&template=1--term-request.md&title=Term+issue%3A+ Protein Ontology (PRO)]. These resources capture complexes with their exact subunit composition (similar to GO annotations).


=Taxon constraints=
For complexes known to be present only in certain taxa, curators are encouraged to provide this information when they request a new term or come across an existing one that lacks useful taxon constraints. Typically these are prokaryote- or eukaryote-specific complexes, but taxon constraints can apply to any complex.


=Interontology links=


==Protein-containing complex link to MF==
* A protein-containing complex can be linked to a molecular function using the 'capable_of' relation. Note that these links cannot be used to annotate individual subunits to an MF, as an annotation to a protein-containing complex doesn't indicate which subunit is active.


==Protein-containing complex link to BP==
* A protein-containing complex can be linked to a biological process using the 'capable_of_part_of' relation. These CC to BP relations can be used for inference of the BP from the annotation to the protein-containing complex.


==Protein-containing complex link to CC==
* A protein-containing complex can be linked to a cellular anatomical entity using the 'part_of' relation.


== How to request protein complexes in GO==
* Use the [https://github.com/geneontology/go-ontology/issues/new?assignees=&labels=&projects=&template=ntr--protein-containing-complex.md&title= GO-ontology GitHub tracker]


== Protein complexes in GO ==


=== Rule 1: Is the complex stable? ===
From the Complex Portal Rules: http://www.ebi.ac.uk/intact/complex/documentation/


===== What can be described as a complex? =====
A stable set of (two or more) interacting macromolecules, such as proteins, which can be co-purified by an acceptable method and have been shown to exist as an isolated, functional unit in vivo. Any interacting non-protein molecules (e.g. small molecules, nucleic acids) will also be included.
 
 
===== What should not be captured: =====
*Enzyme/substrate, receptor/ligand or any similar transient interactions unless these are a critical part of the complex assembly.
*Proteins associated in a pulldown/coimmunoprecipitation with no functional link or any evidence that this is a defined biological entity rather than a loose affinity complex.
*Proteins with the same function but with either no demonstrable physical link or one that can be inferred by sequence homology. [wording to be updated by Sandra]
*Any literature complex where the only evidence is based on genetic interaction data.
 
 
===== Comments: =====
*If the complex is not stable, it's just protein binding. Interactions can then be captured by a protein-protein interaction DB such as IntAct.
 
*The Complex Portal could also hold transient complexes, e.g. signaling complexes that form for only split seconds but have some experimental evidence that they exist. We haven't done any of these but they are possible. However, they would probably fall outside the scope of GO if GO limits itself to stable complexes.
 
*We can also curate complexes that lack full experimental evidence but are commonly accepted as real, e.g. complexes submitted by ChEMBL for which we only have pharmacological evidence. These complexes are tagged with ECO:0000306 - inferred from background scientific knowledge by manual assertion.
 
 
=== Rule 2: Is the complex species-agnostic? ===
 
*GO should host species-agnostic complexes, ideally conserved across taxa. Where this isn't known, still make the definition generic, and add 'For example, in human this complex contains...' as a definition gloss or comment.
*Species-specific complexes don't belong in GO, but IntAct/Complex Portal and/or PRO can take them.
 
*We may, however, need taxon restrictions on a case-by-case basis, for example for complexes that only exist in prokaryotes or eukaryotes.
 
=== Rule 3: Does the complex have a molecular function? ===
 
*Ideally, add a capable_of function link. If that is not possible, see if capable_of_part_of process links can be made. If neither is applicable, we do host complexes based on their subunits only. [DOS to look into some automatic reasoning across subunits but we think it may become tricky.]
 
 
=== Rule 4: Is the complex known to be involved in one or more biological processes? ===
 
*If yes, add capable_of_part_of process links.
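The order of preference in Rules 3 and 4 (function link first, then process link, otherwise subunits only) can be sketched in code. This is an illustrative sketch under stated assumptions, not how GO stores or reasons over these links: the class <code>ComplexTerm</code>, the two helper functions and the placeholder ID <code>GO:EXAMPLE</code> are all hypothetical. Only the pairing of GO:0045254 with GO:0004738 is taken from the example given earlier on this page.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ComplexTerm:
    """A protein-containing complex term with its inter-ontology links (hypothetical model)."""
    go_id: str
    label: str
    capable_of: Set[str] = field(default_factory=set)          # linked MF term IDs
    capable_of_part_of: Set[str] = field(default_factory=set)  # linked BP term IDs


def classification_basis(term: ComplexTerm) -> str:
    """Prefer a function link, then a process link, else fall back to subunits."""
    if term.capable_of:
        return "function"
    if term.capable_of_part_of:
        return "process"
    return "subunits only"


def inferred_processes(term: ComplexTerm) -> Set[str]:
    """BP terms that could be inferred from an annotation to this complex."""
    return set(term.capable_of_part_of)


# Example from this page: the complex in which pyruvate dehydrogenase activity resides.
pdh = ComplexTerm("GO:0045254", "pyruvate dehydrogenase complex",
                  capable_of={"GO:0004738"})
print(classification_basis(pdh))  # function

# A complex with only a process link; "GO:EXAMPLE" is a placeholder, not a real ID.
other = ComplexTerm("GO:EXAMPLE", "hypothetical complex",
                    capable_of_part_of={"GO:EXAMPLE_BP"})
print(classification_basis(other), inferred_processes(other))
```

Note that the sketch deliberately offers no function for inferring an MF for an individual subunit, since a capable_of link does not say which subunit is active.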
 
 
=== Rule 5: Does the complex contain conserved subunits? ===
 
*GO does host complexes based on their subunits only, when no function or process information is available.
 
*Most complexes contain some wording such as: "In human, it is composed of..." BUT, this is getting messy where subunit composition is different in different branches of the tree of life and different groups/MODs add their own examples. Should these just go in as NARROW synonyms? [to be discussed]
 
*Complexes defined by their subunits but functionally identical to a more generic parent term should not be created as separate GO terms but added to the parent term as synonyms. The specific complex belongs in the Complex Portal.
 
 
=== Rule 6: Where is the complex located? ===
 
*Indicate cellular location as specifically as possible, unless parent already has one.
 
*The CC is for the complex as a whole. We discussed this in the context of transmembrane complexes with members that are only located on one side of the membrane or have no membrane attachment at all. As gene products have the part_of relationship with the complexes this is fine (and the only way of reflecting the CC for the complex as a whole).
 
 
== How to request protein complexes in GO based on the above (TG template, TG freeform) ==
 
*If the complex is generic and its function exists as a GO term, use the complex-by-activity template (and add relevant synonyms as discussed above).
 
*If the function does not yet exist in GO but is clearly defined, create the new MF term first (via SF or TG FF depending on the curator's experience), then create the CC term for the complex via the template.
 
*If the complex-by-activity template is not applicable, create the complex term either via SF or TG FF depending on the curator's experience.
 
*Curators new to requesting complex terms can be trained by IntAct. We are happy to curate the complexes into the Complex Portal at the same time as adding to the GO structure.
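The choice of request route described above can be summarised as a small decision function. This is a sketch only: the function <code>request_steps</code> and its parameter names are hypothetical, and treating "the complex is generic" as "the complex-by-activity template is applicable" is an assumption made to keep the logic compact. The abbreviations SF and TG FF are used exactly as in the text above.

```python
from typing import List


def request_steps(function_term_exists: bool,
                  function_clearly_defined: bool,
                  template_applicable: bool = True) -> List[str]:
    """Return the ordered steps for requesting a complex term (hypothetical helper)."""
    # The template cannot be used: request the term directly.
    if not template_applicable:
        return ["create the complex term via SF or TG FF"]
    # The function already exists as a GO term: use the template straight away.
    if function_term_exists:
        return ["create the complex term via the complex-by-activity template"]
    # The function is clearly defined but missing from GO: create the MF term first.
    if function_clearly_defined:
        return ["create the new MF term via SF or TG FF",
                "create the complex term via the complex-by-activity template"]
    # No usable function: fall back to a direct request.
    return ["create the complex term via SF or TG FF"]


for step in request_steps(function_term_exists=False, function_clearly_defined=True):
    print(step)
```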
 
 
== Future plans ==
As discussed in a meeting with Birgit Meldal, Sandra Orchard, David Osumi-Sutherland and Paola Roncaglia on 28/4/2015.
 
We discussed how we can make 'quick gains' in making the ontology more granular beyond the fixes Birgit does on a case-by-case basis. This is to target historic terms that have only 'protein complex' as a parent because they have no annotation extensions. The aim is to have most complexes grouped either by their function, location or subunit composition.
*Do a pass through term names and definitions to find major groups of complexes that can be grouped by function, e.g. catalytic complexes (the term exists but many historic terms have not been automatically classified as such because they have no capable_of extensions) [BM, SO & DOS]
 
*Add parent terms based on location, such as membrane complexes and children [BM, DOS]
 
*We discussed grouping by protein families but this may be tricky. Decide on a case-by-case basis. A working example is the BCL protein family complexes, which cannot be grouped by function as they may be pro- and/or antiapoptotic.
 
 
== Previous work ==
 
Emily started documentation here, in case it's helpful, but this hasn't been worked on since 2011:
http://wiki.geneontology.org/index.php/Protein_Complex_ids_as_GO_annotation_objects
 
[Birgit] Inheritance of annotations:
I agree with the wiki, you cannot inherit MF from a complex to a subunit and even a CC is problematic, see the transmembrane example above. This needs more thinking about. I don't know what you are doing right now...
 
Orthologies:
We infer within taxon groups, e.g. human to mouse to rat or any other mammal etc, depending on where the experimental evidence comes from. We systematically infer human-mouse. We have a few pombe complexes inferred from yeast (Sc!) but we don't do it systematically.
 
Paralogues:
We make inferences between related complexes in the same species when the gene products are very similar, e.g. hemoglobin chains for adult and developmental complexes.
 
'Large' complexes:
We have tackled the 'mediator' and we can now link to RNACentral for RNAs so time permitting we'll tackle the 'biggies' soon!
 
Pro:
We have a list of Pro complexes that we consult for refs.
 
- What IntAct is doing - a summary:
 
[Birgit] We didn't draw up an official set of rules but in summary this is what we do (and it pretty much matches what Paola says below and the wiki she cites):
A complex should be taxon agnostic but may be restricted to certain taxonomic groups, such as pro- vs eukaryotes.
... should contain subunits in the def
... should have an 'as precise as possible' part_of relationship to the CC (may have to create new terms here as well of course!) which can be a complex (in cases of subcomplexes) or a location
... should have, if possible, capable_of and capable_of_part_of annotation extensions.
... should have an is_a relationship to an appropriate child term of 'protein complex'. This could be a term based on its composition or function but NOT based on the PB. If no appropriate term exists, we create one based on either of the two classes. There is now a TG template for creating complex-by-MF which makes curators' lives much easier :) If there is no appropriate CC or complex-by-MF parent the new complex will be a direct child of 'protein complex'.
 


== Useful links ==


IntAct Complex portal, http://www.ebi.ac.uk/intact/complex/
* [http://www.ebi.ac.uk/complexportal/ Complex Portal]


== Review Status ==
Last reviewed: 2023-09-07


Reviewed by: Peter D'Eustachio, Pascale Gaudet
----
[[Category:GO Editors]][[Category:Ontology]]

Latest revision as of 03:38, 30 January 2024
