Protein complexes

From GO Wiki

GO definition of a protein-containing complex

  • A protein-containing complex should be composed of more than one subunit (a protein and another protein or an RNA), forming a stable interaction that exists as a functional unit in vivo. All complexes in the component ontology are placed under the general term GO:0032991 protein-containing complex.
  • Protein-containing complex terms should have 'complex' in the term label to avoid ambiguity. For example, the molecular function term GO:0004738 pyruvate dehydrogenase activity describes the enzyme activity whereas the cellular component term GO:0045254 pyruvate dehydrogenase complex describes the multi-subunit structure in which the enzyme activity resides.
  • Complexes should be as species-agnostic as possible. For example, if a homologous complex is present in different species but has a different subunit composition in each, the definition should either be less specific about the number of subunits or explain how the complex differs between species.
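As a rough illustration, the labelling rule above can be checked mechanically. The sketch below is a hypothetical Python helper, not part of any GO tooling; the `terms` dictionary just reuses the two example terms from the guideline, and the whole-word match is an assumption about how strictly "contains 'complex'" should be read.

```python
import re


def label_mentions_complex(label: str) -> bool:
    """Hypothetical check: True if 'complex' appears as a whole word in the label (case-insensitive)."""
    return re.search(r"\bcomplex\b", label, flags=re.IGNORECASE) is not None


# The two example terms from the guideline: a CC term and an MF term shown for contrast.
terms = {
    "GO:0045254": "pyruvate dehydrogenase complex",
    "GO:0004738": "pyruvate dehydrogenase activity",
}

for term_id, label in terms.items():
    status = "ok" if label_mentions_complex(label) else "label lacks 'complex'"
    print(f"{term_id}\t{label}\t{status}")
```

A check like this would only be a first-pass filter for CC complex terms; the MF term is expected to fail it, which is the point of the naming rule.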

Textual definition for protein-containing complex terms

  • The textual definitions of protein-containing complex terms should start with "A protein-containing complex that" and continue with "catalyzes" (followed by an enzymatic activity), "is capable of" (followed by a molecular function) and/or "consisting of" (followed by a list of the components).
  • Term definitions for protein-containing complexes should be as generic and species-agnostic as possible. To provide guidance, the specific components found in a small number of species may be added, phrased as 'For example, in human this complex contains...', either as a definition gloss or as a term comment.
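A minimal sketch of how the definition convention above could be screened, assuming definitions are available as plain strings. The function name, the problem messages and the two sample definitions are hypothetical illustrations written for this example, not official GO quality-control rules or real GO definitions.

```python
# Hypothetical checker for the textual-definition convention; not an official GO QC rule.
REQUIRED_PREFIX = "A protein-containing complex that"
CONTINUATIONS = ("catalyzes", "is capable of", "consisting of")


def check_definition(definition: str) -> list[str]:
    """Return the problems found; an empty list means the definition follows the convention."""
    problems = []
    text = definition.strip()
    has_prefix = text.startswith(REQUIRED_PREFIX)
    if not has_prefix:
        problems.append(f"should start with {REQUIRED_PREFIX!r}")
    # Look for one of the expected continuation phrases after the prefix (or anywhere if it is missing).
    rest = text[len(REQUIRED_PREFIX):] if has_prefix else text
    if not any(phrase in rest for phrase in CONTINUATIONS):
        problems.append("should continue with 'catalyzes', 'is capable of' and/or 'consisting of'")
    return problems


# Illustrative definitions written for this example.
good = "A protein-containing complex that catalyzes the oxidative decarboxylation of pyruvate to acetyl-CoA."
bad = "Multi-subunit enzyme that converts pyruvate to acetyl-CoA."
print(check_definition(good))  # → []
print(len(check_definition(bad)))  # → 2
```

Substring matching is deliberately loose here; it cannot judge whether the rest of the definition is species-agnostic, which still needs a curator.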

In scope

  • Complexes that exist in an in vivo, physiologically relevant context.
  • Homomultimeric proteins, e.g. the homodimeric alcohol dehydrogenase, may be included as cellular component terms, as may heteromultimeric proteins, e.g. hemoglobin with its alpha and beta chains.
  • Enzyme/substrate or receptor/ligand complexes in which the interaction is a critical part of complex assembly (e.g. PDGF receptors only become 'dimeric' when linked by the dimeric ligand, forming a tetramer).

Out of scope

  • Complexes of one gene product with a cofactor, e.g. heme, chlorophyll, magnesium.
  • Enzyme/substrate, receptor/ligand or any similar transient interactions, unless these are a critical part of complex assembly. Such transient interactions should be captured with 'GO:0005488 binding' or 'GO:0005515 protein binding'.
  • Putative complexes where the only evidence is based on genetic interaction data.
  • Proteins associated in a pulldown/coimmunoprecipitation assay with no functional link, or without any evidence that they form a defined biological entity rather than a loose affinity complex. In other words, a bona fide complex should form under physiological conditions as part of an evolved function; associations formed in vitro as part of an experimental procedure are products of the assay, not complexes.
  • Partial complexes and subcomplexes. Note that crystallization experiments often use partial complexes, for technical reasons: some subunits (e.g. transmembrane subunits) cannot be expressed as recombinant proteins and are 'left out' of detailed studies. More reading is often necessary to find out what the full complex is thought to be.
  • Complexes differentiated from their parent by the cell type in which they are present.
  • Complexes should NOT be defined by their stoichiometry, though this may be mentioned in the definition as a definition gloss, or in a comment. The rationale behind this recommendation is that, as knowledge advances and more examples are found, definitions mentioning stoichiometry would have to be updated, causing a lot of work. Also, stoichiometry can vary in different organisms; it is better to keep the definition more general. It is perfectly fine though to mention something like 'usually consists of a catalytic and a regulatory subunit and possibly further accessory subunits...'.
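The in-scope and out-of-scope criteria above can be read as a triage checklist for new term requests. The sketch below encodes some of them in Python under loud assumptions: `ComplexProposal`, its field names, the evidence-type strings and the example proposals are all hypothetical, and a real request still needs a curator to read the literature.

```python
from dataclasses import dataclass

# Hypothetical evidence types that, on their own, do not establish a bona fide complex.
WEAK_EVIDENCE = frozenset({"genetic_interaction", "pulldown", "coip"})


@dataclass
class ComplexProposal:
    """Hypothetical summary of a new-term request; all field names are illustrative."""
    name: str
    chains: int  # gene-product chains in the assembly; each copy in a homomultimer counts
    evidence_types: set[str]  # e.g. {"structure", "coip", "genetic_interaction"}
    functional_link: bool  # evidence that the assembly acts as a functional unit in vivo
    is_partial_or_subcomplex: bool = False
    transient_binding_only: bool = False  # e.g. enzyme/substrate contact not needed for assembly
    differs_from_parent_only_by_cell_type: bool = False


def scope_issues(p: ComplexProposal) -> list[str]:
    """Return reasons the proposal looks out of scope; an empty list means no red flags."""
    issues = []
    if p.chains < 2:
        issues.append("single gene product (possibly with a cofactor), not a complex")
    if not p.evidence_types:
        issues.append("no supporting evidence given")
    elif p.evidence_types == {"genetic_interaction"}:
        issues.append("supported only by genetic interaction data")
    elif p.evidence_types <= WEAK_EVIDENCE and not p.functional_link:
        issues.append("only affinity-capture evidence, with no functional link")
    if p.is_partial_or_subcomplex:
        issues.append("partial complex or subcomplex")
    if p.transient_binding_only:
        issues.append("transient interaction; use a binding MF term instead")
    if p.differs_from_parent_only_by_cell_type:
        issues.append("differs from its parent only by cell type")
    return issues


# Illustrative proposals: a heterotetramer versus a single chain carrying a cofactor.
hemoglobin = ComplexProposal("hemoglobin", chains=4, evidence_types={"structure"}, functional_link=True)
cytochrome_c = ComplexProposal("cytochrome c with heme", chains=1, evidence_types={"structure"}, functional_link=True)

print(scope_issues(hemoglobin))  # → []
print(scope_issues(cytochrome_c))  # → ['single gene product (possibly with a cofactor), not a complex']
```

Stoichiometry is intentionally absent from the checklist: per the guideline it affects how a definition is worded, not whether a complex is in scope.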

Specific complexes ("instances")

  • GO describes general classes of concepts, not specific instances. Specific complexes, described by their exact subunit composition in a specific organism, can be submitted to the Complex Portal and/or the Protein Ontology (PRO). These resources capture complexes with their exact subunit composition (similar to GO annotations).

Taxon constraints

For complexes known to be only present in certain taxa, curators are encouraged to provide this information, if applicable, when they request a new term, or come across an existing one that is missing useful taxon constraints. Typically there are prokaryote- and eukaryote-specific complexes, but this can apply to any complex.
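To make the purpose of taxon constraints concrete, the sketch below checks an annotation's species against a term's constraints. It assumes constraints are expressed with relations named `only_in_taxon` and `never_in_taxon` (the style GO commonly uses); the term labels, the abbreviated lineages and the helper name are hypothetical, and a real check would look lineages up in NCBI Taxonomy.

```python
# Hypothetical, heavily abbreviated lineages; a real check would query NCBI Taxonomy.
LINEAGES = {
    "Homo sapiens": ["cellular organisms", "Eukaryota", "Metazoa", "Homo sapiens"],
    "Saccharomyces cerevisiae": ["cellular organisms", "Eukaryota", "Fungi", "Saccharomyces cerevisiae"],
    "Escherichia coli": ["cellular organisms", "Bacteria", "Escherichia coli"],
}

# Hypothetical constraints keyed by term label: (relation, taxon) pairs.
CONSTRAINTS = {
    "example eukaryote-specific complex": [("only_in_taxon", "Eukaryota")],
    "example prokaryote-specific complex": [("never_in_taxon", "Eukaryota")],
}


def constraint_violations(term: str, species: str) -> list[str]:
    """List the taxon constraints of `term` that an annotation from `species` would break."""
    lineage = LINEAGES[species]
    violations = []
    for relation, taxon in CONSTRAINTS.get(term, []):
        if relation == "only_in_taxon" and taxon not in lineage:
            violations.append(f"{term} is only_in_taxon {taxon}, but {species} is not")
        elif relation == "never_in_taxon" and taxon in lineage:
            violations.append(f"{term} is never_in_taxon {taxon}, but {species} is")
    return violations


print(constraint_violations("example eukaryote-specific complex", "Homo sapiens"))  # → []
print(len(constraint_violations("example eukaryote-specific complex", "Escherichia coli")))  # → 1
```

This is why adding a missing constraint is useful: once recorded, annotations to the complex from an impossible taxon can be flagged automatically.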

Interontology links

Protein-containing complex link to MF

  • A protein-containing complex can be linked to a molecular function using the 'capable_of' relation. Note that these links cannot be used to annotate individual subunits to an MF, because an annotation to a protein-containing complex does not indicate which subunit is the active one.

Protein-containing complex link to BP

  • A protein-containing complex can be linked to a biological process using the 'capable_of_part_of' relation. These CC-to-BP links allow the BP to be inferred for gene products annotated to the protein-containing complex.
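The difference between the MF and BP links can be sketched as a small inference step. In the hypothetical Python example below, the link triples are illustrative rather than copied from the ontology, and `subunitA`/`subunitB` are made-up gene products; CC annotations are propagated to BP through `capable_of_part_of`, while `capable_of` links are deliberately not propagated to subunits.

```python
from collections import defaultdict

# Illustrative ontology links (not copied from GO): (subject, relation, object).
LINKS = [
    ("pyruvate dehydrogenase complex", "capable_of", "pyruvate dehydrogenase activity"),
    ("pyruvate dehydrogenase complex", "capable_of_part_of", "acetyl-CoA biosynthetic process from pyruvate"),
]


def infer_bp(cc_annotations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Propagate CC annotations to BP via capable_of_part_of only.

    capable_of (MF) links are ignored on purpose: an annotation to the complex
    does not say which subunit carries the activity.
    """
    bp_targets = defaultdict(set)
    for subject, relation, obj in LINKS:
        if relation == "capable_of_part_of":
            bp_targets[subject].add(obj)

    inferred = {}
    for gene, complexes in cc_annotations.items():
        processes = set()
        for cc in complexes:
            processes |= bp_targets.get(cc, set())
        if processes:
            inferred[gene] = processes
    return inferred


annotations = {"subunitA": {"pyruvate dehydrogenase complex"}, "subunitB": {"some other complex"}}
print(infer_bp(annotations))  # → {'subunitA': {'acetyl-CoA biosynthetic process from pyruvate'}}
```

Keeping MF out of the propagation mirrors the caveat on the 'capable_of' link: every subunit is part of the process, but not every subunit has the activity.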

Protein-containing complex link to CC

  • A protein-containing complex can be linked to a cellular anatomical entity using the 'part_of' relation.
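Because 'part_of' is transitive, a complex linked to one cellular anatomical entity is implicitly part of everything that entity is part of. The Python sketch below computes that closure over a small, illustrative set of edges; the labels and edges are assumptions for the example, not an extract of the ontology.

```python
# Illustrative part_of edges (labels only; not copied from the ontology).
PART_OF = {
    "mitochondrial pyruvate dehydrogenase complex": ["mitochondrial matrix"],
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrion": ["cytoplasm"],
}


def part_of_closure(term: str) -> set[str]:
    """Return every entity reachable from `term` by following part_of edges."""
    seen = set()
    stack = list(PART_OF.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PART_OF.get(current, []))
    return seen


print(sorted(part_of_closure("mitochondrial pyruvate dehydrogenase complex")))
# → ['cytoplasm', 'mitochondrial matrix', 'mitochondrion']
```

The `seen` set also guards against revisiting a node if the edge list ever contained a cycle.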

How to request protein complexes in GO

Useful links

Review Status

Last reviewed: 2023-09-07

Reviewed by: Peter D'Eustachio, Pascale Gaudet