Protein Complex ids as GO annotation objects

From GO Wiki
Revision as of 07:46, 16 June 2022 by Pascale (talk | contribs)


Creating an association between a GO term and a protein complex identifier (one that is species- and gene-product-specific) as the annotation object is appropriate and possible for terms from each of the three GO ontologies. It is desirable to use these annotations to supply annotations to the member subunit gene products.

• Such annotation practice would take advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome)

• The transfer of annotations from a protein complex identifier to its member subunits would improve the consistency of annotation of gene product subunits

• Such annotation practice would more correctly represent published data that characterizes the functioning of the protein complex rather than of its subunits, and would enable curators to indicate where a protein is involved in a particular process in the context of a certain complex. This is particularly important for proteins that are involved in multiple complexes with differing activities, such as activins and inhibins. See the annotation examples below.

Inheritance of GO annotation from a protein complex annotation to its member subunits

Although annotating directly to protein complex (PC) identifiers would best represent published data in certain instances, the large majority of GO annotation users expect to be provided with annotations to gene or gene product identifiers. Currently, any annotations made to PC identifiers are likely to be ignored. However, Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex.

Extra caution needs to be taken with Molecular Function terms, as such GO terms should often be annotated only to the appropriate catalytic subunit, where this is known.

This can be summarized as:

[1] GP part_of some MC, MC localizes_to some CC ==> GP localizes_to some CC

[2] GP part_of some MC, MC capable_of_part_of some BP ==> GP capable_of_part_of some BP

But not:

[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF

However, automatically allowing [2] but excluding [3] would not have any principled basis, as some BPs are just a chain of two MFs. Therefore BP inheritance should be a probabilistic rule, with p increasing for larger processes and smaller complexes.
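The rules above can be sketched in a few lines of Python. This is only an illustration, not an agreed GO pipeline: the `Annotation` class, the `inherit` function and all identifiers such as `PC:example` are hypothetical, and rule [2] is applied deterministically here, so the probabilistic weighting suggested above is not modelled.

```python
from dataclasses import dataclass

# GO aspect codes: C = Cellular Component, P = Biological Process,
# F = Molecular Function. Only C and P are propagated (rules [1] and [2]);
# F is held back (rule [3]).
INHERITABLE_ASPECTS = {"C", "P"}


@dataclass(frozen=True)
class Annotation:
    object_id: str          # annotated entity: a complex or a gene product
    go_id: str
    aspect: str             # "C", "P" or "F"
    via_complex: str = ""   # complex the annotation was inherited from, if any


def inherit(complex_annotations, members):
    """Copy CC and BP annotations from complexes to their member subunits.

    `members` maps a complex id to the ids of its subunits.
    MF annotations are deliberately skipped and left to curator judgement.
    """
    inherited = []
    for ann in complex_annotations:
        if ann.aspect not in INHERITABLE_ASPECTS:
            continue
        for subunit in members.get(ann.object_id, []):
            inherited.append(
                Annotation(subunit, ann.go_id, ann.aspect, via_complex=ann.object_id)
            )
    return inherited


if __name__ == "__main__":
    # Hypothetical complex with two hypothetical subunits.
    members = {"PC:example": ["GP:subunit_a", "GP:subunit_b"]}
    annotations = [
        Annotation("PC:example", "GO:0016311", "P"),  # dephosphorylation (BP)
        Annotation("PC:example", "GO:0004197", "F"),  # peptidase activity (MF)
    ]
    for ann in inherit(annotations, members):
        print(ann)
```

Recording `via_complex` on each inherited annotation keeps the link back to the parent complex, which is the information the next paragraph asks to be captured in the subunit's annotation line.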

When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex of which the subunit is a member needs to be captured somewhere in the subunit's annotation line.

The most appropriate field for this data is Column 17 (Gene Product Form ID), as this column is intended to provide further specification of the annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.

The clearest way to document this relationship between complex and subunit would be primarily via the GPAD+GPI formats (Chris)

Annotation Example 1

PMID:19001025 - Calcineurin Complex.

Investigators use a constitutively active mouse Calcineurin complex (deltaCnA and CnB) to induce dephosphorylation of ARC

Page 2271

[mouse calcineurin complex] GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC

Annotation Example 2

PMID:21827945 - Apoptosome

Page 1090 shows procaspase-3 being cleaved by the pc9 CARD apoptosome.

[Apoptosome] GO:0004197 cysteine-type endopeptidase activity IDA PMID:21827945

As procaspase 9 is known to be the catalytic subunit, the following annotation could also be made:

Procaspase 9 GO:0004197 cysteine-type endopeptidase activity IDA PMID:21827945 col17=[Apoptosome]
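As a rough illustration, the subunit annotation above could be laid out as a 17-column, tab-separated GAF row with the complex identifier in column 17. The helper `gaf_line` is hypothetical, and `EXAMPLE_DB`, `SUBUNIT_ID` and `COMPLEX_DB:COMPLEX_ID` are placeholders, not real accessions; only the GO term, evidence code and reference come from the example.

```python
# Column names of a GAF 2.x row, in order (column 17 is the last one).
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With/From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]


def gaf_line(fields):
    """Build one tab-separated GAF row; columns not supplied are left empty."""
    unknown = set(fields) - set(GAF_COLUMNS)
    if unknown:
        raise KeyError(f"not GAF columns: {sorted(unknown)}")
    return "\t".join(fields.get(column, "") for column in GAF_COLUMNS)


# Placeholder identifiers: substitute the real subunit and complex ids.
line = gaf_line({
    "DB": "EXAMPLE_DB",
    "DB Object ID": "SUBUNIT_ID",
    "DB Object Symbol": "Procaspase 9",
    "GO ID": "GO:0004197",
    "DB:Reference": "PMID:21827945",
    "Evidence Code": "IDA",
    "Aspect": "F",
    "Gene Product Form ID": "COMPLEX_DB:COMPLEX_ID",
})

if __name__ == "__main__":
    print(line)
```

A complete row would also need the mandatory taxon, date and assigned-by columns; they are left empty here because the example does not state them.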

How to request PC identifiers from Protein Complex Databases

1. PRO (Contact: Harold Drabkin)

2. IntAct (Contact: Sandra Orchard)

SOP: These databases will need to be supplied with the paper that provides the direct experimental data supporting the creation of a species-specific, subunit-specific protein complex identifier.

Groups will supply a protein complex identifier that can be used as the object of a GO annotation.

These groups will also collaborate to ensure that each other's resources are cross-referenced and that ids are comparable.

When requesting a protein complex identifier, please also request an equivalent GO term identifier for the complex.

If you would like to search for IntAct protein complex identifiers, QuickGO indexes known identifiers for each GO protein complex term. e.g. for Calcineurin:

- How do these groups infer complex composition based on orthology?

Outstanding Issues

o If protein complexes were to be described in column 17 do we need to include relationship types to clarify how the id in column 17 relates to the id in column 2? Perhaps a relationship such as 'belongs_to', 'member_of' or 'component_of' should be supplied alongside the protein complex identifier?

o How to annotate subunits with Molecular Function terms when the subunits are not known to carry out the function directly.

Central to this issue appears to be the use of the ‘contributes_to’ qualifier with Molecular Function annotations. Some curators have used it to provide Molecular Function annotations to non-catalytic subunits. Many curators at the GO camp would prefer not to annotate MF terms to subunits whose role is unknown; others feel this is the most complete way of annotating (is a middle way needed?).

o How annotations made to protein complex identifiers should be transferred to orthologous subunits.

The suggestion of a new evidence code, ‘ICM’ (‘Inferred from Complex Membership’), was rejected at the Geneva GO camp.

o Concern that a functionally similar protein complex in another organism might have a dissimilar subunit composition, or that orthologous subunits may be involved in protein complexes with different functions (raised by Sandra, IntAct protein complex group). It might therefore not always be appropriate to transfer annotations across large evolutionary distances.

• No Protein complex resource has yet fully characterised large protein complexes, e.g. ribosomes.

• Should all annotations to protein complexes include a cellular component annotation to the most appropriate GO term describing the equivalent protein complex? (such annotations should be inherited by subunits)

To add:

- definition of a protein complex in PRO/IntAct, links out to these resources.

- Varsha's curation example

- emphasize caution when using a complex id for a paper that does not specify complex composition. For some small complexes (e.g. calcineurin) the composition is quite clear; for others there may be variant or transitory subunits. Curation judgement is therefore required.

contributes_to (data from SGD)

There are 2 interpretations of this qualifier:

  • First interpretation: the contributes_to qualifier was originally introduced for cases where a complex has been purified and the activity of the complex has been identified, but the roles of the individual subunits are not known. In this case all subunits are annotated to the function of the complex with a 'contributes_to' qualifier.
  • Second interpretation: if 2 (or more) subunits are absolutely required for the catalytic activity, then each of those subunits is annotated to the function term with 'contributes_to'.
  • Some data from SGD (compiled by Dianna et al. at SGD): File:ContributesTo.pdf
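The difference between the two interpretations can be made concrete with a small sketch. The function names and subunit ids are hypothetical, and the threshold of at least two required subunits in the second function is one reading of the wording above, not an established rule.

```python
QUALIFIER = "contributes_to"


def annotate_first_interpretation(subunits, go_id):
    """Interpretation 1: subunit roles are unknown, so every subunit of the
    purified complex receives the complex's function with contributes_to."""
    return [(subunit, QUALIFIER, go_id) for subunit in subunits]


def annotate_second_interpretation(subunits, required, go_id):
    """Interpretation 2: only subunits that are absolutely required for the
    catalytic activity are annotated, and only when two or more are required
    (assumed reading of "2 subunits (or more)")."""
    essential = [subunit for subunit in subunits if subunit in required]
    if len(essential) < 2:
        return []
    return [(subunit, QUALIFIER, go_id) for subunit in essential]


if __name__ == "__main__":
    subunits = ["GP:a", "GP:b", "GP:c"]  # hypothetical subunit ids
    print(annotate_first_interpretation(subunits, "GO:0004197"))
    print(annotate_second_interpretation(subunits, {"GP:a", "GP:b"}, "GO:0004197"))
```

Under the first interpretation all three subunits are annotated; under the second, only the two required ones are.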