Protein Complex ids as GO annotation objects

From GO Wiki
Revision as of 08:46, 16 June 2022

Introduction

The creation of an association between a GO term and a protein complex identifier (one that is species- and gene-product-specific) as the annotation object is appropriate and possible for terms from each of the three GO ontologies. It is also desirable to use these annotations as a source of annotations for the member subunit gene products.

• Such annotation practice would take advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome)

• The transfer of annotations from protein complex identifiers to their member subunits would improve the consistency of annotation of gene product subunits

• Such annotation practice would more accurately represent published data that characterize the functioning of a protein complex rather than of its individual subunits. It would also enable curators to indicate that a protein is involved in a particular process in the context of a specific complex (particularly important for proteins that belong to several complexes with differing activities, such as the activins and inhibins). See the annotation examples below.


Inheritance of GO annotation from a protein complex annotation to its member subunits

Although annotating directly to protein complex (PC) identifiers would best represent published data in certain instances, the large majority of GO annotation users expect annotations to gene or gene product identifiers. Currently, annotations made to PC identifiers are therefore likely to be ignored. However, Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex.

Extra caution is needed with Molecular Function terms: where the catalytic subunit is known, such terms should often be annotated only to that subunit.

This can be summarized as follows, where GP is a gene product and MC is the complex it belongs to:

[1] GP part_of some MC, MC localizes_to some CC ==> GP localizes_to some CC

[2] GP part_of some MC, MC capable_of_part_of some BP ==> GP capable_of_part_of some BP

But not:

[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF

However, automatically allowing [2] while excluding [3] has no principled basis, as some BPs are little more than a chain of two MFs. BP inheritance is therefore better treated as a probabilistic rule than a hard one, with the probability p that it holds increasing for larger processes and smaller complexes.
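The rules above can be sketched as a small propagation step over annotation records. This is a minimal sketch under stated assumptions rather than an existing GO tool: the `Annotation` record and the `inherit_to_subunits` function are hypothetical, and the aspect letters (C, P, F) follow GAF column 9. CC annotations are copied to every subunit (rule [1]). BP annotations are also copied but flagged for curator review, because rule [2] is only a rule of thumb. MF annotations are not copied at all (rule [3]).

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record (not a GAF or GPAD parser)."""
    object_id: str                      # annotated entity: complex or subunit
    go_id: str
    aspect: str                         # "C", "P" or "F", as in GAF column 9
    evidence: str
    reference: str
    complex_id: Optional[str] = None    # parent complex, set on inherited rows
    needs_review: bool = False          # True when a curator should confirm


def inherit_to_subunits(complex_annots: List[Annotation],
                        subunits: List[str]) -> List[Annotation]:
    """Copy complex annotations to subunits following rules [1]-[3].

    Rule [1] (CC) is applied directly. Rule [2] (BP) is applied but flagged
    for review because it only usually holds. Rule [3] (MF) is never
    applied automatically.
    """
    inherited: List[Annotation] = []
    for ann in complex_annots:
        if ann.aspect == "F":
            # Rule [3] is excluded: MF terms need an explicit curator
            # decision about which subunit (if any) carries the activity.
            continue
        for subunit in subunits:
            inherited.append(replace(
                ann,
                object_id=subunit,
                complex_id=ann.object_id,          # remember the parent complex
                needs_review=(ann.aspect == "P"),  # BP inheritance is not certain
            ))
    return inherited
```

Flagging inherited BP annotations, rather than dropping them, keeps the probabilistic caveat visible. A group could instead weight or filter them by complex size.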

When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex of which the subunit is a member needs to be captured somewhere in the subunit's annotation line.

The most appropriate field for this data is column 17 (Gene Product Form ID), as this column is intended to provide further specification of the annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.
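To make the column 17 proposal concrete, the sketch below assembles a 17-column, tab-separated GAF 2.x line for a subunit annotation, with the complex identifier in column 17. It is a minimal sketch that assumes the standard GAF 2.x column order. The function name and every identifier passed to it are hypothetical. Current GAF files use this column only for protein accessions and isoform identifiers, so the output shows the proposal, not accepted practice.

```python
from datetime import date
from typing import List

GAF_COLUMN_COUNT = 17  # GAF 2.x annotation lines have 17 tab-separated columns


def subunit_gaf_line(db: str, db_object_id: str, symbol: str, qualifier: str,
                     go_id: str, reference: str, evidence: str, aspect: str,
                     taxon: str, assigned_by: str, complex_id: str,
                     extension: str = "") -> str:
    """Build a GAF-style line for a subunit that names its complex in column 17.

    Using column 17 (Gene Product Form ID) for a complex identifier is the
    proposal discussed on this page, not established GAF practice.
    """
    cols: List[str] = [""] * GAF_COLUMN_COUNT
    cols[0] = db                    # 1  DB
    cols[1] = db_object_id          # 2  DB Object ID (the subunit)
    cols[2] = symbol                # 3  DB Object Symbol
    cols[3] = qualifier             # 4  Qualifier, e.g. involved_in
    cols[4] = go_id                 # 5  GO ID
    cols[5] = reference             # 6  DB:Reference
    cols[6] = evidence              # 7  Evidence Code
    # Columns 8 (With/From), 10 (DB Object Name) and 11 (Synonyms) stay empty
    cols[8] = aspect                # 9  Aspect (C, P or F)
    cols[11] = "protein"            # 12 DB Object Type
    cols[12] = taxon                # 13 Taxon, e.g. taxon:10090 for mouse
    cols[13] = date.today().strftime("%Y%m%d")  # 14 Date (YYYYMMDD)
    cols[14] = assigned_by          # 15 Assigned By
    cols[15] = extension            # 16 Annotation Extension
    cols[16] = complex_id           # 17 Gene Product Form ID (proposed use)
    return "\t".join(cols)
```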

The clearest way to document the relationship between a complex and its subunits would be primarily via the GPAD+GPI formats (Chris).
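One way to picture the GPAD+GPI approach is as two separate files. Entity metadata, including which subunits make up a complex, lives in one file. Annotations live in the other. Complex membership is then stated once rather than repeated on every annotation line. The record types below are simplified and hypothetical. They do not reproduce the actual GPAD or GPI column layouts, which are defined in the GO file format specifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GpiEntity:
    """Hypothetical GPI-like entity record (not the real column layout)."""
    entity_id: str
    entity_type: str                 # e.g. "protein" or "protein_complex"
    members: Tuple[str, ...] = ()    # subunit IDs, only used for complexes


@dataclass
class GpadAnnotation:
    """Hypothetical GPAD-like annotation record (not the real column layout)."""
    entity_id: str
    relation: str
    go_id: str
    reference: str
    evidence: str


def subunit_view(annotations: List[GpadAnnotation],
                 entities: Dict[str, GpiEntity]) -> List[Tuple[str, str, str]]:
    """Return (subunit, GO ID, complex) triples by joining on entity metadata.

    This only resolves membership. Which aspects may be inherited by the
    subunits is a separate decision (rules [1]-[3]).
    """
    rows: List[Tuple[str, str, str]] = []
    for ann in annotations:
        entity = entities.get(ann.entity_id)
        if entity is None or entity.entity_type != "protein_complex":
            continue  # annotations to single gene products need no join
        for member in entity.members:
            rows.append((member, ann.go_id, entity.entity_id))
    return rows
```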

Annotation Example 1

PMID:19001025 - Calcineurin Complex.

Investigators used a constitutively active mouse calcineurin complex (deltaCnA and CnB) to induce dephosphorylation of ARC (page 2271).


[mouse calcineurin complex] GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC
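Example 1 can be expanded into subunit-level annotations. The sketch assumes, as our own reading rather than something stated on this page, that `[input_id]humanARC` corresponds to a `has_input` annotation extension in column 16. It also assumes the complex is recorded in column 17 as proposed above. All identifiers are hypothetical placeholders, not real accessions.

```python
from typing import Dict, List

# Hypothetical placeholder identifiers, not real database accessions.
COMPLEX_ID = "COMPLEX:mouse_calcineurin_placeholder"
SUBUNIT_IDS = ["UniProtKB:DELTA_CNA_PLACEHOLDER", "UniProtKB:CNB_PLACEHOLDER"]
ARC_ID = "UniProtKB:HUMAN_ARC_PLACEHOLDER"


def extension(relation: str, target: str) -> str:
    """Format one annotation-extension term in relation(target) form."""
    return f"{relation}({target})"


def example1_rows() -> List[Dict[str, str]]:
    """Expand Example 1 into one annotation per calcineurin subunit.

    GO:0016311 (dephosphorylation) is a Biological Process term, so it is a
    candidate for inheritance under rule [2]. The complex goes in column 17
    and the substrate, ARC, in column 16 as has_input.
    """
    return [
        {
            "object": subunit,                             # column 2
            "go_id": "GO:0016311",                         # column 5
            "reference": "PMID:19001025",                  # column 6
            "evidence": "IMP",                             # column 7
            "extension": extension("has_input", ARC_ID),   # column 16
            "gene_product_form": COMPLEX_ID,               # column 17 (proposed)
        }
        for subunit in SUBUNIT_IDS
    ]
```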

Annotation Example 2

PMID:21827945 - Apoptosome

Page 1090 shows procaspase-3 being cleaved by the pc9 CARD apoptosome.

[Apoptosome] GO:0004197 cysteine-type endopeptidase activity IDA PMID:21827945

As procaspase-9 is known to be the catalytic subunit, the following annotation could also be made:

Procaspase 9 GO:0004197 cysteine-type endopeptidase activity IDA PMID:21827945 col17=[Apoptosome]
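Example 2 follows the stricter handling of Molecular Function terms. The MF annotation is transferred only to the subunit known to be catalytic, and the complex is kept in column 17. This is a minimal sketch. The function name and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, str, str]  # (object, GO ID, evidence, reference, column 17)


def transfer_mf_to_catalytic(complex_id: str, subunits: List[str], go_id: str,
                             evidence: str, reference: str,
                             catalytic: Optional[str]) -> List[Row]:
    """Copy a complex's MF annotation only to its known catalytic subunit.

    Non-catalytic subunits receive nothing. If no catalytic subunit is
    known, nothing is transferred and the decision is left to a curator.
    """
    if catalytic is None:
        return []
    if catalytic not in subunits:
        raise ValueError(f"{catalytic} is not a listed subunit of {complex_id}")
    return [(catalytic, go_id, evidence, reference, complex_id)]
```

For Example 2, passing the apoptosome, its subunits, GO:0004197, IDA, PMID:21827945 and procaspase-9 as the catalytic subunit yields a single row matching the procaspase-9 annotation above.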


How to request PC identifiers from Protein Complex Databases

1. PRO (Contact: Harold Drabkin)

2. IntAct (Contact: Sandra Orchard)


SOP: These databases will need to be supplied with the paper that provides the direct experimental data supporting the creation of a species-specific, subunit-specific protein complex identifier.

These groups will then supply a protein complex identifier that can be used as the object of a GO annotation.

They will also collaborate to ensure that each other's resources are cross-referenced and that identifiers are comparable.

When requesting a protein complex identifier, please also request an equivalent GO term identifier for the complex.

To search for IntAct protein complex identifiers, use QuickGO, which indexes the known identifiers for each GO protein complex term; for example, for calcineurin: http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005955#info=2

- How do these groups infer complex composition based on orthology?


Outstanding Issues

o If protein complexes were described in column 17, would relationship types be needed to clarify how the ID in column 17 relates to the ID in column 2? For example, a relationship such as 'belongs_to', 'member_of' or 'component_of' could be supplied alongside the protein complex identifier.

o How should subunits be annotated with Molecular Function terms when they are not known to carry out that function directly?

Central to this issue appears to be the use of the ‘contributes_to’ qualifier with Molecular Function annotations, which some curators have used to annotate non-catalytic subunits. Many curators at the GO camp would prefer not to annotate MF terms to subunits whose role is unknown, while others feel this is the most complete way of annotating (is a middle way needed?).

o How should annotations made to protein complex identifiers be transferred to orthologous subunits?

The suggestion of a new evidence code, ‘ICM’ (‘Inferred from Complex Membership’), was rejected at the Geneva GO camp.

o There is concern that a functionally similar protein complex in another organism might have a different subunit composition, or that orthologous subunits may belong to functionally different protein complexes (raised by Sandra, IntAct protein complex group). It might therefore not always be appropriate to transfer annotations across large evolutionary distances.

• No protein complex resource has yet fully characterised large protein complexes such as ribosomes.

• Should all annotations to protein complexes include a cellular component annotation to the most appropriate GO term describing the equivalent protein complex? (such annotations should be inherited by subunits)

To add:

- definition of a protein complex in PRO/IntAct, links out to these resources.

- Varsha's curation example

- Emphasize caution when using a complex ID for a paper that does not specify the complex composition: for some small complexes (e.g. calcineurin) the composition is quite clear, but for others there may be variant or transitory subunits. Curator judgement is therefore required.

contributes_to (data from SGD)

There are two interpretations of this qualifier:

  • First interpretation: the contributes_to qualifier was originally introduced for cases where a complex has been purified and its activity identified, but the roles of the individual subunits are not known. In this case, all subunits are annotated to the function of the complex with the 'contributes_to' qualifier.
  • Second interpretation: if two or more subunits are absolutely required for the catalytic activity, each of these subunits is annotated to the function term with contributes_to.
  • Some data from SGD (compiled by Dianna et al. at SGD): File:ContributesTo.pdf
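The two interpretations can be contrasted in code. In this minimal sketch with hypothetical function names, the first interpretation gives contributes_to to every subunit of a purified complex whose subunit roles are unknown. The second gives it only to the subunits shown to be absolutely required for the activity. The page only describes the second case for two or more required subunits, so, as an assumption, the sketch returns nothing when fewer than two are required.

```python
from typing import List, Set, Tuple

Row = Tuple[str, str, str]  # (subunit, qualifier, GO ID)


def contributes_to_first(subunits: List[str], go_id: str) -> List[Row]:
    """First interpretation: the complex's activity is known but the
    subunits' roles are not, so every subunit gets contributes_to."""
    return [(s, "contributes_to", go_id) for s in subunits]


def contributes_to_second(subunits: List[str], required: Set[str],
                          go_id: str) -> List[Row]:
    """Second interpretation: only subunits absolutely required for the
    catalytic activity get contributes_to.

    Assumption: the rule is stated for two or more required subunits, so
    fewer than two returns no rows here.
    """
    if len(required) < 2:
        return []
    return [(s, "contributes_to", go_id) for s in subunits if s in required]
```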