Protein Complex ids as GO annotation objects


Revision as of 08:46, 16 June 2022

Introduction

The creation of an association between a GO term and a protein complex identifier (one that is species- and gene-product-specific) as the annotation object is appropriate and possible for terms from each of the three GO ontologies. It is desirable to use these annotations to supply annotations to the member subunit gene products.

• Such annotation practice would take advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome).

• The transfer of annotations from a protein complex identifier to its member subunits would improve the consistency of annotation of gene product subunits.

• Such annotation practice would more correctly represent published data that characterizes the functioning of the protein complex rather than of its subunits. It would also enable curators to indicate where a protein is involved in a particular process in the context of a certain complex. This is particularly important for proteins that are involved in multiple complexes with differing activities, such as activins and inhibins. See annotation examples below.


Inheritance of GO annotation from a protein complex annotation to its member subunits

Although annotating directly to protein complex (PC) identifiers would best represent published data in certain instances, the large majority of GO annotation users expect to be provided with annotations to gene or gene product identifiers. Currently, any annotations made to PC identifiers are likely to be ignored. However, Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex.

Extra caution needs to be taken with Molecular Function terms, as such terms should often be annotated only to the appropriate catalytic subunit, where this is known.

This can be summarized as follows (GP = gene product, MC = macromolecular complex, CC = Cellular Component, BP = Biological Process, MF = Molecular Function):

[1] GP part_of some MC, MC localizes_to some CC ==> GP localizes_to some CC

[2] GP part_of some MC, MC capable_of_part_of some BP ==> GP capable_of_part_of some BP

But not:

[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF

However, automatically allowing [2] but excluding [3] would not have any principled basis, as some BPs are just a chain of two MFs. Therefore BP inheritance should be a probabilistic rule, with p increasing for larger processes and smaller complexes.
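The rules above could be sketched in code roughly as follows. This is a minimal illustration under stated assumptions, not an existing GO pipeline: the `Annotation` record, the `inherit_to_subunits` function and all identifiers are hypothetical, and the aspect codes C, P and F follow GAF column 9. The sketch applies rules [1] and [2] unconditionally and withholds [3]; it does not attempt the probabilistic weighting of BP inheritance described above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Aspect codes as in GAF column 9: C = Cellular Component,
# P = Biological Process, F = Molecular Function.
INHERITABLE_ASPECTS = {"C", "P"}  # rules [1] and [2]; MF (rule [3]) is excluded


@dataclass(frozen=True)
class Annotation:
    object_id: str                     # annotated entity (complex or gene product)
    go_id: str
    aspect: str                        # "C", "P" or "F"
    evidence: str
    reference: str
    via_complex: Optional[str] = None  # complex the annotation was inherited from


def inherit_to_subunits(complex_annotation: Annotation,
                        subunit_ids: List[str]) -> List[Annotation]:
    """Derive subunit annotations from one complex annotation.

    CC and BP annotations are copied to every subunit and record the
    source complex; MF annotations are never propagated automatically.
    Copying the evidence code unchanged is a simplifying assumption.
    """
    if complex_annotation.aspect not in INHERITABLE_ASPECTS:
        return []
    return [
        Annotation(
            object_id=subunit,
            go_id=complex_annotation.go_id,
            aspect=complex_annotation.aspect,
            evidence=complex_annotation.evidence,
            reference=complex_annotation.reference,
            via_complex=complex_annotation.object_id,
        )
        for subunit in subunit_ids
    ]


# Hypothetical identifiers, for illustration only.
bp = Annotation("COMPLEX:calcineurin", "GO:0016311", "P", "IMP", "PMID:19001025")
mf = Annotation("COMPLEX:apoptosome", "GO:0004197", "F", "IDA", "PMID:21827945")

inherited_bp = inherit_to_subunits(bp, ["GP:CnA", "GP:CnB"])     # two annotations
inherited_mf = inherit_to_subunits(mf, ["GP:Apaf1", "GP:Casp9"])  # empty list
```

Keeping the source complex in a dedicated field (`via_complex` here) mirrors the requirement that the subunit's annotation line records which complex the term was inherited from.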

When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex of which the subunit is a member needs to be captured somewhere in the subunit's annotation line.

The most appropriate field for this data is column 17 (Gene Product Form ID), as this column is intended to provide further specification of the annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.

The clearest way to document this relationship between complex and subunit would be via the GPAD+GPI formats (Chris).

Annotation Example 1

PMID:19001025 - Calcineurin Complex.

Investigators use a constitutively active mouse calcineurin complex (deltaCnA and CnB) to induce dephosphorylation of ARC (page 2271).


[mouse calcineurin complex] GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC

Annotation Example 2

PMID:21827945 - Apoptosome

Page 1090 shows procaspase-3 being cleaved by the pc9 CARD apoptosome.

[Apoptosome] GO:0004197 cysteine-type endopeptidase activity IDA PMID:21827945

As procaspase 9 is known to be the catalytic subunit, the following annotation could also be made:

Procaspase 9 GO:0004197 cysteine-type endopeptidase activity IDA PMID:21827945 col17=[Apoptosome]
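To show where the complex identifier would sit in a file, the sketch below lays out the subunit annotation above as a 17-column, tab-separated GAF 2.x-style line with the complex identifier in column 17. The helper `gaf_line`, the database prefixes, the taxon, the date and every identifier are hypothetical placeholders, and placing a complex id in column 17 is the proposal discussed on this page rather than established practice.

```python
GAF_COLUMN_COUNT = 17  # GAF 2.x lines have 17 tab-separated columns


def gaf_line(db, object_id, symbol, go_id, reference, evidence, aspect,
             taxon, date, assigned_by, complex_id=""):
    """Build one GAF 2.x-style line; unused optional columns stay empty."""
    fields = [""] * GAF_COLUMN_COUNT
    fields[0] = db            # column 1: DB
    fields[1] = object_id     # column 2: DB Object ID (the subunit)
    fields[2] = symbol        # column 3: DB Object Symbol
    fields[4] = go_id         # column 5: GO ID
    fields[5] = reference     # column 6: DB:Reference
    fields[6] = evidence      # column 7: Evidence Code
    fields[8] = aspect        # column 9: Aspect (F, P or C)
    fields[11] = "protein"    # column 12: DB Object Type
    fields[12] = taxon        # column 13: Taxon
    fields[13] = date         # column 14: Date (YYYYMMDD)
    fields[14] = assigned_by  # column 15: Assigned By
    fields[16] = complex_id   # column 17: proposed home for the complex id
    return "\t".join(fields)


# All values below are placeholders, not real database records.
line = gaf_line(
    db="EXAMPLE_DB",
    object_id="PROCASPASE9_ID",
    symbol="procaspase 9",
    go_id="GO:0004197",
    reference="PMID:21827945",
    evidence="IDA",
    aspect="F",
    taxon="taxon:PLACEHOLDER",
    date="20220616",
    assigned_by="EXAMPLE_GROUP",
    complex_id="COMPLEX:apoptosome",
)
columns = line.split("\t")
```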


How to request PC identifiers from Protein Complex Databases

1. PRO (Contact: Harold Drabkin)

2. IntAct (Contact: Sandra Orchard)


SOP: These databases will need to be supplied with the paper(s) that provide the direct experimental data supporting the creation of a species-specific, subunit-specific protein complex identifier.

Groups will supply a protein complex identifier that can be used as the object of a GO annotation.

These groups will also collaborate to ensure that each other's resources are cross-referenced and that ids are comparable.

When requesting a protein complex identifier, please also request an equivalent GO term identifier for the complex.

If you would like to search for IntAct protein complex identifiers, QuickGO indexes known identifiers for each GO protein complex term. e.g. for Calcineurin: http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005955#info=2

- How do these groups infer complex composition based on orthology?


Outstanding Issues

o If protein complexes were to be described in column 17 do we need to include relationship types to clarify how the id in column 17 relates to the id in column 2? Perhaps a relationship such as 'belongs_to', 'member_of' or 'component_of' should be supplied alongside the protein complex identifier?

o How to annotate subunits with Molecular Function terms when the subunits are not known to carry out the function directly.

Central to this issue appears to be the use of the ‘contributes_to’ qualifier with Molecular Function annotations. This has been used by some curators to provide Molecular Function annotations to non-catalytic subunits. Many curators at the GO camp would prefer not to annotate MF terms to subunits whose role is unknown; others feel this is the most complete way of annotating (is a middle way needed?).

o How annotations made to protein complex identifiers should be transferred to orthologous subunits.

The suggestion of a new evidence code, ‘ICM’ (‘Inferred from Complex Membership’), was rejected at the Geneva GO camp.

o Concern that a similarly functional protein complex in another organism might have a dissimilar subunit composition, or that orthologous subunits may be involved in differently functional protein complexes (raised by Sandra, IntAct protein complex group). It might therefore not always be appropriate to transfer annotations across large evolutionary distances.

• No protein complex resource has yet fully characterised large protein complexes, e.g. ribosomes.

• Should all annotations to protein complexes include a cellular component annotation to the most appropriate GO term describing the equivalent protein complex? (such annotations should be inherited by subunits)

To add:

- definition of a protein complex in PRO/IntAct, links out to these resources.

- Varsha's curation example

- emphasize caution when using a complex id for a paper that does not specify complex composition. For some small complexes (e.g. calcineurin) the composition is quite clear; for others there may be variant or transitory subunits. Therefore curator judgement is required.

contributes_to (data from SGD)

There are two interpretations of this qualifier:

  • First interpretation: the contributes_to qualifier was originally instated for cases where a complex has been purified and the activity of the complex has been identified, but the roles of individual subunits are not known. In this case all subunits are annotated to the function of the complex with a 'contributes_to' qualifier.
  • Second interpretation: if two or more subunits are absolutely required for the catalytic activity, then each of those subunits is annotated to the function term with 'contributes_to'.
  • Some data from SGD (compiled by Dianna et al at SGD)- File:ContributesTo.pdf
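
The two interpretations lead to different sets of annotated subunits, which the following sketch tries to make concrete. It is only an illustration: the function `contributes_to_targets` and the subunit names are hypothetical, and the choice to return nothing when fewer than two required subunits are known is an assumption drawn from the wording of the second interpretation, not a documented rule.

```python
def contributes_to_targets(subunits, required_for_activity=None):
    """Return the subunits that would get a 'contributes_to' MF annotation.

    required_for_activity is None  -> subunit roles are unknown, so the
        first interpretation applies and every subunit is annotated.
    required_for_activity is a set -> the second interpretation applies:
        only subunits known to be required are annotated, and only when
        at least two of them are required.
    """
    if required_for_activity is None:
        return sorted(subunits)
    required = set(subunits) & set(required_for_activity)
    return sorted(required) if len(required) >= 2 else []


# Hypothetical three-subunit complex.
complex_subunits = {"subunitA", "subunitB", "subunitC"}

first = contributes_to_targets(complex_subunits)
second = contributes_to_targets(complex_subunits, {"subunitA", "subunitB"})
single = contributes_to_targets(complex_subunits, {"subunitA"})
```

Under the first interpretation all three subunits are returned; under the second only the two required subunits are; when a single catalytic subunit is known, the sketch returns an empty list on the assumption that it would receive a direct annotation without the qualifier.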