Protein Complex ids as GO annotation objects: Difference between revisions

From GO Wiki
mNo edit summary
mNo edit summary
Line 1: Line 1:
'''Introduction'''
'''Introduction'''


It is desirable to provide GO annotations directly to species- and gene product -specific protein complex identifiers and use these annotations to supply annotations to the gene product subunits. The creation of an association between a GO term and  a protein complex as the annotations object is appropriate and possible for terms from each of the three GO ontologies.
The creation of an association between a GO term and  a protein complex identifier (that is species and gene-product specific) as the annotations object is appropriate and possible for terms from each of the three GO ontologies. It is desirable to use these annotations to supply annotations to the member subunit gene products.  


This takes advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome)
Such annotation practice would take advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome)


• The transfer of annotations from protein complex identifiers to its subunits would improve the consistency of annotation of gene product subunits
• The transfer of annotations from protein complex identifiers to its member subunits would improve the consistency of annotation of gene product subunits


This would more correctly represent published data that characterizes functioning of the protein complex rather than subunits and enable curators to indicate where a protein is involved in a particular process in the context of a certain complex (particularly important for proteins that are involved in multiple complexes that have differing activities; such as activin and inhibins).See annotation examples below.
Such annotation practice would more correctly represent published data that characterizes functioning of the protein complex rather than subunits and enable curators to indicate where a protein is involved in a particular process in the context of a certain complex (particularly important for proteins that are involved in multiple complexes that have differing activities; such as activin and inhibins). See annotation examples below.




'''Inheritance of GO annotation from a protein complex annotation to its member subunits'''
'''Inheritance of GO annotation from a protein complex annotation to its member subunits'''


Although annotating directly to PC identifiers would best represent published data in certain instances, the large majority of GO annotation users expect to be provided with annotations to gene or gene product identifiers. Currently, any annotations to PC identifiers are likely to be ignored. However, Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex.
Although annotating directly to PC identifiers would best represent published data in certain instances, the large majority of GO annotation users expect to be provided with annotations to gene or gene product identifiers. Currently, any annotations made to PC identifiers are likely to be ignored. However, Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex.


However caution needs to be taken with Molecular Function terms, as such GO term should often only be annotated to the appropriate catalytic subunit, where knowledge exists.
Extra caution needs to be taken with Molecular Function terms, as such GO terms should often only be annotated specifically to the appropriate catalytic subunit, where knowledge exists.


Summarizes as:  
Summarizes as:  
Line 25: Line 25:
[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF
[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF


However, automatically allowing [2] but excluding [3] would not have any principled basis, as ome BPs are just a chain of two MFs. Therefore BP inheritance should be a probabilistic rule, with p increasing for larger processes and smaller complexes.  
However, automatically allowing [2] but excluding [3] would not have any principled basis, as some BPs are just a chain of two MFs. Therefore BP inheritance should be a probabilistic rule, with p increasing for larger processes and smaller complexes.  
 


When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex of which the subunit is a member needs to be captured somewhere in the subunit's annotation line.
When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex of which the subunit is a member needs to be captured somewhere in the subunit's annotation line.
   
   
The most appropriate field for this data is Column 17(Gene Product From ID), as this column intended to provide further specification on the specific annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.  
The most appropriate field for this data is Column 17(Gene Product From ID), as this column intended to provide further specification on the specific annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.  
'''QUESTION:''' If protein complexes were to be described in this field, perhaps relationship types should appear to clarify how the id in column 17 relates to the id in column 2. Perhaps a relationship such as 'belongs_to', 'member_of' or 'component_of' should be supplied alongside the protein complex identifier.


The clearest way to document this relationship between complex and subunit would be primarily via the GPAD+GPI formats (Chris)
The clearest way to document this relationship between complex and subunit would be primarily via the GPAD+GPI formats (Chris)
* if we decide that the implicit/default gene_product to GO Term relationship is very conservative, such as: [participates_in] for biological process terms or [occurs_in/located_in] for cellular component terms, then no further qualifier for subunit annotations should be needed, especially as enough information to interpret the annotation should be already in column 16/17.


'''Annotation Example'''
'''Annotation Example'''
Line 49: Line 44:
[mouse calcineurin complex]  GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC
[mouse calcineurin complex]  GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC


'''How to work with the Protein Complex Databases'''
'''How to request PC identifiers from Protein Complex Databases'''


1. PRO (Contact: Harold Drabkin)
1. PRO (Contact: Harold Drabkin)
Line 70: Line 65:


'''Outstanding Issues'''
'''Outstanding Issues'''
o If protein complexes were to be described in column 17 do we need to include relationship types to clarify how the id in column 17 relates to the id in column 2? Perhaps a relationship such as 'belongs_to', 'member_of' or 'component_of' should be supplied alongside the protein complex identifier?


o How to annotate subunits with Molecular Function terms that are not known to directly carry out this function.
o How to annotate subunits with Molecular Function terms that are not known to directly carry out this function.

Revision as of 08:10, 15 August 2011

Introduction

A GO term can appropriately be associated with a protein complex identifier (one that is species- and gene-product-specific) as the annotation object, and this is possible for terms from each of the three GO ontologies. It is desirable to use these annotations to supply annotations to the member subunit gene products.

• Such annotation practice would take advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome)

• The transfer of annotations from a protein complex identifier to its member subunits would improve the consistency of annotation of gene product subunits.

• Such annotation practice would more correctly represent published data that characterize the functioning of the protein complex rather than of its subunits. It would also enable curators to indicate where a protein is involved in a particular process in the context of a certain complex. This is particularly important for proteins that are involved in multiple complexes with differing activities, such as activin and inhibins. See the annotation example below.


Inheritance of GO annotation from a protein complex annotation to its member subunits

Although annotating directly to protein complex (PC) identifiers would best represent published data in certain instances, the large majority of GO annotation users expect to be provided with annotations to gene or gene product identifiers. Currently, any annotations made to PC identifiers are likely to be ignored. However, Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex.

Extra caution needs to be taken with Molecular Function terms, as such GO terms should often be annotated only to the appropriate catalytic subunit, where that subunit is known.

This can be summarized as:

[1] GP part_of some MC, MC localizes_to some CC ==> GP localizes_to some CC

[2] GP part_of some MC, MC has_function_in some BP ==> GP has_function_in some BP

But not:

[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF

However, automatically allowing [2] but excluding [3] would not have any principled basis, as some BPs are just a chain of two MFs. Therefore BP inheritance should be a probabilistic rule, with p increasing for larger processes and smaller complexes.
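As a minimal sketch of rules [1] to [3], the following Python propagates Cellular Component and Biological Process annotations from a complex to its member gene products and skips Molecular Function annotations. The `Annotation` class, the `inherit_from_complex` function and the `COMPLEX:1`, `GP:A` and `GP:B` identifiers are hypothetical names chosen for illustration. The sketch treats [2] as a hard rule and does not attempt the probabilistic weighting suggested above.

```python
from dataclasses import dataclass

# Aspects whose terms may be propagated from a complex to its subunits
# (rules [1] and [2]). Molecular Function (rule [3]) is deliberately excluded.
INHERITABLE_ASPECTS = {"CC", "BP"}


@dataclass(frozen=True)
class Annotation:
    object_id: str  # annotated entity (complex or gene product)
    go_id: str
    aspect: str     # "CC", "BP" or "MF"


def inherit_from_complex(complex_annotations, members):
    """Derive subunit annotations from annotations made to complexes.

    `members` maps a complex identifier to the gene products that are
    part_of that complex. MF annotations are skipped, mirroring rule [3].
    """
    inherited = []
    for ann in complex_annotations:
        if ann.aspect not in INHERITABLE_ASPECTS:
            continue
        for gene_product in members.get(ann.object_id, []):
            inherited.append(Annotation(gene_product, ann.go_id, ann.aspect))
    return inherited


# Hypothetical identifiers, for illustration only.
members = {"COMPLEX:1": ["GP:A", "GP:B"]}
complex_annotations = [
    Annotation("COMPLEX:1", "GO:0005955", "CC"),  # calcineurin complex
    Annotation("COMPLEX:1", "GO:0016311", "BP"),  # dephosphorylation
    Annotation("COMPLEX:1", "GO:0004721", "MF"),  # phosphoprotein phosphatase activity
]
subunit_annotations = inherit_from_complex(complex_annotations, members)
```

Each subunit receives the CC and BP terms but not the MF term. A curator would still need to assign the MF term by hand to the catalytic subunit where it is known.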

When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex of which the subunit is a member needs to be captured somewhere in the subunit's annotation line.

The most appropriate field for this data is Column 17 (Gene Product Form ID), as this column is intended to provide further specification of the annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.

The clearest way to document this relationship between complex and subunit would be primarily via the GPAD+GPI formats (Chris)
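To illustrate how an inherited subunit annotation might carry the complex identifier in Column 17, here is a rough Python sketch that assembles a 17-column, tab-delimited annotation line. It assumes the GAF 2.0 column order. The `subunit_gaf_line` function and all identifiers in the example call are hypothetical, and the evidence code is simply copied from the complex annotation, which is an assumption rather than agreed practice.

```python
GAF_COLUMNS = 17


def subunit_gaf_line(db, subunit_id, symbol, go_id, reference, evidence,
                     aspect, taxon, date, assigned_by, complex_id):
    """Build one tab-delimited annotation line for a complex subunit.

    Optional columns are left empty. The complex identifier is written
    to column 17 so that the source complex can be traced from the
    subunit's annotation line.
    """
    fields = [""] * GAF_COLUMNS
    fields[0] = db            # column 1: DB
    fields[1] = subunit_id    # column 2: DB Object ID (the subunit)
    fields[2] = symbol        # column 3: DB Object Symbol
    fields[4] = go_id         # column 5: GO ID
    fields[5] = reference     # column 6: DB:Reference
    fields[6] = evidence      # column 7: Evidence Code
    fields[8] = aspect        # column 9: Aspect (P, F or C)
    fields[11] = "protein"    # column 12: DB Object Type
    fields[12] = taxon        # column 13: Taxon
    fields[13] = date         # column 14: Date (YYYYMMDD)
    fields[14] = assigned_by  # column 15: Assigned By
    fields[16] = complex_id   # column 17: protein complex identifier
    return "\t".join(fields)


# Hypothetical values, for illustration only.
line = subunit_gaf_line(
    db="EXAMPLE_DB", subunit_id="GP:A", symbol="gpA",
    go_id="GO:0016311", reference="PMID:19001025", evidence="IMP",
    aspect="P", taxon="taxon:10090", date="20110815",
    assigned_by="EXAMPLE_GROUP", complex_id="COMPLEX:1",
)
```

If a relationship type such as 'member_of' were adopted, it could be written alongside the identifier in the same field. That choice is still open.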

Annotation Example

PMID:19001025 - Calcineurin Complex.

Investigators use a constitutively active mouse calcineurin complex (deltaCnA and CnB) to induce dephosphorylation of ARC.

Page 2271


[mouse calcineurin complex] GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC

How to request PC identifiers from Protein Complex Databases

1. PRO (Contact: Harold Drabkin)

2. IntAct (Contact: Sandra Orchard)


SOP: These databases will need to be supplied with the paper that provides the direct experimental data supporting the creation of a species-specific, subunit-specific protein complex identifier.

The groups will then supply a protein complex identifier that can be used as the object of a GO annotation.

These groups will also collaborate to ensure that each other's resources are cross-referenced and that ids are comparable.

When requesting a protein complex identifier, please also request an equivalent GO term identifier for the complex.

If you would like to search for IntAct protein complex identifiers, QuickGO indexes known identifiers for each GO protein complex term. e.g. for Calcineurin: http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005955#info=2

- How do these groups infer complex composition based on orthology?


Outstanding Issues

o If protein complexes were to be described in column 17 do we need to include relationship types to clarify how the id in column 17 relates to the id in column 2? Perhaps a relationship such as 'belongs_to', 'member_of' or 'component_of' should be supplied alongside the protein complex identifier?

o How to annotate Molecular Function terms to subunits that are not known to carry out the function directly.

Central to this issue appears to be the use of the ‘contributes_to’ qualifier with Molecular Function annotations. Some curators have used this qualifier to provide Molecular Function annotations to non-catalytic subunits. Many curators at the GO camp would prefer not to annotate MF terms to subunits whose role is unknown; others feel this is the most complete way of annotating (is a middle way needed?).

o How annotations made to protein complex identifiers should be transferred to orthologous subunits.

The suggestion of a new evidence code, ‘ICM’ (‘Inferred from Complex Membership’), was rejected at the Geneva GO camp.

o Concern that a similarly functional protein complex in another organism might have a dissimilar subunit composition, or that orthologous subunits may be involved in differently functional protein complexes (raised by Sandra, IntAct protein complex group). It might therefore not always be appropriate to transfer annotations across large evolutionary distances.

• No Protein complex resource has yet fully characterised large protein complexes, e.g. ribosomes.

• Should all annotations to protein complexes include a cellular component annotation to the most appropriate GO term describing the equivalent protein complex? (such annotations should be inherited by subunits)

To add:

- definition of a protein complex in PRO/IntAct, links out to these resources.

- Varsha's curation example

- emphasize caution when using a complex id for a paper that does not specify complex composition. For some small complexes (e.g. calcineurin) the composition is quite clear; for others there may be variants or transitory subunits. Curation judgement is therefore required.