Protein Complex ids as GO annotation objects

From GO Wiki
Revision as of 08:00, 15 August 2011

Introduction

It is desirable to provide GO annotations directly to species- and gene product-specific protein complex identifiers and to use these annotations to supply annotations to the gene product subunits. Creating an association between a GO term and a protein complex as the annotation object is appropriate and possible for terms from each of the three GO ontologies.

• This takes advantage of the protein complex curation efforts begun by diverse external groups (PRO, IntAct, Reactome)

• The transfer of annotations from a protein complex identifier to its subunits would improve the consistency of annotation of gene product subunits

• This would more correctly represent published data that characterizes the functioning of the protein complex rather than of its subunits, and would enable curators to indicate where a protein is involved in a particular process in the context of a certain complex (particularly important for proteins that are involved in multiple complexes with differing activities, such as activins and inhibins). See the annotation example below.


Inheritance of GO annotation from a protein complex annotation to its member subunits

i. Cellular Component and Biological Process terms can often be inherited by the subunits of a protein complex (a probabilistic rule, with p increasing for larger processes and smaller complexes).

However, caution needs to be taken with Molecular Function terms, as such GO terms should often be annotated only to the appropriate catalytic subunit, where this is known.

Summarized as:

[1] GP part_of some MC, MC localizes_to some CC ==> GP localizes_to some CC

[2] GP part_of some MC, MC has_function_in some BP ==> GP has_function_in some BP

But not:

[3] GP part_of some MC, MC capable_of some MF ==> GP capable_of some MF

However, automatically allowing [2] but excluding [3] would not have any principled basis, as some BPs are just a chain of two MFs.
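As a rough illustration of rules [1] and [2], the sketch below copies Cellular Component and Biological Process annotations from a complex to its member subunits and leaves Molecular Function annotations on the complex. It is a minimal sketch, not an agreed procedure: the `Annotation` class, the `inherit_to_subunits` function, the identifiers `COMPLEX:0001`, `GP:A` and `GP:B`, and the Molecular Function placeholder id are all hypothetical. The two real GO ids are taken from elsewhere on this page.

```python
from dataclasses import dataclass

# Aspects that rules [1] and [2] allow subunits to inherit.
INHERITABLE_ASPECTS = {"C", "P"}  # Cellular Component, Biological Process


@dataclass(frozen=True)
class Annotation:
    object_id: str         # annotated entity (complex or gene product)
    go_id: str
    aspect: str            # "C", "P" or "F"
    via_complex: str = ""  # complex the annotation was inherited from, if any


def inherit_to_subunits(complex_annotations, members):
    """Copy CC and BP annotations from each complex to its subunits; skip MF."""
    inherited = []
    for ann in complex_annotations:
        if ann.aspect not in INHERITABLE_ASPECTS:
            continue  # rule [3] is not applied automatically
        for subunit in members.get(ann.object_id, []):
            inherited.append(
                Annotation(subunit, ann.go_id, ann.aspect, via_complex=ann.object_id)
            )
    return inherited


# Hypothetical identifiers, for illustration only.
members = {"COMPLEX:0001": ["GP:A", "GP:B"]}
complex_annotations = [
    Annotation("COMPLEX:0001", "GO:0005955", "C"),        # calcineurin complex
    Annotation("COMPLEX:0001", "GO:0016311", "P"),        # dephosphorylation
    Annotation("COMPLEX:0001", "GO:MF_PLACEHOLDER", "F"),  # placeholder MF term
]

inherited = inherit_to_subunits(complex_annotations, members)
for ann in inherited:
    print(ann.object_id, ann.go_id, ann.aspect, ann.via_complex)
```

Recording `via_complex` on every inherited annotation keeps the source complex attached to the subunit annotation, which is the requirement discussed in item ii.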


ii. When creating an annotation to a subunit using a GO term inherited from the parent protein complex GO annotation, the identity of the specific complex that the subunit is a member of needs to be captured somewhere in the annotation line.

The most appropriate field for this data is Column 17 (Gene Product Form ID), as this column is intended to provide further specification of the annotation object in column 2. Currently this field contains protein accessions and isoform identifiers.

QUESTION: If protein complexes were to be described in this field, perhaps relationship types should appear to clarify how the id in column 17 relates to the id in column 2. Perhaps a relationship such as 'belongs_to', 'member_of' or 'component_of' should be supplied alongside the protein complex identifier.

The clearest way to document this relationship between complex and subunit would be primarily via the GPAD+GPI formats (Chris).

  • If we decide that the implicit/default gene_product to GO Term relationship is very conservative, such as [participates_in] for biological process terms or [occurs_in/located_in] for cellular component terms, then no further qualifier should be needed for subunit annotations, especially as enough information to interpret the annotation should already be in column 16/17.
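To make the QUESTION above concrete, the sketch below builds a 17-column, tab-separated annotation line in which column 17 carries a relationship and a protein complex identifier. It is only a sketch under stated assumptions: the `relation(ID)` syntax in column 17 is hypothetical (it borrows the form used in the annotation extension column and is not an agreed format), the function names are invented, and every field value in the example, including the database name, identifiers, reference and evidence code, is a placeholder.

```python
# Relationship names proposed in the QUESTION above; none has been agreed.
ALLOWED_RELATIONS = {"belongs_to", "member_of", "component_of"}


def column_17(relation, complex_id):
    """Format a hypothetical column 17 value such as member_of(COMPLEX:0001)."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relationship: {relation}")
    return f"{relation}({complex_id})"


def gaf_line(first_16_columns, relation, complex_id):
    """Join columns 1-16 with the column 17 value into one tab-separated line."""
    if len(first_16_columns) != 16:
        raise ValueError("expected exactly 16 columns before column 17")
    return "\t".join(list(first_16_columns) + [column_17(relation, complex_id)])


# Placeholder values for columns 1-16, for illustration only.
columns = [
    "ExampleDB",       # 1  DB
    "GP:A",            # 2  DB Object ID (the subunit)
    "subunitA",        # 3  DB Object Symbol
    "",                # 4  Qualifier
    "GO:0016311",      # 5  GO ID (dephosphorylation)
    "PMID:0000000",    # 6  DB:Reference (placeholder)
    "IMP",             # 7  Evidence Code (placeholder choice)
    "",                # 8  With/From
    "P",               # 9  Aspect
    "",                # 10 DB Object Name
    "",                # 11 DB Object Synonym
    "protein",         # 12 DB Object Type
    "taxon:10090",     # 13 Taxon (mouse)
    "20110815",        # 14 Date
    "ExampleDB",       # 15 Assigned By
    "",                # 16 Annotation Extension
]

line = gaf_line(columns, "member_of", "COMPLEX:0001")
print(line)
```

Validating the relationship name against a fixed set means that a misspelled or unapproved relation fails loudly instead of producing a line that downstream parsers would have to guess at.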

Annotation Example

PMID:19001025 - Calcineurin Complex.

Investigators use a constitutively active mouse calcineurin complex (deltaCnA and CnB) to induce dephosphorylation of ARC.

Page 2271


[mouse calcineurin complex] GO:0016311 dephosphorylation IMP PMID:19001025 [input_id]humanARC

How to work with the Protein Complex Databases

1. PRO (Contact: Harold Drabkin)

2. IntAct (Contact: Sandra Orchard)


SOP: These databases will need to be supplied with the papers that provide the direct experimental data supporting the creation of a species-specific, subunit-specific protein complex identifier.

Groups will supply a protein complex identifier that can be used as the object of a GO annotation.

These groups will also collaborate to ensure that each other's resources are cross-referenced and that ids are comparable.

When requesting a protein complex identifier, please also request an equivalent GO term identifier for the complex.

If you would like to search for IntAct protein complex identifiers, QuickGO indexes known identifiers for each GO protein complex term. e.g. for Calcineurin: http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005955#info=2

- How do these groups infer complex composition based on orthology?


Outstanding Issues

o How to annotate Molecular Function terms to subunits that are not known to carry out the function directly.

Central to this issue appears to be the use of the ‘contributes_to’ qualifier with Molecular Function annotations. Some curators have used this qualifier to provide Molecular Function annotations to non-catalytic subunits. Many curators at the GO camp would prefer not to annotate MF terms to subunits whose role in the function is unknown; others feel this is the most complete way of annotating (is a middle way needed?).

o How annotations made to protein complex identifiers should be transferred to orthologous subunits.

The suggestion of a new evidence code, ‘ICM’ (‘Inferred from Complex Membership’), was rejected at the Geneva GO camp.

o Concern that a similarly functional protein complex in another organism might have a dissimilar subunit composition, or that orthologous subunits may be involved in differently functional protein complexes (raised by Sandra, IntAct protein complex group); it might therefore not always be appropriate to transfer annotations across large evolutionary distances.

• No protein complex resource has yet fully characterised large protein complexes, e.g. ribosomes.

• Should all annotations to protein complexes include a cellular component annotation to the most appropriate GO term describing the equivalent protein complex? (such annotations should be inherited by subunits)

  • To add:

- definition of a protein complex in PRO/IntAct, with links out to these resources

- Varsha's curation example

- emphasize caution when using a complex id for a paper that does not specify complex composition: for some small complexes (e.g. calcineurin) the composition is quite clear, but for others there may be variants or transitory subunits. Therefore curation judgement is required.