Improving protein binding annotations using InterPro domains
Current Situation
- Curators are faced with multiple choices of GO terms under GO:0005515; protein binding, which has over 900 child terms that describe different aspects of the interaction.
These include descriptions of:
- the protein class/family of the interactor: TBP-class protein binding
- the role/activity of the interactor: kinase binding
- the dependencies of the interaction: copper-dependent protein binding
- the state of the interacting protein: phosphoprotein binding
- the domain being bound in the interactor: MADS box domain binding
- the function the interaction contributes towards: sterol regulatory element binding protein import into nucleus involved in sterol depletion response
- All these different ways of describing an interaction mean that the same interaction can be annotated in many ways, which makes it less likely that curators will annotate consistently and comprehensively. However, different curators feel strongly about the usefulness of different, diverse terms.
Ideas for moving forward
- Many curators would like to keep more descriptive terms under protein binding that describe roles/activities (e.g. London 2011 GOC meeting)
- Ideally, curators would be able to annotate to a protein binding term that indicated its functional relevance, e.g. ‘protein binding involved in heterotypic cell-cell adhesion’
- However, many papers just don’t give this amount of information. For the curator to make this decision, more than one piece of evidence from different sources might need to be combined to make a more descriptive annotation (e.g. [http://wiki.geneontology.org/index.php/Annotation_Conf._Call,_August_9,_2011 compound IC-evidenced annotations]).
- Perhaps the second-best option would be to indicate the type of protein being bound, which provides more information to users. This might also help curators search for the pieces of information that would enable them to annotate to 'protein binding involved in BP X', or allow users to infer this possibility if the evidence is not strong enough to be included directly in the annotation.
Example:
Protein: Q14114: Low-density lipoprotein receptor-related protein 8
With ID: P02649: Apolipoprotein E
InterPro: IPR000074: Apolipoprotein A1/A4/E
Old Term: GO:0005515: protein binding
New Term: GO:0034185: apolipoprotein binding
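The example above can be read as a table lookup: if the interactor (the With ID) belongs to an InterPro family that has a mapped binding term, that term is offered in place of the generic one. The following is a minimal sketch of that idea, not the project's actual implementation. The table and function names are hypothetical, and the table holds only the single entry taken from the example.

```python
# Hypothetical lookup table: InterPro family ID -> more specific GO binding term.
# Only the one entry from the example above is filled in.
INTERPRO_TO_BINDING_TERM = {
    "IPR000074": ("GO:0034185", "apolipoprotein binding"),
}

# Generic term that a suggestion may replace.
GENERIC_BINDING_TERMS = {"GO:0005515"}  # protein binding


def suggest_binding_term(current_term, interactor_families,
                         mapping=INTERPRO_TO_BINDING_TERM):
    """Return a (GO ID, name) suggestion, or None if nothing more specific is known.

    current_term        -- GO ID currently used in the annotation
    interactor_families -- InterPro family IDs assigned to the interactor (With ID)
    """
    if current_term not in GENERIC_BINDING_TERMS:
        return None  # annotation is already more specific than 'protein binding'
    for family in interactor_families:
        if family in mapping:
            return mapping[family]
    return None


# Q14114 annotated with GO:0005515, interactor P02649 is in family IPR000074.
print(suggest_binding_term("GO:0005515", ["IPR000074"]))
```

The function returns a suggestion only; in this sketch the decision to accept it stays with the curator.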
UniProt-GOA student project to improve protein binding: Marijn Berg.
Question:
Can the information we have on the identity of interactors be used to help curators decide which GO term under 'GO:0005515; protein binding' to use, and to improve current annotation web displays?
Work Carried out:
InterPro family groupings and the GO annotations attached to the protein interactors were used to supply curators with more specific GO term suggestions:
- 2068 InterPro family mappings, of which 1960 are distinct, to 255 distinct GO terms.
- Currently, 15% of the binding annotations that use GO:0005515 or GO:0042802 could be given a more granular term.
- 148 GO activity terms mapped to GO binding terms.
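Figures of this kind (number of mappings, distinct families, distinct GO terms, and the share of generic annotations that could be refined) can be derived from a list of family-to-term pairs. The sketch below shows one way to compute them under stated assumptions: the function names and data layout are hypothetical, and every identifier containing `TOY` is an invented placeholder, not real project data.

```python
# Generic binding terms named in the text above.
GENERIC_TERMS = {"GO:0005515", "GO:0042802"}


def summarise_mappings(pairs):
    """Count mappings, distinct InterPro families and distinct GO terms.

    pairs -- list of (interpro_family_id, go_binding_term_id) tuples
    """
    return {
        "mappings": len(pairs),
        "distinct_families": len({family for family, _ in pairs}),
        "distinct_go_terms": len({term for _, term in pairs}),
    }


def refinable_fraction(annotations, pairs):
    """Fraction of generic binding annotations whose interactor has a mapped family.

    annotations -- list of (go_term_id, [interactor InterPro family IDs]) tuples
    """
    mapped_families = {family for family, _ in pairs}
    generic = [a for a in annotations if a[0] in GENERIC_TERMS]
    if not generic:
        return 0.0
    refinable = sum(1 for _, families in generic
                    if mapped_families.intersection(families))
    return refinable / len(generic)


# Toy data: only IPR000074 -> GO:0034185 comes from the example on this page.
toy_pairs = [
    ("IPR000074", "GO:0034185"),
    ("IPR_TOY_1", "GO:TOY_1"),
    ("IPR_TOY_1", "GO:TOY_2"),
]
toy_annotations = [
    ("GO:0005515", ["IPR000074"]),   # generic, interactor family is mapped
    ("GO:0005515", ["IPR_TOY_9"]),   # generic, no mapping for this family
    ("GO:0042802", []),              # generic, interactor has no family
    ("GO:0034185", ["IPR000074"]),   # already specific, not counted
]

print(summarise_mappings(toy_pairs))
print(refinable_fraction(toy_annotations, toy_pairs))
```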
Data to become available
Shortly to become available as two-column tables from QuickGO.
InterPro family ID | IPR name | Specific GO binding term | name |
---|---|---|---|
Use by UniProt-GOA
- To be included initially as a curator suggestion in Protein2GO.
- The decisions that curators make (whether to use the suggestion, improve upon it, or reject it) will be captured and assessed.
- After 6 months, or once sufficient data has been captured, an analysis of the data will determine whether the file can be used to automatically improve the GO term attached to existing protein binding annotations (e.g. from IntAct, where there are ~18,000 high-quality interactions which only apply the 'protein binding' or 'protein self binding' GO terms).
- Possibly the first type of positive annotation suggestion for the curator to be included in Protein2GO, where the suggestion offered should be high-quality but need not yet have the correctness of a production IEA method.
Identified issues
Further work
1. Use these suggestions to improve consistency of curation and improve existing protein binding GO annotations.
2. The work has highlighted some terms whose position in the GO hierarchy under protein binding should be improved, as well as inconsistencies in term naming (to be highlighted at a future call).
3. GO to GO mapping.
4. InterPro domain binding: add InterPro domain identifiers into c.16, to enable curators to indicate both function and domain-specific binding in the same annotation line.
Further thoughts
However, a large number of proteins have obscure names, and users will not be able to use the hierarchy structure of GO; therefore, this would not negate the desirability of using more specific GO terms.