Improving protein binding annotations using InterPro domains

From GO Wiki

Current Situation

  • Curators face multiple choices of GO terms under GO:0005515; protein binding, which has over 900 child terms describing different aspects of the interaction.

These include descriptions of:

  1. the protein class/family of the interactor: TBP-class protein binding
  2. the role/activity of the interactor: kinase binding
  3. the dependencies of the interaction: copper-dependent protein binding
  4. the state of the interacting protein: phosphoprotein binding
  5. the domain being bound in the interactor: MADS box domain binding
  6. the function the interaction contributes towards: sterol regulatory element binding protein import into nucleus involved in sterol depletion response
  • Having so many ways to describe an interaction makes it less likely that curators will annotate consistently and comprehensively. However, different curators feel strongly about the usefulness of different, diverse terms.

Ideas for moving forward

  • Many curators would like to keep more descriptive terms under protein binding that describe roles/activities (e.g. London 2011 GOC meeting)
  • Ideally, curators would be able to annotate to a protein binding term that indicated its functional relevance, e.g. ‘protein binding involved in heterotypic cell-cell adhesion’
  • Perhaps the second-best option would be to indicate the type of protein being bound, which provides more information to users. This might also help curators find the information needed to annotate to 'protein binding involved in BP X', or help users infer this possibility if the evidence is not strong enough to be included directly in the annotation.


Example:

Protein: Q14114: Low-density lipoprotein receptor-related protein 8
With ID: P02649: Apolipoprotein E
InterPro: IPR000074: Apolipoprotein A1/A4/E
Old Term: GO:0005515: protein binding
New Term: GO:0034185: apolipoprotein binding
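
The example above can be read as a simple lookup: the InterPro family of the interactor points to a more specific binding term than the generic one. A minimal sketch, assuming a hypothetical in-memory table `FAMILY_TO_BINDING_TERM` and a hypothetical helper `suggest_terms` (neither is part of Protein2GO or QuickGO); only the single pair from the example is included:

```python
# Hypothetical lookup: InterPro family of the interactor -> more specific GO binding term.
# Only the pair from the example above is listed; a real table would hold many more rows.
FAMILY_TO_BINDING_TERM = {
    "IPR000074": ("GO:0034185", "apolipoprotein binding"),
}

# Generic term that the suggestion is meant to refine.
GENERIC_TERMS = {"GO:0005515"}  # protein binding


def suggest_terms(current_go_id, interactor_families):
    """Return more specific binding terms suggested by the interactor's InterPro families."""
    if current_go_id not in GENERIC_TERMS:
        return []  # the annotation is already more specific than 'protein binding'
    suggestions = []
    for family in interactor_families:
        term = FAMILY_TO_BINDING_TERM.get(family)
        if term is not None and term not in suggestions:
            suggestions.append(term)
    return suggestions


# Q14114 is annotated with 'protein binding' to P02649, which belongs to IPR000074.
print(suggest_terms("GO:0005515", ["IPR000074"]))
```

The function returns a list rather than a single term because an interactor may belong to more than one mapped family; the curator would still make the final choice.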

UniProt-GOA student project to improve protein binding: Marijn Berg.

Question:

Can the information we have on the identity of interactors be used to help curators decide which GO term under 'GO:0005515; protein binding' to use, and to improve current annotation web displays?

Work Carried out:

InterPro family groupings, together with the GO annotation attached to the protein interactors, were used to supply curators with more specific GO term suggestions.

  • 2068 InterPro family mappings (1960 of them distinct) to 255 distinct GO terms.
  • Currently, 15% of binding annotations that now use GO:0005515 or GO:0042802 could be given a more granular mapping.
  • 148 GO activity terms mapped to GO binding terms.

Data to become available

Shortly to become available as two-column tables from QuickGO

InterPro family ID | IPR name | Specific GO binding term
IPR000057 | CXC chemokine receptor, type 2/Interleukin 8 receptor beta | GO:0005153 interleukin-8 receptor binding
IPR000098 | Interleukin-10 | GO:0019969 interleukin-10 binding
IPR000105 | Mu opioid receptor | GO:0031628 opioid receptor binding
IPR000147 | AT2 angiotensin II receptor | GO:0031703 type 2 angiotensin receptor binding
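
Rows like those above could be loaded into a lookup keyed by InterPro ID. The sketch below is only an illustration: the layout of the eventual QuickGO file is not known here, so the pattern simply assumes each row contains an `IPRnnnnnn` identifier, a free-text family name, a `GO:nnnnnnn` identifier and a term name, optionally separated by `|`. The names `ROW_PATTERN` and `parse_row` are hypothetical.

```python
import re

# Assumed row shape: IPR id, family name, GO id, GO term name (optional '|' separators).
ROW_PATTERN = re.compile(
    r"^(IPR\d{6})\s*\|?\s*(.+?)\s*\|?\s*(GO:\d{7})\s+(.+)$"
)


def parse_row(line):
    """Split one table row into its fields; return None for headers or malformed rows."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    ipr_id, ipr_name, go_id, go_name = match.groups()
    return {"ipr_id": ipr_id, "ipr_name": ipr_name, "go_id": go_id, "go_name": go_name}


rows = [
    "InterPro family ID IPR name Specific GO binding term name",  # header, skipped
    "IPR000098 Interleukin-10 GO:0019969 interleukin-10 binding",
    "IPR000105 Mu opioid receptor GO:0031628 opioid receptor binding",
]

# Build the lookup, ignoring any row that does not match the assumed shape.
mapping = {}
for line in rows:
    parsed = parse_row(line)
    if parsed is not None:
        mapping[parsed["ipr_id"]] = parsed

print(mapping["IPR000105"]["go_id"])
```

Anchoring on the two identifier patterns avoids having to guess where a multi-word family name ends.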



The group welcome feedback on improvements to this data.

Use by UniProt-GOA

- to be included initially as a curator suggestion in Protein2GO.

- the decisions that curators make (whether to use the suggestion/improve upon it/reject it) will be captured and assessed

- after 6 months, or once sufficient data has been captured, an analysis of the data will determine whether the file can be used to automatically improve the GO term attached to existing protein binding annotations (e.g. from IntAct, where there are ~18,000 high-quality interactions which only apply the 'protein binding' or 'protein self binding' GO terms).

- Possibly the first type of positive annotation suggestion for the curator to be included in Protein2GO, where the suggestions offered should be high-quality but need not yet reach the correctness of a production IEA method.
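
The assessment step described above could start from a simple tally of curator decisions per mapping. The sketch below is an assumption about how such data might look, not a description of what Protein2GO records: the record layout `(ipr_id, suggested_go_id, decision)`, the decision labels and the function name `summarise_decisions` are all hypothetical, and the sample records are invented for illustration.

```python
from collections import Counter, defaultdict


def summarise_decisions(records):
    """Tally curator decisions per (InterPro family, suggested GO term) pair.

    records: iterable of (ipr_id, suggested_go_id, decision) tuples, where
    decision is one of 'accepted', 'improved' or 'rejected' (assumed labels).
    """
    counts = defaultdict(Counter)
    for ipr_id, go_id, decision in records:
        counts[(ipr_id, go_id)][decision] += 1

    summary = {}
    for mapping, tally in counts.items():
        total = sum(tally.values())
        summary[mapping] = {
            "total": total,
            "accepted_fraction": tally["accepted"] / total,
        }
    return summary


# Invented sample decisions for one mapping.
sample = [
    ("IPR000074", "GO:0034185", "accepted"),
    ("IPR000074", "GO:0034185", "accepted"),
    ("IPR000074", "GO:0034185", "accepted"),
    ("IPR000074", "GO:0034185", "rejected"),
]
print(summarise_decisions(sample))
```

Any cut-off for promoting a mapping to automatic use would be a curation policy decision and is deliberately left out of the sketch.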

Identified issues

  • There are some very descriptive binding terms, whereas in other places little is available, e.g. there is a 'protease binding' term but no equivalent for oxidoreductase activity.



  • There will be cases where more than one term is suggested. In some cases it will be very reasonable to capture both in annotations, e.g. for bifunctional enzymes; in others this could lead to a focused discussion on the desirability of certain terms, e.g. glycoprotein binding.

Further work

1. Use these suggestions to improve consistency of curation and improve existing protein binding GO annotations.

2. The work has highlighted some terms whose position in the GO hierarchy under protein binding should be improved, as well as inconsistencies in term naming (to be highlighted at a future call).

3. GO to GO mapping

4. InterPro domain binding - add InterPro domain identifiers into c.16, to enable curators to indicate both function and domain-specific binding in the same annotation line.

Further thoughts

  • A discussion on improving the web display of GO annotations: showing names rather than identifiers.

However, a large number of proteins have obscure names, and users would not be able to use the hierarchical structure of GO; this would therefore not negate the desirability of using more specific GO terms.

  • LEGO might be able to give us more of a network view, so users can easily move from viewing the activities of one protein to those in its immediate neighbourhood. However, it will take time for LEGO to be formalized, for the LEGO curation tool to become available, and for a complementary display to be created. GO annotations will continue to be displayed in a list format for individual proteins on many database web pages for years to come. This project is looking to easily improve the information content to users _now_ and is not incompatible with longer-term LEGO-focused annotation improvements.