Improving protein binding annotations using InterPro domains

Revision as of 06:48, 26 June 2012

Current Situation

  • Curators are faced with multiple choices of GO terms under 'GO:0005515; protein binding', which has over 900 child terms that describe different aspects of the interaction.

These include descriptions of:

  1. the protein class/family of the interactor: TBP-class protein binding
  2. the role/activity of the interactor: kinase binding
  3. the dependencies of the interaction: copper-dependent protein binding
  4. the state of the interacting protein: phosphoprotein binding
  5. the domain being bound in the interactor: MADS box domain binding
  6. the function the interaction contributes towards: sterol regulatory element binding protein import into nucleus involved in sterol depletion response
  • Because the same interaction can be described in so many different ways, curators are less likely to annotate consistently and comprehensively. However, different curators feel strongly about the usefulness of different, diverse terms.

Ideas for moving forward

  • Many curators would like to keep more descriptive terms under protein binding that describe roles/activities (e.g. London 2011 GOC meeting)
  • Ideally, curators would be able to annotate to a protein binding term that indicated its functional relevance, e.g. ‘protein binding involved in heterotypic cell-cell adhesion’
  • Perhaps the second-best option would be to indicate the type of protein being bound, which provides more information to users. This might also help curators search for the information they need to make an annotation to 'protein binding involved in BP X', or help users to infer this possibility if the evidence is not strong enough for it to be included directly in the annotation.


Example:


<<include!>>>

UniProt-GOA student project to improve protein binding: Marijn Berg.

Question:

Can the information we have on the identity of interactors be used to help curators decide which GO term under 'GO:0005515; protein binding' to use, and could it be used to improve current annotation web displays?

Work carried out:

InterPro family groupings, together with the GO annotations attached to the protein interactors, are used to supply curators with more specific GO term suggestions.
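The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual method: the InterPro accessions are placeholders rather than real entries, the mapping table is invented for the example, and the function name `suggest_binding_terms` is hypothetical. Only the term names come from this page.

```python
from typing import Dict, List, Set

# Hypothetical mapping from an InterPro entry found in the interactor
# to a more specific term under 'GO:0005515; protein binding'.
# The accessions are placeholders, NOT real InterPro entries.
INTERPRO_TO_BINDING_TERM: Dict[str, str] = {
    "IPR_PLACEHOLDER_KINASE": "kinase binding",
    "IPR_PLACEHOLDER_PROTEASE": "protease binding",
}

# Fallback when nothing more specific can be suggested.
DEFAULT_TERM = "protein binding"


def suggest_binding_terms(interactor_interpro: Set[str]) -> List[str]:
    """Return suggested binding terms for an interactor.

    interactor_interpro: InterPro entries matched by the interacting protein.
    More than one term may be returned; the curator decides which to use.
    """
    suggestions = {
        INTERPRO_TO_BINDING_TERM[entry]
        for entry in interactor_interpro
        if entry in INTERPRO_TO_BINDING_TERM
    }
    # Sorted so that the output order is stable between runs.
    return sorted(suggestions) or [DEFAULT_TERM]
```

Returning a list rather than a single term reflects the point made under "Identified issues": in some cases more than one term will be suggested.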


!! number of mappings

Data to become available

Shortly to become available as a two-column file from QuickGO.

<!! show excerpt? >

The group welcome feedback on improvements to this file.
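Until the file is published, its exact layout is not known. As a sketch only, assuming a tab-separated file with an InterPro accession in the first column and a GO term identifier in the second, and assuming that lines starting with "!" are comments, it could be read like this. The function name `load_mapping` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def load_mapping(handle: TextIO) -> Dict[str, Set[str]]:
    """Read a two-column, tab-separated mapping into a dictionary.

    Assumed layout (not confirmed by the source):
    column 1 = InterPro accession, column 2 = GO term identifier.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and assumed '!' comment lines.
        if not row or row[0].startswith("!"):
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        interpro_id, go_id = (field.strip() for field in row)
        mapping[interpro_id].add(go_id)
    return dict(mapping)


# Placeholder rows for illustration only; these are not real identifiers.
sample = io.StringIO(
    "! hypothetical example file\n"
    "IPR_PLACEHOLDER_1\tGO_PLACEHOLDER_A\n"
    "IPR_PLACEHOLDER_1\tGO_PLACEHOLDER_B\n"
    "\n"
    "IPR_PLACEHOLDER_2\tGO_PLACEHOLDER_A\n"
)
example_mapping = load_mapping(sample)
```

Storing a set of GO terms per InterPro entry allows one entry to map to several terms.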

Use by UniProt-GOA

- to be included initially as a curator suggestion in Protein2GO.

- the decisions that curators make (whether to use the suggestion/improve upon it/reject it) will be captured and assessed

- after 6 months/sufficient data captured, an analysis of the data will determine whether the file can be used to automatically improve the GO term attached to existing protein binding annotations (e.g. from IntAct, where there are ~18,000 high-quality interactions which only apply 'protein binding' or 'protein self binding' GO terms).

- Possibly the first type of positive annotation suggestion for curators to be included in Protein2GO; the suggestions offered should be high-quality, but need not yet reach the correctness of a production IEA method.
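The assessment of captured curator decisions could be sketched as a simple tally per suggested term. This is an illustration under assumptions: the decision labels ("accepted", "improved", "rejected") are names chosen here for the three outcomes listed above, the record layout is hypothetical, and nothing is known about how Protein2GO actually stores this information.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Assumed labels for the three curator outcomes described above.
DECISIONS = ("accepted", "improved", "rejected")


def summarise_decisions(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Summarise curator decisions per suggested term.

    records: (suggested_term, decision) pairs, one per suggestion shown.
    Returns, per term, the number of decisions and the fraction accepted.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for term, decision in records:
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision!r}")
        counts[term][decision] += 1

    summary: Dict[str, Dict[str, float]] = {}
    for term, tally in counts.items():
        total = sum(tally.values())
        summary[term] = {
            "total": total,
            "accepted_fraction": tally["accepted"] / total,
        }
    return summary
```

A per-term acceptance fraction is one possible input to the decision on whether a mapping is reliable enough to be applied automatically to existing annotations; the threshold for that decision is not specified here.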

Identified issues

  • There are some very descriptive binding terms, whereas in other places little information is available, e.g. there is a protease binding term, but no equivalent for oxidoreductase activity.
  • There will be cases where more than one term is suggested. In some cases it will be reasonable to capture both in annotations; in others, this could lead to a discussion as to the desirability of certain terms, e.g. glycoprotein binding.



Further work

1. Use these suggestions to improve consistency of curation and improve existing protein binding GO annotations.

2. The work has highlighted some terms whose position in the GO hierarchy under protein binding should be improved, and term naming consistency (to be highlighted at a future call).

3. GO to GO mapping

4. InterPro domain binding - add InterPro domain identifiers into c.16, to enable curators to indicate both function + domain in one annotation line.


More Widely: Other ways to improve the _display_ of Protein binding annotations

  • A discussion on improving the web display of GO annotations, e.g. showing names rather than identifiers.

However, a large number of proteins have obscure names, and users would not be able to use the hierarchical structure of GO, so this would not negate the desirability of using more specific GO terms.

  • LEGO might be able to give us more of a network view, so that users can easily move from viewing the activities of one protein to those in its immediate neighbourhood. However, it will take time for LEGO to be formalized, and for the LEGO curation tool and a complementary display to become available. GO annotations will continue to be displayed in a list format for individual proteins on many database web pages for years to come. This project is looking to easily improve the information content available to users _now_, and is not incompatible with longer-term LEGO-focused annotation improvements.