InterPro2GO Session October 4th 2011

From GO Wiki

Revision as of 11:22, 4 October 2011

A face-to-face meeting at the EBI between GOA curators, GO editors and the InterPro curation team, to go through the InterPro2GO mapping process, problematic mappings, the relationships between GO terms and InterPro domains, and related topics.


Agenda

  1. InterPro to give an overview of the InterPro2GO mapping procedure.
  2. Jane to give an overview of the multi-organism process node (GO:0051704) in GO, and how to use the terms for annotation. [1]
  3. Jane to give an overview of the relations that are being developed for GO annotations, and how they'll be used, including the membrane terms. [2]
  4. Problematic areas of InterPro to cover:
    1. When to use particular membrane-associated component mappings (see the recent PAINT paper)
    2. Use of the protein-binding term. 'protein binding ; GO:0005515' and 'binding ; GO:0005488' should only be used for annotation when an identifier is present in the 'with' column; annotations where the identifier is absent are stripped out of the GOA files. Are there more specific terms InterPro can use instead?
    3. How to GO map proteins that form complexes (the relationship ontology might help here)
    4. GO mapping proteins that have different functions according to the component they're present in.
    5. Component mappings in general (i.e., should we be mapping terms based on proteins that are *only* found in a particular location, or do we map proteins that have been observed in that location at some stage?)
    6. Clarification on how GOA use the 'NOT' qualifier - are there implications that we need to be aware of in InterPro?
    7. The idea of using blacklists to prevent erroneous mappings to sequences based on InterPro matches
      1. Relating to blacklists, revisit the protein kinase catalytic domain entry (IPR000719), which maps the terms GO:0006468 protein phosphorylation, GO:0004672 protein kinase activity and GO:0005524 ATP binding to ~100K sequences in UniProt. However, among these are members of the tribbles family, which are pseudokinases. Are there sensible ways we can handle this kind of situation without sacrificing large numbers of true positive mappings?
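
The protein-binding rule in item 4.2 above can be sketched as a simple filter. This is only an illustration under stated assumptions: the record layout (a dict with `protein`, `go_id` and `with` keys) and the protein names are hypothetical and much simpler than a real GOA file, and the function is not GOA's actual pipeline.

```python
# Terms that, per the agenda item, need an identifier in the 'with' column.
REQUIRES_WITH = {"GO:0005515", "GO:0005488"}  # protein binding, binding


def strip_unsupported_binding(annotations):
    """Drop 'protein binding'/'binding' rows whose 'with' value is empty."""
    kept = []
    for ann in annotations:
        needs_with = ann["go_id"] in REQUIRES_WITH
        has_with = bool(ann.get("with", "").strip())
        if needs_with and not has_with:
            continue  # this row would be stripped from the output
        kept.append(ann)
    return kept


# Hypothetical records; the protein names and 'with' value are placeholders.
annotations = [
    {"protein": "PROT_A", "go_id": "GO:0005515", "with": "UniProtKB:PROT_B"},
    {"protein": "PROT_C", "go_id": "GO:0005515", "with": ""},
    {"protein": "PROT_D", "go_id": "GO:0005524", "with": ""},
]
kept = strip_unsupported_binding(annotations)
kept_proteins = [ann["protein"] for ann in kept]
```

Here only the PROT_C row is dropped: it uses 'protein binding' with an empty 'with' value, while the ATP binding row is unaffected by the rule.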

Problematic InterPro Mappings

  • IPR000402. ATPase and ATP metabolism terms. [3]
  • IPR000342. Signal transducer activity. [4]

  • IPR024738

    • It represents the Ada1/Tada1 subunit of SAGA-like complex
    • The SAGA complex is a transcriptional coactivator (involved in regulation of transcription by RNA polymerase II). Should we map it to 'contributes_to transcription coactivator activity (MF) GO:0003713', following the GO complex annotation guidelines?
  • IPR018767
    • Nucleus export protein Brr6. It is mapped to GO:0016021 integral to membrane. Should it instead be mapped to GO:0005635 nuclear envelope?
  • Ribosomal Proteins
    • In the database we have many entries for ribosomal proteins, but perhaps we are not mapping them correctly. A better understanding of the relationships between MF and BP could help.
    • Example: IPR000439 Ribosomal protein L15e. At the moment it is mapped as:
      • Process GO:0006412 translation
      • Function GO:0003735 structural constituent of ribosome
      • Component GO:0005840 ribosome
  • NOT qualifier
    • I think the pseudokinase TRIBBLES should be a candidate for the NOT qualifier. It matches IPR000719 and IPR017442, both protein kinase domains. The presence of a kinase domain is characteristic of TRIBBLES proteins; they have simply lost their catalytic activity, which is why they are called pseudokinases.
    • A different case is IPR000014. It integrates 3 signatures. One of them, the SMART signature SM00091, causes problems because it produces false-positive hits (Q9C1W9). This is a signature for a PAS domain (mapped to signal transduction), and Q9C1W9 is a DNA ligase. SM00091 hits 44,433 proteins in total.
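
The blacklist idea from the agenda, combined with the NOT qualifier suggestion above, might look roughly like the following. It is a minimal sketch, not InterPro's or GOA's implementation: the blacklist structure, the `propagate` function and the protein names are hypothetical. Only the IPR000719 term list comes from the agenda, and suppressing just GO:0004672 for the pseudokinase is an assumption made for illustration; which terms to block would be a curator decision.

```python
# InterPro2GO mapping for the protein kinase domain entry, as listed in the agenda.
INTERPRO2GO = {
    "IPR000719": ["GO:0006468", "GO:0004672", "GO:0005524"],
}

# Hypothetical blacklist: (InterPro entry, protein) -> GO terms to suppress.
BLACKLIST = {
    ("IPR000719", "TRIBBLES_EXAMPLE"): {"GO:0004672"},
}


def propagate(matches, emit_not=False):
    """Return (protein, qualifier, go_id) tuples for a list of InterPro matches.

    Blacklisted terms are skipped, or emitted with a NOT qualifier
    when emit_not is True.
    """
    out = []
    for protein, entry in matches:
        blocked = BLACKLIST.get((entry, protein), set())
        for go_id in INTERPRO2GO.get(entry, []):
            if go_id in blocked:
                if emit_not:
                    out.append((protein, "NOT", go_id))
                continue
            out.append((protein, "", go_id))
    return out


# Placeholder proteins: one true kinase and one pseudokinase, both matching IPR000719.
matches = [("KINASE_EXAMPLE", "IPR000719"), ("TRIBBLES_EXAMPLE", "IPR000719")]
plain = propagate(matches)
with_not = propagate(matches, emit_not=True)
```

Keying the blacklist on the (entry, protein) pair leaves the mapping intact for every other sequence that matches the entry, so true positive mappings are not sacrificed.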


Minutes

Present

  • Jane Lomax (GO)
  • Rebecca Foulger (GO)
  • Emily Dimmer (GOA)
  • Alex Mitchell (InterPro curation co-ordinator)
  • Amaia Sangrador (InterPro curator)
  • Craig (InterPro Bioinformatician)
  • Siew-Yit Yong (InterPro curator)

Changes In InterPro Annotation Policy

  • Amaia gave an overview (INSERT PRESENTATION HERE) of InterPro's annotation strategy, which is moving from annotating whole proteins to annotating the functions of individual domains.
  • We discussed the two-tier approach of InterPro: 1) mapping the functions of individual domains, and 2) mapping the processes and functions that the whole protein is involved in.
  • InterPro are considering using a 'contributes to/involved in' qualifier to inherit MF and BP terms up to the whole-protein level. This would mean that current annotations wouldn't be lost (contributes_to could be applied en masse to all existing InterPro whole-protein mappings).
  • AI: InterPro to come up with a definition for 'contributes_to' before they start using it.
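
As a rough illustration of applying the qualifier en masse, the sketch below tags every existing whole-protein mapping with contributes_to and leaves domain-level mappings alone. The `level` field, the record layout and the entry names are hypothetical, and the qualifier's meaning is still subject to the action item above.

```python
def add_contributes_to(mappings):
    """Return copies of the mappings, with whole-protein ones tagged contributes_to."""
    updated = []
    for mapping in mappings:
        mapping = dict(mapping)  # copy so the input records are not modified
        if mapping["level"] == "whole_protein" and not mapping.get("qualifier"):
            mapping["qualifier"] = "contributes_to"
        updated.append(mapping)
    return updated


# Hypothetical mappings; entry names and the 'level' field are placeholders.
mappings = [
    {"entry": "ENTRY_1", "go_id": "GO:0003713", "level": "whole_protein"},
    {"entry": "ENTRY_2", "go_id": "GO:0005524", "level": "domain"},
]
updated = add_contributes_to(mappings)
```

Only the whole-protein record gains the qualifier, so no existing mapping is removed in the process.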


Multi-organism Processes

  • Jane gave an overview on the multi-organism node in GO.

GO Membrane Terms

  • Jane's presentation gave an overview of the membrane terms in GO. The plan is to change the existing terms so that this information (integral, intrinsic etc.) is captured at the annotation stage.

Current Membrane terms in GO

Useful Links