InterPro2GO Session October 4th 2011
A face-to-face meeting at the EBI between GOA curators, GO editors and the InterPro curation team, to go through the InterPro2GO mapping process, problematic mappings, relationships between GO terms and InterPro domains, and related topics.
Agenda
- InterPro to give an overview of the InterPro2GO mapping procedure.
- Jane to give an overview of the multi-organism process node (GO:0051704) in GO, and how to use the terms for annotation. File:Slides interpro sept 11.pdf
- Jane to give an overview of the relations that are being developed for GO annotations, and how they'll be used, including the membrane terms. File:Slides interpro sept 11.pdf
- Problematic areas of InterPro to cover:
- When to use particular membrane-associated component mappings (see the recent PAINT paper)
- Use of the protein-binding term. 'protein binding ; GO:0005515' and 'binding ; GO:0005488' should only be used for annotation when an identifier is present in the 'with' column (annotations where the identifier is absent are stripped out of the GOA files). Are there more specific terms InterPro can use instead?
- How to GO map proteins that form complexes (the relationship ontology might help here)
- GO mapping proteins that have different functions according to the component they're present in.
- Component mappings in general (i.e., should we map terms based only on proteins that are found *exclusively* in a particular location, or do we also map proteins that have been observed in that location at some stage?)
- Clarification on how GOA use the 'NOT' qualifier - are there implications that we need to be aware of in InterPro?
- The idea of using blacklists to prevent erroneous mappings to sequences based on InterPro matches
- Relating to blacklists, revisit the protein kinase catalytic domain entry (IPR000719), which maps the terms GO:0006468 protein phosphorylation, GO:0004672 protein kinase activity and GO:0005524 ATP binding to ~100K sequences in UniProt. Among these are members of the tribbles family, which are pseudokinases. Are there sensible ways to handle this kind of situation without sacrificing large numbers of true positive mappings?
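To make the blacklist idea concrete, here is a minimal Python sketch of how a blacklist could sit between InterPro matches and GO term transfer. It is only an illustration under stated assumptions: the protein identifiers `KINASE_EXAMPLE` and `TRIB_EXAMPLE`, the function name `transfer_annotations`, and the choice to blacklist (protein, entry) pairs are all hypothetical and do not describe how InterPro or GOA actually implement this.

```python
# InterPro entry -> GO terms, taken from the IPR000719 agenda item above.
INTERPRO2GO = {
    "IPR000719": ["GO:0006468", "GO:0004672", "GO:0005524"],
}

# Hypothetical blacklist of (protein, InterPro entry) pairs whose mappings
# should not be transferred. "TRIB_EXAMPLE" is a placeholder, not a real accession.
BLACKLIST = {("TRIB_EXAMPLE", "IPR000719")}


def transfer_annotations(matches, interpro2go, blacklist):
    """Yield (protein, go_term, entry) for every match that is not blacklisted."""
    for protein, entry in matches:
        if (protein, entry) in blacklist:
            continue  # skip known false positives such as pseudokinases
        for go_term in interpro2go.get(entry, []):
            yield protein, go_term, entry


# Placeholder matches: one true kinase and one pseudokinase hitting the same entry.
matches = [("KINASE_EXAMPLE", "IPR000719"), ("TRIB_EXAMPLE", "IPR000719")]
annotations = list(transfer_annotations(matches, INTERPRO2GO, BLACKLIST))
```

In this sketch the true kinase keeps all three terms while the blacklisted protein receives none, so the bulk of the mappings is untouched. A finer-grained variant could blacklist (protein, entry, GO term) triples if only some of the terms are wrong for a given protein.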
Problematic InterPro Mappings
- IPR024738
- It represents the Ada1/Tada1 subunit of SAGA-like complex
- The SAGA complex is a transcriptional coactivator (involved in regulation of transcription by RNA polymerase II). Should we map it with contributes_to transcription coactivator activity (MF) GO:0003713, following the GO complex annotation guidelines?
- IPR018767
- Nucleus export protein Brr6. It is mapped to GO:0016021 integral to membrane. Should it instead be mapped to GO:0005635 nuclear envelope?
- Ribosomal Proteins
- In the database we have many entries for ribosomal proteins, but perhaps we are not mapping them correctly. Understanding better the relationships between MF and BP could help.
- Example: IPR000439 Ribosomal protein L15e. At the moment it is mapped as:
- Process GO:0006412 translation
- Function GO:0003735 structural constituent of ribosome
- Component GO:0005840 ribosome
- NOT qualifier
- I think the pseudokinase TRIBBLES should be a candidate for the NOT qualifier. It matches IPR000719 and IPR017442, both protein kinase domains. The presence of a kinase domain is characteristic of TRIBBLES; the proteins have simply lost their catalytic activity, which is why they are called pseudokinases.
- A different case is IPR000014. It integrates 3 signatures. One of them, the SMART signature SM00091, is the one causing problems because it hits false positives such as Q9C1W9. This is a signature for a PAS domain (mapped to signal transduction), whereas Q9C1W9 is a DNA ligase. SM00091 hits 44433 proteins in total.
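For the IPR000014 case, one possible way to scope the problem is to separate proteins whose match rests only on the troublesome signature from those that are also hit by another integrated signature. The Python sketch below illustrates that check. It is an assumption-laden example, not an InterPro procedure: `SIG_A` and `SIG_B` stand in for the two other integrated signatures (not named in these notes), and `PROTEIN_X` and `PROTEIN_Y` are placeholder proteins.

```python
# Signatures integrated into the entry. SM00091 is from the notes above;
# SIG_A and SIG_B are hypothetical placeholders for the other two signatures.
ENTRY_SIGNATURES = {"IPR000014": {"SM00091", "SIG_A", "SIG_B"}}

# Signatures flagged as producing false positives.
FLAGGED_SIGNATURES = {"SM00091"}


def rests_only_on_flagged(entry, hit_signatures):
    """True if every signature supporting this entry match is a flagged one."""
    hits = set(hit_signatures) & ENTRY_SIGNATURES[entry]
    return bool(hits) and hits <= FLAGGED_SIGNATURES


# Placeholder proteins and the signatures that hit them.
protein_hits = {
    "PROTEIN_X": ["SM00091"],
    "PROTEIN_Y": ["SM00091", "SIG_A"],
}

needs_review = sorted(
    protein
    for protein, signatures in protein_hits.items()
    if rests_only_on_flagged("IPR000014", signatures)
)
```

Here only `PROTEIN_X` ends up in `needs_review`, since its match has no support beyond the flagged signature. Whether such matches should be blacklisted, reviewed by hand, or left alone is a curation decision that this sketch does not settle.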
Minutes
Present
- Jane Lomax (GO)
- Rebecca Foulger (GO)
- Emily Dimmer (GOA)
- Alex Mitchell (InterPro curation co-ordinator)
- Amaia (InterPro curator)
- Craig (InterPro Bioinformatician)
- Siew-Yit (InterPro curator)
Changes In InterPro Annotation Policy
- Amaia gave an overview (INSERT PRESENTATION HERE) of how InterPro's annotation strategy is moving from annotating whole proteins to annotating the function of individual domains.
- We discussed the two-tier approach of InterPro: 1) mapping the functions of individual domains, and 2) mapping the processes and functions that the whole protein is involved in.
- InterPro are considering using a 'contributes to/involved in' qualifier to inherit MF and BP terms up to the whole-protein level. This would mean current annotations wouldn't be lost (contributes_to could be applied en masse to all existing InterPro whole-protein mappings).
- AI: InterPro to come up with a definition for 'contributes_to' before they start using it.
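As a rough illustration of what applying a qualifier en masse could look like, the Python sketch below tags existing whole-protein mappings by aspect. Everything about its shape is an assumption made for the example: the `Mapping` record, the `qualify_whole_protein_mappings` function, and the rule that MF terms get `contributes_to` while BP terms get `involved_in` are hypothetical, and the actual definition is still the open action item above. The sample data reuses the IPR000439 mappings listed earlier on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    entry: str           # InterPro entry accession
    go_id: str           # GO term identifier
    aspect: str          # "F" (function), "P" (process) or "C" (component)
    qualifier: str = ""  # empty means no qualifier


# Assumed rule for this sketch only: qualify MF and BP terms, leave CC untouched.
QUALIFIER_BY_ASPECT = {"F": "contributes_to", "P": "involved_in"}


def qualify_whole_protein_mappings(mappings):
    """Return copies of the mappings with a qualifier added according to aspect."""
    return [
        replace(m, qualifier=QUALIFIER_BY_ASPECT.get(m.aspect, m.qualifier))
        for m in mappings
    ]


existing = [
    Mapping("IPR000439", "GO:0006412", "P"),
    Mapping("IPR000439", "GO:0003735", "F"),
    Mapping("IPR000439", "GO:0005840", "C"),
]
qualified = qualify_whole_protein_mappings(existing)
```

Because the originals are never modified, the unqualified mappings remain available for comparison, which fits the goal of not losing current annotations.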
Multi-organism Processes
- Jane gave an overview (INSERT PRESENTATION HERE) on the multi-organism node in GO.
GO Membrane Terms
- Jane's presentation (INSERT PRESENTATION HERE) gave an overview of the membrane terms in GO. The plan is to change the existing terms so that this information (integral, intrinsic, etc.) is captured at the annotation stage.