InterPro2GO Session October 4th 2011: Difference between revisions

From GO Wiki
==Useful Links==


*[http://obofoundry.org/ro/ Relation Ontology]
*[http://www.geneontology.org/scratch/xps/go_annotation_extension_relations.obo GO annotation relations]
*[http://www.geneontology.org/GO.annotation.conventions.shtml GO annotation conventions]

Revision as of 09:50, 4 October 2011

A face-to-face meeting at the EBI between GOA curators, GO editors and the InterPro curation team to go through the InterPro2GO mapping process, problematic mappings, and the relationships between GO terms and InterPro domains.


Agenda

  1. InterPro to give an overview of the InterPro2GO mapping procedure.
  2. Jane to give an overview of the multi-organism process node (GO:0051704) in GO, and how to use the terms for annotation.
  3. Jane to give an overview of the relations that are being developed for GO annotations, and how they'll be used, including the membrane terms.
  4. Problematic areas of InterPro to cover:
    1. When to use particular membrane-associated component mappings (see the recent PAINT paper)
    2. Use of the protein-binding terms. 'protein binding ; GO:0005515' and 'binding ; GO:0005488' should only be used for annotation when an identifier is present in the 'with' column; annotations without an identifier are stripped out of the GOA files. Are there more specific terms InterPro can use instead?
    3. How to GO map proteins that form complexes (the Relation Ontology might help here)
    4. GO mapping proteins that have different functions according to the component they're present in.
    5. Component mappings in general (i.e. should we map terms only for proteins that are found *only* in a particular location, or also for proteins that have been observed in that location at some stage?)
    6. Clarification on how GOA use the 'NOT' qualifier - are there implications that we need to be aware of in InterPro?
    7. The idea of using blacklists to prevent erroneous mappings to sequences based on InterPro matches
      1. Relating to blacklists, revisit the protein kinase catalytic domain entry (IPR000719), which maps the terms GO:0006468 protein phosphorylation, GO:0004672 protein kinase activity and GO:0005524 ATP binding to ~100K sequences in UniProt. Among these, however, are members of the tribbles family, which are pseudokinases. Are there sensible ways to handle this kind of situation without sacrificing large numbers of true positive mappings?
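
The 'with' column rule in item 4.2 can be sketched as a simple filter. This is only an illustration, not the GOA pipeline itself: it assumes tab-separated GAF-style lines with the GO ID in column 5 and the With/From value in column 8, and the function names are hypothetical.

```python
# Terms that need an identifier in the 'with' column, per agenda item 4.2.
BINDING_TERMS = {"GO:0005515", "GO:0005488"}

# Assumed GAF column positions (0-based): GO ID is column 5, With/From is column 8.
GO_ID_COL = 4
WITH_COL = 7


def keep_annotation(gaf_line: str) -> bool:
    """Return False for binding annotations whose 'with' column is empty."""
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) <= WITH_COL:
        return True  # too short to judge; leave it for other checks
    go_id = cols[GO_ID_COL]
    with_value = cols[WITH_COL].strip()
    return not (go_id in BINDING_TERMS and with_value == "")


def filter_annotations(lines):
    """Yield header/comment lines and annotation lines that pass the check."""
    for line in lines:
        if line.startswith("!") or keep_annotation(line):
            yield line
```

Annotations to any other term pass through untouched, so the filter only affects the two binding terms named above.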


Problematic InterPro Mappings

  • IPR000402. ATPase and ATP metabolism terms. [1]
  • IPR000342. Signal transducer activity. [2]

  • IPR024738
    • It represents the Ada1/Tada1 subunit of SAGA-like complex
    • The SAGA complex is a transcriptional coactivator (involved in regulation of transcription by RNA polymerase II). Should we map it to transcription coactivator activity (MF; GO:0003713) with the contributes_to qualifier, following the GO complex annotation guidelines?
  • IPR018767
    • Nucleus export protein Brr6. It is mapped to GO:0016021 integral to membrane. Should it instead be mapped to GO:0005635 nuclear envelope?
  • Ribosomal Proteins
    • In the database we have many entries for ribosomal proteins, but we may not be mapping them correctly. A better understanding of the relationships between MF and BP could help.
    • Example: IPR000439 Ribosomal protein L15e. At the moment it is mapped as:
      • Process GO:0006412 translation
      • Function GO:0003735 structural constituent of ribosome
      • Component GO:0005840 ribosome
  • NOT qualifier
    • I think the pseudokinase TRIBBLES should be a candidate for the NOT qualifier. It matches IPR000719 and IPR017442, both protein kinase domains. The presence of a kinase domain is characteristic of TRIBBLES; they have simply lost their catalytic activity, which is why they are called pseudokinases.
    • A different case is IPR000014, which integrates 3 signatures. One of them, the SMART signature SM00091, causes problems because it produces false positives (e.g. Q9C1W9). It is a signature for a PAS domain (mapped to signal transduction), whereas Q9C1W9 is a DNA ligase. SM00091 hits 44433 proteins in total.
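
One way the blacklist idea from the agenda might work for the pseudokinase case is sketched below. It is a rough illustration under stated assumptions, not an existing InterPro or GOA mechanism: the blacklist is assumed to be a set of (protein accession, InterPro entry) pairs, the function name is hypothetical, and the accessions are placeholders rather than real UniProt identifiers. Only the IPR000719 GO terms are taken from the agenda.

```python
# InterPro2GO mapping for the protein kinase domain entry, as listed in the agenda.
INTERPRO2GO = {
    "IPR000719": ["GO:0006468", "GO:0004672", "GO:0005524"],
}

# Hypothetical blacklist of (protein accession, InterPro entry) pairs whose
# mappings should not be propagated. "PSEUDOKINASE_1" is a placeholder accession.
BLACKLIST = {("PSEUDOKINASE_1", "IPR000719")}


def propagate(matches, interpro2go=INTERPRO2GO, blacklist=BLACKLIST):
    """Return {accession: sorted GO IDs}, skipping blacklisted pairs.

    matches: iterable of (protein accession, InterPro entry) pairs.
    """
    annotations = {}
    for accession, entry in matches:
        if (accession, entry) in blacklist:
            continue  # known false positive: do not transfer any terms
        for go_id in interpro2go.get(entry, []):
            annotations.setdefault(accession, set()).add(go_id)
    return {acc: sorted(ids) for acc, ids in annotations.items()}


matches = [("KINASE_1", "IPR000719"), ("PSEUDOKINASE_1", "IPR000719")]
print(propagate(matches))
```

Because the blacklist is keyed on the pair rather than on the entry alone, the mapping stays in place for every other sequence that matches IPR000719, which is the trade-off the agenda item asks about.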


Minutes

Present:

  • Jane Lomax (GO)
  • Rebecca Foulger (GO)
  • Emily Dimmer (GOA)
  • Alex Mitchell (InterPro curation co-ordinator)
  • Amaia (InterPro curator)
  • Craig (InterPro Bioinformatician)
  • Siew-Yit (InterPro curator)

Useful Links