Protein Complex Conference Call July 15, 2015
Minutes from last call: http://wiki.geneontology.org/index.php/Protein_Complex_Conference_Call_June19,_2015
Matters arising from last meeting
- Export of annotations from CP (Complex Portal) to GO (Birgit):
- The last call made me think about the fundamentally different annotation philosophies of the CP and GO. In the CP we can annotate a complex that has full experimental evidence or one that is only assumed to exist. In both cases we create a unique, stable identifier for the complex that other DBs can reuse. The question is: should we export the GO annotations we make in the CP into GO if the evidence is inferred? My feeling is that the answer should be 'no' for complexes with no evidence even in related species, but it might be 'yes' for those inferred by ISS/ISO, as those codes can be transferred into GOA. This can easily be filtered using the ECO annotation on the complexes. (NB: We have an overall ECO annotation for the complex and a separate ECO annotation for each GO annotation line.)
- So, when it comes to mixed species evidence (see below) this distinction may become rather relevant!
- Integration of curation via P2GO: Sandra/Tony to sort out account if we go this route.
- Correcting binding to small molecules that are part of the complex: Birgit still to fix annotations in the CP.
Bumped from last meeting
Annotating to non-protein molecule binding
Example: Maltose transport complex http://www.ebi.ac.uk/intact/complex/details/EBI-6477643 where I did annotate with the maltose binding term although maltose is an integral part of the complex. On the other hand, the ATP binding annotation is valid in any case, as ATP is not part of the complex.
Telomerase catalytic core complex http://www.ebi.ac.uk/intact/complex/details/EBI-10045577 where I didn't annotate with an RNA binding term as the RNA is part of the complex. But as it binds the telomeric DNA I have THAT MF binding term!
- We thought this was decided last time but it wasn't clear to some people who were not on the call. Does it need better documenting?
- Ruth (copied from email thread): The truth is that there are very few examples where a protein is binding something because it is the function of a protein to bind something (other than enzymes binding their substrate). Many interactions occur because they are necessary for a protein to carry out its ‘function’ or for its function to be regulated. I do appreciate that the binding terms are listed under the MF ontology and possibly it would make everyone's life easier if we actually had 4 ontologies: CC, BP, MF and Interactions! Then the interactions could, when appropriate, have part_of relationships to specific functions and processes (note I haven’t actually thought this through, it is just that I often seem to have discussions which start with the concept that binding isn’t a function).
- Defs in GO: Nancy pointed out that the def for the maltose transporter does not mention the maltose. Do we need to change the defs for those complexes where the non-protein entity is part of the complex but not mentioned, so that CP and GO are aligned?
- Should the GO entry have a xref to the molecule in ChEBI?
Mixed species evidence
- Ruth/Birgit: How do you handle human proteins expressed in mouse, mixed species? Something to think about!
- Birgit: We have a case I have been working on with Nancy. In that case the 2 sequences were identical, so we used the mixed-species evidence for both complexes. In another case the similarity is only ~50% but the proteins are thought to be homologous. We need to decide on a similarity cut-off.
Inferring annotations between species
- In the CP we infer between closely related species, e.g. mouse and human, if the orthology is 1:1 and the sequences are 'fairly identical', but we haven't put a cut-off in yet.
- We use ISO for orthologous proteins (between species) and ISS when the proteins are paralogous, usually within the same protein family.
- Ruth: Does it make sense to have process annotations with IPI?
- Val: We do it, not very often, but sometimes it is the best evidence we have. See for example: http://curation.pombase.org/pombe/curs/1ae670cde3faca1e/ro/ (PMID:26122634). Seems reasonable to me. We have only 80 examples, and I think they could all also be ISO to SGD. I prefer IPI with a paper if the author's intent in detecting the conserved interaction was to demonstrate the likelihood of an additional protein's involvement in a conserved process.
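The ISO/ISS convention noted in this section could be written down as a small rule. The following is only a minimal sketch for illustration: the function name and its parameters are hypothetical, and since no similarity cut-off has been agreed, none is modelled here.

```python
from typing import Optional


def inference_code(relationship: str, one_to_one: bool = True) -> Optional[str]:
    """Suggest a GO evidence code for a complex inferred from a related protein.

    relationship: "ortholog" (between species) or "paralog" (usually within
    the same protein family). Hypothetical labels, used for illustration only.
    one_to_one: whether the orthology is 1:1 (only relevant for orthologs).
    """
    if relationship == "ortholog":
        # The CP infers between species only when the orthology is 1:1.
        return "ISO" if one_to_one else None
    if relationship == "paralog":
        return "ISS"
    raise ValueError(f"unknown relationship: {relationship!r}")
```

A return value of None here means "do not infer", which is one possible reading of the 1:1 requirement rather than an agreed rule.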
Curating in protein2GO
- IntAct will continue to curate in their portal because it takes some time for these complex_IDs to make it to protein2GO.
- Maybe in a year's time the pipeline can be optimized.
Binding small molecules
- Peter asked if IntAct can capture the specifics of covalently vs non-covalently bound small molecules.
- Sandra mentioned that they can.
- Maltose transport complex: should the definition be modified in GO?
- No. The definition is based on the function and not composition. The logical definition does include the detail about transporting maltose.
- If in doubt, the curator can follow the link to the Complex Portal and see if the small molecule has been defined as an integral part of the complex.
Mixed experimental evidence:
- If there is a crystal structure of a complex where a human protein is complexed with a mouse and a rat protein (3 proteins altogether), how do we capture this?
- NOW: IntAct curates these and uses the crystal as evidence only if the similarity is (nearly) 100%.
- NOW: If similarity is not (nearly) 100% the complex in the CP has ECO:0000306 (inference from background scientific knowledge used in manual assertion). These complexes could be flagged as 'do-not-export'.
- Should the corresponding complex be captured in IntAct with ECO:0000088 (biological system reconstituted)? Or ISO? (because two subunits are going to be inferred based on similarity to mouse and rat)
- Birgit: ECO:0000088 implies that all of the components are known in one system but here our evidence is mixed-species...?
- What should the evidence be in GO for CC, MF and BP? Since there is no direct experimental evidence for the complex in any one species, this may not be annotatable by GO?
- Example 1: 2x mouse, 1x rat
- As the mouse and rat orthologs were 100% identical, we used the structure as evidence for both species and then inferred to human from the mouse entry.
- EM structure: http://www.ebi.ac.uk/pdbe/entry/emdb/EMD-2013/
- CP entry http://www.ebi.ac.uk/intact/complex/details/EBI-10817173 (mouse)
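The current ("NOW") practice for mixed-species structures described in this section can be sketched as a decision function. This is an illustrative sketch only: the `near_identical` threshold of 0.99 is an assumption standing in for "(nearly) 100%", which the minutes do not quantify, and the names below are hypothetical rather than part of any IntAct or Complex Portal tooling.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# ECO ID quoted in the minutes: inference from background scientific
# knowledge used in manual assertion.
ECO_BACKGROUND_KNOWLEDGE = "ECO:0000306"


@dataclass(frozen=True)
class EvidenceDecision:
    structure_is_evidence: bool  # can the structure be used as evidence?
    eco_id: Optional[str]        # ECO ID assigned when it cannot
    do_not_export: bool          # candidate 'do-not-export' flag


def decide_mixed_species_evidence(
    identities: Sequence[float],
    near_identical: float = 0.99,  # assumed value, not agreed on the call
) -> EvidenceDecision:
    """identities: fractional sequence identity (0..1) of each subunit in the
    structure to its ortholog in the species being curated."""
    if not identities:
        raise ValueError("at least one identity value is required")
    if all(identity >= near_identical for identity in identities):
        return EvidenceDecision(True, None, False)
    return EvidenceDecision(False, ECO_BACKGROUND_KNOWLEDGE, True)
```

For Example 1, where the mouse and rat orthologs are 100% identical, `decide_mixed_species_evidence([1.0, 1.0])` allows the structure to be used as evidence.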
Inference via similarity:
- CP routinely infers complexes between related species (e.g. human/mouse) or paralogs within a species:
- We do want to annotate (using GO) to complexes that are inferred by similarity but we should be able to trace the function back to some experimental evidence. That is not the case in the reconstitution example (above).
- These complex IDs can be exported to GO and tagged with ISS/ISO.
- This is conceptually the same as using inferred gene products as annotation objects (Val).
- We won't set a fixed similarity cut-off value. It's more important to know that the orthologs have been shown to have some function, too.
- Example 2: 60% similarity
- The shelterin complex has one protein (ACD) that has only 60% similarity between human and mouse (curation in progress, so no CP link available).
- UniProt align: http://www.uniprot.org/align/A201507162HHGWPKF0R
- Because the genes are identified as orthologs, we curated the human complex and inferred to mouse by ISO.
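The export rule discussed in this section (export inferred complexes tagged with ISS/ISO, hold back those with ECO:0000306) could be applied as a filter on the complex-level ECO annotation. The sketch below is hedged accordingly: the record layout and function name are hypothetical, and the ECO-to-GO code mapping (ECO:0000250 for ISS, ECO:0000266 for ISO) is my understanding of the standard mapping and should be checked against the ECO/GO mapping file before use. Complexes with direct experimental evidence are outside the scope of this sketch.

```python
from typing import Iterable, List, Tuple

# Assumed ECO -> GO evidence code mapping; verify against the ECO mapping file.
ECO_TO_GO_CODE = {
    "ECO:0000250": "ISS",  # sequence similarity, manual assertion
    "ECO:0000266": "ISO",  # sequence orthology, manual assertion
}

# Complexes with this ECO ID were proposed as 'do-not-export' in the minutes.
DO_NOT_EXPORT = {"ECO:0000306"}


def select_for_export(complexes: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Take (complex_id, eco_id) pairs and return (complex_id, go_code) pairs
    for inferred complexes that may be exported to GO."""
    exported = []
    for complex_id, eco_id in complexes:
        if eco_id in DO_NOT_EXPORT:
            continue  # no traceable experimental evidence
        go_code = ECO_TO_GO_CODE.get(eco_id)
        if go_code is not None:
            exported.append((complex_id, go_code))
    return exported
```

The complex IDs passed in would be Complex Portal identifiers; any IDs used when trying this out are placeholders.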
IPI for BP
- Rama: It doesn't make sense to make a Process annotation with IPI. Just because proteinA interacts with proteinB doesn't mean it is involved in the same process.
- Val: It should be okay to make Process annotations with IPI. She will send some examples.