Talk:2010 GO camp Annotation of complexes issues
Conference call: April 27, 2010
Present: Silvia J., Serenella, Bernd (SIB), Tom (RGD), Sandra Orchard (IntAct), Emily (GOA), Pascale (DictyBase)
Could we possibly use IntAct to perform annotations?
- IntAct are generating a collection of literature-mined, stable and functionally defined molecular complexes for model organism species; the complexes are linked to the experimental binding data. IntAct supplement their data with functional information (both free text and from GO) and consistent nomenclature to protein lists. Different variants of complexes will be captured with separate ids (e.g. the AP1 transcription factor, which is a homo/heterodimer that has approx. 16 different variants). Similarly, different ribosomes and spliceosomes exist in multiple variations. However, IntAct may need to make 'parent' complexes, which can group variants and be used before the variants can be fully defined: in many cases, although the existence of alternative subunits is known, it is not understood what combinations exist and under what conditions.
IntAct definition of a complex: something that can be purified as an entity and ideally has a function. This data does not come just from co-IP, which can pick up multiple complexes. IntAct would not consider a substrate to be a member of a complex, and similarly would consider only the relatively stable components.
Q: do transient complexes qualify?
A: (Sandra) grey area- for example cdk/cyclin is considered a complex - cdk/cyclin/substrate is not; would need to be looked at on a case-by-case basis.
GO: Any macromolecular complex composed of two or more polypeptide subunits, which may or may not be identical. Protein complexes may have other associated non-protein prosthetic groups, such as nucleotides, metal ions or other small molecules.
Possible action: Request that GO protein complex definitions do not include substrates, and reiterate in the curator guidelines that substrates should not be annotated to protein complex GO terms.
Q: How does IntAct capture poorly characterized complexes?
A: currently, IntAct are not focusing on these; however, a possibility would be to develop CV terms (from PSI-MI) to label complexes as 'true' or 'composite', so that IntAct can easily identify those that can be confidently defined as existing. This is work that needs to be done.
Q: How are different species captured? Will IntAct be able to support all MODs?
A: (Sandra) right now all species-specific complexes are kept separate; IntAct look for experimental support for a specific complex. It may be possible later to generate complexes for other species by inference. We are currently working with external experts on the reliability of inferring complex composition between species, as there is concern about the appropriateness of transferring complexes between species. While this issue wouldn't be such a problem for yeast, where IntAct are likely to be able to provide a full complement of complexes, it will be an issue for other species. IntAct has no species limit, although model organisms are being focused on; this currently includes S. cerevisiae, mouse, rat, cow, Drosophila, C. elegans, S. pombe and Arabidopsis. Not so much from E. coli.
It would, however, be possible to propagate complexes between species, and IntAct are looking at using Ensembl Compara. Such inferred complexes would be marked as 'evidence from homology'.
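As an illustration only, the propagation idea can be sketched as below. This is not IntAct's method: the ortholog mapping is a plain dictionary standing in for whatever an orthology resource such as Ensembl Compara would supply, all identifiers are hypothetical placeholders, and the rule that every subunit must have an ortholog before a complex is inferred is an assumption made for the sketch.

```python
# Label taken from the minutes; everything else here is a hypothetical sketch.
HOMOLOGY_LABEL = "evidence from homology"


def infer_complex(subunits, orthologs):
    """Map each subunit to its ortholog in the target species.

    Returns None when any subunit lacks an ortholog (assumed conservative
    rule: do not infer a partial complex).
    """
    mapped = []
    for subunit in subunits:
        target = orthologs.get(subunit)
        if target is None:
            return None
        mapped.append(target)
    return {"subunits": mapped, "evidence": HOMOLOGY_LABEL}


# Placeholder identifiers, not real accessions.
source_complex = ["SRC_A", "SRC_B"]
full_mapping = {"SRC_A": "TGT_A", "SRC_B": "TGT_B"}
partial_mapping = {"SRC_A": "TGT_A"}

inferred = infer_complex(source_complex, full_mapping)
not_inferred = infer_complex(source_complex, partial_mapping)
```

Requiring a complete mapping is one way to reflect the concern raised above about the reliability of transferring complexes between species; a real pipeline might instead flag partial matches for curator review.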
Swiss-Prot will be contributing towards this effort. However it is unlikely the group would be able to respond to all complex curation requests generated from GO Consortium groups.
Serenella: how would annotating to subunits vs. complexes work?
Emily: it would be possible to expand annotations from complexes to all component subunits. This could be done automatically using the GOA-UniProtKB curation tool, which would ensure consistency of annotations.
Sandra: would be happy to do this for all BP and CC terms, with MF, may need to be more conservative.
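A minimal sketch of the expansion discussed above, under stated assumptions: the `Annotation` record, the `expand_to_subunits` function, and all identifiers are hypothetical and are not part of the GOA-UniProtKB curation tool. By default only BP ("P") and CC ("C") annotations are expanded, following Sandra's suggestion to be more conservative with MF; carrying the evidence code over unchanged is an assumption of this sketch, not a decision recorded in the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    object_id: str  # annotated entity (a complex id or a subunit id)
    go_id: str      # GO term identifier (placeholders below)
    aspect: str     # "F" (MF), "P" (BP) or "C" (CC)
    evidence: str   # evidence code, copied as-is in this sketch


def expand_to_subunits(complex_annotations, members, aspects=("P", "C")):
    """Copy each complex-level annotation onto every subunit of that complex.

    Annotations whose aspect is not in `aspects` are skipped, so MF
    annotations are left for manual review by default.
    """
    expanded = []
    for ann in complex_annotations:
        if ann.aspect not in aspects:
            continue
        for subunit in members.get(ann.object_id, []):
            expanded.append(
                Annotation(subunit, ann.go_id, ann.aspect, ann.evidence)
            )
    return expanded


# Placeholder data: one complex with two subunits.
members = {"CPLX-1": ["SUBUNIT_A", "SUBUNIT_B"]}
complex_annotations = [
    Annotation("CPLX-1", "GO:bp_term", "P", "TAS"),
    Annotation("CPLX-1", "GO:mf_term", "F", "TAS"),
]

subunit_annotations = expand_to_subunits(complex_annotations, members)
```

With the placeholder data, the BP annotation is copied to both subunits and the MF annotation is held back. Passing `aspects=("F", "P", "C")` would expand all three aspects.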
Are there other resources we should be looking into?
Q: Are there other resources we should be looking into?
A: PRO would be doing similar work. However, would this only be for human and mouse?
Sandra: may be possible to map between IntAct and PRO complexes.
Bernd: However, the complexes in these different resources may not be identical - therefore could be complicated.
Sandra: yes, would need to work together on this - but the effort would improve consistency.
Pascale: in the medium term, could GO protein complex terms be annotated to MF and BP terms? i.e. the protein complex GO id would act as a DB_Object_ID in column 2.
Emily: this might be problematic. Could be confusion for users - and GO terms do not identify the subunits involved (esp. if we move towards a more function-orientated description). Unsure of benefit to users.
Emily: In the medium term, there could be annotations made directly to protein complexes (by those groups that have available resources), and papers providing protein complex information could be tagged and communicated to groups such as IntAct/PRO. Any annotations to protein complex ids in column 2 could then be expanded to the subunits (UniProtKB accessions or MGI ids), so that the annotations are made available using an id type most users would be familiar with. Groups that do not have access to protein complex ids would only be able to continue to annotate to the subunits directly.
IntAct/PRO protein complex identifiers would be annotated to the GO protein complex terms as well.
However, as protein complex terms may in future only describe the functions of the complex, might we eventually end up with some redundancy in the annotation set, given that the complex would also be directly annotated to MF and BP terms? Shouldn't annotations to these other GO terms therefore be able to fully describe the GO protein complex definition? Concern regarding this redundancy might be for a future discussion.
ACTION ITEM [Sandra, Harold]: Produce mock-ups to show how interaction databases would be used for GO. Please discuss how ALL MODs could be included (and, if they can't, what they would do).
Bernd: Although IntAct have already started to associate MF and BP terms with their protein complexes, quite often this evidence comes from review articles (unlike the experimental evidence cited to support protein complex composition).
Emily: This would only be able to generate TAS-evidenced annotations. Might be something to start with, and then for GO annotation groups to attempt to improve.
Q: does IntAct describe post-translational modification in complexes? Sandra: yes, can describe this in IntAct (and link out to UniProt), where we can include a description of any PTMs and the AA they affect. However, currently deciding not to split complexes based solely on differences in PTMs, although could be convinced otherwise if the right example comes along.
Bernd: GO can be used as a very easy way to annotate proteins to complexes and propagate this data to other species. Easier than the resources currently used by Swiss-Prot. Complexes should be functional units, and this can differ from the raw experimental data, which can sometimes include substrates and more transient subunits. Therefore curators need to be careful to best represent the functional building blocks of the cell - might need to combine evidence from different papers for a full/conservative complex composition - as different papers will include different qualities/granularities of evidence.
Bernd: In GO there is the term VRK3/VHR/ERK complex (example 4), which includes a kinase that is likely to be a substrate. Sandra: Perhaps this should not be represented as a complex at all - it's unlikely that the kinases and phosphatase included in this term are a complex - more likely to be a transient collection of proteins associating as part of a signalling pathway. Unlikely to be much evidence of biochemical functionality. Sandra: I've identified a number of GO terms which IntAct would not consider to be complexes - can assemble this as a list if required.
Action: Sandra to send this list of GO terms which IntAct would not consider to be complexes, so that GO can reconsider these terms.
Pascale: concern regarding capturing protein interactions in MF annotations, where this data should be represented in the CC annotations as complexes. Emily: this kind of data would be very difficult to separate out. Our IntAct pipeline would provide us with both protein-protein interactions that result in the formation of a complex and those that, while binding, would not be considered to form a complex.
Pascale: for example, actin homodimerization activity. Should this MF term exist? It describes a complex - there seem to be terms that overlap between MF and CC ontologies. Similarly, structural molecule activity - isn't this more describing the involvement of a subunit in a complex than representing a MF? Something else to resolve.
We'll look at the definition of complexes (should they be defined by function rather than composition?) and provide guidelines about how CC complex terms should be created.
- We need to propose a solution for interactions that regulate some MF/BP, but are not necessarily part of the complex. Right now people might (a) create a new complex term; (b) use colocalizes with.
- Doodle with date of next meeting to be sent out to list
Action: all WG members should ensure they have added themselves to the protein complexes email list