- NOTE: This is a work in progress. It needs to be wrapped up and revised by the editors, Becky and Birgit. We also need to add examples of what works and what doesn't.
Background and rationale
Recently, GO and IntAct have started working together to improve the 'protein complex' branch in GO, making it less flat and more informative, and to provide species-agnostic GO terms that IntAct can reference in its species-specific curation projects (at the time of writing, yeast, fly and human) [Birgit, is this correct?]. Here, we collect current guidelines on protein complex terms to help GO curators decide whether a protein complex belongs in GO and, if so, include all necessary information when requesting a new protein complex term.
Protein complexes in GO
Rule n. 1: Is the complex stable?
- How do we view protein complexes in GO? The complex should be stable. If it is not stable, it is just protein binding. I'm sure we had something written down for this - Birgit? Do GO and IntAct guidelines agree on this?
[Birgit] Yes, we complex curators concentrate on stable complexes. They should have experimental evidence for both their subunit composition AND their function in vivo (see desirable MF links above). NB: Our new editor now allows for adding PMID and ECO codes for the GO xrefs, so in future we can export these more easily :)
The Complex Portal could also hold transient complexes, e.g. signaling complexes that form for only split seconds but have some experimental evidence that they exist. We haven't done any of these but they are possible.
We can also curate complexes that lack full experimental evidence but are commonly accepted as real, e.g. complexes submitted by ChEMBL for which we only have pharmacological evidence. These complexes are tagged with ECO:0000306 - inferred from background scientific knowledge by manual assertion.
Rule n. 2: Is the complex species-agnostic?
- GO should host species-agnostic complexes, ideally conserved across taxa. Where this isn't known, still make the def generic, and add 'For example, in human this complex contains...' as a def gloss or def comment. Species-specific complexes don't belong in GO, but rather in IntAct and/or PRO (or just IntAct?).
[Birgit] Yes, our complexes are all species-specific, but we make the GO terms agnostic (see above).
Does the complex have a molecular function?
- Ideally, add a capable_of function link. If that is not possible, see whether capable_of_part_of process links can be made. If neither is applicable, we do host complexes based on their subunits only. (See below.)
[Birgit] Yes, see summary above.
Is the complex known to be involved in one or more biological processes?
If yes, add capable_of_part_of process links.
Does the complex contain conserved subunits?
GO does host complexes based on their subunits only, when no function or process information is available.
Where is the complex located?
- Indicate the cellular location as specifically as possible, unless the parent already has one.
[Birgit] Yes! And the CC is for the complex as a whole. We discussed this in the context of transmembrane complexes with members that are located on only one side of the membrane or have no membrane attachment at all. As gene products have the part_of relationship with the complexes, this is apparently fine (and the only way of reflecting the CC for the complex).
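The questions above can be read as a checklist. The following is only a rough sketch of that reading in Python, not an official GO or IntAct tool: the `ComplexCandidate` and `Assessment` records, their field names and the `assess` function are all hypothetical, and the handling of a candidate with no function, process or subunit information is an assumption, since the guidelines do not cover that case. It also simplifies Rule n. 2 by treating a species-specific complex as out of scope, without modelling the option of writing a generic def.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ComplexCandidate:
    # Hypothetical record; field names are illustrative, not a GO or IntAct schema.
    name: str
    is_stable: bool
    is_species_specific: bool
    molecular_functions: List[str] = field(default_factory=list)
    biological_processes: List[str] = field(default_factory=list)
    conserved_subunits: List[str] = field(default_factory=list)
    location: Optional[str] = None
    parent_has_location: bool = False


@dataclass
class Assessment:
    belongs_in_go: bool
    links: List[Tuple[str, str]]
    notes: List[str]


def assess(c: ComplexCandidate) -> Assessment:
    """Walk the checklist in order and collect proposed links and reminders."""
    # Rule n. 1: an unstable association is just protein binding.
    if not c.is_stable:
        return Assessment(False, [], ["Not stable: treat as protein binding."])
    # Rule n. 2: species-specific complexes go to IntAct and/or PRO.
    if c.is_species_specific:
        return Assessment(False, [], ["Species-specific: belongs in IntAct and/or PRO."])

    # Function links first, then process links.
    links = [("capable_of", mf) for mf in c.molecular_functions]
    links += [("capable_of_part_of", bp) for bp in c.biological_processes]

    notes: List[str] = []
    if not links:
        if not c.conserved_subunits:
            # Assumption: with nothing to define the complex by, do not request it.
            return Assessment(False, [], ["No function, process or subunit information."])
        notes.append("No function or process known: define the complex by its subunits only.")
    if c.location is None and not c.parent_has_location:
        notes.append("Indicate the cellular location as specifically as possible.")
    return Assessment(True, links, notes)
```

For instance, `assess(ComplexCandidate("example complex", True, False, conserved_subunits=["subunit A"]))` would accept the candidate with no links and two reminders: define it by subunits only, and indicate a location.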
How to request protein complexes in GO based on the above (TG template, TG freeform)
Emily started documentation here, in case it's helpful, but it hasn't been worked on since 2011: http://wiki.geneontology.org/index.php/Protein_Complex_ids_as_GO_annotation_objects
[Birgit] Inheritance of annotations: I agree with the wiki, you cannot inherit MF from a complex to a subunit and even a CC is problematic, see the transmembrane example above. This needs more thinking about. I don't know what you are doing right now...
Orthologies: We infer within taxon groups, e.g. human to mouse to rat or any other mammal, depending on where the experimental evidence comes from. We systematically infer human-mouse. We have a few pombe complexes inferred from yeast (Sc!), but we don't do this systematically.
Paralogues: We make inferences between related complexes in the same species when the gene products are very similar, e.g. hemoglobin chains for adult and developmental complexes.
'Large' complexes: We have tackled the 'mediator' and we can now link to RNACentral for RNAs so time permitting we'll tackle the 'biggies' soon!
PRO: We have a list of PRO complexes that we consult for refs.
- What IntAct is doing - a summary:
[Birgit] We didn't draw up an official set of rules, but in summary this is what we do (and it pretty much matches what Paola says below and the wiki she cites). A complex:
- should be taxon-agnostic, but may be restricted to certain taxonomic groups, such as pro- vs eukaryotes;
- should contain subunits in the def;
- should have an 'as precise as possible' part_of relationship to the CC (we may have to create new terms here as well, of course!), which can be a complex (in the case of subcomplexes) or a location;
- should have, if possible, capable_of and capable_of_part_of annotation extensions;
- should have an is_a relationship to an appropriate child term of 'protein complex'. This could be a term based on its composition or function but NOT based on the PB. If no appropriate term exists, we create one based on either of the two classes.
There is now a TG template for creating complex-by-MF terms, which makes curators' lives much easier :) If there is no appropriate CC or complex-by-MF parent, the new complex will be a direct child of 'protein complex'.
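As a rough illustration of the summary above, a term request could be checked for completeness before it is submitted. This is a minimal sketch only: the `ComplexTermRequest` record, its field names and the `review` and `resolve_parent` helpers are hypothetical and do not correspond to the TG template or any existing tool. The only term name taken from the guidelines is 'protein complex'.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Fallback parent named in the guidelines.
ROOT_TERM = "protein complex"


@dataclass
class ComplexTermRequest:
    # Hypothetical request record; not the actual TG template fields.
    label: str
    definition: str
    subunits: List[str] = field(default_factory=list)
    part_of: Optional[str] = None  # a location or, for subcomplexes, a complex
    is_a: Optional[str] = None     # a complex-by-composition or complex-by-MF parent
    capable_of: List[str] = field(default_factory=list)
    capable_of_part_of: List[str] = field(default_factory=list)


def review(req: ComplexTermRequest) -> List[str]:
    """Return reminders for anything the summary asks for that is missing."""
    problems: List[str] = []
    # The def should contain the subunits.
    if not req.subunits or any(s not in req.definition for s in req.subunits):
        problems.append("List the subunits in the def.")
    if req.part_of is None:
        problems.append("Add a part_of relationship that is as precise as possible.")
    if not (req.capable_of or req.capable_of_part_of):
        problems.append("If possible, add capable_of or capable_of_part_of.")
    return problems


def resolve_parent(req: ComplexTermRequest) -> str:
    """Use the requested is_a parent, else fall back to 'protein complex'."""
    return req.is_a if req.is_a else ROOT_TERM
```

A request with no suitable parent would resolve to a direct child of 'protein complex', matching the last sentence of the summary.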
IntAct Complex Portal: http://www.ebi.ac.uk/intact/complex/