2010 GO camp Annotation of complexes issues

From GO Wiki

1. Background

Pascale: Problems with annotation to complexes:

  1. What would be good evidence for membership in a complex? Curators are not always consistent when assigning a gene product to a complex; often an interaction with any member of a complex is used to assign membership to that complex.
  2. 'colocalizes with' may be overused? (Need to get stats on annotations). When a protein is found in proximity with a known complex, it may be annotated to 'colocalizes with'. Given that the qualifier is likely to get dropped, this may be misleading. And more generally, those annotations may not be really informative.
  3. There are a number of problems with the ontology itself. Some term definitions are problematic, very often because complex composition is species-dependent. Some terms are also rather questionable (there are some examples of questionable complexes below). The goal of the annotation camp is not to rework the ontology (although Chris and Jane will be present, so there may be an opportunity to do some of that), but we want to discuss the ontology because Swiss-Prot has been annotating complexes and we'd like to hear how they do it, and what suggestions they have to improve GO.


2. Review of current GO annotation practices

Pascale

  1. Annotating to 'x complex' (CC) and 'x complex binding' (MF). For example P10809 is annotated to cell surface (CC) and cell surface binding (MF).
  2. Can IPI be used to annotate molecular function? For example ribosomal subunits are annotated to 'structural constituent of ribosome' (a function)
  3. 'Structural components activity' function terms: some ribosome proteins are annotated to 'structural constituent of ribosome' by IDA (for example SGD:S000005760); but the paper (PMID: 18782943) only shows purification of the complex.
  4. Chromatin? See NIPBL discussion from Feb 22 jamboree. The general question is when is a protein part of a complex or organelle or when it binds to it?
  5. (Rama:) Interpretation of author statements: Authors very often use the word 'complex' when two proteins are interacting, but there may not be enough evidence to show the existence of a stable complex. Should curators not add another layer of critique before requesting a term for, or annotating to, the complex? How far should curators go in interpreting the evidence, and when should we accept the authors' language on the grounds that the reviewers were satisfied with it?


Emily:

  1. UniProtKB-GOA would like to start investigating the possibility of directly annotating to protein complex identifiers. These would be identifiers provided by the IntAct group, who have begun to generate a collection of literature-mined stable and functionally defined molecular complexes for model organism species. IntAct supplements the protein lists with functional information (from GO), consistent nomenclature, and links to experimental evidence.

Example of the data IntAct is capturing for their protein complexes: http://www.ebi.ac.uk/intact/pages/details/details.xhtml?interactionAc=EBI-2307627

  1. The gene association format does currently allow the annotation of protein complexes directly, as annotation objects in column 2, although no group (as far as I'm aware) has yet done so (see the definition of DB Object Type (column 12) in: http://www.geneontology.org/GO.format.gaf-1_0.shtml).
  2. UniProtKB-GOA are currently considering using the protein complex identifier as the principal annotation object (column 2) in GO annotations that refer to the biological processes or molecular functions carried out specifically by a complex. However, it may also be useful to be able to automatically 'expand' these complex id annotations so that individual subunits could be annotated with the same GO terms. This may be of use, as users are less likely to be familiar with protein complex identifiers. If this were the case, then the protein complex id should still be captured somewhere in the annotation format (possibly col 16?!!)
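
The 'expand' idea in point 2 can be sketched as follows. This is only an illustration under stated assumptions, not GOA's implementation: the `Annotation` record, the `complex_id` field (standing in for wherever the complex identifier would be kept, such as the column 16 suggested above), the complex identifier `CPX-HYPOTHETICAL-1`, the evidence code and the object-type strings are all hypothetical. The two subunit accessions and the GO term are borrowed from the guanylate cyclase example later on this page.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Annotation:
    db_object_id: str                   # GAF column 2: the annotated object
    go_id: str
    evidence: str
    db_object_type: str                 # GAF column 12 (type strings here are illustrative)
    complex_id: Optional[str] = None    # hypothetical field that remembers the source complex


def expand_to_subunits(annotation: Annotation,
                       members: Dict[str, List[str]]) -> List[Annotation]:
    """Copy a complex-level annotation onto each subunit, keeping the complex id."""
    if annotation.db_object_type != "complex":
        return [annotation]             # protein-level annotations pass through unchanged
    return [
        replace(annotation,
                db_object_id=accession,
                db_object_type="protein",
                complex_id=annotation.db_object_id)
        for accession in members[annotation.db_object_id]
    ]


# Hypothetical complex record: membership would come from an external resource.
members = {"CPX-HYPOTHETICAL-1": ["P33402", "Q02153"]}
complex_annotation = Annotation("CPX-HYPOTHETICAL-1", "GO:0004383", "IDA", "complex")
expanded = expand_to_subunits(complex_annotation, members)
```

Each expanded record still points back to the complex it came from, so a user who searches by protein accession can see that the term was asserted for the assembly rather than for the isolated subunit.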



  1. If external resources are being developed that accurately define the membership of a complex (e.g. the IntAct and PRO efforts), should GO focus on defining generic complexes in a function-centric manner? Such complex terms could be used where a complex's membership has not been fully characterized. Currently GO defines complexes inconsistently, either in terms of their subunit composition (e.g. SMAD2-SMAD3 protein complex; GO:0071144) or in terms of their function (e.g. GARP complex; GO:0000938).

SP (Bernd):

  1. Defining GO complex terms is currently part of the SP annotation process. It aims at describing evidenced complex assemblies as functional units. This applies to generic complexes and to characterized complexes. If a complex is evolutionarily conserved, SP copies the complex term with 'ISS_curator' to the subunits from other species. Doing this requires a term definition that is not species-specific, a condition that is often not met by existing GO terms.
  2. Annotating GO terms to complexes as db objects is reasonable. There are instances in which GO terms cannot be assigned to individual subunits but only to the complex. Example 1: activin / inhibin complexes: experiments have been done with specific assemblies. Example 2: Guanylate cyclase: catalytic activity is demonstrated only for the complex, not for its subunits.
  3. The proposal that GO should concentrate on function-centric generic complexes and leave the full characterization of complex membership to specialized databases, such as IntAct and PRO, is in line with an independent development: Swiss-Prot will investigate the feasibility of contributing to the IntAct complex dataset mentioned above.
  4. Focusing on function-centric generic complexes, GO would lose the ISS_curator propagation for characterized complexes. The specialized databases could capture these later through automatic procedures. In any case, it appears that at the moment only Swiss-Prot is consistently using the current system provided by GO.
  5. Focusing on function-centric generic complexes would imply that each time GO annotators encounter a fully characterized complex, they forward the information to the complex db for further processing, i.e. creating the database object and subsequently performing complex-centric annotation. In my experience, especially for large complexes, the BP or MF is often experimentally evidenced for some, or maybe only one, crucial complex member and is therefore deduced for the complex and the other subunits. E.g. while IntAct is annotating BP and MF GO terms to the complex, these are not linked to a PMID for the experimental evidence. The PMIDs linked to the complex in IntAct rather point to review papers. QUESTION: Should this be reflected in GO? Does it fit Emily's proposal 'for a possibly col 16'?
  6. Having a centralized resource for characterized complexes could help to unify the standards regarding complex membership which appears not to be handled consistently in GO. The external dbs currently assigning identifiers to complexes, IntAct and PRO, might try to define these standards.
  7. Annotating to generic complexes might require guidelines for the term definitions. See example 3, GO:0071565 nBAF complex: Is it useful to give all gene names in the definition, which rather suggests a characterized complex (not the case!)? Might it be useful to indicate somehow that 'the term refers to a generic complex'? GO:0071565 is one of several children of GO:0070603 SWI/SNF-type complex, which I guess all represent generic complexes, but the relationship is not clear. E.g. GO:0035060 brahma complex, with def. 'A SWI/SNF-type complex that contains the ATPase product of the Drosophila brahma gene, or an ortholog thereof.' should have other complexes such as nBAF as children.
  8. Though it might turn out that the question of complex membership is redirected to external dbs, I give some examples for GO complexes with potential problems: Example 4: GO:0046881 VRK3/VHR/ERK complex, Example 5: GO:0070209 ASTRA complex, Example 6: GO:0005955 calcineurin complex

Pascale

  1. MF and BP GO terms related to subunit composition are not intuitive MFs or BPs. In some cases the dimerization leads to a change in protein activity, but the terms are often used merely to capture the fact that the protein is a dimer. For example:
  • GO:0046983 : protein dimerization activity (MF)
  • GO:0042803 : protein homodimerization activity (MF)
  • GO:0007261 : JAK-induced STAT protein dimerization (BP)
  • GO:0051262 : protein tetramerization (BP)

Serenella: Let's say that there is an external reliable database which takes care of the complexes (IDs, memberships, ...).

Then we would have the possibility to (?):

  1. Annotate at complex level (Complex ID in column 2) when the BP(s) and/or the MF(s) are characterized at the complex level (not for just one or some of its subunits). [The external complex database could retrieve 'automatically' the GO annotations done at the complex level. GO should also import the manual GO annotations done by the external database (if they follow strictly the same rules as GO)]
  2. Annotate all the members with a term such as 'component of complex' and the ID of the complex in the 'with' column.
  3. Annotate each subunit with the BP(s) and MF(s) experimentally proven for that specific subunit.
  4. The fact that the complex is a homotrimer, heterodimer, etc. is included in the complex 'definition' and should not be described with a MF or BP term, in my opinion.

Rules: A. Should the MF(s) and/or the BP(s) of the complex be 'expanded' to its individual members ?

  • MF: No. (catalytic, regulatory, DNA-binding, etc... specific subunits).
  • BP: Only if the complexes are defined sufficiently precisely (same BP(s) for the exact same composition).

B. Propagation: Could the MF(s) and BP(s) be propagated ?

  • At the protein level? This is already done.
  • At the complex level? ?
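
Rule A above can be written down as a small decision function. This is a sketch of one possible reading of the rule, not an agreed policy: the function name, the `composition_is_exact` flag and the pairing of terms with aspects are assumptions made for illustration, and the two GO terms are simply reused from the examples on this page.

```python
from typing import Iterable, List, Tuple


def terms_to_expand(complex_terms: Iterable[Tuple[str, str]],
                    composition_is_exact: bool) -> List[str]:
    """Return the complex-level GO ids that rule A would copy to every member.

    complex_terms holds (go_id, aspect) pairs, with aspect "MF" or "BP".
    """
    expanded = []
    for go_id, aspect in complex_terms:
        # MF is never expanded: catalytic, regulatory, DNA-binding, etc.
        # activities belong to specific subunits, not to all members.
        if aspect == "BP" and composition_is_exact:
            # BP is expanded only when the complex composition is defined precisely.
            expanded.append(go_id)
    return expanded


complex_terms = [
    ("GO:0004383", "MF"),   # guanylate cyclase activity
    ("GO:0008285", "BP"),   # negative regulation of cell proliferation
]
for_exact_complex = terms_to_expand(complex_terms, composition_is_exact=True)
for_generic_complex = terms_to_expand(complex_terms, composition_is_exact=False)
```

Under this reading a generic complex never passes any term down to its members, which is the conservative outcome when membership is not fully characterized.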

3. Proposed annotation policy

4. Examples (papers) and discussion of GO annotation issues

SP (Bernd)

Example 1: activin / inhibin complexes
Background: (taken from SP) Inhibins and activins inhibit and activate, respectively, the secretion of follitropin by the pituitary gland..... Inhibin A is a dimer of alpha and beta-A. Inhibin B is a dimer of alpha and beta-B. Activin A is a homodimer of beta-A. Activin B is a homodimer of beta-B. Activin AB is a dimer of beta-A and beta-B.
Consistently, P09529 INHBB Homo sapiens Inhibin beta B chain is annotated to both BP terms:
GO:0046882 negative regulation of follicle-stimulating hormone secretion
GO:0046881 positive regulation of follicle-stimulating hormone secretion

Similarly,
P08476 INHBA Homo sapiens Inhibin beta A chain
is annotated, according to PMID:11948405, to
GO:0008285 negative regulation of cell proliferation
But the experiments were explicitly done with activin A, which cannot be captured by the current GO annotation (with proteins as objects).
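
Why a single chain ends up with opposing annotations can be seen by tabulating the dimers listed in the background text. The sketch below is only an illustration: the dictionary layout and the function name are assumptions, and the 'inhibits' / 'activates' labels summarize the stated effect of inhibins and activins on follitropin secretion.

```python
# Dimer compositions as given in the Swiss-Prot background text above.
DIMERS = {
    "inhibin A": ("alpha", "beta-A"),
    "inhibin B": ("alpha", "beta-B"),
    "activin A": ("beta-A", "beta-A"),
    "activin B": ("beta-B", "beta-B"),
    "activin AB": ("beta-A", "beta-B"),
}

# Effect of each family on follitropin secretion.
EFFECT = {"inhibin": "inhibits", "activin": "activates"}


def effects_inherited_by(chain: str) -> set:
    """Collect the effects of every dimer that contains the given chain."""
    return {
        EFFECT[name.split()[0]]
        for name, chains in DIMERS.items()
        if chain in chains
    }


beta_b_effects = effects_inherited_by("beta-B")   # chain found in inhibin B, activin B, activin AB
alpha_effects = effects_inherited_by("alpha")     # chain found only in inhibins
```

With proteins as the annotation objects, the beta-B chain inherits both effects, while annotating the dimers themselves would keep the two effects apart.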

Example 2: Guanylate cyclase
Background: (taken from SP) [GUCY1A2] Has guanylyl cyclase on binding to the beta-1 subunit. Heterodimer of an alpha and a beta chain.

AC P33402
DE RecName: Full=Guanylate cyclase soluble subunit alpha-2;
DE EC=4.6.1.2;
GN Name=GUCY1A2; Synonyms=GUC1A2, GUCSA2;
CC -!- CATALYTIC ACTIVITY: GTP = 3',5'-cyclic GMP + diphosphate.
DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc.

AC Q02153
DE RecName: Full=Guanylate cyclase soluble subunit beta-1;
DE EC=4.6.1.2;
GN Name=GUCY1B3; Synonyms=GUC1B3, GUCSB3, GUCY1B1;
CC -!- CATALYTIC ACTIVITY: GTP = 3',5'-cyclic GMP + diphosphate.
DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc.

MF GO:0004383 should be annotated to the complex not its subunits.
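
Cases like this one could be listed by reading the GO cross-references from the entries and grouping subunits that carry the same MF term. The parser below is a rough sketch that handles only the simplified lines quoted above (real flat-file entries have more line types and fixed-width spacing); the function name and the tuple layout are assumptions, and a term name containing a semicolon would break the field split.

```python
def parse_go_xrefs(flat_file_text):
    """Return (accession, go_id, aspect, evidence_code) for each 'DR GO' line."""
    records = []
    accession = None
    for line in flat_file_text.splitlines():
        code, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if code == "AC":
            # Keep only the primary accession; tolerate a trailing ';'.
            accession = rest.split(";")[0].strip()
        elif code == "DR" and rest.startswith("GO;"):
            fields = [f.strip() for f in rest.rstrip(".").split(";")]
            _db, go_id, aspect_and_term, evidence = fields
            aspect = aspect_and_term.split(":", 1)[0]       # F, P or C
            evidence_code = evidence.split(":", 1)[0]       # e.g. TAS
            records.append((accession, go_id, aspect, evidence_code))
    return records


ENTRIES = """\
AC P33402
DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc.
AC Q02153
DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc.
"""

records = parse_go_xrefs(ENTRIES)
# Subunits that each carry the MF term that was only demonstrated for the heterodimer.
subunits_with_mf = sorted(
    acc for acc, go_id, aspect, _ in records
    if aspect == "F" and go_id == "GO:0004383"
)
```

Both subunits are returned, which is the situation described above: the activity is recorded twice at the protein level although it is a property of the alpha/beta heterodimer.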

Example 3: GO:0071565 nBAF complex
def: A SWI/SNF-type complex that is found in post-mitotic neurons, and in human contains actin and proteins encoded by the ARID1A/BAF250A or ARID1B/BAF250B, SMARCD1/BAF60A, SMARCD3/BAF60C, SMARCA2/BRM/BAF190B, SMARCA4/BRG1/BAF190A, SMARCB1/BAF47, SMARCC1/BAF155, SMARCE1/BAF57, SMARCC2/BAF170, DPF1/BAF45B, DPF3/BAF45C, ACTL6B/BAF53B genes. The nBAF complex along with CREST plays a role regulating the activity of genes essential for dendrite growth.

The cited composition appears wrong to me, as there are several subunits that are probably present in a mutually exclusive fashion, like BRM and BRG1, and BAF60A and BAF60C. It is rather a generic complex, then. Is this a useful definition?
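
To make the point concrete, the definition can be read as a set of shared subunits plus a few slots that each take one of several alternatives. The sketch below is an illustration only: treating BRM/BRG1 and BAF60A/BAF60C as mutually exclusive follows the comment above ('probably'), the ARID1A/ARID1B alternative follows the 'or' in the definition, and treating every other listed gene as always present is an assumption (actin itself is left out for brevity).

```python
from itertools import product

# Slots where only one of the listed gene products is assumed to be present.
ALTERNATIVE_SLOTS = [
    ("ARID1A", "ARID1B"),     # BAF250A or BAF250B ('or' in the definition)
    ("SMARCD1", "SMARCD3"),   # BAF60A vs BAF60C (assumed exclusive)
    ("SMARCA2", "SMARCA4"),   # BRM vs BRG1 (assumed exclusive)
]

# Gene products assumed here to be present in every assembly.
SHARED_SUBUNITS = [
    "SMARCB1", "SMARCC1", "SMARCE1", "SMARCC2", "DPF1", "DPF3", "ACTL6B",
]


def possible_assemblies():
    """Enumerate every subunit set obtained by picking one option per slot."""
    return [
        frozenset(SHARED_SUBUNITS) | frozenset(choice)
        for choice in product(*ALTERNATIVE_SLOTS)
    ]


assemblies = possible_assemblies()
distinct_compositions = len(set(assemblies))
```

Under these assumptions the single term covers several distinct compositions rather than one characterized complex, which is what makes it a generic term.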

Example 4: GO:0046881 VRK3/VHR/ERK complex
def: A ternary complex consisting of VRK3, VHR (Dusp3), and ERK1 (Mapk3) existing in neuronal cells, and is involved in regulation of the ERK signaling pathway.

Mentioned by Harold (who requested the term, SF 2958912). No proteins annotated yet; PMID 16845380.

It appears to me to be a borderline case for a complex at all. There is no purification, only some co-IP experiments with western-blot readouts, which do indicate that the three proteins can assemble and act together. (Also, I guess that the ERK antibodies are not specific.) To me it is rather VRK3:VHR that is the functional unit (and acts on ERK). QUESTION: Do we want to capture a complex (GO:0046881) like this? Is the view of VRK3:VHR too 'mechanistic'?

Example 5: GO:0070209 ASTRA complex
def: A protein complex that is part of the chromatin remodeling machinery; the acronym stands for ASsembly of Tel, Rvb and Atm-like kinase. In Saccharomyces cerevisiae this complex includes Rvb1p, Rvb2p, Tra1p, Tel2p, Asa1p, Tti1p and Tti2p.

The ASTRA complex was annotated in GO by SGD and GeneDB_Spombe (apparently by two groups at two dates, which presumably means two people read the same paper etc., which btw is not very efficient).
PMID: 19040720
'Our approach to charting proteomic environments relies upon the sequential use of TAP and mass spectrometry to identify stable protein assemblies. In a typical TAP pull-down experiment, LC-MS/MS analysis identified over 500 proteins containing stoichiometric and transient bona fide protein interactors, along with a large number of background proteins of diverse origin and abundance. To dissect the composition of complexes, we employed a layered data mining approach. First, we sorted out common background proteins and then distinguished proteins specifically enriched in the TAP isolation using semi-quantitative estimates of their abundance (Figure 1).'

It appears to me that, in a proteomics approach, they performed several rounds of IP with selected baits followed by a statistical analysis to identify complexes. QUESTION: Is this enough to describe a characterized complex? With a function?


Example 6: GO:0005955 calcineurin complex
def.: A heterodimeric calcium ion and calmodulin dependent protein phosphatase composed of catalytic and regulatory subunits; the regulatory subunit is very similar in sequence to calmodulin.

MGI annotated to the complex:
P10417 Apoptosis regulator Bcl-2 Bcl2 PMID:12617961 IDA MGI

Paper title: Calcium-dependent interaction of calcineurin with Bcl-2 in neuronal tissue. This appears to me to be a wrong annotation, or it represents a different view on the complex issue. Same problem for GO:0000159 protein phosphatase type 2A complex (annotated P10417 Bcl2 PMID:16717086 IDA MGI).


Example 7: AP1 complex
(Sandra Orchard during discussion): Has at least two forms

5. Suggestions for Quality Control procedures


Back to 2010_GO_camp_Meeting_Agenda