2010 GO camp Annotation of complexes issues

From GO Wiki
Revision as of 11:46, 12 April 2019

1. Background

Emily, Pascale:

  1. We need to agree on what is a complex, and when complex terms should be created.
    • IntAct: a complex can be purified as an entity and ideally has a function. (These data do not come just from co-IP, which can pick up multiple complexes.) IntAct would not consider a substrate to be a member of a complex and, similarly, would consider only the relatively stable components.
    • GO:0043234 : protein complex: Any macromolecular complex composed of two or more polypeptide subunits, which may or may not be identical. Protein complexes may have other associated non-protein prosthetic groups, such as nucleotides, metal ions or other small molecules.
      • This definition is probably too broad and results in too many terms being created.
    • PRO, the Protein Ontology, uses the GO definition.


Pascale:
Problems with annotation to complexes:

  1. What would be good evidence for membership in a complex? Curators are not always consistent when assigning a gene product to a complex; often an interaction with any member of a complex is used to assign membership to that complex.
  2. Is 'colocalizes with' overused? (We need to get statistics on annotations.) When a protein is found in proximity to a known complex, it may be annotated to it with the 'colocalizes with' qualifier. Given that the qualifier is likely to get dropped, this may be misleading. More generally, those annotations may not be very informative.
  3. There are a number of problems with the ontology itself: some term definitions are problematic, very often because of species-dependent complex composition. Also, some terms are rather questionable (there are some examples of questionable complexes below). The goal of the annotation camp is not to rework the ontology (although Chris and Jane will be present, so there may be an opportunity to do some of that), but we want to discuss the ontology because Swiss-Prot has been annotating complexes and we'd like to hear how they do it and what suggestions they have to improve GO.

2. Review of current GO annotation practices

Pascale

  1. Annotating to 'x complex' (CC) and 'x complex binding' (MF). For example, P10809 is annotated to cell surface (CC) and cell surface binding (MF).
  2. Can IPI be used to annotate molecular function? For example, ribosomal subunits are annotated to 'structural constituent of ribosome' (a function).
  3. 'Structural components activity' function terms: some ribosomal proteins are annotated to 'structural constituent of ribosome' by IDA (for example SGD:S000005760), but the paper (PMID: 18782943) only shows purification of the complex.
  4. Chromatin? See the NIPBL discussion from the Feb 22 jamboree. The general question is: when is a protein part of a complex or organelle, and when does it merely bind to it?
  5. (Rama:) Interpretation of author statements: authors very often use the word complex when two proteins are interacting, but there may not be enough evidence to show the existence of a stable complex. Should curators add another layer of critique, or simply annotate to the complex as stated? How far should curators go in interpreting evidence, and when should we let go, reasoning that if the reviewers were okay with that language, we can go with it?


Emily:

  1. UniProtKB-GOA would like to start investigating the possibility of annotating directly to protein complex identifiers. These would be identifiers provided by the IntAct group, which has begun to generate a collection of literature-mined, stable and functionally defined molecular complexes for model organism species. IntAct supplements its protein lists with functional information (from GO) and consistent nomenclature, as well as links to experimental evidence.

Example of the data IntAct is capturing for their protein complexes: http://www.ebi.ac.uk/intact/pages/details/details.xhtml?interactionAc=EBI-2307627

  1. The gene association format does currently allow the annotation of protein complexes directly, as annotation objects in column 2, although no group (as far as I'm aware) has yet done so (see the definition of DB Object Type (column 12) in: http://www.geneontology.org/GO.format.gaf-1_0.shtml).
  2. UniProtKB-GOA are currently considering using the protein complex identifier as the principal annotation object in GO annotations (column 2) that refer to the biological processes or molecular functions carried out specifically by a complex. However, it may also be useful to be able to automatically 'expand' these complex ID annotations so that individual subunits could be annotated with the same GO terms, because users are less likely to be familiar with protein complex identifiers. If this were done, the protein complex ID should still be captured somewhere in the annotation format (possibly col 16?).
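A minimal sketch of the 'expand' step in item 2, assuming a simplified GAF-like record and a subunit list supplied by an external complex resource. The `Annotation` class, the hypothetical `COMPLEX_MEMBERS` table, the `part_of(...)` string and the use of column 16 for the complex ID are illustrative assumptions, not an agreed format. The IntAct and UniProt IDs come from the PHO80-PHO85 example discussed on this page, and the IDA evidence code is made up for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

# Hypothetical lookup table; in practice this would come from an external
# complex resource such as IntAct. IDs are the PHO80-PHO85 example on this page.
COMPLEX_MEMBERS: Dict[str, List[str]] = {
    "IntAct:EBI-2354002": ["UniProtKB:P17157", "UniProtKB:P20052"],
}


@dataclass(frozen=True)
class Annotation:
    """Simplified GAF-like row (only the columns used here)."""
    object_id: str       # columns 1-2 written as DB:ID, the annotated object
    go_id: str           # column 5
    evidence: str        # column 7
    object_type: str     # column 12, DB Object Type
    extension: str = ""  # column 16, assumed slot for the complex ID


def expand(annotation: Annotation) -> List[Annotation]:
    """Copy a complex-level annotation onto each known subunit.

    The complex ID is kept in the (assumed) column 16 so users can still see
    that the term was assigned to the complex, not tested on the subunit.
    """
    members = COMPLEX_MEMBERS.get(annotation.object_id, [])
    return [
        replace(
            annotation,
            object_id=member,
            object_type="protein",
            extension=f"part_of({annotation.object_id})",
        )
        for member in members
    ]


if __name__ == "__main__":
    complex_row = Annotation("IntAct:EBI-2354002", "GO:0006468", "IDA", "complex")
    for row in expand(complex_row):
        print(row.object_id, row.go_id, row.extension)
```

The sketch expands every term without checking whether it is safe to do so. Bernd's PHO80-PHO85 example further down shows why a blanket expansion can be too generous for some BP terms.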



  1. If external resources are being developed that accurately define the membership of a complex (e.g. IntAct and PRO efforts), should GO focus on defining generic complexes in a function-centric manner? Such complex terms could be used where a complex's membership has not been fully characterized. Currently GO inconsistently defines complexes both in terms of their subunit composition (e.g. SMAD2-SMAD3 protein complex; GO:0071144) and in terms of their function (e.g. GARP complex; GO:0000938).

SP (Bernd):

  1. Defining GO complex terms is currently part of the SP annotation process. It aims at describing evidenced complex assemblies as functional units. This applies to both generic complexes and characterized complexes. If a complex is evolutionarily conserved, SP copies the complex term with 'ISS_curator' to the subunits from other species. Doing this requires a term definition that is not species-specific, a condition that is often not met by existing GO terms.
  2. Annotating GO terms to complexes as db objects is reasonable. There are instances in which GO terms can be assigned to the complex but not to its individual subunits. Example 1: activin/inhibin complexes; experiments have been done with specific assemblies. Example 2: Guanylate cyclase; catalytic activity is demonstrated only for the complex, not for its subunits.
  3. The proposal that GO should concentrate on function-centric generic complexes and leave the full characterization of complex membership to specialized databases, such as IntAct and PRO, is in line with an independent development: Swiss-Prot will investigate the feasibility of contributing to the IntAct complex dataset mentioned above.
  4. By focusing on function-centric generic complexes, GO would lose the ISS_curator propagation for characterized complexes. The specialized databases could capture these later through automatic procedures. In any case, it appears that at the moment only Swiss-Prot is consistently using the current system provided by GO.
  5. Focusing on function-centric generic complexes would imply that each time GO annotators encounter a fully characterized complex, they forward the information to the complex db for further processing, i.e. creating the database object and subsequently performing complex-centric annotation. In my experience, especially for large complexes, the BP or MF is often experimentally evidenced for some, or maybe only one, crucial complex member and is therefore deduced for the complex and the other subunits. For example, while IntAct annotates BP and MF GO terms to the complex, these are not linked to a PMID for the experimental evidence; the PMIDs linked to the complex in IntAct rather point to review papers. QUESTION: Should this be reflected in GO? Does it fit Emily's proposal of 'possibly col 16'?
  6. Having a centralized resource for characterized complexes could help to unify the standards regarding complex membership, which appear not to be handled consistently in GO. The external dbs currently assigning identifiers to complexes, IntAct and PRO, might try to define these standards.
  7. Annotating to generic complexes might require guidelines for the term definitions. See Example 3, GO:0071565 nBAF complex: is it useful to give all gene names in the definition, which rather suggests a characterized complex (not the case!)? Might it be useful to indicate somehow that 'the term refers to a generic complex'? GO:0071565 is one of several children of GO:0070603 SWI/SNF-type complex, which I guess all represent generic complexes, but the relationship is not clear. E.g. GO:0035060 brahma complex, with the definition 'A SWI/SNF-type complex that contains the ATPase product of the Drosophila brahma gene, or an ortholog thereof.', should have other complexes such as nBAF as children.
  8. Though it might turn out that the question of complex membership is redirected to external dbs, here are some examples of GO complexes with potential problems: Example 4: GO:0046881 VRK3/VHR/ERK complex; Example 5: GO:0070209 ASTRA complex; Example 6: GO:0005955 calcineurin complex.

Pascale

  1. MF and BP GO terms related to subunit composition are not intuitive MFs or BPs. In some cases dimerization leads to a change in protein activity, but the terms are often used simply to capture the fact that the protein is, for example, a dimer.
  • GO:0046983 : protein dimerization activity (MF)
  • GO:0042803 : protein homodimerization activity (MF)
  • GO:0007261 : JAK-induced STAT protein dimerization (BP)
  • GO:0051262 : protein tetramerization (BP)

Serenella: Let's say that there is an external reliable database which takes care of the complexes (IDs, memberships, ...).

Then we would have the possibility to (?):

  1. Annotate at the complex level (complex ID in column 2) when the BP(s) and/or the MF(s) are characterized at the complex level (not for just one or some of its subunits). [The external complex database could retrieve the GO annotations done at the complex level 'automatically'. GO should also import the manual GO annotations done by the external database (if they follow strictly the same rules as GO).]
  2. Annotate all the members with a term such as 'component of complex' and the ID of the complex in the 'with' column.
  3. Annotate each subunit with the BP(s) and MF(s) experimentally proven for that specific subunit.
  4. The fact that the complex is a homotrimer, heterodimer, etc. is included in the complex 'definition' and, in my opinion, should not be described with an MF or BP term.

Rules:

A. Should the MF(s) and/or the BP(s) of the complex be 'expanded' to its individual members?

  • MF: No (there are specific catalytic, regulatory, DNA-binding, etc. subunits).
  • BP: Only if the complexes are defined sufficiently precisely (same BP(s) for exactly the same composition).
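Rule A can be written as a small decision function. This is a sketch under two assumptions: each complex annotation carries its GAF aspect ('F', 'P' or 'C'), and a curator or the external complex database supplies a yes/no judgement on whether the composition is defined precisely enough. The names `ComplexAnnotation` and `expand_to_members` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexAnnotation:
    go_id: str
    aspect: str  # GAF column 9: "F" (MF), "P" (BP) or "C" (CC)


def expand_to_members(annotation: ComplexAnnotation, composition_precise: bool) -> bool:
    """Return True if rule A allows copying this annotation to every member."""
    if annotation.aspect == "F":
        # MF: never expanded, because catalytic, regulatory, DNA-binding, etc.
        # roles belong to specific subunits.
        return False
    if annotation.aspect == "P":
        # BP: expanded only when the same BP(s) hold for exactly the same composition.
        return composition_precise
    raise ValueError("rule A only covers MF and BP annotations")


if __name__ == "__main__":
    # GO IDs taken from the PHO80-PHO85 example on this page.
    print(expand_to_members(ComplexAnnotation("GO:0004693", "F"), True))   # False
    print(expand_to_members(ComplexAnnotation("GO:0006468", "P"), True))   # True
    print(expand_to_members(ComplexAnnotation("GO:0006468", "P"), False))  # False
```

A single 'precise composition' flag per complex treats GO:0045936 exactly like GO:0006468. Bernd's response below questions exactly that case, so a per-term check may be needed.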

B. Propagation: Could the MF(s) and BP(s) be propagated?

  • At the protein level? This is already done.
  • At the complex level? ?


Bernd (response to Serenella, above proposed rules B)

See a kinase example: yeast PHO80-PHO85 kinase complex, IntAct EBI-2354002, with UniProt P17157 and P20052.
IntAct annotated the complex to:

GO:0006468 P:protein amino acid phosphorylation;
GO:0043433 P:negative regulation of transcription factor activity
GO:0045936 P:negative regulation of phosphate metabolic process
GO:0004693 F:cyclin-dependent protein kinase activity

It appears to me that GO:0006468 (BP) might be valid for both subunits but GO:0045936 (BP) cannot be expanded to the catalytic subunit. Correct?

3. Proposed annotation policy

4. Examples (papers) and discussion of GO annotation issues

SP (Bernd)

Example 1: activin / inhibin complexes
Background: (taken from SP) Inhibins and activins inhibit and activate, respectively, the secretion of follitropin by the pituitary gland..... Inhibin A is a dimer of alpha and beta-A. Inhibin B is a dimer of alpha and beta-B. Activin A is a homodimer of beta-A. Activin B is a homodimer of beta-B. Activin AB is a dimer of beta-A and beta-B. Consistently,
P09529 INHBB Homo sapiens Inhibin beta B chain is annotated to both BP terms:
GO:0046882 negative regulation of follicle-stimulating hormone secretion
GO:0046881 positive regulation of follicle-stimulating hormone secretion

Similarly,
P08476 INHBA Homo sapiens Inhibin beta A chain
is annotated, according to PMID:11948405, to
GO:0008285 negative regulation of cell proliferation
However, the experiments were done explicitly with activin A, which cannot be captured by the current GO annotation (with proteins as objects).

Example 2: Guanylate cyclase
Background: (taken from SP) [GUCY1A2] Has guanylyl cyclase on binding to the beta-1 subunit. Heterodimer of an alpha and a beta chain.

AC P33402
DE RecName: Full=Guanylate cyclase soluble subunit alpha-2;
DE EC=4.6.1.2;
GN Name=GUCY1A2; Synonyms=GUC1A2, GUCSA2;
CC -!- CATALYTIC ACTIVITY: GTP = 3',5'-cyclic GMP + diphosphate.
DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc.

AC Q02153
DE RecName: Full=Guanylate cyclase soluble subunit beta-1;
DE EC=4.6.1.2;
GN Name=GUCY1B3; Synonyms=GUC1B3, GUCSB3, GUCY1B1;
CC -!- CATALYTIC ACTIVITY: GTP = 3',5'-cyclic GMP + diphosphate.
DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc.

MF GO:0004383 should be annotated to the complex, not to its subunits.
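Cases like this can be found by comparing the `DR   GO;` cross-references of the subunits in UniProtKB flat files. The sketch below parses the two DR lines quoted above, with whitespace as it appears on this page, and reports MF terms that every subunit carries individually. A shared MF is only a candidate for complex-level annotation, and a curator still has to check the evidence. The helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# DR lines copied from the two guanylate cyclase subunit entries above.
ENTRIES: Dict[str, List[str]] = {
    "P33402": ["DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc."],
    "Q02153": ["DR GO; GO:0004383; F:guanylate cyclase activity; TAS:ProtInc."],
}


def parse_dr_go(line: str) -> Tuple[str, str, str, str]:
    """Split a 'DR GO;' line into (GO ID, aspect letter, term name, evidence code)."""
    fields = [field.strip() for field in line.rstrip(".").split(";")]
    # fields == ["DR GO", "GO:0004383", "F:guanylate cyclase activity", "TAS:ProtInc"]
    aspect, term = fields[2].split(":", 1)
    evidence = fields[3].split(":", 1)[0]
    return fields[1], aspect, term, evidence


def shared_mf(entries: Dict[str, List[str]]) -> Set[str]:
    """MF GO IDs present on every subunit: candidates to move to the complex."""
    per_subunit = []
    for lines in entries.values():
        parsed = [parse_dr_go(line) for line in lines]
        per_subunit.append({go_id for go_id, aspect, _, _ in parsed if aspect == "F"})
    return set.intersection(*per_subunit) if per_subunit else set()


if __name__ == "__main__":
    print(shared_mf(ENTRIES))  # {'GO:0004383'}
```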

Example 3: GO:0071565 nBAF complex
def: A SWI/SNF-type complex that is found in post-mitotic neurons, and in human contains actin and proteins encoded by the ARID1A/BAF250A or ARID1B/BAF250B, SMARCD1/BAF60A, SMARCD3/BAF60C, SMARCA2/BRM/BAF190B, SMARCA4/BRG1/BAF190A, SMARCB1/BAF47, SMARCC1/BAF155, SMARCE1/BAF57, SMARCC2/BAF170, DPF1/BAF45B, DPF3/BAF45C, ACTL6B/BAF53B genes. The nBAF complex along with CREST plays a role regulating the activity of genes essential for dendrite growth.

The cited composition appears wrong to me, as several subunits are probably present in a mutually exclusive fashion, like BRM and BRG1, and BAF60A and BAF60C. It is rather a generic complex, then. Is this a useful definition?

Example 4: GO:0046881 VRK3/VHR/ERK complex
def: A ternary complex consisting of VRK3 , VHR (Dusp3), and ERK1 (Mapk3) existing in neuronal cells. and is involved in regulation of the ERK signaling pathway.

Mentioned by Harold (who requested the term, SF 2958912). No proteins annotated yet; PMID 16845380.

To me this appears to be a borderline case for a complex at all. There is no purification, only some co-IP experiments with western-blot readouts that do indicate that the three proteins can assemble and act together. (Also, I suspect that the ERK antibodies are not specific.) To me it is rather VRK3:VHR that is the functional unit (and acts on ERK). QUESTION: Do we want to capture a complex (GO:0046881) like this? Is the view of VRK3:VHR too 'mechanistic'?

Example 5: GO:0070209 ASTRA complex
def: A protein complex that is part of the chromatin remodeling machinery; the acronym stands for ASsembly of Tel, Rvb and Atm-like kinase. In Saccharomyces cerevisiae this complex includes Rvb1p, Rvb2p, Tra1p, Tel2p, Asa1p, Tti1p and Tti2p.

The ASTRA complex was annotated in GO by SGD and GeneDB_Spombe (apparently by two groups on two dates, which presumably means two people read the same paper, etc., which, by the way, is not very efficient).
PMID: 19040720
'Our approach to charting proteomic environments relies upon the sequential use of TAP and mass spectrometry to identify stable protein assemblies. In a typical TAP pull-down experiment, LC-MS/MS analysis identified over 500 proteins containing stoichiometric and transient bona fide protein interactors, along with a large number of background proteins of diverse origin and abundance. To dissect the composition of complexes, we employed a layered data mining approach. First, we sorted out common background proteins and then distinguished proteins specifically enriched in the TAP isolation using semi-quantitative estimates of their abundance (Figure 1).'

It appears to me that, in a proteomics approach, they performed several rounds of IP with selected baits followed by a statistical analysis to identify complexes. QUESTION: Is this enough to describe a characterized complex with a function?


Example 6: GO:0005955 calcineurin complex
def.: A heterodimeric calcium ion and calmodulin dependent protein phosphatase composed of catalytic and regulatory subunits; the regulatory subunit is very similar in sequence to calmodulin.

MGI annotated to the complex:
P10417 Apoptosis regulator Bcl-2 Bcl2 PMID:12617961 IDA MGI

Paper title: Calcium-dependent interaction of calcineurin with Bcl-2 in neuronal tissue. This appears to me to be a wrong annotation, or it represents a different view of the complex issue. The same problem applies to GO:0000159 protein phosphatase type 2A complex (annotated P10417 Bcl2 PMID:16717086 IDA MGI).


Example 7: AP1 complex
(Sandra Orchard during discussion): Has at least two forms

5. Suggestions for Quality Control procedures

6. Meetings

Meeting minutes are on the Discussion page

Protein complexes April 27, 2010

Protein complexes May 12, 2010

AGENDA

Examples of problems for annotating protein complexes

  1. When different forms of the complex have different functions.
    • Example inhibin/activin homo-/heterodimeric complexes.
    • Background: (taken from SP) Inhibins and activins inhibit and activate, respectively, the secretion of follitropin by the pituitary gland..... Inhibins appear to oppose the functions of activins … Inhibin A is a dimer of alpha and beta-A. Inhibin B is a dimer of alpha and beta-B. Activin A is a homodimer of beta-A. Activin B is a homodimer of beta-B. Activin AB is a dimer of beta-A and beta-B.
    • The beta subunits are implicated in different BPs depending on the dimeric assembly, which cannot be captured unambiguously by the current GO annotation (with proteins as objects). For example, present annotations could be assigned more precisely to complexes.


PRO protein ontology


GO: protein complex definition

We need to agree on what is a complex, and when complex terms should be created.

  • IntAct: a complex can be purified as an entity and ideally has a function. (These data do not come just from co-IP, which can pick up multiple complexes.) IntAct would not consider a substrate to be a member of a complex and, similarly, would consider only the relatively stable components.
  • GO:0043234 : protein complex: Any macromolecular complex composed of two or more polypeptide subunits, which may or may not be identical. Protein complexes may have other associated non-protein prosthetic groups, such as nucleotides, metal ions or other small molecules.
    • This definition is probably too broad and results in too many terms being created.
  • PRO, the Protein Ontology, uses the GO definition.
  • Reactome:

Guidelines for annotation of protein complexes

  1. Should we use the ‘contributes_to’ qualifier?
  2. Should we use IC for MF and BP when a protein is known to be in a complex by IDA? (See Q12789 and others.) For example: a complex is identified (RNA pol III) with, say, 3 subunits. No assay is done. We know (from textbooks) that RNA pol III is responsible for ‘transcription from RNA polymerase III promoter; GO:0006383’.

Annotations:

  • protein A – CC: RNA pol III (IDA)
  • protein B – CC: RNA pol III (IDA)
  • protein C – CC: RNA pol III (IDA)
  • MF ?
  • BP ?

Annotations by ISS

Proposed guidelines:

  1. If all subunits are conserved, ISS the CC, MF and BP.
  2. If all ‘core’ subunits are conserved (see RNA pol III for example), and the MF is expected to be essential, ISS the CC, MF and BP.
  3. Qualifiers (contributes to) should be kept when transferring annotations
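The three proposed ISS guidelines can be read as a decision procedure. A minimal sketch, assuming the curator supplies the full subunit list, the set of subunits with a conserved counterpart in the target species, the 'core' subset, and whether the MF is expected to be essential. The guidelines do not say what to do when neither 1 nor 2 applies, so the sketch transfers nothing in that case; that choice is an assumption. Subunit names and the MF ID in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SourceAnnotation:
    go_id: str
    aspect: str          # "C", "F" or "P"
    qualifier: str = ""  # e.g. "contributes_to"; copied unchanged (guideline 3)


def iss_transferable(
    annotations: Iterable[SourceAnnotation],
    subunits: Set[str],
    conserved: Set[str],
    core: Set[str],
    mf_essential: bool,
) -> List[SourceAnnotation]:
    """Return the CC, MF and BP annotations that may be transferred by ISS."""
    all_conserved = subunits <= conserved                # guideline 1
    core_conserved = core <= conserved and mf_essential  # guideline 2
    if all_conserved or core_conserved:
        # Annotations are returned as-is, so any qualifier is kept (guideline 3).
        return list(annotations)
    return []  # assumption: no transfer when neither guideline applies


if __name__ == "__main__":
    source = [
        SourceAnnotation("GO:0006383", "P"),
        SourceAnnotation("GO:placeholder", "F", "contributes_to"),  # hypothetical MF
    ]
    subunits, core = {"A", "B", "C"}, {"A", "B"}  # placeholder subunit names
    print(len(iss_transferable(source, subunits, {"A", "B"}, core, True)))   # 2
    print(len(iss_transferable(source, subunits, {"A", "B"}, core, False)))  # 0
```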

Action items from May 12 call

  1. Protein complex DBs to work together to write a description of the relative scopes of each project, cross-linking, and how that impacts GO annotations (Assigned to:)
  2. Need to improve the definition of Protein complex (or decide whether it needs improving)
  3. Improve consistency in qualifier usage (contributes_to)
  4. Provide guidelines for col 2/col 17

Protein complexes June 2, 2010

First point to resolve: Towards the annotation of protein complexes

Do we agree that a goal of the GOC is to annotate protein complexes with MF/BP? Yes

  1. If we allow annotation to complexes, do we then need a means to describe the relationship between the complex annotations and the gene products that are subunits of these complexes?
  2. If so, we need to make rules to annotate individual subunits of complexes as a first step. The same rules should be applicable to annotation of the complexes themselves.

Second point to resolve: Biological processes versus Molecular Functions

How to correctly capture the BP and MF of individual subunits of complexes?

  • Proposal : annotations of PC to Biological processes should be inherited by all individual subunits. Is everyone in agreement with this? Yes
  • what evidence code should that be? No agreement
  • For example: RNaseP : [1] "In archaea, RNase P ribonucleoproteins consist of 4-5 protein subunits that are associated with RNA. As revealed by in vitro reconstitution experiments these protein subunits are individually dispensable for tRNA processing that is essentially mediated by the RNA component"
  • Complex can be annotated to MF: 'ribonuclease P activity' and BP: 'tRNA processing'.
  • BP can be propagated to individual subunits using the same evidence code
  • MF cannot be propagated: in the case of the RNase P complex, the proteins would be annotated to MF unknown.
  • Comments? Objections?

Third point to resolve: Molecular Functions: Is 'contributes_to' useful?

Proposal: Drop the 'contributes to' qualifier

  1. annotate to MF for complexes where more than one subunit provides the MF
  2. annotate to MF = unknown for subunits with no known function;
  3. when it is not known which subunit(s) performs the catalytic activity, annotate to all. Update when there is more data (this is how SP annotates EC numbers)

The current problem with the 'contributes_to' qualifier is that it is used with different meanings:

  • (a) when an active site is distributed between gene products, like the F1/F(o) ATPase or DNA polymerase, and
  • (b) when a gene product contributes_to the function of a catalytic subunit in the same complex (e.g. a regulatory subunit), which reduces its usefulness
    1. member_of_complex_having [MF] : when you don't know what role a protein is playing in a complex, for example ribosomal proteins
    2. required_for : complex has MF provided by A, B and C (all necessary) (similar to example 'Guanylate cyclase' from wiki page)
    3. sufficient_for : complex has MF provided by A and B (sufficient, C not required) ; C would be annotated to either 'unknown' or 'member_of_complex_having [MF/BP]'
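The three proposed qualifiers could be assigned mechanically once it is known which subunits were tested and which are needed for the MF. A minimal sketch, assuming those two sets come from reconstitution or similar experiments. `required_for`, `sufficient_for` and `member_of_complex_having` are the qualifiers proposed above, not existing GO qualifiers. For a subunit that was tested but is not required, the sketch picks 'unknown', one of the two options listed.

```python
from typing import Dict, Iterable, Set


def proposed_qualifiers(
    subunits: Iterable[str], providers: Set[str], tested: Set[str]
) -> Dict[str, str]:
    """Map each subunit to one of the qualifiers proposed on this page.

    subunits  -- all stable subunits of the complex
    providers -- subunits shown to be needed to provide the complex MF
    tested    -- subunits whose contribution to the MF was actually assessed
    """
    members = list(subunits)
    if not providers <= tested:
        raise ValueError("every provider must also be a tested subunit")
    result = {}
    for subunit in members:
        if subunit not in tested:
            # Role in the complex unknown, e.g. most ribosomal proteins.
            result[subunit] = "member_of_complex_having"
        elif providers == set(members):
            # A, B and C are all necessary, as in the guanylate cyclase example.
            result[subunit] = "required_for"
        elif subunit in providers:
            # A and B suffice; C is not required.
            result[subunit] = "sufficient_for"
        else:
            # Tested and not required: 'unknown' (or member_of_complex_having).
            result[subunit] = "unknown"
    return result


if __name__ == "__main__":
    print(proposed_qualifiers(["A", "B", "C"], {"A", "B", "C"}, {"A", "B", "C"}))
    # {'A': 'required_for', 'B': 'required_for', 'C': 'required_for'}
    print(proposed_qualifiers(["A", "B", "C"], {"A", "B"}, {"A", "B", "C"}))
    # {'A': 'sufficient_for', 'B': 'sufficient_for', 'C': 'unknown'}
```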

How should we annotate the molecular function of the complex?

  • A) if reconstitution experiments show all subunits are needed: - do we annotate all to MF = RNA polymerase III activity? - do we use ‘contributes to’?
  • B) if one subunit is not essential for activity but seems to be a stable part of the complex, - do we annotate to MF = RNA polymerase III activity ? - or do we annotate to ‘NOT’ MF = RNA polymerase III activity ?
  • C) if one subunit has not been tested but is a stable part of the complex, - do we annotate to MF = RNA polymerase III activity by IC?
  • see also above example RNase P
  • One way to help annotation consistency would be to clarify the use of qualifiers, possibly adding some.
  • Given that we don't expect users to be using qualifiers very much, wouldn't it be desirable to remove the 'contributes_to' qualifier?

Fourth point to resolve: structural constituents terms

  • Are structural constituents terms helpful for genes lacking a catalytic activity? for example 'structural constituent of ribosome'.
  • What do we want users to do with this information? This is already captured in 'complexes'.
  • Can we live with some subunits of complexes having no known MF? (remember they have a BP)
  • The 'scaffold' function of actin seems legitimate, as this has been shown. However, for most of the ribosomal subunits, we don't know which ones have a role in the structure of the complex. Alternatively, we could say that any protein in any complex provides structure, and create MF terms for all PC terms.


Fifth point to resolve: Removing protein complexes from MF and BP ontologies?

  • MF: 'homodimer/heterodimer' molecular functions,
  • versus BP: protein oligomerization and children (protein homooligomerization, protein tetramerization, etc.)
  • versus CC annotations
    • Example: ALAD Human; "Human PBGS purifies with eight Zn(II) per homo-octamer" PMID: 11032836
    • Proposal: move the term 'protein oligomerization' to the cellular component ontology as 'protein oligomer'
  • Those terms represent related entities; so they should be in a single ontology if the ontologies are supposed to be orthogonal. It seems bad ontological practice, and leads to redundant annotations to F/P/C.
  • Do people agree that we try and move all the MF/BP 'components' to CC?


Minutes from 2nd June meeting

Peter, Judy, Suzi, Rama, Pascale, Emily, Harold, Diana, Serenella, Cecilia, Chris.

See: http://wiki.geneontology.org/index.php/2010_GO_camp_Annotation_of_complexes_issues#Protein_complexes_June__2.2C_2010

- if we allow annotation to complexes, then we need to describe relationship between complexes and gp subunits, as the main 'currency' of the GO annotation effort is gene products

- Peter: a complex has additional functions - a complex has functions that a single gp cannot carry out. Therefore BP terms probably can be propagated quite easily to subunits, however the experimental evidence for Molecular Function applied to a complex may not be appropriate to propagate to all subunits. The application of MF to subunits might not be appropriate.

- Pascale: therefore we need propagation rules for annotating subunits. All in agreement.

Section: Second point to resolve: Biological processes versus Molecular Functions

Question: can we automatically propagate a BP term applied to a complex to all subunits?

- Harold: if we have an annotation to each subunit, there needs to be a way of ensuring that we're not implying that the subunit can carry out the BP on its own.

- Peter: CDK26 - cell cycle regulation. Some BPs are big - they're often not carried out by one subunit.

- we have this situation currently, where we do not qualify a BP annotation based on whether it is carried out by a gp in its own right, or as a complex member.

- Pascale: where groups can fill in column 16/17 - they can indicate the context of a biological process.

- Suzi: currently, if a gp is involved in both positive and negative regulation of a process in different complexes, we annotate to both.

- Peter: an annotation to a BP indicates a protein does not carry out a process by itself

- Chris: if we need to propagate terms automatically: if we know that the complex has a function within a process, then we would annotate to the protein complex. If we then want to make inferences at the gp level, we need to indicate the context of the protein complex.

- Peter: there are quiescent subunits in a ribosome, which are only scaffolding subunits.

- Pascale: would users expect the ribosomal subunits to be annotated to the process of translation?

All agreed that all subunits should be given the BP translation: what is valid for the protein complex should be valid for the subunits

- Diana: need to be careful with the subunit composition of a protein complex

- Pascale: this should be provided by the external protein complex groups who will be generating the IDs

- Rama: would the context information be located in column 16/17?

- Pascale: this needs to be settled by software developers at a later date.

- Serenella: why not use 'Inferred from Complex Membership' evidence code, and complex ID within the 'with' field?

- Pascale: concerned about how this evidence code should be used.

- Diana: it is important to have an indication in the annotation that the GO annotation has been inherited from the complex, to distinguish it from an annotation tested directly on the subunit.

- Peter: experimental evidence should be provided to the annotation of the protein complex - rather than the subunit that inherits the annotation.

- Pascale: however, if we use IPI or IC-evidenced annotations, then we lose information on the experimental evidence.

- Peter: propagation of process terms down to subunits - can be done reliably. However need to determine an evidence code.

- Rama: has used the 'IC' code to describe the biological process of the subunits, where the experimental evidence comes from the protein complex.

- Emily: however, this should not be a curator judgement call. The composition of a protein complex has been identified by the external PC group, and the activity of the complex has been experimentally identified.

- Emily and Peter: we have a distinct reasoning process to describe relationship between a GO term and the protein complex subunits. In favour of ICM, and if we are concerned with variations in exp vs. non-exp then we could have subtypes of this evidence code.

- Diana: we should restrict it so that there always needs to be experimentally proven evidence

All in agreement.
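The agreed rule can be sketched as follows: propagate BP from the complex to its subunits with ICM, keep the complex ID in the 'with' field, and do so only when the complex annotation has experimental support. ICM is the evidence code proposed in this discussion. The experimental set below (EXP, IDA, IPI, IMP, IGI, IEP) is the set of GO experimental evidence codes, and treating it as the meaning of 'experimentally proven' is an assumption. The complex and subunit IDs in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# GO experimental evidence codes; whether this is the right set for
# "experimentally proven" is an assumption.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    object_id: str
    go_id: str
    aspect: str          # "F", "P" or "C"
    evidence: str
    with_from: str = ""  # GAF column 8, the 'with' field


def icm_annotations(complex_annotation: Annotation, members: Iterable[str]) -> List[Annotation]:
    """Derive subunit BP annotations from a complex-level annotation using ICM."""
    if complex_annotation.aspect != "P":
        return []  # only BP is propagated; MF stays on the complex
    if complex_annotation.evidence not in EXPERIMENTAL_CODES:
        return []  # Diana's restriction: the complex annotation must be experimental
    return [
        Annotation(member, complex_annotation.go_id, "P", "ICM", complex_annotation.object_id)
        for member in members
    ]


if __name__ == "__main__":
    pol3 = Annotation("COMPLEX:pol3", "GO:0006383", "P", "IDA")  # hypothetical complex ID
    for row in icm_annotations(pol3, ["SUBUNIT:A", "SUBUNIT:B"]):
        print(row)
```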

- Emily: can ICM-evidenced annotations be transferred between complex subunits by ISS?

- Pascale: should be able to propagate this information to other species.

- Peter: when annotations are made by inference, we need to provide a full description of the information trace, to make the reasoning path clear.

- Peter: also has some concerns about the validity of such ISS statements if the protein complex is fluid and not conserved.

- Diana: e.g. condensins have one conserved subunit and one not so conserved. It would therefore be worrying if curators did not bear this in mind when making ISS statements.

- Emily: should orthology of complexes be something that should be supported by the external protein complex group? It may not be appropriate for the curators to make this assessment on orthology data of individual subunits.

- Summary: ICM annotations can be transferred by ISS - with the caution that curators need to determine that the complex subunits are sufficiently conserved between species.