Protein Complexes: who, what, where
- Many groups now create/represent/annotate protein:protein complexes as a component of representing biological knowledge
- Protein complexes exist in several resources resulting in a situation where the same 'thing' or 'entity' is represented in different places with different IDs and different names.
- This meeting brings together several of the groups representing/creating protein complex entities, with the following goals:
- Decide which groups create and provide ID namespace for which protein complex entities
- Decide the relation between the protein complex entities
- Decide on the exchange of information about protein complexes between groups.
The First Meeting
- Date: July 11, 2010
- Time: 11 am - 2 pm (coffee, lunch provided)
- Place: Boston Back Bay Hilton. 40 Dalton St. Boston MA 02115
- Phone Number: 1-617-867-6158
- Judy's Number: 1-207-460-7918
|Judy Blake||GOC, PRO, MGI||Jax|
|Harold Drabkin||GOC, MGI, and PRO||Jax|
|Alan Ruttenberg||Semantic Web||Science Commons|
|Michael Ashburner||GOC||U Cambridge|
|Cathy Wu||PRO||Univ Del|
|Michelle Giglio||||Univ Maryland|
|Pascale Gaudet||GOC and Dictybase||Northwestern|
Other Groups Representing Complexes but not at this Meeting
|CORUM: Comprehensive Resource of Mammalian Protein Complexes||MIPS|||
All of us- Define protein complex, especially with regard to distinguishing permanent vs. transient complexes.
PRO, Alan Ruttenberg, Chris Mungall- PRO works with Alan and Chris to convert a subset of PSI-XML into OBO. PRO builds an ontological framework over IntAct complexes as a test case.
GO- review completeness of generic GO protein complexes.
GO- Consider having PRO create generic GO complexes. Consider how this impacts people using GAF files.
Peter D'Eustachio, Reactome- Add attributes that IntAct needs.
Sandra Orchard- Provide IntAct curation manual to the group.
Alan Ruttenberg, Sandra Orchard- AR collaborates with IntAct to create stable URIs for people using OWL and Semantic Web tools.
Minutes of Discussion
Judy Blake (JB)- here as facilitator. Lots of groups wanting to represent protein complexes… there is overlap, causes confusion and there is way too much to do. We should coordinate.
Michael Ashburner (MA) - We are not talking about annotation issues. AGREED.
Judy Blake's Talk
GO representation of protein complexes slide
What are GO truly representing? There is some debate. Is cellular location included? Are complexes defined by function or by components?
AI for GO- Define complex.
example of SPT complex in mousecyc slide
Peter D'Eustachio (PD) asks about isoforms and connection to function. GO only covers aspects of this (i.e., it does not capture isoforms).
A protein complex example from PRO
Has precise semantics, converts to OWL, indicates stoichiometry
Alan Ruttenberg (AR)- Protein subunits come and go for some complexes. Does PRO represent this as 1 complex or 2?
JB- PRO represents 2. AR- Good.
AR is concerned with complex-isoform parentage.
Respiratory complex slide
Multiple genes, multiple isoforms, multiple species
PD- explosion of complexes. Work in progress at Reactome.
Pascale Gaudet (PG)- What do we annotate to? There are at least two PRO ids for a protein complex, the subunit and the complex.
Outside of scope but of interest.
Rolf Apweiler (RA)- 30-fold more terms than associations in UniProt. Need numbers and mappings to determine the effort involved in mapping complexes. UniProt IDs won’t change. Need numbers to have a proper discussion. Next Sept. this will be a focus at the meeting.
Cathy Wu (CW)- PRO wants to attach to subrecords in UniProt. Working on the associations.
RA- UniProt is not an ontology. One gene from one species = one UniProt record. There are sub-identifiers for variant chains.
AR- Within PRO, can we use the UniProt ID to represent all protein products from one gene within a species?
PG- concerned about functional complexes. Can we have generic complexes?
Judy's questions slide
How will Reactome use these complexes?
Moving GO complexes to PRO?
Interaction complex represented in IntAct?
CM & AR- "Moving" the ontology is a misnomer. IDs can be retained; PRO takes over, but this is invisible to the end user.
Intact & IMEX
Sandra will talk about that.
AR- duplication is not desirable but is okay if we know, and coordinate.
MA- historical perspective for GO. Nucleus is a complex. Ribosome is a complex. But now it gets difficult. Empirical and pragmatic. Agrees ownership is not an issue. Curators need an ID. Annotation issue came up again.
Sandra Orchard (SO) talk-
PSI- molecular interaction format. Long history, used by many groups. Has controlled vocabulary, tool support, etc.
Users of protein complexes data
Annotators, interaction databases, affinity purification groups, structural modelers, phylogeneticists
Interactors-proteins, lipid, nucleic acid, etc
Sequence associated with
Linking to experimental evidence- inferred vs. proven
How IntAct is approaching this slide
Creating complexes for MOD--- IDs number in the 10,000’s
JB- they are very complex and difficult to work with
Giving accession number, UniProt short name (MA doesn’t like), systematic long name, synonyms, function free def, interactors linked to UniProt. ChEBI, INSDC also used.
JB- isoforms? If known they will link to the feature chain id in Uniprot.
AR- include topology id if complex is crystallized? Yes.
Complex variants given separate IDs if good evidence for it.
Intact complexes slide
-cross reference to GO, enzyme activity annotated to the complex (GO people are happy with this)
- Reactome for human only, sometimes 1 to many because Reactome creates multiple IDs for the same complex with different sub-cellular localizations.
All these references have types, not just a flat list. Identical object codes, review vs. primary from PubMed, for instance.
experimental evidence- manually curated,
enable searches of evidence of components
HT data when complex is shown to exist in related organisms
Only annotate components required to make the complex (i.e., phosphorylation of a component needed to make the complex is included, but subsequent phosphorylation events are not captured).
Intact complex2 slide
A stable set of 2 or more interacting protein complexes which can be co-purified
AI- need definition of STABLE! Intact has good ideas on this.
What is not a complex: enzyme/substrate.
What should not be a complex: pulldowns.
AR- Can a complex be part of a bigger complex? Yes. Are building blocks complexes? No, unless they have their own function.
Variant complexes are created in Intact
Need for experimental data slide
IntAct links to several hundred GO terms.
"What else is out there?" slide
MIPS yeast source- Not maintained since 2006. 300 complexes.
MIPS CORUM- human, mouse. 1400 complexes. Not supported since 2009.
JB- GO has mined this. AB- Sometimes two IDs for the same complex. JB- Aware of this and screened these out.
Wodak yeast complexes- Computationally derived.
Poorly defined complexes-
Many complexes, like the ribosome and proteasome, have redundant subunits. Need to create a new category “composite complexes” showing all possible complexes. Add as variants when there is experimental evidence.
How to handle this collaboratively.
Complexes need to be available in PSI-MI
XML bit can be handled by IMEx
- Agree to share curation load
- Swiss-Prot and GOA curators will move to annotating interaction data in IntAct. IntAct then exports high-quality binary data.
Alan Bridge (AB)- Swiss-Prot committed to this process.
RA- GO complex ID, IntAct ID, details
Complex at IntAct. Exported to IMEx. Export to UniProtKB, GO, MODs
End of talk. Lunch
1. Generic protein complex- species neutral, e.g. RNA polymerase.
AR- Some generic complexes are based on function, some on structure. There can be generic children, e.g. mammalian RNA polymerase.
2. Specific complex- post-translational modification, isoforms
- splice isoform
AR- no cost in making more specific classes
PD- GO doesn’t want to be creating lists. GO doesn’t do instances. Avoid sensu terms, but can create a class based on a shared function, e.g. ribosome has a 3-component class, 4-component class.
PD- Division of labor. The essential question: is a component of the complex post-translationally modified, or is the complex itself modified?
AR- Need stable IDs for specific forms.
PG- IntAct IDs are not an ontology. They represent a class.
AR- variant confers a phenotype. Uniprot doesn’t capture alleles. RA disagrees. Discussion about this involving variants
JB- Need to focus on division of labor. Where can we be complementary?
Suzanna Lewis (SL)- sees gaps, and sees overlaps.
CW- ontological framework needed to coordinate
RA- GO does generic protein complexes. UniProt provides ID levels. Species-specific complexes from IMEx can be mapped.
JB- Use PRO as a framework for other complexes.
AR- Has concerns about the one-to-many complex mappings that IntAct makes with Reactome. A mapping can mean (#1) exactly equivalent or (#2) a subclass of another class. Broader concern that each DB doesn’t care about certain things. Don’t box us in; outside users may have other needs.
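The two mapping senses AR distinguishes (exact equivalence versus subclass) could be recorded explicitly rather than as a flat cross-reference. The following is only a minimal sketch of that idea: the `Relation`, `Mapping`, and `sources_for` names and every identifier in it are hypothetical, invented for illustration, and do not reflect any resource's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    EXACT = "exact"           # both IDs denote the same complex
    SUBCLASS_OF = "subclass"  # source is a more specific form of target


@dataclass(frozen=True)
class Mapping:
    source: str         # ID in one resource
    target: str         # ID in the other resource
    relation: Relation  # what the cross-reference actually asserts


def sources_for(mappings, target, relation):
    """Return source IDs mapped to `target` with the given relation."""
    return [m.source for m in mappings
            if m.target == target and m.relation is relation]


# Hypothetical IDs, invented for illustration only.
MAPPINGS = [
    # One complex in resource A, two location-specific forms in resource B.
    Mapping("RESOURCE_B:101", "RESOURCE_A:1", Relation.SUBCLASS_OF),
    Mapping("RESOURCE_B:102", "RESOURCE_A:1", Relation.SUBCLASS_OF),
    # A one-to-one case where both IDs mean the same thing.
    Mapping("RESOURCE_B:200", "RESOURCE_A:2", Relation.EXACT),
]

narrower = sources_for(MAPPINGS, "RESOURCE_A:1", Relation.SUBCLASS_OF)
print(narrower)
```

Typing each mapping this way keeps a one-to-many cross-reference from being read as a claim of identity.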
PD- Use ontology to provide spidering over of these databases
JB- Need to compute over terms logically. UniProt and IntAct don’t provide this.
RA- UniProt and IntAct have institutional funding. PRO needs to be well established for them to change their methods.
JB- Not asking anyone to give up anything. Let’s be aware of what everyone else is doing.
SO- we can provide the complex any way you want
CJM and AR- we can convert PSI-XML into OBO.
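As a rough illustration of the target side of such a conversion, the sketch below renders one complex as an OBO `[Term]` stanza. It assumes the complex has already been parsed into a plain dictionary; it does not read PSI-XML itself, and the `to_obo_term` helper, the `EX:` identifiers, and the record contents are all hypothetical.

```python
def to_obo_term(term_id, name, parts, parent=None):
    """Render one complex as an OBO [Term] stanza."""
    lines = ["[Term]", f"id: {term_id}", f"name: {name}"]
    if parent is not None:
        # Optional link to a more generic complex class.
        lines.append(f"is_a: {parent}")
    for part in parts:
        # One has_part relationship per component of the complex.
        lines.append(f"relationship: has_part {part}")
    return "\n".join(lines)


# Hypothetical record standing in for data already parsed from PSI-XML.
record = {
    "id": "EX:0000002",
    "name": "example complex (mouse)",
    "parent": "EX:0000001",
    "parts": ["EX:0000010", "EX:0000011"],
}

stanza = to_obo_term(record["id"], record["name"],
                     record["parts"], parent=record["parent"])
print(stanza)
```

Stoichiometry, evidence, and cross-references are left out here; how those would be carried over is exactly the kind of detail the test case would need to settle.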
AR- Also involved in semantic web. Translation for other formats needed.
RA- Unaware of some of IntAct’s needs. Reactome needs to add attributes. Also concerned about non-human complexes. Reactome has some non-human complexes because the data is needed to verify human complexes.
JB- Proposes that PRO import parts of IntAct, to see whether this satisfies everyone’s needs.
- Discussion slides from GO Annotation Camp Discussion
- Link to GO working group discussion on Annotation of Complexes