Annotation Conf. Call 2016-07-11
Bluejeans URL: https://bluejeans.com/993661940
Next GOC Meeting - USC, Los Angeles, CA, November 4-6, 2016
Annotation Consistency Exercise for 2016-07-26
- PomBase is next up on the rota
- Future dates:
- August 23 - SGD
- September 27 - dictyBase
- October 25 - RGD?
- November 22 - Zfin
Revised Protein Binding Documentation
- On the 2016-06-28 call, we discussed how each group currently annotates protein binding experiments, because it was pointed out that the current documentation likely does not reflect universal practice, specifically with respect to whether the interactions captured using 'protein binding' (GO:0005515) or its children are direct or indirect.
Current Documentation: The 'with' column (8) and the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See the column 16 documentation for relationship types to use when adding IDs in the annotation extension column (16).
- We surveyed curators on the call and found that there are differences in how groups use interaction experiments for GO annotation.
- We also discussed whether we are comfortable with having differences or should try to adhere to a common practice; generally, people felt it was okay to have some differences here, but we need to reflect that in the documentation.
- Here is a draft of an update to the binding section of our curation documentation. Let's discuss if this accurately reflects what we do and why, and then make changes, if needed, and update the documentation.
Proposed New Guideline: The Molecular Function (MF) ontology can be used to capture macromolecular interactions, such as protein-protein, protein-nucleic acid, protein-lipid interactions, etc. While GO annotations are not considered to be a repository of all protein-protein interactions, many gene products are annotated to 'protein binding' (GO:0005515) or one of its child terms. In making these annotations, contributing groups may follow slightly different practices with respect to the types of experimental evidence used to support these inferences, e.g. some groups may use co-immunoprecipitation as supporting evidence for a protein binding annotation between two gene products, while others do not. However, all groups generally adhere to the principle that, when annotated, protein binding interactions inform what is believed to be the normal biological role of a gene product, i.e. the protein-protein interactions support an author's hypothesis about how the gene product is thought to execute its molecular function in the context of a normal biological process. Protein-protein interactions for which there is not yet sufficient biological context are discouraged as sources of GO MF annotations.
- We also discussed, on the last conference call, the criteria by which protein binding annotations from IntAct are exported to GO. A response from Sandra Orchard is on the 2016-06-28 minutes.
- A summary:
* Only experimental data is used for making the decision to export the protein pair to UniProtKB/GOA as a true binary interacting pair.
* The export decision is always based on at least two pieces of experimental data. A single piece of evidence cannot score highly enough to trigger an export.
* An export cannot be triggered if the protein pair only ever co-occurs in larger complexes; there must be at least one piece of evidence that the proteins are probably in physical contact.
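The three conditions in the summary can be read as a simple decision rule. The following is a minimal Python sketch of that reading only: the `Evidence` record and the `may_export` function are hypothetical names invented for illustration, and the real IntAct pipeline uses a scoring system whose weights are not described in these minutes, so the score threshold is replaced here by a plain count of experimental evidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence for a protein pair (hypothetical record, not IntAct's schema)."""
    experimental: bool      # True if the evidence comes from an experiment
    physical_contact: bool  # True if it suggests the two proteins are probably in direct contact


def may_export(evidence: List[Evidence]) -> bool:
    """Apply the three summarized conditions to the evidence for one protein pair."""
    # Condition 1: only experimental data counts towards the decision.
    experimental = [e for e in evidence if e.experimental]
    # Condition 2: a single piece of evidence is never enough (assumed here to mean a count of two).
    if len(experimental) < 2:
        return False
    # Condition 3: co-occurrence in larger complexes alone cannot trigger an export.
    return any(e.physical_contact for e in experimental)
```

Under these assumptions, a pair supported by two co-complex observations would not be exported, while a pair with one co-complex observation and one observation indicating physical contact would be.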
Questions about Membrane Cellular Component Annotations
- The UCL group would like clarification and guidelines on how curators should annotate the various membrane and child terms that describe the extent to which a gene product is contained within a membrane.
- Here is a representative branch of the CC ontology wrt these types of terms:
- membrane part
- [isa]intrinsic component of membrane
Definition: The component of a membrane consisting of the gene products having some covalently attached portion, for example part of a peptide sequence or some other covalently attached group such as a GPI anchor, which spans or is embedded in one or both leaflets of the membrane. Source: GOC:mah Comment: Note that proteins intrinsic to membranes cannot be removed without disrupting the membrane, e.g. by detergent.
- [isa]integral component of membrane
Definition: The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. Source: GOC:go_curators, GOC:dos
- [isa]anchored component of membrane
Definition: The component of a membrane consisting of the gene products that are tethered to the membrane only by a covalently attached anchor, such as a lipid group that is embedded in the membrane. Gene products with peptide sequences that are embedded in the membrane are excluded from this grouping. Source: GOC:dos, GOC:mah
- [isa]extrinsic component of membrane
Definition: The component of a membrane consisting of gene products and protein complexes that are loosely bound to one of its surfaces, but not integrated into the hydrophobic region. Source: GOC:dos, GOC:mah, GOC:jl Comment: Note that proteins extrinsic to membranes can be removed by treatments that do not disrupt the membrane, such as salt solutions.
- Examples from the literature:
EXAMPLE 1: PMID:18502731
What annotations for VGAT and VGLUT2?
Summary of methods in the paper:
- Electron microscopy shows synaptic vesicle localisation (Figure 4)
- Immunolocalization supports the localisation of VGAT and VGLUT2 to synaptic vesicles
The curator knows that these proteins have transmembrane domains.
Would you annotate to:
integral component of synaptic vesicle membrane ; GO:0030285 | IDA
Or
synaptic vesicle ; GO:0008021 | IDA
integral component of synaptic vesicle membrane ; GO:0030285 | IC from GO:0008021
(NB: The IC doesn’t show the full picture because the membrane domains/anchors are author knowledge so often curated as a NAS/TAS which can’t be included in the with statement for an IC annotation).
EXAMPLE 2: PMID:17110340
integral component of synaptic vesicle membrane ; GO:0030285 or anchored component of synaptic vesicle membrane ; GO:0098993
Summary of methods in the paper:
- The protein composition of purified synaptic vesicles (SVs) was analysed by mass spectrometry (MS) and 1D SDS-PAGE, and 410 proteins were unambiguously identified.
- Proteins are classified as: copurifying with SVs; ubiquitously distributed on subcellular membranes (i.e. present on SVs but not enriched relative to other fractions).
- Western blots were used to quantitate the levels of SV proteins.
- Three different electron microscopy (EM) procedures imaged surface proteins and show the surface of SVs to be covered with proteins, but do not identify individual proteins.
- The authors model the SV (Figure 4) to show transmembrane domains of proteins and anchored proteins; some of these are known to be membrane proteins by their previous structure (e.g. they are known ion channels).
From this paper, would you annotate to:
integral component of synaptic vesicle membrane ; GO:0030285 | IDA
anchored component of synaptic vesicle membrane ; GO:0098993 | IDA
(e.g. incorporating author say-so/previous knowledge of the protein domains into the IDA evidence code)
Or
synaptic vesicle membrane ; GO:0030672 | IDA
integral component of synaptic vesicle membrane ; GO:0030285 | IC from GO:0030672
anchored component of synaptic vesicle membrane ; GO:0098993 | IC from GO:0030672
(NB: The IC doesn’t show the full picture because the membrane domains/anchors are author knowledge so often curated as a NAS/TAS which can’t be included in the with statement for an IC annotation).
- On call: Alice, David H., Elena, Giulia, Kimberly, Melanie, Rebecca, Ruth, Sabrina, Stacia
Annotation Consistency Exercises
- PomBase is up in two weeks - Val was notified
- August is SGD - okay with them, but we need to make sure we'll have enough people around in August to make this worthwhile. So far, we're okay, but will continue to check on the next two calls.
- September is dictyBase - Petra was contacted
- October is RGD - Stan was contacted
- November is Zfin - okay with Sabrina
Protein Binding Documentation
- Following up on the protein binding documentation discussion, we reviewed the draft of new guidelines/explanation of how the GOC approaches annotations to protein binding (GO:0005515) and its children.
- The new documentation seems okay with everyone, so we will go ahead and update the section on binding guidelines on the website.
- Ruth also suggested we review how groups are annotating to protein complex terms using the IPI evidence code, specifically to see how curators populate the With/From field, and make sure we are being consistent here.
- For protein complex annotations, does the With/From field list every member of the complex, pipe-separated, or only, for example, the tagged member of the complex used to identify multiple other members?
- We will need to survey groups about this and then take it from there.
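To make the two candidate conventions concrete before surveying groups, here is a small Python sketch of how the With/From value would differ for the same complex. The function names and the `DB:...` identifiers are placeholders invented for illustration; they are not real accessions, and neither convention is presented here as agreed practice.

```python
from typing import List


def with_from_all_members(members: List[str]) -> str:
    """Convention A: list every member of the complex, pipe-separated."""
    return "|".join(members)


def with_from_tagged_only(tagged_member: str) -> str:
    """Convention B: list only the tagged member used to identify the other members."""
    return tagged_member


# Placeholder identifiers for illustration only.
complex_members = ["DB:tagged", "DB:partner1", "DB:partner2"]

print(with_from_all_members(complex_members))
print(with_from_tagged_only(complex_members[0]))
```

Comparing the two strings for the same experiment shows what a consistency check across groups would need to look for: the number of identifiers in the field relative to the number of complex members reported in the paper.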
Protein Binding Documentation - IntAct Protein Binding Annotations
- Melanie followed up with Sandra Orchard to get more information on the criteria by which IntAct's protein-protein interactions are incorporated into the GO.
- IntAct uses a scoring system that requires high-confidence experimental data to allow export.
- We will add the explanation of this pipeline to the GO website's annotation FAQs
Questions about Membrane Cellular Component Annotations
- Summary from Rebecca:
- The issue is how to annotate membrane localization GO terms when part of the evidence is experimental and part comes from author/curator knowledge or inference from domain/sequence information.
- For example, it is tricky to annotate (using a single GO evidence code) to terms such as:
integral component of synaptic vesicle membrane ; GO:0030285
anchored component of synaptic vesicle membrane ; GO:0098993
- In these cases, often ‘synaptic vesicle’ localization is shown experimentally, and the protein is inferred to be integral to the membrane based on the presence of TM domains.
- We discussed the following options for handling these:
1. Allowing IDA for the full ‘integral to synaptic vesicle membrane’ term.
Pros: Annotations can be transferred easily between species by ISS.
Cons: Not traceable that the assay only shows SV localization, and the rest comes from author/curator knowledge.
2. Extending IC annotations to allow IEA/NAS/TAS-supported annotations in the ‘with’ field.
Pros: All supporting evidence for the annotation can be captured.
Cons: IC annotations can’t be transferred between species easily.
3. Requesting and using a new ECO code (possibly under ‘combinatorial evidence ; ECO_0000212’).
Pros: It would more accurately capture multiple evidence.
Cons: ECO codes aren’t supported in GAF files. For groups using GAF (rather than GPAD) format, the new evidence code would have to be mapped to an existing GO evidence code (IC).
4. Use LEGO annotation.
Pros: Individual evidence is captured to build up the bigger picture.
Cons: Not everyone is using LEGO yet, and information would currently be lost when annotations are transferred into GAF format.
- The consensus on the call was to further investigate Option 2 (expanding IC) and Option 3 (using a new ECO code for combined evidence).
- Proposal 1: Proposal for a new ECO code:
- The new ECO code would most likely be under ECO_0000212 or ECO_0000244:
combinatorial evidence ; ECO_0000212
--[isa]combinatorial evidence used in manual assertion ; ECO_0000244
combinatorial evidence ; ECO_0000212
A key aspect of this type of evidence is that two or more pieces of information are combined to generate an emergent type of evidence not possible with the constituent pieces of evidence alone. A combinatorial analysis typically involves incorporation of different types of evidence.
http://purl.obolibrary.org/obo/ECO_0000212
combinatorial evidence used in manual assertion ; ECO_0000244
Combinatorial analyses could include experimental or computational results. Examples include: (i) large-scale experiment such as a genome-wide two-hybrid or genome-wide synthetic interactions; (ii) integration of large-scale data sets of various types; and (iii) text-based-computation, e.g. text-mining. For simple sequence comparisons, one should use the sequence similarity analysis evidence type. For microarray results alone, expression pattern analysis is appropriate; whereas, large-scale computational analysis should be used when microarray results are combined with the results of other types of large-scale experiments.
http://purl.obolibrary.org/obo/ECO_0000244
- Proposal 2: Expanding the use of IC
- On the call, it was also proposed that the IC evidence code be expanded to allow non-experimental/ISS annotations, including TAS, NAS and IEA, as supporting evidence.
- Further Points that Need Discussion:
1. If we expand IC to allow TAS, NAS and IEA in the supporting ‘with’ field, does this mean a new ECO code isn’t required?
2. If we expand IC to allow TAS, NAS and IEA in the supporting ‘with’ field, are we diverging the use of IC too much?
3. If we expand IC to allow IEA in the supporting ‘with’ field, we would likely need a new GO_REF to add these under.
4. We need to ensure that a new evidence code isn’t just duplicating ‘Inferred from curator/IC’. How is the new code distinct from IC (especially IC with and GO_REF36/109)? Based on the proposal below, it could be a subtype of IC.
5. Should a new ECO code replace combinatorial IC annotations?
6. If we decide to use an ECO code, do we need a new ECO code entirely (see proposal below), or could we use ECO_0000212 or ECO_0000244 as they are?
7. Should a new ECO code be limited so that at least one piece of evidence needs to be experimental? Or are two or more electronic/author-statement annotations sufficient (e.g. IEA+IEA, IEA+ISS)?
8. Should we allow the new ECO code to be transferred between species by ISS? Note that this may result in IEA/NAS/TAS/ISS annotations essentially being transferred by ISS.
9. Will the new ECO code map to IC in GAF files, and will the ‘with’ evidence be maintained?
10. What reference would the annotations using the new ECO code be attributed to?
- Proposals for a new ECO code (wording TBD):
combinatorial evidence ; ECO_0000212
--[isa]combinatorial evidence used in manual assertion ; ECO_0000244
----[isa]combinatorial evidence used in curator manual assertion ; ECO:NEW

curator inference ; ECO_0000205
--[isa]curator inference used in manual assertion ; ECO_0000305 (IC)
----[isa]combinatorial evidence used in curator manual assertion ; ECO:NEW
- The ECO:NEW could be retrofitted to any IC annotations using GO_REF_36, and to any IC annotations with multiple GO IDs in the with field.
- The ECO:NEW would map up to IC in GAF files.
- The ECO:NEW would accept experimental, NAS, TAS and IEA evidence. This means IC would have to be expanded similarly.
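The "map up to IC" behaviour can be sketched as a walk up the is_a links until a term with a GO evidence code is reached. This is a minimal Python illustration, assuming the second proposed placement (ECO:NEW under ECO_0000305); `ECO:NEW` is the placeholder used in these minutes rather than an assigned identifier, the function and dictionary names are hypothetical, and only the ECO_0000305 to IC correspondence is taken from the text above.

```python
from typing import Dict, Optional

# is_a parent links from the proposed hierarchy under 'curator inference'.
IS_A: Dict[str, str] = {
    "ECO:NEW": "ECO_0000305",      # placeholder term from the proposal
    "ECO_0000305": "ECO_0000205",
}

# ECO terms that have a corresponding GO evidence code (only IC is given in the minutes).
GO_CODE: Dict[str, str] = {"ECO_0000305": "IC"}


def gaf_evidence_code(eco_id: str) -> Optional[str]:
    """Return the GO evidence code of the nearest term at or above eco_id, if any."""
    current: Optional[str] = eco_id
    while current is not None:
        if current in GO_CODE:
            return GO_CODE[current]
        # Move to the parent term; None ends the walk at the top of the known hierarchy.
        current = IS_A.get(current)
    return None


print(gaf_evidence_code("ECO:NEW"))
```

A lookup like this only answers which code would appear in a GAF file; whether the supporting 'with' evidence survives the conversion is a separate question that the sketch does not address.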