2016 Los Angeles GOC Meeting Agenda: Difference between revisions

Revision as of 18:05, 3 November 2016

Day 1

Overview/Plan for Upcoming Five Years

  • GO PIs presentation
    • Update on AGR

Informatics and Infrastructure 5 year plan

Update on various changes (Chris)

Proposal

  • Switch to global monthly releases (cjm)
    • Still provide daily snapshots

Lunch 11:45 - 1:00pm

Updates

  • AmiGO and web site action items (Seth)
    • AI: Mechanism to remove redundant TAS/ISS/IEA etc annotations that are covered by experimental annotations
    • AI: GOC to extract this data and display annual stats on web page
      • See http://amigo.geneontology.org/amigo/base_statistics
      • AI: Include stats for % genome annotated
      • AI: Include GOC extract ontology stats
      • IEA annotations should be broken out by reference (this is the method used), and converted to a provider, e.g. GO_REF:0000019 is Ensembl Compara.
    • AI: Val give presentation on term matrix
  • Update on UniProt GCRP sets (Maria)
  • Update on gpi specifications and uses (please list specific items) (Kimberly, Chris) - 10-15 minutes
  • Ontology Group Update (DavidH)
    • Special Projects
      • Cilia
      • Autophagy
      • Apoptosis
      • Plant Enzymes
      • Synapse (DOS/PDT)
    • GO help report
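The item above about breaking out IEA annotations by reference could be prototyped roughly as follows. This is a minimal sketch, not an existing GOC script: it assumes tab-separated GAF 2.x lines (reference in column 6, evidence code in column 7), and the function name and the `GO_REF_PROVIDERS` table are hypothetical. Only the GO_REF:0000019 entry comes from the agenda.

```python
from collections import Counter

# Hypothetical lookup from GO_REF identifier to provider label.
# Only this entry is taken from the agenda; others would need to be added.
GO_REF_PROVIDERS = {
    "GO_REF:0000019": "Ensembl Compara",
}


def iea_counts_by_reference(gaf_lines):
    """Count IEA annotations per reference, labelled by provider when known."""
    counts = Counter()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GAF column 7 (index 6) holds the evidence code.
        if len(cols) < 7 or cols[6] != "IEA":
            continue
        # GAF column 6 (index 5) may hold several pipe-separated references;
        # each one is counted.
        for ref in cols[5].split("|"):
            counts[GO_REF_PROVIDERS.get(ref, ref)] += 1
    return counts
```

References without an entry in the table are reported under their raw identifier, so unmapped methods remain visible in the statistics.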

Breakout session to finish proposal development

The following are concurrent

  • Flowchart guidelines for transcription factor annotations [10 min Rachael/Ruth/Barbara] presentation. To improve consistency, the UCL team has created an annotation flowchart, which is being circulated to GOC members.
  • PI UniProt discussion
  • GAF/GPAD inference from LEGO models (Jim and David OS, Chris, Kimberly, David Hill, ...)
    • Introduction to inferring annotations from LEGO: Extended Gene Product to GO term relations; Reasoning across causal chains.
      • Jim Balhoff: Inference using Blazegraph & RDFox
      • DOS: Templates, design patterns and inference.
  • Community annotation web presence (Val, Ruth)

Reports from breakout groups

Day 2

Annotation Issues - LEGO Annotations

Aligning Conventional and LEGO Annotations

  • A proposal to make Conventional Annotation align better with LEGO modelling (F-P linking) (Val)
https://github.com/geneontology/go-ontology/issues/12739#issuecomment-254623691

Evidence codes in Noctua

  • How are we going to handle ECO codes in Noctua? Currently only a limited number of codes fall under 'used in manual assertion'. If we use codes that are not specific to the manual assertion part of the ontology, then they map to EXP. Are we going to request the entire set of codes that we think we might want to use, or are we going to have an automated way to map to the correct code?

Example: http://noctua.berkeleybop.org/editor/graph/gomodel:5745387b00001874

Generating conventional annotations from Noctua models

  • MGI's experience roundtripping with Noctua Models (DavidH)
  • Are we going to allow Noctua to generate conventional annotations to the root nodes of the ontology?
    • This would be useful for contextual annotations that would otherwise be to root nodes.
    • However some groups block these kinds of annotations because in the past, these annotations were used to keep track of genes about which we had no information.
    • Note that the evidence code for a root node annotation in Noctua would/could be different in that the curator might assert that a gene product has some molecular function due to the observation that, when mutated, there is a phenotypic outcome, e.g. apoptosis execution fails.
    • This is a different statement from no biological data (ND) in which there is no information at all to assert a role in any biological process.
  • Are some conventional annotation rules inappropriate for Noctua annotation?
    • For a molecular function occurring in a cellular location, isn't IEP a more appropriate evidence code? IDA would mean that the function was assayed in situ. https://github.com/geneontology/go-annotation/issues/1395
    • Since binding is a part of many molecular functions, should we allow evidence codes other than IPI for binding (eg TAS)?

Regulation relations

Regulation and causal relations are central to LEGO annotation and to inference based on LEGO models, but definitions and guidelines still need work to ensure consistency and clarity. DOS: I would like to present progress on the development of the relevant relations along with a proposal for how to improve them. This would probably work best as a collaborative presentation with LEGO annotators where we can show application to LEGO models.

Annotation Issues - Conventional Annotations

Modified Protein Binding

  • Modified protein binding: GO terms & annotations are very inconsistent. (DavidH to present Paola's proposal)
    • Recent github issues:
glycoprotein binding: https://github.com/geneontology/go-ontology/issues/12580#issuecomment-240782020
ubiquitinated protein binding https://github.com/geneontology/go-ontology/issues/12582#issuecomment-240452320

Protein Family Terms in the Ontology

  • Protein families in terms (DavidH)
    • Currently the inclusion of protein family information in term names is leading to inconsistent annotation.
      • For now, the ontology editors have not been adding terms that specifically refer to protein families with the exception of signaling pathways. Should we make this a rule? If so, how will we capture the detail desired by annotators and how will we make this backward compatible?
https://github.com/geneontology/go-ontology/issues/12440

Multiple Evidences to Support an Inference

  • How are people capturing data where both direct assay AND protein motif/domain/sequence evidence need to be used by the curator to provide the annotation? [15 min Ruth, started by Rebecca] presentation. A system needs to be in place to enable the more specific annotations to be created for orthologous proteins (which cannot be done across all species with the IC evidence code).
    • eg transmembrane domain used as evidence to create the annotation 'integral to membrane' with IEA evidence; immunofluorescence localises protein to 'plasma membrane' (annotated with IDA evidence), ideal annotation to be created 'integral to plasma membrane'
    • 3 obvious options (any others?)
      • new evidence code IDD 'inferred by direct assay AND protein domain(sequence/motif?)' (would probably also want IMD, IGD, IED)
        • Note that ECO has a combinatorial evidence code that could possibly be used as the parent for new GO combinatorial codes:
          • combinatorial evidence used in manual assertion - ECO:0000244
      • no new evidence code required, as this is implied by the 'inferred' aspect of the evidence code as well as 'author intent'
      • Create a GOC pipeline that combines the IDA annotation (eg plasma membrane) with the IEA information (eg integral to membrane) to generate the more specific CC annotation (eg integral to plasma membrane).
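
The pipeline option above might look something like the sketch below. It is illustrative only: the rule table, function name, and tuple layout are assumptions, the single rule is the example given in this agenda item, and the evidence code to attach to the proposed annotation is deliberately left open because that is the question under discussion.

```python
# Hypothetical rule table:
# (term supported by IDA, term supported by IEA) -> more specific term.
# The only rule listed is the example from the agenda.
COMBINATION_RULES = {
    ("plasma membrane", "integral to membrane"): "integral to plasma membrane",
}


def propose_specific_annotations(annotations):
    """Propose more specific CC annotations from paired IDA and IEA evidence.

    annotations: iterable of (gene_product, term, evidence_code) tuples.
    Returns a sorted list of (gene_product, proposed_term) pairs.
    """
    ida_terms = {}
    iea_terms = {}
    # Group each gene product's terms by the evidence code that supports them.
    for gene, term, evidence in annotations:
        if evidence == "IDA":
            ida_terms.setdefault(gene, set()).add(term)
        elif evidence == "IEA":
            iea_terms.setdefault(gene, set()).add(term)

    proposals = set()
    for (ida_term, iea_term), specific in COMBINATION_RULES.items():
        for gene, terms in ida_terms.items():
            # A proposal needs both halves of the rule for the same gene product.
            if ida_term in terms and iea_term in iea_terms.get(gene, set()):
                proposals.add((gene, specific))
    return sorted(proposals)
```

A gene product with only one of the two supporting annotations yields no proposal, so the output could be reviewed by curators before anything is loaded.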


Consistent Classification of Signaling Pathway Terms

  • Conventions for signalling pathway terms
    • Currently you can request signalling pathway terms along multiple axes of classification including:
      • signalling module (MAPK cascade, GTPase etc)
      • process regulated
      • target TFs
      • ligand/pheromone activating pathway
      • condition activating pathway (in response to hydrogen peroxide and other oxidants for oxidative stress pathway)

This results in an almost infinite number of ways to describe some pathways.

https://github.com/geneontology/go-ontology/issues/12701

Annotations from High Throughput Experiments

  • Annotations from high-throughput experiments (Pascale, Ruth, David Hill, Kimberly)
  • AI: discuss proposal for defining HTP data (Pascale, 20 min)
    • How do we decide when to make annotations from high-throughput experiments?
    • If we decide that annotations from high-throughput experiments should be removed, what are the procedures (all annotations, some annotations)?
    • Do we want new evidence codes to indicate that the annotation was inferred from a high-throughput experiment?

Qualifiers that describe relationship of gene product to a biological process

These are used in LEGO as well

    • involved_in = causally_upstream_of_or_within (default?)
    • causally_upstream_of
    • part_of

Annotation Metrics

Estimated time: 1 hour

  • What are the optimal metrics to assess progress in GO annotation?
    • Number of annotations
    • Number of references
      • Recall ZFIN's 'paper complexity' measure as a way of normalizing for different paper content (Doug mentioned in Geneva)
    • Revised annotations, e.g. updating to a new term
    • Removing annotations, e.g. improving knowledge about how a gene product affects a downstream process
    • Adding appropriate contextual information to existing annotations
    • Percentage of genome annotated vs percentage of genome with annotatable information?
  • How does LEGO modeling change our assessment of a curator's contributions?
  • Multiple funding bodies (Ruth)
  • Distinguishing annotations that are created automatically, e.g. inference pipelines (Tony)
  • Individual curator attribution via ORCID iDs: it is important to establish whether this is wanted and, if so, at what level of information: annotation by annotation, or just as a summation of contribution?

Conference Calls and Communication

Estimated time: 30 minutes

  • Discuss different options for reducing the number of conference calls, while still facilitating effective communication between the different GO groups, e.g. annotators, ontology editors, software team
    • Consolidate all annotation calls (Monday LEGO, Tuesday Annotation, Tuesday PAINT) into one Tuesday annotation call, frequency TBD
    • Consolidate LEGO, Annotation, PAINT, and Ontology Development calls into one weekly GO call
  • Discussion on the design of new SOPs for mechanisms of communication
    • What is the best mechanism to alert annotation groups of changes to the ontology that will affect annotations? We have started a table of contacts, but is this how annotation groups would like to proceed?
    • Review of github repositories, what to record where, who is processing/clearing tickets, etc.
  • Discussion on what it means to be a member of the Gene Ontology Consortium, not just the NHGRI grant.
    • Agreed to standards, which ones?