2016 Los Angeles GOC Meeting Agenda

Minutes

Day 1

9am Welcome, schedule and logistics (Paul)

https://docs.google.com/document/d/1MAnnOfs-e2LY9MnqdCZscalbxbNUDSJ9pbMZ-f2WS9U/edit?usp=sharing

https://drive.google.com/drive/folders/0B8kRPmmvPJU3dFhhcWhTSmlUcDA

Link to folder on Google Drive: bit.ly/geneont-drive

Overview/Plan for Upcoming Five Years

  • GO PIs presentation (Paul T)

Coffee break 10am-ish

Informatics and Infrastructure 5 year plan (Chris, LBNL)

Update on various changes

Proposal

  • Switch to global monthly releases, but still provide daily snapshots

Lunch 11:45 - 1:00pm (on our own, must vacate room for seminar happening then)

Updates

AmiGO and web site action items (Seth)

Update on UniProt GCRP sets (Maria)

Update on gpi specifications and uses (Kimberly, Chris, 10-15 min)

  • items?

Ontology Group Update (DavidH)

  • Special Projects
    • Cilia
    • Autophagy
    • Apoptosis
    • Plant Enzymes
    • Synapse (DOS/PDT)
  • GO help report

Alliance of Genome Resources Update (Judy, Paul S)

Coffee break 3pm

Breakout sessions to finalize development of various proposals

The following are concurrent

  • Flowchart guidelines for transcription factor annotations [10 min presentation, Rachael/Ruth/Barbara]. To improve consistency, the UCL team has created an annotation flowchart, which is being circulated to GOC members.
  • PI + UniProt discussion
  • Community annotation web presence (Val, Ruth)

Day 2-3

Annotation Issues - Conventional Annotations

Modified Protein Binding

  • Modified protein binding: GO terms & annotations are very inconsistent. (DavidH to present Paola's proposal)
    • Recent GitHub issues:
      • glycoprotein binding: https://github.com/geneontology/go-ontology/issues/12580#issuecomment-240782020
      • ubiquitinated protein binding: https://github.com/geneontology/go-ontology/issues/12582#issuecomment-240452320

Protein Family Terms in the Ontology

  • Protein families in terms (DavidH)
    • Currently the inclusion of protein family information in term names is leading to inconsistent annotation.
      • For now, the ontology editors have not been adding terms that specifically refer to protein families with the exception of signaling pathways. Should we make this a rule? If so, how will we capture the detail desired by annotators and how will we make this backward compatible?
https://github.com/geneontology/go-ontology/issues/12440

Multiple Evidences to Support an Inference

  • How are people capturing data where the curator needs both a direct assay AND protein motif/domain/sequence information to make the annotation? [15 min presentation, Ruth; started by Rebecca] A system needs to be in place to enable the more specific annotations to be created for orthologous proteins (which cannot be done across all species with the IC evidence code).
    • e.g. a transmembrane domain is used as evidence to create the annotation 'integral to membrane' with IEA evidence; immunofluorescence localises the protein to 'plasma membrane' (annotated with IDA evidence); the ideal annotation to create would be 'integral to plasma membrane'
    • 3 obvious options (any others?)
      • new evidence code IDD 'inferred by direct assay AND protein domain (sequence/motif?)' (would probably also want IMD, IGD, IED)
        • Note that ECO has a combinatorial evidence code that could possibly be used as the parent for new GO combinatorial codes:
          • combinatorial evidence used in manual assertion - ECO:0000244
      • no new evidence code is required, as this is implied by the 'inferred' aspect of the evidence code as well as by 'author intent'
      • create a GOC pipeline that combines the IDA annotation (e.g. plasma membrane) with the IEA information (e.g. integral to membrane) to create the more specific CC annotation (e.g. integral to plasma membrane).
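
The pipeline option can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the rule table, the function name `infer_combined`, the tuple layout of an annotation, and the choice of ECO:0000244 (the combinatorial ECO class noted above) as the evidence for the derived annotation. None of it is an agreed GOC design.

```python
from collections import defaultdict

# Hypothetical rule table: (term supported by IDA, term supported by IEA)
# -> the more specific term the pair jointly supports.
COMBINATION_RULES = {
    ("plasma membrane", "integral to membrane"): "integral to plasma membrane",
}


def infer_combined(annotations, rules=COMBINATION_RULES, evidence="ECO:0000244"):
    """Derive more specific annotations from IDA + IEA pairs.

    annotations: iterable of (gene_product, term, evidence_code) tuples.
    Returns a list of (gene_product, combined_term, evidence) tuples,
    ordered by gene product identifier.
    """
    # Collect, per gene product, the terms supported by each evidence code.
    by_gene = defaultdict(lambda: {"IDA": set(), "IEA": set()})
    for gene, term, code in annotations:
        if code in ("IDA", "IEA"):
            by_gene[gene][code].add(term)

    inferred = []
    for gene in sorted(by_gene):
        terms = by_gene[gene]
        for (ida_term, iea_term), combined in rules.items():
            # Both halves of the rule must be present for the same gene product.
            if ida_term in terms["IDA"] and iea_term in terms["IEA"]:
                inferred.append((gene, combined, evidence))
    return inferred


example = [
    ("P1", "plasma membrane", "IDA"),
    ("P1", "integral to membrane", "IEA"),
    ("P2", "plasma membrane", "IDA"),  # no IEA partner, so nothing is inferred
]
print(infer_combined(example))
```

In this sketch only P1 receives the derived annotation, because P2 lacks the IEA half of the rule. Which evidence code the derived annotation should carry is exactly the open question in the options above, so it is left as a parameter.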


Consistent Classification of Signaling Pathway Terms

  • Conventions for signalling pathway terms
    • Currently you can request signalling pathway terms along multiple axes of classification including:
      • signalling module (MAPK cascade, GTPase etc)
      • process regulated
      • target TFs
      • ligand/pheromone activating pathway
      • condition activating pathway (in response to hydrogen peroxide and other oxidants for oxidative stress pathway)

This results in an almost infinite number of ways to describe some pathways.

https://github.com/geneontology/go-ontology/issues/12701
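
As a rough illustration of the combinatorics, the short Python sketch below counts how many different sets of axes a single pathway term could be named along, given the five distinct axes listed above. The axis labels are paraphrased from the list, and the helper name `axis_combinations` is hypothetical. The count covers the choice of axes only; every possible value on each axis (each module, ligand, condition and so on) multiplies it further.

```python
from itertools import combinations

# The five distinct axes of classification from the list above (labels paraphrased).
AXES = [
    "signalling module",
    "process regulated",
    "target TFs",
    "activating ligand/pheromone",
    "activating condition",
]


def axis_combinations(axes):
    """Return every non-empty subset of axes a term name could draw on."""
    return [
        combo
        for size in range(1, len(axes) + 1)
        for combo in combinations(axes, size)
    ]


print(len(axis_combinations(AXES)))  # 2**5 - 1 = 31 axis combinations
```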

Annotations from High Throughput Experiments

  • Annotations from high-throughput experiments (Pascale, Ruth, David Hill, Kimberly)
  • AI: discuss proposal for defining HTP data (Pascale, 20 min)
    • How do we decide when to make annotations from high-throughput experiments?
    • If we decide that annotations from high-throughput experiments should be removed, what are the procedures (all annotations, some annotations)?
    • Do we want new evidence codes to indicate that the annotation was inferred from a high-throughput experiment?

Qualifiers that describe relationship of gene product (activity) to a biological process

DOS; Pascale.

As a result of work on converting LEGO models to GPAD/GAF, we now have a wider set of relations linking gene products to biological processes (GAF qualifiers/GPAD column 2). These are derived from the broader set of causal relations developed for LEGO. We need to discuss how these should be applied to conventional annotations. This builds on a proposal, agreed in DC but not yet implemented, to distinguish annotations where it is clear that a gene product activity is part of a process from cases where it is not clear whether the activity is part of the process or causally upstream of it.

Annotation Issues - LEGO Annotations

Aligning Conventional and LEGO Annotations

  • A proposal to make Conventional Annotation align better with LEGO modelling (F-P linking) (Val)
https://github.com/geneontology/go-ontology/issues/12739#issuecomment-254623691

Evidence codes in Noctua

  • How are we going to handle ECO codes in Noctua? Currently only a limited number of codes fall under 'used in manual assertion'. If we use codes that are not specific to the manual assertion part of the ontology, then they map to EXP. Are we going to request the entire set of codes that we think we might want to use, or are we going to have an automated way to map to the correct code?

Example: http://noctua.berkeleybop.org/editor/graph/gomodel:5745387b00001874
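
One way to picture the "automated mapping" option is the Python sketch below: look the ECO class up in a table of GO evidence codes, otherwise walk up its parents until a mapped class is found, and fall back to EXP only when nothing matches. The table is a partial illustration, the `ECO:TOY_*` identifiers and their parent links are invented placeholders (not real ECO classes), and the single-parent walk is a simplifying assumption. This is not how Noctua currently behaves.

```python
# Partial, illustrative table of ECO 'used in manual assertion' classes
# and the GO evidence codes they correspond to.
ECO_TO_GO = {
    "ECO:0000269": "EXP",
    "ECO:0000314": "IDA",
    "ECO:0000353": "IPI",
    "ECO:0000315": "IMP",
}

# Hypothetical is_a links, invented for this example only.
TOY_PARENTS = {
    "ECO:TOY_0001": "ECO:TOY_0002",
    "ECO:TOY_0002": "ECO:0000314",
}


def to_go_code(eco_id, parents=TOY_PARENTS, default="EXP"):
    """Map an ECO class to a GO evidence code via its nearest mapped ancestor."""
    seen = set()
    current = eco_id
    # Walk upwards; the 'seen' set guards against accidental cycles.
    while current is not None and current not in seen:
        if current in ECO_TO_GO:
            return ECO_TO_GO[current]
        seen.add(current)
        current = parents.get(current)
    # No mapped ancestor: fall back to the generic code.
    return default


print(to_go_code("ECO:0000314"))   # directly mapped class
print(to_go_code("ECO:TOY_0001"))  # resolved through its toy ancestors
print(to_go_code("ECO:UNMAPPED"))  # falls back to the default
```

The alternative raised above, requesting every needed class under 'used in manual assertion', corresponds to growing the lookup table until the ancestor walk is never needed.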

Generating conventional annotations from Noctua models

  • GAF/GPAD inference from LEGO models (Jim and David OS, Chris, Kimberly, David Hill, ...)
    • Introduction to inferring annotations from LEGO: Extended Gene Product to GO term relations; Reasoning across causal chains.
      • Jim Balhoff: Inference using Blazegraph & RDFox
      • DOS: Templates, design patterns and inference.
  • MGI's experience roundtripping with Noctua Models (DavidH)
  • Are we going to allow Noctua to generate conventional annotations to the root nodes of the ontology?
    • This would be useful for annotations that carry contextual information but would otherwise be to root nodes.
    • However, some groups block these kinds of annotations because, in the past, they were used to keep track of genes about which we had no information.
    • Note that the evidence code for a root node annotation in Noctua would/could be different in that the curator might assert that a gene product has some molecular function due to the observation that, when mutated, there is a phenotypic outcome, e.g. apoptosis execution fails.
    • This is a different statement from 'no biological data' (ND), in which there is no information at all to assert a role in any biological process.
  • Are some conventional annotation rules inappropriate for Noctua annotation?
    • For a molecular function occurring in a cellular location, isn't IEP a more appropriate evidence code? IDA would mean that the function was assayed in situ. https://github.com/geneontology/go-annotation/issues/1395
    • Since binding is a part of many molecular functions, should we allow evidence codes other than IPI for binding (e.g. TAS)?

Regulation relations

Regulation and causal relations are central to LEGO annotation and to inference based on LEGO models, but definitions and guidelines still need work to ensure consistency and clarity. DOS: I would like to present progress on the development of the relevant relations along with a proposal for how to improve them. This would probably work best as a collaborative presentation with LEGO annotators where we can show application to LEGO models.


Conference Calls and Communication

Estimated time: 30 minutes

  • Discuss different options for reducing the number of conference calls, while still facilitating effective communication between the different GO groups, e.g. annotators, ontology editors, software team
    • Consolidate all annotation calls (Monday LEGO, Tuesday Annotation, Tuesday PAINT) into one Tuesday annotation call, frequency TBD
    • Consolidate LEGO, Annotation, PAINT, and Ontology Development calls into one weekly GO call
  • Discussion on the design of new SOPs for mechanisms of communication
    • What is the best mechanism to alert annotation groups of changes to the ontology that will affect annotations? We have started a table of contacts, but is this how annotation groups would like to proceed?
    • Review of github repositories, what to record where, who is processing/clearing tickets, etc.
  • Discussion on what it means to be a member of the Gene Ontology Consortium, not just the NHGRI grant.
    • Agreement to standards: which ones?
  • Decide where and when to hold next GOC meeting (also whether to include SAB next time)

Wrap up, action items

Time permitting, or post-meeting breakout

Annotation Metrics

Estimated time: 1 hour

  • What are the optimal metrics to assess progress in GO annotation?
    • Number of annotations
    • Number of references
      • Recall ZFIN's 'paper complexity' measure as a way of normalizing for different paper content (Doug mentioned in Geneva)
    • Revised annotations, e.g. updating to a new term
    • Removing annotations, e.g. improving knowledge about how a gene product affects a downstream process
    • Adding appropriate contextual information to existing annotations
    • Percentage of genome annotated vs percentage of genome with annotatable information?
  • How does LEGO modeling change our assessment of a curator's contributions?
  • Multiple funding bodies (Ruth)
  • Distinguishing annotations that are created automatically, e.g. inference pipelines (Tony)
  • Recognizing individual curators' contributions via ORCID iDs.
    • Determine at what level attribution occurs
    • Inclusion of funding source