LEGO May 9, 2016

Software Updates

Evidence Cloning

  1. DavidH has been successfully using the evidence cloning feature.
  2. One nice enhancement to the evidence cloning feature would be to list a brief bibliography in the table next to the PMID, as it is unlikely that curators will be able to remember, just from the PMID, what each paper is. Can we use PubMed or Europe PMC web services for something like that? (Kimberly)
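A minimal sketch of how such a lookup might work, assuming the Europe PMC REST search endpoint and its JSON result fields (`authorString`, `title`, `journalTitle`, `pubYear`). The function names are hypothetical and are not part of Noctua; the endpoint and field names should be checked against the current Europe PMC documentation.

```python
import json
import urllib.parse
import urllib.request

# Europe PMC REST search endpoint (assumed; verify against the service docs).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query_url(pmid: str) -> str:
    """Build a search URL that looks up a single PubMed record by PMID."""
    params = {"query": f"EXT_ID:{pmid} AND SRC:MED", "format": "json"}
    return EUROPE_PMC_SEARCH + "?" + urllib.parse.urlencode(params)


def brief_citation(record: dict) -> str:
    """Format one result record as 'First author et al. (year) Title. Journal'."""
    raw_authors = record.get("authorString", "").rstrip(".")
    authors = [a.strip() for a in raw_authors.split(",") if a.strip()]
    if not authors:
        lead = "Unknown authors"
    elif len(authors) == 1:
        lead = authors[0]
    else:
        lead = authors[0] + " et al."
    year = record.get("pubYear", "n.d.")
    title = record.get("title", "").rstrip(".")
    journal = record.get("journalTitle", "")
    return f"{lead} ({year}) {title}. {journal}".strip()


def fetch_brief_citation(pmid: str) -> str:
    """Fetch a record over the network and format it; returns '' if not found."""
    with urllib.request.urlopen(build_query_url(pmid), timeout=10) as resp:
        payload = json.load(resp)
    results = payload.get("resultList", {}).get("result", [])
    return brief_citation(results[0]) if results else ""
```

Keeping the formatter separate from the network call means the table display could reuse it with whichever service (PubMed or Europe PMC) is eventually chosen.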

Form Interface for Curation

  • Revisit this? How high a priority is this for wider adoption of the tool?


The Jenkins job is running again.

Still needed before we can import (into MGI):


  1. Stable purl from which to retrieve files. We will use GAF for now, but we have a ticket in to be able to use GPAD. (GH Noctua issue #200 -
  2. Folded annotations for individuals annotated by the regulates relation. (GH Noctua issue #189 -
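As a rough illustration of what "folding" means here, the sketch below turns a direct regulates edge between an activity and a process into a separate annotation to the corresponding regulation term. The `Edge` structure and the hand-written lookup table are assumptions made for illustration only; in practice the regulation term would be found from the ontology rather than from a table like this.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    gene_product: str  # gene product enabling the activity
    activity: str      # molecular function label
    relation: str      # e.g. "positively_regulates"
    process: str       # biological process label


# Hypothetical lookup: (relation, process) -> regulation term label.
REGULATION_TERMS = {
    ("regulates", "Wnt protein secretion"):
        "regulation of Wnt protein secretion",
    ("positively_regulates", "Wnt protein secretion"):
        "positive regulation of Wnt protein secretion",
}


def fold(edge: Edge) -> Optional[Tuple[str, str]]:
    """Return a (gene product, BP term) annotation for a regulates edge.

    Returns None when the edge is not a regulates relation or when no
    regulation term is known for the target process.
    """
    term = REGULATION_TERMS.get((edge.relation, edge.process))
    return (edge.gene_product, term) if term else None
```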


  1. Attribution to individual groups. For example 'MGI-Noctua'. (GH Noctua issue #84 -
  2. An automated QC check on annotations made in Noctua. For example, tagging annotations that don't have evidence. (
  3. Automated update of models to GitHub so that GAFs/GPADs are generated on a regular basis.
  4. Attribution of curator comments and evidence sentences to the appropriate reference. (
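The QC check on evidence could be sketched as below, assuming tab-delimited GAF 2.x input in which column 6 is the reference and column 7 is the evidence code. The function name and the reason strings are hypothetical; this is not an existing Noctua or Jenkins check.

```python
def rows_missing_evidence(gaf_lines):
    """Yield (line_number, reason) for annotation rows lacking evidence.

    Assumes GAF 2.x layout: column 6 = DB:Reference, column 7 = evidence code.
    Header/comment lines (starting with '!') and blank lines are skipped.
    """
    for number, line in enumerate(gaf_lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            yield number, "too few columns"
        elif not cols[6].strip():
            yield number, "no evidence code"
        elif not cols[5].strip():
            yield number, "no reference"
```

For example, `list(rows_missing_evidence(open(path)))` (with `path` pointing at a generated GAF) would give the rows to tag for curator review.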

Determining the Extent of Upstream/Downstream to Capture in GAF/GPAD

  • Continuing discussion from last week wrt capturing annotations downstream to reflect knowledge of the biological system.
  • For an illustration, see:
    • Fatty acylation of Wnt is required for its secretion
    • Currently, worm, fly, fish, mouse, and human acyltransferases of the porcupine family are annotated to Wnt protein secretion or some variant (direct or regulation) of Wnt signaling pathway. This reflects the biological context of the activity of these acyltransferases.
    • Additionally, Wntless protein family members (transporters, transporter chaperones?) that bind Wnt and shuttle it through the secretory pathway are annotated to a number of different Wnt-related BPs:
      • positive regulation of Wnt signaling pathway
      • positive regulation of Wnt protein secretion
      • regulation of Wnt signaling pathway by Wnt protein secretion
    • The current GAF/GPAD output only includes, as an annotation extension, the immediate downstream process which in the illustrative model is 'Golgi to plasma membrane transport'
  • Since the evidence that a protein has an activity, e.g. lipid transferase, may be different from the evidence that it affects downstream signaling and development pathways, it is necessary to create edges between the upstream activity and the downstream process in order to retrieve the desired annotations in the GAF.
  • This raises the issue of the qualifier proposal again, though. Do we want to move forward with using the expanded list of qualifiers in the GAF?
  • If yes, what are the options?
    • Only add new BP qualifiers, e.g. causally upstream of, in LEGO-generated GAF/GPAD?
    • Add a default qualifier to all BP annotations which could then be updated as curators review annotations and create new LEGO models?
    • Using existing evidence codes and ontology, try to come up with the best approximation for BP qualifiers for as many annotations as possible and then revise as needed? For example, IDA BP annotations to metabolic process terms may be good candidates for 'part of' qualifiers, while IMP annotations to development or behavioral terms could get a less granular qualifier, the equivalent of 'affects'.
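The third option could look something like the sketch below. The rule table covers only the two examples given above; the qualifier labels (`part_of`, `affects`), the term categories, and the `needs_review` default are placeholders, not agreed GAF values.

```python
# Hypothetical rule table: (evidence code, broad term category) -> qualifier.
# Only the two examples from the discussion are encoded here.
QUALIFIER_RULES = {
    ("IDA", "metabolic process"): "part_of",
    ("IMP", "developmental process"): "affects",
    ("IMP", "behavior"): "affects",
}


def approximate_bp_qualifier(evidence_code: str, term_category: str) -> str:
    """Return a first-pass BP qualifier, or a review flag when no rule applies."""
    return QUALIFIER_RULES.get((evidence_code, term_category), "needs_review")
```

Anything that falls through to the default would be left for curators to revise as they review annotations and build LEGO models.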


  • DavidH and Kimberly are using this week as a documentation week. By the end of the week, we will have a training manual for Noctua.
  • Stacia will work at SGD to create videos.

Models Discussion

cdc2 - Continuing Discussion from 2016-04-25


  • On call: Chris, Dan Keith, David OS, David H, Giulia, Helen, Kimberly, Seth, Stacia, Suzi

Software Updates

Evidence Cloning

  • Works well for cloning and also enables cut and paste
  • Feature request - listing additional bibliographic information for papers listed as evidence
    • Could possibly re-use code currently on AmiGO
    • AI: Kimberly will open a github ticket for this feature.

Form Interface

  • +1 from SGD curators
  • High priority, but not highest at this point.
  • AI: Make a github ticket for this feature.

Evidence Model

  • Will be testing this before Heiko leaves.
  • Publications will now be individuals, so will be able to add comments (evidence sentences, for example) to specific references.


  • Essential
  • Need stable URLs for retrieving files
  • Need folded annotations in GAF/GPAD in order to get, for example, a separate annotation to a regulation term when there is a direct regulates relation between an activity and a process
    • These inferences should be happening based on the new patterns from David OS??
    • But currently, the annotation appears as an AE
    • What about an immediate post-processing step that would generate these annotations as part of the Jenkins job since doing this as part of the Minerva pipeline could be computationally intensive?
    • Heiko currently generates the folded inferences, but these are not being generated for LEGO models
  • Desirable
    • Attribution to specific groups, GO_Noctua -> GO_Noctua_MGI
    • Automated error checking
    • Automated update of models to GitHub
      • Should happen every 24 hours
    • Associating evidence with specific references
      • How best to model this?
        • Sentences could be their own individuals
        • Sentences would be part_of a publication
        • Sentences would then support an assertion
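One way to picture the proposed evidence modeling (sentences as individuals that are part_of a publication and that support an assertion) is the data-structure sketch below. The class and field names are illustrative assumptions, not the Noctua or Minerva data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Publication:
    reference_id: str  # e.g. a PMID


@dataclass
class EvidenceSentence:
    text: str
    part_of: Publication  # the sentence is part_of exactly one publication


@dataclass
class Assertion:
    subject: str
    relation: str
    obj: str
    supported_by: List[EvidenceSentence] = field(default_factory=list)

    def references(self) -> List[str]:
        """Return the distinct references behind this assertion, in order."""
        seen: List[str] = []
        for sentence in self.supported_by:
            ref = sentence.part_of.reference_id
            if ref not in seen:
                seen.append(ref)
        return seen
```

With this shape, each evidence sentence stays attributed to its own reference even when several papers support the same assertion.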

Capturing Extent of Causally Upstream Of in a GAF/GPAD

  • Example using Wnt pathway
  • Wnts need to be lipid modified to be secreted
  • Wnt acyltransferases are all currently annotated to some flavor of Wnt signaling or Wnt protein secretion
  • To capture this in a LEGO model we need to make a direct connection between the acyltransferase and the Wnt signaling pathway
  • Distinguish between a logically redundant assertion with specific evidence vs a stronger assertion with specific evidence
    • The mom-1 relation to Wnt signaling is currently logically redundant but with specific evidence
  • Also look at Wntless and its annotations, currently in GAFs as some form of regulation of Wnt secretion or Wnt protein signaling
  • Implications for current ontology development
    • If you characterize signaling by ligand type, do you run into problems?
      • Need to be clear what we mean here: cell-cell signaling vs signal transduction. The latter is mostly defined by receptor. The former covers secretion, transport between cells, presentation to a receptor and signal transduction. It can potentially be defined by ligand. Doesn't it make sense to do so in cases where there is a specific pathway for processing of a ligand?
  • Current GAF generation only generates AE to immediate downstream step.
  • Additional AEs to downstream steps can only come from direct links between an activity and a process.
  • This leads to the question of how to generate these annotations in the GAF, and to the issue of expanding the qualifier column to make the association between the activity and the process clearer.
    • DOS: The 'qualifier' column = the GP : GO relationship. If we expand the range of these, then we explicitly broaden the meaning of annotation. Need to think through the implications of this for general use of GO annotations.
  • Contrast the translation upstream vs the acylation upstream - the former is not known to be Wnt-specific, but the latter is Wnt-specific
    • DOS: But note - *this* could be captured under a broader term for cell-cell signaling by Wnt - with parts for ligand processing, secretion, intercellular transport/presentation, & signal transduction. The advantage of this is that it makes it explicit what it means to be Wnt-specific. If we leave this to the LEGO model, then it comes down to an undocumented curator choice about when to place causally_upstream_of links. Are we confident of curator consistency on this?
    • DPH: It could, but we would need to be really careful. This seems straightforward on the surface, but the underlying biology might be more complex than first thought. The reason the signaling group decided to start and classify a signaling cascade based on the receptor was that they found examples of different types of ligands binding different types of receptors, not always named for the ligand. This could lead to confusion in annotation. To avoid this using the suggested strategy, it would be necessary to define the signal transduction in the 'cell-cell signaling process by X' as a signal transduction that starts with the reception of a signal enabled by X. This doesn't contradict any of the work already done. This should be an ontology discussion.
  • Proposal is to add these qualifiers to the GAF so we can get the annotations curators would normally make in the GAF:
    • DOS: The logically consistent way to do this would be to add the qualifiers whether the relationship was asserted or inferred. DH arguing for this only where the link is asserted by a curator (?)
    • DPH: No, but in the conventional GAF, it would not be straightforward to attribute evidence to the inferred annotation with the qualifier. To get things moving, for now we would simply want the asserted ones. In the future we might want to have the ability to see when an asserted relation and an inferred relation 'contradict' one another, for example if a gene product executes a function that is inferred to be part of a process, but is asserted to execute a function that regulates the process. Further in the future, I am sure that we will want to make associations between gene products and processes based on property chains in models. Here is where I think DOS's point really comes into play. If models contain very long causal chains that are true, but not necessarily useful, then this could become very messy. For example, RNA polymerase being upstream of any process mediated by a gene product wouldn't be useful. A few weeks ago Stacia suggested that the practical way to handle this would be to only populate the model with things that a curator deems relevant to the biology being modeled. Maybe this is the best route to take.
    • DOS: Making a qualifier for causally_upstream_of could be dangerous. Causal chains can potentially become very long => associations that most biologists would consider wrong. Our protection against this is to rely on curators not to add the links in question. But are there clear rules for this? If so, can we codify them? (how) do they map to all/some patterns for relationships (which, strictly, don't apply in LEGO!)? If there are no clear rules, is it sufficient to rely on curator instinct? To assess this, I think we need to look at many more examples. It is easy to be fooled into thinking we have a generally safe method on the basis of a small number of cases where the correct way to annotate is clear.
    • DPH: Having been an annotator for a long time and having looked at a lot of annotations, I would say that this already happens. Annotators create annotations for processes that are a result of very long chains of causally_upstream_of, with nothing in the current methodology to indicate this. Adding the qualifier will only make things more explicit. Right now the existing annotations consist of a gene product and a GO term and in my experience include every possible qualifier we could think of adding. We are looking for improvement over the current method. But yes, it is still very much up to curator judgement.
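To make the concern about chain length concrete, the sketch below walks causally_upstream_of edges from a starting activity and records how many steps away each downstream node is. The edge list, node labels, and the optional `max_depth` cut-off are assumptions for illustration; no such limit has been agreed, and this is not how GAF generation currently works.

```python
from collections import deque


def downstream_nodes(edges, start, max_depth=None):
    """Return {node: steps} for nodes reachable from start.

    edges is an iterable of (upstream, downstream) pairs standing in for
    causally_upstream_of links. With max_depth=1 only the immediate
    downstream step is returned; with None the whole chain is followed.
    """
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)

    depths = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in graph.get(node, []):
            if nxt != start and nxt not in depths:
                depths[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return depths
```

Comparing the result for `max_depth=1` with the unbounded result on a real model would show how many extra gene product to process associations an inferred qualifier could generate, which is one way to look at "many more examples".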

Documentation Week

  • David H and Kimberly will be working on LEGO documentation and a training manual this week
  • Stacia and Kevin at SGD will be helping to create Noctua videos