LEGO May 16, 2016

From GO Wiki

Bluejeans

https://bluejeans.com/969313231

Agenda

Mailing Lists

  • Software discussion list vs annotation discussion list
    • noctua-minerva (hosted by LBL)
    • go-discuss (hosted by Stanford)
    • go-consortium (hosted by Stanford)

Software Updates

Form Interface for Curation

  • Revisit this? How high a priority is this for wider adoption of the tool?

RO View for Curators

  • Being able to browse the RO would be helpful for curators doing LEGO curation
  • Are there any plans to add this functionality to AmiGO2?

GAF/GPAD

Still needed before we can import (into MGI):

Essential

  1. Stable PURL from which to retrieve files. I have begun working with our software group to load GPAD, but I'm sure others will still need to be able to download usable GAFs. I made the switch because GPAD contains information about the curator who made the annotation and can handle full evidence codes. (GH Noctua issue #200 - https://github.com/geneontology/noctua/issues/200)
  2. Folded annotations for individuals annotated by the regulates relation. (GH Noctua issue #189 - https://github.com/geneontology/noctua/issues/189)
    1. We can get around this requirement by making the 'extra' annotations to the regulation terms in the interface. Note that this same issue exists with other templated terms in GO. In the case of regulation we want to generate an extra annotation. In the case of other annotations, we want to generate the more specific annotation.
    2. Although we can create 'regulation targets' in LEGO models (the correct way!), the shortcut relation is not available. We need to be able either to fold relations into the shortcut relation or to finally represent this relation correctly. For now, the information is not lost in the model, but it won't be available in the output until we deal with this.
  3. For full adoption in all curation (at least at MGI), we will need the ability to annotate to proteoforms/isoforms and have this information processed correctly in the GAF or GPAD files. In the GAF file, this information is in column 17, but the object in the enabled_by field populates column 2. This issue needs to be solved.
    1. Possibility #1: curators are allowed to enter both isoforms and genes into Noctua in separate fields. This is equivalent to entering data for both column #2 and column #17 in the GAF. If this is the case, the GAF output will be exactly correct under the current spec. However, if the output is GPAD, how will the information about the isoform be captured? Will a mini-GPI file be generated? Which group is responsible for the GPI? Is it still the group that officially contributes the annotations?
    2. Possibility #2: the Noctua interface stays exactly as it is now, and the current enabled by field is equivalent to column #2 in a GPAD file. If this is the case, something on the back end needs to associate the value in the enabled by field with the gene object to populate column #2 of a GAF, and it needs to be able to somehow create an entry in a GPI file. Questions remain about data flow.
  4. Ability to annotate to entities not yet in NEO (https://github.com/geneontology/noctua/issues/278)

Desirable or Bugs

  1. Attribution to individual groups, for example 'MGI-Noctua'. We could possibly do this during our GPAD load by combining the information in the contributor property with the GO_Noctua provider tag. For example, my ORCID combined with GO_Noctua in the provider field (#10) could be used to generate a new provider, MGI_Noctua, although this is not ideal. (GH Noctua issue #84 - https://github.com/geneontology/noctua/issues/84)
  2. An automated QC check on annotations made in Noctua. For example, tagging annotations that don't have evidence. (https://github.com/geneontology/noctua/issues/255)
  3. Attribution of curator comments and evidence sentences to the appropriate reference. (https://github.com/geneontology/noctua/issues/280)
  4. When a relation has evidence from multiple papers, only one paper is returned in the GAF/GPAD output.
  5. Currently, conventional annotations are only made if a molecular function in an annoton is linked to a process by part_of. We have asked for this to be relaxed to include causally_upstream_of and causally_upstream_of_or_within because, although controversial, it is consistent with the way we have annotated in the past, particularly when annotating from mutant phenotypes. This issue has nothing to do with LEGO or Noctua; it is a curation issue that has been around for a very long time. Do we want annotations based on those relations reported in GAF and GPAD? Right now, all relations in column #3 of the GPAD map to the defaults used in the file specs: enables, part_of, and involved_in.
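Items 1 and 2 could both run over the exported GPAD during a MOD's load. The sketch below is a minimal illustration of the workaround described in item 1, not a Noctua feature. It assumes the GPAD 1.1 layout: column 6 holds the evidence code, column 10 holds Assigned_by, and column 12 holds pipe-separated key=value annotation properties such as contributor=<ORCID>. ORCID_TO_GROUP and the ORCID in it are hypothetical placeholders.

```python
"""Sketch: QC and group attribution over GPAD 1.1 rows.

Assumptions (not from the meeting notes): rows are tab-separated GPAD 1.1
lines; column 6 is the evidence code, column 10 is Assigned_by, column 12
holds pipe-separated key=value properties (e.g. contributor=<ORCID URL>).
ORCID_TO_GROUP is a hypothetical lookup with a placeholder ORCID.
"""

# Hypothetical mapping from curator ORCID to the curating group.
ORCID_TO_GROUP = {
    "http://orcid.org/0000-0000-0000-0001": "MGI",
}


def parse_properties(field: str) -> dict[str, list[str]]:
    """Split a column-12 value like 'a=1|b=2' into {'a': ['1'], 'b': ['2']}."""
    props: dict[str, list[str]] = {}
    for pair in filter(None, field.split("|")):
        key, _, value = pair.partition("=")
        props.setdefault(key, []).append(value)
    return props


def check_row(line: str) -> tuple[str, list[str]]:
    """Return (derived provider, list of QC problems) for one GPAD row."""
    cols = line.rstrip("\n").split("\t")
    cols += [""] * (12 - len(cols))  # pad missing trailing columns
    problems = []
    if not cols[5]:  # column 6: evidence code
        problems.append("missing evidence")
    provider = cols[9]  # column 10: Assigned_by
    contributors = parse_properties(cols[11]).get("contributor", [])
    groups = {ORCID_TO_GROUP[c] for c in contributors if c in ORCID_TO_GROUP}
    # Only rename when every known contributor belongs to a single group.
    if provider == "GO_Noctua" and len(groups) == 1:
        provider = f"{groups.pop()}_Noctua"  # e.g. GO_Noctua -> MGI_Noctua
    return provider, problems
```

Rows with contributors from several groups, or from unknown ORCIDs, keep GO_Noctua. This is one reason the approach is "not ideal".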

Further Testing

  1. For full adoption in all curation (at least at MGI), we will need the ability to annotate to proteoforms/isoforms and have this information processed correctly in the GAF or GPAD files. In the GAF file, this information is in column 17, but the object in the enabled_by field populates column 2. This issue needs to be solved. In the GPAD format, this issue is solved in column #2, but how will Noctua 'know' that the isoform ID entered into the enabled by field is associated with a mouse gene? Is Noctua aware of the GPI file? Will Noctua generate entries for the GPI file, or will that be done at the corresponding MOD?
    1. Test: proteoform PR:000037269 (Shh, mouse): failed
    2. Test: proteoform UniProtKB:P35222-1 (CTNNB1, human): failed
    3. Test: complex annotation. Need to decide which complexes to allow.
  2. Annotations based on orthology that use a GO reference. (tested by dph on 5/14/2016, passed with GO_REF:0000096)

Annotation Data flow Questions

  1. Once we have imported annotations into MGI from Noctua, those annotations will be incorporated into our output files, GPAD and GAF. Is there a strategy to keep them from being duplicated when they are picked up by the GOC?
    1. Will annotations in AmiGO2 continue to come only from the 'official providers', or will they be loaded directly?
    2. If annotations cycle through the 'official providers', then the official providers should all be encouraged to load GPAD files so that the annotation properties in column 12 of the GPAD file are not lost. How important is this, given that we have the original OWL saved?
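One possible de-duplication strategy for the question above is to treat two GPAD rows as the same annotation when their core columns match. When duplicates collide, keep the copy that still carries its column-12 properties. This is only a sketch. Which columns form the identity key was not decided on the call. The choice here (columns 2-6 and 11) is an assumption.

```python
"""Sketch: collapsing annotations that arrive twice, once directly from
Noctua and once via a MOD's re-export.

Assumption (not decided at the meeting): two GPAD 1.1 rows are the same
annotation when object ID, qualifier, GO ID, reference, evidence code and
extension (columns 2-6 and 11) all match.
"""


def annotation_key(cols: list[str]) -> tuple[str, ...]:
    # Columns 2-6 and 11 of GPAD 1.1 (0-based indices 1-5 and 10).
    return tuple(cols[1:6]) + (cols[10],)


def merge(rows: list[str]) -> list[str]:
    """Return one row per annotation key, preferring rows that keep column 12."""
    kept: dict[tuple[str, ...], list[str]] = {}
    for line in rows:
        cols = line.rstrip("\n").split("\t")
        cols += [""] * (12 - len(cols))  # pad missing trailing columns
        key = annotation_key(cols)
        current = kept.get(key)
        # Replace an earlier copy only if it lost its annotation properties.
        if current is None or (not current[11] and cols[11]):
            kept[key] = cols
    return ["\t".join(c) for c in kept.values()]
```

Because the date (column 9) and Assigned_by (column 10) are outside the key, a MOD re-export of a Noctua annotation collapses onto the original.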

Other Business

LEGO/Noctua training

  • As a test, MGI curators will be trained to use LEGO in mid-June
  • The UniProt curators have requested training. Hopefully we can schedule something this summer.
  • Kimberly and David have been working to tighten up the Curation Guide for LEGO.
    • Please have a look at the written guide if you have time
    • We need to figure out what new videos we want.
  • We will push the discussion of the Wnt signaling model to a future meeting because we think it will take almost the whole hour.

Minutes

  • On call: Dan Keith, David H., David OS, Giulia, Heiko, Helen, Judy, Kimberly, Melanie, Paul T., Seth, Suzi

LEGO Mailing Lists

  • We think it would be best if the LEGO annotation list was just the same as the regular annotation list.
  • Melanie: Can we have access to the archives of the mailing lists?
  • Seth: The noctua-minerva list doesn't get much traffic, but its emails could be made public.
  • The go-discuss and go-consortium archives can be accessed through the Mailman interface hosted at Stanford. Subscribers should be able to access them.
  • https://mailman.stanford.edu/mailman/listinfo/go-discuss

Software Updates

Form Interface

  • Essential?
    • Some code developed for Monarch's form interface will be incorporated; curators could look at this although it is a bit different from what GO curators might use
    • AI: Re-visit some of Chris' mock-ups that show a view of the individuals and relations between them in the graph view
    • AI: Talk with Stacia and SGD about their use case/requirements?

RO Browsing in AmiGO

GPAD/GAF Outputs

What do we absolutely need to get GPAD/GAF outputs?

Stable URLs for retrieving GPAD/GAF files

  • See export lego to legacy directory
  • Chris: we need some naming conventions for files for the GPAD/GAF files on Jenkins
  • One GPAD/GAF per species, not per model; one file per model would make the number of files explode, which is not sustainable
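The per-species rule above amounts to grouping each model's rows by taxon before writing. In this minimal sketch, the file name pattern "noctua_<taxon>.gpad" is purely hypothetical, since the naming convention was still to be decided. The sketch also assumes each model's taxon is already known.

```python
"""Sketch: one GPAD file per species rather than one per model.

Assumptions: each model's taxon is already known and passed in directly;
the "noctua_<taxon>.gpad" naming pattern is hypothetical.
"""
from collections import defaultdict


def group_by_species(models: dict[str, tuple[str, list[str]]]) -> dict[str, list[str]]:
    """Map model ID -> (taxon, GPAD rows) onto file name -> combined GPAD rows."""
    files: dict[str, list[str]] = defaultdict(list)
    for model_id in sorted(models):  # deterministic row order within each file
        taxon, rows = models[model_id]
        # "NCBITaxon:10090" -> "noctua_10090.gpad" (hypothetical pattern)
        name = f"noctua_{taxon.split(':')[-1]}.gpad"
        files[name].extend(rows)
    return dict(files)
```

With this layout, the number of files grows with the number of species, not with the number of models.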

Folding over Regulates Relations

  • David presented the Fgfr1 example
  • Start with a simple annoton example of an MF that occurs in a CC and is part of a signaling pathway
  • To extend the model, though, the curator wants to make a relation between the signaling pathway and regulation of cell proliferation
  • Conventional annotation would have resulted in an annotation to positive regulation of cell proliferation for Fgfr1
  • In the LEGO GAF, though, the positive regulation of cell proliferation is only added as an annotation extension to the Fgf signaling pathway annotation - biologically more accurate, but not what we've been doing in conventional annotation and not what the curators want to see in the GAF
  • David OS - the inference doesn't follow??
  • Question: does the property chain part_of ∘ positively_regulates -> positively_regulates hold?
  • You can make the direct assertion in the LEGO model, but you still won't get the direct BP annotation to positive regulation of cell proliferation; you'll only get positive regulation of cell proliferation as an annotation extension to the MF annoton.
  • We do want the GPAD/GAF output to reflect what is said in the model. We want the GPAD/GAF to capture more of the causal chain, but how far we go down the causal chain still needs to be decided.
  • The regulation assertion could be made downstream of the GPAD/GAF using the regular inference pipeline, but ideally we want this as part of the LEGO output.
  • Can the inference pipeline be included in the LEGO GPAD/GAF output?
  • Yes, but we need to make sure that all the plumbing is there for this to work.
  • Part of the issue is what axioms exist in the ontology and whether you can make an inferred annotation when a BP is an extension of a BP in contrast to a BP as an extension of an MF
  • Current GAF inference pipeline restricts how the inferences can be made
  • AI: We need to capture the regulation example as a unit test - in a Google doc with screenshots
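The folding discussed for the Fgfr1 example can be pictured as applying a property chain to the model and then emitting a conventional annotation. This sketch shows the mechanics only. The single rule in CHAINS is exactly the chain questioned on the call and is not confirmed to exist in RO. REGULATION_TERM is a hypothetical lookup from a (relation, process) pair to the GO regulation term a conventional annotation would use. Individuals and labels are placeholders.

```python
"""Sketch: folding a causal chain into a direct regulation annotation.

Assumptions: the model is a set of (subject, relation, object) triples over
individuals; CHAINS holds property-chain rules. The rule below,
part_of o positively_regulates -> positively_regulates, is the chain asked
about on the call and is NOT confirmed to exist in RO. REGULATION_TERM is a
hypothetical lookup from (relation, process class) to a GO regulation term.
"""

CHAINS = {("part_of", "positively_regulates"): "positively_regulates"}

REGULATION_TERM = {
    ("positively_regulates", "cell proliferation"):
        "positive regulation of cell proliferation",
}


def saturate(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply the chain rules until no new triple is produced."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (b2, r2, c) in list(facts):
                inferred = CHAINS.get((r1, r2))
                if inferred and b == b2 and (a, inferred, c) not in facts:
                    facts.add((a, inferred, c))
                    changed = True
    return facts


def folded_annotations(triples, classes):
    """Yield (gene, GO term) pairs for MFs that now regulate a process directly.

    `classes` maps each individual to its class label.
    """
    facts = saturate(triples)
    for (mf, rel, gene) in facts:
        if rel != "enabled_by":
            continue
        for (s, r, target) in facts:
            term = REGULATION_TERM.get((r, classes.get(target)))
            if s == mf and term:
                yield gene, term
```

For the Fgfr1 example, suppose the model has mf1 enabled_by Fgfr1, mf1 part_of a signaling pathway, and the pathway positively_regulates cell proliferation. The sketch then yields Fgfr1 annotated to positive regulation of cell proliferation. This is the conventional annotation curators expect to see in the GAF.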

Need has regulation target Relation

  • This is needed in the GPAD/GAF output

Annotating to Isoforms

  • Can cut and paste IDs, but they will not be validated
  • Should be able to use any isoform that has been used previously right now
  • Once the switch is flipped, it will be possible to go the cut and paste route
  • Longer-term solution: get PRO to roll out species-specific modules?
  • Another possibility: the MODs could supply Noctua with a GPI file
  • What then happens with the GPAD/GAF for Column 2 - annotated entity? GPAD would contain the annotated entity in Column 2; GAF would either have the annotated entity in Column 2 or the translated entity in Column 2 and the isoform in Column 17.
  • Is it okay to output only GPAD from Noctua? Are there groups that can only consume GAF?
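If a MOD supplied an isoform-to-gene mapping (for example, derived from its GPI file), the column handling described above could look roughly like this. The sketch follows the second GAF option: the translated gene goes in column 2 and the isoform in column 17. PARENT and both IDs are hypothetical placeholders. IDs are kept as prefixed CURIEs, and splitting the prefix into column 1 is left out.

```python
"""Sketch: filling GAF column 2 / column 17 and GPAD column 2 from the
entity entered in the enabled_by field.

Assumptions: PARENT is a hypothetical isoform -> gene lookup that a MOD
could supply (e.g. derived from its GPI file); IDs are placeholders and are
kept as prefixed CURIEs rather than split into DB and object ID columns.
"""

PARENT = {"UniProtKB:P00000-1": "MGI:0000001"}  # hypothetical isoform -> gene


def columns_for(enabled_by: str) -> dict[str, str]:
    """Return the annotated-entity columns for one enabled_by value."""
    gene = PARENT.get(enabled_by)
    if gene is None:
        # Not a known isoform: the entity itself is the annotated object.
        return {"gaf_col2": enabled_by, "gaf_col17": "", "gpad_col2": enabled_by}
    return {
        "gaf_col2": gene,          # GAF: translated gene in column 2 ...
        "gaf_col17": enabled_by,   # ... and the isoform in column 17
        "gpad_col2": enabled_by,   # GPAD: annotated entity (the isoform) in column 2
    }
```

Without such a mapping, Noctua cannot tell which gene an isoform belongs to. That is the data-flow question raised in the agenda.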

Other things

Documentation

  • David and Kimberly working on documentation - please check the Google doc linked from the home page and provide feedback
  • Need to decide what videos we want Stacia and Kevin to make

Training

  • Training for MGI curators in June
  • Training for EBI curators later this summer