LEGO May 16, 2016

From GO Wiki

Bluejeans

https://bluejeans.com/969313231

Agenda

Mailing Lists

  • Software discussion list vs annotation discussion list
    • noctua-minerva (hosted by lbl)
    • go-discuss (hosted by Stanford)
    • go-consortium (hosted by Stanford)

Software Updates

Form Interface for Curation

  • Revisit this? How high a priority is this for wider adoption of the tool?

RO View for Curators

  • Being able to browse the RO would be helpful for curators doing LEGO curation
  • Are there any plans to add this functionality to AmiGO2?

GAF/GPAD

Still needed before we can import (into MGI):

Essential

  1. Stable PURL from which to retrieve files. We will use GAF for now, but we have a ticket in at MGI to be able to load GPAD. Note to all groups: GPAD is desirable because it contains information about the curator who made the annotation. (GH Noctua issue #200 - https://github.com/geneontology/noctua/issues/200)
  2. Folded annotations for individuals annotated by the regulates relation. (GH Noctua issue #189 - https://github.com/geneontology/noctua/issues/189)
    1. We can get around this requirement by making the 'extra' annotations to the regulation terms in the interface.
    2. Although we can create 'regulation targets' in LEGO models (the correct way!), the shortcut relation is not available. We need to be able either to fold relations into the shortcut relation or to finally represent this relation correctly. For now, the information is not lost in the model, but it won't be available in the output until we deal with this.
  3. For full adoption to all curation (at least at MGI), we will need the ability to annotate to proteoforms/isoforms and have this information processed correctly in the GAF or GPAD files. In the GAF file, this information is in column 17, but the object in the enabled_by field populates column 2. This issue needs to be solved.
    1. Possibility #1: curators are allowed to enter both isoforms and genes into Noctua in separate fields. This is equivalent to entering data for both column 2 and column 17 in the GAF. In this case, the GAF output will be exactly correct under the current spec. However, if the output is GPAD, how will the information about the isoform be captured? Will a mini-GPI file be generated? Which group is responsible for the GPI? Is it still the group that officially contributes the annotations?
    2. Possibility #2: the Noctua interface stays exactly as it is now, and the current enabled_by field is equivalent to column 2 in a GPAD file. In this case, something on the back end needs to associate the value in the enabled_by field with the gene object so that it can populate column 2 of a GAF, and it needs to be able to create an entry in a GPI file. Questions remain about data flow.
  4. Ability to annotate to entities not yet in NEO (https://github.com/geneontology/noctua/issues/278)
  5. Currently, conventional annotations are only made if a molecular function individual is linked to a process by part_of. We have asked for this to be relaxed to include causally_upstream_of and causally_upstream_of_or_within because, although controversial, it is consistent with the way we have annotated in the past. If we decide we really want that, do we want those relations reported in GPAD? Right now, these relations map to the defaults used in the file specs: enables, part_of and involved_in.
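
To make Possibility #2 under item 3 concrete, here is a minimal sketch of the back-end step it would need: given the value in the enabled_by field, work out what goes into GAF column 2 (the gene) and column 17 (the gene product form). The lookup table, the function name and the IDs are all hypothetical placeholders; in practice the parent mapping would presumably come from a GPI file, and nothing here reflects how Noctua or Minerva actually work.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a GPI-style parent mapping:
# isoform/proteoform ID -> parent gene ID. Placeholder IDs, not real records.
PARENT_GENE: Dict[str, str] = {
    "PR:EXAMPLE-ISOFORM-1": "MGI:EXAMPLE-GENE",
}


def gaf_object_columns(enabled_by: str, parent_gene: Dict[str, str]) -> Tuple[str, str]:
    """Return (GAF column 2, GAF column 17) for one enabled_by value.

    Assumption: if the value is a known isoform, column 2 gets the parent
    gene and column 17 gets the isoform; otherwise the value is treated as
    the gene itself and column 17 is left empty.
    """
    gene = parent_gene.get(enabled_by)
    if gene is None:
        return enabled_by, ""
    return gene, enabled_by


if __name__ == "__main__":
    print(gaf_object_columns("PR:EXAMPLE-ISOFORM-1", PARENT_GENE))
    print(gaf_object_columns("MGI:EXAMPLE-GENE", PARENT_GENE))
```

The open data flow questions remain: who maintains the parent mapping, and how an isoform that is not yet in it should be handled.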

Desirable or Bugs

  1. Attribution to individual groups. For example 'MGI-Noctua'. (GH Noctua issue #84 - https://github.com/geneontology/noctua/issues/84)
  2. An automated QC check on annotations made in Noctua. For example, tagging annotations that don't have evidence. (https://github.com/geneontology/noctua/issues/255)
  3. Attribution of curator comments and evidence sentences to the appropriate reference. (https://github.com/geneontology/noctua/issues/280)
  4. When a relation has evidence from multiple papers, only one paper is returned in the GAF/GPAD output.
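
The QC check in item 2 could look roughly like the sketch below, which tags annotations that carry no evidence. The dictionary keys ("id", "evidence") and the function name are assumptions made for illustration; they are not Noctua's data model.

```python
from typing import Dict, List


def missing_evidence(annotations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the annotations whose evidence field is absent or blank.

    Assumption: each annotation is a dict with an optional "evidence" key.
    """
    return [a for a in annotations if not a.get("evidence", "").strip()]


if __name__ == "__main__":
    # Placeholder records, not real annotations.
    sample = [
        {"id": "annotation-1", "evidence": "ECO:EXAMPLE"},
        {"id": "annotation-2", "evidence": ""},
        {"id": "annotation-3"},
    ]
    for flagged in missing_evidence(sample):
        print("no evidence:", flagged["id"])
```

A real check would likely run against the model itself rather than exported rows, so that problems are caught before GAF/GPAD output is generated.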

Not Yet Tested

  1. For full adoption to all curation (at least at MGI), we will need the ability to annotate to proteoforms/isoforms and have this information processed correctly in the GAF or GPAD files. In the GAF file, this information is in column 17, but the object in the enabled_by field populates column 2. This issue needs to be solved. In the GPAD format, this issue is solved in column #2, but how will Noctua 'know' that the isoform ID entered into the enabled by field is associated with a mouse gene? Is Noctua aware of the GPI file?
  2. Annotations based on orthology that use a GO reference. (tested by dph on 5/14/2016, passed with GO_REF:0000096)

Annotation Data flow Questions

  1. Once we have imported annotations into MGI from Noctua, those annotations will be incorporated into our output files, GPAD and GAF. Is there a strategy for them not being duplicated when they are picked up by the GOC?
    1. Will annotations in AmiGO2 continue to come only from the 'official providers', or will they be loaded directly?
    2. If annotations cycle through the 'official providers', then the official providers should all be encouraged to load GPA files so that the associated annotation properties in column 12 of the GPAD file are not lost.
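
One possible way to avoid duplicates, offered only as a starting point for discussion, is to key each annotation on its identifying columns and keep the first copy seen. The sketch assumes GPAD 1.1 column order and assumes that columns 1 to 6 (DB, DB_Object_ID, Qualifier, GO ID, Reference, Evidence code) are enough to identify an annotation; both the key choice and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, ...]


def annotation_key(row: Row) -> Row:
    """Identity key for a GPAD row: the first six columns (an assumption)."""
    return row[:6]


def merge_without_duplicates(primary: Iterable[Row], secondary: Iterable[Row]) -> List[Row]:
    """Merge two annotation sets, keeping the first row seen for each key.

    Rows from `primary` win, so pass the source whose column 12
    annotation properties should be preserved as `primary`.
    """
    seen: Set[Row] = set()
    merged: List[Row] = []
    for row in list(primary) + list(secondary):
        key = annotation_key(row)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged
```

Whether the key should also include the annotation extension or the assigned_by column is exactly the kind of data flow question raised above.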

Other Business

If time permits, we will continue the discussion of the Wnt signaling model.