LEGO May 16, 2016


Revision as of 09:46, 15 May 2016

Bluejeans

https://bluejeans.com/969313231

Agenda

Mailing Lists

  • Software discussion list vs annotation discussion list
    • noctua-minerva (hosted by lbl)
    • go-discuss (hosted by Stanford)
    • go-consortium (hosted by Stanford)

Software Updates

Form Interface for Curation

  • Revisit this? How high a priority is this for wider adoption of the tool?

RO View for Curators

  • Being able to browse the RO would be helpful for curators doing LEGO curation
  • Are there any plans to add this functionality to AmiGO2?

GAF/GPAD

Still needed before we can import (into MGI):

Essential

  1. Stable PURL from which to retrieve files. I have begun working with our software group to load GPAD, but we still need to be able to download usable GAFs. I made the switch because GPAD contains information about the curator who made the annotation and can handle full evidence codes. (GH Noctua issue #200 - https://github.com/geneontology/noctua/issues/200)
  2. Folded annotations for individuals annotated by the regulates relation. (GH Noctua issue #189 - https://github.com/geneontology/noctua/issues/189)
    1. We can get around this requirement by making the 'extra' annotations to the regulation terms in the interface. We think this is actually the most accurate procedure given the way information is stored in conventional annotations.
    2. Although we can create 'regulation targets' in LEGO models (the correct way!!!!!), the short cut relation is not available. We need to be able to fold relations into the short cut relation or be able to finally represent this relation correctly. For now, the information is not lost in the model, but it won't be available in the output until we deal with this.
  3. For full adoption to all curation (at least at MGI), we will need the ability to annotate to proteoforms/isoforms and have this information processed correctly in the GAF or GPAD files. In the GAF file, this information is in column 17, but the object in the enabled_by field populates column 2. This issue needs to be solved.
    1. Possibility #1: curators are allowed to enter both isoforms and genes into Noctua in separate fields. This is equivalent to entering data for both column 2 and column 17 in the GAF. If this is the case, the GAF output will be exactly correct under the current spec. However, if the output is GPAD, how will the information about the isoform be captured? Will a mini-GPI file be generated? Which group is responsible for the GPI? Is it still the group that officially contributes the annotations?
    2. Possibility #2: the Noctua interface stays exactly as it is now, and the current enabled_by field is equivalent to column 2 in a GPAD file. If this is the case, something on the back end needs to associate the value in the enabled_by field with the gene object so that it can populate column 2 of a GAF, and it needs to be able to create an entry in a GPI file. Questions remain about data flow.
  4. Ability to annotate to entities not yet in NEO (https://github.com/geneontology/noctua/issues/278)
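
Both possibilities under item 3 come down to filling GAF column 2 (the gene) and column 17 (the gene product form) from what the curator enters. A minimal sketch of that back-end step follows, assuming a hypothetical lookup table from isoform ID to parent gene (the kind of association a GPI file could carry). The function name, the table, and the gene ID are illustrative placeholders, not part of Noctua or Minerva.

```python
# Hypothetical isoform -> parent gene table. The gene ID is a placeholder,
# not a real MGI accession.
PARENT_GENE = {
    "PR:000037269": "MGI:0000001",
}


def gaf_object_columns(enabled_by, parent_gene=PARENT_GENE):
    """Return (column 2, column 17) for a GAF row.

    If enabled_by is a known isoform/proteoform, column 2 gets the parent
    gene and column 17 gets the isoform. Otherwise enabled_by is treated
    as the gene itself and column 17 is left empty.
    """
    parent = parent_gene.get(enabled_by)
    if parent is None:
        return enabled_by, ""
    return parent, enabled_by


gene_row = gaf_object_columns("MGI:0000002")       # gene entered directly
isoform_row = gaf_object_columns("PR:000037269")   # isoform entered
```

In this sketch an unknown ID is passed through as a gene. A real implementation would probably need to reject or flag isoform IDs that have no known parent instead.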

Desirable or Bugs

  1. Attribution to individual groups. For example 'MGI-Noctua'. (GH Noctua issue #84 - https://github.com/geneontology/noctua/issues/84)
  2. An automated QC check on annotations made in Noctua. For example, tagging annotations that don't have evidence. (https://github.com/geneontology/noctua/issues/255)
  3. Attribution of curator comments and evidence sentences to the appropriate reference. (https://github.com/geneontology/noctua/issues/280)
  4. Multiple paper evidence on a relation is only returning one paper in the GAF/GPAD output
  5. Currently conventional annotations are only made if a molecular function in an annoton is linked to a process by part_of. We have asked for this to be relaxed to include causally_upstream_of and causally_upstream_of_or_within because although controversial, it is consistent with the way we have annotated in the past, particularly when annotating from mutant phenotypes. This issue is not anything to do with LEGO or Noctua, it is a curation issue that has been around for a looooooong time. Do we want annotations based on those relations reported in GAF and GPAD? Right now, all relations in column #3 of the GPAD map to the defaults used in the file specs: enables, part_of and involved_in.
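
The automated QC check in item 2 could start as a simple pass over the exported rows. The sketch below assumes the tab-separated GPAD 1.1 layout, with the evidence code in column 6. The function name is hypothetical and the sample rows are made-up placeholders, not real annotations.

```python
def rows_missing_evidence(gpad_lines):
    """Yield (line_number, line) for GPAD rows whose evidence column is empty."""
    for number, line in enumerate(gpad_lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        columns = line.rstrip("\n").split("\t")
        # Column 6 (index 5) holds the evidence code in GPAD 1.1.
        evidence = columns[5].strip() if len(columns) > 5 else ""
        if not evidence:
            yield number, line


# Placeholder rows for illustration only.
sample = [
    "!gpa-version: 1.1",
    "MGI\tMGI:0000001\tenables\tGO:0003674\tPMID:0000000\tECO:0000314\t\t\t20160516\tMGI\t\t",
    "MGI\tMGI:0000001\tinvolved_in\tGO:0008150\tPMID:0000000\t\t\t\t20160516\tMGI\t\t",
]
flagged = [number for number, _ in rows_missing_evidence(sample)]
```

Here `flagged` holds the line numbers of rows to tag for curator review. The same loop could be extended to other checks, such as a missing reference in column 5.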

Further Testing

  1. For full adoption to all curation (at least at MGI), we will need the ability to annotate to proteoforms/isoforms and have this information processed correctly in the GAF or GPAD files. In the GAF file, this information is in column 17, but the object in the enabled_by field populates column 2. This issue needs to be solved. In the GPAD format, this issue is solved in column #2, but how will Noctua 'know' that the isoform ID entered into the enabled by field is associated with a mouse gene? Is Noctua aware of the GPI file? Will Noctua generate entries for the GPI file, or will that be done at the corresponding MOD?
    1. Test: Proteoform PR:000037269 (Shh, mouse): failed
    2. Test: Proteoform UniProtKB:P35222-1 (CTNNB1, human): failed
  2. Annotations based on orthology that use a GO reference. (tested by dph on 5/14/2016, passed with GO_REF:0000096)

Annotation Data flow Questions

  1. Once we have imported annotations into MGI from Noctua, those annotations will be incorporated into our output files, GPAD and GAF. Is there a strategy for them not being duplicated when they are picked up by the GOC?
    1. Will annotations in AmiGO2 continue to only come from the 'official providers' or will they be loaded directly?
    2. If annotations cycle through the 'official providers', then the official providers should all be encouraged to load GPA files so that the associated annotation properties in column 12 of the GPAD file are not lost.
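
One way to frame the duplication question is as a merge keyed on the fields that identify an annotation. The sketch below is only an assumption about what such a strategy might look like, not an agreed GOC procedure. It treats GPAD columns 1-7 as the identity of an annotation and, when the same annotation arrives from two sources, keeps the copy that still carries annotation properties in column 12. All names and rows are hypothetical placeholders.

```python
def annotation_key(columns):
    # Assumption: object, qualifier, GO ID, reference, evidence and with/from
    # (GPAD columns 1-7) together identify an annotation.
    return tuple(columns[:7])


def properties(columns):
    # Column 12 (index 11) holds annotation properties; may be absent or empty.
    return columns[11] if len(columns) > 11 else ""


def merge_without_duplicates(*sources):
    """Merge rows from several sources, keeping one copy per annotation."""
    merged = {}
    for rows in sources:
        for columns in rows:
            key = annotation_key(columns)
            kept = merged.get(key)
            # Prefer the copy that has not lost its annotation properties.
            if kept is None or (not properties(kept) and properties(columns)):
                merged[key] = columns
    return list(merged.values())


# Placeholder rows: the same annotation, with and without column 12.
base = ["MGI", "MGI:0000001", "enables", "GO:0003674", "PMID:0000000",
        "ECO:0000314", "", "", "20160516", "MGI", ""]
from_mgi = [base + [""]]
from_noctua = [base + ["example-property=placeholder"]]
merged = merge_without_duplicates(from_mgi, from_noctua)
```

Whether date or assigned_by should also be part of the key is exactly the kind of data-flow decision the questions above leave open.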

Other Business

LEGO/Noctua training

  • As a test, MGI curators will be trained to use LEGO in mid-June
  • The UniProt curators have requested training. Hopefully we can schedule something this summer.
  • Kimberly and David have been working to tighten up the Curation Guide for LEGO.
    • Please have a look at the written guide if you have time
    • We need to figure out what new videos we want.

We will push the discussion of the Wnt signaling model to a future meeting because we think it will take almost the whole hour.