Noctua MOD Imports

From GO Wiki

Revision as of 09:54, 18 March 2019

Introduction

This page contains documentation for importing a set of MOD annotations into Noctua as GO-CAMs (Gene Ontology Causal Activity Modeling). This initial set of guidelines is based on a set of general import rules that have been further refined by testing the imports with annotations from MGI and WormBase. We expect that each group that imports their annotations may have some unique issues that will need to be dealt with on an individual basis, but these guidelines are nevertheless intended to give all groups a point from which to begin the process.

Import Files

Annotation Sets Imported

  • MODs will import their manual GO annotations, e.g., Assigned_by 'MGI'.
    • At this time, annotations made to the same species by external groups will not be imported.
    • MODs may choose to include/exclude annotations made with specific evidence codes, but the overall goal is to get a full set of manual annotations imported as GO-CAMs.
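
The selection rule above can be sketched as a simple filter. This is only an illustration: the record layout (plain dicts with `gene`, `assigned_by` and `evidence` keys), the gene identifiers, and the choice of `ND` as an excluded evidence code are all assumptions made for the example, not part of the import pipeline.

```python
def select_for_import(annotations, assigned_by, excluded_evidence=frozenset()):
    """Keep annotations assigned by the MOD itself, minus excluded evidence codes."""
    return [
        a for a in annotations
        if a["assigned_by"] == assigned_by
        and a["evidence"] not in excluded_evidence
    ]


# Made-up records for illustration only.
annotations = [
    {"gene": "MGI:MGI:0000001", "assigned_by": "MGI", "evidence": "IDA"},
    {"gene": "MGI:MGI:0000002", "assigned_by": "UniProt", "evidence": "IDA"},  # external group
    {"gene": "MGI:MGI:0000003", "assigned_by": "MGI", "evidence": "ND"},       # excluded code
]

selected = select_for_import(annotations, "MGI", excluded_evidence={"ND"})
print([a["gene"] for a in selected])
```

Leaving `excluded_evidence` empty keeps every annotation assigned by the MOD, which matches the stated goal of importing the full manual set.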

Import File Format

  • The annotation import pipeline uses the GPAD annotation file format.
  • Using the GPAD file format allows us to import key information not in the GAF:
      • all gene product-to-term (gp2term) relations
      • annotation metadata in the Annotation_property field
  • The GPAD file used for annotation import will likely be a 'one-off' file that includes metadata stored in other curation tools that would not normally be output in the GPAD file consumed by our users.
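
As a rough sketch of how such a file could be read, the snippet below splits one tab-delimited GPAD line into named fields and unpacks the Annotation_property field. It assumes the 12-column GPAD 1.1 layout and pipe-separated `key=value` properties; the column names, the sample line, and the contributor URL are illustrative placeholders, not taken from a real import file.

```python
# Assumed GPAD 1.1 column order; the names are chosen for this example.
GPAD_COLUMNS = [
    "db", "db_object_id", "qualifier", "go_id", "references", "evidence",
    "with_from", "interacting_taxon", "date", "assigned_by",
    "annotation_extension", "annotation_properties",
]


def parse_gpad_line(line):
    """Return one GPAD row as a dict, or None for header/comment lines."""
    if line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    # Pad short rows so trailing optional columns become empty strings.
    fields += [""] * (len(GPAD_COLUMNS) - len(fields))
    record = dict(zip(GPAD_COLUMNS, fields))
    properties = {}
    for pair in filter(None, record["annotation_properties"].split("|")):
        key, _, value = pair.partition("=")
        properties[key] = value
    record["annotation_properties"] = properties
    return record


# Made-up example row.
line = "\t".join([
    "WB", "WBGene00000001", "involved_in", "GO:0008150", "PMID:12345",
    "ECO:0000315", "", "", "20190318", "WB", "",
    "contributor=https://orcid.org/0000-0000-0000-0000",
])
record = parse_gpad_line(line)
print(record["qualifier"], record["annotation_properties"])
```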




  • How will annotations be modeled?
    • Dustin and Ben are working on the conversion code
    • One gene's annotations = one GO-CAM model
      • Other alternatives we considered:
        • one paper = one GO-CAM (this spreads potentially similar annotations and evidence across many different models, making them harder to review and QC)
        • one "organismal" process = one GO-CAM (this is ultimately where we'd like to go, but for this initial phase a one gene = one GO-CAM model is probably easier to review and QC)
  • Input file is GPAD
    • Allows for expanded set of gp2term relations and annotation properties, e.g. contributor
  • Annotations will not be 'transformed' at all upon import, i.e. what's in the file is what will be imported
  • The process is iterative; each round of imports is available on noctua-dev
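
The "one gene's annotations = one GO-CAM model" convention amounts to grouping annotation rows by gene product. The sketch below is not the conversion code mentioned above; it assumes each row is a dict with `db` and `db_object_id` keys, and the identifiers are made up.

```python
from collections import defaultdict


def group_by_gene(records):
    """Collect annotation rows per gene product; each group would seed one model."""
    models = defaultdict(list)
    for record in records:
        models[(record["db"], record["db_object_id"])].append(record)
    return dict(models)


# Made-up rows for illustration only.
records = [
    {"db": "WB", "db_object_id": "WBGene00000001", "go_id": "GO:0003674"},
    {"db": "WB", "db_object_id": "WBGene00000002", "go_id": "GO:0005575"},
    {"db": "WB", "db_object_id": "WBGene00000001", "go_id": "GO:0008150"},
]

models = group_by_gene(records)
for gene, rows in models.items():
    print(gene, [r["go_id"] for r in rows])
```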

Modeling Conventional Annotations as GO-CAMs: Rules

Molecular Function

  • [GP] <- enabled_by [MF]
  • [GP] contributes_to -> [MF]

Cellular Component

  • [GP] part_of -> [CC]
  • [GP] colocalizes_with -> [CC]

Biological Process

  • Relation between MF and BP will be taken from existing gp2term relations in the GPAD. For example:
    • For existing GP 'involved in' BP:
      • [GP] <- enabled_by (GO:0003674 molecular_function) part_of -> BP
    • For existing GP 'acts upstream of or within' BP:
      • [GP] <- enabled_by (GO:0003674 molecular_function) causally_upstream_of_or_within -> BP
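
The Molecular Function, Cellular Component and Biological Process rules above can be summarized as a small lookup that turns one gp2term relation into (subject, predicate, object) edges. This is a sketch under assumptions: the relation labels used as input (in particular `enables` as the GPAD counterpart of enabled_by), the function name, and the placeholder identifiers are chosen for the example and may not match the actual conversion code.

```python
ROOT_MF = "GO:0003674"  # molecular_function, used as a placeholder activity for BP rules

# How the placeholder MF is linked to the BP term for each gp2term relation.
BP_LINKS = {
    "involved_in": "part_of",
    "acts_upstream_of_or_within": "causally_upstream_of_or_within",
}


def to_edges(gene_product, relation, term):
    """Return (subject, predicate, object) edges for one conventional annotation."""
    if relation == "enables":
        # [GP] <- enabled_by [MF]
        return [(term, "enabled_by", gene_product)]
    if relation in ("contributes_to", "part_of", "colocalizes_with"):
        # [GP] relation -> [MF] or [CC]
        return [(gene_product, relation, term)]
    if relation in BP_LINKS:
        # [GP] <- enabled_by (root MF) link -> BP
        return [
            (ROOT_MF, "enabled_by", gene_product),
            (ROOT_MF, BP_LINKS[relation], term),
        ]
    raise ValueError(f"no modeling rule for relation: {relation}")


# Placeholder gene product and term identifiers.
print(to_edges("GP:1", "involved_in", "GO:0008150"))
```

Raising on an unknown relation, rather than guessing, keeps the import consistent with the rule that annotations are not transformed beyond what the file states.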

References

  • Will import one reference ID in order of preference:
    • PMID
    • GO_REF
    • DOI
    • MOD paper ID
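
The order of preference above could be applied as in the following sketch. It assumes the reference field holds pipe-separated CURIEs, that DOI prefixes may appear in either case, and that any identifier that is not a PMID, GO_REF or DOI is a MOD paper ID; the sample identifiers are made up.

```python
PREFERENCE = ("PMID", "GO_REF", "DOI")


def pick_reference(references):
    """Choose one reference ID: PMID, then GO_REF, then DOI, then anything else."""
    ids = [ref for ref in references.split("|") if ref]
    for prefix in PREFERENCE:
        for ref in ids:
            if ref.split(":", 1)[0].upper() == prefix:
                return ref
    # Fall back to the first remaining ID, assumed to be a MOD paper ID.
    return ids[0] if ids else None


# Made-up reference fields for illustration only.
print(pick_reference("MGI:MGI:0000001|PMID:12345"))
print(pick_reference("WB_REF:WBPaper00000001|doi:10.1000/example"))
```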

With/From Field

  • GitHub ticket on modeling multiple values in the With/From field in OWL, i.e. commas and/or pipes
  • For now, import as strings?
  • Future project may be to develop and use OWL Union and Intersection to represent what is meant by commas and/or pipes in the With/From field
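
To illustrate what a structured reading of the field might look like, the sketch below splits a With/From string into groups. It assumes the usual GAF convention that pipes separate alternatives (OR, a candidate for an OWL union) and commas join identifiers that apply together (AND, a candidate for an OWL intersection). It is not part of the current import, which keeps the field as a string, and the identifiers are made up.

```python
def parse_with_from(value):
    """Split a With/From string into OR-groups, each a list of AND-ed identifiers."""
    return [
        [identifier for identifier in group.split(",") if identifier]
        for group in value.split("|")
        if group
    ]


# Made-up field value: (A:1 AND B:2) OR C:3
groups = parse_with_from("A:1,B:2|C:3")
print(groups)
```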

Annotation Extensions

  • Working with Dustin and Paul to devise a set of rules to convert existing annotation extensions to GO-CAM
  • [https://docs.google.com/spreadsheets/d/1AEAj39mxQnBsuccIxWhpjiFKcdCJLw6uIa1eXiAG1Hs Google spreadsheet of annotation extensions for which the meaning and/or conversion needs to be clarified]
  • This is currently where the bulk of the curator work is being done
    • For example, has_regulation_target extensions need to be modeled for GO-CAM