Noctua MOD Imports

From GO Wiki

  • Working with Dustin and Paul to devise a set of rules to convert existing annotation extensions to GO-CAM.
  • Google spreadsheet of annotation extensions for which the meaning and/or conversion needs to be clarified (https://docs.google.com/spreadsheets/d/1AEAj39mxQnBsuccIxWhpjiFKcdCJLw6uIa1eXiAG1Hs)
  • This is currently where the bulk of the curator work is being done.
    • For example, has_regulation_target extensions need to be modeled for GO-CAM.

Revision as of 14:20, 18 March 2019

Introduction

This page documents how to import a set of MOD (model organism database) annotations into Noctua as GO-CAMs (Gene Ontology Causal Activity Modeling). These initial guidelines are based on a set of general import rules that have been refined by testing imports of annotations from MGI and WormBase. We expect that each group importing its annotations will have some unique issues that need to be handled individually, but the guidelines are intended to give all groups a starting point.

Import Files

Annotation Sets Imported

  • MODs will import their manual GO annotations, e.g. Assigned_by 'MGI'.
    • At this time, annotations made to the same species by external groups will not be imported.
    • MODs may choose to include/exclude annotations made with specific evidence codes, but the overall goal is to get a full set of manual annotations imported as GO-CAMs.

Import File Format

  • The annotation import pipeline uses the GPAD annotation file format.
  • Using the GPAD file format allows us to import key information that is not in the GAF:
      • all gene product-to-term (gp2term) relations
      • annotation metadata in the Annotation_property field
  • The GPAD file used for annotation import will likely be a 'one-off' file that includes metadata stored in other curation tools and not normally output in the GPAD file consumed by our users.
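
As a rough illustration of reading such a file, the sketch below parses GPAD lines into dictionaries and keeps only annotations from one Assigned_by source. It assumes the twelve-column GPAD 1.1 layout; the function names `parse_gpad` and `parse_properties` are hypothetical and not part of the import pipeline.

```python
# Column names assume the GPAD 1.1 layout (an assumption; adjust for other versions).
GPAD_COLUMNS = (
    "db", "db_object_id", "relation", "go_id", "references", "evidence",
    "with_from", "interacting_taxon", "date", "assigned_by",
    "extensions", "properties",
)


def parse_gpad(lines, assigned_by=None):
    """Yield one dict per annotation line, optionally filtered by Assigned_by."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        # Pad short lines so every column name gets a value.
        fields += [""] * (len(GPAD_COLUMNS) - len(fields))
        record = dict(zip(GPAD_COLUMNS, fields))
        if assigned_by is not None and record["assigned_by"] != assigned_by:
            continue
        yield record


def parse_properties(raw):
    """Split a pipe-separated key=value Annotation_property string into a dict of lists."""
    props = {}
    for item in filter(None, raw.split("|")):
        key, _, value = item.partition("=")
        props.setdefault(key, []).append(value)
    return props
```

Calling `parse_gpad(open_file, assigned_by="MGI")` would then yield only the MOD's own annotations, matching the rule that annotations from external groups are not imported at this time.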

Modeling Conventional Annotations as GO-CAMs

One Gene = One GO-CAM

  • Each gene's set of conventional annotations will be imported as one GO-CAM.
  • Annotations with no extensions are modeled very simply, while annotations with extensions are converted to a GO-CAM model that most faithfully represents the information captured in the extension.
    • Note that not all relations used in annotation extensions are used in Noctua and GO-CAMs, so the exact translation of an extension to a GO-CAM is not always possible. However, in these cases, we have tried very hard to capture the information in a clear, consistent way.

Basic Rules

Note: links to examples currently point to models on the development site, noctua-dev, and may change as the import process iterates. We'll try to update links as quickly as possible if this happens.

Molecular Function

Cellular Component

Biological Process

  • Biological Process annotations are modeled with respect to Molecular Function, as this is the GO-CAM convention.
    • All Molecular Function annotations will be made to the root node (GO:0003674 molecular_function); this is the most conservative approach.
    • Note that these root MF annotations will NOT be exported in the GPAD file.
  • Relations between the root MF and BP will be derived from existing gene product-to-Biological Process relations in the GPAD.
GP 'involved in' BP
  • These will be modeled as a root MF 'enabled by' the GP, where that root MF is 'part of' the BP.
GP 'acts upstream of (or within)' BP
  • [GP] <-enabled_by (GO:0003674 molecular_function) causally_upstream_of_or_within -> BP
    • Example: mouse Rela - note that this rule is not implemented yet for import
    • Positive effect example:
    • Negative effect example:
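
The two Biological Process rules above can be sketched as a small function that turns one gp2term annotation into model edges. This is only an illustration: edges are shown as plain (subject, relation, object) tuples between identifiers, whereas a real GO-CAM uses OWL individuals, and the relation spellings and the name `model_bp_annotation` are assumptions.

```python
ROOT_MF = "GO:0003674"  # molecular_function root term

# gp2term relation in the GPAD -> relation between the root MF and the BP.
# The underscore spellings are an assumption about how relations appear in the file.
MF_TO_BP_RELATION = {
    "involved_in": "part_of",
    "acts_upstream_of_or_within": "causally_upstream_of_or_within",
}


def model_bp_annotation(gene_product, relation, bp_term):
    """Return the edges for one BP annotation, anchored on the root MF."""
    if relation not in MF_TO_BP_RELATION:
        raise ValueError(f"no import rule sketched for relation {relation!r}")
    return [
        (ROOT_MF, "enabled_by", gene_product),
        (ROOT_MF, MF_TO_BP_RELATION[relation], bp_term),
    ]
```

Relations not covered by the two rules raise an error rather than being guessed at, in keeping with the conservative approach described above.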

References

  • Where multiple, pipe-separated references exist for a single annotation, only one reference ID is imported, chosen in this order of preference:
    • PMID
    • GO_REF
    • doi
    • MOD paper id
  • Modeling when multiple pieces of evidence support a given GO annotation:
    • If multiple, independent references exist to support a given annotation, do we want to combine them or leave them separate? What are the implications for later model manipulation?
    • Combining evidence on to a single edge wherever possible will simplify the graph display, but may make moving or merging annotations later more complicated.
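
The order of preference for references listed above can be sketched as follows. The prefix spellings and the function name `pick_reference` are assumptions; any ID that is not a PMID, GO_REF, or doi is treated as a MOD paper ID and ranked last.

```python
# Preferred prefixes, best first; anything else is treated as a MOD paper ID.
PREFERENCE = ("pmid:", "go_ref:", "doi:")


def pick_reference(reference_field):
    """Return the single preferred reference from a pipe-separated field, or None."""
    refs = [ref.strip() for ref in reference_field.split("|") if ref.strip()]
    if not refs:
        return None

    def rank(ref):
        lowered = ref.lower()
        for position, prefix in enumerate(PREFERENCE):
            if lowered.startswith(prefix):
                return position
        return len(PREFERENCE)  # MOD paper IDs come last

    # min() keeps the first of several equally ranked references.
    return min(refs, key=rank)
```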

With/From Field

  • Values in the With/From field will be imported as a string literal, i.e. the text will be imported "as is" with no further modeling in the OWL representation.
  • GitHub ticket on modeling multiple values in the With/From field in OWL, i.e. commas and/or pipes
  • A future project may be to use OWL union and intersection to represent what is meant by commas and/or pipes in the With/From field.
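
The import itself keeps the With/From value as an unmodified string. Purely to illustrate what a later union/intersection model would start from, the sketch below splits a value into groups, assuming the GAF convention that pipes separate alternatives (OR) and commas join values that apply together (AND). The function name `parse_with_from` is hypothetical.

```python
def parse_with_from(value):
    """Split a With/From string into OR-groups (pipes) of AND-ed IDs (commas)."""
    return [
        [item.strip() for item in group.split(",") if item.strip()]
        for group in value.split("|")
        if group.strip()
    ]
```

Under that assumption, the outer list would correspond to an OWL union and each inner list to an intersection.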

Annotation Extensions

  • Rules for importing annotation extensions address cardinality, domain and range constraints, and modeling when an extension relation is not used in Noctua.

Simple Conversion

  • These simple conversion rules are derived from a set of rules that Paul T. and Dustin put together.
  • Some existing annotations violate these rules, so it will be necessary to review those and make sure we're clear on what was meant by the annotation before we relax the rule.
Molecular Function
  • If the aspect is F (column A)
    • No more than one occurs_in(CC)
      • There are currently annotations with multiple occurs_in GO:CC extensions, e.g. protein binding from IntAct.
    • No more than one occurs_in(CL)
    • No more than one happens_during(BP)
    • No more than one part_of(BP)
    • No more than one activated_by(ChEBI)
    • No more than one inhibited_by(ChEBI)
    • Multiple entries allowed?:
      • occurs_in (UBERON)
      • occurs_in (EMAPA)
      • occurs_in (WBbt; contains both cell and anatomy terms)
      • has_input/has_direct_input(geneID or ChEBI)
Cellular Component
    • No more than one part_of(CC)
    • No more than one part_of(CL)
    • No more than one part_of(UBERON or EMAPA)
Biological Process
    • No more than one occurs_in(CC)
    • No more than one occurs_in(CL)
    • No more than one part_of(BP)
    • Multiple entries allowed?:
      • occurs_in(UBERON)
      • occurs_in(EMAPA)
      • occurs_in(WBbt)
      • has_input/has_direct_input(geneID or ChEBI)
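
The "no more than one" rules above lend themselves to an automated check for existing annotations that violate them. The sketch below is a minimal version under several assumptions: it checks one comma-joined extension at a time, the caller supplies a hypothetical `go_aspects` mapping from GO ID to aspect letter (needed because CC and BP terms share the GO: prefix), and relations listed under "Multiple entries allowed?" are left unrestricted.

```python
import re
from collections import Counter

EXTENSION_RE = re.compile(r"^\s*([A-Za-z_]+)\(([^()]+)\)\s*$")

# "No more than one" rules, keyed by annotation aspect, then (relation, range type).
SINGLE_VALUED = {
    "F": {("occurs_in", "CC"), ("occurs_in", "CL"), ("happens_during", "BP"),
          ("part_of", "BP"), ("activated_by", "ChEBI"), ("inhibited_by", "ChEBI")},
    "C": {("part_of", "CC"), ("part_of", "CL"), ("part_of", "UBERON/EMAPA")},
    "P": {("occurs_in", "CC"), ("occurs_in", "CL"), ("part_of", "BP")},
}


def range_type(term_id, go_aspects):
    """Classify an extension value by ID prefix; GO terms need an aspect lookup."""
    prefix = term_id.split(":", 1)[0]
    if prefix == "GO":
        aspect = go_aspects.get(term_id, "")
        return {"F": "MF", "P": "BP", "C": "CC"}.get(aspect, "GO")
    if prefix in ("UBERON", "EMAPA"):
        return "UBERON/EMAPA"
    if prefix == "CHEBI":
        return "ChEBI"
    return prefix  # e.g. CL, WBbt, or a gene ID prefix


def check_extension(aspect, extension, go_aspects):
    """Return the (relation, range type) pairs used more than once against the rules."""
    counts = Counter()
    for part in extension.split(","):
        if not part.strip():
            continue
        match = EXTENSION_RE.match(part)
        if not match:
            raise ValueError(f"cannot parse extension part {part!r}")
        relation, term_id = match.groups()
        counts[(relation, range_type(term_id.strip(), go_aspects))] += 1
    rules = SINGLE_VALUED.get(aspect, set())
    return sorted(key for key, n in counts.items() if n > 1 and key in rules)
```

A non-empty result flags an annotation for curator review, such as the protein binding annotations with multiple occurs_in(CC) extensions mentioned above; it does not decide how the annotation should be modeled.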