Ontology processing recipes

From GO Wiki

This page collects some common "ontology processing" tasks performed by GO. For many of these we have a variety of ad-hoc solutions in place. It would be good to unify this kind of processing around standard third-party tools such as OPPL, Populous and RightField.

This may be much easier if the source file that the ontology editors edit is in OWL, rather than requiring awkward translation back and forth.

Basic

Subset completion

  • Example: add everything that's in goslim_generic to gosubset_prok

(Not entirely trivial, as it requires understanding how we model subsets/slims in OWL using annotation properties...)
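As a rough illustration of subset completion, the sketch below works on OBO-format text rather than OWL, where each term stanza carries `subset:` tags. This is a simplifying assumption: it sidesteps the annotation-property modelling mentioned above. The function name and the `EX:` term IDs are hypothetical.

```python
def complete_subset(obo_text, source="goslim_generic", target="gosubset_prok"):
    """Add `subset: target` to every stanza that already has `subset: source`."""
    out = []
    for stanza in obo_text.split("\n\n"):
        lines = stanza.strip("\n").split("\n")
        # Collect the subset names declared in this stanza.
        subsets = {
            line.split(":", 1)[1].strip()
            for line in lines
            if line.startswith("subset:")
        }
        if source in subsets and target not in subsets:
            lines.append(f"subset: {target}")
        out.append("\n".join(lines))
    return "\n\n".join(out)


# Hypothetical two-term input; only the first term is in goslim_generic.
example = """[Term]
id: EX:0000001
name: example term A
subset: goslim_generic

[Term]
id: EX:0000002
name: example term B
subset: gosubset_prok"""

print(complete_subset(example))
```

Running the function a second time on its own output changes nothing, which makes it safe to apply repeatedly.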

Advanced

make a subset based on taxon constraints

  • Example: make an extended prok subset that excludes euk-specific cells

bio-chebi

Make axioms of the form:

 biological_process and has_input some ?X 
   EquivalentTo
 biological_process and has_input some ?Y
 WHERE:
  ?X SubClassOf is_conjugate_base_of some ?Y
  OR
  ?X SubClassOf is_conjugate_acid_of some ?Y

This is currently done by a popl script; switching to OPPL would make it more future-proof.
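To make the axiom template concrete, here is a minimal Python sketch that expands it over a list of SubClassOf links. It is not the popl script; the input format (plain triples), the function name and the `EX:` identifiers are assumptions for illustration, and the output is just Manchester-style text.

```python
CONJUGATE_RELATIONS = {"is_conjugate_base_of", "is_conjugate_acid_of"}


def bio_chebi_axioms(subclass_links, process="biological_process", relation="has_input"):
    """Expand the template for every (?X, relation, ?Y) conjugate link."""
    axioms = []
    for x, rel, y in subclass_links:
        if rel in CONJUGATE_RELATIONS:
            axioms.append(
                f"({process} and {relation} some {x}) "
                f"EquivalentTo ({process} and {relation} some {y})"
            )
    return axioms


# Hypothetical links: only the first one uses a conjugate relation.
links = [
    ("EX:base", "is_conjugate_base_of", "EX:acid"),
    ("EX:other", "has_part", "EX:acid"),
]

for axiom in bio_chebi_axioms(links):
    print(axiom)
```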

OPPL

The following OPPL script could replace the popl script:

?x:CLASS, ?y:CLASS 
  SELECT ASSERTED ?x subClassOf 'is conjugate acid of' some ?y or 'is conjugate base of' some ?y
   
BEGIN
  ADD BFO_0000057 some ?x EquivalentTo BFO_0000057 some ?y,
  ADD RO_0002313 some ?x EquivalentTo RO_0002313 some ?y,
  ADD RO_0002233 some ?x EquivalentTo RO_0002233 some ?y,
  ADD RO_0002234 some ?x EquivalentTo RO_0002234 some ?y
END;

Seed ontology file with ChEBI import and required object properties:

<?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.obolibrary.org/obo/chebi.owl#"
     xml:base="http://purl.obolibrary.org/obo/chebi.owl"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/bio-chebi.owl">
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/chebi.owl"/>
    </owl:Ontology>
     
    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/BFO_0000057"/>
   
    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/RO_0002313"/>
   
    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/RO_0002233"/>
   
    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/RO_0002234"/>
</rdf:RDF>


Unfortunately, the current OPPL implementation (command-line and Protege plugin) is too slow: the script had not finished after more than 24 hours.

merging axioms from xp file

  • Example: move all equiv axioms that use transport relations from go_xp_bp to live file

parsing terms to make logical definitions

  • Example: if label matches "cellular X" then make an equiv axiom

This is currently done by Obol, but it might be useful if editors had scripts to do this themselves.
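A small editor-side script for the "cellular X" example might look like the sketch below. It is not how Obol works. The shape of the resulting definition (`X and occurs_in some cell`) is an assumption made for illustration, as are the function name and the `EX:` ID.

```python
import re

# Matches labels of the form "cellular X" and captures X.
CELLULAR = re.compile(r"^cellular (?P<rest>.+)$")


def propose_definition(label, labels_to_ids):
    """Return a candidate equivalence expression for a "cellular X" label, or None."""
    match = CELLULAR.match(label)
    if match is None:
        return None
    genus = labels_to_ids.get(match.group("rest"))
    if genus is None:
        # X is not an existing term label, so no definition is proposed.
        return None
    # Assumed definition pattern; adjust to whatever the editors agree on.
    return f"{genus} and occurs_in some cell"


# Hypothetical label index.
labels = {"amino acid metabolic process": "EX:0000010"}
print(propose_definition("cellular amino acid metabolic process", labels))
```

Returning `None` when X is not a known label keeps the script from inventing definitions for terms whose parent cannot be resolved.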

make text defs/syns from logical definitions

This is currently done for de novo terms by TG, but there may be a need to go back and 'refactor' existing text defs.

assertions & justification

  • Example:
  1. remove all SubClassOf links between terms with chebi logical defs
  2. check whether each removed link is in the set of inferences
  3. if it is not, add the link back and add an annotation to the link (e.g. "can't infer this")

Note: this is currently done in Oort by passing the CHEBI ID for 'chemical entity' using the --justify-from argument. It would be good to have a flexible way of doing this.
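The three steps above can be sketched with plain sets. This is an illustration under strong assumptions, not the Oort implementation: links are (child, parent) tuples, and the set of inferences is supplied by the caller rather than computed by a reasoner. All names and IDs are hypothetical.

```python
def justify(asserted, inferred, note="can't infer this"):
    """Return the links that must be added back, each with its annotation.

    asserted: SubClassOf links removed in step 1
    inferred: links the reasoner entails once they are removed (step 2)
    """
    added_back = {}
    for link in asserted:
        if link not in inferred:
            # Step 3: the link cannot be inferred, so keep it and annotate it.
            added_back[link] = note
    return added_back


# Hypothetical data: one link is recoverable by inference, one is not.
asserted = {("EX:a", "EX:b"), ("EX:c", "EX:d")}
inferred = {("EX:a", "EX:b")}
print(justify(asserted, inferred))
```

Links that are found among the inferences are simply not returned, so they stay removed.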

GO Annotations (Advanced)

These require Oort to translate the GAF to OWL.

Inter-ontology inference

  • if G is capable_of F and F part_of some P, then G capable_of some P
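The rule above can be illustrated as a single join over two sets of pairs. This sketch assumes the GAF has already been translated and flattened into (subject, object) tuples. It applies the rule once and does not follow part_of transitively. The function name and `EX:` IDs are hypothetical.

```python
def infer_capable_of(capable_of, part_of):
    """Apply: G capable_of F and F part_of P  =>  G capable_of P."""
    # Index part_of by function so each capable_of pair is a dictionary lookup.
    processes_by_function = {}
    for function, process in part_of:
        processes_by_function.setdefault(function, set()).add(process)

    inferred = set()
    for gene_product, function in capable_of:
        for process in processes_by_function.get(function, ()):
            inferred.add((gene_product, process))
    return inferred


# Hypothetical data.
capable_of = {("EX:gene1", "EX:function1")}
part_of = {("EX:function1", "EX:process1"), ("EX:function2", "EX:process2")}
print(infer_capable_of(capable_of, part_of))
```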

Data Integration (advanced)

BioPAX translation (basic)

  • Pull out xrefs from a BioPAX file and add them to GO. Example: get GO to EC by using rhea.biopax and the GO-to-RHEA and RHEA-to-EC mappings in it
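Once the two mappings have been extracted, deriving GO to EC is a composition of dictionaries. The sketch below assumes that extraction has already happened and does no BioPAX parsing. The function name and all identifiers are hypothetical placeholders.

```python
def compose_mappings(go_to_rhea, rhea_to_ec):
    """Derive GO -> EC xrefs by chaining GO -> RHEA and RHEA -> EC."""
    go_to_ec = {}
    for go_id, rhea_ids in go_to_rhea.items():
        ec_ids = set()
        for rhea_id in rhea_ids:
            ec_ids |= rhea_to_ec.get(rhea_id, set())
        if ec_ids:
            # Only record GO terms that reach at least one EC number.
            go_to_ec[go_id] = ec_ids
    return go_to_ec


# Hypothetical mappings.
go_to_rhea = {"GO:EX1": {"RHEA:EX1", "RHEA:EX2"}, "GO:EX2": {"RHEA:EX3"}}
rhea_to_ec = {"RHEA:EX1": {"EC:EX.1"}, "RHEA:EX2": {"EC:EX.2"}}
print(compose_mappings(go_to_rhea, rhea_to_ec))
```

GO terms whose RHEA reactions have no EC mapping are left out of the result rather than mapped to an empty set.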

BioPAX translation (advanced)

Translate BioPAX triples to something resembling the OWL we use to represent reactions, transport, etc.

As a first pass, we would use the RHEA BioPAX export.