OBO-Edit: The Filtering System

From GO Wiki
Revision as of 12:31, 30 June 2014 by Gail


The OBO-Edit filtering system allows for the specification of filters that can pick out ontology terms, ontology links, or parts of a term or link. This system is used throughout OBO-Edit to specify searches, filter the ontology for saving, configure autocompletion boxes, and many other tasks.

Note that the OBO-Edit filtering system is distinct from the OBO-Edit querying system. The filtering system provides a framework for picking out ontology items, but it says nothing about how this system should be used. The querying system provides a framework for actually using the filtering system, including tools for doing fast cached queries over the ontology, wrapping search results to include extra useful information, etc. The querying system is largely independent of the filtering system, although there are bridge classes that allow the querying system to use parts of the filtering system when constructing queries.

The Filtering System

Java Particulars

The filtering system is defined in the org.obo.filters package.

The Basics

All filters ultimately inherit from the Filter interface. A simplified view of this interface is:

public interface Filter<T> extends Cloneable, VectorFilter<T> {

	public void setContext(JexlContext context);

	public boolean satisfies(T o);

	public Object clone();
}

The clone() and setContext() methods have to do with Java object maintenance and the scripting system. The interesting method is satisfies(). The satisfies() method returns true or false for a given input object to indicate whether that object passes this filter. It's possible to write a very simple filter by implementing the satisfies() method with whatever filtering behavior you want.
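For illustration, here is a minimal sketch of a filter written by implementing only satisfies(). To stay self-contained it declares a hypothetical, cut-down SimpleFilter interface that drops setContext(), clone() and the VectorFilter supertype, and it filters plain ID strings rather than OBO-Edit terms. SimpleFilter and IdPrefixFilter are illustrative names, not OBO-Edit classes.

```java
// Hypothetical cut-down version of OBO-Edit's Filter<T>: only satisfies() is kept.
interface SimpleFilter<T> {
    boolean satisfies(T o);
}

// Hypothetical filter that passes any identifier starting with a given prefix.
class IdPrefixFilter implements SimpleFilter<String> {
    private final String prefix;

    IdPrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean satisfies(String id) {
        // A null input never passes; otherwise this is a plain prefix check.
        return id != null && id.startsWith(prefix);
    }
}
```

With this sketch, new IdPrefixFilter("GO:").satisfies("GO:0003674") returns true, while an ID such as "CL:0000000" fails.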

However, most users can't just code up their own filtering classes when they want to search or filter the ontology. Additional layers have been added to make this system accessible to regular users.

Compound Filters

One extension of the Filter interface is CompoundFilter. A simplified view of this interface is:

public interface CompoundFilter extends Filter {

	public static final int AND = 0;
	public static final int OR = 1;

	public void addFilter(Filter f);

	public int getBooleanOperation();

	public void setBooleanOperation(int op);
}

This extension aggregates a set of sub-filters using either an AND or an OR boolean operation. If the operation is AND, the CompoundFilter's satisfies() method returns true for a given input only if all of its sub-filters return true for that input. If the operation is OR, the satisfies() method returns true for a given input if any of its sub-filters returns true for that input.
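The AND/OR semantics can be sketched as follows. This is not OBO-Edit's implementation: CompoundFilterSketch is a hypothetical class, sub-filters are modelled as java.util.function.Predicate objects, and the result for an empty compound filter (true for AND, false for OR) is simply what allMatch/anyMatch return, which may differ from OBO-Edit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of CompoundFilter semantics using Predicate sub-filters.
class CompoundFilterSketch<T> {
    static final int AND = 0;
    static final int OR = 1;

    private final List<Predicate<T>> filters = new ArrayList<>();
    private int booleanOperation = AND;

    void addFilter(Predicate<T> f) {
        filters.add(f);
    }

    int getBooleanOperation() {
        return booleanOperation;
    }

    void setBooleanOperation(int op) {
        booleanOperation = op;
    }

    boolean satisfies(T o) {
        if (booleanOperation == AND) {
            // AND: every sub-filter must accept the input (vacuously true if empty).
            return filters.stream().allMatch(f -> f.test(o));
        }
        // OR: at least one sub-filter must accept the input (false if empty).
        return filters.stream().anyMatch(f -> f.test(o));
    }
}
```

With sub-filters "contains binding" and "starts with protein", "DNA binding" fails under AND but passes once the operation is switched to OR.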

Structured Object Filters

The most common extension of the Filter interface is the structured object filter, found in the ObjectFilter interface. A structured object filter can only be used to filter terms (that is, LinkedObjects); any other object will not pass the filter.

A structured object filter has 5 important attributes:

  • Whether the filter should be negated
  • A search aspect
  • A search criterion
  • A search comparison
  • A string value to compare against

Often, the OBO-Edit documentation (and this document) will describe a structured object filter using the notation:

[aspect] [criterion] [comparison] "value"

If the filter is negated, the notation is preceded by the word NOT.
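To make the notation concrete, the hypothetical StructuredFilterSpec class below holds the five attributes as plain strings and renders them in the bracketed form used in this document. It only formats a filter description; it does not evaluate anything, and it is not part of the OBO-Edit API.

```java
// Hypothetical holder for the five attributes of a structured object filter.
// Aspect, criterion and comparison are plain strings here, purely for display.
class StructuredFilterSpec {
    final boolean negate;
    final String aspect;
    final String criterion;
    final String comparison;
    final String value;

    StructuredFilterSpec(boolean negate, String aspect, String criterion,
                         String comparison, String value) {
        this.negate = negate;
        this.aspect = aspect;
        this.criterion = criterion;
        this.comparison = comparison;
        this.value = value;
    }

    // Renders: [aspect] [criterion] [comparison] "value", prefixed with NOT if negated.
    String toNotation() {
        String body = "[" + aspect + "] [" + criterion + "] [" + comparison
                + "] \"" + value + "\"";
        return negate ? "NOT " + body : body;
    }
}
```

For example, a negated Root/Name/contains filter on "function" renders as NOT [Root] [Name] [contains] "function".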

Structured Filter Overview

A structured filter's satisfies() method works via the following steps to determine whether some term X passes the filter:

  • Ask the search aspect which terms to look at to determine whether X matches. We'll call this set of terms S. Most of the time, S is a set that only contains X, but see below for exceptions.
  • For each term T in S:
    • Use the search criterion to extract a set of values from T. Let's call this set W.
    • For each value V in W:
      • Use the search comparison to determine whether the value V matches the filter's string value. If it does, term X passes the filter.
  • If no value from any term in S matches, term X does not pass the filter.
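The steps above can be sketched in Java. The sketch assumes a hypothetical Term class in place of OBO-Edit's LinkedObject and models the aspect, criterion and comparison as standard java.util.function interfaces; none of these names come from OBO-Edit itself.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;

// Hypothetical minimal term: an id, a name and a list of synonyms.
class Term {
    final String id;
    final String name;
    final List<String> synonyms;

    Term(String id, String name, List<String> synonyms) {
        this.id = id;
        this.name = name;
        this.synonyms = synonyms;
    }
}

// Sketch of the structured-filter matching loop with functional stand-ins.
class StructuredFilterSketch {
    final Function<Term, Collection<Term>> aspect;       // X -> S
    final Function<Term, Collection<String>> criterion;  // T -> W
    final BiPredicate<String, String> comparison;        // (V, value) -> match?
    final String value;

    StructuredFilterSketch(Function<Term, Collection<Term>> aspect,
                           Function<Term, Collection<String>> criterion,
                           BiPredicate<String, String> comparison,
                           String value) {
        this.aspect = aspect;
        this.criterion = criterion;
        this.comparison = comparison;
        this.value = value;
    }

    boolean satisfies(Term x) {
        for (Term t : aspect.apply(x)) {              // each term T in S
            for (String v : criterion.apply(t)) {     // each value V in W
                if (comparison.test(v, value)) {
                    return true;                      // first match: X passes
                }
            }
        }
        return false;                                 // nothing matched: X fails
    }
}
```

Returning on the first match mirrors the "if any value matches" rule; the loops never need to examine the remaining values once one has matched.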

Let's look at how this works in a real-world example. Let's say the filter is:

[Self] [Synonym] [contains] "function"

When OBO-Edit checks this filter against term X, the algorithm plays out as follows:

  • Ask the search aspect which terms to look at. Since the aspect is Self, we only look at term X.
  • Use the search criterion to extract all the synonyms from X.
  • For each synonym:
    • Use the search comparison to determine whether the synonym matches the filter's string value. Since the comparison is "contains", OBO-Edit checks whether each synonym contains the string "function". If any of the synonyms contains the string "function", term X passes the filter.
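The same worked example, written out by hand as a sketch. SynTerm and SelfSynonymContainsFilter are hypothetical names, and the comparison uses Java's case-sensitive String.contains(), which may not match how OBO-Edit's "contains" comparison treats case.

```java
import java.util.List;

// Hypothetical minimal term: a name plus a list of synonyms.
class SynTerm {
    final String name;
    final List<String> synonyms;

    SynTerm(String name, List<String> synonyms) {
        this.name = name;
        this.synonyms = synonyms;
    }
}

// Hand-written equivalent of [Self] [Synonym] [contains] "<value>".
class SelfSynonymContainsFilter {
    private final String value;

    SelfSynonymContainsFilter(String value) {
        this.value = value;
    }

    boolean satisfies(SynTerm x) {
        // Self aspect: the set S contains only x.
        for (SynTerm t : List.of(x)) {
            // Synonym criterion: the values are t's synonyms; the name is ignored.
            for (String synonym : t.synonyms) {
                // "contains" comparison: case-sensitive substring test.
                if (synonym.contains(value)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Note that a term whose name contains "function" but whose synonyms do not will fail this filter, because the Synonym criterion never extracts the name.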

Negation

If an ObjectFilter is set to negated mode by calling setNegate(true), the results of this filter are logically negated (if the filter would have returned true, it returns false, and vice versa).

Note that if an ObjectFilter is called on an invalid input, it always returns false, regardless of whether the filter's negation flag is set.
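The interaction between negation and invalid input can be sketched like this. NegatableFilter is hypothetical; it uses a Class object to decide which inputs are valid, with String standing in for OBO-Edit's LinkedObject.

```java
import java.util.function.Predicate;

// Hypothetical filter with a negation flag. Inputs that are not of the
// valid type always fail, whether or not the filter is negated.
class NegatableFilter<T> {
    private final Class<T> validType;
    private final Predicate<T> test;
    private boolean negate = false;

    NegatableFilter(Class<T> validType, Predicate<T> test) {
        this.validType = validType;
        this.test = test;
    }

    void setNegate(boolean negate) {
        this.negate = negate;
    }

    boolean satisfies(Object o) {
        if (!validType.isInstance(o)) {
            return false;                      // invalid input: never passes
        }
        boolean result = test.test(validType.cast(o));
        return negate ? !result : result;      // flip only for valid input
    }
}
```

The design point is the order of the checks: the validity test runs before negation is applied, so negating a filter never turns an invalid input into a match.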

Search Aspect

The search aspect is the most poorly understood, and one of the most powerful, elements of the filtering system. It decides which terms should be looked at when deciding whether a given term matches the filter.

Most filters only need the default aspect, Self. This aspect means that when deciding whether term X matches a filter, OBO-Edit looks at term X itself.

However, there are other search aspects available:

  • Root
  • Ancestor
  • Descendant

The simplest of these is the Root aspect. The Root search aspect means "When checking whether term X matches the filter, don't look at term X. Look at term X's root term".

To illustrate how this works, consider the following filter with the aspect "Self".

[Self] [Name] [contains] "function"

When this filter is run on the GO, we get 4 non-obsolete results, each of which contains the word "function" in its name. But if we change the aspect to Root so that the filter becomes:

[Root] [Name] [contains] "function"

When this version of the filter is run, we get over 7000 results. This is because every term whose root contained the word "function" matched the filter. In practice, it means that every term in the "molecular_function" ontology matched the filter.
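The difference between the Self and Root aspects can be reproduced on a toy ontology. The five-term graph below is made up for illustration and is far smaller than the GO; RootAspectDemo and its methods are hypothetical. The sketch assumes the graph is acyclic and treats a term with no parents as its own root.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy ontology: each made-up term name maps to the names of its parents.
class RootAspectDemo {
    static final Map<String, List<String>> PARENTS = Map.of(
            "molecular_function", List.of(),
            "binding", List.of("molecular_function"),
            "protein binding", List.of("binding"),
            "biological_process", List.of(),
            "growth", List.of("biological_process"));

    // Root aspect: the parentless terms reached by walking up from x.
    static Set<String> roots(String x) {
        List<String> parents = PARENTS.getOrDefault(x, List.of());
        if (parents.isEmpty()) {
            return Set.of(x);
        }
        Set<String> result = new HashSet<>();
        for (String p : parents) {
            result.addAll(roots(p));
        }
        return result;
    }

    // Evaluates [Self or Root] [Name] [contains] value over every toy term
    // and returns the names of the terms that pass, sorted.
    static Set<String> matches(boolean useRoot, String value) {
        Set<String> passed = new TreeSet<>();
        for (String x : PARENTS.keySet()) {
            Set<String> examined = useRoot ? roots(x) : Set.of(x);
            for (String t : examined) {
                if (t.contains(value)) {   // Name criterion + contains comparison
                    passed.add(x);
                    break;
                }
            }
        }
        return passed;
    }
}
```

On this toy graph, matches(false, "function") returns only molecular_function, while matches(true, "function") returns binding, molecular_function and protein binding: every term under the molecular_function root.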

The Ancestor aspect works in a similar way. It means "When checking whether term X matches the filter, look at every ancestor of X. If any ancestor matches, then X matches". The Descendant aspect means "When checking whether term X matches the filter, look at every descendant of X. If any descendant matches, then X matches".
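Both traversals can be sketched on a small made-up graph. AncestorDescendantDemo and its methods are hypothetical, and the sketch excludes X itself from its own ancestors and descendants, which is an assumption rather than documented OBO-Edit behaviour. Like the aspects described here, it ignores relationship types and follows every edge.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Sketch of the Ancestor and Descendant aspects on a made-up, untyped graph.
class AncestorDescendantDemo {
    // Term -> parents. A is the root; B and D are children of A; C is a child of B.
    static final Map<String, List<String>> PARENTS = Map.of(
            "A", List.of(),
            "B", List.of("A"),
            "C", List.of("B"),
            "D", List.of("A"));

    // Term -> children, built by inverting PARENTS.
    static final Map<String, List<String>> CHILDREN = invert(PARENTS);

    static Map<String, List<String>> invert(Map<String, List<String>> parents) {
        Map<String, List<String>> children = new HashMap<>();
        parents.forEach((child, ps) -> {
            for (String p : ps) {
                children.computeIfAbsent(p, k -> new ArrayList<>()).add(child);
            }
        });
        return children;
    }

    // All terms reachable from x by repeatedly following edges; x itself excluded.
    static Set<String> reachable(String x, Map<String, List<String>> edges) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(edges.getOrDefault(x, List.of()));
        while (!todo.isEmpty()) {
            String t = todo.pop();
            if (seen.add(t)) {
                todo.addAll(edges.getOrDefault(t, List.of()));
            }
        }
        return seen;
    }

    // [Ancestor] aspect: x matches if any ancestor passes the per-term test.
    static boolean matchesViaAncestors(String x, Predicate<String> test) {
        return reachable(x, PARENTS).stream().anyMatch(test);
    }

    // [Descendant] aspect: x matches if any descendant passes the per-term test.
    static boolean matchesViaDescendants(String x, Predicate<String> test) {
        return reachable(x, CHILDREN).stream().anyMatch(test);
    }
}
```

The seen set guards against visiting a term twice when it is reachable by more than one path, which is common in a DAG-shaped ontology.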

Search aspects are particularly useful in compound filters, where they can be used to restrict filtering to some subset of an ontology.
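As an illustration of restricting a search with an aspect, the sketch below ANDs [Self] [Name] [contains] "binding" with [Root] [Name] [contains] "function", so that only name matches inside the molecular_function branch survive. The toy graph and the AspectCompoundDemo class are hypothetical, and plain Predicate composition stands in for OBO-Edit's CompoundFilter.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Made-up ontology: term name -> parent names (assumed acyclic).
class AspectCompoundDemo {
    static final Map<String, List<String>> PARENTS = Map.of(
            "molecular_function", List.of(),
            "binding", List.of("molecular_function"),
            "protein binding", List.of("binding"),
            "biological_process", List.of(),
            "binding process", List.of("biological_process"));

    // Root aspect: a parentless term is its own root; otherwise recurse upward.
    static Set<String> roots(String x) {
        List<String> parents = PARENTS.getOrDefault(x, List.of());
        if (parents.isEmpty()) {
            return Set.of(x);
        }
        Set<String> result = new HashSet<>();
        for (String p : parents) {
            result.addAll(roots(p));
        }
        return result;
    }

    // [Self] [Name] [contains] "binding"
    static final Predicate<String> SELF_NAME_BINDING = x -> x.contains("binding");

    // [Root] [Name] [contains] "function"
    static final Predicate<String> ROOT_NAME_FUNCTION =
            x -> roots(x).stream().anyMatch(r -> r.contains("function"));

    // AND compound: name matches restricted to the molecular_function branch.
    static final Predicate<String> BINDING_IN_MF = SELF_NAME_BINDING.and(ROOT_NAME_FUNCTION);

    // Returns every toy term that passes the given filter, sorted by name.
    static Set<String> run(Predicate<String> filter) {
        Set<String> passed = new TreeSet<>();
        for (String x : PARENTS.keySet()) {
            if (filter.test(x)) {
                passed.add(x);
            }
        }
        return passed;
    }
}
```

Here run(SELF_NAME_BINDING) returns binding, binding process and protein binding; adding the Root condition with BINDING_IN_MF drops binding process, which sits under biological_process.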

Note that currently, the Root, Ancestor and Descendant aspects don't look at relationship types while traversing the graph. For example, all ancestors over any relationship type are considered by the Ancestor aspect. In the future, it will be possible to specify a relationship type with the Aspect to pick out what type of Ancestor, Descendant, or Root to consider.

Search Criterion

A search criterion extracts a value (or collection of values) from a term. Those values are compared against the filter's string value to check for a match. If any of the values given by the search criterion match, the whole filter matches.
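A criterion can be thought of as a function from a term to a collection of strings. The sketch below assumes a hypothetical CritTerm class and models the Name and Synonym criteria mentioned on this page as such functions, paired with a case-sensitive equality comparison. It is illustrative only and does not reflect OBO-Edit's actual criterion classes.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

// Hypothetical term: a name plus synonyms (a stand-in for LinkedObject).
class CritTerm {
    final String name;
    final List<String> synonyms;

    CritTerm(String name, List<String> synonyms) {
        this.name = name;
        this.synonyms = synonyms;
    }
}

class CriterionDemo {
    // Name criterion: exactly one value per term.
    static final Function<CritTerm, Collection<String>> NAME = t -> List.of(t.name);

    // Synonym criterion: zero or more values per term.
    static final Function<CritTerm, Collection<String>> SYNONYM = t -> t.synonyms;

    // The filter matches if ANY extracted value equals the filter's string value.
    static boolean matches(CritTerm t, Function<CritTerm, Collection<String>> criterion,
                           String value) {
        return criterion.apply(t).stream().anyMatch(v -> v.equals(value));
    }
}
```

A term with no synonyms yields an empty collection under the Synonym criterion, so it can never match a filter that uses that criterion.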

There are