Annotation Rule Engine

Project lead: Amelia

For context see: Annotation_Quality_Control_Checks

These annotation rules will be specified in a computable form, and a general-purpose rule engine will execute them to produce the desired reports or filtered files. The rule engine will be implemented in Java as part of the org.geneontology.gold project. The intent is that the rules can be invoked in a variety of contexts:

  • run from the command line to produce a report or to filter files
  • run from within the admin servlet environment - at the time of GAF submission, or downstream
  • specific rules executed on specific datasets from a web form - allows MODs to run checks interactively

Related SWUG projects:

Timeline

  • 2010-09 collection of rules in wiki
  • 2010-10 draft of XML specification of rules file
  • 2010-11 collection of rules into XML in go cvs: see annotation_qc.xml
  • 2011-01 basic browsing / html rendering of XML rules file
  • 2011-02 alpha version of rule engine
  • 2011-04 REST API

GAF File Format, Syntax

Mike's checks script

  • Each line of the GAF file is checked for the correct number of columns, the cardinality of the columns, and leading or trailing whitespace
  • Column-specific checks:
    • Col 1 and all DB abbreviations must be in GO.xrf_abbs (case may be incorrect)
    • All GO IDs must be extant in current ontology
    • Qualifier, evidence, aspect and DB object columns must be within the list of allowed values
    • DB:Reference, Taxon and GO ID columns are checked for minimal form
    • Date must be in YYYYMMDD format
  • All IEAs over a year old are removed
  • Taxa with a 'representative' group (e.g. MGI for Mus musculus, FlyBase for Drosophila) must be submitted by that group only


Draft XML syntax

<rule>
	<id>GO_AR:00000XX</id>
	<title>brief summary of rule</title>
	<contact>email address</contact>
	<description format="[html|text]">description of the check to be performed</description>
	<implementation_list>
		<implementation>
			<script language="scriptLanguage" source="URL" />
			<input>
				<format>gaf1.0</format>
				<format>gaf2.0</format>
			</input>
		</implementation>
		<implementation>
			<script language="SQL">
			paste SQL query here
			</script>
			<input>
				<format>GO database</format>
			</input>
		</implementation>
	</implementation_list>
	<frequency>On submission</frequency>
	<status date="YYYYMMDD">[Proposed|Approved|Implemented|Deprecated]</status>
</rule>
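
To show how a rule in this draft syntax might be read by the engine, here is a minimal Java sketch using the standard DOM parser. The class RuleXmlSketch and its fields are hypothetical and cover only a few elements of the draft (id, title, status and the input formats); the real engine may model rules quite differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical in-memory view of one <rule> element from the draft syntax. */
final class RuleXmlSketch {
    final String id;
    final String title;
    final String status;     // e.g. Proposed, Approved, Implemented, Deprecated
    final String statusDate; // value of the date attribute, "" when absent
    // Formats collected from every <implementation>, not grouped per implementation.
    final List<String> inputFormats = new ArrayList<>();

    private RuleXmlSketch(String id, String title, String status, String statusDate) {
        this.id = id;
        this.title = title;
        this.status = status;
        this.statusDate = statusDate;
    }

    /** Parses a single <rule> document; wraps parser failures in an unchecked exception. */
    static RuleXmlSketch parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element rule = doc.getDocumentElement();
            NodeList statusNodes = rule.getElementsByTagName("status");
            String date = statusNodes.getLength() == 0
                    ? ""
                    : ((Element) statusNodes.item(0)).getAttribute("date");
            RuleXmlSketch parsed = new RuleXmlSketch(
                    text(rule, "id"), text(rule, "title"), text(rule, "status"), date);
            NodeList formats = rule.getElementsByTagName("format");
            for (int i = 0; i < formats.getLength(); i++) {
                parsed.inputFormats.add(formats.item(i).getTextContent().trim());
            }
            return parsed;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse rule XML", e);
        }
    }

    /** Trimmed text of the first descendant element with the given tag, or "" if none. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```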

Rule Engine Requirements

  • implemented in Java, as part of the gold package
  • ability to launch external processes (e.g. perl scripts)
  • ability to connect to gold db
  • asynchronous execution
  • callable within servlet environment (cf Schema Overhaul, see section on gold watch)

Rule Engine API

There will be both a Java API and a REST API that can be used from any application (e.g. Pombe curation tool).

Full specification forthcoming. Informally, the core capability of the API is to take as input one or more annotations (as a GAF file, as JSON, etc.) and, for each annotation, return one of the following:

  1. no operation
  2. inconsistency flag + reason (e.g. taxon constraint violation)
  3. suggested additional annotation (e.g. reciprocal binding, inter-ontology inference)
  4. suggested replacement annotation (e.g. more specific term based on cross-product)

It is up to the application what should be done with these.

The API may also report whether the annotation is redundant (for this it needs access to the most recent annotations).
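
Pending the full specification, the four outcomes listed above might be modelled in the Java API along these lines. Everything here is a hypothetical sketch: the type RuleResultSketch, its Kind values and the describe() method are illustrative names, and annotations are represented as plain strings for brevity.

```java
/** Hypothetical result type mirroring the four informal outcomes; not the final API. */
final class RuleResultSketch {

    enum Kind { NO_OP, INCONSISTENCY, SUGGESTED_ADDITION, SUGGESTED_REPLACEMENT }

    final Kind kind;
    // Reason for an inconsistency flag, or the suggested annotation; "" for NO_OP.
    final String detail;

    private RuleResultSketch(Kind kind, String detail) {
        this.kind = kind;
        this.detail = detail;
    }

    static RuleResultSketch noOp() {
        return new RuleResultSketch(Kind.NO_OP, "");
    }

    static RuleResultSketch inconsistency(String reason) {
        return new RuleResultSketch(Kind.INCONSISTENCY, reason);
    }

    static RuleResultSketch suggestAddition(String annotation) {
        return new RuleResultSketch(Kind.SUGGESTED_ADDITION, annotation);
    }

    static RuleResultSketch suggestReplacement(String annotation) {
        return new RuleResultSketch(Kind.SUGGESTED_REPLACEMENT, annotation);
    }

    /** One way an application might render a result; the policy is the application's choice. */
    String describe() {
        switch (kind) {
            case INCONSISTENCY:
                return "FLAG: " + detail;
            case SUGGESTED_ADDITION:
                return "ADD: " + detail;
            case SUGGESTED_REPLACEMENT:
                return "REPLACE WITH: " + detail;
            default:
                return "OK";
        }
    }
}
```

Keeping the result a plain value, with no behaviour beyond rendering, leaves the decision about what to do with each outcome to the calling application.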

Related pages

Annotation Quality Control Checks

GAF Inference