Inter-application communication


This document describes the plan for how different web applications developed by the GOC (or in collaboration with the GOC) will communicate.

TL;DR: For a quick intro to the concepts, go to a Galaxy instance (e.g. http://galaxy.berkeleybop.org), then try fetching data from an external app such as BioMart, paying attention to the URLs along the way. Note that data coming into Galaxy on the last leg of a round trip requires that a cookie was set on the way out.

Background

It's impractical to have a single web application for all needs, yet we want to integrate the capabilities of different tools in a seamless way.

Example: the Common Annotation Tool (CAT) will be a web-based application. It will use web services such as the Ontology Web Services for ontology search and term selection. We may also want to add a component for doing annotation from documents marked up by NLP tools. One option would be to consume NLP web services (e.g. UIMA, textpresso) and build our own UI within the CAT, but this might be a lot of work. Another is to provide a plugin architecture such that 3rd party UIs can be plugged in, but this may not be practical, especially for web applications that already exist.

Approach

We use a delegation-based strategy in order to loosely couple web applications.

  • When the user of the main tool (e.g. CAT) wants to use external functionality, the user's browser is redirected to the external web tool (e.g. textpresso). The redirect URL includes a query parameter such as GO_APP_URL=<URL>, whose value is the URL of the main tool.
  • The external tool keeps track of the user's origin URL. It may choose to provide a welcome message (e.g. "welcome, GO user").
  • When the user is finished (e.g. they have a candidate set of gene-term pairs), they click "export to <main tool name>" (a link provided by the external tool).
  • The external tool then sends them back to the origin URL (e.g. CAT) and includes, as a query parameter, a URL on the external tool from which the data (e.g. gene-term pairs) can be fetched.
  • The main tool then fetches the data from that URL and incorporates it into its current environment.
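The two redirects described above amount to appending one query parameter to a URL. The following is a minimal Python sketch of that step, not part of any existing GO tool. The function names and the example hosts are hypothetical, and the name of the return parameter (URL, borrowed from the Galaxy convention described below) is an assumption, since the plan above does not fix it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_param(url, name, value):
    """Return url with name=value appended, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    # urlencode percent-encodes the value, so a URL can be nested safely.
    return urlunsplit(parts._replace(query=urlencode(query)))


def outbound_url(external_tool_url, main_tool_url):
    """Main tool -> external tool: tell the external tool where we came from."""
    return add_query_param(external_tool_url, "GO_APP_URL", main_tool_url)


def return_url(origin_url, data_url):
    """External tool -> main tool: say where the data can be fetched.

    The parameter name "URL" is an assumption of this sketch.
    """
    return add_query_param(origin_url, "URL", data_url)
```

Percent-encoding the nested URL matters here: without it, a query string inside the main tool's URL would be misread as parameters of the external tool.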

Examples

Proof of concept with GO Galaxy and Biomart

This approach is used by Galaxy, for example to allow Galaxy users to go to BioMart, customize a query, then get the results back into their Galaxy environment. To understand this better, you are encouraged to try it out in Galaxy. In fact, other web apps could take advantage of this by "pretending" they are Galaxy.

  • Galaxy sends the user to BioMart and appends GALAXY_URL=http://galaxy.berkeleybop.org/ to the URL (assuming the user is using the GO Galaxy server).
  • BioMart says "hello galaxy user". The user uses normal BioMart functionality to collect their data. BioMart shows "export to galaxy".
  • The user clicks export. They are redirected back to galaxy.berkeleybop.org. However, the URL also includes URL=http://biomart.org/<uuid>.
  • Behind the scenes, Galaxy slurps the data from URL=http://biomart.org/<uuid> (BioMart is holding our data at this URL).
  • The user sees their data in the Galaxy environment.
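The "slurp" step on the receiving side can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Galaxy's actual code: the function names are hypothetical, and the host allowlist is an addition of this sketch, included because fetching an arbitrary URL taken from a request is risky.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen

# Hypothetical allowlist of external tools we agree to fetch from.
ALLOWED_HOSTS = {"biomart.org"}


def extract_data_url(return_url):
    """Pull the URL parameter out of the address the user came back on."""
    values = parse_qs(urlsplit(return_url).query).get("URL")
    if not values:
        raise ValueError("no URL parameter in return address")
    data_url = values[0]
    parts = urlsplit(data_url)
    # Refuse anything that is not plain HTTP(S) to a known host.
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("refusing to fetch from " + data_url)
    return data_url


def slurp(return_url, fetch=None):
    """Fetch the data the external tool is holding for us.

    fetch is an optional callable taking a URL and returning bytes;
    by default the data is retrieved over the network with urlopen.
    """
    data_url = extract_data_url(return_url)
    if fetch is not None:
        return fetch(data_url)
    with urlopen(data_url, timeout=30) as response:
        return response.read()
```

Making the fetcher injectable keeps the parsing and validation logic testable without network access.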

Implementation Example with GOOSE 2 and GO Galaxy

Round-trip Outline

On reaching Galaxy, the user has a cookie set in their browser. The user decides they want external data from GOOSE, so they click on the appropriate link under "Get Data". Galaxy sends the user out to the specified URL (with the specified method), along with the GALAXY_URL variable set to the installation's URL.

On landing, GOOSE detects the existence of the GALAXY_URL and makes sure that it is preserved (GOOSE is stateless, so this is done with CGI variables). The user interacts with GOOSE, trying to find the data that they want; when the user finds the data that they want, GOOSE provides a form to reproduce the data in a single stroke.

The user then POSTs back to the GALAXY_URL, with the URL parameter set to the location from which the Galaxy installation can fetch the data in the proper format, and the URL_method parameter set to the HTTP method Galaxy should use to fetch it. Galaxy then brings the user back into the environment and grabs the data in the background, with the session preserved by the initial cookie.
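How the URL and URL_method parameters could be turned into a background fetch is sketched below in Python. This illustrates the idea only and is not Galaxy's implementation. The function name is hypothetical, and both the default of "get" when URL_method is absent and the restriction to GET and POST are assumptions of this sketch.

```python
from urllib.parse import parse_qs
from urllib.request import Request


def build_fetch_request(form_body):
    """Turn the POSTed form body into a request for the data.

    form_body is the urlencoded body of the POST back to GALAXY_URL,
    e.g. "URL=http%3A%2F%2F...&URL_method=get".
    """
    fields = parse_qs(form_body)
    if "URL" not in fields:
        raise ValueError("missing URL parameter")
    data_url = fields["URL"][0]
    # Assumption: fall back to GET when URL_method is not supplied.
    method = fields.get("URL_method", ["get"])[0].upper()
    if method not in ("GET", "POST"):
        raise ValueError("unsupported URL_method: " + method)
    return Request(data_url, method=method)
```

The returned object can be passed to urllib.request.urlopen to perform the actual fetch; the user's session is tracked separately, by the cookie set on the way out.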

GO Galaxy Setup

Add a data source for GOOSE:

https://bitbucket.org/cmungall/galaxy-obo/src/42c7aceb232a/tools/data_source/go_goose.xml

And add this tool to the tool conf file, with:

<tool file="data_source/go_goose.xml" /> 

in: https://bitbucket.org/cmungall/galaxy-obo/src/42c7aceb232a/tool_conf_obo.xml

GOOSE 2 setup

Detect the incoming GALAXY_URL CGI variable. Since GOOSE is stateless, the value has to be carried along between calls; in this case, that is done by passing it to the HTML template as a variable and adding a hidden form field so that it is sent again on future calls.

[% IF galaxy_url %]
<input type="hidden" name="GALAXY_URL" value="[% galaxy_url %]" />
[% END %]

When there are results in GOOSE and the GALAXY_URL has been detected, an additional form is presented.

[% IF galaxy_url AND direct_gaffer_id_url_safe AND direct_gaffer_gaf_url_safe %]

Export to GO Galaxy: 
<form id="galaxyform_ids" action="[% galaxy_url %]" name="galaxyform_ids" method="POST" target="_blank">
   <input id="URL_ids" type="hidden" name="URL" value="[% direct_gaffer_id_url_safe %]" />
   <input type="hidden" name="URL_method" value="get" />
   <input name="submit" type="submit" value="Complete ID list" />
</form>

<form id="galaxyform_gaf" action="[% galaxy_url %]" name="galaxyform_gaf" method="POST" target="_blank">
   <input id="URL_gaf" type="hidden" name="URL" value="[% direct_gaffer_gaf_url_safe %]" />
   <input type="hidden" name="URL_method" value="get" />
   <input name="submit" type="submit" value="Complete GAF" />
</form>

[% END %]

Pros/Cons

Advantages

  • lightweight
  • loosely coupled
  • easy to retrofit into existing applications - no rewrites required for data providers
  • minimizes redundancy: no duplicate programming effort, and no duplicate/overlapping capabilities in different tools

Disadvantages

  • Not perfectly seamless
    • better than cut-and-pasting results from external app into main app (the norm now for some)
    • not suited for cases where frequent back and forth is required
      • (why not?)

GO applications

TermGenie

TermGenie (TG) should be able to communicate with other TG instances. Scenario: a GO curator needs the term "glomerular cell development", but "glomerular cell" is not in CL. Starting from go.termgenie.org, they select the "cell development" template. The autocomplete returns no results. go.tg asks "do you need a new cell term?"; if yes, they are redirected to cl.termgenie.org. They make their term (e.g. by composing cell with UBERON:glomerulus). After they get the term, they are returned to go.termgenie.org, with the new CL ID handily popped into the cell type slot.

(this is not completely trivial, as the cl.tg instance has to send back all new axioms in order for the reasoning on the go side to work)

AmiGO 2

We need to enable AmiGO to act as an external data source for Galaxy in any case. Scenario: a Galaxy user wants a custom GAF for analysis. They come into AmiGO 2, play with the facets, then export their results.

GOOSE

The experimental GOOSE, as part of AmiGO 2, has a partial implementation, available at:

http://amigo2.berkeleybop.org/
http://amigo2.berkeleybop.org/cgi-bin/amigo2/goose

The full implementation is waiting on more complete versions of AmiGO 2 and GOlr.

CAT

The Common Annotation Tool will most likely act as the main app and consume external apps. Example external apps:

  • NLP tools (e.g. textpresso). CAT will pass along an object (gene, set of genes, paper). The NLP app will work with the user to generate data, e.g. gene-term pairs, that are sent back to CAT for further editing and refinement.
  • Prediction environments
  • AmiGO 2 / QuickGO. The user may wish to select a list of genes to annotate today; for example, they may wish to bring all transcription factors involved in development into the CAT environment in order to review them.