AmiGO Manual: Installation 1.6

From GO Wiki
Revision as of 19:11, 4 December 2008 by Sjcarbon

Overview

This document is intended for AmiGO version 1.6.* (currently in beta). For version 1.5.*, please check AmiGO_Manual:_Installation.

AmiGO, at its heart, is a simple Perl CGI script. However, behind the simple external interface lies a somewhat baroque system of caches, database connections, temporary files, and delegation. To set all of these things up properly, AmiGO provides an installation script ("install.pl") that asks questions about the user's environment and tries to catch the biggest errors that one can make during installation. But no program is perfect.

This document is intended to help fill the gap between the cryptic install script and what the developers know because they wrote it.

Download

The most recent stable version of AmiGO should be available as part of the go-dev repository on the GO CVS site at SourceForge.net: http://sourceforge.net/projects/geneontology

The SourceForge CVS repository can be found at: geneontology.cvs.sourceforge.net:/cvsroot/geneontology go-dev (this is a change from a couple of years ago). You probably do not need any subdirectories of go-dev other than go-perl, go-db-perl, and amigo, but you do need a GO database that you can connect to via DBD::mysql (you presumably have one already if you have an old AmiGO install).

Requirements

GO database

This is the biggest requirement, and setting one up is outside the scope of this document. For more information about installing a GO database, see the online documentation.

Web server

AmiGO does not provide its own web server; it is currently developed and run on Apache. There are some experimental components that run their own web server, but these are unlikely to concern most people trying to install AmiGO.

BLAST

If you are interested in having the AmiGO wrapper for BLAST, you will need to download and install WU BLAST. You will also need a FASTA file from the Stanford GO archive.

GraphViz

AmiGO also depends on having GraphViz in its execution path--specifically the "dot" program.
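
A quick way to confirm this before running the installer is to ask the shell where it finds the program. The following is only a sketch; the helper name require_tool is made up for this example and is not part of AmiGO.

```shell
# Hypothetical helper (not part of AmiGO): succeed only if the named
# program can be found in the current execution path.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found $1 at $(command -v "$1")"
    else
        echo "$1 is not in PATH" >&2
        return 1
    fi
}

# Example: require_tool dot
```

If dot is reported as missing, either install GraphViz or add its directory to PATH for the user that the web server runs as.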

Perl

As far as the Perl environment goes, the major packages that are necessary are: CGI::Application, GO::TermFinder, Template, CGI, DBI, DBD::mysql, GraphViz, bioperl, go-perl, and go-db-perl (the last two are included with the go-dev repository and are sometimes treated differently because of this--AmiGO is usually run as part of the complete go-dev repository). The vast majority of these should be available as packages in your distribution. Otherwise, you will have to install them through CPAN.
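
One way to see which of these are still missing is to ask perl to load each module in turn. This is a rough sketch, not an official check: the helper name is invented here, and the module names used for bioperl, go-perl, and go-db-perl in the example call (Bio::Perl, GO::Parser, GO::AppHandle) are assumptions about a representative module in each package.

```shell
# Hypothetical helper: print every module name from the argument list
# that the perl found in PATH fails to load.
missing_perl_modules() {
    for module in "$@"; do
        perl "-M$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Example (the last three names are assumed entry points, see above):
# missing_perl_modules CGI::Application GO::TermFinder Template CGI DBI \
#     DBD::mysql GraphViz Bio::Perl GO::Parser GO::AppHandle
```

No output means every listed module could be loaded. For go-perl and go-db-perl this only works once their directories are on Perl's library path.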

version.pl

There is a script in go-dev/amigo called version.pl that can be used to get a more detailed opinion of what the developers feel is necessary to run AmiGO (there are many old, unnecessary, and experimental libraries that might confuse things). The script must be run from the go-dev/amigo directory.

install.pl

Once you have the software and the requirements are met, go to the amigo directory and run install.pl (with the "-h" flag you will get usage details). Depending on your environment, you may want to first set the following environment variables:

setenv GO_ROOT <path_to_go-dev_source_dir>
setenv PATH /tools/perl/5.8.8/bin:${PATH}:${GO_ROOT}/go-perl/scripts
setenv PERLLIB $GO_ROOT/go-perl:$GO_ROOT/go-db-perl:$GO_ROOT/amigo/perl
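
The setenv lines above are csh/tcsh syntax. For a Bourne-style shell (sh or bash), an equivalent might look like the following sketch. The go-dev path is a placeholder, and the Perl path is simply copied from the lines above.

```shell
# Placeholder path: point this at your own go-dev source directory.
GO_ROOT=/path/to/go-dev
export GO_ROOT

# Same values as the setenv lines, written for sh/bash.
PATH="/tools/perl/5.8.8/bin:${PATH}:${GO_ROOT}/go-perl/scripts"
export PATH
PERLLIB="${GO_ROOT}/go-perl:${GO_ROOT}/go-db-perl:${GO_ROOT}/amigo/perl"
export PERLLIB
```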

When you run install.pl, it asks you a couple dozen questions about the installation configuration. The questions may be a little cryptic (the "-v" flag can be helpful here). A "config.pl" file will be created in the same directory as install.pl during a successful installation run, and it is the basis for all future installation attempts after the first.

The "-r" option will overwrite config.pl if it exists and use its contents as defaults for a new round of interactive questioning.

The "-i" option will ignore config.pl if it exists and use the internal variables as the defaults for a new round of interactive questioning.

The "-f <filename>" option will read in <filename>, write a new config.pl, and continue installation as normal. This is useful if you have multiple AmiGO configurations that you're trying to juggle.

You can also change config.pl manually and rerun install.pl with no arguments at all--by default, it will use whatever is in config.pl. (I typically use this and copy different known good configurations for use with the "-f" option.)

Using the "-v" option in conjunction with any of the other options may give you a better idea of what variables are being targeted and what files are read.
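
As an illustration of juggling several known good configurations with "-f", the sketch below wraps the call in a small shell function. The function name and the example file location are made up for this example; only the "-v" and "-f" flags come from the description above.

```shell
# Hypothetical helper: reinstall AmiGO from one of several saved
# configurations. Run it from the go-dev/amigo directory.
install_with_config() {
    saved=$1
    if [ ! -r "$saved" ]; then
        echo "cannot read configuration: $saved" >&2
        return 1
    fi
    # -f reads the given file and writes a new config.pl from it;
    # -v reports which variables and files are being used.
    perl ./install.pl -v -f "$saved"
}

# Example: install_with_config "$HOME/amigo-configs/production.pl"
```

Checking that the saved file is readable first avoids starting an installation run with a mistyped file name.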

Example config.pl files

While you can construct a config.pl file from scratch, it is highly recommended that you let the script create the file the first time, and then you may modify the values manually and rerun the script.

Be prepared to accept the defaults or have an answer for each of these variables. The path names are the most important.

Example file #1

These are the values used by the production machine at Stanford.

config.pl

$ENV{GO_ROOT}='/share/goweb/www-data/html/dev';
$ENV{GO_DBNAME}='go';
$ENV{GO_DBHOST}='localhost';
$ENV{GO_DBUSER}='amigo';
$ENV{GO_DBAUTH}='HA YOU DID NOT THINK I WOULD JUST EMAIL THE PASSWORD DID YOU';
$ENV{GO_DBSOCKET}='/db0/mysql/admin/golite/mysql.sock';
$ENV{GO_HAS_COUNT_BY_SPECIES}='1';
$ENV{AMIGO_PROJECT_NAME}='amigo';
$ENV{AMIGO_HTDOCS_PARTIAL_PATH}='/share/goweb/www-data/html';
$ENV{AMIGO_HTDOCS_PARTIAL_URL}='http://amigo.geneontology.org/';
$ENV{AMIGO_CGI_PARTIAL_PATH}='/share/goweb/www-data/cgi-bin';
$ENV{AMIGO_CGI_PARTIAL_URL}='http://amigo.geneontology.org/cgi-bin';
$ENV{AMIGO_SHOW_GP_OPTIONS}='1';
$ENV{AMIGO_SHOW_GRAPHVIZ}='1';
$ENV{AMIGO_DOT_PATH}='/usr/bin/dot';
$ENV{AMIGO_SHOW_BLAST}='1';
$ENV{AMIGO_FASTA_DB}='/share/blast/go-seqdblite.fasta';
$ENV{AMIGO_BLASTP}='/tools/wu-blast/current/blastp';
$ENV{AMIGO_BLASTX}='/tools/wu-blast/current/blastx';
$ENV{AMIGO_BLAST_METHOD}='cgi';
$ENV{AMIGO_QSUB}='/usr/local/command';
$ENV{AMIGO_QUEUE}='/usr/local/queue';
$ENV{AMIGO_PBS_USER}='nobody';
$ENV{AMIGO_MAX_SEQ_NUM}='100';
$ENV{AMIGO_MAX_SEQ_LENGTH}='3000000';
$ENV{AMIGO_SHOW_GOOSE_LINKS}='1';
$ENV{AMIGO_USE_DEFAULT_AMIGO_FILTERS}='1';
$ENV{AMIGO_SHOW_ONT_FILTER}='1';
$ENV{AMIGO_SHOW_TAXID_FILTER}='1';
$ENV{AMIGO_SHOW_SPECIESDB_FILTER}='1';
$ENV{AMIGO_SHOW_EVCODE_FILTER}='1';
$ENV{AMIGO_SHOW_GPTYPE_FILTER}='1';
$ENV{AMIGO_SHOW_ASSBY_FILTER}='0';
$ENV{AMIGO_SHOW_QUAL_FILTER}='0';
$ENV{AMIGO_TEMPLATE_PATHS}='templates/pages:templates/includes';
$ENV{AMIGO_SESSION_DIR}='sessions';
$ENV{AMIGO_MAX_SESSIONS}='200';
$ENV{AMIGO_SESSION_TIMEOUT}='7200';
$ENV{AMIGO_PAGE_SIZE}='50';
$ENV{AMIGO_MAX_RESULTS_HTML}='2000';
$ENV{AMIGO_MAX_RESULTS_DOWNLOAD}='20000';
$ENV{AMIGO_CALCULATE_GP_COUNTS}='0';
$ENV{AMIGO_CALCULATE_TERM_COUNTS}='0';
$ENV{AMIGO_GET_RELEVANCE}='1';
$ENV{AMIGO_CLEVER_MODE}='1';
$ENV{AMIGO_OBSOLETE_BEHAVIOUR}='include_commented';

Example file #2

These are the values used by one of the developers at Berkeley.

config.pl

$ENV{GO_ROOT}='/users/sjcarbon/local/src/cvs/go-dev';
$ENV{GO_DBNAME}='go_latest_lite';
$ENV{GO_DBHOST}='spitz';
$ENV{GO_DBUSER}='';
$ENV{GO_DBAUTH}='';
$ENV{GO_DBSOCKET}='';
$ENV{GO_HAS_COUNT_BY_SPECIES}='1';
$ENV{AMIGO_PROJECT_NAME}='amigo';
$ENV{AMIGO_HTDOCS_PARTIAL_PATH}='/www/toy_9012/htdocs';
$ENV{AMIGO_HTDOCS_PARTIAL_URL}='http://toy.lbl.gov:9012';
$ENV{AMIGO_CGI_PARTIAL_PATH}='/www/toy_9012/cgi-bin';
$ENV{AMIGO_CGI_PARTIAL_URL}='http://toy.lbl.gov:9012/cgi-bin';
$ENV{AMIGO_DATA_PATH}='/www/toy_9012/cgi-bin';
$ENV{AMIGO_SHOW_GP_OPTIONS}='1';
$ENV{AMIGO_SHOW_GRAPHVIZ}='1';
$ENV{AMIGO_DOT_PATH}='/usr/bin/dot';
$ENV{AMIGO_SHOW_BLAST}='1';
$ENV{AMIGO_FASTA_DB}='/www/toy_9012/cgi-bin/data/go_20071106-seqdblite.fasta';
$ENV{AMIGO_BLASTP}='/share/bdgp64/wublast/blastp';
$ENV{AMIGO_BLASTX}='/share/bdgp64/wublast/blastx';
$ENV{AMIGO_BLASTN}='/share/bdgp64/wublast/blastn';
$ENV{AMIGO_BLAST_METHOD}='cgi';
$ENV{AMIGO_QSUB}='/usr/local/command';
$ENV{AMIGO_QUEUE}='/usr/local/queue';
$ENV{AMIGO_PBS_USER}='nobody';
$ENV{AMIGO_MAX_SEQ_NUM}='100';
$ENV{AMIGO_MAX_SEQ_LENGTH}='3000000';
$ENV{AMIGO_USE_DEFAULT_AMIGO_FILTERS}='1';
$ENV{AMIGO_SHOW_ONT_FILTER}='1';
$ENV{AMIGO_SHOW_TAXID_FILTER}='1';
$ENV{AMIGO_SHOW_SPECIESDB_FILTER}='1';
$ENV{AMIGO_SHOW_EVCODE_FILTER}='1';
$ENV{AMIGO_SHOW_GPTYPE_FILTER}='1';
$ENV{AMIGO_SHOW_ASSBY_FILTER}='0';
$ENV{AMIGO_SHOW_QUAL_FILTER}='0';
$ENV{AMIGO_TEMPLATE_PATHS}='templates/pages:templates/includes';
$ENV{AMIGO_SESSION_DIR}='sessions';
$ENV{AMIGO_MAX_SESSIONS}='200';
$ENV{AMIGO_SESSION_TIMEOUT}='7200';
$ENV{AMIGO_PAGE_SIZE}='50';
$ENV{AMIGO_MAX_RESULTS_PAGES}='40';
$ENV{AMIGO_CALCULATE_GP_COUNTS}='0';
$ENV{AMIGO_CALCULATE_TERM_COUNTS}='0';
$ENV{AMIGO_GET_RELEVANCE}='1';
$ENV{AMIGO_CLEVER_MODE}='1';
$ENV{AMIGO_OBSOLETE_BEHAVIOUR}='include_commented';
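
The two example files do not set exactly the same variables (for instance, only the first sets AMIGO_SHOW_GOOSE_LINKS and only the second sets AMIGO_BLASTN). When comparing configurations, it can help to list the names that appear in one file but not the other. The helpers below are a sketch with invented names, and they assume one $ENV{NAME}=...; assignment per line, as in the examples.

```shell
# Print the variable names assigned in a config.pl-style file, sorted.
config_vars() {
    sed -n 's/^\$ENV{\([A-Za-z0-9_]*\)}=.*/\1/p' "$1" | sort
}

# Print the names that are set in the first file but not in the second.
only_in_first() {
    other=$(mktemp) || return 1
    config_vars "$2" > "$other"
    config_vars "$1" | comm -23 - "$other"
    rm -f "$other"
}

# Example: only_in_first production-config.pl developer-config.pl
```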

Variable meanings

Below is a list of meanings for some of the more important AmiGO variables.

  • GO_ROOT : The location of the local go-dev repository.
  • GO_DBNAME
  • GO_DBHOST
  • GO_DBUSER
  • GO_DBAUTH
  • GO_DBSOCKET
  • GO_HAS_COUNT_BY_SPECIES : Whether or not this was done during the GO db installation process.
  • AMIGO_PROJECT_NAME : This will be added to the end of the next four variables to copy files to the proper location and generate URLs. Useful to change if you want multiple AmiGO installations on the same web server.
  • AMIGO_HTDOCS_PARTIAL_PATH : The path to the root htdocs directory (in Apache terminology).
  • AMIGO_HTDOCS_PARTIAL_URL : The URL the above resolves to.
  • AMIGO_CGI_PARTIAL_PATH : The path to the root cgi-bin directory.
  • AMIGO_CGI_PARTIAL_URL : The URL the above resolves to.
  • AMIGO_SHOW_GP_OPTIONS
  • AMIGO_SHOW_GRAPHVIZ
  • AMIGO_DOT_PATH : Location of the dot binary.
  • AMIGO_SHOW_BLAST
  • AMIGO_FASTA_DB : The location of the downloaded FASTA file.
  • AMIGO_BLASTP
  • AMIGO_BLASTX
  • AMIGO_BLAST_METHOD
  • AMIGO_QSUB
  • AMIGO_QUEUE
  • AMIGO_PBS_USER
  • AMIGO_MAX_SEQ_NUM
  • AMIGO_MAX_SEQ_LENGTH
  • AMIGO_SHOW_GOOSE_LINKS
  • AMIGO_USE_DEFAULT_AMIGO_FILTERS
  • AMIGO_SHOW_ONT_FILTER
  • AMIGO_SHOW_TAXID_FILTER
  • AMIGO_SHOW_SPECIESDB_FILTER
  • AMIGO_SHOW_EVCODE_FILTER
  • AMIGO_SHOW_GPTYPE_FILTER
  • AMIGO_SHOW_ASSBY_FILTER
  • AMIGO_SHOW_QUAL_FILTER
  • AMIGO_TEMPLATE_PATHS
  • AMIGO_SESSION_DIR
  • AMIGO_MAX_SESSIONS
  • AMIGO_SESSION_TIMEOUT
  • AMIGO_PAGE_SIZE
  • AMIGO_MAX_RESULTS_HTML
  • AMIGO_MAX_RESULTS_DOWNLOAD
  • AMIGO_CALCULATE_GP_COUNTS
  • AMIGO_CALCULATE_TERM_COUNTS
  • AMIGO_GET_RELEVANCE
  • AMIGO_CLEVER_MODE
  • AMIGO_OBSOLETE_BEHAVIOUR
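
To make the role of AMIGO_PROJECT_NAME more concrete, the sketch below shows how a project name could be appended to one of the partial paths or URLs. It assumes that the name is added as a final path component after a single slash; the exact joining rule used by install.pl may differ, and join_partial is a name invented for this example.

```shell
# Hypothetical helper: append a project name to a partial path or URL,
# dropping one trailing slash from the partial value first.
join_partial() {
    printf '%s/%s\n' "${1%/}" "$2"
}

# Values taken from example file #2.
AMIGO_PROJECT_NAME=amigo
join_partial /www/toy_9012/htdocs "$AMIGO_PROJECT_NAME"
join_partial http://toy.lbl.gov:9012/cgi-bin "$AMIGO_PROJECT_NAME"
```

With these values the two calls print /www/toy_9012/htdocs/amigo and http://toy.lbl.gov:9012/cgi-bin/amigo. Choosing a different project name would give a second installation its own directories and URLs on the same server.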

Loading an Ontology

There are numerous ways of loading an ontology into a MySQL database for AmiGO to use. Below, two of the most common will be covered. For more detailed information, please see the main GO database pages.

Loading by script

Probably the easiest way of getting a GO database to work with is to use a Perl script that is provided in the go-dev distribution (see above): go-dev/go-db-perl/scripts/go_db_install.pl. Usage and examples are given by:

go-dev/go-db-perl/scripts/go_db_install.pl -h

For example, the following incantation will load the latest lite database dump into a database called go_latest_lite on localhost:

go-dev/go-db-perl/scripts/go_db_install.pl -i -e go_latest_lite -v -d localhost

The following example loads the latest database dump into a database called go_latest on localhost:

go-dev/go-db-perl/scripts/go_db_install.pl -v -d localhost

This method is also very easy to put into a crontab.
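
For example, the lite load shown above could be wrapped in a small function and called from a script that cron runs. This is only a sketch: the function name, the log file, the script path, and the schedule are all placeholders, and the script is started through perl here rather than executed directly.

```shell
# Hypothetical wrapper: reload the latest lite dump and append the
# installer's output to a log file. Arguments: go-dev path, log file.
refresh_go_lite() {
    go_dev=$1
    log=$2
    perl "$go_dev/go-db-perl/scripts/go_db_install.pl" \
        -i -e go_latest_lite -v -d localhost >>"$log" 2>&1
}

# If this function is saved in a script that ends by calling, say,
#   refresh_go_lite /path/to/go-dev /path/to/go_db_install.log
# then a crontab line such as the following would run it every
# Sunday at 03:00 (the script path is a placeholder):
#   0 3 * * 0 /path/to/refresh_go_lite.sh
```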

Direct manual loading

The following instructions can be used to create a GO database for AmiGO to use.

  • Download a database dump from http://archive.geneontology.org/ ; make sure that the file name ends with "-data.gz". In this example, we'll call this file go_200XXXXX-seqdblite-data.gz.
  • Unzip the database dump file.
  • Using your favorite MySQL client, create a database. In this example we'll call it go_200XXXXX. Using the default MySQL client, the command would be:
CREATE DATABASE go_200XXXXX;
  • From the command line, load the database dump file into the database:
mysql go_200XXXXX < go_200XXXXX-seqdblite-data
  • Done!
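
The manual steps above can be sketched as a single shell function. The function name is invented for this example, and, like the steps themselves, it leaves out any MySQL user, password, or host options that your server may need.

```shell
# Hypothetical helper wrapping the manual steps above.
# Arguments: database name, path to the downloaded "-data.gz" dump.
load_go_dump() {
    db=$1
    dump_gz=$2
    case $dump_gz in
        *-data.gz) ;;
        *) echo "expected a file name ending in -data.gz: $dump_gz" >&2
           return 1 ;;
    esac
    dump=${dump_gz%.gz}                  # name of the unzipped file
    gunzip "$dump_gz" &&                 # unzip the dump
    mysql -e "CREATE DATABASE $db" &&    # create the database
    mysql "$db" < "$dump"                # load the dump into it
}

# Example: load_go_dump go_200XXXXX go_200XXXXX-seqdblite-data.gz
```

Each step runs only if the previous one succeeded, so a failed download or an already existing database stops the load early.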

Loading Annotations

While the usual GO database dumps found at http://archive.geneontology.org include many useful annotations, users may also load their own annotations into their local GO databases. To accomplish this, the go-dev distribution comes with many different scripts to manage association files. The easiest to use is probably go-dev/go-db-perl/scripts/load-go-into-db.pl .

The following incantation would load a gene association file (ga_file.gz) into the my_go_db database on localhost:

GO_ROOT=/path_to_go-dev/go-dev perl ./load-go-into-db.pl -d my_go_db -h localhost -datatype go_assoc -fill_count ga_file.gz

If the user does not have the Perl DBIx::DBStag, go-perl, or go-db-perl modules installed and/or in their path, the same incantation would look like:

GO_ROOT=/path_to_go-dev/go-dev perl -I /path_to_go-dev/go-dev/go-db-perl -I /path_to_go-dev/go-dev/go-perl -I /path_to_dbixstag/DBIx-DBStag-0.09 ./load-go-into-db.pl -d my_go_db -h localhost -datatype go_assoc -fill_count ga_file.gz
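
If you load annotations often, an alternative to repeating the -I options is to put the same directories into PERL5LIB once per shell session. The sketch below uses the same placeholder paths as the command above.

```shell
# Placeholder paths, copied from the command above.
GO_ROOT=/path_to_go-dev/go-dev
export GO_ROOT

# Same three directories as the -I options; keep any existing PERL5LIB.
PERL5LIB="$GO_ROOT/go-db-perl:$GO_ROOT/go-perl:/path_to_dbixstag/DBIx-DBStag-0.09${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

# The shorter form of the command can then be used:
# perl ./load-go-into-db.pl -d my_go_db -h localhost -datatype go_assoc -fill_count ga_file.gz
```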

Contacts

If you are still having problems installing the AmiGO software, you can contact the developers directly:

  • Seth at LBNL (sjcarbon) (berkeleybop dot org)
  • Amelia at EBI (aji) (ebi dot ac dot uk)

Good Luck

Good luck!