AmiGO Manual: Installation 1.5

From GO Wiki

Overview

This document is intended for AmiGO version 1.5.*. For version 1.6.* (currently in testing), please see AmiGO Manual: Installation 1.6.

AmiGO, at its heart, is a simple Perl CGI script. However, behind the simple external interface lies a somewhat baroque system of caches, database connections, temporary files, and delegation. To set up all of these things properly, AmiGO provides an installation script ("install.pl") that asks questions about the user's environment and tries to catch the biggest errors that one can make during installation. But no program is perfect.

This document is intended to help bridge the gap between the cryptic install script and what the developers know from having written it.

Requirements

GO database

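A big one, and mostly outside the scope of this document: AmiGO needs a GO database in MySQL that it can connect to. For more information about installing one, see the online GO database documentation and "Loading an Ontology" below.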

Web server

AmiGO does not provide its own web server; it is currently developed and run on Apache.

BLAST

If you want the AmiGO wrapper for BLAST, you will also need to download and install WU BLAST, as well as a FASTA file from the Stanford GO archive.

GraphViz

AmiGO also depends on GraphViz, specifically the "dot" program.
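Several installer questions ask for absolute paths to these external programs (for example AMIGO_DOT_PATH, AMIGO_BLASTP and AMIGO_BLASTX in the example config.pl files below). A minimal POSIX shell sketch for looking them up on your PATH follows; the helper name `require_tool` is hypothetical and not part of AmiGO.

```sh
# Hypothetical helper (not part of AmiGO): print the absolute path of a
# program found on PATH, or report it as missing and return non-zero.
require_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        printf '%s\n' "$tool_path"
    else
        printf 'missing: %s\n' "$1" >&2
        return 1
    fi
}
```

For example, `require_tool dot` prints a candidate value for AMIGO_DOT_PATH, and `require_tool blastp` does the same for AMIGO_BLASTP if WU BLAST is on your PATH.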

Perl

As far as the Perl environment goes, the major packages required are GO::TermFinder, Template, CGI, DBI, DBD::mysql, GraphViz, BioPerl, go-perl, and go-db-perl (the last two are included in the go-dev repository). Hopefully you can get most of these through your *NIX's package manager; otherwise, install them from CPAN.

Below is an exhaustive list of all known Perl requirements for go-dev and AmiGO. However, not all of them may be necessary for running AmiGO (one of the developers, for example, has never gotten DBIx::DBStag to install on his machine but manages to develop and deploy just fine without it).

Perl version

5.8.0

Perl libraries

AutoSplit;
Bio::DB::SwissProt;
Bio::Index::Swissprot;
Bio::Index::GenBank;
Bio::PrimarySeq;
Bio::SeqIO;
Bio::Species;
Carp;
Config;
CGI;
CGI::Carp;
Data::Dumper;
DBD::mysql;
DBI;
Digest::MD5;
DirHandle;
Exporter;
ExtUtils::MakeMaker;
Fcntl;
File::Basename;
File::Find;
File::Temp;
FileHandle;
FreezeThaw;
FindBin;
GD::Graph::pie;
Getopt::Long;
Getopt::Std;
GO::TermFinder;
GraphViz;
HTML::TableExtract;
HTTP::Cookies;
HTTP::Request;
HTTP::Request::Common;
IO::Handle;
LWP::Simple;
LWP::UserAgent;
Net::FTP;
Net::SMTP;
Shell;
SQL::Translator;
Storable;
strict;
Template;
Term::ReadLine;
Test;
Text::Balanced;
utf8;
warnings;
WWW::Mechanize;
XML::LibXML;
XML::LibXSLT;
XML::Parser::PerlSAX;
XML::Writer;
YAML; 
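To find out which of the modules above are still missing, you can try loading each one with `perl -M<module> -e 1`, which exits non-zero when the module cannot be loaded. The sketch below wraps that in a loop; `check_perl_modules` is a hypothetical helper, not something shipped with go-dev or AmiGO. The go-perl and go-db-perl modules will only be found if PERLLIB (or PERL5LIB) includes your go-dev checkout.

```sh
# Hypothetical helper (not part of go-dev or AmiGO): try to load each Perl
# module with "perl -M<module> -e 1" and report which ones are missing.
# Returns non-zero if any module could not be loaded.
check_perl_modules() {
    missing=0
    for mod in "$@"; do
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            printf 'ok       %s\n' "$mod"
        else
            printf 'MISSING  %s\n' "$mod"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

A typical call would be `check_perl_modules CGI DBI DBD::mysql Template GraphViz GO::TermFinder XML::LibXML`.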

Download

The most recent stable version of AmiGO should be available as part of the go-dev repository on the GO CVS site at SourceForge.net: http://sourceforge.net/projects/geneontology

Most up-to-date version

If you want the most up-to-date version from source, you'll need go-dev/go-perl and go-dev/go-db-perl (HEAD versions from CVS) and go-dev/amigo at the tag amigo_1_5_RC7 (as of this writing).

The SourceForge CVS repository is at geneontology.cvs.sourceforge.net:/cvsroot/geneontology, module go-dev (this changed a couple of years ago). You should not need any subdirectories of go-dev other than go-perl, go-db-perl, and amigo, but you do need a GO database you can connect to via DBD::mysql (if you are upgrading an older AmiGO install, you probably have one already).
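The checkout described above might look like the following sketch. The anonymous pserver form of the CVSROOT is an assumption (this page only gives the host and repository path), and pserver access normally requires running `cvs -d "$GO_CVSROOT" login` once beforehand. The function name `checkout_amigo_1_5` is hypothetical.

```sh
# CVSROOT for the go-dev repository; the anonymous pserver prefix is an
# assumption, only the host and path come from this page.
GO_CVSROOT=${GO_CVSROOT:-:pserver:anonymous@geneontology.cvs.sourceforge.net:/cvsroot/geneontology}

# Hypothetical helper: HEAD of go-perl and go-db-perl, plus AmiGO at the
# amigo_1_5_RC7 tag.
checkout_amigo_1_5() {
    cvs -d "$GO_CVSROOT" checkout go-dev/go-perl go-dev/go-db-perl &&
        cvs -d "$GO_CVSROOT" checkout -r amigo_1_5_RC7 go-dev/amigo
}
```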

install.pl

Once you have the software and have met the requirements, go to the amigo directory and run install.pl (with the "-h" flag it prints usage details). Depending on your environment, you may first want to set the following environment variables (csh/tcsh syntax):

setenv GO_ROOT <path_to_go-dev_source_dir>
setenv PATH /tools/perl/5.8.8/bin:${PATH}:${GO_ROOT}/go-perl/scripts
setenv PERLLIB $GO_ROOT/go-perl:$GO_ROOT/go-db-perl:$GO_ROOT/amigo/perl
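If you use a Bourne-style shell (sh, bash) rather than csh, the equivalent settings look roughly like this. The default for GO_ROOT is a placeholder, and PERL_BIN is a hypothetical variable introduced here for the site-specific Perl location used in the csh lines.

```sh
# Bourne-shell (sh/bash) equivalent of the csh setenv lines above.
# Placeholder defaults: point GO_ROOT at your go-dev checkout and PERL_BIN
# (a name used only in this sketch) at your Perl installation's bin directory.
GO_ROOT=${GO_ROOT:-$HOME/src/go-dev}
PERL_BIN=${PERL_BIN:-/tools/perl/5.8.8/bin}
PATH=$PERL_BIN:$PATH:$GO_ROOT/go-perl/scripts
PERLLIB=$GO_ROOT/go-perl:$GO_ROOT/go-db-perl:$GO_ROOT/amigo/perl
export GO_ROOT PATH PERLLIB
```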

When you run install.pl, it asks you around 20 questions about the installation configuration. The questions may be a little cryptic (the "-v" flag can help here). A successful installation run creates a "config.pl" file in the same directory as install.pl, which becomes the basis for all later installation runs.

If config.pl exists, the "-r" option uses its contents as the defaults for a new round of interactive questioning and then overwrites it.

The "-i" option will ignore config.pl if it exists and use the internal variables as the defaults for a new round of interactive questioning.

The "-f <filename>" option reads in <filename>, writes a new config.pl from it, and continues the installation as normal. This is useful if you have multiple AmiGO configurations to juggle.

You can also edit config.pl manually and rerun install.pl with no arguments at all; by default, it uses whatever is in config.pl. (I typically do this, and keep copies of different known-good configurations for use with the "-f" option.)

Using the "-v" option in conjunction with any of the other options may give you a better idea of what variables are being targeted and what files are read.
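One way to keep several known-good configurations around for the "-f" option is sketched below. The `configs/<name>.pl` layout and the helper name `amigo_install_with` are hypothetical; only the "-v" and "-f" flags come from this page.

```sh
# Hypothetical layout: known-good configurations kept as configs/<name>.pl
# next to install.pl. Runs the installer verbosely (-v) with the chosen
# configuration file (-f).
amigo_install_with() {
    cfg="configs/$1.pl"
    if [ ! -f "$cfg" ]; then
        printf 'no such configuration: %s\n' "$cfg" >&2
        return 1
    fi
    ./install.pl -v -f "$cfg"
}
```

For example, `amigo_install_with production` would run `./install.pl -v -f configs/production.pl`.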


Example config.pl files

While you can construct a config.pl file from scratch, it is highly recommended that you let the script create the file the first time; afterwards, you can modify the values manually and rerun the script.

When answering the installer's questions, be prepared to accept the defaults or to have an answer ready. The path names are the most important to get right.
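Because config.pl is ordinary Perl, a hand edit that drops a semicolon or leaves a value empty (`$ENV{X}=;`) makes the file fail to compile. A minimal sketch for catching this before rerunning install.pl uses `perl -c`, which compiles a file without running it; `check_config` is a hypothetical helper name.

```sh
# Hypothetical helper: compile config.pl with "perl -c" (syntax check only)
# so that typos are caught before install.pl is rerun.
check_config() {
    cfg=${1:-config.pl}
    if perl -c "$cfg" >/dev/null 2>&1; then
        printf '%s: syntax OK\n' "$cfg"
    else
        printf '%s: syntax error (run "perl -c %s" for details)\n' "$cfg" "$cfg" >&2
        return 1
    fi
}
```

Used as `check_config && ./install.pl`, the installer only runs when config.pl compiles.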

Example file #1

These are the values used by the production machine at Stanford.

config.pl

$ENV{GO_ROOT}='/share/goweb/www-data/html/dev';
$ENV{GO_DBNAME}='go';
$ENV{GO_DBHOST}='localhost';
$ENV{GO_DBUSER}='amigo';
$ENV{GO_DBAUTH}='NO PASSWORD FOR YOU';
$ENV{GO_DBSOCKET}='/db0/mysql/admin/golite/mysql.sock';
$ENV{GO_HAS_COUNT_BY_SPECIES}='1';
$ENV{AMIGO_PROJECT_NAME}='amigo';
$ENV{AMIGO_HTDOCS_PARTIAL_PATH}='/share/goweb/www-data/html';
$ENV{AMIGO_HTDOCS_PARTIAL_URL}='http://amigo.geneontology.org/';
$ENV{AMIGO_CGI_PARTIAL_PATH}='/share/goweb/www-data/cgi-bin';
$ENV{AMIGO_CGI_PARTIAL_URL}='http://amigo.geneontology.org/cgi-bin';
$ENV{AMIGO_SHOW_GP_OPTIONS}='1';
$ENV{AMIGO_SHOW_GRAPHVIZ}='1';
$ENV{AMIGO_DOT_PATH}='/usr/bin/dot';
$ENV{AMIGO_SHOW_BLAST}='1';
$ENV{AMIGO_FASTA_DB}='/share/blast/go-seqdblite.fasta';
$ENV{AMIGO_BLASTP}='/tools/wu-blast/current/blastp';
$ENV{AMIGO_BLASTX}='/tools/wu-blast/current/blastx';
$ENV{AMIGO_BLAST_METHOD}='cgi';
$ENV{AMIGO_QSUB}='/usr/local/command';
$ENV{AMIGO_QUEUE}='/usr/local/queue';
$ENV{AMIGO_PBS_USER}='nobody';
$ENV{AMIGO_MAX_SEQ_NUM}='100';
$ENV{AMIGO_MAX_SEQ_LENGTH}='3000000';
$ENV{AMIGO_SHOW_GOOSE_LINKS}='1';
$ENV{AMIGO_USE_DEFAULT_AMIGO_FILTERS}='1';
$ENV{AMIGO_SHOW_ONT_FILTER}='1';
$ENV{AMIGO_SHOW_TAXID_FILTER}='1';
$ENV{AMIGO_SHOW_SPECIESDB_FILTER}='1';
$ENV{AMIGO_SHOW_EVCODE_FILTER}='1';
$ENV{AMIGO_SHOW_GPTYPE_FILTER}='1';
$ENV{AMIGO_SHOW_ASSBY_FILTER}='0';
$ENV{AMIGO_SHOW_QUAL_FILTER}='0';
$ENV{AMIGO_TEMPLATE_PATHS}='templates/pages:templates/includes';
$ENV{AMIGO_SESSION_DIR}='sessions';
$ENV{AMIGO_MAX_SESSIONS}='200';
$ENV{AMIGO_SESSION_TIMEOUT}='7200';
$ENV{AMIGO_PAGE_SIZE}='50';
$ENV{AMIGO_MAX_RESULTS_HTML}='2000';
$ENV{AMIGO_MAX_RESULTS_DOWNLOAD}='20000';
$ENV{AMIGO_CALCULATE_GP_COUNTS}='0';
$ENV{AMIGO_CALCULATE_TERM_COUNTS}='0';
$ENV{AMIGO_GET_RELEVANCE}='1';
$ENV{AMIGO_CLEVER_MODE}='1';
$ENV{AMIGO_OBSOLETE_BEHAVIOUR}='include_commented';

Example file #2

These are the values used by one of the developers at Berkeley.

config.pl

$ENV{GO_ROOT}='/users/sjcarbon/local/src/cvs/go-dev';
$ENV{GO_DBNAME}='go_latest_lite';
$ENV{GO_DBHOST}='spitz';
$ENV{GO_DBUSER}='';
$ENV{GO_DBAUTH}='';
$ENV{GO_DBSOCKET}='';
$ENV{GO_HAS_COUNT_BY_SPECIES}='1';
$ENV{AMIGO_PROJECT_NAME}='amigo';
$ENV{AMIGO_HTDOCS_PARTIAL_PATH}='/www/toy_9012/htdocs';
$ENV{AMIGO_HTDOCS_PARTIAL_URL}='http://toy.lbl.gov:9012';
$ENV{AMIGO_CGI_PARTIAL_PATH}='/www/toy_9012/cgi-bin';
$ENV{AMIGO_CGI_PARTIAL_URL}='http://toy.lbl.gov:9012/cgi-bin';
$ENV{AMIGO_DATA_PATH}='/www/toy_9012/cgi-bin';
$ENV{AMIGO_SHOW_GP_OPTIONS}='1';
$ENV{AMIGO_SHOW_GRAPHVIZ}='1';
$ENV{AMIGO_DOT_PATH}='/usr/bin/dot';
$ENV{AMIGO_SHOW_BLAST}='1';
$ENV{AMIGO_FASTA_DB}='/www/toy_9012/cgi-bin/data/go_20071106-seqdblite.fasta';
$ENV{AMIGO_BLASTP}='/share/bdgp64/wublast/blastp';
$ENV{AMIGO_BLASTX}='/share/bdgp64/wublast/blastx';
$ENV{AMIGO_BLASTN}='/share/bdgp64/wublast/blastn';
$ENV{AMIGO_BLAST_METHOD}='cgi';
$ENV{AMIGO_QSUB}='/usr/local/command';
$ENV{AMIGO_QUEUE}='/usr/local/queue';
$ENV{AMIGO_PBS_USER}='nobody';
$ENV{AMIGO_MAX_SEQ_NUM}='100';
$ENV{AMIGO_MAX_SEQ_LENGTH}='3000000';
$ENV{AMIGO_USE_DEFAULT_AMIGO_FILTERS}='1';
$ENV{AMIGO_SHOW_ONT_FILTER}='1';
$ENV{AMIGO_SHOW_TAXID_FILTER}='1';
$ENV{AMIGO_SHOW_SPECIESDB_FILTER}='1';
$ENV{AMIGO_SHOW_EVCODE_FILTER}='1';
$ENV{AMIGO_SHOW_GPTYPE_FILTER}='1';
$ENV{AMIGO_SHOW_ASSBY_FILTER}='0';
$ENV{AMIGO_SHOW_QUAL_FILTER}='0';
$ENV{AMIGO_TEMPLATE_PATHS}='templates/pages:templates/includes';
$ENV{AMIGO_SESSION_DIR}='sessions';
$ENV{AMIGO_MAX_SESSIONS}='200';
$ENV{AMIGO_SESSION_TIMEOUT}='7200';
$ENV{AMIGO_PAGE_SIZE}='50';
$ENV{AMIGO_MAX_RESULTS_PAGES}='40';
$ENV{AMIGO_CALCULATE_GP_COUNTS}='0';
$ENV{AMIGO_CALCULATE_TERM_COUNTS}='0';
$ENV{AMIGO_GET_RELEVANCE}='1';
$ENV{AMIGO_CLEVER_MODE}='1';
$ENV{AMIGO_OBSOLETE_BEHAVIOUR}='include_commented';
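The two example files do not set exactly the same variables. For instance, only #2 sets AMIGO_BLASTN and AMIGO_DATA_PATH, and only #1 sets AMIGO_SHOW_GOOSE_LINKS. When juggling several configurations, a quick way to spot such differences is to compare the variable names each file assigns. The following is a sketch with hypothetical helper names, assuming every setting is on its own `$ENV{NAME}=...` line as in the examples.

```sh
# List the variable names a config.pl sets, one per line, sorted.
config_keys() {
    sed -n 's/^[[:space:]]*\$ENV{\([A-Za-z_][A-Za-z0-9_]*\)}.*/\1/p' "$1" | sort -u
}

# Show variables set in only one of two config files: names only in the
# first file are printed flush left, names only in the second are indented
# with a tab (the output format of "comm -3").
config_key_diff() {
    keys_a=$(mktemp) || return 1
    keys_b=$(mktemp) || { rm -f "$keys_a"; return 1; }
    config_keys "$1" > "$keys_a"
    config_keys "$2" > "$keys_b"
    comm -3 "$keys_a" "$keys_b"
    rm -f "$keys_a" "$keys_b"
}
```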

Variable meanings

Below is a list of meanings for some of the more important AmiGO variables.

  • GO_ROOT : The location of the local go-dev repository.
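  • GO_DBNAME : The name of the GO database.
  • GO_DBHOST : The host on which the GO database server runs.
  • GO_DBUSER : The user name for connecting to the GO database.
  • GO_DBAUTH : The password for connecting to the GO database.
  • GO_DBSOCKET : The MySQL socket file to connect through.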
  • GO_HAS_COUNT_BY_SPECIES : Whether or not the count-by-species step was done during the GO database installation process.
  • AMIGO_PROJECT_NAME : This is appended to the end of the next four variables to copy files to the proper location and generate URLs. Useful to change if you want multiple AmiGO installations on the same web server.
  • AMIGO_HTDOCS_PARTIAL_PATH : The path to the root htdocs directory (in Apache terminology).
  • AMIGO_HTDOCS_PARTIAL_URL : The URL the above resolves to.
  • AMIGO_CGI_PARTIAL_PATH : The path to the root cgi-bin directory
  • AMIGO_CGI_PARTIAL_URL : The URL the above resolves to.
  • AMIGO_SHOW_GP_OPTIONS
  • AMIGO_SHOW_GRAPHVIZ
  • AMIGO_DOT_PATH : Location of the dot binary
  • AMIGO_SHOW_BLAST
  • AMIGO_FASTA_DB : The location of the downloaded FASTA file.
  • AMIGO_BLASTP : Location of the WU BLAST blastp binary.
  • AMIGO_BLASTX : Location of the WU BLAST blastx binary.
  • AMIGO_BLAST_METHOD
  • AMIGO_QSUB
  • AMIGO_QUEUE
  • AMIGO_PBS_USER
  • AMIGO_MAX_SEQ_NUM
  • AMIGO_MAX_SEQ_LENGTH
  • AMIGO_SHOW_GOOSE_LINKS
  • AMIGO_USE_DEFAULT_AMIGO_FILTERS
  • AMIGO_SHOW_ONT_FILTER
  • AMIGO_SHOW_TAXID_FILTER
  • AMIGO_SHOW_SPECIESDB_FILTER
  • AMIGO_SHOW_EVCODE_FILTER
  • AMIGO_SHOW_GPTYPE_FILTER
  • AMIGO_SHOW_ASSBY_FILTER
  • AMIGO_SHOW_QUAL_FILTER
  • AMIGO_TEMPLATE_PATHS
  • AMIGO_SESSION_DIR
  • AMIGO_MAX_SESSIONS
  • AMIGO_SESSION_TIMEOUT
  • AMIGO_PAGE_SIZE
  • AMIGO_MAX_RESULTS_HTML
  • AMIGO_MAX_RESULTS_DOWNLOAD
  • AMIGO_CALCULATE_GP_COUNTS
  • AMIGO_CALCULATE_TERM_COUNTS
  • AMIGO_GET_RELEVANCE
  • AMIGO_CLEVER_MODE
  • AMIGO_OBSOLETE_BEHAVIOUR
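To predict where files will be copied and which URLs will be generated, it can help to work out the AMIGO_PROJECT_NAME suffix by hand. With example #1, AMIGO_HTDOCS_PARTIAL_PATH '/share/goweb/www-data/html' and project name 'amigo' would give '/share/goweb/www-data/html/amigo'. The sketch below assumes the two parts are joined with a single "/" (the exact joining rule inside install.pl is not documented here), and `amigo_location` is a hypothetical name.

```sh
# Sketch of how AMIGO_PROJECT_NAME extends a partial path or URL.
# Assumption: the two are joined with a single "/"; trailing slashes on the
# partial value are dropped so '.../html' and '.../html/' give the same result.
amigo_location() {
    base=$1
    while [ "${base%/}" != "$base" ]; do
        base=${base%/}
    done
    printf '%s/%s\n' "$base" "$2"
}
```

For instance, `amigo_location http://amigo.geneontology.org/ amigo` prints `http://amigo.geneontology.org/amigo`.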

Loading an Ontology

There are numerous ways of loading an ontology into a MySQL database for AmiGO to use; two of the most common are covered below. For more detailed information, please see the main GO database pages.

Loading by script

Probably the easiest way to get a GO database to work with is the Perl script provided in the go-dev distribution (see above): go-dev/go-db-perl/scripts/go_db_install.pl. Usage and examples are given by:

go-dev/go-db-perl/scripts/go_db_install.pl -h

For example, the following incantation will load the latest lite database dump into a database called go_latest_lite on localhost:

go-dev/go-db-perl/scripts/go_db_install.pl -i -e go_latest_lite -v -d localhost

The following example loads the latest database dump into a database called go_latest on localhost:

go-dev/go-db-perl/scripts/go_db_install.pl -v -d localhost

This method is also very easy to put into a crontab.
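A minimal sketch of a cron-friendly wrapper around the lite-database invocation shown above follows. The function name, the GO_INSTALL_LOG variable, the log location, and the schedule in the comment are all hypothetical; only the go_db_install.pl arguments come from this page.

```sh
# Hypothetical cron wrapper: reload the latest lite dump into go_latest_lite
# on localhost, appending all output to a log file.
# Example crontab lines (paths and schedule are placeholders):
#   GO_ROOT=/path/to/go-dev
#   30 2 * * 0  . /path/to/reload_go_lite.sh && reload_go_lite
reload_go_lite() {
    if [ -z "${GO_ROOT:-}" ]; then
        echo "GO_ROOT must point at your go-dev checkout" >&2
        return 1
    fi
    log=${GO_INSTALL_LOG:-/tmp/go_db_install.log}
    "$GO_ROOT/go-db-perl/scripts/go_db_install.pl" -i -e go_latest_lite -v -d localhost >> "$log" 2>&1
}
```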

Direct manual loading

The following instructions can be used to create a GO database for AmiGO to use.

  • Download a database dump from http://archive.geneontology.org/ ; make sure that the file name ends with "-data.gz". In this example, we'll call this file go_200XXXXX-seqdblite-data.gz.
  • Unzip the database dump file.
  • Using your favorite MySQL client, create a database. In this example, we'll call it go_200XXXXX. Using the default MySQL client, the command would be:
CREATE DATABASE go_200XXXXX;
  • From the command line, load the database dump file into the database:
mysql go_200XXXXX < go_200XXXXX-seqdblite-data
  • Done!
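The manual steps above can be sketched as a single shell function. This sketch assumes the mysql client can connect with your default credentials (for example from ~/.my.cnf) and derives the database name from the file name up to the first "-". Unlike the steps above, it decompresses the dump on the fly instead of writing the unzipped file to disk. `load_go_dump` is a hypothetical name.

```sh
# Hypothetical helper for the manual steps above, e.g.
#   go_200XXXXX-seqdblite-data.gz -> database go_200XXXXX
load_go_dump() {
    dump=$1
    case $dump in
        *-data.gz) ;;
        *) printf 'expected a *-data.gz dump, got: %s\n' "$dump" >&2
           return 1 ;;
    esac
    base=$(basename "$dump" .gz)
    db=${base%%-*}
    mysql -e "CREATE DATABASE $db" || return 1
    # Decompress on the fly instead of writing the unzipped file to disk.
    gunzip -c "$dump" | mysql "$db"
}
```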

Loading Annotations

While the usual GO database dumps found at http://archive.geneontology.org include many useful annotations, users may also load their own annotations into their local GO databases. To accomplish this, the go-dev distribution comes with many different scripts for managing association files. The easiest to use is probably go-dev/go-db-perl/scripts/load-go-into-db.pl.

The following incantation would load a gene association file (ga_file.gz) into the my_go_db database on localhost:

GO_ROOT=/path_to_go-dev/go-dev perl ./load-go-into-db.pl -d my_go_db -h localhost -datatype go_assoc -fill_count ga_file.gz

If the DBIx::DBStag, go-perl, or go-db-perl Perl modules are not installed or not in your Perl path, the same incantation would look like:

 GO_ROOT=/path_to_go-dev/go-dev perl -I /path_to_go-dev/go-dev/go-db-perl -I /path_to_go-dev/go-dev/go-perl -I /path_to_dbixstag/DBIx-DBStag-0.09 ./load-go-into-db.pl -d my_go_db -h localhost -datatype go_assoc -fill_count ga_file.gz
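Both incantations can be wrapped in one function that always adds the go-perl and go-db-perl library paths and optionally a local DBIx-DBStag directory. The function name `load_assoc`, the optional third argument for the host, and the DBSTAG_DIR variable are hypothetical; the load-go-into-db.pl arguments are the ones shown above.

```sh
# Hypothetical wrapper around the load-go-into-db.pl invocation above.
# Usage: load_assoc <database> <association_file> [host]
# GO_ROOT must point at the go-dev checkout; DBSTAG_DIR (used only in this
# sketch) may point at an unpacked DBIx-DBStag distribution.
load_assoc() {
    db=$1
    assoc_file=$2
    host=${3:-localhost}
    if [ -z "${GO_ROOT:-}" ]; then
        echo "GO_ROOT must point at your go-dev checkout" >&2
        return 1
    fi
    export GO_ROOT
    # Build the perl argument list: library paths first, then the script.
    set -- -I "$GO_ROOT/go-db-perl" -I "$GO_ROOT/go-perl"
    if [ -n "${DBSTAG_DIR:-}" ]; then
        set -- "$@" -I "$DBSTAG_DIR"
    fi
    perl "$@" "$GO_ROOT/go-db-perl/scripts/load-go-into-db.pl" \
        -d "$db" -h "$host" -datatype go_assoc -fill_count "$assoc_file"
}
```

For example, `GO_ROOT=/path_to_go-dev/go-dev DBSTAG_DIR=/path_to_dbixstag/DBIx-DBStag-0.09; load_assoc my_go_db ga_file.gz` corresponds to the second incantation above.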

Contacts

If you are still having problems installing the AmiGO software, you can contact the developers directly:

  • Seth at LBNL (sjcarbon) (berkeleybop dot org)
  • Amelia at EBI (aji) (ebi dot ac dot uk)

Good Luck

Good luck!