Database FAQ

Questions regarding the querying and installation of the GO database and go-perl. Full documentation on the database can be found at: http://www.geneontology.org/GO.database.shtml

How do I query the GO database?

For most basic queries on GO, the appropriate interface is AmiGO (http://amigo.geneontology.org). However, some queries are difficult to express through the existing AmiGO interface, and the interface may not be convenient for certain bulk queries. In these cases, you can request that the functionality be added, query the database directly, or both. There are three options for querying the GO database directly:

  1. Query a GO database mirror using GOOSE, the GO Online SQL Environment (http://www.berkeleybop.org/goose).
  2. Connect to an existing GO database mirror and execute SQL queries using local client software (e.g. SquirrelSQL or a MySQL client).
  3. Download a MySQL dump and load it into a local MySQL instance. This requires installing MySQL (http://www.mysql.com), both client and server.

Although there are examples in GOOSE and on the wiki, all of the above options require a basic understanding of SQL and of the table structure of the GO database schema. The last two options also require the user to install software on their computer, and are thus most suitable for more advanced users.

For more details, see: http://www.geneontology.org/GO.database.shtml#SQL

How do I mirror or install the GO database?

To "mirror" the GO database, you will need to download the archived data files from the GO FTP site and install the go-perl and go-db-perl software modules to load and access the data. Documentation is available from http://www.geneontology.org/GO.tools.software-libraries.shtml.

There are three types of GO archive data files. The go-termdb files are generated daily and contain just the ontology terms and their relationships. The go-lite files are generated three times a week and include the ontology terms and all gene products and their sequences, but exclude IEA (Inferred from Electronic Annotation) associations. The go-full files are generated once a month and contain all data, including IEA associations. Depending on your needs, the go-lite files may be sufficient, and they are refreshed frequently.

If you load the gene-association files directly into a database, rather than using the MySQL dump files, please note that we filter the gene-association files to remove duplicate associations before we load the production GO database. This may account for differences between the counts in your database and those in ours.
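As an illustration of the kind of filtering described above, here is a minimal Python sketch that drops repeated rows from a tab-delimited gene-association file before loading. It assumes that a duplicate means an exact repeat of the whole row; the production filter may use different criteria. The function name and the sample rows are hypothetical.

```python
def drop_duplicate_associations(lines):
    """Yield association rows in their original order, skipping exact repeats.

    Assumption: two rows are duplicates only if every column matches.
    Comment lines (starting with '!') are passed through unchanged,
    and blank lines are dropped.
    """
    seen = set()
    for line in lines:
        row = line.rstrip("\n")
        if row.startswith("!"):
            yield row
            continue
        if not row or row in seen:
            continue
        seen.add(row)
        yield row


# Made-up sample rows for illustration only (not real annotation data).
sample = [
    "!gaf-version: 1.0",
    "DB\tID1\tsym1\t\tGO:0008150\tREF:1\tIEA",
    "DB\tID1\tsym1\t\tGO:0008150\tREF:1\tIEA",
    "DB\tID2\tsym2\t\tGO:0008150\tREF:2\tIDA",
]
filtered = list(drop_duplicate_associations(sample))
print(len(sample), "rows in,", len(filtered), "rows out")
```

To process a real file, pass an open file handle instead of the sample list and write the yielded rows to a new file.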

If you have questions regarding the go-perl and go-db-perl software, we recommend the go-database mailing list: many groups have installed a local version of the GO database and may be able to offer advice and suggestions.

How do I install go-perl and go-db-perl?

Here's the quick guide:

Download the packages from CPAN. See http://www.geneontology.org/GO.tools.software-libraries.shtml

Install go-perl, then go-db-perl:

cd go-perl 
perl Makefile.PL 
make test 
make install 

Then do the same for go-db-perl. You will be prompted for any additional Perl modules you need to install; these are available from http://www.cpan.org/.

What is the access policy for the GO database?

The GO Database server is a shared resource, so we require data mining to be performed in a manner that allows others to use the resource at the same time. Any activity that mines the GO Database using AmiGO must be controlled so that only one request is made at a time. You may download and install the database locally. You can also retrieve all the source files that define the data within the database. Details on installing the database locally are available.
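One way for a script to respect the one-request-at-a-time rule is to issue requests strictly in sequence, waiting for each to finish (and optionally pausing) before sending the next. The sketch below assumes a caller-supplied fetch function; `run_serially`, the pause length, and the stand-in `fake_fetch` are hypothetical and not part of AmiGO.

```python
import time


def run_serially(items, fetch, pause_seconds=1.0):
    """Call fetch(item) for each item, one at a time, pausing between calls.

    No threads or async tasks are used, so at most one request is
    ever in flight. pause_seconds is an arbitrary courtesy delay.
    """
    results = []
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(pause_seconds)
        results.append(fetch(item))
    return results


# Stand-in for a real HTTP call, so the sketch runs without network access.
def fake_fetch(acc):
    return "result for " + acc


results = run_serially(["GO:0008150", "GO:0003674"], fake_fetch, pause_seconds=0.01)
print(results)
```

For large jobs, a local installation of the database is the better choice.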

For more information, please contact the GO helpdesk.

How do I find annotations for a gene product in the SQL database?

See the example queries at http://wiki.geneontology.org/index.php/Example_Queries
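As a rough sketch of the shape such a query takes, the Python example below builds a toy database and looks up the terms annotated to one gene product. It assumes heavily simplified gene_product, association and term tables linked by internal ids; the column lists are trimmed, the data rows are made up, and SQLite stands in for MySQL so the example runs on its own. Consult the schema documentation and the example queries page for the real table definitions.

```python
import sqlite3

# Toy in-memory database; the tables are simplified assumptions, not the full schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
    CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE association (
        id INTEGER PRIMARY KEY,
        term_id INTEGER REFERENCES term(id),
        gene_product_id INTEGER REFERENCES gene_product(id)
    );
    INSERT INTO term VALUES (1, 'GO:0008150', 'biological_process');
    INSERT INTO gene_product VALUES (10, 'example_gene');
    INSERT INTO association VALUES (100, 1, 10);
""")

# Join gene product -> association -> term, filtering on the gene symbol.
rows = conn.execute(
    """
    SELECT term.acc, term.name
    FROM gene_product
    JOIN association ON association.gene_product_id = gene_product.id
    JOIN term ON term.id = association.term_id
    WHERE gene_product.symbol = ?
    """,
    ("example_gene",),
).fetchall()
print(rows)
```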

Why do the IDs in the database not match the GO IDs?

The GO SQL Database employs the common practice of using surrogate IDs for primary keys. These are intended to be internal to the database, and not exposed to the casual user. In addition, they are not stable and will change with each release. For example, the term table has columns including:

  • id -- internal numeric identifier
  • acc -- public GO ID
  • name -- term label

The id column is the primary key of the term table and is used as a foreign key in tables that reference it, such as term2term.

The acc column contains the public GO identifier, e.g. GO:0008150.
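Because the internal ids change between releases, queries are best written to filter on acc and to use id only for joins. The Python sketch below illustrates this on a toy database. It assumes a simplified term2term table in which term1_id is the parent and term2_id is the child; the id values are arbitrary, `child_accs` is a hypothetical helper, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT UNIQUE, name TEXT);
    CREATE TABLE term2term (
        id INTEGER PRIMARY KEY,
        term1_id INTEGER REFERENCES term(id),  -- assumed: parent term
        term2_id INTEGER REFERENCES term(id)   -- assumed: child term
    );
    -- The id values are arbitrary, like surrogate keys in a real release.
    INSERT INTO term VALUES (731, 'GO:0008150', 'biological_process');
    INSERT INTO term VALUES (912, 'GO:0008152', 'metabolic process');
    INSERT INTO term2term VALUES (1, 731, 912);
""")


def child_accs(conn, parent_acc):
    """Return the public GO IDs of the direct children of parent_acc.

    The internal id columns appear only in the join conditions;
    the caller never sees or supplies them.
    """
    rows = conn.execute(
        """
        SELECT child.acc
        FROM term AS parent
        JOIN term2term ON term2term.term1_id = parent.id
        JOIN term AS child ON child.id = term2term.term2_id
        WHERE parent.acc = ?
        ORDER BY child.acc
        """,
        (parent_acc,),
    ).fetchall()
    return [acc for (acc,) in rows]


print(child_accs(conn, "GO:0008150"))
```

A query written this way keeps working after a new release is loaded, whereas one that hard-codes an internal id such as 731 would not.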

See also additional notes on the schema

How do I install AmiGO locally?

Full documentation for downloading and installing AmiGO is available here.

MySQL Dumps

These answers pertain to the Database Downloads

I can't parse the *.txt files

The *.txt files must be imported into a MySQL instance using mysqlimport. They are not intended to be loaded into Excel or parsed with custom tools.
