Database changes 2007

From GO Wiki
Redirect page
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Redirect to:

Database: New Features

These features have been added in 2007

Note this page is also duplicated on the internal wiki

Documentation

The documentation for the GO database as been revamped. See:

Schema documentation

This replaces the old documentation on godatabase.org (now redirected)

GOOSE

We have created GO Online SQL Environment (GOOSE). This was in response to increased demands for advanced queries that cannot be met by the AmiGO interface, which mostly caters to common queries.

GOOSE allows you to executed arbitrary SQL over any GO database mirror. Expertise in SQL and knowledge of the schema helps; however, GOOSE also includes standard query templates from the example queries page:

This means that GOOSE makes an ideal learning environment for intermediate users

Use of GOOSE will be simplified by the addition of views (see below)

SQL-Views

We have built a large library of SQL views to simplify querying the GO database. Views can be materialized for speed. This will allow us to make future versions of AmiGO faster.

Support for Database xref metadata

aka: GO.xrf_abs

Extensions to the db table

Term subset

GO Slims are now loaded into database

This will allow us to filter-by-subset in AmiGO. This is already being used in the new amigo map2slim interface

gene_product_subset

This allows us to make "slims" of gene products. The most important one will be the reference_genome slim - by tagging these it makes it easier to do analyses and filters on the refG subset.


Support for multi-species interaction Annotations

See GO Annotation Guide

The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example in obligate parasitic species, almost all of their gene products will be interacting with another organism, their host. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm.

For annotating gene products involved in these multi-organism interactions, there is a special set of biological process terms in the interaction between organisms node.

The species in the interaction can be recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format section.

This is stored in the database using the following new table:

Currently we lack multi-organism interaction annotation data

Support for annotation properties

  • Aka structured notes
  • Aka annotation-cross-products

See docs on wiki

Awaiting annotation

Consider/replaced_by tags

Stored in term2term_metadata table

Synonym types

Stored in synonym_category_id

Precomputed gene product counts, by species

Previously the gene_product counts were only pre-computed for the annotation database (FlyBase, UniProt, SGD). Many annotation databases cover >1 species.

counts are now pre-computed for species too:

gene_product_count table

Taxon hierarchy

The species table (which should truly be called the taxon table) has support for taxon hierarchies: species table

We use a nested set model

Later versions of the database will have this populated

Allows us to filter by taxa above the species level - eg kingdom, phylum, .. For example, filter by Viridiplantae