SWUG:Database changes 2007
Database: New Features
These features have been added in 2007
Documentation
The documentation for the GO database as been revamped. See:
Schema documentation
This replaces the old documentation on godatabase.org (now redirected)
GOOSE
We have created GO Online SQL Environment (GOOSE). This was in response to increased demands for advanced queries that cannot be met by the AmiGO interface, which mostly caters to common queries.
GOOSE allows you to executed arbitrary SQL over any GO database mirror. Expertise in SQL and knowledge of the schema helps; however, GOOSE also includes standard query templates from the example queries page:
This means that GOOSE makes an ideal learning environment for intermediate users
Use of GOOSE will be simplified by the addition of views (see below)
SQL-Views
We have built a large library of SQL views to simplify querying the GO database. Views can be materialized for speed. This will allow us to make future versions of AmiGO faster.
SQL Views can be fairly powerful; for example we have some views for calculating the Information Content of a term based on the number of annotations (see Collaboration_with_MIT_GO-Engineering).
This means we can easily write queries to fetch for example the average information content broken down by database. The following is the IC of all annotations in the refG subset:
SELECT * FROM avg_annotation_entropy_by_annotation_subset_and_source ORDER BY avg_entropy;
reference_genome WB 9.7708568320593 reference_genome RefSeq 10.0933288773331 reference_genome ZFIN 10.6886457671244 reference_genome MGI 10.9633757349792 reference_genome TAIR 11.1110688320685 reference_genome UniProt 11.3526212071035 reference_genome RGD 11.6885303554913 reference_genome DDB 12.154186698224 reference_genome FB 12.6074013048053 reference_genome SGD 12.801913637563 reference_genome GeneDB_Spombe 13.2108204603711
Support for Database xref metadata
aka: GO.xrf_abs
Extensions to the db table
Term subset
GO Slims are now loaded into database
This will allow us to filter-by-subset in AmiGO. This is already being used in the new amigo map2slim interface
gene_product_subset
This allows us to make "slims" of gene products. The most important one will be the reference_genome slim - by tagging these it makes it easier to do analyses and filters on the refG subset.
Support for multi-species interaction Annotations
The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example in obligate parasitic species, almost all of their gene products will be interacting with another organism, their host. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm.
For annotating gene products involved in these multi-organism interactions, there is a special set of biological process terms in the interaction between organisms node.
The species in the interaction can be recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format section.
This is stored in the database using the following new table:
Currently we lack multi-organism interaction annotation data
Support for association properties
- Aka structured notes
- Aka annotation-cross-products
See docs on wiki
Awaiting annotation
Consider/replaced_by tags
Stored in term2term_metadata table
Synonym types
Stored in synonym_category_id
Precomputed gene product counts, by species
Previously the gene_product counts were only pre-computed for the annotation database (FlyBase, UniProt, SGD). Many annotation databases cover >1 species.
counts are now pre-computed for species too:
Taxon hierarchy
The species table (which should truly be called the taxon table) has support for taxon hierarchies: species table
We use a nested set model
Later versions of the database will have this populated
Allows us to filter by taxa above the species level - eg kingdom, phylum, .. For example, filter by Viridiplantae