Database changes 2007
- 1 Database: New Features
- 1.1 Documentation
- 1.2 GOOSE
- 1.3 SQL-Views
- 1.4 Support for Database xref metadata
- 1.5 Term subset
- 1.6 gene_product_subset
- 1.7 Support for multi-species interaction Annotations
- 1.8 Support for annotation properties
- 1.9 Consider/replaced_by tags
- 1.10 Synonym types
- 1.11 Precomputed gene product counts, by species
- 1.12 Taxon hierarchy
Database: New Features
These features were added in 2007.
Note: this page is also duplicated on the internal wiki.
Documentation
The documentation for the GO database has been revamped. See:
This replaces the old documentation on godatabase.org (now redirected).
GOOSE
We have created the GO Online SQL Environment (GOOSE) in response to increased demand for advanced queries that the AmiGO interface, which mostly caters to common queries, cannot meet.
GOOSE allows you to execute arbitrary SQL over any GO database mirror. Expertise in SQL and knowledge of the schema help; however, GOOSE also includes standard query templates from the example queries page.
This makes GOOSE an ideal learning environment for intermediate users.
Use of GOOSE will be simplified by the addition of views (see below).
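As a rough illustration of the kind of query one might run through GOOSE, the sketch below builds a tiny in-memory SQLite stand-in for a GO mirror and joins terms to gene products. The table and column names (term, association, gene_product, acc, symbol) and all accessions and symbols are assumptions made for this example; check them against the schema documentation before running anything similar on a real mirror.

```python
import sqlite3

# Toy stand-in for a GO database mirror. Table names, column names and
# all rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE association (id INTEGER PRIMARY KEY, term_id INTEGER, gene_product_id INTEGER);
INSERT INTO term VALUES (1, 'GO:9999901', 'example term A'), (2, 'GO:9999902', 'example term B');
INSERT INTO gene_product VALUES (1, 'geneX'), (2, 'geneY');
INSERT INTO association VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")


def products_for_term(conn, acc):
    """Return symbols of gene products directly annotated to the given term accession."""
    rows = conn.execute(
        """
        SELECT gp.symbol
        FROM term AS t
        JOIN association AS a ON a.term_id = t.id
        JOIN gene_product AS gp ON gp.id = a.gene_product_id
        WHERE t.acc = ?
        ORDER BY gp.symbol
        """,
        (acc,),
    ).fetchall()
    return [row[0] for row in rows]


print(products_for_term(conn, "GO:9999901"))
```

The query only follows direct annotations; it does not walk the ontology graph to include annotations to child terms.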
SQL-Views
We have built a large library of SQL views to simplify querying the GO database. Views can be materialized for speed, which will allow us to make future versions of AmiGO faster.
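To make the idea of a view and its materialized copy concrete, here is a minimal sketch using SQLite from Python. The view name term_annotation_count, the toy schema and the data are all hypothetical and are not taken from the actual view library; "materializing" is shown simply as copying the view's rows into an ordinary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy schema; names and rows are assumptions for illustration only.
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
CREATE TABLE association (id INTEGER PRIMARY KEY, term_id INTEGER);
INSERT INTO term VALUES (1, 'GO:9999901', 'example term A'), (2, 'GO:9999902', 'example term B');
INSERT INTO association VALUES (1, 1), (2, 1);

-- Hypothetical view: number of direct annotations per term.
CREATE VIEW term_annotation_count AS
SELECT t.acc, t.name, COUNT(a.id) AS n_annotations
FROM term AS t
LEFT JOIN association AS a ON a.term_id = t.id
GROUP BY t.id, t.acc, t.name;

-- Materialize the view by copying its current rows into a plain table.
CREATE TABLE term_annotation_count_mat AS
SELECT * FROM term_annotation_count;
""")

view_rows = conn.execute(
    "SELECT acc, n_annotations FROM term_annotation_count ORDER BY acc"
).fetchall()
mat_rows = conn.execute(
    "SELECT acc, n_annotations FROM term_annotation_count_mat ORDER BY acc"
).fetchall()
print(view_rows)
print(mat_rows)
```

The materialized table is a snapshot: it is faster to read but has to be rebuilt whenever the underlying tables change.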
Support for Database xref metadata
Extensions to the db table
Term subset
GO Slims are now loaded into the database.
This will allow us to filter by subset in AmiGO, and is already being used in the new AmiGO map2slim interface.
gene_product_subset
This allows us to make "slims" of gene products. The most important one will be the reference_genome slim: tagging these gene products makes it easier to run analyses and filters on the refG subset.
Support for multi-species interaction Annotations
The majority of gene products act within the organism that encoded them. However, gene products encoded by one organism can sometimes act on or in other organisms. For example, in obligate parasitic species, almost all gene products interact with another organism, the host. Interactions may also occur between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm.
For annotating gene products involved in these multi-organism interactions, there is a special set of biological process terms under the "interaction between organisms" node.
The species in the interaction can be recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format section.
This is stored in the database using the following new table:
Currently we lack multi-organism interaction annotation data.
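The two-taxon convention described above can be sketched as a small parser. This assumes the Taxon column uses pipe-separated `taxon:<id>` entries (for example `taxon:5833|taxon:9606`); the function name parse_taxon_column is hypothetical, and the exact column syntax should be checked against the annotation file format section.

```python
def parse_taxon_column(value):
    """Split a Taxon column value into (encoding_taxon_id, other_taxon_id).

    Assumes pipe-separated 'taxon:<id>' entries. The second element is None
    when only one taxon is given, i.e. no multi-organism interaction.
    """
    parts = value.strip().split("|")
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"expected one or two taxon IDs, got: {value!r}")
    ids = []
    for part in parts:
        prefix, _, number = part.partition(":")
        if prefix != "taxon" or not number.isdigit():
            raise ValueError(f"malformed taxon entry: {part!r}")
        ids.append(int(number))
    other = ids[1] if len(ids) == 2 else None
    return ids[0], other


# First ID: species encoding the gene product; second: the other organism.
print(parse_taxon_column("taxon:5833|taxon:9606"))
# Same-species interaction: both IDs are identical.
print(parse_taxon_column("taxon:562|taxon:562"))
# Ordinary single-organism annotation.
print(parse_taxon_column("taxon:9606"))
```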
Support for annotation properties
- Aka structured notes
- Aka annotation-cross-products
See docs on wiki
Consider/replaced_by tags
Stored in the term2term_metadata table.
Synonym types
Stored in synonym_category_id.
Precomputed gene product counts, by species
Previously, gene_product counts were pre-computed only per annotation database (FlyBase, UniProt, SGD). However, many annotation databases cover more than one species.
Counts are now pre-computed by species too.
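A per-species precomputation can be pictured as a grouped count stored in its own table. The sketch below is only an approximation under assumed names (gene_product.species_id, association, gene_product_count, product_count) and invented rows; it counts direct annotations only, whereas a real precomputation may also roll counts up through the ontology graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy schema; names and rows are assumptions for illustration only.
CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT, species_id INTEGER);
CREATE TABLE association (term_id INTEGER, gene_product_id INTEGER);
INSERT INTO gene_product VALUES (1, 'geneX', 10), (2, 'geneY', 10), (3, 'geneZ', 20);
-- Note the duplicate (100, 1) row: a product annotated twice to the same term.
INSERT INTO association VALUES (100, 1), (100, 2), (100, 3), (200, 3), (100, 1);

-- Precompute distinct gene products per term and species.
CREATE TABLE gene_product_count AS
SELECT a.term_id, gp.species_id, COUNT(DISTINCT gp.id) AS product_count
FROM association AS a
JOIN gene_product AS gp ON gp.id = a.gene_product_id
GROUP BY a.term_id, gp.species_id;
""")

counts = conn.execute(
    "SELECT term_id, species_id, product_count "
    "FROM gene_product_count ORDER BY term_id, species_id"
).fetchall()
print(counts)
```

Using COUNT(DISTINCT ...) keeps a gene product with several annotations to the same term from being counted more than once.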
Taxon hierarchy
The species table (which should really be called the taxon table) now supports taxon hierarchies: species table
We use a nested set model.
Later versions of the database will have this populated.
This allows us to filter by taxa above the species level (e.g. kingdom, phylum). For example, filter by Viridiplantae.
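For readers unfamiliar with the nested set model, here is a minimal sketch of how it supports filtering by a higher-level taxon. Each node gets a left and right number from a depth-first walk, and a taxon's descendants are exactly the rows whose left number falls inside its interval. The column names left_value and right_value, the single name column, and the tiny taxonomy are assumptions for illustration, not the actual contents of the species table.

```python
import sqlite3

# Tiny illustrative taxonomy (child -> parent); not the real table contents.
PARENT = {
    "Eukaryota": None,
    "Viridiplantae": "Eukaryota",
    "Arabidopsis thaliana": "Viridiplantae",
    "Oryza sativa": "Viridiplantae",
    "Metazoa": "Eukaryota",
    "Homo sapiens": "Metazoa",
}


def nested_set_bounds(parent_of):
    """Assign (left, right) numbers to every node via a depth-first walk."""
    children, roots = {}, []
    for node, parent in parent_of.items():
        if parent is None:
            roots.append(node)
        else:
            children.setdefault(parent, []).append(node)
    bounds, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    for root in roots:
        visit(root)
    return bounds


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (name TEXT PRIMARY KEY, left_value INTEGER, right_value INTEGER)"
)
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [(name, left, right) for name, (left, right) in nested_set_bounds(PARENT).items()],
)


def taxa_within(conn, ancestor):
    """Return the ancestor and all taxa nested inside it, in tree order."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM species AS child
        JOIN species AS anc
          ON child.left_value BETWEEN anc.left_value AND anc.right_value
        WHERE anc.name = ?
        ORDER BY child.left_value
        """,
        (ancestor,),
    ).fetchall()
    return [row[0] for row in rows]


print(taxa_within(conn, "Viridiplantae"))
```

The appeal of this design is that a subtree lookup is a single range comparison rather than a recursive query; the cost is that the left and right numbers must be recomputed when the hierarchy changes.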