Production Progress Report for October 2008
Software and Databases
SGD has provided maintenance and support for the production AmiGO web site since May 4, 2005. We have made several maintenance patches that fixed bugs and eliminated long-running rogue queries. However, the two production servers are still monitored by automatic scripts that restart the database to relieve the high load caused by very large queries. The database is restarted about once every two days. We continue to support AmiGO on a new development server to allow more efficient testing and deployment to the production servers.
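The load-triggered restart described above can be illustrated with a minimal sketch. It is not the monitoring script used on the production servers: the load threshold, the restart command path, and the function names are all hypothetical and chosen only to show the idea.

```python
import os
import subprocess

# Hypothetical values: the real threshold and restart command are not
# given in this report.
LOAD_THRESHOLD = 8.0
RESTART_COMMAND = ["/path/to/restart-database.sh"]


def should_restart(load_1min, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average exceeds the threshold."""
    return load_1min > threshold


def check_and_restart(get_load=os.getloadavg, run=subprocess.run):
    """Check the load once and restart the database if it is too high.

    get_load and run are injectable so the logic can be exercised
    without touching a real server.
    """
    load_1min = get_load()[0]
    if should_restart(load_1min):
        run(RESTART_COMMAND, check=True)
        return True
    return False
```

A script like this would typically be run at a fixed interval, for example from cron, so that each invocation performs a single check.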
The GO relational databases were maintained and supported for the entire year. Bulk loading for both associations and sequences was added to production in March 2008. Sequence database loading for the monthly go-full was removed due to memory issues that could not be resolved with current programming resources. Other improvements since April include full support of dual-taxa associations via the association_species_qualifier table, and the loading of GO.xref_abbr for use in AmiGO. Future plans include:
- Improved unit testing procedures
- Add IEA annotations for projects with species-specific gene association files
- Use of UniProtKB mapping files to avoid problematic and slow NCBI sequence retrieval
- Add support for secondary UniProtKB ids in gp2protein files
- Export complete protein sets for reference genomes in FASTA format
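As an illustration of the planned support for secondary UniProtKB ids, the sketch below rewrites secondary accessions on a gp2protein line to their primary accession. It assumes a two-column, tab-separated layout with semicolon-separated sequence ids and a pre-built secondary-to-primary lookup table. The function name, the lookup table, and the accessions in the usage example are hypothetical.

```python
def normalize_gp2protein_line(line, secondary_to_primary):
    """Rewrite secondary UniProtKB accessions on one line to primary ones.

    Assumed layout: gene product id, a tab, then one or more sequence ids
    separated by ';'. Comment lines (starting with '!') pass through.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("!"):
        return line
    gene_product, _, sequence_ids = line.partition("\t")
    if not sequence_ids:
        return line
    resolved = []
    for seq_id in sequence_ids.split(";"):
        prefix, sep, accession = seq_id.partition(":")
        if sep and prefix == "UniProtKB":
            # Fall back to the original accession when no mapping exists.
            accession = secondary_to_primary.get(accession, accession)
        resolved.append(prefix + sep + accession)
    return gene_product + "\t" + ";".join(resolved)
```

For example, with a made-up mapping `{"Q00001": "P12345"}`, a line pairing `SGD:S000000001` with `UniProtKB:Q00001` would be rewritten to point at `UniProtKB:P12345`, while ids from other databases are left unchanged.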
Gene Association Filters
We continue to validate association files for errors before they are published to the FTP site and anonymous CVS and loaded into the relational database. The filtering program is revised continuously to account for changes in standards and format. Changes since April include:
- Reporting of the replaced_by and consider IDs provided in the OBO ontology file
- Update taxids associated with Gramene
- Add support for Escherichia coli project
- Add support for new EXP evidence code
- Provide better usage documentation
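The reporting of replaced_by and consider IDs can be sketched as follows. This is not the filtering program itself, only a minimal illustration. It assumes OBO stanzas separated by blank lines and a tab-separated association file with the GO ID in the fifth column. The function names are hypothetical.

```python
def parse_obsolete_terms(obo_text):
    """Map each obsolete term id to its replaced_by and consider ids."""
    obsolete = {}
    for stanza in obo_text.split("\n\n"):
        entries = [entry.strip() for entry in stanza.strip().splitlines()]
        if not entries or entries[0] != "[Term]":
            continue
        tags = {"id": [], "is_obsolete": [], "replaced_by": [], "consider": []}
        for entry in entries[1:]:
            tag, sep, value = entry.partition(":")
            if sep and tag in tags:
                # Drop any trailing "! comment" from the value.
                tags[tag].append(value.split("!")[0].strip())
        if tags["id"] and "true" in tags["is_obsolete"]:
            obsolete[tags["id"][0]] = {
                "replaced_by": tags["replaced_by"],
                "consider": tags["consider"],
            }
    return obsolete


def report_obsolete(association_lines, obsolete):
    """Return (line number, GO id, replaced_by, consider) for obsolete ids."""
    findings = []
    for number, line in enumerate(association_lines, start=1):
        if line.startswith("!"):
            continue  # header or comment line
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 5:
            continue  # too short to carry a GO id
        go_id = columns[4]
        if go_id in obsolete:
            hint = obsolete[go_id]
            findings.append((number, go_id, hint["replaced_by"], hint["consider"]))
    return findings
```

Returning structured tuples rather than formatted strings keeps the check separate from however the results are eventually reported to the submitting project.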
The GOC wiki continues to be hosted by the Stanford group.
GO loading and AmiGO are installed and fully functioning on three Linux machines. A load balancer splits the AmiGO traffic between two nodes; the third machine is the development and database loading server.
Usage for the www.geneontology.org domain has been relatively constant since the last report at ~5,000 visits/week. Usage of AmiGO fluctuates around 17,000 visits/week.
Gail Binkley, Ben Hitz, Eurie Hong, Stuart Miyasato, Shuai Weng, Edith Wong, Mike Cherry
The SGD group at Stanford is responsible for hosting various production aspects of GO. These include maintenance and hosting of the geneontology.org web site, hosting of the AmiGO ontology browsing server, periodic database loading, file export, wiki and FTP hosting of the GO database, the GO CVS project and anonymous servers, and filtering and validating gene association files supplied by member consortium projects.