Stanford Production, September 2009

GO Production Services @ Stanford – Progress Report
Gene Ontology Consortium Meeting, Cambridge UK, Sept 23-24

1.  GO Staff

Gail Binkley, Mike Cherry, Ben Hitz, Eurie Hong, Stuart Miyasato, Shuai Weng, Edith Wong

The SGD group at Stanford hosts several production services for GO: maintenance and hosting of the www.geneontology.org and archive.geneontology.org web sites; hosting of the AmiGO ontology browsing web server; periodic loading of the GO database; hosting of the GO FTP and CVS servers; hosting of the wiki.geneontology.org web site; and filtering of gene association files supplied by members of the consortium.  Usage of www.geneontology.org has remained roughly constant since the last report at ~7,000 visits/week.  Usage of AmiGO fluctuates between 20,000 and 30,000 visits/week.

2.  Software & Databases

1. AmiGO: Maintenance and support of the production AmiGO web site has been provided by SGD since May 4, 2005. We have applied several maintenance patches to fix bugs and have eliminated long-running rogue queries. AmiGO version 1.7 (refGenome+IEAs, and other enhancements) is currently being tested on the AmiGO development server and should be ready to go into production right after the meeting, if not before.

2. GO Database: Stanford continues to maintain and support the GO relational databases. Loading of protein sequences into the monthly go-full was stopped because of memory issues that could not be resolved with current programming resources. It was also determined that the majority of UniProt sequences were being loaded, so it is appropriate for anyone interested in these sequences to use the UniProt database directly. A reasoner has been added to populate the graph_path table, and loading of IEAs (except for those in the large goa_uniprot file) has been started for go-lite. The IEA annotations from the MOD-provided projects will go into production with the AmiGO 1.7 release.

Future plans include:
- improved unit testing procedures
- add support for secondary UniProt ids in gp2protein files
- export complete protein sets for reference genomes in FASTA format
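
As a rough illustration of what populating a table like graph_path involves, the sketch below computes the transitive closure of a small term graph. It is not the reasoner used in production. The function name, the (ancestor, descendant, distance) layout, and the term ids are assumptions made for the example, and relationship types are ignored.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Map (ancestor, descendant) -> shortest distance for (child, parent) edges.

    Simplified, hypothetical stand-in for a graph_path-style table:
    relationship types are ignored and only the shortest path is kept.
    """
    parents = defaultdict(set)
    terms = set()
    for child, parent in edges:
        parents[child].add(parent)
        terms.update((child, parent))

    paths = {}
    for term in terms:
        paths[(term, term)] = 0  # every term reaches itself at distance 0
        seen = {term}
        frontier = [term]
        distance = 0
        # Breadth-first walk up the graph, one level of parents at a time.
        while frontier:
            distance += 1
            next_frontier = []
            for node in frontier:
                for parent in parents[node]:
                    if parent not in seen:
                        seen.add(parent)
                        paths[(parent, term)] = distance
                        next_frontier.append(parent)
            frontier = next_frontier
    return paths


# Made-up term ids: GO:c is a child of GO:b, which is a child of GO:a.
closure = transitive_closure([("GO:c", "GO:b"), ("GO:b", "GO:a")])
```

Precomputing these rows lets a browser answer "all annotations to this term or any of its descendants" with a single join instead of a recursive walk.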

3. Gene association filters: All submitted gene association files are filtered for errors in content or syntax before being committed to the GO CVS or loaded into the relational database. The filtering program continues to be revised as the project's standards and format specifications change. The large UniProt file (gene-association.goa_uniprot) has been removed from CVS because it was too large for CVS. The submitted UniProt GOA file is still filtered, as are all GAFs, but the filtered file is no longer available from CVS. Its associations continue to be loaded into go-full. A new file containing only the non-IEA associations, with all IEA associations removed, has been created from the filtered UniProt GAF (gene-association.go_uniprot_noniea). The contents of gene-association.go_uniprot_noniea are loaded into go-lite.
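
Filtering a gene association file by evidence code can be sketched as follows. This is a minimal illustration, not the production filtering program. It relies only on the GAF conventions that comment lines begin with "!" and that the evidence code is the seventh tab-separated column. It drops rows whose evidence code is IEA, so that only non-IEA associations remain, as the file name gene-association.go_uniprot_noniea suggests. The function name and the sample rows are made up for the example.

```python
def keep_non_iea(lines):
    """Yield GAF lines, skipping annotation rows whose evidence code is IEA.

    Hypothetical helper: comment lines (starting with "!") pass through
    unchanged; the evidence code is read from the 7th tab-separated column.
    """
    for line in lines:
        if line.startswith("!"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) > 6 and columns[6] == "IEA":
            continue
        yield line


# Made-up rows, truncated to the first nine GAF columns for brevity.
sample = [
    "!gaf-version: 1.0\n",
    "UniProtKB\tP00001\tGENE1\t\tGO:0005634\tPMID:1\tIDA\t\tC\n",
    "UniProtKB\tP00002\tGENE2\t\tGO:0005737\tGO_REF:0000002\tIEA\t\tC\n",
]
filtered = list(keep_non_iea(sample))
```

Because the function is a generator, a file can be streamed through it line by line, which matters for a file too large to handle comfortably in one piece.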

4. Wiki: The GO wiki was upgraded to MediaWiki version 1.15 when it was moved to the new Linux machine (see below). We plan to upgrade MediaWiki to 1.15.1 (and to upgrade PHP along with it) this fall.

3. Hardware

GO loading and the AmiGO interface have been installed and are fully functioning on three Linux machines. A load balancer splits the AmiGO traffic between two production nodes. A new Linux machine was purchased this year to host the GO admin services, including the non-AmiGO web site, FTP site, wiki, and CVS repository.