SWUG:Meeting at Stanford 2011 06 23
- Waiting for The Mungall
- Intro - new database "GOLD"
- VARCHARs used instead of incrementing primary key
- ontology schema updated to emulate OWL (not exactly a mirror)
- schema designed to mirror the GAF table: simpler, not as normalized
- actually the "GPAD" format, "Gene Product Annotation Data" (i.e., GAF sans Gene Product info)
- files can be dealt with independently
- No diagram yet ACTION ITEM: Make ERD -- Amelia
- 3 modules in DB
- Ontology - more tables than old, less generic. View for inferred relationships.
- Association
- Phylogeny (in Grant, sorta nascent)
- Other ontologies don't always use an RDB; some use an "RDF triple store". ACTION ITEM: If you don't know what that is, google it.
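The "GAF sans Gene Product info" idea can be sketched as a column split. This is only an illustration, not the actual GPAD specification or the GOLD loader: the column names follow the 17-column GAF 2.0 layout, the choice of which columns count as "gene product info" is an assumption, and the `EXDB`/`EX:` identifiers and the reference in the sample are made up.

```python
import csv
import io

# Column order of a GAF 2.0 line (17 tab-separated fields).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]

# Assumption: these columns describe the gene product rather than the annotation.
GENE_PRODUCT_COLUMNS = [
    "db_object_symbol", "db_object_name", "db_object_synonym",
    "db_object_type", "taxon",
]


def split_gaf(gaf_text):
    """Split GAF text into annotation rows and a gene product lookup table."""
    annotations = []
    gene_products = {}
    reader = csv.reader(io.StringIO(gaf_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("!"):
            continue  # skip blank lines and "!" header/comment lines
        row = row + [""] * (len(GAF_COLUMNS) - len(row))  # pad short rows
        record = dict(zip(GAF_COLUMNS, row))
        key = (record["db"], record["db_object_id"])
        # Gene product info is stored once per (db, id) pair.
        gene_products[key] = {c: record[c] for c in GENE_PRODUCT_COLUMNS}
        # The annotation row keeps everything else, including the key columns.
        annotations.append({c: v for c, v in record.items()
                            if c not in GENE_PRODUCT_COLUMNS})
    return annotations, gene_products


# Hypothetical sample: two annotations for the same made-up gene product.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.0",
    "\t".join(["EXDB", "EX:0001", "abc1", "", "GO:0005515", "PMID:0000000",
               "IPI", "EXDB:EX:0002", "F", "Abc protein 1", "", "protein",
               "taxon:9606", "20110623", "EXDB", "", ""]),
    "\t".join(["EXDB", "EX:0001", "abc1", "", "GO:0005634", "PMID:0000000",
               "IDA", "", "C", "Abc protein 1", "", "protein",
               "taxon:9606", "20110623", "EXDB", "", ""]),
])

if __name__ == "__main__":
    annotations, gene_products = split_gaf(SAMPLE_GAF)
    print(len(annotations), "annotations,", len(gene_products), "gene product(s)")
```

Because the two outputs share only the `(db, db_object_id)` key, the annotation rows and the gene product rows can be loaded or refreshed independently, which is presumably what "files can be dealt with independently" refers to.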
- Middleware
- Replace go-db-perl and various friends with Java/Hibernate based ORM.
- Some functions are Hibernate-independent, e.g. the bulk loader (PG loader) for tab-delimited files
- Incremental update ("delta"): a script creates "delta files" and then runs a Hibernate script to do the CRUD.
- works for both Ontology and Annotation
- "Deltas" are actually stored in DB.
- API can be accessed via command-line scripts or Java servlet interface. Possibly expanded into WebServices
- Bundled quality control scripts
- ontology QC (downstream of OBO-Edit)
- annotation filtering script - some rules hard coded but others are in a QC XML file
- Files could be submitted by groups and run through QC pipeline
- QC Web Interface can be used instead of or in addition to flat file management system "CVS"
- Switching cost to remove CVS? Possible bridging software.
- Could be done prior to full LEAD->GOLD switchover
- Issues with non-up-to-date Ontology files, obsolete terms
- Should we make hard QC checks really hard? How to enforce compliance?
- Non-compliant evidence code flag? Shows up in AmiGO?
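The incremental "delta" update mentioned above can be sketched roughly as follows. This is not the GOLD middleware (which is Java/Hibernate); it is a minimal Python illustration that assumes each table snapshot is a dictionary keyed by its VARCHAR primary key. The function names and the `EX:` identifiers are hypothetical.

```python
def compute_delta(old_rows, new_rows):
    """Compare two snapshots keyed by primary key and return the changes."""
    old_keys = old_rows.keys()
    new_keys = new_rows.keys()
    return {
        # Keys present only in the new snapshot.
        "insert": {k: new_rows[k] for k in new_keys - old_keys},
        # Keys present in both snapshots whose content changed.
        "update": {k: new_rows[k] for k in new_keys & old_keys
                   if new_rows[k] != old_rows[k]},
        # Keys present only in the old snapshot.
        "delete": sorted(old_keys - new_keys),
    }


def apply_delta(table, delta):
    """Return a copy of `table` with the delta's deletes, inserts and updates applied."""
    result = dict(table)
    for key in delta["delete"]:
        result.pop(key, None)
    result.update(delta["insert"])
    result.update(delta["update"])
    return result


# Hypothetical snapshots: one term renamed, one removed, one added.
OLD = {"EX:0001": "root process", "EX:0002": "metabolsim", "EX:0003": "to be removed"}
NEW = {"EX:0001": "root process", "EX:0002": "metabolism", "EX:0004": "new term"}

if __name__ == "__main__":
    delta = compute_delta(OLD, NEW)
    print(delta)
    print(apply_delta(OLD, delta) == NEW)
```

Keeping the delta as a plain data structure means it can be written to a file or stored in the database before being applied, in line with the note that the deltas themselves are stored in the DB.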
- Progress Report(s)
- Seth - SOLR/Lucene (future of AmiGO)
- Replace all go-perl/go-db-perl with SOLR/Lucene
- Create Lucene indexes from GOLD (parallel processing)
- will be very fast once indexes are built
- could have additional "full searches" hooked up to DB
- similar to QuickGO - dumped custom indexing at EBI
- where do the webservers go?
- what machines do we need? Probably at least as much
- Lucene can help GoTermFinder with term lookups; the transitive closure can be stored in memory
- Shahid - infrastructure
- see above
- Craig - update at Stanford
- have "genome-psquele" machine set up, PG set up. Software installed.
- Works, not sure what to load, etc.
- Kalpana - GoMine
- using InterMine datawarehouse
- ontology and annotation files loaded.
- working on loading UniProt (Shahid has a splitter for this issue)
- issues with loading many taxa, id mapping
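On the earlier note that the transitive closure can be stored in memory for term lookups: a minimal sketch of the idea, assuming the ontology is an acyclic graph given as a child-to-parents dictionary. The term identifiers are hypothetical, and this is not how GoTermFinder or the Lucene index actually store it.

```python
def transitive_closure(parents):
    """Map every term to the set of all its ancestors (direct and indirect)."""
    closure = {}

    def ancestors(term):
        # Memoized: each term's ancestor set is computed only once.
        if term not in closure:
            result = set()
            for parent in parents.get(term, ()):
                result.add(parent)
                result |= ancestors(parent)
            closure[term] = result
        return closure[term]

    for term in parents:
        ancestors(term)
    return closure


def is_descendant(closure, term, ancestor):
    """Constant-time check once the closure is held in memory."""
    return ancestor in closure.get(term, set())


# Hypothetical DAG: the child term has two parents that share one root.
PARENTS = {
    "EX:child": ["EX:mid_a", "EX:mid_b"],
    "EX:mid_a": ["EX:root"],
    "EX:mid_b": ["EX:root"],
}

if __name__ == "__main__":
    closure = transitive_closure(PARENTS)
    print(sorted(closure["EX:child"]))
    print(is_descendant(closure, "EX:child", "EX:root"))
```

The trade-off is memory for speed: the closure is larger than the direct-parent graph, but ancestor queries no longer need a graph walk or a database round trip.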
- Future plans
- Roadmap for upgrade
- Minimum functionality needed for clean break - deadline next GO meeting (Nov 7)
- AmiGO 2 in beta (includes GOOSE)
- GOLD feature complete
- OBOGalaxy released
- PAINT upgraded to use GOLD
- Testing env set up at Stanford
- Hardware architecture things
- need proposal based on loading times (end August?)
- front-end machines (AmiGO, GOOSE, annotation QC), SOLR machines, indexing machines, GOLD machines
- virtualization? (BB used Xen, but now KVM: more stable, fewer features)
- mostly Ubuntu and Debian.
- libvirt python interface
- bad hardware is bad
- Demo
- Other projects
- PAINT
- connects to LEAD schema using Hibernate; can be converted to GOLD one presumes
- ACTION ITEM: find someone to do this and find how long (Chris)
- TermGenie
- Uses the OWL API to handle term requests instantaneously. Runs off a Jetty server
- needs persistence layer to track