SWUG:Meeting at Stanford 2011 06 23

Waiting for The Mungall
Intro - new database "GOLD"
- VARCHAR identifiers used as keys instead of auto-incrementing integer primary keys
- ontology schema updated to emulate OWL (not exactly a mirror)
- schema designed to mirror the GAF table: "simpler", less normalized
  - actually "GPAD" format, "Gene Product Annotation Data" (i.e., GAF without the gene product info)
  - files can be dealt with independently
- No diagram yet. ACTION ITEM: make an ERD (Amelia)
- 3 modules in DB
  - Ontology: more tables than the old schema, less generic. View for inferred relationships.
  - Association
  - Phylogeny (in the grant; still nascent)
- Other ontologies don't always use an RDB; some use an "RDF triple store". ACTION ITEM: If you don't know what that is, google it.
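A minimal sketch of the GAF/GPAD idea above: one GAF 2.0 row is split into an annotation row (GPAD-like) and a gene-product row, both keyed on DB and DB Object ID so each file can be handled independently. Assumption: the gene-product columns are taken to be Symbol, Name, Synonym, Type and Taxon (GAF 2.0 columns 3 and 10 to 13); the final GPAD layout was still being designed, and `split_gaf_row` is a hypothetical helper, not part of GOLD.

```python
from typing import List, Tuple

# 0-based indices of GAF 2.0 columns that describe the gene product itself.
# Assumption: these are the columns moved out of the annotation file;
# the real GPAD layout may order or name them differently.
GENE_PRODUCT_COLS = [2, 9, 10, 11, 12]  # symbol, name, synonyms, type, taxon


def split_gaf_row(line: str) -> Tuple[List[str], List[str]]:
    """Split one GAF 2.0 line into (annotation_row, gene_product_row).

    Both rows start with DB and DB Object ID so they can be joined back
    together later, which lets each file be loaded independently.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 GAF columns, got {len(cols)}")
    key = cols[0:2]  # DB, DB Object ID
    annotation = key + [
        c for i, c in enumerate(cols) if i >= 2 and i not in GENE_PRODUCT_COLS
    ]
    gene_product = key + [cols[i] for i in GENE_PRODUCT_COLS]
    return annotation, gene_product


if __name__ == "__main__":
    # Hypothetical example row (17 columns, some empty).
    row = "\t".join([
        "UniProtKB", "P12345", "ABC1", "", "GO:0005634", "PMID:1", "IDA", "",
        "C", "ABC protein", "", "protein", "taxon:9606", "20110623", "UniProt",
        "", "",
    ])
    ann, gp = split_gaf_row(row)
    print(ann)
    print(gp)
```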
Middleware
- Replace go-db-perl and related tools with a Java/Hibernate-based ORM.
- Some functions are Hibernate-independent, e.g. the bulk loader (PostgreSQL loader for tab-delimited files)
- Incremental update ("delta"): a script creates "delta files" and runs a Hibernate script to apply the CRUD operations.
- works for both Ontology and Annotation
- "Deltas" are actually stored in DB.
- API can be accessed via command-line scripts or a Java servlet interface; possibly expanded into web services
- Bundled quality control scripts
  - ontology QC (downstream of OBO-Edit)
  - annotation filtering script: some rules are hard-coded, others are in a QC XML file
- Files could be submitted by groups and run through the QC pipeline
- A QC web interface could be used instead of, or in addition to, the CVS-based flat-file management system
- Cost of switching away from CVS? Possible bridging software.
- Could be done prior to full LEAD->GOLD switchover
- Issues with out-of-date ontology files and obsolete terms
- Should we make hard QC checks really hard? How to enforce compliance?
- Non-compliant evidence code flag? Shows up in AmiGO?
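The delta mechanism can be sketched as a set difference between two snapshots of a tab-delimited file. This is a hypothetical illustration, not the GOLD delta script: it assumes a whole row identifies a record (so an edited row becomes one delete plus one insert), and the `D`/`I` line prefix written by `write_delta` is an invented format.

```python
from typing import Iterable, List, Tuple


def compute_delta(
    old_rows: Iterable[str], new_rows: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (to_delete, to_insert) between two snapshots of rows.

    Assumption: each whole row is the identity of a record, so a changed
    row shows up as a delete of the old version plus an insert of the new.
    Output is sorted so that delta files are reproducible.
    """
    old = set(old_rows)
    new = set(new_rows)
    return sorted(old - new), sorted(new - old)


def write_delta(path: str, to_delete: List[str], to_insert: List[str]) -> None:
    """Write a hypothetical delta file with one 'D\t<row>' or 'I\t<row>' per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in to_delete:
            fh.write(f"D\t{row}\n")
        for row in to_insert:
            fh.write(f"I\t{row}\n")
```

In GOLD the resulting operations would then be applied through Hibernate, with the deltas themselves stored in the DB; that part is not shown.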
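The split between hard-coded and configurable annotation QC rules might look like the following sketch. Everything here is an assumption: the `<qc>`/`<allowed_evidence>` XML layout, the evidence-code subset in it, and the rule set are illustrative only. Problems are returned as flags rather than silently dropping the annotation, in line with the open question about flagging non-compliant evidence codes.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Set

# Hard-coded rule: GO IDs must look like GO:0000000.
GO_ID = re.compile(r"^GO:\d{7}$")

# Hypothetical QC config; the real QC XML schema is not described in these notes.
# The evidence codes listed are an illustrative subset, not an authoritative list.
EXAMPLE_QC_XML = """
<qc>
  <allowed_evidence>EXP IDA IPI IMP IGI IEP ISS TAS IC ND IEA</allowed_evidence>
</qc>
"""


@dataclass
class Annotation:
    db_object_id: str
    go_id: str
    evidence: str


def load_allowed_evidence(xml_text: str) -> Set[str]:
    """Read the configurable list of allowed evidence codes from the QC XML."""
    root = ET.fromstring(xml_text.strip())
    node = root.find("allowed_evidence")
    if node is None or node.text is None:
        return set()
    return set(node.text.split())


def check(ann: Annotation, obsolete: Set[str], allowed_evidence: Set[str]) -> List[str]:
    """Return a list of QC problems for one annotation (empty list if it passes)."""
    problems: List[str] = []
    # Hard-coded rules
    if not GO_ID.match(ann.go_id):
        problems.append("malformed GO ID")
    elif ann.go_id in obsolete:
        problems.append("obsolete term")
    # Configurable rule loaded from the QC XML file
    if ann.evidence not in allowed_evidence:
        problems.append(f"non-compliant evidence code {ann.evidence}")
    return problems
```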
Progress Report(s)
- Seth - SOLR/Lucene (future of AmiGO)
  - Replace all go-perl/go-db-perl with SOLR/Lucene
  - Create lucene indexes from GOLD (parallel processing)
  - will be very fast once indexes are built
  - could have additional "full searches" hooked up to the DB
  - similar to QuickGO, which dumped its custom indexing at EBI
  - where do the webservers go?
  - what machines do we need? Probably at least as many.
  - Lucene can help GoTermFinder with term lookups; the transitive closure can be stored in memory
- Shahid - infrastructure
  - see above
- Craig - update at Stanford
  - "genome-psquele" machine is set up, with PostgreSQL and the software installed.
  - Works; not sure what to load, etc.
- Kalpana - GoMine
  - using InterMine datawarehouse
  - ontology and annotation files loaded.
  - working on loading UniProt (Shahid has a splitter for this)
  - issues with loading many taxa and with ID mapping
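Storing the transitive closure in memory amounts to precomputing every term's ancestors once, so later lookups (for term-enrichment tools or search) are a single dictionary access. A minimal sketch, assuming the ontology is a DAG given as (child, parent) pairs and, as a simplification, that all relation types are merged and treated as transitive; a real closure over GO would respect relation semantics (e.g. how is_a and part_of chain). The term names in the test are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def transitive_closure(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each term to the set of all its ancestors.

    edges are (child, parent) pairs. Simplifying assumptions: the graph is a
    DAG (no cycles), and every relation type is treated as transitive and
    interchangeable.
    """
    parents: Dict[str, Set[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())  # make sure roots appear as keys

    closure: Dict[str, Set[str]] = {}

    def ancestors(term: str) -> Set[str]:
        # Memoized depth-first traversal: each term is expanded only once.
        if term in closure:
            return closure[term]
        result: Set[str] = set()
        for p in parents.get(term, ()):
            result.add(p)
            result |= ancestors(p)
        closure[term] = result
        return result

    for term in parents:
        ancestors(term)
    return closure
```

Because results are memoized, ancestors shared through different paths in the DAG are not re-explored.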
Future plans
- Roadmap for upgrade
  - Minimum functionality needed for a clean break; deadline is the next GO meeting (Nov 7)
    - AmiGO 2 in beta (includes GOOSE)
    - GOLD feature complete
    - OBOGalaxy released
    - PAINT upgraded to use GOLD
    - Testing env set up at Stanford
- Hardware architecture things
  - need a proposal based on loading times (end of August?)
  - front-end machines (AmiGO, GOOSE, annotation QC), SOLR machines, indexing machines, GOLD machines
  - virtualization? (BB used Xen, now KVM: more stable, fewer features)
    - mostly Ubuntu and Debian.
    - libvirt python interface
    - bad hardware is bad
Demo
Other projects
- PAINT
  - connects to the LEAD schema using Hibernate; presumably can be converted to GOLD
  - ACTION ITEM: find someone to do this and estimate how long it will take (Chris)
- TermGenie
  - Uses the OWL API to process term requests instantaneously; runs on a Jetty server
  - needs a persistence layer for tracking
