SWUG:Meeting at Stanford 2011 06 23

  • Waiting for The Mungall
  • Intro - new database "GOLD"
    • VARCHARs used instead of incrementing primary key
    • ontology schema updated to emulate OWL (not exactly a mirror)
    • schema designed to mirror the GAF table: "simpler", not as normalized
      • actually "GPAD" format, "Gene Product Annotation Data" (i.e., GAF sans Gene Product info)
      • files can be dealt with independently
    • No diagram yet. ACTION ITEM: Make ERD -- Amelia
    • 3 modules in DB
      • Ontology - more tables than old, less generic. View for inferred relationships.
      • Association
      • Phylogeny (in Grant, sorta nascent)
    • Other ontologies don't always use an RDB; some use an "RDF triple store". ACTION ITEM: If you don't know what that is, google it.
  • Middleware
    • Replace go-db-perl and various friends with Java/Hibernate based ORM.
    • Some functions are Hibernate-independent - bulk loader (PG loader of tab-delimited files)
    • Incremental update ("delta") - a script creates "delta files" and runs a Hibernate script to do CRUD.
    • works for both Ontology and Annotation
    • "Deltas" are actually stored in DB.
    • API can be accessed via command-line scripts or Java servlet interface. Possibly expanded into WebServices
    • Bundled quality control scripts
      • ontology GC (downstream of OBO-Edit)
      • annotation filtering script - some rules hard coded but others are in a QC XML file
    • Files could be submitted by groups and run through QC pipeline
    • QC Web Interface can be used instead of or in addition to flat file management system "CVS"
    • Switching cost to remove CVS? Possible bridging software.
    • Could be done prior to full LEAD->GOLD switchover
    • Issues with non-up-to-date Ontology files, obsolete terms
    • Should we make hard QC checks really hard? How to enforce compliance?
    • Non-compliant evidence code flag? Shows up in AmiGO?
  • Progress Report(s)
    • Seth - SOLR/Lucene (future of AmiGO)
      • Replace all go-perl/go-db-perl with SOLR/Lucene
      • Create Lucene indexes from GOLD (parallel processing)
      • will be very fast once indexes are built
      • could have additional "full searches" hooked up to DB
      • similar to QuickGO - dumped custom indexing at EBI
      • where do the webservers go?
      • what machines do we need? Probably at least as much
      • Lucene can help GoTermFinder with term lookups; the transitive closure can be stored in memory
    • Shahid - infrastructure
      • see above
    • Craig - update at Stanford
      • have "genome-psquele" machine set up, PG set up. Software installed.
      • Works; not sure what to load, etc.
    • Kalpana - GoMine
      • using InterMine datawarehouse
      • ontology and annotation files loaded.
      • working on loading UniProt (Shahid has a splitter for this issue)
      • issues with loading many taxa, id mapping
  • Future plans
    • Roadmap for upgrade
      • Minimum functionality needed for clean break - deadline next GO meeting (Nov 7)
        • AmiGO 2 in beta (includes GOOSE)
        • GOLD feature complete
        • OBOGalaxy released
        • PAINT upgraded to use GOLD
        • Testing env set up at Stanford
    • Hardware architecture things
      • need proposal based on loading times (end August?)
      • front-end machines (AmiGO, GOOSE, annotation QC), SOLR machines, indexing machines, GOLD machines
      • virtualization? (BB used Xen, but now KVM: more stable, fewer features)
        • mostly Ubuntu and Debian.
        • libvirt python interface
        • bad hardware is bad
  • Demo
  • Other projects
    • PAINT
      • connects to LEAD schema using Hibernate; can be converted to GOLD one presumes
      • ACTION ITEM: find someone to do this and find how long (Chris)
    • TermGenie
      • Uses the OWL API to handle instantaneous term requests. Runs off a Jetty server
      • needs persistence layer to track
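
Sketch for the progress-report note that the transitive closure can be stored in memory: a minimal Python illustration of precomputing every term's full ancestor set so that lookups become plain set-membership tests. The term names and the `parents` mapping are made up for illustration, the function names are hypothetical, and the input is assumed to be acyclic. This is not the GoTermFinder or Lucene implementation; in practice the edges would presumably come from GOLD or an ontology file.

```python
def transitive_closure(parents):
    """Return {term: set of all direct and indirect ancestors}.

    `parents` maps each term to its direct parents. Assumes the graph
    is a DAG (no cycles), as an ontology hierarchy is expected to be.
    """
    closure = {}

    def ancestors(term):
        # Reuse the stored answer if this term was already resolved.
        if term in closure:
            return closure[term]
        result = set()
        closure[term] = result
        for parent in parents.get(term, ()):
            result.add(parent)
            result |= ancestors(parent)
        return result

    for term in parents:
        ancestors(term)
    return closure


def is_ancestor(closure, candidate, term):
    """True if `candidate` is a direct or indirect ancestor of `term`."""
    return candidate in closure.get(term, set())


# Toy hierarchy with invented term names (not real GO IDs).
toy_parents = {
    "leaf": ["mid_a", "mid_b"],
    "mid_a": ["root"],
    "mid_b": ["root"],
}
toy_closure = transitive_closure(toy_parents)
```

Once built, a check such as `is_ancestor(toy_closure, "root", "leaf")` is a single set lookup with no graph traversal, which is the reason for keeping the closure in memory.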