SWUG:Meeting at Stanford 2011 06 23

From GO Wiki
  • Waiting for The Mungall
  • Intro - new database "GOLD"
    • VARCHAR keys used instead of auto-incrementing integer primary keys
    • ontology schema updated to emulate OWL (not exactly a mirror)
    • schema designed to mirror the GAF table: "simpler", less normalized
      • actually "GPAD" format, "Gene Product Annotation Data" (i.e., GAF without the gene product info)
      • files can be dealt with independently
    • No diagram yet. ACTION ITEM: make an ERD (Amelia)
    • 3 modules in DB
      • Ontology - more tables than old, less generic. View for inferred relationships.
      • Association
      • Phylogeny (in the grant; still nascent)
    • Other ontologies don't always use an RDB; some use an "RDF triple store". ACTION ITEM: if you don't know what that is, google it.
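As a rough illustration of splitting a GAF row so that the association part and the gene product part can live in separate, independently handled files, here is a minimal Python sketch. It assumes GAF 2.0 column positions; the record names (`GeneProduct`, `Association`, `split_gaf_line`) and the choice of which columns go into which record are hypothetical and are not taken from the GPAD specification or the GOLD code.

```python
# Hypothetical sketch: split one GAF 2.0 line (17 tab-separated columns)
# into a gene-product record and a GPAD-like association record.
# Which fields go where is an illustrative assumption, not the GPAD spec.
from typing import NamedTuple, Tuple


class GeneProduct(NamedTuple):
    db: str
    db_object_id: str
    symbol: str
    name: str
    synonyms: str
    object_type: str
    taxon: str


class Association(NamedTuple):
    db: str
    db_object_id: str  # links back to the gene product record
    qualifier: str
    go_id: str
    reference: str
    evidence: str
    with_from: str
    date: str
    assigned_by: str


def split_gaf_line(line: str) -> Tuple[GeneProduct, Association]:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 GAF columns, got {len(cols)}")
    # GAF 2.0 (0-based): 0 DB, 1 ID, 2 symbol, 3 qualifier, 4 GO ID,
    # 5 reference, 6 evidence, 7 with/from, 8 aspect, 9 name,
    # 10 synonyms, 11 type, 12 taxon, 13 date, 14 assigned by
    gp = GeneProduct(cols[0], cols[1], cols[2], cols[9],
                     cols[10], cols[11], cols[12])
    assoc = Association(cols[0], cols[1], cols[3], cols[4], cols[5],
                        cols[6], cols[7], cols[13], cols[14])
    return gp, assoc
```

Because both records carry `db` and `db_object_id`, an association file can be reloaded without touching the gene product file, and vice versa.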
  • Middleware
    • Replace go-db-perl and various friends with Java/Hibernate based ORM.
    • Some functions are Hibernate-independent, e.g. the bulk loader, which uses the PG loader on tab-delimited files
    • Incremental update ("delta"): a script creates "delta files", then runs a Hibernate script to do the CRUD operations.
    • works for both Ontology and Annotation
    • "Deltas" are actually stored in DB.
    • API can be accessed via command-line scripts or a Java servlet interface; possibly expanded into web services
    • Bundled quality control scripts
      • ontology QC (downstream of OBO-Edit)
      • annotation filtering script - some rules hard coded but others are in a QC XML file
    • Files could be submitted by groups and run through QC pipeline
    • QC web interface can be used instead of, or in addition to, the CVS-managed flat-file system
    • Switching cost to remove CVS? Possible bridging software.
    • Could be done prior to full LEAD->GOLD switchover
    • Issues with out-of-date ontology files and obsolete terms
    • Should we make hard QC checks really hard? How to enforce compliance?
    • Flag for non-compliant evidence codes? Should it show up in AmiGO?
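The incremental "delta" update could work roughly like the minimal sketch below: compare an old and a new snapshot by a key, classify rows as inserts, updates, or deletes, then apply them. The row layout, the key columns, and the function names (`compute_delta`, `apply_delta`) are assumptions for illustration; the notes do not say how GOLD keys its rows, and the real pipeline would issue the CRUD operations through Hibernate rather than update a Python dict.

```python
# Hypothetical sketch of computing and applying a "delta" between two
# annotation snapshots. Rows are tuples of
# (db, db_object_id, go_id, evidence, reference); the key is assumed
# to be the first four columns.
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, ...]
Key = Tuple[str, ...]


def key_of(row: Row) -> Key:
    return row[:4]


def compute_delta(old: Iterable[Row], new: Iterable[Row]) -> Dict[str, List[Row]]:
    old_by_key = {key_of(r): r for r in old}
    new_by_key = {key_of(r): r for r in new}
    return {
        # key only in the new snapshot
        "insert": [r for k, r in new_by_key.items() if k not in old_by_key],
        # same key, but some non-key column changed
        "update": [r for k, r in new_by_key.items()
                   if k in old_by_key and old_by_key[k] != r],
        # key only in the old snapshot
        "delete": [r for k, r in old_by_key.items() if k not in new_by_key],
    }


def apply_delta(store: Dict[Key, Row], delta: Dict[str, List[Row]]) -> None:
    # Stand-in for the CRUD step: mutate an in-memory store keyed like the DB.
    for r in delta["delete"]:
        del store[key_of(r)]
    for r in delta["insert"] + delta["update"]:
        store[key_of(r)] = r
```

Storing the delta itself (rather than only the resulting state) is what would let the deltas be kept in the database, as the notes describe.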
  • Progress Report(s)
    • Seth - SOLR/Lucene (future of AmiGO)
      • Replace all go-perl/go-db-perl with SOLR/Lucene
      • Create lucene indexes from GOLD (parallel processing)
      • will be very fast once indexes are built
      • could have additional "full searches" hooked up to DB
      • similar to QuickGO; EBI dumped its custom indexing
      • where do the webservers go?
      • what machines do we need? Probably at least as many as now
      • Lucene can help GoTermFinder with term lookups; the transitive closure can be stored in memory
    • Shahid - infrastructure
      • see above
    • Craig - update at Stanford
      • "genome-psquele" machine set up, with PG and the software installed.
      • Works; not yet sure what to load, etc.
    • Kalpana - GoMine
      • using the InterMine data warehouse
      • ontology and annotation files loaded.
      • working on loading UniProt (Shahid has a splitter for this)
      • issues with loading many taxa, id mapping
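Keeping the transitive closure in memory, as suggested for GoTermFinder-style term lookups, can be sketched as follows: compute every term's ancestors once, so later lookups are a single dictionary access. This assumes the ontology graph is acyclic (GO is a DAG) and treats every supplied edge as transitive; `transitive_closure` is a hypothetical name, not part of GoTermFinder or Lucene, and the edges in the usage are toy data.

```python
# Hypothetical sketch: precompute each term's ancestor set from
# (child, parent) edges. Assumes an acyclic graph and treats all edges
# as transitive (a simplification for relations other than is_a).
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (child, parent)


def transitive_closure(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    parents: Dict[str, Set[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())  # roots get an empty parent set

    closure: Dict[str, Set[str]] = {}

    def ancestors(term: str) -> Set[str]:
        # Memoized depth-first walk up the graph.
        if term in closure:
            return closure[term]
        result: Set[str] = set()
        for p in parents[term]:
            result.add(p)
            result |= ancestors(p)
        closure[term] = result
        return result

    for term in parents:
        ancestors(term)
    return closure
```

For example, with edges `c -> b`, `b -> a`, and `d -> a`, an annotation to `c` also counts toward `b` and `a` without any further graph traversal at query time.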
  • Future plans
    • Roadmap for upgrade
      • Minimum functionality needed for a clean break; deadline is the next GO meeting (Nov 7)
        • Amigo2 in beta (includes GOOSE)
        • GOLD feature complete
        • OBOGalaxy released
        • PAINT upgraded to use GOLD
        • Testing env set up at Stanford
    • Hardware architecture things
      • need proposal based on loading times (end August?)
      • front end machines (Amigo, GOOSE, annotation QC), SOLR machines, Indexing machines, GOLD machines
      • virtualization? (BB used Xen, but now KVM: more stable, fewer features)
        • mostly Ubuntu and Debian.
        • libvirt python interface
        • bad hardware is bad
  • Demo
  • Other projects
    • PAINT
      • connects to the LEAD schema using Hibernate; presumably can be converted to GOLD
      • ACTION ITEM: find someone to do this and find how long (Chris)
    • TermGenie
      • Uses the OWL API to handle term requests instantly; runs on a Jetty server
      • needs persistence layer to track
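The notes do not say exactly what TermGenie's persistence layer must track. Assuming it is the submitted term requests and their review status, a minimal sketch using Python's built-in `sqlite3` module might look like this; the `term_request` table, its status values, and the `RequestStore` class are all hypothetical.

```python
# Hypothetical sketch of a persistence layer for term requests.
# Table layout and status values are assumptions; the notes only say
# that a persistence layer is needed.
import sqlite3
from typing import List, Tuple


class RequestStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS term_request (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   label TEXT NOT NULL,
                   requester TEXT NOT NULL,
                   status TEXT NOT NULL DEFAULT 'pending'
               )"""
        )

    def submit(self, label: str, requester: str) -> int:
        with self.conn:  # commits the transaction on success
            cur = self.conn.execute(
                "INSERT INTO term_request (label, requester) VALUES (?, ?)",
                (label, requester),
            )
        return cur.lastrowid

    def set_status(self, request_id: int, status: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE term_request SET status = ? WHERE id = ?",
                (status, request_id),
            )

    def pending(self) -> List[Tuple[int, str, str]]:
        return self.conn.execute(
            "SELECT id, label, requester FROM term_request "
            "WHERE status = 'pending' ORDER BY id"
        ).fetchall()
```

A file path instead of `":memory:"` would make the requests survive a restart of the Jetty-hosted service.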