Specific go load

From GO Wiki
Revision as of 20:54, 22 February 2007 by Hitz

This page gives pseudocode and a detailed road map for bulk loading associations.

The primary approach is to leverage the existing go-db-perl/go-perl code and keep as much of the loading intact as possible, replacing only the slowest DBStag portions. We also have much invested in the loading infrastructure, wrapper scripts, etc., so this is a minimal mutation.

Steps (in rough order of how they are executed)

  • Modify the load_assoc method in go-prepare-release.pl to call a new script: go-load-assoc-bulk.pl (based on load-go-into-db)
  • go-load-assoc-bulk.pl will use the go_assoc_parser (GO::Parser), but with a new handler (obo_godb_flat.pm)
    • Before the file is parsed, hashes similar to acc2name_h need to be set up, and the last index in dbxref needs to be stored.
  • obo_godb_flat.pm is similar to obo_text.pm, but will write the table files
  • go_assoc_parser fires the following stag events; each has to be associated with an e_EventName method in obo_godb_flat.pm
    • $self->start_event(DBSET);
    • $self->event(PRODDB, $proddb); - sets DBXREF.XREF_DBname for the accessions of all following gene products (i.e. DBXREF_ID -> SGD or UniProt)... or does this belong in the DB table?
    • $self->start_event(PROD); - when this event fires, a line must be written to the gene_product table (file); the following columns are accessed via $self->stag_get(product, tag)
      • $self->event(PRODACC, $prodacc); - XREF to DBXREF table
      • $self->event(PRODSYMBOL, $prodsymbol);
      • $self->event(PRODNAME, $prodname);
      • $self->event(PRODTYPE, $prodtype);
      • $self->event(SECONDARY_PRODTAXA, $other[0]); - XREF to species table
      • $self->event(PRODTAXA, $prodtaxa); - XREF to species table
      • $self->event(PRODSYN, $_); - writes to GENE_PRODUCT_SYNONYM table file
    • $self->start_event(ASSOC); - when this event fires, a line must be written to the association table (file)
      • $self->event(ASSOCDATE, $assocdate);
      • $self->event(SOURCE_DB, $source_db); - XREF to the DB table?
      • $self->event(TERMACC, $termacc); - term id is looked up in the acc2id hash
      • $self->event(IS_NOT, $is_not || '0');
    • $self->event(QUALIFIER, $_); - writes to ASSOCIATION_QUALIFIER table file
    • $self->event(ASPECT, $aspect); - can probably be dropped
    • $self->start_event(EVIDENCE); - writes to EVIDENCE table file
      • $self->event(EVCODE, $evcode);
    • $self->event(WITH, $_); - writes to ASSOC_REL
      • $self->event(REF, $_); - writes to DBXREF, EVIDENCE_DBXREF
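
The handler flow above can be sketched as follows. This is a minimal illustration in Python, not the Perl handler itself. It assumes the term accession lookup and the last dbxref index have been loaded before parsing, as described in the steps. The class name, method names, column orders, and the nested-dict event payloads are all hypothetical. The real obo_godb_flat.pm would receive Data::Stag nodes and would need column layouts matching the actual GO database schema.

```python
import io


class FlatTableHandler:
    """Hypothetical sketch: turns parser events into tab-delimited table rows."""

    def __init__(self, acc2id, last_dbxref_id):
        self.acc2id = acc2id                      # term accession -> term id, loaded before parsing
        self.next_dbxref_id = last_dbxref_id + 1  # continue after the last index already in dbxref
        # Assumption: a real run would also start these from the tables' last ids.
        self.next_product_id = 1
        self.next_assoc_id = 1
        self.proddb = None
        self.product_id = None
        self.out = {}                             # table name -> in-memory "file"

    def _write(self, table, *cols):
        """Append one tab-delimited row to the buffer for the given table."""
        buf = self.out.setdefault(table, io.StringIO())
        buf.write("\t".join(str(c) for c in cols) + "\n")

    def _new_dbxref(self, dbname, acc):
        """Allocate the next dbxref id and write the dbxref row."""
        dbxref_id = self.next_dbxref_id
        self.next_dbxref_id += 1
        self._write("dbxref", dbxref_id, dbname, acc)
        return dbxref_id

    def event(self, name, data):
        """Dispatch an event to the matching e_<name> method, if one exists."""
        method = getattr(self, "e_" + name, None)
        if method is not None:
            method(data)

    def e_proddb(self, proddb):
        # Remember the source database for all following gene products.
        self.proddb = proddb

    def e_prod(self, prod):
        # One gene_product row per product, cross-referenced to dbxref.
        dbxref_id = self._new_dbxref(self.proddb, prod["prodacc"])
        self.product_id = self.next_product_id
        self.next_product_id += 1
        self._write("gene_product", self.product_id, prod["prodsymbol"], dbxref_id)
        for assoc in prod.get("assoc", []):
            self.e_assoc(assoc)

    def e_assoc(self, assoc):
        # One association row per annotation; the term id comes from the lookup hash.
        assoc_id = self.next_assoc_id
        self.next_assoc_id += 1
        term_id = self.acc2id[assoc["termacc"]]
        self._write("association", assoc_id, term_id, self.product_id,
                    assoc.get("is_not", 0), assoc["assocdate"])


# Usage with made-up values: one product carrying one association.
handler = FlatTableHandler(acc2id={"GO:0008150": 7}, last_dbxref_id=100)
handler.event("proddb", "SGD")
handler.event("prod", {
    "prodacc": "S000001",
    "prodsymbol": "ACT1",
    "assoc": [{"termacc": "GO:0008150", "assocdate": "20070222"}],
})
for table, buf in handler.out.items():
    print(table, buf.getvalue(), sep="\n")
```

Allocating ids in memory is what lets each row be written once without a database round trip. The qualifier, evidence, and synonym tables would follow the same pattern, each with its own counter and row writer.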