LEGO July 18, 2016
Bluejeans URL
https://bluejeans.com/969313231
Agenda
UK Training Session
- Doodle poll for dates from late-August through late-October
- Thanks to everyone who filled the poll out so far; there is strong interest in a UK training session
- We are converging on a few dates in September and October, but still need to hear from Claire (back on 7/26), Paul T., Chris, Seth, and Tony
- The last week of August/first week of September, or the second week of September, seem to be the best times right now
- Please fill out the poll so we can find a time to meet
Training Videos
- Need to assign responsibility for writing up outlines/scripts
- Berkeley will use Vimeo to create?
Written Documentation
- Need feedback from people who didn't actually write it :-)
- Volunteers?
Software Updates
NEO Overview and GPI Files
- Chris to provide an overview of what NEO is and how it's constructed - try again this week?
- GPI files - examples on Google spreadsheet
- All entries in the spreadsheet now follow gpi file format 1.2
- MGI submitted their new gpi file last week
- Questions, issues still to be sorted out?
- We have entries for:
- Genes
- Proteins
- Transcripts
- ncRNAs
- Protein Complexes
- Need clarification on this: If groups (MODs, AGR members) have internal IDs for proteins or ncRNAs, should they be including UniProtKB and RNAcentral accessions as well? What are the implications, then, for what entities are available for curators to use in Noctua?
- What is the purpose of the db_xref column and how will it be used wrt NEO and Noctua?
- If groups don't have parent transcript or protein IDs, what ID should be used in Noctua and with what relation?
- For example, if a curator needs to specify any mRNA transcript of a gene to add context to an MF annotation, should they use:
- has_input(WB:WBGene00004804) OR has_input_some_product_of(WB:WBGene00004804) OR has_input_some_mRNA_transcript_of(WB:WBGene00004804)
- Mapping all IDs in gpi file back to GCRP accession? Can this be done, and if so, how?
- Use case for this: WormBase skn-1 gene and protein identifiers in Google spreadsheet; the GCRP accession for SKN-1 is UniProtKB:P34707
WormBase Proposed gpi
DB  WB Protein ID  Symbol     Name              Syn.  Type     Taxon       WB Parent ID       db_xref
WB  WP:CE27591     SKN-1 (?)  SKN-1, isoform a  ??    protein  taxon:6239  WB:WBGene00004804  UniProtKB:P34707-1
WB  WP:CE49174     SKN-1 (?)  SKN-1, isoform d  ??    protein  taxon:6239  WB:WBGene00004804  UniProtKB:V6CLA3
UniProt GCRP gpi (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/WORM/)
UniProtKB  P34707  skn-1  Protein skinhead-1  SKN1_CAEEL|skn-1|T19E7.2  protein  taxon:6239  db_subset=Swiss-Prot
- Next steps - documentation of contents, communication of pipeline to other groups
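One possible way to approach the mapping question above, sketched in Python using only the two WormBase skn-1 rows from the spreadsheet. This is a rough illustration, not an agreed pipeline. The column labels, the function names, the idea of stripping the isoform suffix from the db_xref value, and the fallback through the parent gene ID are all assumptions made for this sketch.

```python
import csv
import io

# Column labels follow the WormBase example header above; they are labels
# chosen for this sketch, not official gpi field names.
COLUMNS = ["db", "object_id", "symbol", "name", "synonyms",
           "type", "taxon", "parent_id", "db_xref"]

# The two proposed WormBase rows, written as tab-separated text.
WB_GPI = "\n".join([
    "\t".join(["WB", "WP:CE27591", "SKN-1 (?)", "SKN-1, isoform a", "??",
               "protein", "taxon:6239", "WB:WBGene00004804",
               "UniProtKB:P34707-1"]),
    "\t".join(["WB", "WP:CE49174", "SKN-1 (?)", "SKN-1, isoform d", "??",
               "protein", "taxon:6239", "WB:WBGene00004804",
               "UniProtKB:V6CLA3"]),
]) + "\n"


def parse_gpi(text):
    """Return one dict per non-empty, non-comment line of tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, fields)) for fields in reader
            if fields and not fields[0].startswith("!")]


def uniprot_accessions(record):
    """UniProtKB accessions from db_xref, with any isoform suffix removed."""
    accessions = []
    for xref in record.get("db_xref", "").split("|"):
        prefix, _, local_id = xref.partition(":")
        if prefix == "UniProtKB" and local_id:
            accessions.append("UniProtKB:" + local_id.split("-")[0])
    return accessions


def map_direct(records, gcrp_accessions):
    """Map each entity ID to a GCRP accession found in its db_xref, else None."""
    mapping = {}
    for rec in records:
        hits = [a for a in uniprot_accessions(rec) if a in gcrp_accessions]
        mapping[rec["object_id"]] = hits[0] if hits else None
    return mapping


def map_via_parent(records, direct):
    """Assumed fallback: unmapped entities inherit the accession of a
    sibling entity that shares the same parent ID."""
    by_parent = {}
    for rec in records:
        accession = direct.get(rec["object_id"])
        if accession:
            by_parent.setdefault(rec["parent_id"], accession)
    return {rec["object_id"]:
            direct.get(rec["object_id"]) or by_parent.get(rec["parent_id"])
            for rec in records}


records = parse_gpi(WB_GPI)
gcrp = {"UniProtKB:P34707"}  # GCRP accession for SKN-1, from the use case above
direct = map_direct(records, gcrp)
resolved = map_via_parent(records, direct)
print(direct)
print(resolved)
```

With these two rows, isoform a maps through its db_xref once the isoform suffix is removed. Isoform d does not, because UniProtKB:V6CLA3 is not the GCRP accession, so in this sketch it only maps through the shared parent gene ID.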
MGI Meeting Follow Up
- Review the list of software and annotation issues that were discussed at the MGI training session, June 15th-16th.
- See the Google doc
- Some specific follow-up:
- GAF/GPAD output is probably highest priority
- Remaining issues:
- How to handle causal chains
- Multiple evidence = multiple lines in the GAF
- Using a limited set of relations in Noctua to make it easier for curators to find what they need (github ticket 165)
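To illustrate the "multiple evidence = multiple lines" point, here is a minimal Python sketch. It is not the Noctua exporter and not the full GAF column set. The class names, the four-column row, and the references are made up for illustration, and the GO term is the molecular_function root used as a neutral placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    code: str       # evidence code, e.g. "IDA"
    reference: str  # supporting reference (hypothetical values below)


@dataclass
class Annotation:
    entity: str
    go_term: str
    evidence: List[Evidence]


def to_gaf_like_rows(annotation):
    """Emit one tab-separated row per piece of evidence.

    Only four simplified columns are written here; a real GAF line
    carries more columns than this sketch shows.
    """
    return ["\t".join([annotation.entity, annotation.go_term,
                       ev.reference, ev.code])
            for ev in annotation.evidence]


# Illustrative values only; this is not a real annotation record.
ann = Annotation(
    entity="WB:WBGene00004804",
    go_term="GO:0003674",  # molecular_function root, placeholder
    evidence=[Evidence("IDA", "PMID:0000001"),   # hypothetical reference
              Evidence("IMP", "PMID:0000002")],  # hypothetical reference
)
rows = to_gaf_like_rows(ann)
for row in rows:
    print(row)
```

One annotation with two pieces of evidence yields two rows that differ only in the reference and evidence code columns.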
Minutes
- On call: Chris, Dan, David H., David O-S., Helen, Kimberly, Midori, Paul T., Sabrina, Seth, Stacia, Suzi
- Regrets: Melanie
UK Training Session
- Still trying to find agreeable dates
- Please fill out poll or ask Kimberly for link if you don't have it
- Tack on an extra day or two of Noctua workshop at the USC meeting? Will discuss on the managers' call on Wednesday, 7/20.
Training Videos
- What would we like to have?
- A concept video like the one seen at Force16 meeting at OHSU
- How-to videos that show people how to use the tool
- ACTION ITEM: review existing videos and decide what we need to do for new ones
Written Documentation
- Need to have detailed documentation, but also a Quick-Start Guide that gets people working without having to do a lot of reading
- David H., Kimberly, and Paul T. will work on a Quick-Start Guide
- Could have both a PowerPoint and videos since people may prefer one format over the other
- Would still be good to have new eyes on the detailed documentation Google doc
Software Updates
- Textpresso/Capella kickout - Seth working on this; will update github ticket when ready for further testing
NEO Overview and GPI Files
- Tentative proposal:
- For LEGO models, curators will use entity-specific identifiers and semantically correct relations
- This will allow us to express the biology with appropriate semantics
- To confirm:
- What should be in the db_xref column of the gpi and how will it be used in LEGO/Noctua (or other GO pipelines)?
- What will different groups put into their gpi files and how will we handle use of each type of identifier in practice?
- Need to make sure mapping between different IDs and GCRP accessions will still work - important for PAINT