LEGO August 8, 2016

From GO Wiki

Bluejeans URL

https://bluejeans.com/969313231

Agenda

UK Training Session

Training Documentation

  • Kimberly drafted the beginnings of a Quick Start guide
  • If this is the right idea, can we flesh it out, finish it up, and create a page linked from the Noctua homepage?
  • Could it also serve as a guide for which videos to make?

Software Updates

NEO Overview and GPI Files

  • Questions, issues still to be sorted out?
    • We have entries for:
      • Genes
      • Proteins
      • Transcripts
      • ncRNAs
      • Protein Complexes
    • Need clarification on this: If groups (MODs, AGR members) have internal IDs for proteins or ncRNAs, should they be including UniProtKB and RNAcentral accessions as well? What are the implications, then, for what entities are available for curators to use in Noctua?
    • What is the purpose of the db_xref column, and how will it be used with respect to NEO and Noctua?
    • Can all IDs in a gpi file be mapped back to a GCRP accession? If so, how? Should this be the default db_xref in each group's gene entry?
    • If groups don't have parent transcript or protein IDs, what ID should be used in Noctua and with what relation?
      • For example, if a curator needs to specify any mRNA transcript of a gene to add context to an MF annotation, should they use:
        • has_input(WB:WBGene00004804) OR has_input_some_product_of (WB:WBGene00004804) OR has_input_some_mRNA_transcript_of (WB:WBGene00004804)
      • Use case for this: WormBase skn-1 gene and protein identifiers in Google spreadsheet; the GCRP accession for SKN-1 is UniProtKB:P34707
 WormBase proposed gpi:
 DB  DBID            Symbol     Name              Syn.  Type        Taxon       WB Parent ID       db_xref
 WB  WBGene00004804  skn-1      skn-1                   gene        taxon:6239                     UniProtKB:P34707
 WB  T19E7.2a        skn-1      skn-1, isoform a        transcript  taxon:6239  WB:WBGene00004804  ????
 WB  WP:CE27591      SKN-1 (?)  SKN-1, isoform a        protein     taxon:6239  WB:WBGene00004804  UniProtKB:P34707-1
 WB  WP:CE49174      SKN-1 (?)  SKN-1, isoform d        protein     taxon:6239  WB:WBGene00004804  UniProtKB:V6CLA3
 UniProt GCRP gpi (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/WORM/):
 UniProtKB	P34707	skn-1	Protein skinhead-1	SKN1_CAEEL|skn-1|T19E7.2	protein	taxon:6239		 db_subset=Swiss-Prot
  • Should UniProt add MOD gene IDs as db_xrefs for the GCRP gpi file (and also the isoform gpi file)?
  • Next steps - documentation of contents, communication of pipeline to other groups
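The question of mapping every gpi ID back to a GCRP accession can be made concrete with the skn-1 rows. A minimal Python sketch, assuming the column layout of the WormBase proposal, pipe-separated db_xrefs, and (the key assumption) that the GCRP for any transcript or protein is the UniProtKB xref on its parent gene line. `parse` and `gcrp_for` are hypothetical helpers, not part of any GO tool.

```python
from dataclasses import dataclass, field

# Column order assumed from the WormBase proposal (GPI-style, tab-separated).
COLUMNS = ["db", "db_id", "symbol", "name", "synonyms", "type", "taxon", "parent", "db_xref"]

# Rows transcribed from the skn-1 example; the transcript's db_xref is left empty.
ROWS = [
    "WB\tWBGene00004804\tskn-1\tskn-1\t\tgene\ttaxon:6239\t\tUniProtKB:P34707",
    "WB\tT19E7.2a\tskn-1\tskn-1, isoform a\t\ttranscript\ttaxon:6239\tWB:WBGene00004804\t",
    "WB\tWP:CE27591\tSKN-1 (?)\tSKN-1, isoform a\t\tprotein\ttaxon:6239\tWB:WBGene00004804\tUniProtKB:P34707-1",
    "WB\tWP:CE49174\tSKN-1 (?)\tSKN-1, isoform d\t\tprotein\ttaxon:6239\tWB:WBGene00004804\tUniProtKB:V6CLA3",
]

@dataclass
class GpiEntry:
    curie: str
    type: str
    parent: str
    xrefs: list = field(default_factory=list)

def parse(rows):
    """Index gpi entries by DB:DBID (assumption: multiple db_xrefs are pipe-separated)."""
    entries = {}
    for line in rows:
        fields = dict(zip(COLUMNS, line.split("\t")))
        curie = f"{fields['db']}:{fields['db_id']}"
        xrefs = [x for x in fields.get("db_xref", "").split("|") if x]
        entries[curie] = GpiEntry(curie, fields["type"], fields["parent"], xrefs)
    return entries

def gcrp_for(curie, entries):
    """Return the UniProtKB xref on the entry's parent gene, or on the entry itself if it has no parent."""
    entry = entries[curie]
    gene = entries.get(entry.parent) if entry.parent else entry
    if gene is None:
        return None  # parent gene is not present in this file
    return next((x for x in gene.xrefs if x.startswith("UniProtKB:")), None)

entries = parse(ROWS)
for curie in entries:
    print(curie, "->", gcrp_for(curie, entries))
```

Under this assumption isoform d (xref UniProtKB:V6CLA3) still resolves to UniProtKB:P34707; whether that is right depends on whether a GCRP accession stands for all products of the gene.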

MGI Meeting Follow Up

  • Review the list of software and annotation issues that were discussed at the MGI training session, June 15th-16th.
  • See the Google doc
  • Some specific follow-up:
    • GAF/GPAD output is probably highest priority
      • Remaining issues:
        • How to handle causal chains
        • Multiple evidence should produce multiple lines in the GAF
    • Using a limited set of relations in Noctua to make it easier for curators to find what they need (github ticket 165)
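The rule that multiple evidence yields multiple GAF lines can be sketched in Python. This is a hypothetical illustration, not the Noctua/minerva exporter: it writes only the first nine GAF columns, `to_gaf_lines` is an invented helper, and the GO ID and PMIDs are placeholders rather than real annotations.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    code: str       # evidence code, e.g. "IDA"
    reference: str  # e.g. a PMID

@dataclass
class Annotation:
    db: str
    db_object_id: str
    symbol: str
    go_id: str
    aspect: str     # "F", "P" or "C"
    evidence: list  # one or more Evidence items

def to_gaf_lines(ann):
    """Emit one tab-separated row per Evidence item (first 9 GAF columns only)."""
    rows = []
    for ev in ann.evidence:
        cols = [ann.db, ann.db_object_id, ann.symbol,
                "",                    # qualifier (unused in this sketch)
                ann.go_id, ev.reference, ev.code,
                "",                    # with/from (unused in this sketch)
                ann.aspect]
        rows.append("\t".join(cols))
    return rows

# Placeholder IDs: GO:0000000 and these PMIDs do not refer to real annotations.
ann = Annotation("WB", "WBGene00004804", "skn-1", "GO:0000000", "F",
                 [Evidence("IDA", "PMID:0000001"), Evidence("IMP", "PMID:0000002")])
for row in to_gaf_lines(ann):
    print(row)
```

Every row repeats the gene and GO term and differs only in its reference and evidence code.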

Minutes

  • On call: Chris, David H., Jim, Judy, Kimberly, Melanie, Midori, Sabrina, Seth, Stacia

UK Training Session

  • Dates set for training session at EBI - 8/31 - 9/2
  • Will include curators from UniProt, FlyBase, UCL, and PomBase

Training Documentation

  • Draft of Quick Start Guide on Google docs
  • Still needs info on GAF/GPAD output
  • Starting point for videos?

Software Updates

  • Jim Balhoff now officially joining the Berkeley team
    • Will work on ontology and minerva (Noctua back-end stuff)
  • Judy raised the issue of mappings between Uberon and EMAPA anatomy terms
    • Noctua uses Uberon, but MGI uses EMAPA terms in their annotation files, so we need to make sure that the xrefs are appropriate and complete; Terry at MGI is working on this
    • Ultimately, the plan is to expand the autocomplete functionality so that users can view term information (e.g., definitions, synonyms, and xrefs) to help them select the correct term
  • More generally, plans are to employ taxon restrictions in Noctua so that curators cannot choose incorrect terms for annotation
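One way taxon restrictions could filter autocomplete results is sketched below in Python. It assumes each candidate term carries an optional only-in-taxon constraint (in the spirit of GO's only_in_taxon) and that the organism's lineage is available; the taxonomy table is a hand-abbreviated fragment, the TERM: IDs are hypothetical, and never_in_taxon-style exclusions are not handled.

```python
# Abbreviated taxonomy fragment (child -> parent, intermediate ranks omitted);
# a real implementation would read the full NCBI Taxonomy ontology instead.
TAXON_PARENT = {
    "NCBITaxon:10090": "NCBITaxon:40674",  # Mus musculus -> Mammalia
    "NCBITaxon:40674": "NCBITaxon:7742",   # Mammalia -> Vertebrata
    "NCBITaxon:7742": "NCBITaxon:33208",   # Vertebrata -> Metazoa
    "NCBITaxon:6239": "NCBITaxon:33208",   # C. elegans -> Metazoa
}

# Hypothetical candidate terms: (id, label, only-in-taxon constraint or None).
CANDIDATES = [
    ("TERM:1", "hypothetical mammal-only process", "NCBITaxon:40674"),
    ("TERM:2", "hypothetical metazoan process", "NCBITaxon:33208"),
    ("TERM:3", "hypothetical unrestricted process", None),
]

def lineage(taxon):
    """Return the taxon followed by all of its ancestors in TAXON_PARENT."""
    chain = [taxon]
    while taxon in TAXON_PARENT:
        taxon = TAXON_PARENT[taxon]
        chain.append(taxon)
    return chain

def allowed_terms(candidates, organism_taxon):
    """Keep terms with no constraint, or whose constraint taxon is in the organism's lineage."""
    lin = set(lineage(organism_taxon))
    return [tid for tid, _label, only_in in candidates if only_in is None or only_in in lin]

print(allowed_terms(CANDIDATES, "NCBITaxon:6239"))
```

For C. elegans the mammal-only term is dropped, while a mouse curator would see all three candidates.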

NEO Overview and GPI Files

  • We are still discussing which IDs can be used for searching and which for annotating
  • Search can make use of db_xrefs so curators can search on both MOD gene IDs and UniProtKB GCRP accessions
  • However, in the OWL models and display, annotations will be associated with IDs from the appropriate MOD resource or, in the case of human, from UniProtKB
  • We also discussed the issues surrounding the semantics in the OWL models where we need/want to be precise about how we represent the biology
  • To accommodate the reality of the literature, we will likely need a two-pronged approach where we include relations that allow curators to use a more generic ID but still get the semantics right (e.g. activity enabled_by_some_product_of gene ID), but also, wherever possible, use IDs that represent the entity to which the annotation should be made (e.g. a specific protein isoform)
    • Relations - Do we run the risk of having too many and confusing curators?
    • Entities - Does the correct ID exist for all entities in a given organism?
  • PRO seems to handle the entity issue correctly in that it distinguishes generic proteins, generic isoforms, and specific isoforms
    • How many organisms have systematic data exchange with PRO, rather than one-off requests, that would enable use of these generic IDs?
  • For UniProtKB GCRPs, how many accessions lacking a dash (isoform) number represent a superclass of protein products rather than a specific protein?
  • For gpi files:
    • MOD gene ID lines should xref UniProtKB GCRPs
    • Conversely, we propose that the UniProtKB GCRP gpi files xref, where available, a MOD gene ID - this would result in symmetrical xrefs between these two types of resources
    • transcript entries in gpi files can reference things like Ensembl transcript IDs; this could help with broadening AmiGO search capabilities
    • ncRNAs could reference RNAcentral IDs
    • protein entries should xref to the UniProtKB accession
    • TrEMBL records also need to be associated with a parent ID in the gpi - this may require work between a MOD and UniProtKB
    • we did not discuss how IntAct protein complexes should xref in the gpi file, but we need to include this in the gpi documentation
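The proposed symmetrical xrefs, and search on either ID, can be checked with a small Python sketch. The two dictionaries are hypothetical stand-ins for the parsed MOD gene lines and UniProtKB GCRP lines, seeded with the skn-1 pair from the agenda; `asymmetric_pairs` and `build_search_index` are illustrative helpers, not existing tools.

```python
# Hypothetical parsed xrefs: MOD gene ID -> UniProtKB GCRPs, and GCRP -> MOD gene IDs.
MOD_GENE_XREFS = {"WB:WBGene00004804": {"UniProtKB:P34707"}}
GCRP_XREFS = {"UniProtKB:P34707": {"WB:WBGene00004804"}}

def asymmetric_pairs(mod, gcrp):
    """List (entry_id, missing_xref) pairs where one side lacks the back-reference."""
    missing = []
    for gene, accs in mod.items():
        for acc in accs:
            if gene not in gcrp.get(acc, set()):
                missing.append((acc, gene))   # GCRP line should xref the MOD gene
    for acc, genes in gcrp.items():
        for gene in genes:
            if acc not in mod.get(gene, set()):
                missing.append((gene, acc))   # MOD gene line should xref the GCRP
    return missing

def build_search_index(mod):
    """Map both MOD gene IDs and their GCRP xrefs to the MOD ID used in the models."""
    index = {}
    for gene, accs in mod.items():
        index[gene] = gene
        for acc in accs:
            index[acc] = gene
    return index

print(asymmetric_pairs(MOD_GENE_XREFS, GCRP_XREFS))
print(build_search_index(MOD_GENE_XREFS)["UniProtKB:P34707"])
```

Searching on the GCRP accession resolves to the WormBase gene ID, matching the plan that models use the MOD ID while search uses db_xrefs.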

MGI Meeting Follow Up

  • MGI is testing upload of GPAD files from Jenkins
  • MGI curators are working on creating one LEGO model a week
    • Still need to have the reasoner generate the appropriate regulates annotations
    • Still need multiple GAF lines when there is multiple evidence
    • List of relations needs to be pruned for easier use
      • Ideally we want a context-sensitive list, but even a shorter, TermGenie-style menu might be helpful
      • There is a github ticket for this issue: github ticket 165