LEGO August 22, 2016

Bluejeans URL - NOTE NEW MEETING TIME: 8:00am PST

https://bluejeans.com/969313231

Agenda

Training and Documentation

UK Training Session

USC Training Session

  • Looks like the best day is 2016-11-07, the Monday after the consortium meeting
  • Paul T. looking into arranging a room at USC.

Documentation

  • Some comments and edits made to the Quick Start guide
  • Any more comments on Quick Start Guide?

Noctua-Minerva Mailing List

  • Review to make sure everyone who attends these meetings is on the list

Software Updates

NEO Overview and GPI Files

  • Questions, issues still to be sorted out?
    • We have entries for:
      • Genes
      • Proteins
      • Transcripts
      • ncRNAs
      • Protein Complexes
    • Converging on specs for Noctua
      • MODs should include UniProtKB GCRP accession as the db_xref for their gene-level entries
      • Protein, transcript, ncRNAs can include UniProtKB isoform accessions, PRO accessions, ENSEMBL IDs, RNACentral IDs in db_xref field
      • Accessions and IDs in the db_xref field will be used for searching (in Noctua, and possibly also in AmiGO), but annotations in the models will be associated with the primary ID entered in each group's gpi file
      • Human gene products will use UniProtKB accessions as primary IDs
      • Also include HGNC gene IDs in human gpi
    • If groups don't have parent transcript or protein IDs, what ID should be used in Noctua and with what relation?
      • For example, if a curator needs to specify any mRNA transcript of a gene to add context to an MF annotation, should they use:
        • has_input(WB:WBGene00004804) OR has_input_some_product_of (WB:WBGene00004804) OR has_input_some_mRNA_transcript_of (WB:WBGene00004804)
    • How should protein complexes be represented?
  • Next steps - documentation of contents, communication of pipeline to other groups
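
As a rough illustration of the db_xref behaviour described above (secondary accessions are searchable, but annotations attach to the primary ID from each group's gpi file), here is a minimal Python sketch. The record layout, the field names, and the UniProtKB placeholder accession are assumptions made for illustration; this is not the gpi file format or Noctua's actual search code.

```python
from collections import defaultdict


def build_search_index(entries):
    """Map every searchable ID (primary or db_xref) to its primary ID(s).

    `entries` is a hypothetical in-memory form of gpi rows:
    dicts with a "primary_id" and an optional "db_xref" list.
    """
    index = defaultdict(set)
    for entry in entries:
        primary = entry["primary_id"]
        index[primary].add(primary)           # the primary ID finds itself
        for xref in entry.get("db_xref", []):
            index[xref].add(primary)          # xrefs are search keys only
    return index


def resolve(index, query):
    """Return the primary IDs an annotation would be attached to."""
    return sorted(index.get(query, set()))


entries = [
    # Placeholder xref value, not a real accession.
    {"primary_id": "WB:WBGene00004804", "db_xref": ["UniProtKB:PLACEHOLDER1"]},
]
index = build_search_index(entries)
print(resolve(index, "UniProtKB:PLACEHOLDER1"))
```

Keeping sets as the index values means that a db_xref shared by several entries would surface all of them, leaving the choice of primary ID to the curator.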

MGI Meeting Follow Up

  • Review the list of software and annotation issues that were discussed at the MGI training session, June 15th-16th.
  • See the Google doc
  • Some specific follow-up:
    • GAF/GPAD output is probably highest priority
      • Remaining issues:
        • How to handle causal chains
        • Multiple evidence = multiple lines in the GAF
    • Using a limited set of relations in Noctua to make it easier for curators to find what they need (GitHub ticket 165)
    • What should we use for internal refs that are not in PubMed; GO_REF, MGI:MGI:, J:?
    • We still need to figure out attribution. Right now everything coming in from Noctua has the group attribute GO_Noctua. That means that different groups making annotations to mouse genes cannot be distinguished.
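
One possible reading of "multiple evidence = multiple lines in the GAF" is sketched below: an annotation carrying several pieces of evidence is expanded into one output row per evidence item. The dictionary keys and the placeholder IDs are hypothetical; they do not reflect the GAF column layout or the actual Noctua export code.

```python
def expand_evidence(annotation):
    """Return one flat row per evidence item attached to an annotation.

    `annotation` is a hypothetical dict with an "evidence" list; every
    other key is copied unchanged into each output row.
    """
    shared = {k: v for k, v in annotation.items() if k != "evidence"}
    return [dict(shared, **evidence) for evidence in annotation["evidence"]]


annotation = {
    "gene": "MGI:PLACEHOLDER",        # placeholder, not a real ID
    "term": "GO:PLACEHOLDER",         # placeholder, not a real ID
    "evidence": [
        {"code": "IDA", "reference": "PMID:PLACEHOLDER1"},
        {"code": "IMP", "reference": "PMID:PLACEHOLDER2"},
    ],
}
for row in expand_evidence(annotation):
    print(row)
```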

LEGO Relations

  • Some specific issues have come up wrt LEGO relations that need clarification
  • The first concerns which relation to use between each of two activities, the acetylcholine transporter and acetylcholinesterase, and the acetylcholine receptor activity in the Drosophila memory model
    • The transporter and acetylcholinesterase are proposed to regulate the receptor activity by removing acetylcholine from the synaptic cleft, so there is a clear hypothesis about their mechanism of action
    • Currently, we have 'directly inhibits' to express the relation, but curators don't feel this accurately reflects the biology as there isn't a direct physical interaction between the gene products
    • However, use of 'negatively regulates' doesn't quite say enough, as there is a proposed molecular mechanism for the regulation
    • The RO has the relation 'directly provides input for' but does it have one for 'removes input for' (or 'directly removes input for')? If so, would this be the correct relation to use?
  • The second issue concerns whether we should create additional relations for 'causally upstream of or within' and 'causally upstream of' to indicate directionality, i.e. positive or negative
    • If we are going to stipulate that some understanding of mechanism is required to select a 'regulates' relationship, then having the directionality in the less granular parent relations would allow curators to still capture the effect of a gene product's activity on a process, even if the mechanism is not yet known.
    • These types of relations will probably be used widely to describe gene product effects based on mutant phenotypes

Minutes

  • On call: Chris, David H., David OS, Giulia, Jim, Judy, Kimberly, Midori, Moni, Paul T., Ruth, Seth, Stacia

Training and Documentation

UK Training Session

  • Reviewed agenda; okay for now, but fluid so we can make changes, if needed
  • Paul, Judy, David, Chris, Kimberly, Moni, Seth will plan to meet for a working dinner on Tuesday night; can meet earlier, too, if everyone has arrived in Hinxton
  • AI: email Claire to confirm schedule and who will be attending from UniProt - DONE

USC Training Session

  • Planning for a one-day training session the Monday (11/7) after the consortium meeting
  • With only one day, this will be a condensed session
  • We may have a number of curators (SGD, WB, etc.) who are completely new to LEGO, so need to think about how best to use our time
  • Sign up is on the meeting logistics page - so far, we have 13 attendees

Quick Start Guide

  • AI: Seth will absorb the Google doc into the github repo
  • This will enable the doc to always be publicly available, while still being editable in git

Noctua-Minerva Mailing List

  • Seth added a few more names to the mailing list
  • Reminder - if you attend the calls, and are not on the mailing list, please get in touch with Seth so he can add your email

Software Updates

NEO Overview and GPI Files

  • Noctua Entity Ontology (NEO) is built from each group's gpi file if one exists, and otherwise from the group's GAF
  • This generates a list of (hopefully unique) entities to which annotations can be made
  • Protein complexes can either be IntAct protein complexes or created on the fly
  • For more information, there is a NEO README. Chris - where is this?
  • One outstanding issue is how to reconcile the entities described in papers with the entities available for curation, i.e. a paper may not specify which of many possible protein isoforms was studied. Some groups have generic protein isoforms, many do not. Possible solutions:
    • Add more relations to allow curators to say something like 'enabled_by_some_product_of'
      • This would get the semantics right, but would necessitate adding more relations and that may be harder for curators.
    • Work with MODs, PRO, and UniProt to make sure generic IDs are available
      • A good long-term solution, but would take some time to implement so not immediately viable
    • Handle this in NEO
      • Create generic protein or transcript IDs when generating NEO, e.g. WBGene00004808 -> Protein_Product:WBGene00004808
      • Doing this would allow curators to continue to use the same relations they'd use for annotating gene products, but wouldn't require a lot of upfront work from annotation groups
      • Since the newly minted NEO IDs use, for example, the gene IDs in the gpi file as input, mapping back to those gene IDs in the GPAD/GAF output files would be straightforward
  • AI: Write up documentation for creating gpi files so groups can begin to get these files ready for submission
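
The "Handle this in NEO" option above could look something like the following minimal sketch: a generic product ID is minted from a gene ID, and the gene ID is recovered again for GPAD/GAF output. The `Protein_Product` prefix comes from the example in these minutes; the `Transcript_Product` prefix and both function names are hypothetical, and this is not the actual NEO build code.

```python
# "Protein_Product" is taken from the minutes; "Transcript_Product" is a
# hypothetical counterpart added only for illustration.
GENERIC_PREFIXES = ("Protein_Product", "Transcript_Product")


def mint_generic_id(gene_id, prefix="Protein_Product"):
    """Create a generic product ID from a gene ID in the gpi file."""
    if prefix not in GENERIC_PREFIXES:
        raise ValueError(f"unknown generic prefix: {prefix}")
    return f"{prefix}:{gene_id}"


def gene_id_for(entity_id):
    """Map a minted generic ID back to its gene ID for GPAD/GAF output.

    IDs that were not minted here are returned unchanged.
    """
    for prefix in GENERIC_PREFIXES:
        marker = prefix + ":"
        if entity_id.startswith(marker):
            return entity_id[len(marker):]
    return entity_id


generic = mint_generic_id("WBGene00004808")
print(generic)               # Protein_Product:WBGene00004808
print(gene_id_for(generic))  # WBGene00004808
```

Because the minted ID embeds the gene ID verbatim, the reverse mapping needs no lookup table, which is the property the minutes describe as making the output mapping straightforward.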

MGI Meeting Follow Up

Existing Relations List

  • Relations list in the pop-up window will be updated in time for the UK training session
  • More frequently used GO relations will appear at the top of the list with basic parent-child relations apparent
  • This will be hard coded, at least for now
  • Long-term solution will be to have a context-sensitive list of relations appear for specific curation scenarios, e.g. cellular component part_of some anatomical structure, and have the list be driven off of the structure in the RO
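
A hard-coded ordering of this kind might be as simple as the sketch below, which places a fixed list of frequently used relations first and sorts the rest alphabetically. Which relations count as "frequently used", and the function name, are assumptions for illustration only; this is not the Noctua implementation.

```python
# Hypothetical hard-coded priority list; the real choice of relations
# and their order was still to be decided.
FREQUENT_RELATIONS = ["enabled_by", "part_of", "has_input"]


def order_relations(all_relations, frequent=FREQUENT_RELATIONS):
    """Put frequently used relations first, then the rest alphabetically."""
    rank = {name: position for position, name in enumerate(frequent)}
    return sorted(
        all_relations,
        key=lambda name: (rank.get(name, len(frequent)), name),
    )


print(order_relations(["regulates", "has_input", "occurs_in", "enabled_by"]))
```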

References

  • Some groups have internal IDs for the GO_REFs. Which should be used?
  • All agreed that GO_REFs are appropriate here.
  • There are plans to move the GO_REFs out of SVN to make for easier editing, and also to get DOIs for them.

Attribution

  • Right now, annotations are output with source GO_Noctua
  • This is also an issue when importing existing annotations into Noctua
  • Individual curation groups will want proper attribution
  • Curator-group mappings can be found in the users.yaml file
  • We need to be clear, though, on exactly how individuals associated with multiple groups will be tracked
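
To make the attribution question concrete, here is a minimal sketch of choosing a source from a curator-to-group mapping. The mapping is shown as a plain dictionary; the real layout of users.yaml is not described in these minutes, so the structure, the curator names, and the function name are all assumptions. Raising an error for curators in several groups simply marks the case the minutes flag as unresolved; it is not a proposed policy.

```python
DEFAULT_SOURCE = "GO_Noctua"  # the source currently used for all output


def assigned_by(curator, curator_groups):
    """Pick the group to credit for an annotation made by `curator`.

    `curator_groups` is a hypothetical dict of curator -> list of groups.
    """
    groups = curator_groups.get(curator, [])
    if not groups:
        return DEFAULT_SOURCE          # unknown curator: keep current behaviour
    if len(groups) == 1:
        return groups[0]
    # Open question from the call: how to track multi-group curators.
    raise ValueError(f"{curator} belongs to several groups: {groups}")


curator_groups = {
    "curator_a": ["MGI"],              # hypothetical curator names
    "curator_b": ["MGI", "WB"],
}
print(assigned_by("curator_a", curator_groups))
print(assigned_by("someone_else", curator_groups))
```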

New Relations

  • From FlyBase annotation consistency exercise, we think we need 'removes_input_for' or 'removes_direct_input_for'
    • We also discussed this in Geneva wrt the metabolic pathways and agreed there that these relations are necessary and will provide symmetry wrt the existing 'provides_input_for' and 'provides_direct_input_for'
  • Also need to add directionality to the causally_upstream_of_or_within relation and its immediate child so that curators can indicate a positive or negative effect
    • This is essential when annotating from mutant phenotypes where curators want to make a statement about how a process is affected, but don't know the mechanism, i.e. how many steps lie between the gene product being annotated and the phenotypic outcome