Annotation Conf. Call 2016-12-13





Progress Reports

  • Not absolutely required since this is the last year of the current grant, but many groups are doing them anyway
  • 2016 curation groups template on wiki
  • 2016 specific aims reports on Google drive


Use of github for ID authentication

  • Tools like TermGenie and Noctua are now using github for user authentication (Persona system has been retired)
  • Is anyone having trouble with this?
  • There is documentation about this on github
  • Melanie sent an email to go-discuss list



Review USC meeting minutes and action items


  • Infrastructure changes coming for 2017: see Chris' slides
    • Changes to gene association files submissions (GAF, GPAD, GPI)
  • GPI files
    • GPI files will be the preferred source of annotatable entities in Noctua
    • Right now, entities are derived from GPI files first and GAF files second; if an entity is not in the GAF either, it is not available in Noctua
    • Summary of GPI specs is available on the website
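
The GPI-first, GAF-second lookup described above can be sketched as follows. This is only an illustration, not the Noctua implementation: it assumes that the first three tab-separated columns of both file types are the database, the object ID and the object symbol, that header lines start with "!", and all identifiers (EXDB, G0001, ...) and function names are made up.

```python
def read_entities(lines):
    """Collect {"DB:ID": symbol} from tab-delimited GPI or GAF lines."""
    entities = {}
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # too short to carry DB, ID and symbol
        # Keep the first symbol seen for each entity.
        entities.setdefault(f"{cols[0]}:{cols[1]}", cols[2])
    return entities


def resolve_entity(curie, gpi_entities, gaf_entities):
    """Look in the GPI first, then the GAF; None means 'not available'."""
    if curie in gpi_entities:
        return gpi_entities[curie], "GPI"
    if curie in gaf_entities:
        return gaf_entities[curie], "GAF"
    return None


# Hypothetical example data (not real identifiers).
GPI_LINES = [
    "!gpi-version: 1.2",
    "EXDB\tG0001\tabc1\tExample protein 1\t\tprotein\ttaxon:9606",
]
GAF_LINES = [
    "!gaf-version: 2.1",
    "EXDB\tG0001\tabc1-old\t\tGO:0005634\tREF:1\tIDA\t\tC",
    "EXDB\tG0002\txyz2\t\tGO:0005634\tREF:1\tIDA\t\tC",
]

gpi = read_entities(GPI_LINES)
gaf = read_entities(GAF_LINES)

for curie in ("EXDB:G0001", "EXDB:G0002", "EXDB:G0003"):
    print(curie, resolve_entity(curie, gpi, gaf))
```

An entity present in both files takes its symbol from the GPI, which is the point of making GPI the preferred source.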


  • Communication between Ontology Editors and Annotators
    • Implement an automatic way to contact groups that have annotations impacted by ontology changes using the group name in the GAF and the contact email provided by the group.
    • github ticket #1465
  • Conference Calls
    • Establish an agenda template for the bimonthly Tuesday ‘annotation’ meetings and circulate the agenda in a timely manner, with actionable items (Kimberly, David). The Monday LEGO calls are no longer held.
    • We will resume annotation working group calls on Tuesday, January 10th, 2017
    • Continue Thursday ontology calls (weekly), anyone can join these calls, Paola circulates agenda.
  • Documentation
    • Start moving documentation off the wiki and on to github and the website - Seth will lead
    • Add documentation notices to appropriate github tracker - ontology, annotation, go-site
  • Upcoming Meetings
    • Decide on Montreal or Stanford; PIs then to inform Pascale which dates would work for the chosen site. At this point, Stanford is the first choice, with Montreal in 2018.
    • Cambridge in September/October: Val to send the PIs a Doodle poll with possible dates.
    • Alternative to Cambridge might be Bar Harbor in fall of 2017 - maybe directly following Acadia Night Sky Festival Sept 21- 24
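
The automatic contact mechanism listed under "Communication between Ontology Editors and Annotators" above could look roughly like the sketch below. It assumes the group name is read from GAF column 15 (Assigned_By) and the GO ID from column 5; the contact table, group names, GO IDs and function names are all hypothetical, and actually sending the email is left out.

```python
from collections import defaultdict

GO_ID_COL = 4         # GAF column 5 (0-based index 4)
ASSIGNED_BY_COL = 14  # GAF column 15 (0-based index 14)


def groups_to_notify(gaf_lines, changed_terms, contacts):
    """Map each impacted group to (contact email or None, sorted impacted GO IDs)."""
    impacted = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= ASSIGNED_BY_COL:
            continue  # malformed row, no Assigned_By column
        if cols[GO_ID_COL] in changed_terms:
            impacted[cols[ASSIGNED_BY_COL]].add(cols[GO_ID_COL])
    return {
        group: (contacts.get(group), sorted(terms))
        for group, terms in impacted.items()
    }


def gaf_row(go_id, group):
    """Build a 17-column GAF-like row with placeholder values."""
    cols = ["EXDB", "G0001", "abc1", "", go_id, "REF:1", "IDA", "", "P",
            "", "", "protein", "taxon:9606", "20161213", group, "", ""]
    return "\t".join(cols)


# Hypothetical data: placeholder GO IDs, groups and addresses.
rows = [
    "!gaf-version: 2.1",
    gaf_row("GO:9999991", "GroupA"),
    gaf_row("GO:9999992", "GroupB"),
]
contacts = {"GroupA": "groupa@example.org", "GroupB": "groupb@example.org"}

print(groups_to_notify(rows, {"GO:9999991"}, contacts))
```

A group with no registered contact comes back with None as its address, so missing contact details show up in the report instead of being dropped.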


Transcription Factor Annotations
  • Form a working group to discuss evidence required to support transcription annotations. Email Ruth if interested in joining the group.
  • GO editors to create occurs_in relation between RNA polymerase II GO terms and nuclear chromatin (as all RNA pol II transcription occurs in nucleus)
  • GO curators to use regulation of gene expression for annotation when the only evidence to support the annotation is a change in mRNA levels. Regulation of transcription is therefore only applied when there is evidence that the protein regulates transcription.
  • GO curators to always use regulation of RNA polymerase II transcription terms when there is evidence that the transcription factor regulates mRNA levels.
  • Create a decision tree: look at other formats for this, flowchart or columns with choices and include information presented in Norwegian transcription factor paper.
  • github ticket #1463 - assign yourself if you want to be part of the working group
Annotation Quality Control
  • Investigate use of this approach (annotation matrix) for all submissions. Use the matrix to focus on obvious problems on the annotation calls
  • Investigate encoding rules (as disjointness axioms?) that record intersections that are not allowed. This could allow checks to be run on Jenkins => reports. First step: Val provides list of rules.
  • github ticket #1467
  • Introduce a blacklist of SwissProt keywords for particular proteins and/or improved mapping of keywords2GO.
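
As a rough illustration of the matrix approach and the "intersections that are not allowed" rules above, the sketch below counts how often pairs of slim terms are co-annotated to the same gene product and reports gene products that hit a disallowed pair. It assumes annotations have already been mapped to slim terms; the term labels, gene product IDs, the rule and the function names are invented for the example and are not Val's actual rules.

```python
from collections import Counter
from itertools import combinations


def coannotation_counts(slim_annotations):
    """Count gene products annotated to both terms of each slim-term pair."""
    counts = Counter()
    for terms in slim_annotations.values():
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def rule_violations(slim_annotations, disallowed_pairs):
    """Return {(term_a, term_b): [gene products annotated to both]}."""
    rules = {tuple(sorted(pair)) for pair in disallowed_pairs}
    hits = {}
    for gene_product, terms in slim_annotations.items():
        present = set(terms)
        for a, b in rules:
            if a in present and b in present:
                hits.setdefault((a, b), []).append(gene_product)
    return {pair: sorted(gps) for pair, gps in hits.items()}


# Hypothetical slim-level annotations and one hypothetical rule.
annotations = {
    "gp1": ["slim_A", "slim_B"],
    "gp2": ["slim_A", "slim_C"],
    "gp3": ["slim_A", "slim_B", "slim_C"],
}
rules = [("slim_C", "slim_B")]

matrix = coannotation_counts(annotations)
for pair, n in sorted(matrix.items()):
    print(pair, n)
print(rule_violations(annotations, rules))
```

The counts form the upper triangle of the matrix; the violation report is the kind of output a scheduled check could publish for discussion on the annotation calls.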
Evidence Codes
  • Add examples of correct use of specific code (include these in usage statements on term in ECO) (David H, Pascale G, Marcus C, TAIR curators)
  • Combinatorial evidence: Decision - everyone in the room (apart from 3 people) agreed on using the IDA evidence code to support the application of the ‘integral to synaptic vesicle membrane’ term, based on (e.g.) immunofluorescence demonstrating that a protein is located in the ‘synaptic vesicle membrane’ and InterPro annotation confirming that the protein is ‘integral to membrane’; i.e. a combination of IDA and IEA evidence. Needs to be documented.
  • DOS to document combinations of terms specified by Synapse project as examples of the types of combinations we are likely to want in future. How the evidence types break down may be informative (e.g. sample prep vs assay)
  • Establish working group to develop issues/requirements for evidence codes
  • Mapping all ECO codes to GO codes
    • Proposal 1: Mass precomposition of 'manual' terms in ECO. Only these are made available in Noctua. ECO will work to make sure that explicit definitions are used in these precomposed terms (Right now the definitions of precomposed terms are much less informative than the terms they are based on).
  • Changes to evidence code usage/documentation/rules based on LEGO representation?
    • IEP for CC annotations?
    • IMP for root node annotations?
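
One way to map granular ECO terms to GO three-letter codes is to walk up the ECO hierarchy until a term with a direct mapping is found. The sketch below shows that idea only; it is not any group's actual script. The three direct mappings are believed to match the published ECO-to-GO mapping but should be checked against the official mapping file, and the child term ID and the parent table are hypothetical.

```python
from collections import deque

# Direct mappings (verify against the official ECO-to-GO mapping file).
ECO_TO_GO = {
    "ECO:0000314": "IDA",  # direct assay evidence used in manual assertion
    "ECO:0000315": "IMP",  # mutant phenotype evidence used in manual assertion
    "ECO:0000353": "IPI",  # physical interaction evidence used in manual assertion
}

# Hypothetical is_a parents for a more granular term (placeholder ID).
PARENTS = {
    "ECO:EXAMPLE_CHILD": ["ECO:0000314"],
}


def go_code_for(eco_id, parents=PARENTS, eco_to_go=ECO_TO_GO):
    """Return the GO code of the nearest mapped ancestor (or the term itself)."""
    seen = set()
    queue = deque([eco_id])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in eco_to_go:
            return eco_to_go[current]
        queue.extend(parents.get(current, []))
    return None  # no mapped ancestor found


for eco in ("ECO:0000314", "ECO:EXAMPLE_CHILD", "ECO:UNKNOWN"):
    print(eco, go_code_for(eco))
```

The breadth-first walk returns the closest mapped ancestor, so a granular term inherits the code of its most specific mapped parent.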
Qualifiers and Annotation Extension Relations
  • Add additional QUALIFIERS (GAF/ GPAD column2) to describe relation between a GP and a GO BP; implement in GO curation tools and ingest pipelines
  • Form working group to discuss these and make sure there is documentation on using the correct qualifiers and relations in Noctua and GAFs
  • github ticket #1468
  • Regulates Relations
    • Document examples and build LEGO models to test this (regulation of a process vs regulation of a function).
    • Look at ‘direct activation’ versus ‘direct positive regulation’. Document cases where this distinction might be useful: Val, Pascale, David H, David OS
    • github ticket #70
    • github ticket #12811
  • Using has_input for annotating protein binding
High Throughput Experiments
  • Create a new evidence code HTP (Marcus) - as a parent term to the more specific HTP ECO codes
  • Write guidelines for curating HTP papers (working group to be created - Marcus - Ruth? Pascale?)
  • Identify/implement tool to process data from papers, in particular test for non-proteotypic peptides (Sylvain to provide a paper and contact person)
  • github ticket #1469
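
A minimal sketch of the kind of check meant by "test for non-proteotypic peptides" above, taking the term in the narrow sense of a peptide whose sequence occurs in more than one protein (an assumption made for this example). The sequences, protein IDs and function names are invented; a real tool would work from the paper's peptide list and a full proteome.

```python
def proteins_containing(peptide, proteins):
    """Return sorted IDs of proteins whose sequence contains the peptide."""
    return sorted(pid for pid, seq in proteins.items() if peptide in seq)


def shared_peptides(peptides, proteins):
    """Return {peptide: [protein IDs]} for peptides found in more than one protein."""
    shared = {}
    for peptide in peptides:
        matches = proteins_containing(peptide, proteins)
        if len(matches) > 1:
            shared[peptide] = matches
    return shared


# Hypothetical sequences and peptides.
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "GGTAYIAKLL",
    "P3": "MSSSPEPTIDEK",
}
peptides = ["TAYIAK", "PEPTIDEK", "WWWW"]

print(shared_peptides(peptides, proteins))
```

Peptides matching exactly one protein, or none, are left out of the report; only the ambiguous ones need a curator's attention.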
GO Slims
  • Need all hands call to discuss creating a new SLIM.
  • Who will be responsible for modifying and maintaining this?
  • Mary is able to create this automatically but then edits it. Make sure Mary is on the call.
  • AGR implications for ribbon display?

Ontology Development and Refactoring

Virus and Phage Terms
  • Revisit viral/phage branch of GO and get that started again (connect Sylvain and Jim Hu with ontology development group)
Ontology Editor Training
  • Schedule formal ontology development training for some curators
Using PAINT to Help Identify Areas of Development/Refactoring
  • PAINT group to discuss how to leverage PAINT for prioritizing areas of ontology development or refactoring
github SOPs
  • Make better use of labels in GitHub + clean up for prioritization of tickets
Implementation of Annotation Updates based on Ontology Changes
  • Use CACAO students to make fixes (based on ontology changes) - make a list of issues
Design Patterns and LEGO Templates
  • MF refactoring should involve defining combined ontology design patterns and LEGO templates for compound functions. Start from top of major branches and work down to major sub-branches of these. E.g. receptor -> Major receptor types; TF -> major TF types
  • Work on combined design patterns/templates for major receptor types.
  • Co-ordinate with Berkeley to develop implementation of templates in noctua
  • Coordinate with Reactome - get their models : tyrosine kinase, G protein-coupled, nuclear hormone. MAP kinase; TLR signaling (innate immunity). Are these useful as prototypes in each case for defining bounds of that kind of signaling process and identifying the parts of each to be expected as we go from one specific signaling path to the next one within each family?
Modified Protein Binding
  • Need “XXX dependent binding” as in “phospho-dependent binding” (Paul S, Pascale G, Jim H, Sylvain, David OS)
  • github ticket #12787
Modification of BP 'involved in' terms to better align with LEGO
  • Look into Val’s suggestion of replacing GO terms with the pattern ‘x [molecular process, e.g. ubiquitination] involved in [biological process, e.g. exit from mitosis]’ by ‘x [molecular function, e.g. ubiquitin ligase activity] involved in [biological process, e.g. exit from mitosis]’

GOC Membership

  • Standardise files (need to be discussed on all hands call)

Interactions with External Groups

  • Links to AmiGO in PubMed
  • Check annotations being included in NCBI and Ensembl -Paul


  • On call: Alice, Antonia, Chris, David H, David OS, Edith, George, Giulia, Helen, Jim, Judy, Karen, Kevin, Kimberly, Li, Midori, Olivia, Paola, Paul T., Petra, Sabrina, Sage, Stacia, Stan, Suzi, Terry


  • Not absolutely required right now, but will be needed at end of grant cycle
  • Links are above for document templates


  • github now used for authentication; persona was retired
  • if github authentication is not working for you, please let software team know


  • Will normally slot some time for ontology discussion
  • David H and Kimberly keeping track of issues on a Google spreadsheet

USC Meeting Minutes Action Items and Decisions

Infrastructure changes

  • Major take home message - moving from SVN to github
    • Future call - Chris can give a tour of the .yaml files needed for this
  • gpi files will be needed for entity lists for annotation in Noctua
    • specs on website and github
  • gpad file specs also on github
    • need to reconcile Tony's gpad checks with what is on github specs


  • Automatic alerts to curators when ontology changes necessitate changes to annotations
  • New annotation call template now being used
    • Calls will resume January 10th
  • Ontology calls will continue on Thursdays
    • Annotators are welcome to sit in
  • Issues that come up on these calls will be assigned a github ticket
  • Documentation
  • Start to move documentation from wiki to github and website
  • Top level project in Gene Ontology github
    • Add needed documentation tickets here
    • Documentation requests will be tracked in respective repository (annotation, ontology, go-site)
  • Upcoming meetings
    • Spring meeting - still TBD; go-top working on it


  • Transcription factor annotations
    • Ruth presented a flow chart/decision tree
    • Will form a working group to discuss this further - see github ticket above
    • If you want to be part of the working group, assign yourself to the ticket
    • Note that if you respond, you don't automatically get assigned to the ticket
    • Will need to sort out exactly how the working groups will use github and tickets to manage the work
  • Annotation Quality Control
    • Investigate use of Val's matrix approach
      • Creates GO slims, places each on X and Y axis, annotations should roughly follow along the diagonal, outliers may present annotations or ontology areas that need to be addressed
    • Look into blacklist of associations between SwissProt keyword mappings and specific gene products
  • Evidence Codes
    • Provide examples of correct usage of evidence codes
    • Think about how much we want to use the full ECO, especially wrt Noctua
      • Is this a tooling issue? Partly; at least three different groups have written scripts to map more granular ECO codes to GO three-letter codes
      • Cross-products can also be requested of ECO - there is a github ticket for this
  • Current evidence code rules do not always fit with LEGO models, so will need to change some of the rules
    • IEP for cellular component annotations?
      • Paul T. - IDA still okay for now; don't want to worry too much about changing the standard here
    • IMP for root node annotations?
  • Qualifiers and Annotation Extension Relations
    • List above to be included in qualifier column of GAF for capturing gene product - GO term relations
    • Also talking about how we use regulates relations
    • Use of has_input for protein binding targets?
  • High Throughput Experiments
    • Create a working group to develop evidence codes and guidelines for how to curate these experiments
  • GO Slims
    • Need to create new slims and new slim gatekeepers
    • Mary Dolan has a script for using distribution analysis to create slims
    • Melanie also has a script for this
    • Would be very helpful to have these scripts on github for others to use


  • Revisit viral and phage areas of the ontology
    • Ongoing work - find github ticket
  • Ontology editor training
    • Schedule training for curators - connect ontology developers with annotators to allow curators to begin to contribute to the ontology
  • Use PAINT to help identify areas for ontology re-factoring
    • Create better and more efficient ways to allow PAINT annotators to make requests to ontology editors, or to develop the ontology themselves
  • Try(ing) to make better use of github and labels in github
  • Use CACAO students to help make fixes in annotations
    • Scenario: big changes to ontology that necessitate annotation changes could be farmed out to CACAO students
  • Design Patterns and LEGO Templates
    • Need this for compound functions
    • David OS starting to pull tickets together for this project - currently a top-level ticket in gene ontology on github
    • Some high-level decisions need to be made, but some problems could be solved, e.g. with transporters, if these patterns can be developed
    • There is a Noctua ticket for LEGO templates with slots for curators to fill in entities
    • Need some tooling and decisions on high-level patterns, but will help LEGO and ontology fit nicely together in ways that failed with use of annotation extensions
    • Coordinate with Reactome
      • May be good to form a working group for this - especially look at signaling pathways, metabolic pathways
      • Start by looking at pathways by hand and then figure out what aspects of this alignment can be automated
  • Modified Protein Binding
    • Can't really go down the path of reflecting all protein modifications in ontology terms
    • Need to decide how we're going to deal with modified proteins in GO in general
  • Revise ontology terms for X involved in Y
    • Val had a proposal for this
    • github ticket to illustrate this specifically wrt transcription
  • GOC Membership
    • This stemmed from differences in content of GAF files provided by GOA vs the MODs
    • Also touches on high throughput-based annotations
  • Interaction with External Groups
    • Links to AmiGO from PubMed
      • See TAIR example
    • Check GO annotations at NCBI and Ensembl
      • Paul T. ?