Annotation Conf. Call 2017-03-14: Difference between revisions



== Software, Infrastructure ==
*GO-Help, Ontology migration to github, Jenkins pipeline - Chris, Seth, Moni, Eric
** [https://drive.google.com/open?id=0B8kRPmmvPJU3S1Fwck9CTzNEUVE GO Consortium Meeting 2016 overview]
** https://github.com/geneontology/go-site/blob/master/metadata/datasets/
*Migration of annotation files to github at some point in the future?


== Annotation Review ==
*Review of IMP annotations for possible use of new qualifiers
**On the [http://wiki.geneontology.org/index.php/Annotation_Conf._Call_2017-02-14 February 14th call], we discussed adding new qualifiers to describe the relationship between a gene/gene product and a GO BP term
**Right now, the default relation between a gene/gene product and a GO BP term is 'involved in', but for a long time we've wanted a way to be more specific about describing this relation
**The action item from the February 14th call was for groups to start looking at their existing BP annotations, specifically IMP, to determine whether the default 'involved in' qualifier is still appropriate and, if not, how they would describe the relation between the gene/gene product and the process.
**[https://docs.google.com/spreadsheets/d/1HbvdyKmRI7Zhj6qPqVEsvyN6cMwahsf4aLa54fiROvs BP_Annotation_Qualifiers_Spreadsheet]
**Are there general principles we can derive to help with applying qualifiers to new and existing legacy annotations?
***Defining starts and ends of processes will be critical for doing this.
****Signaling, or otherwise well-defined, molecular pathways vs BPs like behaviors or developmental processes
****Look at genes annotated to both the process and regulation of the process
****Look at genes for which there is a regulation annotation and no MF annotation.
**Explore effects on enrichment analyses - if we give users the option to filter annotations based on use of specific qualifiers, what might the outcome be?
**David H has run gene sets from three papers using VLAD including and excluding IMP evidence codes.
**GO and phenotype enrichment analyses may be complementary.  Can/should we start encouraging users to do both?
*Single-step biological processes - review annotations to help assess impact if such terms are obsoleted
**[https://github.com/geneontology/go-ontology/issues/12859 Github ticket 'Remove all single-step BP classes']
*[https://github.com/geneontology/go-annotation/issues/1463 Transcription factor decision tree]
**From Rachael:  The update is that we have revised the decision tree, which is attached above (see github ticket), and we would like feedback on it from this working group. When the working group are happy with it, then we will announce it at a future annotation call and get it added to the website.
*[https://github.com/geneontology/go-annotation/issues/1469 Annotating high throughput experiments]
**Proposed first meeting: Tuesday, March 21st, 8am PST
 
= Minutes =
*On call: Chris, David H., Edith, Eric, George, Giulia, Harold, Helen, Jim, Judy, Karen, Kimberly, Li, Mary, Midori, Moni, Nancy, Pascale, Petra, Sabrina, Shur-Jen, Stacia, Stan, Tony, Val
 
== GOC Meeting - Corvallis, Oregon ==
*Early June
*Three days of GOC meeting
*Noctua workshop
*Reactome workshop
**'''AI: Need to check what the focus of the Reactome workshop will be'''
 
== Software, Infrastructure ==
*Chris - update
*Check slides from USC presentation in GO Google directory
*GO will be ceasing support for subversion (SVN)
**Replaced with a mixture of github (for the ontology)
***Meeting in Berkeley at end of February for training on git and ontology editing - generally going well
**Will bypass version control for ontology files and publish directly on Amazon S3
*Look at go-site directory on github
**go-site/metadata/datasets
**datasets includes .yaml files of metadata for all GAF providers
***gives info about the project, what types of files are submitted
***use this metadata for validation checks, e.g. taxon
***the metadata will be used to drive future Jenkins jobs - validations, OWL tools checks, prediction GAFs
****see go-gaf-pipeline-NEW
****produces a folder for each producer, e.g. pipeline/target/mgi
****note file format with type of file as extension, e.g. .gpi or .gaf
****prediction GAFs can be slurped into their own database pipeline or can just get incorporated into the GAF as part of the Jenkins pipeline
**Harold - What is happening with the gp2protein, gp2rna, etc.?
***Chris - Could be incorporated into the target directory, but these files seem to have been subsumed by the Quest for Orthologs project.  We can start dispensing with these.
**Harold - For the errors files, Mike's script would remove the lines, what does Jenkins do with the errors?
***Chris - The current Jenkins filters might be a bit more liberal right now; Mike's script checked for more line-by-line types of errors, e.g. a missing column, while the Jenkins checks are more involved and require things like loading the ontologies, etc.
**Midori - What is the mechanism for alerting curators to errors?  emails?
***Chris - emails turn out to be problematic
**Harold - Will errors block the GAF from being released?
***Chris - Gross violations will be filtered
**Tony - How often will you be checking for updated files at the submitter's? Can we trigger a fetch?
***Chris - Cron job will look for new files once a week for now, probably on weekends; maybe move to a daily update, if possible?
**Harold - What about the PAINT files?  MGI fetches PAINT files and will incorporate them.
***Chris - At the moment, PAINT is considered a separate submitter and has its own metadata file.  Can conceivably push PAINT files automatically into the public annotation set, but that can be set on a per group basis.
**Sabrina - What is the timeline on this new way of doing things?
***Chris - There are still a number of dependencies, so will likely be on the order of a few months.  But, we know we need to get off of SVN (running on machines at Stanford) soon.
**David - How will the gpi files fit in with all of this?
***Chris - gpi files do/will subsume gp2protein and also give us more information and thus are much more useful.  This also relates to AGR activities; AGR has recently identified a JSON format for what is essentially the same content as a gpi file.  Maybe GO should move to that JSON format?
 
== Annotation Review ==
=== Proposed New Qualifiers ===
*A proposed new set of gene product - GO term qualifiers is now available
*Current default is 'involved in' but we know we probably have other types of relations in our existing annotations
*Action item from February 14th was for curators to look at existing annotations
*Some entries in spreadsheet for C. elegans, dicty, Drosophila, and human
*Verdict is still out on whether there are general principles that can be applied for adding new qualifiers to legacy IMP annotations
*Many annotations require a review of at least the supporting paper's abstract
*Looking at gene/gene product annotations where there is an IMP annotation to regulation of a process but NO MF may help pinpoint a set of annotations that definitely need review
**For this, we may be able to refine the MF criteria such that MF annotations to terms like 'metal ion binding' can also be considered since they may not be informative of mechanism and that is really what we are trying to get at with use of regulation terms
*Also, looking at gene/gene product annotations where there are annotations to both a BP term and regulation of that BP term
*Behavior and development are two areas of the BP ontology that would benefit from this review
**For example, what is the difference between 'development' and 'regulation of development'?
*'''AI: Mary and Eric will work on querying the GO graph store for: 1) regulation BP annotations with no corresponding MF annotation, and 2) annotations to both a process and regulation of that process'''
**'''[https://github.com/geneontology/go-annotation/issues/1542 github ticket for GO graph store queries]'''
*'''AI: Curators continue to look at BP annotations to see if we can start to get a handle on some general rules for applying qualifiers (and exactly what those qualifiers mean)'''
 
=== Single-step Biological Process Terms ===
*Did not get to this agenda item - will try to cover next time.
 
=== Working Groups ===
*Did not get to this agenda item, either!  Will try to report back next time.
 
=== Next Call - April 11th ===
*No call on March 28th due to overlap with BioCurator meeting at Stanford.
 
 
 



Latest revision as of 15:53, 14 March 2017

Bluejeans URL

https://bluejeans.com/993661940

Agenda

GO Meeting Reminder

Software, Infrastructure

Annotation Review

  • Review of IMP annotations for possible use of new qualifiers
    • On the February 14th call, we discussed adding new qualifiers to describe the relationship between a gene/gene product and a GO BP term
    • Right now, the default relation between a gene/gene product and a GO BP term is 'involved in', but for a long time we've wanted a way to be more specific about describing this relation
    • The action item from the February 14th call was for groups to start looking at their existing BP annotations, specifically IMP, to determine whether the default 'involved in' qualifier is still appropriate and, if not, how they would describe the relation between the gene/gene product and the process.
    • BP_Annotation_Qualifiers_Spreadsheet
    • Are there general principles we can derive to help with applying qualifiers to new and existing legacy annotations?
      • Defining starts and ends of processes will be critical for doing this.
        • Signaling, or otherwise well-defined, molecular pathways vs BPs like behaviors or developmental processes
        • Look at genes annotated to both the process and regulation of the process
        • Look at genes for which there is a regulation annotation and no MF annotation.
    • Explore effects on enrichment analyses - if we give users the option to filter annotations based on use of specific qualifiers, what might the outcome be?
    • David H has run gene sets from three papers using VLAD including and excluding IMP evidence codes.
    • GO and phenotype enrichment analyses may be complementary. Can/should we start encouraging users to do both?
  • Single-step biological processes - review annotations to help assess impact if such terms are obsoleted
    • Github ticket 'Remove all single-step BP classes'
    • What terms could possibly be considered single-step processes?
      • Look at MF-BP links
    • Mary D's analysis shows that there are 2380 bioentities that have experimental annotation to phosphorylation or its children that do not have annotation to kinase or its children. This is just one example.
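
The phosphorylation/kinase comparison above is essentially a set difference, and the same check could be repeated for other MF-BP pairs. A minimal sketch in Python, assuming annotations are available as (bioentity, GO term, evidence code) tuples and that the sets of descendant terms have already been computed from the ontology. The function name, the toy data, and the evidence code set are illustrative assumptions, not the query actually used for the analysis.

```python
# Experimental evidence codes (an assumed set for this sketch).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def annotated_without(annotations, process_terms, function_terms,
                      evidence=EXPERIMENTAL):
    """Return bioentities that have an annotation with the given evidence to
    any term in process_terms but no annotation at all to any term in
    function_terms."""
    in_process = set()
    in_function = set()
    for entity, term, code in annotations:
        if term in process_terms and code in evidence:
            in_process.add(entity)
        if term in function_terms:
            in_function.add(entity)
    return in_process - in_function


# Toy data: the gene identifiers are hypothetical.
annotations = [
    ("gene:A", "GO:0006468", "IDA"),  # protein phosphorylation
    ("gene:A", "GO:0004672", "IDA"),  # protein kinase activity
    ("gene:B", "GO:0016310", "IMP"),  # phosphorylation, no kinase annotation
    ("gene:C", "GO:0016310", "IEA"),  # not experimental, so not counted
]
# In practice these sets would hold each term plus all of its children.
phosphorylation = {"GO:0016310", "GO:0006468"}
kinase = {"GO:0016301", "GO:0004672"}

missing = annotated_without(annotations, phosphorylation, kinase)
print(sorted(missing))
```

Swapping in a different pair of term sets would give a rough count of affected bioentities for any other candidate single-step process.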

Working Groups

  • Transcription factor decision tree
    • From Rachael: The update is that we have revised the decision tree, which is attached above (see github ticket), and we would like feedback on it from this working group. When the working group are happy with it, then we will announce it at a future annotation call and get it added to the website.
  • Annotating high throughput experiments
    • Proposed first meeting: Tuesday, March 21st, 8am PST

Minutes

  • On call: Chris, David H., Edith, Eric, George, Giulia, Harold, Helen, Jim, Judy, Karen, Kimberly, Li, Mary, Midori, Moni, Nancy, Pascale, Petra, Sabrina, Shur-Jen, Stacia, Stan, Tony, Val

GOC Meeting - Corvallis, Oregon

  • Early June
  • Three days of GOC meeting
  • Noctua workshop
  • Reactome workshop
    • AI: Need to check what the focus of the Reactome workshop will be

Software, Infrastructure

  • Chris - update
  • Check slides from USC presentation in GO Google directory
  • GO will be ceasing support for subversion (SVN)
    • Replaced with a mixture of github (for the ontology)
      • Meeting in Berkeley at end of February for training on git and ontology editing - generally going well
    • Will bypass version control for ontology files and publish directly on Amazon S3
  • Look at go-site directory on github
    • go-site/metadata/datasets
    • datasets includes .yaml files of metadata for all GAF providers
      • gives info about the project, what types of files are submitted
      • use this metadata for validation checks, e.g. taxon
      • the metadata will be used to drive future Jenkins jobs - validations, OWL tools checks, prediction GAFs
        • see go-gaf-pipeline-NEW
        • produces a folder for each producer, e.g. pipeline/target/mgi
        • note file format with type of file as extension, e.g. .gpi or .gaf
        • prediction GAFs can be slurped into their own database pipeline or can just get incorporated into the GAF as part of the Jenkins pipeline
    • Harold - What is happening with the gp2protein, gp2rna, etc.?
      • Chris - Could be incorporated into the target directory, but these files seem to have been subsumed by the Quest for Orthologs project. We can start dispensing with these.
    • Harold - For the errors files, Mike's script would remove the lines, what does Jenkins do with the errors?
      • Chris - The current Jenkins filters might be a bit more liberal right now; Mike's script checked for more line-by-line types of errors, e.g. a missing column, while the Jenkins checks are more involved and require things like loading the ontologies, etc.
    • Midori - What is the mechanism for alerting curators to errors? emails?
      • Chris - emails turn out to be problematic
    • Harold - Will errors block the GAF from being released?
      • Chris - Gross violations will be filtered
    • Tony - How often will you be checking for updated files at the submitter's? Can we trigger a fetch?
      • Chris - Cron job will look for new files once a week for now, probably on weekends; maybe move to a daily update, if possible?
    • Harold - What about the PAINT files? MGI fetches PAINT files and will incorporate them.
      • Chris - At the moment, PAINT is considered a separate submitter and has its own metadata file. Can conceivably push PAINT files automatically into the public annotation set, but that can be set on a per group basis.
    • Sabrina - What is the timeline on this new way of doing things?
      • Chris - There are still a number of dependencies, so will likely be on the order of a few months. But, we know we need to get off of SVN (running on machines at Stanford) soon.
    • David - How will the gpi files fit in with all of this?
      • Chris - gpi files do/will subsume gp2protein and also give us more information and thus are much more useful. This also relates to AGR activities; AGR has recently identified a JSON format for what is essentially the same content as a gpi file. Maybe GO should move to that JSON format?
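
To make the metadata-driven validation discussed above concrete, here is a minimal sketch in Python of a taxon check on GAF lines. It assumes the provider's .yaml metadata has already been parsed into a dict. The keys `id` and `taxa`, the function name, and the sample identifiers are hypothetical and may not match the real files in go-site/metadata/datasets or the actual Jenkins checks. The only format fact relied on is that GAF is tab-delimited with the taxon in column 13.

```python
def check_gaf_lines(lines, metadata):
    """Split GAF lines into (kept, errors) using the provider's declared taxa.

    Lines that fail a check are reported with their line number instead of
    being dropped silently, so the submitter can be told what was filtered.
    """
    allowed = set(metadata["taxa"])
    kept, errors = [], []
    for number, line in enumerate(lines, start=1):
        if line.startswith("!"):  # header and comment lines pass through
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:  # line-by-line check, e.g. a missing column
            errors.append((number, "too few columns"))
            continue
        taxon = cols[12].split("|")[0]  # column 13; first entry is the gene product's taxon
        if taxon not in allowed:
            errors.append((number, f"unexpected taxon {taxon}"))
            continue
        kept.append(line)
    return kept, errors


def make_line(taxon):
    """Build a 17-column GAF line for the example (all values hypothetical)."""
    cols = ["MGI", "MGI:0000001", "GeneX", "", "GO:0008150", "PMID:0000000",
            "IMP", "", "P", "", "", "gene", taxon, "20170314", "MGI", "", ""]
    return "\t".join(cols)


metadata = {"id": "mgi", "taxa": ["taxon:10090"]}  # stand-in for parsed .yaml
lines = [
    "!gaf-version: 2.1",
    make_line("taxon:10090"),  # mouse, as declared in the metadata
    make_line("taxon:9606"),   # human, not declared for this provider
    "MGI\tMGI:0000001",        # truncated line
]
kept, errors = check_gaf_lines(lines, metadata)
for number, message in errors:
    print(f"line {number}: {message}")
```

Checks that need the ontology loaded would sit on top of this kind of per-line pass.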

Annotation Review

Proposed New Qualifiers

  • A proposed new set of gene product - GO term qualifiers is now available
  • Current default is 'involved in' but we know we probably have other types of relations in our existing annotations
  • Action item from February 14th was for curators to look at existing annotations
  • Some entries in spreadsheet for C. elegans, dicty, Drosophila, and human
  • Verdict is still out on whether there are general principles that can be applied for adding new qualifiers to legacy IMP annotations
  • Many annotations require a review of at least the supporting paper's abstract
  • Looking at gene/gene product annotations where there is an IMP annotation to regulation of a process but NO MF may help pinpoint a set of annotations that definitely need review
    • For this, we may be able to refine the MF criteria such that MF annotations to terms like 'metal ion binding' can also be considered since they may not be informative of mechanism and that is really what we are trying to get at with use of regulation terms
  • Also, looking at gene/gene product annotations where there are annotations to both a BP term and regulation of that BP term
  • Behavior and development are two areas of the BP ontology that would benefit from this review
    • For example, what is the difference between 'development' and 'regulation of development'?
  • AI: Mary and Eric will work on querying the GO graph store for: 1) regulation BP annotations with no corresponding MF annotation, and 2) annotations to both a process and regulation of that process
  • AI: Curators continue to look at BP annotations to see if we can start to get a handle on some general rules for applying qualifiers (and exactly what those qualifiers mean)
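
The two queries in the first action item can be sketched over a flat list of annotations before anything is written against the graph store. This is a minimal Python illustration, assuming annotations are (bioentity, term, aspect) tuples and that a mapping from each regulation term to the process it regulates is available from the ontology. The function name, the `regulates` mapping, the `uninformative_mf` parameter, and the term labels in the toy data are all hypothetical stand-ins for real GO IDs.

```python
from collections import defaultdict


def review_candidates(annotations, regulates, uninformative_mf=frozenset()):
    """Return two sets of bioentities:

    no_mf: annotated to a regulation BP term but with no informative MF
           annotation (MF terms in uninformative_mf do not count).
    both:  annotated to a BP term and to regulation of that same term.
    """
    bp_terms = defaultdict(set)
    informative_mf = set()
    for entity, term, aspect in annotations:
        if aspect == "P":
            bp_terms[entity].add(term)
        elif aspect == "F" and term not in uninformative_mf:
            informative_mf.add(entity)

    no_mf, both = set(), set()
    for entity, terms in bp_terms.items():
        # Processes that this entity is annotated as regulating.
        regulated = {regulates[t] for t in terms if t in regulates}
        if regulated and entity not in informative_mf:
            no_mf.add(entity)
        if regulated & terms:
            both.add(entity)
    return no_mf, both


# Toy data with readable labels in place of GO IDs.
regulates = {"regulation of development": "development"}
annotations = [
    ("g1", "regulation of development", "P"),
    ("g2", "regulation of development", "P"),
    ("g2", "development", "P"),
    ("g2", "kinase activity", "F"),
    ("g3", "regulation of development", "P"),
    ("g3", "metal ion binding", "F"),
    ("g4", "development", "P"),
]
no_mf, both = review_candidates(
    annotations, regulates, uninformative_mf={"metal ion binding"}
)
print(sorted(no_mf), sorted(both))
```

Treating terms like 'metal ion binding' as uninformative follows the refinement suggested above; which MF terms belong in that set is left open here.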

Single-step Biological Process Terms

  • Did not get to this agenda item - will try to cover next time.

Working Groups

  • Did not get to this agenda item, either! Will try to report back next time.

Next Call - April 11th

  • No call on March 28th due to overlap with BioCurator meeting at Stanford.