Release Pipeline

From GO Wiki

Revision as of 12:50, 5 June 2018

June 2018: This documentation is currently a work in progress.

Overview

The GO Consortium (GOC) now releases data publicly on a monthly basis. The data include annotation files, ontology files, GO-CAM models, and... Official monthly releases are versioned and archived so that analyses performed with these data can be reproduced at any point in the future. In addition, daily snapshot releases of GO data are available for internal use by GOC members. This gives annotators, for example, access to the most up-to-date version of the ontology for their curation. However, data generated using snapshot releases will not be officially released until the next monthly public release.

Release Cycle

For both the daily and monthly releases, pipeline runs start at midnight (12 am) PDT and currently take about 14 hours (this will be reduced in the future). The daily snapshot release runs nightly; the monthly public release runs on the first of the month (or as close to it as possible if there are failures). Note that the `snapshot` run does not also run on the day of the monthly `release`. Data associated with each release can be accessed at the URLs below, with specific details about the contents of released files discussed where appropriate below.
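A minimal Python sketch of the schedule just described. It assumes the monthly release always starts on the first of the month (reruns after failures are not modeled) and treats the 14-hour duration as fixed; the function names are hypothetical and not part of the pipeline code.

```python
from datetime import date, datetime, timedelta
from typing import Tuple

# Approximate run length from this page ("about 14hrs"); assumed fixed here.
RUN_DURATION = timedelta(hours=14)


def run_type(day: date) -> str:
    """Return which run starts at midnight Pacific time on `day`.

    Simplified: the monthly `release` runs on the 1st and replaces that
    day's `snapshot`; every other day gets a `snapshot`. Delays caused by
    failures are not modeled.
    """
    return "release" if day.day == 1 else "snapshot"


def expected_window(day: date) -> Tuple[datetime, datetime]:
    """Approximate start and finish of the run on `day`.

    Datetimes are naive and should be read as local Pacific time.
    """
    start = datetime(day.year, day.month, day.day)  # midnight
    return start, start + RUN_DURATION
```

For example, run_type(date(2018, 6, 1)) returns "release", while run_type(date(2018, 6, 5)) returns "snapshot", and the expected window for that day ends at 14:00.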

Annotations

Overview

Annotation files are retrieved from each participating consortium member by GO Central, merged with PAINT annotation files, run through annotation QA/QC checks and then released as daily snapshot and monthly public releases.
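The merge with PAINT annotations can be pictured with the Python sketch below. It is not the pipeline's actual merge logic: it assumes header lines start with "!", keeps only the contributing group's header, and writes exact duplicate annotation lines once. The function name is hypothetical.

```python
from typing import Iterable, List, Set


def merge_gaf_lines(group_lines: Iterable[str], paint_lines: Iterable[str]) -> List[str]:
    """Combine a group's GAF lines with its PAINT GAF lines (simplified)."""
    header: List[str] = []
    body: List[str] = []
    seen: Set[str] = set()
    for is_group, lines in ((True, group_lines), (False, paint_lines)):
        for raw in lines:
            line = raw.rstrip("\n")
            if not line.strip():
                continue                      # ignore blank lines
            if line.startswith("!"):
                if is_group:
                    header.append(line)       # keep only the group's header
                continue
            if line not in seen:              # write exact duplicates once
                seen.add(line)
                body.append(line)
    return header + body
```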

Annotation Source Files

How to Submit an Annotation File
What Happens to Annotations During a Release Cycle
  • For both the monthly public releases and the daily snapshot releases, the same set of QA/QC checks and annotation file merges are performed. Resulting annotation files and reports are then made accessible via the release URLs listed above.
  • When GO Central retrieves an annotation file from a contributing group, the pipeline will run checks on the file and repair any auto-repairable issues (for example, migrating annotations to merged terms). It will then publish the processed GAFs, GPADs, GPIs, etc. to a public site, available for download.
  • Need to link to the GO rules on GitHub and/or articulate all of the checks that annotation files undergo after submission.
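To illustrate one auto-repair, the sketch below migrates annotations whose GO ID has been merged into another term. It assumes merged terms are recorded as alt_id tags on the surviving term in the ontology's OBO file, and that the GO ID sits in column 5 of a tab-separated GAF line. The function names are hypothetical; this is not the pipeline's actual repair code.

```python
from typing import Dict, Iterable, List, Optional, Tuple

GO_ID_COLUMN = 4  # 0-based index of GAF column 5 (GO ID)


def _tag_value(line: str, tag: str) -> str:
    """Value of an OBO tag line, without any trailing '!' comment."""
    return line[len(tag):].split("!", 1)[0].strip()


def alt_id_map(obo_lines: Iterable[str]) -> Dict[str, str]:
    """Map each alt_id to the primary id of the [Term] stanza declaring it."""
    mapping: Dict[str, str] = {}
    in_term = False
    current: Optional[str] = None
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):          # a new stanza begins
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = _tag_value(line, "id:")
        elif in_term and current and line.startswith("alt_id:"):
            mapping[_tag_value(line, "alt_id:")] = current
    return mapping


def repair_merged_terms(
    gaf_lines: Iterable[str], alt_to_primary: Dict[str, str]
) -> Tuple[List[str], int]:
    """Replace merged (alt_id) GO IDs with their primary IDs; count fixes."""
    repaired: List[str] = []
    fixes = 0
    for raw in gaf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):  # headers pass through unchanged
            repaired.append(line)
            continue
        cols = line.split("\t")
        if len(cols) > GO_ID_COLUMN and cols[GO_ID_COLUMN] in alt_to_primary:
            cols[GO_ID_COLUMN] = alt_to_primary[cols[GO_ID_COLUMN]]
            fixes += 1
        repaired.append("\t".join(cols))
    return repaired, fixes
```

The fix count returned alongside the repaired lines is the kind of number a per-group report could surface.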

Per-group curator QC reports

We run the full pipeline every day, sans the SVN writeback and software deployments; we call this the snapshot. As part of this run, just as in the release, we generate products and reports for the database administrators and curators of the groups (MODs, UniProt, etc.) that contribute to the GO Consortium.
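The difference between the two run types can be sketched as a filter over pipeline stages. The stage names below are hypothetical placeholders, not the names used in the pipeline repository; only the rule that the snapshot skips the SVN writeback and software deployments comes from this page.

```python
from typing import List

# Hypothetical stage names, in run order.
ALL_STAGES = [
    "fetch_sources",
    "merge_annotations",
    "qc_checks",
    "generate_reports",
    "publish_products",
    "svn_writeback",
    "deploy_software",
]
# Stages that only the monthly release performs.
RELEASE_ONLY = {"svn_writeback", "deploy_software"}


def stages_for(run: str) -> List[str]:
    """Return the stages executed for a 'release' or 'snapshot' run."""
    if run == "release":
        return list(ALL_STAGES)
    if run == "snapshot":
        return [s for s in ALL_STAGES if s not in RELEASE_ONLY]
    raise ValueError(f"unknown run type: {run!r}")
```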

Summary

This is a basic summary of how your GAF file was parsed. It functionally replaces the old "Mike's script".

These summaries are found in the reports directory.

Example: http://snapshot.geneontology.org/reports/dictybase.report.md

These report basic syntax errors and implement a subset of the checks in the GO QC Rules.
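The kind of syntax checking such a summary reports can be sketched in Python. This is a simplified illustration, not the GO QC Rules themselves: it assumes GAF 2.x data lines with 17 tab-separated columns and checks only the column count, the GO ID format (column 5), a non-empty evidence code (column 7) and the aspect (column 9: P, F or C). The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

GAF_COLUMNS = 17                      # GAF 2.x data lines have 17 columns
GO_ID = re.compile(r"^GO:\d{7}$")     # e.g. GO:0005634
ASPECTS = {"P", "F", "C"}             # process, function, component


def check_line(line: str) -> List[str]:
    """Return the problems found in one tab-separated GAF data line."""
    cols = line.split("\t")
    if len(cols) != GAF_COLUMNS:
        return ["wrong column count"]  # other checks are meaningless here
    problems = []
    if not GO_ID.match(cols[4]):       # column 5: GO ID
        problems.append("malformed GO ID")
    if not cols[6]:                    # column 7: evidence code
        problems.append("missing evidence code")
    if cols[8] not in ASPECTS:         # column 9: aspect
        problems.append("invalid aspect")
    return problems


def summarize(gaf_lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count data lines parsed and tally problems by type."""
    parsed, errors = 0, Counter()
    for raw in gaf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):  # skip blanks and headers
            continue
        parsed += 1
        errors.update(check_line(line))
    return parsed, errors
```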

Prediction Report and OWLTools Checks

Predictions

Technical Details

See the README.md in the pipeline GitHub repo: https://github.com/geneontology/pipeline

GO Consortium Dataflow

https://github.com/geneontology/go-site/blob/master/docs/go-consortium-dataflow.png