Release Pipeline

From GO Wiki

Revision as of 15:43, 4 April 2018

Release Cycle

The release cycle is monthly.

Additionally, snapshot releases are created daily. These are intended only for internal use by GOC members; end users should use the monthly releases.

The pipeline is driven by YAML metadata files in the metadata/datasets folder of the go-site repo. See the README.md in that folder for a description of their structure.

For example: https://github.com/geneontology/go-site/blob/master/metadata/datasets/tair.yaml
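To make the structure concrete, here is a minimal Python sketch of what a parsed dataset file might look like (for example, after loading it with a YAML parser) and how a group's GAF sources could be selected from it. The field names (`datasets`, `type`, `status`, `source`) and the example.org URLs are assumptions for illustration only; the README.md in metadata/datasets is the authority on the real schema.

```python
from typing import Any

# Hypothetical, simplified stand-in for a parsed metadata/datasets/*.yaml file.
# Field names and URLs are illustrative assumptions, not the real schema.
tair_metadata: dict[str, Any] = {
    "id": "tair",
    "datasets": [
        {
            "id": "tair.gaf",
            "type": "gaf",
            "status": "active",
            "source": "ftp://example.org/tair/gene_association.tair.gz",
        },
        {
            "id": "tair.gpi",
            "type": "gpi",
            "status": "active",
            "source": "https://example.org/tair/tair.gpi.gz",
        },
    ],
}


def gaf_sources(metadata: dict[str, Any]) -> list[str]:
    """Return the source URL of every active GAF dataset listed for one group."""
    return [
        ds["source"]
        for ds in metadata.get("datasets", [])
        if ds.get("type") == "gaf" and ds.get("status") == "active"
    ]


print(gaf_sources(tair_metadata))  # ['ftp://example.org/tair/gene_association.tair.gz']
```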

The most important field here is the `source` tag, which dictates where GO Central pulls each contributing group's GAF from. It can be a URL on an FTP site managed by that group, an S3 bucket, an HTTP server, etc.: anything, as long as it resolves to the latest submitted GAF.
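A minimal sketch of resolving a `source` URL to the lines of a GAF, assuming the file is either plain text or gzip-compressed. `fetch_gaf` is a hypothetical helper, not part of the GO pipeline. Python's `urllib.request` handles http(s), FTP and file URLs; an S3 source would need an S3 client and is not covered here.

```python
import gzip
import urllib.request


def fetch_gaf(source: str, timeout: float = 60.0) -> list[str]:
    """Download the file at `source` and return its lines (hypothetical helper).

    urllib.request supports http://, https://, ftp:// and file:// URLs;
    s3:// sources are not handled by this sketch.
    """
    with urllib.request.urlopen(source, timeout=timeout) as response:
        data = response.read()
    # Detect gzip by its two magic bytes rather than trusting the file extension.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data.decode("utf-8").splitlines()
```

For example, `fetch_gaf("ftp://example.org/tair/gene_association.tair.gz")` (a placeholder URL) would return the file's lines, `!` header lines included.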

The pipeline then runs checks on these files (see below) and repairs any auto-repairable issues (for example, migrating annotations to merged terms). It then publishes the processed GAFs, GPADs, GPIs, etc. to a public site, where they are available for download.
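The merged-term repair mentioned above can be sketched as a lookup on the GO ID column (column 5 of a GAF). This assumes a mapping from retired IDs to the primary IDs they were merged into, for example one built from `alt_id` entries in the ontology. `repair_merged_terms` is a hypothetical name, and this is an illustration rather than the pipeline's own code.

```python
def repair_merged_terms(
    lines: list[str], merged_into: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Rewrite GO IDs that were merged into another term.

    `merged_into` maps a retired GO ID to its primary ID (assumed input).
    Returns the repaired lines and a log of every substitution made.
    """
    repaired: list[str] = []
    log: list[str] = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("!") or not line.strip():
            repaired.append(line)  # header/comment or blank line: keep unchanged
            continue
        columns = line.split("\t")
        if len(columns) < 5:
            repaired.append(line)  # malformed line: left for the syntax checks
            continue
        go_id = columns[4]  # column 5 (1-based) holds the GO ID
        if go_id in merged_into:
            columns[4] = merged_into[go_id]
            log.append(f"line {number}: {go_id} -> {columns[4]}")
        repaired.append("\t".join(columns))
    return repaired, log
```

Keeping a log of each substitution lets the repair be reported back to the contributing group instead of happening silently.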

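The "publish" sites that are currently part of the pipeline are:

* http://release.geneontology.org (~monthly, plus historical sets from the new pipeline)
* http://current.geneontology.org (~monthly, containing the latest release set)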

All pipeline runs start at midnight (12 a.m.) PDT and currently take about 14 hours (this is expected to decrease in the future). The `release`/`current` pipeline runs are attempted on the first of every month, and the `snapshot` run does not currently run on the day of the `release`.
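The schedule above can be expressed as a small sketch: which run is attempted on a given day, plus a rough finish-time estimate. It assumes the 14-hour figure stays accurate and uses naive local (Pacific) times, so it ignores daylight-saving transitions. `scheduled_run` and `expected_finish` are hypothetical helpers.

```python
from datetime import date, datetime, time, timedelta

RUN_START = time(0, 0)  # runs start at midnight Pacific time
APPROX_DURATION = timedelta(hours=14)  # assumed current run length


def scheduled_run(day: date) -> str:
    """'release' on the first of the month (no snapshot that day), else 'snapshot'."""
    return "release" if day.day == 1 else "snapshot"


def expected_finish(day: date) -> datetime:
    """Naive local finish estimate for the run starting on `day` (ignores DST)."""
    return datetime.combine(day, RUN_START) + APPROX_DURATION


print(scheduled_run(date(2018, 4, 1)), expected_finish(date(2018, 4, 1)))
# release 2018-04-01 14:00:00
```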

Per-group curator QC reports

Every day, a dry run of the full pipeline is performed; we call this the snapshot. As part of this run, we generate products and reports for the database administrators and curators of the various groups contributing to the GO Consortium (MODs, UniProt, etc.).

Summary

This is a basic summary of the parsing of your GAF file. It functionally replaces the old "Mike's script".

These are found in the `reports` directory of the snapshot site.

Example: http://snapshot.geneontology.org/reports/dictybase.report.md

These reports flag basic syntax errors and implement a subset of the checks in the GO QC Rules.
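For a flavor of the syntax-level checks, here is a hedged Python sketch that validates a single GAF 2.x annotation line: column count, required columns, GO ID and date format, and the aspect code. It covers only a few format rules from the GAF specification and is not the pipeline's implementation of the GO QC Rules; `check_gaf_line` is a hypothetical name.

```python
import re

# 0-based indices of the columns GAF 2.1 marks as required: DB, DB Object ID,
# Symbol, GO ID, DB:Reference, Evidence Code, Aspect, DB Object Type, Taxon,
# Date, Assigned By.
REQUIRED_COLUMNS = [0, 1, 2, 4, 5, 6, 8, 11, 12, 13, 14]
QUALIFIER_COLUMN = 3  # optional in GAF 2.1, required in GAF 2.2
GO_ID = re.compile(r"GO:\d{7}")
DATE = re.compile(r"\d{8}")  # YYYYMMDD


def check_gaf_line(line: str, gaf_22: bool = False) -> list[str]:
    """Return human-readable problems found in one GAF annotation line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 17:
        return [f"expected 17 columns, found {len(columns)}"]

    required = REQUIRED_COLUMNS + ([QUALIFIER_COLUMN] if gaf_22 else [])
    problems = [
        f"required column {i + 1} is empty" for i in required if not columns[i].strip()
    ]
    # Format checks only run on non-empty values; emptiness is reported above.
    if columns[4] and not GO_ID.fullmatch(columns[4]):
        problems.append(f"malformed GO ID {columns[4]!r}")
    if columns[8] and columns[8] not in {"P", "F", "C"}:
        problems.append(f"aspect must be P, F or C, not {columns[8]!r}")
    if columns[13] and not DATE.fullmatch(columns[13]):
        problems.append(f"date {columns[13]!r} is not in YYYYMMDD form")
    return problems
```

Problems collected per line in this way could then be summarized into a per-group report like the dictyBase example above.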

Prediction Report and OWLTools Checks

Predictions

Technical Details

See the README.md in the pipeline GitHub repo: https://github.com/geneontology/pipeline