Release Pipeline
Revision as of 12:50, 3 April 2018
Release Cycle
The release cycle is monthly.
Additionally, snapshot releases are created daily. These are intended only for internal consumption by GOC members; end users get the monthly releases.
The pipeline is driven by YAML metadata files in the metadata/datasets folder of the go-site repo. Please see the README.md in that folder for a description of their structure.
For example: https://github.com/geneontology/go-site/blob/master/metadata/datasets/tair.yaml
The most important field here is the source tag. This dictates where GO Central pulls each contributing group's GAF from. It can be an FTP URL on an FTP site managed by that group, an S3 bucket, an HTTP server, etc.; anything works as long as it resolves to the latest submitted GAF.
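Because the source tag may point at different kinds of servers, a consumer of the metadata has to decide how to fetch each file. The following is a minimal sketch of that decision, not the pipeline's actual code: the function name, the TRANSPORTS table, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical table mapping a URL scheme to a transport family.
# The real pipeline may support other schemes or handle them differently.
TRANSPORTS = {"ftp": "ftp", "http": "http", "https": "http", "s3": "s3"}


def source_transport(source: str) -> str:
    """Return the transport family needed to fetch a dataset's source URL."""
    scheme = urlsplit(source).scheme.lower()
    if scheme not in TRANSPORTS:
        raise ValueError(f"unsupported source scheme: {scheme!r}")
    return TRANSPORTS[scheme]


# Example URLs are placeholders, not real submission locations.
print(source_transport("ftp://ftp.example.org/pub/group.gaf.gz"))
print(source_transport("s3://example-bucket/group.gaf.gz"))
```

Rejecting unknown schemes up front keeps a mistyped source value from failing later, partway through a run.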
The pipeline will then run checks on this (see below) and repair any auto-repairable issues (for example, migrating annotations to merged terms). It will then publish the processed GAFs, GPADs, and GPIs to S3, where they are available for the public to download.
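To illustrate what an auto-repair such as migrating annotations to merged terms can look like, here is a minimal sketch. It is not the pipeline's implementation: the function name and the merged_into mapping are hypothetical, and the GO IDs are made-up placeholders. It relies only on two properties of the GAF format: header lines start with "!", and the GO ID is the fifth tab-separated column.

```python
GO_ID_COLUMN = 4  # zero-based index of GAF column 5 (GO ID)


def migrate_merged_terms(gaf_lines, merged_into):
    """Rewrite GO IDs that were merged into another term.

    gaf_lines: iterable of GAF lines (trailing newlines are stripped).
    merged_into: hypothetical dict mapping an obsolete merged GO ID
                 to the ID of the term that replaced it.
    Returns (repaired_lines, number_of_lines_changed).
    """
    repaired = []
    changed = 0
    for raw in gaf_lines:
        line = raw.rstrip("\n")
        if line.startswith("!"):
            # Header/comment lines are passed through untouched.
            repaired.append(line)
            continue
        cols = line.split("\t")
        if len(cols) > GO_ID_COLUMN and cols[GO_ID_COLUMN] in merged_into:
            cols[GO_ID_COLUMN] = merged_into[cols[GO_ID_COLUMN]]
            changed += 1
        repaired.append("\t".join(cols))
    return repaired, changed


# Placeholder data: the IDs and the mapping are invented for illustration.
lines = [
    "!gaf-version: 2.1",
    "DB\tID1\tsym1\t\tGO:9999991\tPMID:1\tIDA",
    "DB\tID2\tsym2\t\tGO:9999993\tPMID:2\tIDA",
]
fixed, n = migrate_merged_terms(lines, {"GO:9999991": "GO:9999992"})
print(n)
```

Returning the number of changed lines alongside the repaired file makes it easy to surface the repair in a per-group report.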
Per-group curator QC reports
Every day, a dry run of the full pipeline is executed; we call this the snapshot. As part of this run, we generate products and reports that are of use to the DB admins and curators of the various groups (MODs, UniProt, etc.) that contribute to the GO Consortium.
- http://snapshot.geneontology.org/reports/
  - summary.txt EXAMPLE: dictybase.report.md
  - prediction-report.txt EXAMPLE: mgi-prediction-report.txt
  - owltools-check.txt EXAMPLE: mgi-owltools-check.txt
- http://snapshot.geneontology.org/products/annotations/
  - GROUP-prediction.gaf EXAMPLE: pombase-prediction.gaf
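The file names above follow a per-group pattern, so the locations for a given group can be built programmatically. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and apart from the summary report URL, the full URLs are inferred from the example file names in the list rather than confirmed.

```python
SNAPSHOT_BASE = "http://snapshot.geneontology.org"


def snapshot_urls(group: str) -> dict:
    """Build the expected snapshot URLs for one contributing group.

    Only the summary pattern is confirmed by a full example URL;
    the other three are inferred from the example file names.
    """
    reports = f"{SNAPSHOT_BASE}/reports"
    products = f"{SNAPSHOT_BASE}/products/annotations"
    return {
        "summary": f"{reports}/{group}.report.md",
        "prediction_report": f"{reports}/{group}-prediction-report.txt",
        "owltools_check": f"{reports}/{group}-owltools-check.txt",
        "predictions": f"{products}/{group}-prediction.gaf",
    }


for name, url in snapshot_urls("mgi").items():
    print(name, url)
```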
Summary
This is a basic summary of the parsing of your GAF file. It functionally replaces the old "Mike's script".
These summaries are found in the reports directory.
Example: http://snapshot.geneontology.org/reports/dictybase.report.md
These reports flag basic syntax errors and implement a subset of the checks in the GO QC Rules.
Prediction Report and OWLTools Checks
Predictions
Technical Details
See the README.md in the pipeline GitHub repo: https://github.com/geneontology/pipeline