"Staging" for Snakemake
This package provides a mechanism for Snakemake workflows to explicitly "stage out" the output files of selected rules to a public repository such as Zenodo, so that later runs of the workflow can restore these previously generated artifacts instead of regenerating them. This can be especially useful for workflows with computationally expensive rules that don't need to be re-run often.
snakemake-staging is a spin-off of the
showyourwork project, which
provides a "caching" framework for Snakemake workflows, to transparently avoid
re-execution of rules that have been cached to Zenodo. The
implementation of this logic in showyourwork is, however, somewhat fragile and
unpredictable. In snakemake-staging, we take a more explicit approach, where
"staged" rules are always either explicitly executed or restored.
Installation
To use snakemake-staging in your workflow, you can install it using pip
(it's probably best to set up your Snakemake installation following the
Snakemake docs first):

```bash
python -m pip install snakemake-staging
```
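To confirm that the package is visible from the same Python environment that runs Snakemake, a small check like the one below may help. It is only a sketch built on the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and the distribution name is assumed to match the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "snakemake-staging") -> Optional[str]:
    """Hypothetical helper: return the installed version, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "snakemake-staging is not installed")
```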
Quickstart
The Snakefile
While testing, it's probably best to use the Zenodo
Sandbox, rather than the main site, since any
archive published to Zenodo is permanent. To use the sandbox, you'll need a
personal access token stored in the SANDBOX_TOKEN environment variable. You
can generate a new token
here.
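Since uploads to the sandbox depend on `SANDBOX_TOKEN`, it can be worth failing early with a clear message when the variable is missing. The guard below is a hypothetical helper, not part of snakemake-staging (which, per this README, expects the token in that variable); you might call it before starting an upload run.

```python
import os


def require_sandbox_token(var: str = "SANDBOX_TOKEN") -> str:
    """Hypothetical guard: return the token, or fail early with a clear message."""
    token = os.environ.get(var, "").strip()
    if not token:
        # Only the variable name goes into the message, never a token value.
        raise RuntimeError(
            f"{var} is not set; create a Zenodo Sandbox personal access token "
            f"and export it before running the upload."
        )
    return token
```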
Once you've added this token to your environment, you can edit the Snakefile for
your workflow to use snakemake-staging as follows. First, towards the top of
your Snakefile, add:
```python
import snakemake_staging as staging

stage = staging.ZenodoStage(
    "zenodo-stage",
    config.get("restore", False)
)
```
to create a new stage called zenodo-stage. Note that here we're extracting a
restore flag from the Snakemake config, which will be used to determine
whether to restore files for the stage. This means that you can control the
behavior of this stage from the command line. By passing --config restore=True
to the snakemake command line interface, all files staged out by the
zenodo-stage stage will be restored from the archive rather than generated.
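Because the stage receives whatever `config.get("restore", False)` returns, it may be worth making sure the flag really is a boolean. If the value arrives as a string (for example, a quoted `"False"` in a YAML config file), Python treats any non-empty string as true. The `as_bool` helper below is a hypothetical normalizer, not part of snakemake-staging, and `config` here is a stand-in dict for Snakemake's global config.

```python
def as_bool(value) -> bool:
    """Hypothetical helper: interpret a config value as a boolean flag."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "yes", "on", "1"}:
            return True
        if text in {"false", "no", "off", "0", ""}:
            return False
        raise ValueError(f"cannot interpret {value!r} as a boolean")
    return bool(value)


config = {"restore": "False"}  # stand-in: a quoted string from a YAML file
print(bool(config.get("restore", False)))     # True: non-empty strings are truthy
print(as_bool(config.get("restore", False)))  # False
```

In the Snakefile, you would then pass `as_bool(config.get("restore", False))` as the second argument to `staging.ZenodoStage`.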
Then, to stage out a rule, you can apply the stage as follows:
```python
rule expensive:
    input:
        ...
    output:
        stage(
            "path/to/output1.txt",
            "path/to/output2.txt",
        )
    shell:
        ...
```
Finally, after defining all the rules that you want to stage out, you must
add the following include, which defines all the staging rules:
```python
include: staging.snakefile()
```
At this point, here's the full Snakefile:
```python
import snakemake_staging as staging

stage = staging.ZenodoStage(
    "zenodo-stage",
    config.get("restore", False)
)

rule expensive:
    input:
        ...
    output:
        stage(
            "path/to/output1.txt",
            "path/to/output2.txt",
        )
    shell:
        ...

include: staging.snakefile()
```
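The requirement that the include come after every staged rule suggests that the staging rules are built from the files registered up to that point. The toy model below illustrates that ordering constraint under this assumption; `ToyStage` and `toy_staging_rules` are hypothetical names, and this is a guess at the idea rather than snakemake-staging's real code.

```python
from typing import Dict, List


class ToyStage:
    """Hypothetical stand-in for a stage object; NOT the snakemake-staging API."""

    def __init__(self, name: str, restore: bool = False) -> None:
        self.name = name
        self.restore = restore
        self.files: List[str] = []

    def __call__(self, *paths: str) -> List[str]:
        # Record the paths as belonging to this stage and hand them back
        # unchanged, so they can still be declared as the rule's outputs.
        self.files.extend(paths)
        return list(paths)


def toy_staging_rules(stage: ToyStage) -> Dict[str, List[str]]:
    # Snapshot of the files registered when the "include" is evaluated.
    return {"upload": list(stage.files)}


stage = ToyStage("zenodo-stage")
stage("path/to/output1.txt", "path/to/output2.txt")  # staged rule defined first
rules = toy_staging_rules(stage)                     # the "include"
stage("path/to/late.txt")                            # staged too late
print(rules)  # {'upload': ['path/to/output1.txt', 'path/to/output2.txt']}
```

In this toy model, any output wrapped in `stage(...)` after the snapshot never reaches the staging rules, which is one plausible reason the include has to go last.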
Usage
With the Snakefile defined in the previous section, you can now run your
workflow in 3 ways:
- Normal execution: Running something like `snakemake path/to/output1.txt` (where I have omitted the usual `--cores` and `--conda` arguments) will execute the workflow as normal, without staging out any files.
- Stage upload: If you instead have Snakemake target the `staging__upload` rule, the `expensive` rule will be executed and its outputs will be uploaded to Zenodo, with the record information saved to `zenodo-stage.zenodo.json` (this filename can be changed by passing the `info_file` argument to the `ZenodoStage` constructor).
- Stage restore: Finally, after these outputs have been uploaded to Zenodo, you can call Snakemake with `--config restore=True` to disable the `expensive` rule and force the outputs to be restored from Zenodo.
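The three modes differ only in the target and the restore flag. As a sketch, the helper below builds each command line as an argument list; `snakemake_command` is a hypothetical name, `--cores 1` is an arbitrary choice, and using `path/to/output1.txt` as the restore target is an assumption.

```python
from typing import List


def snakemake_command(mode: str, cores: int = 1) -> List[str]:
    """Hypothetical helper: build the snakemake argv for one of the three modes."""
    base = ["snakemake", "--cores", str(cores)]
    if mode == "normal":
        # Build the target directly; nothing is staged out.
        return base + ["path/to/output1.txt"]
    if mode == "upload":
        # Target the staging rule so outputs are generated and uploaded.
        return base + ["staging__upload"]
    if mode == "restore":
        # Disable the expensive rule and restore its outputs from Zenodo.
        return base + ["--config", "restore=True", "path/to/output1.txt"]
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("normal", "upload", "restore"):
    print(" ".join(snakemake_command(mode)))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; the upload run also needs `SANDBOX_TOKEN` set in the environment.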
File details
Details for the file snakemake_staging-0.0.2.tar.gz.
File metadata
- Download URL: snakemake_staging-0.0.2.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.1 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3e783db2998662e25a6c9009a9ed57abef606c27bee8050d724eb15da164ed9 |
| MD5 | e168d57398b78fa1ce1780cba2549e0d |
| BLAKE2b-256 | 61eeaacf9faab85333a05dd29bac95a1c8d7c1003e15faba8fbb1139f6f39d0b |
File details
Details for the file snakemake_staging-0.0.2-py3-none-any.whl.
File metadata
- Download URL: snakemake_staging-0.0.2-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.1 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d96323d390c349a75639318a93c137578a4ddfea4627f31d76582071fe53751 |
| MD5 | fb4aac86a5573b57f3a47b9103ebf410 |
| BLAKE2b-256 | 85951868b907ee8db840697d3a0b2bd1d618ed938e4b1e220c2e8d627a41ec9c |