Snakemake storage plugin for downloading Git LFS files with optional local caching
Snakemake Storage Plugin: LFS
A Snakemake storage plugin for downloading files from Git LFS (Large File Storage) servers with optional local caching.
Features
- Git LFS protocol: Fetches objects via the Git LFS Batch API
- Local repo lookup: Checks a local git repository's working tree before downloading remotely
- Local caching: Downloaded objects can be cached to avoid redundant transfers
- Checksum verification: Verifies SHA-256 integrity (the LFS OID is the SHA-256 digest)
- Authentication: Supports token-based Basic Auth via environment variable
- Concurrent download control: Limits simultaneous downloads
- Progress bars: Shows download progress with tqdm
- Immutable objects: Returns mtime=0 (LFS objects are content-addressed and never change)
- Environment variable support: Configure via environment variables for CI/CD
Installation
pip install snakemake-storage-plugin-lfs
URL Format
LFS objects are referenced using the lfs:// scheme:
lfs://{oid}/{path}
- `{oid}`: SHA-256 hex digest of the object (64 hex characters)
- `{path}`: logical file path used as the local filename
Example:
lfs://3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4/data/natura.tiff
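To illustrate the URL format, the sketch below splits an lfs:// URL into its OID and path and checks that the OID is 64 hexadecimal characters. The helper name `parse_lfs_url`, the lowercase-only check, and the error messages are assumptions made for this example; they are not taken from the plugin's source.

```python
import hashlib
import re

# A SHA-256 hex digest is 64 hexadecimal characters (lowercase assumed here).
_OID_PATTERN = re.compile(r"[0-9a-f]{64}")


def parse_lfs_url(url: str) -> tuple[str, str]:
    """Split 'lfs://{oid}/{path}' into (oid, path). Hypothetical helper."""
    prefix = "lfs://"
    if not url.startswith(prefix):
        raise ValueError(f"not an lfs:// URL: {url!r}")
    oid, sep, path = url[len(prefix):].partition("/")
    if _OID_PATTERN.fullmatch(oid) is None:
        raise ValueError(f"OID must be 64 hex characters, got {len(oid)}")
    if not sep or not path:
        raise ValueError("missing file path after the OID")
    return oid, path


# Derive a valid OID from real bytes so the example checks itself.
example_oid = hashlib.sha256(b"example content").hexdigest()
oid, path = parse_lfs_url(f"lfs://{example_oid}/data/natura.tiff")
```

Validating the OID length up front catches copy-paste truncation before any network request is made.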
Configuration
Register the plugin in your Snakefile:
storage lfs:
    provider="lfs",
    repo_url="https://github.com/org/repo",  # required
Settings
| Setting | Default | Env var | Description |
|---|---|---|---|
| `repo_url` | `""` | `SNAKEMAKE_STORAGE_LFS_REPO_URL` | Git repository URL used to construct the LFS Batch API endpoint (e.g. https://github.com/org/repo). Required. |
| `token_envvar` | `""` | (none) | Name of the environment variable containing the authentication token (used as the Basic Auth password with username `git`). |
| `local_repo` | `""` | `SNAKEMAKE_STORAGE_LFS_LOCAL_REPO` | Path to a local git repository. Files are looked up by path in the working tree before downloading remotely. If the file exists but its SHA-256 hash does not match the OID, a `WorkflowError` is raised (the local repo contains a different version). LFS pointer stubs (not-yet-pulled files) are detected and skipped. |
| `cache` | `""` | `SNAKEMAKE_STORAGE_LFS_CACHE` | Path to a cache directory for downloaded objects. Set to a path to enable caching; leave empty to disable. |
| `skip_remote_checks` | `False` | `SNAKEMAKE_STORAGE_LFS_SKIP_REMOTE_CHECKS` | Skip existence/size checks against the remote LFS server. Useful in CI/CD when inputs are known to exist. |
| `max_concurrent_downloads` | `3` | (none) | Maximum number of simultaneous downloads. |
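The environment variable names in the table share the prefix SNAKEMAKE_STORAGE_LFS_ followed by the upper-cased setting name. The sketch below reproduces that naming pattern and shows one plausible way a value could be resolved. The helper names and the rule that an explicit value takes precedence over the environment are assumptions for illustration, not documented behavior of the plugin.

```python
import os

ENV_PREFIX = "SNAKEMAKE_STORAGE_LFS_"

# Only these settings list an environment variable in the table above.
ENV_BACKED_SETTINGS = {"repo_url", "local_repo", "cache", "skip_remote_checks"}


def env_var_name(setting: str) -> str:
    """Return the environment variable name for a setting (naming pattern only)."""
    return ENV_PREFIX + setting.upper()


def resolve_setting(setting, explicit=None, default="", environ=None):
    """Pick an explicit value first, then the environment, then the default.

    The precedence order is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    if explicit not in (None, ""):
        return explicit
    if setting in ENV_BACKED_SETTINGS:
        return environ.get(env_var_name(setting), default)
    return default


fake_env = {"SNAKEMAKE_STORAGE_LFS_CACHE": "/tmp/lfs-cache"}
cache_dir = resolve_setting("cache", environ=fake_env)
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.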
Usage
Use lfs:// URLs directly in your rules:
rule use_lfs_file:
    input:
        storage.lfs(
            "lfs://3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4/data/natura.tiff"
        ),
    output:
        "resources/natura.tiff"
    shell:
        "cp {input} {output}"
The plugin will:
- Check the local git repository's working tree (if `local_repo` is set)
- Check the local cache (if `cache` is set)
- If not found locally, query the LFS Batch API for a download URL
- Download the object with a progress bar
- Verify the SHA-256 checksum against the OID
- Store the object in the cache (if `cache` is set)
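The checksum step in the list above can be sketched as follows. Because the OID is the SHA-256 digest of the content, verification amounts to hashing the downloaded file and comparing hex strings. The function names are hypothetical, and a plain `ValueError` stands in for whatever exception the plugin actually raises.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large objects are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_oid(path, oid):
    """Raise ValueError if the file's SHA-256 digest differs from the OID."""
    actual = sha256_of_file(path)
    if actual != oid.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {oid}, got {actual}"
        )
```

Reading in chunks matters here because LFS objects are typically large binary files.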
Authentication
To access private repositories, set token_envvar to the name of an environment variable that holds the token:
storage lfs:
    provider="lfs",
    repo_url="https://github.com/org/private-repo",
    token_envvar="GITHUB_TOKEN",
export GITHUB_TOKEN="ghp_..."
snakemake --cores all
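For reference, token-based Basic Auth with username git amounts to base64-encoding `git:<token>` and sending it in an Authorization header. The sketch below shows that construction; the function name and the choice to return an empty dict when the variable is unset are assumptions for this example, not the plugin's documented behavior.

```python
import base64
import os


def basic_auth_header(token_envvar, environ=None, username="git"):
    """Build an HTTP Basic Auth header from a token stored in an env variable."""
    environ = os.environ if environ is None else environ
    token = environ.get(token_envvar, "")
    if not token:
        # Assumption: with no token, requests are sent unauthenticated.
        return {}
    credentials = f"{username}:{token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# The token value here is a placeholder, not a real credential.
headers = basic_auth_header("GITHUB_TOKEN", environ={"GITHUB_TOKEN": "example-token"})
```

Storing only the variable name in the Snakefile keeps the token itself out of version control.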
CI/CD Configuration
# GitHub Actions example
- name: Run snakemake workflows
  env:
    SNAKEMAKE_STORAGE_LFS_REPO_URL: "https://github.com/org/repo"
    SNAKEMAKE_STORAGE_LFS_SKIP_REMOTE_CHECKS: "1"
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    snakemake --cores all
How LFS Objects Are Located
Priority order in managed_retrieve():
- Local repo (`local_repo` setting): Looks up the file by its path in the working tree (`{local_repo}/{lfs_path}`). LFS pointer stubs are skipped (with a warning). If the file is present but its SHA-256 does not match the OID, a `WorkflowError` is raised, because the local repo contains a different version of the file.
- Cache (`cache` setting): Checks the configured cache directory.
- Remote: Queries the LFS Batch API (`{repo_url}.git/info/lfs/objects/batch`) and downloads from the returned URL.
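The remote step can be sketched as building a Batch API download request and reading the download URL from the response. The endpoint, media type, and JSON fields follow the public Git LFS Batch API; the function names are hypothetical, and how the plugin obtains the object size is not described here, so `size` is a plain parameter in this example. No network call is made.

```python
import json

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def batch_endpoint(repo_url):
    """Construct {repo_url}.git/info/lfs/objects/batch.

    Stripping a trailing slash is an assumption made for this sketch.
    """
    return repo_url.rstrip("/") + ".git/info/lfs/objects/batch"


def build_batch_request(repo_url, oid, size, extra_headers=None):
    """Return (url, headers, body) for a Batch API 'download' request."""
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    headers.update(extra_headers or {})
    payload = {
        "operation": "download",
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size}],
        "hash_algo": "sha256",
    }
    return batch_endpoint(repo_url), headers, json.dumps(payload).encode("utf-8")


def extract_download_url(response_body):
    """Pull the download href for the first object out of a Batch response."""
    obj = json.loads(response_body)["objects"][0]
    if "error" in obj:
        err = obj["error"]
        raise RuntimeError(f"LFS error {err.get('code')}: {err.get('message')}")
    return obj["actions"]["download"]["href"]


url, headers, body = build_batch_request("https://github.com/org/repo", "0" * 64, 123)
```

The returned tuple can be handed to any HTTP client as a POST request; the response's `href` is then fetched to obtain the object bytes.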
License
MIT License