Snakemake Storage Plugin: LFS

A Snakemake storage plugin for downloading files from Git LFS (Large File Storage) servers with optional local caching.

Features

  • Git LFS protocol: Fetches objects via the Git LFS Batch API
  • Local repo lookup: Checks a local git repository's LFS store before downloading remotely
  • Local caching: Downloaded objects can be cached to avoid redundant transfers
  • Checksum verification: Verifies SHA-256 integrity (the LFS OID is the SHA-256 digest)
  • Authentication: Supports token-based Basic Auth via environment variable
  • Concurrent download control: Limits simultaneous downloads
  • Progress bars: Shows download progress with tqdm
  • Immutable objects: Returns mtime=0 (LFS objects are content-addressed and never change)
  • Environment variable support: Configure via environment variables for CI/CD

Installation

pip install snakemake-storage-plugin-lfs

URL Format

LFS objects are referenced using the lfs:// scheme:

lfs://{oid}/{path}
  • {oid} — SHA-256 hex digest of the object (64 hex characters)
  • {path} — logical file path used as the local filename

Example:

lfs://3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c/data/natura.tiff
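For illustration, such a URL can be split into its two components with the standard library. This is a hypothetical helper, not part of the plugin's API; the validation mirrors the format described above:

```python
from urllib.parse import urlparse

def parse_lfs_url(url: str) -> tuple[str, str]:
    """Split an lfs:// URL into (oid, path). Illustrative only."""
    parsed = urlparse(url)
    if parsed.scheme != "lfs":
        raise ValueError(f"not an lfs:// URL: {url}")
    oid = parsed.netloc  # 64-character SHA-256 hex digest
    if len(oid) != 64 or any(c not in "0123456789abcdef" for c in oid):
        raise ValueError(f"invalid OID: {oid}")
    path = parsed.path.lstrip("/")  # logical file path, used as local filename
    return oid, path
```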

Configuration

Configure the storage provider in your Snakefile:

storage lfs:
    provider="lfs",
    repo_url="https://github.com/org/repo",  # required

Settings

| Setting | Default | Env var | Description |
| --- | --- | --- | --- |
| repo_url | "" | SNAKEMAKE_STORAGE_LFS_REPO_URL | Git repository URL used to construct the LFS Batch API endpoint (e.g. https://github.com/org/repo). Required. |
| token_envvar | "" | | Name of the environment variable containing the authentication token (used as the Basic Auth password with username git). |
| local_repo | "" | SNAKEMAKE_STORAGE_LFS_LOCAL_REPO | Path to a local git repository. Files are looked up by path in the working tree before downloading remotely. If the file exists but its SHA-256 hash does not match the OID, a WorkflowError is raised (the local repo contains a different version). LFS pointer stubs (not-yet-pulled files) are detected and skipped. |
| cache | "" | SNAKEMAKE_STORAGE_LFS_CACHE | Path to a cache directory for downloaded objects. Set to a path to enable caching; leave empty to disable. |
| skip_remote_checks | False | SNAKEMAKE_STORAGE_LFS_SKIP_REMOTE_CHECKS | Skip existence/size checks against the remote LFS server. Useful in CI/CD when inputs are known to exist. |
| max_concurrent_downloads | 3 | | Maximum number of simultaneous downloads. |

Usage

Use lfs:// URLs directly in your rules:

rule use_lfs_file:
    input:
        storage.lfs(
            "lfs://3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c/data/natura.tiff"
        ),
    output:
        "resources/natura.tiff"
    shell:
        "cp {input} {output}"

The plugin will:

  1. Check the local git repository's LFS store (if local_repo is set)
  2. Check the local cache (if cache is set)
  3. If not found locally, query the LFS Batch API for a download URL
  4. Download the object with a progress bar
  5. Verify the SHA-256 checksum against the OID
  6. Store in the cache (if cache is set)

Authentication

To access private repositories, set token_envvar to the name of an environment variable that holds the token:

storage lfs:
    provider="lfs",
    repo_url="https://github.com/org/private-repo",
    token_envvar="GITHUB_TOKEN",

Then export the token before running Snakemake:

export GITHUB_TOKEN="ghp_..."
snakemake --cores all
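Token-based Basic Auth for Git LFS conventionally uses the fixed username git with the token as the password. A sketch of how such a header is formed (the helper name is illustrative):

```python
import base64

def lfs_auth_header(token: str) -> dict:
    """Basic Auth header with the conventional LFS username "git".
    Illustrative helper, not the plugin's API."""
    credentials = base64.b64encode(f"git:{token}".encode()).decode()
    return {"Authorization": f"Basic {credentials}"}
```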

CI/CD Configuration

# GitHub Actions example
- name: Run snakemake workflows
  env:
    SNAKEMAKE_STORAGE_LFS_REPO_URL: "https://github.com/org/repo"
    SNAKEMAKE_STORAGE_LFS_SKIP_REMOTE_CHECKS: "1"
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    snakemake --cores all

How LFS Objects Are Located

Priority order in managed_retrieve():

  1. Local repo (local_repo setting): Looks up the file by its path in the working tree ({local_repo}/{lfs_path}). LFS pointer stubs are skipped (with a warning). If the file is present but its SHA-256 does not match the OID, a WorkflowError is raised — the local repo contains a different version of the file.
  2. Cache (cache setting): Checks the configured cache directory.
  3. Remote: Queries the LFS Batch API ({repo_url}.git/info/lfs/objects/batch) and downloads from the returned URL.
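The remote step follows the standard Git LFS Batch API. As a sketch, the endpoint and request body for a single-object download can be built like this (the helper is illustrative, not the plugin's actual code):

```python
import json

def build_batch_request(repo_url: str, oid: str, size: int) -> tuple[str, bytes]:
    """Endpoint URL and JSON body for a Batch API download request
    for one object. Illustrative only."""
    endpoint = f"{repo_url}.git/info/lfs/objects/batch"
    body = json.dumps({
        "operation": "download",
        "transfers": ["basic"],  # basic transfer adapter: plain HTTP download URLs
        "objects": [{"oid": oid, "size": size}],
    }).encode()
    return endpoint, body
```

A conforming client POSTs this body with Content-Type and Accept set to application/vnd.git-lfs+json (plus authentication, if required); the download URL is then found at objects[0].actions.download.href in the JSON response.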

License

MIT License
