Cloud blob virtual filesystem — per-file inventory, lazy fetch, dry-run offload, Cursor skill

cloud-vfs

A manually driven virtual filesystem over cloud blob storage, for repos with large artifacts. It keeps primary disks small: the data lives in Azure Blob or S3, each offloaded file is replaced by a tiny inline ref at the same path (or, for a directory tree, a .cloudstub pointer), and a machine-maintained per-file inventory records the explicit cloud path of every tracked file.

Design: generic source (cloud archive) and target (filesystem) — see docs/DESIGN.md.

Works with Cursor / Claude agents or plain shell + Azure CLI / AWS CLI.

License: MIT

Why cloud-vfs (not DVC / Git LFS)

| cloud-vfs | DVC / Git LFS |
|---|---|
| Lean repo; data stays out of git | Data lineage tied to git commits |
| Agent-safe dry-run offload | Heavier toolchain |
| Dual archive (primary + optional secondary backend) | Single-remote patterns |
| Large data/ only inventory | Tracks everything you add |

Best for: disk hygiene + lazy fetch + explicit offload when projects store large files under data/ (or policy-defined prefixes).

Features

  • Per-file inventory: .cloud-vfs/index/<shard>.json with local, blob, sha256, etag, state
  • Lazy fetch: cloud-vfs ensure <path> (single file or whole tree)
  • Manual offload: hash before delete; --dry-run first
  • Drift audit: cloud-vfs reconcile compares disk ↔ inventory ↔ blob
  • Large-data scope: default ≥ 50 MB under data/; prefix overrides for weights, etc.
  • Multi-cloud: Azure Blob and AWS S3
  • Cursor skill: cloud-vfs init --skill

No auto-tracking, no cron, no background jobs.
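
The "hash before delete" rule can be illustrated with a minimal Python sketch. This is not cloud-vfs source code: the helper names `sha256_of` and `delete_if_verified` are hypothetical, and the sketch assumes the hash of the uploaded copy is already known from the inventory.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_if_verified(path, uploaded_sha256, dry_run=True):
    """Delete the local copy only when its hash matches the uploaded copy.

    Returns "would-delete" (dry run), "deleted", or "kept" (hash mismatch).
    Hypothetical helper for illustration; not part of the cloud-vfs API.
    """
    if sha256_of(path) != uploaded_sha256:
        return "kept"
    if dry_run:
        return "would-delete"
    os.remove(path)
    return "deleted"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.bin")
        with open(sample, "wb") as handle:
            handle.write(b"x" * 4096)
        recorded = sha256_of(sample)
        print(delete_if_verified(sample, recorded))          # dry run: file stays
        print(delete_if_verified(sample, "0" * 64, False))   # mismatch: file stays
        print(delete_if_verified(sample, recorded, False))   # verified delete
```

Defaulting `dry_run` to True mirrors the "--dry-run first" habit: the destructive path has to be requested explicitly.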

Install

pip install cloud-vfs

Or from GitHub:

pip install git+https://github.com/sahasrarjn/cloud-vfs.git
curl -fsSL https://raw.githubusercontent.com/sahasrarjn/cloud-vfs/main/install.sh | bash

Requires Python 3.9+, az and/or aws CLI, and cloud credentials.

Try it in 5 minutes

pip install cloud-vfs
cloud-vfs try
cd cloud-vfs-try
cp .cloud-vfs/config.env.example .cloud-vfs/config.env   # set a TEST bucket
cloud-vfs doctor --roundtrip
./scripts/create-sample.sh
cloud-vfs offload --dry-run data/sample && cloud-vfs offload data/sample
cloud-vfs ensure data/sample

Full walkthrough: docs/TRY.md. Same demo lives in examples/minimal-demo/ if you cloned this repo.

Quick start (your project)

Point at any repo or folder (must be writable; run from repo root or pass --path):

cd /path/to/your-ml-repo
cloud-vfs init --path . --skill
cp .cloud-vfs/config.env.example .cloud-vfs/config.env   # set bucket (see config.env.example)
cloud-vfs doctor --roundtrip

cloud-vfs scan                    # what large files can you offload?
cloud-vfs scan --add              # add them to manifest (no upload yet)
cloud-vfs offload --dry-run       # preview: sizes + cloud target
cloud-vfs offload data/your_run   # upload + stub (you choose paths)
cloud-vfs ensure data/your_run    # fetch back when needed

Optional: cloud-vfs register <path> indexes sha256 without upload; cloud-vfs status --drift audits inventory.

Two layers

| Layer | File | Who edits |
|---|---|---|
| Policy | .cloud-vfs/manifest.json | Human / agent |
| Policy | .cloud-vfs/inventory-policy.json | Human / agent |
| Inventory | .cloud-vfs/index/<root>.json | Tools only |

Inventory rows are created by offload, register, and reconcile --fix-index — never hand-edited.
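
To make the inventory layer concrete, here is a hedged sketch of what a row and a shard might look like in Python. Only the five field names (local, blob, sha256, etag, state) come from this README. The `InventoryRow` class, the list-of-objects shard layout, and the example state values are assumptions for illustration, not the actual on-disk schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InventoryRow:
    local: str    # path relative to the project root
    blob: str     # explicit cloud path of the uploaded object
    sha256: str   # content hash recorded before any local delete
    etag: str     # backend-reported tag for the uploaded object
    state: str    # hypothetical values, e.g. "local" or "offloaded"


def dump_shard(rows):
    """Serialize rows deterministically so a committed shard diffs cleanly."""
    ordered = sorted(rows, key=lambda row: row.local)
    return json.dumps([asdict(row) for row in ordered], indent=2, sort_keys=True)


def load_shard(text):
    """Parse a shard back into rows; unexpected fields raise TypeError."""
    return [InventoryRow(**entry) for entry in json.loads(text)]
```

Writing shards in sorted, deterministic order is one reason tools rather than humans should own these files: any hand edit shows up as noise in the diff.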

Commands

| Command | Description |
|---|---|
| cloud-vfs guard <paths> | Block unsafe local deletes (not managed by cloud-vfs) |
| cloud-vfs doctor [--probe] [--roundtrip] | Verify install, config, CLI, and cloud access |
| cloud-vfs ensure [--source A] [--target-root DIR] [--check-only] | Materialize cloud source → project or custom target |
| cloud-vfs preflight <paths> | Exit non-zero if stubs/refs need ensure |
| cloud-vfs ingest --source FILE --target REL | One-shot upload from arbitrary local file |
| cloud-vfs try [--path DIR] | Create sandbox demo project (default ./cloud-vfs-try) |
| cloud-vfs init [--path DIR] [--skill] | Scaffold .cloud-vfs/ in any folder |
| cloud-vfs scan [--add] [--prefix P] | Find large local files; optionally add to manifest |
| cloud-vfs register <paths> | Index local files (+ sha256); respects min size |
| cloud-vfs ensure <path> | Fetch from cloud if inline ref / stub / cloud-only |
| cloud-vfs resolve <path> | JSON: blob URL + inventory row (for agents) |
| cloud-vfs status [--drift] | Manifest paths + inventory counts |
| cloud-vfs reconcile [--from-blob] [--fix-index] | Drift audit; rebuild index from blob |
| cloud-vfs prune | Remove inventory rows below min size |
| cloud-vfs offload --dry-run | Preview offload candidates |
| cloud-vfs offload <paths> | Upload + index (large files) + inline ref or dir stub |
| cloud-vfs materialize-stubs | Write inline/sidecar refs; migrate legacy file sidecars |
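
The drift audit performed by reconcile compares three views of the same paths: disk, inventory, and blob. The sketch below shows one way such a three-way comparison could be expressed with plain sets. The function name and the category names are hypothetical and do not reflect the tool's real output format.

```python
def classify_drift(on_disk, in_inventory, in_blob):
    """Three-way comparison of path sets: disk, inventory, blob.

    `on_disk` means fully materialized files, not inline refs or stubs.
    Category names are illustrative, not cloud-vfs output.
    """
    on_disk, in_inventory, in_blob = set(on_disk), set(in_inventory), set(in_blob)
    report = {
        # Present locally but never indexed.
        "untracked_local": on_disk - in_inventory,
        # Uploaded but missing from the index (a candidate for an index rebuild).
        "unindexed_blob": in_blob - in_inventory,
        # Indexed, yet found neither locally nor in the cloud.
        "missing_everywhere": in_inventory - on_disk - in_blob,
        # Indexed and uploaded, not materialized locally.
        "cloud_only": (in_inventory & in_blob) - on_disk,
        # Indexed and local, not uploaded yet.
        "local_only": (in_inventory & on_disk) - in_blob,
    }
    return {name: sorted(paths) for name, paths in report.items()}
```

Of these categories, "missing_everywhere" is the one that signals possible data loss; the others describe states that an upload, a fetch, or an index rebuild can repair.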

Project layout

your-project/
  .cloud-vfs/
    config.env              # account names (commit)
    secrets.env             # keys (gitignored)
    manifest.json           # folder-level policy (commit)
    inventory-policy.json   # min size, include/exclude (commit)
    index/                  # per-file inventory shards
      data/
        ADME.json             # commit benchmark shards
        generated/            # often gitignored — regenerate from blob
  data/
    big.npy                   # inline JSON ref when single file offloaded
    big/.cloudstub            # directory pointer when tree offloaded
  .cursor/skills/cloud-vfs/   # optional
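
Because an offloaded single file keeps its original path, code that reads data/big.npy directly may find a small JSON ref instead of array bytes. The sketch below shows how a script might detect that case before reading. It relies only on the "cvfs": 1 marker and the .cloudstub file name mentioned in this README; the 64 KiB size cap and both helper names are assumptions. In practice, cloud-vfs preflight serves this purpose.

```python
import json
import os

MAX_REF_BYTES = 64 * 1024  # assumption: refs are tiny, so skip parsing anything larger


def read_inline_ref(path):
    """Return the parsed ref dict if `path` looks like an inline ref, else None."""
    if not os.path.isfile(path) or os.path.getsize(path) > MAX_REF_BYTES:
        return None
    try:
        with open(path, "r", encoding="utf-8") as handle:
            payload = json.load(handle)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # binary or non-JSON content: a real data file
    if isinstance(payload, dict) and payload.get("cvfs") == 1:
        return payload
    return None


def needs_ensure(path):
    """True when the path is an inline ref or a directory holding a .cloudstub."""
    if os.path.isdir(path):
        return os.path.exists(os.path.join(path, ".cloudstub"))
    return read_inline_ref(path) is not None
```

A caller would check `needs_ensure("data/big.npy")` and, when it returns True, run cloud-vfs ensure on that path before loading it.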

Tracking scope (defaults)

| Rule | Default |
|---|---|
| include_prefixes | data/ only |
| min_size_bytes | 50 MB (52_428_800) |
| prefix_min_size_bytes | e.g. data/model_weights/ → 5 MB |
| exclude_prefixes | code/, research/, … |
| Offloaded split trees | dir stub blob_prefix for small members; index only large files |
| Offloaded single files | inline ref at original path ("cvfs": 1) |

See docs/INVENTORY.md.
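
As a rough illustration of how these defaults could combine, the sketch below decides whether a file falls in tracking scope. The rule names and default values come from the table above. The dictionary layout, the "exclude wins" precedence, and the longest-prefix-match rule for overrides are assumptions; docs/INVENTORY.md is the authority on actual behavior.

```python
MB = 1024 * 1024

# Hypothetical in-memory form of the policy; the real file layout may differ.
POLICY = {
    "include_prefixes": ["data/"],
    "exclude_prefixes": ["code/", "research/"],
    "min_size_bytes": 50 * MB,  # 52_428_800
    "prefix_min_size_bytes": {"data/model_weights/": 5 * MB},
}


def min_size_for(path, policy=POLICY):
    """Use the threshold of the longest matching prefix override, else the default."""
    matches = [p for p in policy["prefix_min_size_bytes"] if path.startswith(p)]
    if matches:
        return policy["prefix_min_size_bytes"][max(matches, key=len)]
    return policy["min_size_bytes"]


def is_tracked(path, size_bytes, policy=POLICY):
    """Apply excludes first, then includes, then the size threshold."""
    if any(path.startswith(p) for p in policy["exclude_prefixes"]):
        return False
    if not any(path.startswith(p) for p in policy["include_prefixes"]):
        return False
    return size_bytes >= min_size_for(path, policy)
```

Under these assumptions a 10 MB checkpoint in data/model_weights/ is tracked, while a 10 MB array elsewhere under data/ is not.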

One or two archives (Azure and/or AWS)

Set LOCAL_PROVIDER=azure or aws in .cloud-vfs/config.env.

Azure: AZ_LOCAL_*, AZ_REMOTE_* + keys in secrets.env

AWS: AWS_LOCAL_BUCKET, AWS_LOCAL_REGION (uses aws CLI credentials)

Manifest archive keys: local_archive (primary), remote_staging (secondary). See docs/SOURCE_TARGET.md.
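
For scripts that need to know which backend a project uses, the sketch below reads a config.env-style file and selects the primary archive settings. It assumes the file holds plain KEY=VALUE lines and does not handle trailing inline comments; `parse_env` and `primary_archive` are hypothetical helpers, not cloud-vfs functions.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blanks, full-line comments, and optional quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def primary_archive(config):
    """Return (provider, settings) for the primary archive named by LOCAL_PROVIDER."""
    provider = config.get("LOCAL_PROVIDER", "").lower()
    if provider == "aws":
        prefix = "AWS_LOCAL_"
    elif provider == "azure":
        prefix = "AZ_LOCAL_"
    else:
        raise ValueError("LOCAL_PROVIDER must be 'azure' or 'aws'")
    settings = {k: v for k, v in config.items() if k.startswith(prefix)}
    return provider, settings
```

Given a file containing LOCAL_PROVIDER=aws, the second return value holds only the AWS_LOCAL_* entries, such as the bucket and region.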

Agents

cloud-vfs ensure path/to/file          # before reading cloud-only paths
cloud-vfs register path/to/new.npy     # after creating outputs ≥ min size
cloud-vfs reconcile                    # after compute runs
cloud-vfs offload --dry-run path       # always dry-run + confirm with user
cloud-vfs offload path

Never hand-edit .cloud-vfs/index/*.json.
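
An agent harness could enforce the "dry-run, confirm, then offload" order in code rather than relying on convention. The sketch below wraps the two offload invocations listed above. The wrapper names are hypothetical, and it assumes a zero exit code means success, which is the usual CLI convention but is not stated in this README.

```python
import subprocess


def run_cli(args):
    """Run a cloud-vfs command and return its exit code (needs cloud-vfs on PATH)."""
    return subprocess.run(["cloud-vfs", *args]).returncode


def offload_with_confirmation(path, confirm, run=run_cli):
    """Dry-run first, ask the user, and only then run the real offload.

    `confirm` is any callable taking the path and returning True or False.
    `run` is injectable so the flow can be exercised without cloud access.
    """
    if run(["offload", "--dry-run", path]) != 0:
        return "dry-run-failed"
    if not confirm(path):
        return "declined"
    return "offloaded" if run(["offload", path]) == 0 else "offload-failed"
```

For interactive use, `confirm` might be `lambda p: input(f"Offload {p}? [y/N] ").strip().lower() == "y"`.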

Environment variables

| Variable | Purpose |
|---|---|
| CLOUD_VFS_PROJECT_ROOT | Force project root |
| CLOUD_VFS_CONFIG | Path to config.env |
| CLOUD_VFS_SECRETS | Path to secrets.env |
| CLOUD_VFS_MANIFEST | Path to manifest.json |
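
The sketch below shows how overrides like these are commonly resolved. It is an assumption-heavy illustration: the upward search for a .cloud-vfs/ directory and the default config location are guesses based on the project layout described earlier, and the function names are hypothetical.

```python
import os
from pathlib import Path


def find_project_root(start=".", environ=os.environ):
    """Honor CLOUD_VFS_PROJECT_ROOT, else walk up until a .cloud-vfs/ directory appears."""
    forced = environ.get("CLOUD_VFS_PROJECT_ROOT")
    if forced:
        return Path(forced).resolve()
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".cloud-vfs").is_dir():
            return candidate
    return None  # no project found above `start`


def config_path(root, environ=os.environ):
    """CLOUD_VFS_CONFIG wins; otherwise fall back to <root>/.cloud-vfs/config.env."""
    override = environ.get("CLOUD_VFS_CONFIG")
    return Path(override) if override else Path(root) / ".cloud-vfs" / "config.env"
```

Passing `environ` as a parameter keeps the lookup testable without mutating the real process environment.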

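Documentation

  • docs/DESIGN.md: generic source (cloud archive) and target (filesystem) design
  • docs/TRY.md: full walkthrough of the sandbox demo
  • docs/INVENTORY.md: tracking scope and inventory defaults
  • docs/SOURCE_TARGET.md: manifest archive keys (local_archive, remote_staging)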

License

MIT — see LICENSE.
