
Cloud blob virtual filesystem — per-file inventory, lazy fetch, dry-run offload, Cursor skill


cloud-vfs

Manual cloud blob virtual filesystem for repos with large artifacts. Keep primary disks small: data lives in Azure Blob or S3, local paths keep tiny inline refs (same path) or .cloudstub directory pointers, and a machine-maintained per-file inventory tracks explicit cloud paths.

Design: generic source (cloud archive) and target (filesystem) — see docs/DESIGN.md.

Works with Cursor / Claude agents or plain shell + Azure CLI / AWS CLI.

License: MIT

Why cloud-vfs (not DVC / Git LFS)

| cloud-vfs | DVC / Git LFS |
|---|---|
| Lean repo; data stays out of git | Data lineage tied to git commits |
| Agent-safe dry-run offload | Heavier toolchain |
| Dual archive (primary + optional secondary backend) | Single-remote patterns |
| Large data/ only inventory | Tracks everything you add |

Best for: disk hygiene + lazy fetch + explicit offload when projects store large files under data/ (or policy-defined prefixes).

Features

  • Per-file inventory: .cloud-vfs/index/<shard>.json with local, blob, sha256, etag, state
  • Lazy fetch: cloud-vfs ensure <path> (single file or whole tree)
  • Manual offload: hash before delete; --dry-run first
  • Drift audit: cloud-vfs reconcile compares disk ↔ inventory ↔ blob
  • Large-data scope: default ≥ 50 MB under data/; prefix overrides for weights, etc.
  • Multi-cloud: Azure Blob and AWS S3
  • Cursor skill: cloud-vfs init --skill

No auto-tracking, no cron, no background jobs.
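
The "hash before delete" rule can be pictured with a short sketch. This is not cloud-vfs's own code: `sha256_file` and `safe_to_delete` are hypothetical names, and the only assumption is that a SHA-256 digest was recorded for the file before its local copy is removed.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_delete(path, recorded_sha256):
    """Hypothetical guard: allow deletion only if the local bytes match the recorded hash."""
    return Path(path).is_file() and sha256_file(path) == recorded_sha256
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte artifacts.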

Install

pip install cloud-vfs

Or from GitHub:

pip install git+https://github.com/sahasrarjn/cloud-vfs.git
curl -fsSL https://raw.githubusercontent.com/sahasrarjn/cloud-vfs/main/install.sh | bash

Requires Python 3.9+, az and/or aws CLI, and cloud credentials.

Try it in 5 minutes

pip install cloud-vfs
cloud-vfs try
cd cloud-vfs-try
cp .cloud-vfs/config.env.example .cloud-vfs/config.env   # set a TEST bucket
cloud-vfs doctor --roundtrip
./scripts/create-sample.sh
cloud-vfs offload --dry-run data/sample && cloud-vfs offload data/sample
cloud-vfs ensure data/sample

Full walkthrough: docs/TRY.md. Same demo lives in examples/minimal-demo/ if you cloned this repo.

Quick start (your project)

Point at any repo or folder (must be writable; run from repo root or pass --path):

cd /path/to/your-ml-repo
cloud-vfs init --path . --skill
cp .cloud-vfs/config.env.example .cloud-vfs/config.env   # set bucket (see config.env.example)
cloud-vfs doctor --roundtrip

cloud-vfs scan                    # what large files can you offload?
cloud-vfs scan --add              # add them to manifest (no upload yet)
cloud-vfs offload --dry-run       # preview: sizes + cloud target
cloud-vfs offload data/your_run   # upload + stub (you choose paths)
cloud-vfs ensure data/your_run    # fetch back when needed

Optional: cloud-vfs register <path> indexes sha256 without upload; cloud-vfs status --drift audits inventory.

Two layers

| Layer | File | Who edits |
|---|---|---|
| Policy | .cloud-vfs/manifest.json | Human / agent |
| Policy | .cloud-vfs/inventory-policy.json | Human / agent |
| Inventory | .cloud-vfs/index/<root>.json | Tools only |

Inventory rows are created by offload, register, and reconcile --fix-index — never hand-edited.
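
To make the shape of an inventory row concrete, here is a minimal sketch built only from the five field names listed under Features (local, blob, sha256, etag, state). The real shard schema may differ: the `InventoryRow` class, the `dump_rows` helper, and the value formats are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InventoryRow:
    """Illustrative row; field names come from the feature list, value formats are assumed."""
    local: str
    blob: str
    sha256: str
    etag: str
    state: str


def dump_rows(rows):
    """Serialize rows deterministically, sorted by local path, for stable diffs."""
    ordered = sorted(rows, key=lambda row: row.local)
    return json.dumps([asdict(row) for row in ordered], indent=2, sort_keys=True)
```

Deterministic ordering is one reason to leave these files to tools: hand edits tend to reorder keys or rows and produce noisy diffs.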

Commands

| Command | Description |
|---|---|
| cloud-vfs guard <paths> | Block unsafe local deletes (not managed by cloud-vfs) |
| cloud-vfs doctor [--probe] [--roundtrip] | Verify install, config, CLI, and cloud access |
| cloud-vfs ensure [--source A] [--target-root DIR] [--check-only] | Materialize cloud source → project or custom target |
| cloud-vfs preflight <paths> | Exit non-zero if stubs/refs need ensure |
| cloud-vfs ingest --source FILE --target REL | One-shot upload from arbitrary local file |
| cloud-vfs try [--path DIR] | Create sandbox demo project (default ./cloud-vfs-try) |
| cloud-vfs init [--path DIR] [--skill] | Scaffold .cloud-vfs/ in any folder |
| cloud-vfs scan [--add] [--prefix P] | Find large local files; optionally add to manifest |
| cloud-vfs register <paths> | Index local files (+ sha256); respects min size |
| cloud-vfs ensure <path> | Fetch from cloud if inline ref / stub / cloud-only |
| cloud-vfs resolve <path> | JSON: blob URL + inventory row (for agents) |
| cloud-vfs status [--drift] | Manifest paths + inventory counts |
| cloud-vfs reconcile [--from-blob] [--fix-index] | Drift audit; rebuild index from blob |
| cloud-vfs prune | Remove inventory rows below min size |
| cloud-vfs offload --dry-run | Preview offload candidates |
| cloud-vfs offload <paths> | Upload + index (large files) + inline ref or dir stub |
| cloud-vfs materialize-stubs | Write inline/sidecar refs; migrate legacy file sidecars |
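
The drift audit compares three views of the same paths: disk, inventory, and blob. The sketch below shows one way such a three-way comparison could be structured. It is not the logic of `cloud-vfs reconcile`: the `classify_drift` function and its labels are hypothetical, and it assumes a comparable hash is available for every side (real blob stores often expose only an etag).

```python
def classify_drift(disk, inventory, blob):
    """Compare three path -> sha256 mappings and label each path.

    Labels are illustrative, not cloud-vfs's own vocabulary.
    """
    report = {}
    for path in sorted(set(disk) | set(inventory) | set(blob)):
        on_disk, indexed, in_blob = disk.get(path), inventory.get(path), blob.get(path)
        if indexed is None:
            report[path] = "untracked"       # on disk or in blob, but no inventory row
        elif on_disk is None and in_blob is None:
            report[path] = "lost"            # indexed, but no copy anywhere
        elif on_disk is not None and on_disk != indexed:
            report[path] = "local-modified"  # local bytes changed since indexing
        elif in_blob is not None and in_blob != indexed:
            report[path] = "blob-mismatch"   # cloud copy differs from the index
        elif on_disk is None:
            report[path] = "cloud-only"      # offloaded; needs a fetch before use
        elif in_blob is None:
            report[path] = "local-only"      # indexed but not uploaded
        else:
            report[path] = "in-sync"
    return report
```

Checking the inventory first keeps the index as the reference point: every other label is phrased as a deviation from what the index claims.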

Project layout

your-project/
  .cloud-vfs/
    config.env              # account names (commit)
    secrets.env             # keys (gitignored)
    manifest.json           # folder-level policy (commit)
    inventory-policy.json   # min size, include/exclude (commit)
    index/                  # per-file inventory shards
      data/
        ADME.json             # commit benchmark shards
        generated/            # often gitignored — regenerate from blob
  data/
    big.npy                   # inline JSON ref when single file offloaded
    big/.cloudstub            # directory pointer when tree offloaded
  .cursor/skills/cloud-vfs/   # optional
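
A script can tell the two on-disk placeholders apart using only what this README states: a directory stub is a `.cloudstub` file inside the directory, and an inline ref is a small JSON file carrying `"cvfs": 1` at the original path. The sketch below is an illustration under those assumptions; `classify_path` and the `MAX_REF_BYTES` cutoff are hypothetical, and for scripting against a real project `cloud-vfs preflight` is the supported check.

```python
import json
from pathlib import Path

MAX_REF_BYTES = 64 * 1024  # assumed upper bound for a "tiny" inline ref


def classify_path(path):
    """Return 'missing', 'dir-stub', 'inline-ref', or 'local' for a project path."""
    target = Path(path)
    if target.is_dir():
        return "dir-stub" if (target / ".cloudstub").is_file() else "local"
    if not target.is_file():
        return "missing"
    if target.stat().st_size > MAX_REF_BYTES:
        return "local"  # too big to be a ref, so skip parsing
    try:
        payload = json.loads(target.read_text(encoding="utf-8"))
    except ValueError:  # covers invalid JSON and undecodable bytes
        return "local"
    if isinstance(payload, dict) and payload.get("cvfs") == 1:
        return "inline-ref"
    return "local"
```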

Tracking scope (defaults)

| Rule | Default |
|---|---|
| include_prefixes | data/ only |
| min_size_bytes | 50 MB (52_428_800) |
| prefix_min_size_bytes | e.g. data/model_weights/ → 5 MB |
| exclude_prefixes | code/, research/, … |
| Offloaded split trees | dir stub blob_prefix for small members; index only large files |
| Offloaded single files | inline ref at original path ("cvfs": 1) |

See docs/INVENTORY.md.
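
The scope rules above combine into a single yes/no decision per file. The sketch below shows one plausible evaluation order. The rule names and default values come from the table (50 MB is 50 × 1024 × 1024 = 52,428,800 bytes), but the precedence (exclusions win, longest prefix override wins) and the `in_scope` / `min_size_for` helpers are assumptions, and only the two exclude prefixes actually named are used.

```python
MIB = 1024 * 1024

# Defaults mirror the table; data/model_weights/ is the table's own example override.
POLICY = {
    "include_prefixes": ["data/"],
    "exclude_prefixes": ["code/", "research/"],
    "min_size_bytes": 50 * MIB,
    "prefix_min_size_bytes": {"data/model_weights/": 5 * MIB},
}


def min_size_for(rel_path, policy=POLICY):
    """Pick the threshold of the longest matching prefix override, else the default."""
    matches = [p for p in policy["prefix_min_size_bytes"] if rel_path.startswith(p)]
    if matches:
        return policy["prefix_min_size_bytes"][max(matches, key=len)]
    return policy["min_size_bytes"]


def in_scope(rel_path, size_bytes, policy=POLICY):
    """Return True if a file would be tracked under this (assumed) rule ordering."""
    if any(rel_path.startswith(p) for p in policy["exclude_prefixes"]):
        return False
    if not any(rel_path.startswith(p) for p in policy["include_prefixes"]):
        return False
    return size_bytes >= min_size_for(rel_path, policy)
```

For example, a 10 MiB file under data/model_weights/ is in scope because of the 5 MB override, while the same file under data/run/ is not.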

One or two archives (Azure and/or AWS)

Set LOCAL_PROVIDER=azure or aws in .cloud-vfs/config.env.

Azure: AZ_LOCAL_*, AZ_REMOTE_* + keys in secrets.env

AWS: AWS_LOCAL_BUCKET, AWS_LOCAL_REGION (uses aws CLI credentials)

Manifest archive keys: local_archive (primary), remote_staging (secondary). See docs/SOURCE_TARGET.md.
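
Before running `cloud-vfs doctor`, a quick local sanity check of config.env can catch typos. The sketch below parses plain KEY=VALUE lines and checks only the variables named in this section. It is a hypothetical helper, not part of cloud-vfs: treating the two AWS variables as required is an assumption, trailing inline comments are not handled, and the Azure variables are skipped because their full names are not listed here.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines; ignores blank lines and # comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_provider(values):
    """Return a list of problems; an empty list means the basic checks passed."""
    provider = values.get("LOCAL_PROVIDER", "")
    if provider not in ("azure", "aws"):
        return [f"LOCAL_PROVIDER must be 'azure' or 'aws', got {provider!r}"]
    problems = []
    if provider == "aws":
        for key in ("AWS_LOCAL_BUCKET", "AWS_LOCAL_REGION"):
            if not values.get(key):
                problems.append(f"{key} is not set")
    return problems
```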

Agents

cloud-vfs ensure path/to/file          # before reading cloud-only paths
cloud-vfs register path/to/new.npy     # after creating outputs ≥ min size
cloud-vfs reconcile                    # after compute runs
cloud-vfs offload --dry-run path       # always dry-run + confirm with user
cloud-vfs offload path

Never hand-edit .cloud-vfs/index/*.json.
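
An agent or script can wrap the "ensure before reading" step in a small helper. The sketch below relies on two behaviours documented in the Commands table: `cloud-vfs preflight <paths>` exits non-zero when stubs or refs need ensure, and `cloud-vfs ensure <path>` fetches them. The `ensure_if_needed` wrapper itself is hypothetical, and it assumes any non-zero preflight exit means "needs ensure" rather than some other failure.

```python
import subprocess


def run_cli(args):
    """Run a command and return its exit code."""
    return subprocess.run(args, check=False).returncode


def ensure_if_needed(paths, run=run_cli):
    """Call `cloud-vfs ensure` only for paths that `cloud-vfs preflight` flags.

    Returns the list of paths that were fetched. Raises if ensure fails.
    """
    fetched = []
    for path in paths:
        if run(["cloud-vfs", "preflight", path]) == 0:
            continue  # nothing to materialize for this path
        if run(["cloud-vfs", "ensure", path]) != 0:
            raise RuntimeError(f"cloud-vfs ensure failed for {path}")
        fetched.append(path)
    return fetched
```

The `run` parameter exists so the wrapper can be exercised with a fake runner in tests, without cloud credentials. Offload is deliberately left out of the helper, since it should stay behind a dry-run and explicit user confirmation.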

Environment variables

| Variable | Purpose |
|---|---|
| CLOUD_VFS_PROJECT_ROOT | Force project root |
| CLOUD_VFS_CONFIG | Path to config.env |
| CLOUD_VFS_SECRETS | Path to secrets.env |
| CLOUD_VFS_MANIFEST | Path to manifest.json |

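Documentation

  • docs/DESIGN.md: source (cloud archive) and target (filesystem) design
  • docs/TRY.md: full walkthrough of the sandbox demo
  • docs/INVENTORY.md: tracking scope and inventory rules
  • docs/SOURCE_TARGET.md: archive keys and source/target configuration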

License

MIT — see LICENSE.

