git-like data management for arbitrary data trees: workspaces, checkouts, remotes, and sync
🌲 forest 
git-like data management for arbitrary data trees.
Forest is the data-side parallel to git's version control. Git tracks code in
`.git/`; forest tracks large data (its local layout plus remote sync) in
`.forest/`. It borrows git's mental model and verbs (`checkout`, `status`,
`push`, `pull`, `remote`, a `HEAD`-style pointer) so your git intuition carries
over, but the two domains never overlap and neither requires the other.
Forest is domain-agnostic and self-contained: it manages any data trees, knows nothing about what the data means, and depends on no other project. It moves files and tracks their sync state; it does not validate or interpret their contents.
Model
Fixed depth, no arbitrary nesting:

```
workspace → checkout → stage → unit → files
```

- Workspace: a per-repo `.forest/` control area (registry + active pointer).
- Checkout: a named data view/focus registered in the workspace (like a git branch you stay rooted in). Switching is an O(1) pointer rewrite; data never moves.
- Stage: a named data category inside a checkout, with a remote layout.
- Unit: one addressable item within a stage (a subdirectory, a directory, or a file, per the stage's `sync_by`).
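To make the `sync_by` modes concrete, here is a minimal Python sketch of unit discovery. It is not forest's implementation: the function name `discover_units` and the exact meaning assigned to each mode (each immediate subdirectory is a unit; the whole stage directory is one unit; each immediate file is a unit) are assumptions read off the descriptions above.

```python
from pathlib import Path


def discover_units(stage_dir: Path, sync_by: str) -> list[Path]:
    """Return the units of one stage (hypothetical sketch).

    Assumed meaning of each sync_by mode:
      subdirectory -> each immediate subdirectory is a unit
      directory    -> the stage directory as a whole is one unit
      file         -> each immediate regular file is a unit
    """
    if sync_by == "subdirectory":
        return sorted(p for p in stage_dir.iterdir() if p.is_dir())
    if sync_by == "directory":
        return [stage_dir]
    if sync_by == "file":
        return sorted(p for p in stage_dir.iterdir() if p.is_file())
    raise ValueError(f"unknown sync_by mode: {sync_by!r}")
```

Under these assumptions, the mode only changes how a stage is partitioned into addressable units; the files themselves are never rewritten or reorganized.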
Install
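Requires `rclone` on `$PATH` for transfers. The package is published on PyPI as `forest-cli`; to install an editable copy from a local clone of the source:

```bash
pip install -e ./forest
```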
Quick start
No config files are hand-written. Onboarding is a few commands:
```bash
forest init                                      # create the nameless .forest/ workspace container
forest checkout demo                             # create if absent, register + activate 'demo'
forest remote add origin s3://my-bucket/prefix   # allowed before any local data exists
forest add raw ./data/raw                        # register stage 'raw' and bind it to a local path
forest push                                      # sync every bound stage to the active remote
```
- `forest init` creates only `.forest/config.yaml` (`version: 1`, `checkouts: {}`) and a managed `.gitignore`. No root config, no checkout, no active pointer.
- `forest checkout <name>` switches to the checkout, first creating, registering, and activating it (with `.forest/checkouts/<name>/forest.yaml`) when the name is not registered. `forest checkout create <name>` is the explicit form.
- Remotes can be added before any local binding, which is useful when your data is remote-only at first.
- `forest add STAGE PATH` registers a new stage and binds it to a local path in one step (use `forest bind` to rebind an existing stage).
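The create-if-absent behavior of `forest checkout <name>` can be sketched as below. This is a hypothetical illustration, not forest's code: it treats a checkout as registered when its `forest.yaml` exists, writes a minimal `project: <name>` manifest, skips updating `.forest/config.yaml`, and assumes `HEAD` holds the bare checkout name. It shows why switching is cheap: activation only rewrites one small pointer file.

```python
from pathlib import Path


def checkout(workspace_root: Path, name: str) -> bool:
    """Switch to checkout `name`, creating it first if needed (hypothetical sketch).

    Returns True when the checkout was newly created. Assumptions: a checkout
    counts as registered when .forest/checkouts/<name>/forest.yaml exists, and
    HEAD stores the bare name. The real command also records the checkout in
    .forest/config.yaml, which this sketch skips.
    """
    forest_dir = workspace_root / ".forest"
    if not forest_dir.is_dir():
        raise FileNotFoundError("no .forest/ workspace here; run `forest init` first")
    manifest = forest_dir / "checkouts" / name / "forest.yaml"
    created = not manifest.exists()
    if created:
        manifest.parent.mkdir(parents=True, exist_ok=True)
        manifest.write_text(f"project: {name}\n")
    # Activation rewrites one small pointer file; no data under any stage path moves.
    (forest_dir / "HEAD").write_text(f"{name}\n")
    return created
```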
Metadata layout
Everything forest owns lives under `.forest/`; your data does not.

```
.forest/
  config.yaml            # workspace registry: version, checkouts{}
  HEAD                   # active checkout name (gitignored)
  checkouts/
    demo/
      forest.yaml        # shared: stages, remotes, manifest
      local.yaml         # user-local: active_remote, stage_paths (gitignored)
      sync_state.json    # user-local push/pull state (gitignored)
```
Shared metadata (`config.yaml` and each `forest.yaml`) is committed, so a fresh
clone can bootstrap with `forest bind`, `forest remote use`, and `forest pull`. User-local files
(`HEAD`, `local.yaml`, `sync_state.json`) are gitignored.
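The shared/local split can be expressed as a simple rule over paths under `.forest/`. The sketch below is hypothetical: the helper name and the purely name-based matching are assumptions, and the authoritative list remains the managed `.gitignore` that `forest init` writes.

```python
from pathlib import PurePosixPath

# File names the README lists as user-local (gitignored); everything else is shared.
LOCAL_ONLY = frozenset({"HEAD", "local.yaml", "sync_state.json"})


def is_committed(path: str) -> bool:
    """Return True if a path under .forest/ is shared metadata that belongs in git."""
    p = PurePosixPath(path)
    if not p.parts or p.parts[0] != ".forest":
        raise ValueError(f"not a forest metadata path: {path!r}")
    return p.name not in LOCAL_ONLY
```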
Commands
| Command | Purpose |
|---|---|
| `forest init` | Create the workspace container, or report setup status if it exists. |
| `forest checkout create/adopt/list/current/remove <name>` | Manage checkouts; bare `forest checkout <name>` switches, creating first if needed. `remove --yes` skips the prompt for scripts. |
| `forest add STAGE PATH [--sync-by MODE]` | Register a new stage and bind it to a local path; `--sync-by` picks unit discovery (`subdirectory`/`directory`/`file`). |
| `forest bind [STAGE PATH]` / `forest unbind STAGE` | Manage local stage↔path bindings. |
| `forest remote add/remove/list/use/show` | Manage remotes; `use` selects the active remote (optional while only one remote exists). |
| `forest push` / `pull` / `status` / `diff` / `ls` | Sync and inspect against the active remote. Bare `push`/`pull`/`status`/`diff` cover every bound stage (unbound stages warn and skip); `--all` requires all stages to be bound. |
| `forest flow` | Emit a Mermaid data-flow diagram of the active checkout. |
| `forest migrate` | Migrate a legacy biostore layout in place (see below). |
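The stage-selection rule for bare `push`/`pull`/`status`/`diff` versus `--all` can be sketched in Python as follows. The function `select_stages` is hypothetical; only the warn-and-skip versus require-all behavior comes from the table.

```python
import warnings


def select_stages(stages: list[str], bindings: dict[str, str],
                  require_all: bool = False) -> list[str]:
    """Pick the stages a sync command operates on (hypothetical sketch).

    bindings maps stage name -> local path, as in local.yaml's stage_paths.
    Bare commands warn about and skip unbound stages; require_all (the
    --all flag) turns any unbound stage into an error.
    """
    unbound = [s for s in stages if s not in bindings]
    if unbound and require_all:
        raise RuntimeError(f"--all requires every stage bound; unbound: {', '.join(unbound)}")
    for s in unbound:
        warnings.warn(f"stage {s!r} is not bound to a local path; skipping")
    return [s for s in stages if s in bindings]
```

For example, with a hypothetical second stage `clean` left unbound, `select_stages(["raw", "clean"], {"raw": "./data/raw"})` warns about `clean` and proceeds with `raw` alone, while passing `require_all=True` stops before any transfer starts.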
Run any command with `-C <path>` to operate on another repo without `cd`.

Forest syncs all files in a data unit, skipping OS junk (`.DS_Store`,
AppleDouble `._*` files, `*.tmp`). It applies no content-based include/exclude rules.
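A sketch of the skip rule, assuming forest matches these patterns against bare file names with case-sensitive shell-style globbing (the helper names are hypothetical):

```python
from fnmatch import fnmatchcase

# The junk patterns listed in the README; matching bare names with globs is an assumption.
JUNK_PATTERNS = (".DS_Store", "._*", "*.tmp")


def is_os_junk(filename: str) -> bool:
    """Return True if a bare file name matches one of the junk patterns."""
    return any(fnmatchcase(filename, pat) for pat in JUNK_PATTERNS)


def files_to_sync(names: list[str]) -> list[str]:
    """Keep every file in a unit except OS junk; no content-based rules."""
    return [n for n in names if not is_os_junk(n)]
```

Because `fnmatchcase` must match the whole name, `notes.tmpl` is kept while `scan.tmp` is skipped.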
Config reference
Checkout `forest.yaml` (shared, committed):

```yaml
project: demo
remotes:
  origin:
    url: s3://my-bucket/prefix
    region: us-east-2        # optional; also endpoint, profile, key_file, known_hosts
stages:
  raw:
    remote_path: demo/raw    # optional; defaults to <checkout>/<stage>
    sync_by: subdirectory    # subdirectory | directory | file
```
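Assuming a stage's `remote_path` is joined under the remote's `url` (the README states the default but not the joining rule), the effective destination could be computed like this; `remote_target` is a hypothetical helper:

```python
from typing import Optional


def remote_target(remote_url: str, checkout: str, stage: str,
                  remote_path: Optional[str] = None) -> str:
    """Effective remote location of a stage (hypothetical sketch).

    remote_path defaults to "<checkout>/<stage>", as documented; joining it
    under the remote url with a single "/" is an assumption.
    """
    rel = remote_path if remote_path else f"{checkout}/{stage}"
    return f"{remote_url.rstrip('/')}/{rel.strip('/')}"
```

In the example config, `remote_path: demo/raw` equals the default for checkout `demo` and stage `raw`, so omitting it would give the same destination.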
Checkout `local.yaml` (per-machine, gitignored):

```yaml
active_remote: origin
stage_paths:
  raw: ../data/raw    # relative paths resolve from the workspace root
```
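A sketch of how a `stage_paths` entry might resolve, assuming the workspace root is the directory containing `.forest/` and that absolute paths are used unchanged (`resolve_stage_path` is a hypothetical name):

```python
from pathlib import Path


def resolve_stage_path(workspace_root: Path, configured: str) -> Path:
    """Resolve a stage_paths entry from local.yaml (hypothetical sketch).

    Relative entries are joined onto the workspace root; absolute entries
    are kept. The result is normalized with Path.resolve().
    """
    p = Path(configured).expanduser()
    if not p.is_absolute():
        p = workspace_root / p
    return p.resolve()
```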
Environment variables
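All are optional. Logging, metrics, analytics, error tracking, and alerting are off by default:
forest is silent and sends nothing anywhere unless configured. Only the transfer retry and
circuit-breaker settings are active by default. Copy `.env.example` for a commented template; operational
guides live in `docs/runbooks/`.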
| Variable | Default | Effect |
|---|---|---|
| `FOREST_LOG_FILE` | unset | Append structured logs (JSON lines) to this file. |
| `FOREST_LOG_FORMAT` | `json` | `json` or `text`; set without `FOREST_LOG_FILE` to log to stderr. |
| `FOREST_LOG_LEVEL` | `INFO` | Standard logging level name. |
| `FOREST_METRICS_FILE` | unset | Append metric samples as JSON lines for external collectors. |
| `FOREST_ANALYTICS_FILE` | unset | Opt-in local usage analytics (JSON lines); nothing leaves the machine. |
| `FOREST_SENTRY_DSN` | unset | Sentry error tracking; needs `pip install "forest-cli[observability]"`. |
| `FOREST_ALERT_WEBHOOK` | unset | POST failure alerts to this HTTPS endpoint (Slack/Mattermost compatible). |
| `FOREST_TRANSFER_RETRIES` | `2` | Extra attempts for transient rclone failures; `0` disables. |
| `FOREST_RETRY_BASE_DELAY` | `0.5` | Initial retry backoff in seconds; doubles per attempt. |
| `FOREST_BREAKER_THRESHOLD` | `5` | Consecutive transfer failures before the circuit opens; `0` disables. |
| `FOREST_BREAKER_RESET_SECONDS` | `60` | Cool-down before an open circuit allows a probe operation. |
| `FOREST_FLAGS` | unset | Comma-separated feature flags; `raw-logs` disables log secret-scrubbing. |
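The retry and circuit-breaker variables describe two standard resilience patterns: exponential backoff and a circuit breaker. The sketch below shows how their documented defaults would play out. It is an illustration under assumptions: the class and function names are hypothetical, and forest's actual transfer loop may differ. With the defaults, a transient rclone failure is retried twice, after 0.5 s and then 1.0 s.

```python
import os
import time
from typing import Callable, Mapping, Optional


def backoff_delays(retries: int, base_delay: float) -> list[float]:
    """Wait before each extra attempt: base_delay, doubling each time."""
    return [base_delay * 2**i for i in range(max(retries, 0))]


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; threshold <= 0 disables it.

    Once open, allow() returns False until `reset_seconds` have passed, then
    True again so a probe can run. A failed probe reopens the circuit; any
    success closes it and clears the failure count.
    """

    def __init__(self, threshold: int, reset_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.threshold <= 0 or self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_seconds

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.threshold > 0 and self.failures >= self.threshold:
            self.opened_at = self.clock()


def settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    return {
        "retries": int(env.get("FOREST_TRANSFER_RETRIES", "2")),
        "base_delay": float(env.get("FOREST_RETRY_BASE_DELAY", "0.5")),
        "threshold": int(env.get("FOREST_BREAKER_THRESHOLD", "5")),
        "reset_seconds": float(env.get("FOREST_BREAKER_RESET_SECONDS", "60")),
    }
```

The injectable `clock` keeps the breaker testable without real sleeps; retries absorb short blips, while the breaker stops hammering a remote that is failing persistently.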
Dogfood: this repo runs forest
This repository manages its own `examples/` tree with forest, a live
demonstration that `.forest/` and `.git/` coexist without overlapping. It was
set up with the same quick-start commands:
```bash
forest init
forest checkout demo
forest remote add origin s3://my-forest-bucket --region us-east-2
forest add examples ./examples
forest push
```

Inspect the result:

```bash
git ls-files .forest   # what a clone gets: config.yaml + checkouts/demo/forest.yaml
cat .gitignore         # forest-managed: HEAD, local.yaml, sync_state.json stay local
forest status          # sync state of the examples stage
```
A fresh clone bootstraps the local half with `forest bind examples ./examples`
followed by `forest pull` (the single configured remote is used automatically).
Pulling needs AWS credentials for the bucket; without them, the committed layout itself is the demonstration.
Migrating from biostore
Existing biostore repos use `.biostore/` and `biostore.yaml`. Migrate in place:

```bash
forest migrate
```

This renames `.biostore/` → `.forest/` and each `biostore.yaml` → `forest.yaml`,
rewrites the managed `.gitignore` patterns, and verifies that the registry parses. It
refuses to run if `.forest/` already exists.
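The on-disk renames that `forest migrate` performs can be sketched as follows. This is a hypothetical illustration: it assumes every `biostore.yaml` lives somewhere under `.biostore/`, and it omits the `.gitignore` rewrite and registry check that the real command performs.

```python
from pathlib import Path


def migrate(workspace_root: Path) -> list[Path]:
    """Rename a biostore layout to forest's (hypothetical sketch).

    Returns the renamed manifest paths. Skips the .gitignore rewrite and
    registry verification that the real command performs.
    """
    old = workspace_root / ".biostore"
    new = workspace_root / ".forest"
    if new.exists():
        raise FileExistsError(".forest/ already exists; refusing to migrate")
    if not old.is_dir():
        raise FileNotFoundError("no .biostore/ directory to migrate")
    old.rename(new)
    renamed = []
    # Materialize the matches before renaming so the walk is not disturbed.
    for manifest in sorted(new.rglob("biostore.yaml")):
        target = manifest.with_name("forest.yaml")
        manifest.rename(target)
        renamed.append(target)
    return renamed
```

Checking for an existing `.forest/` before touching anything mirrors the documented refusal and keeps a second run from clobbering a migrated workspace.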
Notes
- Single active machine (v1). `HEAD`, `local.yaml`, and `sync_state.json` are invisible to git but may be synced by a file-syncing tool; forest assumes one active machine and uses atomic writes plus a per-checkout `flock` to guard against write races within a machine.
- Real filenames. Forest stores data under real paths, not in a content-addressed blob store.
- See `docs/adr/` for the design decisions behind the workspace/checkout model.
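Atomic writes and a per-checkout `flock` are standard POSIX techniques. A minimal sketch, assuming a lock file named `.lock` inside the checkout directory (forest's actual lock file name and write helpers are not documented here, so both helpers are hypothetical):

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def checkout_lock(checkout_dir: Path):
    """Hold an exclusive flock on a per-checkout lock file (POSIX only).

    The lock file name ".lock" is an assumption for this sketch.
    """
    with open(checkout_dir / ".lock", "w") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until no other holder
        try:
            yield
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def atomic_write_text(path: Path, text: str) -> None:
    """Write via a temp file in the same directory, then os.replace onto path."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # data reaches disk before the rename exposes it
        os.replace(tmp, path)      # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The temp file is created in the target's own directory because `os.replace` is only atomic within a single filesystem. The lock serializes writers on one machine; it does nothing for a second machine, which is why the single-active-machine assumption matters.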