
Create timestamp records for recursive operations on directory trees.


📦 Treestamps

Fast, persistent timestamps for recursive filesystem operations.

Treestamps lets you skip work you’ve already done.

If your program walks directory trees and processes files (optimization, transcoding, validation, etc.), Treestamps tracks what you've already handled—so subsequent runs are incremental, not repetitive.


🚀 Why Treestamps?

Treestamps gives you:

  • Persistent state across runs
  • O(1) “have I seen this file before?” checks
  • Automatic invalidation when config changes
  • No database dependency (just YAML files)
  • Safe writes via WAL (write-ahead log)
  • Quiet by default — your program owns user-facing output

🧠 Mental model

Treestamps is built around three concepts:

1. Grove

A Grovestamps instance manages timestamps across multiple root paths.

2. Tree

Each configured path (e.g. /photos) is a tree.

3. Stamp

Each file gets a timestamp keyed by its relative path within the tree.
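
To make the keying concrete, here is a minimal sketch of deriving a tree-relative key with pathlib. The helper name stamp_key and the POSIX-style string form are assumptions for illustration; the source only says that stamps are keyed by the relative path within the tree.

```python
from pathlib import Path


def stamp_key(root: Path, file_path: Path) -> str:
    """Return file_path relative to root as a POSIX-style string.

    Hypothetical helper, not part of the Treestamps API.
    """
    # relative_to raises ValueError when file_path is not inside root.
    return file_path.relative_to(root).as_posix()


key = stamp_key(Path("/data/photos"), Path("/data/photos/2024/img001.jpg"))
print(key)  # 2024/img001.jpg
```

Because the key does not include the root, the same tree can be mounted at a different location without changing its keys.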

Examples

Full use

from pathlib import Path
from treestamps import Grovestamps, GrovestampsConfig

config = GrovestampsConfig(
    "MyProgram",
    paths=("/data/photos", "/data/videos"),
    program_config={"quality": 90}
)

gs = Grovestamps(config)

ts = gs[Path("/data/photos")].get()

if ts.get("img001.jpg") is None:
    process("img001.jpg")
    ts.set("img001.jpg")

gs.dumpf()

Skip unchanged files

for file in files:
    if ts.get(file) is not None:
        continue  # already processed

    process(file)
    ts.set(file)
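
If your program also needs to notice files that were modified after they were stamped, one possible check compares the file's modification time against the stored stamp. This is a sketch under two assumptions: that the stamp is a float of epoch seconds (or None when absent), and that needs_processing is a helper you write yourself rather than something Treestamps provides.

```python
from pathlib import Path
from typing import Optional


def needs_processing(path: Path, stamp: Optional[float]) -> bool:
    """Decide whether a file should be processed again.

    Hypothetical helper: stamp is assumed to be epoch seconds or None.
    """
    if stamp is None:
        # Never stamped, so it has not been processed yet.
        return True
    # Modified after the stamp was recorded, so the earlier work is stale.
    return path.stat().st_mtime > stamp
```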

Invalidate when config changes

GrovestampsConfig(
    "MyProgram",
    paths=("/data",),
    program_config={"quality": 80}
)

If you later change:

program_config={"quality": 90}

👉 All timestamps are invalidated automatically.
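
One way to picture this kind of invalidation is a signature computed from the config dict and stored next to the stamps. The sketch below is an illustration only: the function names, the JSON canonical form, and the choice of SHA-256 are assumptions, not a description of how Treestamps computes its own hash.

```python
import hashlib
import json


def config_signature(program_config: dict) -> str:
    """Hash a config dict into a stable hex digest (illustrative only)."""
    # sort_keys makes the signature independent of key insertion order.
    canonical = json.dumps(program_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamps_still_valid(stored_signature: str, program_config: dict) -> bool:
    """Stamps are reusable only when the stored signature matches."""
    return stored_signature == config_signature(program_config)


stored = config_signature({"quality": 80})
print(stamps_still_valid(stored, {"quality": 80}))  # True
print(stamps_still_valid(stored, {"quality": 90}))  # False
```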

Multi-root trees

config = GrovestampsConfig(
    "MyProgram",
    paths=("/a", "/b"),
)

Each root gets its own timestamp file, but all roots share the same configuration.

⚙️ How it works

Treestamps uses two files per root directory:

1. WAL file (write-ahead log)

.MyProgram_treestamps.wal.yaml
  • Appended during runtime
  • Fast writes
  • Crash-safe

2. Final snapshot

.MyProgram_treestamps.yaml
  • Written on dumpf()
  • Compact
  • Used on next startup

Lifecycle

  1. Load .yaml (if exists)
  2. Replay .wal.yaml (if exists)
  3. Serve reads/writes in memory
  4. Append writes to WAL
  5. On dumpf():
    • Merge everything
    • Write .yaml
    • Delete WAL

💾 When to call dumpf()

dumpf() commits the in-memory treestamps data to disk.

Call it

  • At the end of a successful run
  • After processing a large batch
  • Before shutdown in long-running processes

Don’t call it

  • After every file (too slow)
  • If the run failed (you may want to discard progress)
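
A minimal sketch of the "large batch" pattern follows. The helper process_in_batches and its parameters are hypothetical; commit stands for whatever flushes your stamps to disk, for example gs.dumpf from the full example earlier.

```python
from typing import Callable, Iterable


def process_in_batches(
    files: Iterable[str],
    process: Callable[[str], None],
    commit: Callable[[], object],
    batch_size: int = 1000,
) -> int:
    """Process files and commit once per batch; return the number of commits."""
    commits = 0
    pending = 0
    for file in files:
        process(file)
        pending += 1
        if pending >= batch_size:
            commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final partial batch at the end of a successful run.
        commit()
        commits += 1
    return commits
```

If process() raises, the exception propagates before the final commit, so the current partial batch is not committed while earlier batches stay on disk.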

🤫 Output and progress

Treestamps does not print progress, status, or success messages. Reporting what's happening to your users is your program's job.

The only output Treestamps emits is a handful of error messages from loadf(), loads(), and the WAL load path when YAML or timestamp entries can't be parsed. These are printed only when verbose is greater than 0; with verbose=0 (the default) they are suppressed too.

Reporting from return values

Treestamps.loadf(), Treestamps.loads(), and Treestamps.dumpf() return a bool so you can drive your own logging or progress UI:

  • loadf() / loads(): True on a successful load
  • dumpf(): True if a write to disk actually happened, False if there was nothing new to commit (no set() since the last dump and no consumed child timestamp files)

if ts.dumpf():
    print(f"Saved timestamps for {top_path}")

🧨 Error handling

Treestamps is designed to be robust but not magical.

Corrupt YAML

  • If .yaml is unreadable, it is treated as missing
  • The WAL may still recover recent writes

WAL corruption

  • Partial WAL entries may be ignored
  • Worst case: last few writes lost (not the entire dataset)
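
To illustrate why a damaged log costs only the damaged entries, here is a sketch of a tolerant replay over a line-oriented log. It uses one JSON object per line rather than Treestamps' YAML WAL, and the function name replay_wal_lines is an assumption; the point is only that each entry is parsed independently.

```python
import json
from typing import Iterable


def replay_wal_lines(lines: Iterable[str]) -> tuple[dict[str, float], int]:
    """Replay log lines, skipping entries that cannot be parsed.

    Returns the recovered stamps and the number of skipped entries.
    """
    stamps: dict[str, float] = {}
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            stamps[str(entry["path"])] = float(entry["time"])
        except (ValueError, KeyError, TypeError):
            # A truncated or malformed entry loses only itself.
            skipped += 1
    return stamps, skipped


log = ['{"path": "img003.jpg", "time": 1700000002.789}', '{"path": "img004.jpg", "ti']
print(replay_wal_lines(log))  # ({'img003.jpg': 1700000002.789}, 1)
```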

Config mismatch

  • If program_config changes:
    • Old timestamps are ignored
    • No partial reuse

Missing files

  • If a file disappears:
    • Its stamp remains
    • It is your responsibility to handle filesystem drift
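
One way to handle drift is to check each stamped path against the filesystem yourself. The sketch below works on a plain dict of relative path to timestamp; prune_missing is a hypothetical helper, and the source does not say whether Treestamps exposes an equivalent removal call.

```python
from pathlib import Path


def prune_missing(root: Path, stamps: dict[str, float]) -> list[str]:
    """Drop stamps whose files no longer exist under root.

    Mutates stamps in place and returns the removed keys.
    """
    missing = [key for key in stamps if not (root / key).exists()]
    for key in missing:
        del stamps[key]
    return missing
```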

🧩 Configuration (GrovestampsConfig)

GrovestampsConfig(
    program_name: str,
    paths: Iterable[str | Path],
    program_config: dict | None = None,
    verbose: int = 0,
    wal: bool = True,
)

Fields

program_name

  • Used in filenames:

    .<program_name>_treestamps.yaml
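
As a small illustration of the naming scheme, the sketch below builds both file paths for a root from the patterns shown in this document. The helper stamp_file_paths is hypothetical and only useful for things like checking permissions or excluding the files from your own directory walk.

```python
from pathlib import Path


def stamp_file_paths(root: Path, program_name: str) -> tuple[Path, Path]:
    """Return the (snapshot, WAL) paths for a root, per the documented patterns."""
    snapshot = root / f".{program_name}_treestamps.yaml"
    wal = root / f".{program_name}_treestamps.wal.yaml"
    return snapshot, wal


snapshot, wal = stamp_file_paths(Path("/data/photos"), "MyProgram")
print(snapshot.name)  # .MyProgram_treestamps.yaml
print(wal.name)       # .MyProgram_treestamps.wal.yaml
```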
    

paths

  • Root directories to manage
  • Each gets its own stamp file

program_config

  • Arbitrary dict
  • Included in hash/signature
  • Changing it invalidates all timestamps

verbose

  • 0 (default): silent — no output at all
  • >0: print error messages from load and WAL load failures

wal (if supported)

  • Enables/disables WAL behavior
  • Disabling may reduce safety but simplify writes

🧾 YAML file format

Snapshot file

version: 1
program: MyProgram
config_hash: abc123

timestamps:
    img001.jpg: 1700000000.123
    img002.jpg: 1700000001.456

WAL file

- set:
      path: img003.jpg
      time: 1700000002.789
- set:
      path: img004.jpg
      time: 1700000003.000

Notes

  • Paths are relative to root
  • Timestamps are typically float seconds
  • WAL is append-only
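
When inspecting a stamp file by hand, it can help to turn a stored value into a readable time. This sketch assumes the floats are seconds since the Unix epoch, which the format above suggests but does not state outright; stamp_to_datetime is a hypothetical helper.

```python
from datetime import datetime, timezone


def stamp_to_datetime(stamp: float) -> datetime:
    """Interpret a stamp as epoch seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


when = stamp_to_datetime(1700000000.123)
print(when.date())  # 2023-11-14
```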

🧪 Real-world use cases

🖼️ Image optimization (picopt)

In picopt:

  • Avoid re-optimizing unchanged images
  • Skip entire archives if contents are unchanged
  • Handle millions of files efficiently
  • Handle config changes (e.g. compression settings) by invalidating stamps

🎬 Media cleanup (nudebomb)

In nudebomb:

  • Avoid reprocessing already-cleaned MKVs
  • Track work across large media libraries
  • Resume interrupted runs safely

🧰 General pattern

Treestamps is ideal for anything that

  • walks a tree
  • does expensive work
  • runs repeatedly

🛠️ Troubleshooting

“Everything is reprocessing every run”

  • Did program_config change?
  • Did program_name change?
  • Are you calling dumpf()?

“Timestamps not persisting”

  • Ensure dumpf() is called (and check its return value — False means nothing was written)
  • Check write permissions in root directories

“Unexpected invalidation”

  • Any change in program_config invalidates all stamps
  • Even ordering or defaults may matter
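
One defensive habit is to normalize your options before handing them over as program_config, so that spelling out a default or reordering keys does not produce a different dict. The sketch below is an assumption about how a calling program might do this; DEFAULTS, its option names, and normalized_config are hypothetical, and it is not known whether Treestamps normalizes anything itself.

```python
# Hypothetical defaults for the calling program, not for Treestamps.
DEFAULTS = {"quality": 90, "strip_metadata": False}


def normalized_config(user_options: dict) -> dict:
    """Merge user options over the defaults and return them in sorted key order."""
    merged = {**DEFAULTS, **user_options}
    return dict(sorted(merged.items()))


# Leaving a default implicit or stating it explicitly yields the same dict.
print(normalized_config({}) == normalized_config({"quality": 90}))  # True
```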

“WAL file keeps growing”

  • You’re not calling dumpf()
  • WAL is expected to grow until committed

“Files moved or renamed”

  • Treestamps uses relative paths
  • Renamed or moved files are treated as new files
