
Git-based change detection for folders


storage-driven-events

Git-based change detection for folders. Detect what changed in a git repository since your last scan and pipe the changes to a handler script.

No daemon, no polling loop: just run storage-events scan whenever you want to check for changes.

How it works

The tool uses a custom git ref (refs/storage-events/last-processed) to track the last commit you processed. On each scan, it runs git diff-tree between that ref and the current HEAD to get a list of added, modified, deleted, and renamed files. If a handler is configured, the changes are piped to it via stdin. The ref only advances after the handler succeeds (exit code 0), giving you automatic retry on failure.
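The scan cycle described above can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the package's implementation: only the ref name and the "advance only on success" rule come from this README, while the names `scan` and `run_git`, the injectable `git` runner, and the `-r`/`-M` diff-tree flags are hypothetical choices.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Ref name from this README; everything else is a hypothetical sketch,
# not the storage-events source.
TRACKING_REF = "refs/storage-events/last-processed"

GitRunner = Callable[[List[str]], subprocess.CompletedProcess]


def run_git(args: List[str]) -> subprocess.CompletedProcess:
    """Run git in the current directory and capture its text output."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def scan(handler: Optional[Sequence[str]] = None, git: GitRunner = run_git) -> str:
    """Return the changes since TRACKING_REF, advancing it unless the handler fails."""
    head_proc = git(["rev-parse", "HEAD"])
    head_proc.check_returncode()
    head = head_proc.stdout.strip()

    last = git(["rev-parse", "--verify", "--quiet", TRACKING_REF])
    if last.returncode != 0:
        # First scan: start tracking at HEAD and report nothing.
        git(["update-ref", TRACKING_REF, head])
        return ""

    # -r recurses into subdirectories; -M reports renames as R<score> lines.
    diff = git(["diff-tree", "-r", "-M", "--name-status", last.stdout.strip(), head])
    diff.check_returncode()
    changes = diff.stdout

    if handler is not None:
        # Pipe the change lines to the handler's stdin.
        result = subprocess.run(list(handler), input=changes, text=True)
        if result.returncode != 0:
            return changes  # ref untouched: the next scan reports these again

    git(["update-ref", TRACKING_REF, head])
    return changes
```

From inside a checkout, `scan(handler=["./my-handler.sh"])` would use real git; injecting the `git` runner lets the ref bookkeeping be exercised with a fake, without a repository.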

Installation

With pip

pip install storage-driven-events

With uv

uv add storage-driven-events

From source

git clone https://github.com/vbalasu/storage-driven-events.git
cd storage-driven-events
uv sync

Quick start

Option 1: Interactive setup

storage-events setup

This walks you through cloning a repo, initializing change tracking, and optionally installing a git hook and cron job.

Option 2: Non-interactive setup

storage-events setup \
    --repo git@github.com:user/data-repo.git \
    --path ./data-repo \
    --branch main

Option 3: Manual setup on an existing repo

cd /path/to/your/repo
storage-events scan  # First run initializes tracking

Usage

Scanning for changes

# Scan the current directory
storage-events scan

# Scan a specific repo
storage-events scan /path/to/repo

# Pull first, then scan
storage-events scan --pull

# Preview changes without advancing the ref
storage-events scan --dry-run

# Pipe changes to a custom handler
storage-events scan --handler ./my-handler.sh

Change output format

Changes are printed as tab-separated lines matching git diff-tree --name-status output:

A       reports/q1-summary.pdf
M       data/metrics.csv
D       tmp/scratch.txt
R       docs/guide-v2.md

Status codes: A (added), M (modified), D (deleted), R (renamed), C (copied).
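A handler written in Python may want these lines as structured records. The sketch below assumes the tab-separated layout shown above and also accepts two-path rename/copy lines (`R100<TAB>old<TAB>new`), which is how raw `git diff-tree -M`/`-C` output looks; whether storage-events passes that form through is an assumption. `Change` and `parse_changes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    """One change line. Hypothetical structure, not part of storage-events."""
    status: str                     # status letter: A, M, D, R or C
    path: str                       # current path of the file
    old_path: Optional[str] = None  # previous path, only for two-path R/C lines


def parse_changes(text: str) -> List[Change]:
    """Parse tab-separated change lines into Change records."""
    changes: List[Change] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"not a tab-separated change line: {line!r}")
        status = fields[0][:1]  # git appends a similarity score to R/C, e.g. R100
        if len(fields) >= 3:
            # git's -M/-C output puts the old path first, then the new one
            changes.append(Change(status, fields[2], old_path=fields[1]))
        else:
            changes.append(Change(status, fields[1]))
    return changes
```

Keeping only the first character of the status field means `R100` and `R` are treated alike, so the same handler works whichever form arrives.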

Setup command

# Interactive mode (prompts for everything)
storage-events setup

# Non-interactive mode
storage-events setup --repo <url> [options]

Options:

Flag              Default          Description
--repo URL                         Git repository URL (required for non-interactive mode)
--path PATH       ./<repo-name>    Local clone path
--branch NAME     main             Branch to track
--handler PATH    built-in         Path to handler script
--cron MINUTES                     Set up cron polling at this interval (in minutes)
--no-hook                          Skip post-merge hook installation
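The `./<repo-name>` default suggests the clone directory is named after the last component of the repository URL. A sketch of that derivation, assuming it strips a trailing `/` and `.git` roughly the way `git clone` picks its default directory; `default_clone_path` is a hypothetical helper, not the tool's code.

```python
def default_clone_path(url: str) -> str:
    """Derive ./<repo-name> from a clone URL (hypothetical helper)."""
    name = url.rstrip("/")
    # Keep the last component of https://.../repo.git or git@host:user/repo.git;
    # the ':' split covers scp-style URLs with no slash, like git@host:repo.git.
    name = name.rsplit("/", 1)[-1].rsplit(":", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return f"./{name}"
```

For the non-interactive example above, `default_clone_path("git@github.com:user/data-repo.git")` gives `./data-repo`, the same path passed explicitly with `--path`.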

The setup command:

  1. Clones the repo (or reuses an existing clone)
  2. Initializes the last-processed ref to the current HEAD
  3. Installs a default handler (default-handler.py) that pretty-prints changes
  4. Installs a post-merge git hook (so changes are shown automatically after git pull)
  5. Optionally sets up a cron job for automated polling

Writing handlers

A handler is any executable that reads tab-separated change lines from stdin. Exit 0 to mark changes as processed (advances the ref). Exit non-zero to leave the ref unchanged so the same changes are retried on the next scan.
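Because the exit code decides whether the ref advances, a handler should report failure rather than swallow it. A minimal sketch in Python, assuming a hypothetical per-file callback `process(status, path)` supplied by you; `handle` is likewise a hypothetical name.

```python
import sys
from typing import Callable, Iterable


def handle(lines: Iterable[str], process: Callable[[str, str], None]) -> int:
    """Process every change line; return the exit code for storage-events."""
    failed = False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        status, path = line.split("\t", 1)
        try:
            process(status, path)
        except Exception as exc:  # record the failure but keep going
            print(f"failed: {status} {path}: {exc}", file=sys.stderr)
            failed = True
    # Non-zero leaves the ref in place, so the whole batch is retried next scan.
    return 1 if failed else 0
```

A script would end with `sys.exit(handle(sys.stdin, process))`. Since a retry replays the entire batch, changes that already succeeded are delivered again, so `process` should be idempotent.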

Default handler

The setup command installs default-handler.py, which pretty-prints changes:

ADDED      reports/q1-summary.pdf
MODIFIED   data/metrics.csv
DELETED    tmp/scratch.txt
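The pretty-printed form above is a fixed-width label followed by the path. A sketch that reproduces it, assuming a label column 10 characters wide (chosen to match the sample); `LABELS` and `pretty` are hypothetical and are not the contents of default-handler.py.

```python
# Labels for the status letters this README lists; the padding width of 10
# matches the sample output. Hypothetical, not default-handler.py itself.
LABELS = {"A": "ADDED", "M": "MODIFIED", "D": "DELETED", "R": "RENAMED", "C": "COPIED"}


def pretty(line: str) -> str:
    """Format one tab-separated change line as a padded label and path."""
    status, path = line.rstrip("\n").split("\t", 1)
    label = LABELS.get(status[:1], status)  # "R100" maps to "RENAMED"
    shown = path.replace("\t", " -> ")  # two-path rename/copy lines: old -> new
    return f"{label:<10} {shown}"
```

A handler script would loop over `sys.stdin`, skip blank lines, and `print(pretty(line))` for the rest.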

Example: Slack notification

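#!/usr/bin/env bash
set -euo pipefail
# -R reads raw text and -s slurps all of stdin into a single string
payload=$(jq -Rs '{text: ("Files changed:\n" + .)}')
# -f makes curl exit non-zero on HTTP errors, so a rejected post is retried on the next scan
curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$payload" "$SLACK_WEBHOOK_URL"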

Example: Process only CSV files

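#!/usr/bin/env bash
# Renames/copies may carry two paths (old, new); use the last one when present
while IFS=$'\t' read -r status file newfile; do
    path=${newfile:-$file}
    if [[ "$path" == *.csv && "$status" != "D" ]]; then
        # Stop on the first failure so the ref is not advanced and the batch is retried
        python3 pipeline.py "$path" || exit 1
    fi
done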

Example: Python handler

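#!/usr/bin/env python3
import sys

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue  # skip blank lines
    # Split on the first tab only: status code, then the path
    status, filepath = line.split("\t", 1)
    if status == "A":
        print(f"New file detected: {filepath}")
        # Do something with the new file...

# Reaching the end exits with status 0, so storage-events advances the ref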

Automation

Cron

Poll every 15 minutes:

# Via setup
storage-events setup --repo <url> --cron 15

# Or add manually to crontab (cron runs with a minimal PATH, so you may need the absolute path to storage-events)
*/15 * * * * cd /path/to/repo && git pull -q && storage-events scan
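Assuming `--cron MINUTES` writes an entry like the manual one above, the line can be built as follows; `cron_line` is a hypothetical helper. One subtlety it checks: `*/N` in the minute field restarts at minute 0 every hour, so only divisors of 60 give evenly spaced runs (`*/45` fires at :00 and :45).

```python
import shlex
import warnings


def cron_line(minutes: int, repo: str, command: str = "storage-events") -> str:
    """Build a crontab entry that pulls and scans every `minutes` minutes."""
    if not 1 <= minutes <= 59:
        raise ValueError("minutes must be between 1 and 59")
    if 60 % minutes:
        # */N restarts at minute 0 each hour, so the last gap in the hour is shorter.
        warnings.warn(f"*/{minutes} will not fire at even intervals", stacklevel=2)
    # Quote the path so spaces or shell metacharacters survive.
    return f"*/{minutes} * * * * cd {shlex.quote(repo)} && git pull -q && {command} scan"
```

Passing an absolute path as `command` avoids depending on cron's minimal PATH.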

Post-merge hook

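The setup command installs a .git/hooks/post-merge hook that shows changes automatically after each git pull that completes a merge, fast-forwards included. Git runs post-merge only from git merge, so the hook does not fire for git pull --rebase or for merges that stop on conflicts. You can customize the handler with the STORAGE_EVENTS_HANDLER environment variable: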

STORAGE_EVENTS_HANDLER=./notify.sh git pull
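This README does not show the hook that setup installs. As a hypothetical sketch, an installer could write a small POSIX sh script into .git/hooks/post-merge and mark it executable; the hook body, which passes `--handler` only when STORAGE_EVENTS_HANDLER is set, is an assumption about how the variable is honored, and `install_post_merge_hook` is a hypothetical name.

```python
import stat
from pathlib import Path

# Hypothetical hook body: run a scan, adding --handler only when
# STORAGE_EVENTS_HANDLER is set in the environment.
HOOK_BODY = """#!/bin/sh
exec storage-events scan ${STORAGE_EVENTS_HANDLER:+--handler "$STORAGE_EVENTS_HANDLER"}
"""


def install_post_merge_hook(repo: Path) -> Path:
    """Write .git/hooks/post-merge under `repo` and make it executable."""
    hook = repo / ".git" / "hooks" / "post-merge"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_BODY)
    mode = hook.stat().st_mode
    hook.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

This sketch overwrites any existing post-merge hook and ignores a custom `core.hooksPath`; an installer meant for real repositories should check both.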

Launchd (macOS)

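Create ~/Library/LaunchAgents/com.storage-events.scan.plist for a timer-based alternative to cron. launchd is the scheduler Apple supports on macOS, and unlike cron it can run a calendar-scheduled job that came due while the Mac was asleep as soon as it wakes.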
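The plist can be generated with Python's `plistlib`. A minimal sketch, assuming the CLI accepts `scan --pull <path>` in one invocation and that the binary lives at `/usr/local/bin/storage-events` (check with `which storage-events`); `scan_agent_plist` is a hypothetical helper. It uses `StartCalendarInterval`, whose missed firings launchd runs after wake, rather than `StartInterval`, whose firings during sleep are skipped.

```python
import plistlib


def scan_agent_plist(repo: str, every_minutes: int = 15,
                     program: str = "/usr/local/bin/storage-events") -> bytes:
    """Return launchd plist bytes that run a pull-and-scan on a fixed schedule."""
    if every_minutes < 1 or 60 % every_minutes:
        raise ValueError("every_minutes must evenly divide 60")
    agent = {
        "Label": "com.storage-events.scan",
        # launchd does not use your shell's PATH, so give an absolute program path.
        "ProgramArguments": [program, "scan", "--pull", repo],
        # One entry per firing minute of each hour.
        "StartCalendarInterval": [{"Minute": m} for m in range(0, 60, every_minutes)],
        # Assumed log location for troubleshooting.
        "StandardErrorPath": "/tmp/storage-events.err",
    }
    return plistlib.dumps(agent)
```

Write the returned bytes to ~/Library/LaunchAgents/com.storage-events.scan.plist and load it with `launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.storage-events.scan.plist`.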

How the ref works

The tool stores a single git ref at refs/storage-events/last-processed inside the repo's .git directory. This ref points to the last commit that was successfully processed.

  • First scan: The ref is created pointing to the current HEAD. No changes are reported.
  • Subsequent scans: Changes between the ref and HEAD are reported. If the handler succeeds, the ref advances to HEAD.
  • Handler failure: The ref stays where it is. The next scan will report the same changes again.
  • No external state: Everything lives inside the git repo. No config files, databases, or lock files.

Requirements

  • Python 3.10+
  • Git
  • No runtime dependencies (stdlib only)

Development

git clone https://github.com/vbalasu/storage-driven-events.git
cd storage-driven-events
uv sync
uv run pytest -v

License

MIT

Download files

Source Distribution

storage_driven_events-0.1.0.tar.gz (3.2 MB)

Built Distribution


storage_driven_events-0.1.0-py3-none-any.whl (11.8 kB)

File details

Details for the file storage_driven_events-0.1.0.tar.gz.

File metadata

  • Download URL: storage_driven_events-0.1.0.tar.gz
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for storage_driven_events-0.1.0.tar.gz
Algorithm Hash digest
SHA256 43a685bb2e8b736eb3dea49e07d575126313d38e61dd10369bd29404e48444b8
MD5 5016274d0df46dd26f22936abde58c3c
BLAKE2b-256 81ca8c0695c82573a1188bba6111244c2b916e55fd9041465272e084b876e34f


Provenance

The following attestation bundles were made for storage_driven_events-0.1.0.tar.gz:

Publisher: publish.yml on vbalasu/storage-driven-events


File details

Details for the file storage_driven_events-0.1.0-py3-none-any.whl.

File hashes

Hashes for storage_driven_events-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e61532742b6060a839c8e10a0958c811a633741e5cfbb1e47bc37b2846333de9
MD5 9247c2cceb26467de7cd931205a7ff48
BLAKE2b-256 063a22d9dc98afc21aa96ace049f4aebe5a7cfdc9f87c05d11c8f14a79df51d9


Provenance

The following attestation bundles were made for storage_driven_events-0.1.0-py3-none-any.whl:

Publisher: publish.yml on vbalasu/storage-driven-events

