
Rust-powered filesystem toolkit: fast walk, parallel hashing, bulk copy, file watching, deduplication, content search, directory diff/sync, snapshots, disk usage, and batch rename.

pyfs_watcher

Rust-powered filesystem toolkit for Python. Fast recursive directory listing, parallel file hashing, bulk copy/move with progress, cross-platform file watching, file deduplication, content search, directory diff/sync, snapshots, disk usage, and batch rename.

Install

pip install pyfs_watcher

From source (requires a Rust toolchain and an activated virtual environment):

pip install maturin
maturin develop

Usage

Walk directories (parallel, faster than os.walk)

import pyfs_watcher

# Streaming iterator
for entry in pyfs_watcher.walk("/data", file_type="file", glob_pattern="*.py"):
    print(entry.path, entry.file_size)

# Bulk collect (faster when you need all results)
entries = pyfs_watcher.walk_collect("/data", max_depth=3, sort=True, skip_hidden=True)
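
For comparison or benchmarking against os.walk, here is a rough pure-Python equivalent of the filters above. It is a sketch under stated assumptions: glob_pattern is matched against the file name only, skip_hidden means "name starts with a dot", and max_depth counts the root's direct children as depth 1. The library's exact semantics may differ, and py_walk is a hypothetical helper, not part of pyfs_watcher.

```python
import fnmatch
import os


def py_walk(root, glob_pattern="*", max_depth=None, skip_hidden=False):
    """Yield (path, size) for files under root, applying walk()-style filters.

    Assumed semantics (not taken from pyfs_watcher): glob_pattern matches the
    file name only, "hidden" means a leading dot, root's children are depth 1.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if skip_hidden:
            # Prune in place so os.walk never descends into hidden directories.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        if max_depth is not None and depth + 1 >= max_depth:
            # Files inside subdirectories would sit deeper than max_depth.
            dirnames[:] = []
        for name in filenames:
            if skip_hidden and name.startswith("."):
                continue
            if not fnmatch.fnmatch(name, glob_pattern):
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # e.g. a broken symlink or a file deleted mid-walk
            yield path, size
```

For example, sorted(py_walk("/data", max_depth=3, skip_hidden=True)) roughly mirrors the walk_collect call above, without the parallelism.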

Hash files (parallel SHA-256/BLAKE3)

# Single file
result = pyfs_watcher.hash_file("large.iso", algorithm="blake3")
print(result.hash_hex)

# Parallel batch hashing (paths: a list of file paths)
results = pyfs_watcher.hash_files(paths, algorithm="blake3", callback=lambda r: print(r.path))
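
To cross-check results, or to hash files without the extension, the standard library can compute chunked SHA-256 in a thread pool. CPython's hashlib releases the GIL while digesting large buffers, so threads give real concurrency here. This sketch uses only hashlib and concurrent.futures because BLAKE3 is not in the standard library. The chunk_size and workers defaults are arbitrary choices, and sha256_file and sha256_files are hypothetical helpers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops when read() hits end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_files(paths, workers=8, callback=None):
    """Hash many files concurrently; returns {path: hex_digest}."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the callback fires in that order.
        for path, digest in zip(paths, pool.map(sha256_file, paths)):
            results[path] = digest
            if callback is not None:
                callback(path, digest)
    return results
```

The README only shows algorithm="blake3". Assuming hash_file also accepts algorithm="sha256" and returns a lowercase hex digest, sha256_file(p) should equal pyfs_watcher.hash_file(p, algorithm="sha256").hash_hex.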

Copy/move with progress

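def on_progress(p):
    # Guard against division by zero when every source file is empty
    pct = p.bytes_copied / p.total_bytes * 100 if p.total_bytes else 100.0
    print(f"{pct:.0f}% - {p.current_file}")

# sources: a list of file paths
pyfs_watcher.copy_files(sources, "/dest", progress_callback=on_progress)
# Or move instead: rename if on the same filesystem, copy + delete otherwise
pyfs_watcher.move_files(sources, "/dest")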
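
The move strategy described above is a common one: try a cheap, atomic rename first, and fall back to copy-then-delete only when the source and destination are on different filesystems (errno EXDEV). Below is a minimal single-file sketch of that strategy using the standard library. It assumes the destination is an existing directory and handles regular files only. move_one is a hypothetical helper, not pyfs_watcher's implementation.

```python
import errno
import os
import shutil


def move_one(src, dest_dir):
    """Move a regular file into dest_dir: rename if possible, else copy + delete."""
    dst = os.path.join(dest_dir, os.path.basename(src))
    try:
        # Same filesystem: a rename only updates directory entries, no data is copied.
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV means "cross-device link"
            raise
        # Different filesystem: copy data and metadata, then remove the original.
        shutil.copy2(src, dst)
        os.remove(src)
    return dst
```

For a single path, shutil.move in the standard library applies the same rename-then-copy fallback.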

Watch for file changes

# Sync
with pyfs_watcher.FileWatcher("/data", debounce_ms=500, ignore_patterns=["*.tmp"]) as w:
    for changes in w:
        for c in changes:
            print(c.path, c.change_type)  # "created", "modified", "deleted"

# Async (async for must run inside a coroutine)
import asyncio

async def main():
    async for changes in pyfs_watcher.async_watch("/data"):
        for c in changes:
            print(c.path, c.change_type)

asyncio.run(main())
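
The debounce_ms option groups bursts of raw filesystem events into a single list of changes. Here is a simplified model of that idea as a pure function over timestamped events:

- A quiet gap of at least debounce_ms closes a batch.
- Paths that match an ignore pattern are dropped.
- Repeated events for one path collapse to the latest change.

These rules are assumptions for illustration only. The real debouncer (notify with a debouncer, per the Tech section) has its own merge rules, for example for a file that is created and then deleted inside one window. debounce is a hypothetical helper.

```python
import fnmatch


def debounce(events, debounce_ms=500, ignore_patterns=()):
    """Group (timestamp_ms, path, change_type) events into batches.

    A new batch starts when the gap since the previous kept event is at least
    debounce_ms. Within a batch only the latest change per path is kept.
    """
    batches = []
    current = {}  # path -> change_type; dicts keep insertion order
    last_ts = None
    for ts, path, change in sorted(events, key=lambda e: e[0]):
        if any(fnmatch.fnmatch(path, pat) for pat in ignore_patterns):
            continue
        if last_ts is not None and ts - last_ts >= debounce_ms:
            batches.append(list(current.items()))
            current = {}
        current.pop(path, None)  # re-insert so order reflects the latest event
        current[path] = change
        last_ts = ts
    if current:
        batches.append(list(current.items()))
    return batches
```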

Find duplicate files

groups = pyfs_watcher.find_duplicates(
    ["/photos", "/backup"],
    min_size=1024,
    progress_callback=lambda stage, done, total: print(f"{stage}: {done}/{total}"),
)
for g in groups:
    print(f"{g.file_size}B x {len(g.paths)} copies = {g.wasted_bytes}B wasted")
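
The Tech section below lists the dedup pipeline as size grouping -> partial hash -> full hash. Each stage only reads files that are still candidates, so files with a unique size are never opened at all. The following is a single-threaded sketch of that staged pipeline using the standard library. The 4 KiB prefix and SHA-256 are illustrative choices. Computing wasted bytes as size * (copies - 1) is an assumption about what g.wasted_bytes reports. find_dupes is a hypothetical helper.

```python
import hashlib
import os
from collections import defaultdict


def _digest(path, limit=None, chunk_size=1 << 16):
    """SHA-256 of the first `limit` bytes of a file (whole file if limit is None)."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            n = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(n)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.digest()


def find_dupes(paths, min_size=1, partial_bytes=4096):
    """Return [(size, [paths...], wasted_bytes)] for groups of identical files."""
    # Stage 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for p in paths:
        size = os.path.getsize(p)
        if size >= min_size:
            by_size[size].append(p)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        # Stage 2: cheap hash of the first few KiB splits most non-duplicates.
        by_partial = defaultdict(list)
        for p in same_size:
            by_partial[_digest(p, partial_bytes)].append(p)
        for candidates in by_partial.values():
            if len(candidates) < 2:
                continue
            # Stage 3: a full-content hash confirms true duplicates.
            by_full = defaultdict(list)
            for p in candidates:
                by_full[_digest(p)].append(p)
            for dupes in by_full.values():
                if len(dupes) > 1:
                    groups.append((size, sorted(dupes), size * (len(dupes) - 1)))
    return groups
```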

Search file contents (parallel regex)

# Find all files containing "TODO" in Python files
results = pyfs_watcher.search("/project", r"TODO", glob_pattern="*.py")
for r in results:
    for m in r.matches:
        print(f"  {r.path}:{m.line_number}: {m.line_text.strip()}")

# Streaming mode
for r in pyfs_watcher.search_iter("/project", r"FIXME"):
    print(r.path, r.match_count)
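
According to the Tech section, search uses the Rust regex crate. A rough standard-library analogue can be useful for spot-checking results on small trees. It compiles the pattern once and scans files line by line in a thread pool. The sketch assumes 1-based line numbers and UTF-8 text, with undecodable bytes replaced. Python's re syntax differs from the regex crate in places; for example, the crate has no look-around or backreferences. grep_files is a hypothetical helper.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def _scan(path, regex):
    """Return [(line_number, line_text)] for lines in path that match regex."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if regex.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits


def grep_files(paths, pattern, workers=8):
    """Search files concurrently; returns {path: [(line_number, text)]} for files with matches."""
    regex = re.compile(pattern)  # compiled patterns can be shared across threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: (p, _scan(p, regex)), paths)
        return {path: hits for path, hits in results if hits}
```

Unlike the Rust version, re holds the GIL while matching, so these threads mainly overlap file I/O rather than the matching itself.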

Compare directories

diff = pyfs_watcher.diff_dirs("/original", "/copy", detect_moves=True)
print(f"Added: {len(diff.added)}, Removed: {len(diff.removed)}, "
      f"Modified: {len(diff.modified)}, Moved: {len(diff.moved)}")
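
One way to implement detect_moves=True is to fingerprint file contents. A path that disappeared and a path that appeared with the same content hash are then reported as a move, rather than as a removal plus an addition. Below is a sketch of that approach using the standard library, keyed by relative path with SHA-256 fingerprints. It is not necessarily how pyfs_watcher pairs moves, and diff_trees is a hypothetical helper.

```python
import hashlib
import os


def _tree(root):
    """Map each file's path relative to root to the SHA-256 of its contents."""
    out = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:  # reads whole files; chunk for very large ones
                out[os.path.relpath(full, root)] = hashlib.sha256(f.read()).hexdigest()
    return out


def diff_trees(left, right, detect_moves=True):
    a, b = _tree(left), _tree(right)
    added = sorted(b.keys() - a.keys())
    removed = sorted(a.keys() - b.keys())
    modified = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    moved = []
    if detect_moves:
        # Index added files by content so each removed file can claim one match.
        by_hash = {}
        for p in added:
            by_hash.setdefault(b[p], []).append(p)
        for old in list(removed):
            candidates = by_hash.get(a[old])
            if candidates:
                new = candidates.pop(0)
                moved.append((old, new))
                removed.remove(old)
                added.remove(new)
    return {"added": added, "removed": removed, "modified": modified, "moved": moved}
```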

Sync directories

result = pyfs_watcher.sync("/source", "/backup", delete_extra=True)
print(f"Copied: {len(result.copied)}, Deleted: {len(result.deleted)}, "
      f"Skipped: {len(result.skipped)}")

# Preview changes without writing
result = pyfs_watcher.sync("/source", "/backup", dry_run=True)
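
A one-way sync reduces to a plan with three parts:

- Copy files that are missing from the destination or look different there.
- When delete_extra is set, delete destination files that are absent from the source.
- Skip everything else.

The sketch below decides "looks different" by comparing size and modification time, a common heuristic; pyfs_watcher may compare files differently. It only touches the disk when dry_run is false. sync_plan is a hypothetical helper.

```python
import os
import shutil


def _files(root):
    """Map relative path -> (size, mtime_ns) for every file under root."""
    out = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            out[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return out


def sync_plan(src, dst, delete_extra=False, dry_run=True):
    s = _files(src)
    d = _files(dst) if os.path.isdir(dst) else {}
    copied = sorted(p for p in s if p not in d or d[p] != s[p])
    skipped = sorted(p for p in s if p in d and d[p] == s[p])
    deleted = sorted(d.keys() - s.keys()) if delete_extra else []
    if not dry_run:
        for rel in copied:
            target = os.path.join(dst, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # copy2 preserves mtime, so on filesystems with the same timestamp
            # resolution the next run sees these files as unchanged.
            shutil.copy2(os.path.join(src, rel), target)
        for rel in deleted:
            os.remove(os.path.join(dst, rel))
    return {"copied": copied, "deleted": deleted, "skipped": skipped}
```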

Snapshot and verify file integrity

# Take a snapshot
snap = pyfs_watcher.snapshot("/important_data")
snap.save("baseline.json")

# Later, verify nothing changed
result = pyfs_watcher.verify("baseline.json")
if not result.ok:
    for c in result.modified:
        print(f"Modified: {c.path}")
    for c in result.removed:
        print(f"Removed: {c.path}")
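
Conceptually, a snapshot is a manifest of per-file fingerprints saved to disk, and verification re-scans the tree and compares it with that manifest. The following is a standard-library sketch. It assumes a JSON manifest mapping each relative path to its SHA-256 digest. The actual format that snap.save writes to baseline.json is not documented here and may differ. take_snapshot, save_snapshot and verify_snapshot are hypothetical helpers.

```python
import hashlib
import json
import os


def take_snapshot(root):
    """Return {relative_path: sha256_hex} for every file under root."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                manifest[os.path.relpath(full, root)] = hashlib.sha256(f.read()).hexdigest()
    return manifest


def save_snapshot(root, out_path):
    """Write the manifest plus the absolute root so verify can re-scan it later."""
    data = {"root": os.path.abspath(root), "files": take_snapshot(root)}
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True)


def verify_snapshot(snapshot_path):
    """Compare the current tree with a saved manifest."""
    with open(snapshot_path, encoding="utf-8") as f:
        saved = json.load(f)
    old, new = saved["files"], take_snapshot(saved["root"])
    return {
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }
```

An ok flag like result.ok then corresponds to not any(verify_snapshot(path).values()). Store the manifest outside the snapshotted tree; otherwise it will show up as an added file.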

Disk usage

usage = pyfs_watcher.disk_usage("/data")
print(f"Total: {usage.total_size:,} bytes in {usage.total_files} files")
for child in usage.children[:5]:  # top 5 largest
    print(f"  {child.path}: {child.size:,} bytes")
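
The per-child breakdown can be computed by summing file sizes under each immediate child of the root and then sorting largest first. Below is a sketch using os.scandir. It assumes apparent sizes (st_size, not allocated blocks), does not follow symlinks, and counts a symlink as one file of its own size. tree_size and usage_by_child are hypothetical helpers.

```python
import os


def tree_size(path):
    """Return (total_bytes, file_count) for a file or directory, without following symlinks."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size, 1
    total, count = 0, 0
    with os.scandir(path) as it:
        for entry in it:
            size, n = tree_size(entry.path)
            total += size
            count += n
    return total, count


def usage_by_child(root):
    """Return (total_bytes, total_files, [(child_path, bytes), ...] largest first)."""
    children = []
    total, files = 0, 0
    with os.scandir(root) as it:
        for entry in it:
            size, n = tree_size(entry.path)
            children.append((entry.path, size))
            total += size
            files += n
    children.sort(key=lambda c: c[1], reverse=True)
    return total, files, children
```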

Bulk rename

# Preview renames (dry_run=True by default)
result = pyfs_watcher.bulk_rename("/photos", r"IMG_(\d+)", r"photo_\1")
for entry in result.renamed:
    print(f"  {entry.old_name} -> {entry.new_name}")

# Apply renames
result = pyfs_watcher.bulk_rename("/photos", r"IMG_(\d+)", r"photo_\1", dry_run=False)
# Undo if needed
result.undo()
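
A regex rename with a dry-run default and undo() suggests a plan-then-apply design backed by an undo log. Here is a sketch of that design. It assumes the pattern is applied to file names (not full paths) with re.sub semantics. It also assumes any collision should abort the whole plan, whether two files would map to one name or a target name already exists. plan_renames, apply_renames and undo_renames are hypothetical helpers, not pyfs_watcher internals.

```python
import os
import re


def plan_renames(directory, pattern, replacement):
    """Return [(old_name, new_name)] for files in directory whose name changes."""
    regex = re.compile(pattern)
    existing = set(os.listdir(directory))
    plan, targets = [], set()
    for name in sorted(existing):
        if not os.path.isfile(os.path.join(directory, name)):
            continue  # only regular files are renamed in this sketch
        new = regex.sub(replacement, name)
        if new == name:
            continue
        # Conservative: reject two sources mapping to one name, or any target
        # that already exists (even one that would itself be renamed away).
        if new in targets or new in existing:
            raise ValueError(f"rename collision: {name!r} -> {new!r}")
        targets.add(new)
        plan.append((name, new))
    return plan


def apply_renames(directory, plan):
    """Perform the renames and return an undo log of (new, old) pairs."""
    done = []
    for old, new in plan:
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
        done.append((new, old))
    return done


def undo_renames(directory, undo_log):
    """Reverse the renames, newest first, so chained renames unwind correctly."""
    for new, old in reversed(undo_log):
        os.rename(os.path.join(directory, new), os.path.join(directory, old))
```

Calling plan_renames alone gives the dry-run preview. apply_renames plus the returned log plays the role of dry_run=False followed by undo().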

API

All functions raise typed exceptions inheriting from FsWatcherError:

  • WalkError - directory walk failures
  • HashError - hashing failures
  • CopyError - copy/move failures
  • WatchError - file watching failures
  • SearchError - content search failures
  • DirDiffError - directory diff failures
  • SyncError - sync failures
  • SnapshotError - snapshot/verify failures
  • DiskUsageError - disk usage failures
  • RenameError - bulk rename failures

Standard FileNotFoundError and PermissionError are raised for I/O errors.

Development

# Setup
uv venv && source .venv/bin/activate
uv pip install maturin pytest pytest-asyncio pytest-timeout

# Build
maturin develop

# Test
cargo test        # Rust tests
pytest tests/     # Python tests

# Benchmark
python benches/bench_walk.py
python benches/bench_hash.py

Tech

  • Rust + PyO3 for Python bindings
  • jwalk for parallel directory traversal
  • BLAKE3/SHA-256 for hashing with rayon parallelism
  • notify + debouncer for cross-platform file watching
  • Staged dedup pipeline: size grouping -> partial hash -> full hash
  • regex crate for parallel content search
  • serde/serde_json for snapshot serialization
  • chrono for timestamps

Download files

Source Distribution

pyfs_watcher-0.3.0.tar.gz (253.9 kB)

Uploaded Source

Built Distributions

pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl (1.4 MB)

Uploaded CPython 3.9+, Windows x86-64

pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl (1.2 MB)

Uploaded CPython 3.9+, macOS 11.0+ ARM64

pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)

Uploaded CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file pyfs_watcher-0.3.0.tar.gz.

File metadata

  • Download URL: pyfs_watcher-0.3.0.tar.gz
  • Size: 253.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyfs_watcher-0.3.0.tar.gz
Algorithm Hash digest
SHA256 f7a3e365db6c8430b4056fc37c39d785aa28980fdc1449dbd678c0036608c3e5
MD5 05aa2625a2dde23e99cd3f2ebcecc577
BLAKE2b-256 4fd595024ae7848ee015ad95148865b2364ef2633c88d2df39aaa6358f490819

Provenance

The following attestation bundles were made for pyfs_watcher-0.3.0.tar.gz:

Publisher: publish.yml on pratyush618/pyfs-watcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl
  • Size: 1.4 MB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 2fa4fd944aa789b12c665bc3a68a40ec68e4dbdeb631c482f5fc3b77f842f873
MD5 4ee0554cfdb26b98902efde139085c47
BLAKE2b-256 22ba49363edce6e30d8dc3cbeabef341ff8bbd435782f0083a8a8fb4fa734432

Provenance

The following attestation bundles were made for pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl:

Publisher: publish.yml on pratyush618/pyfs-watcher

File details

Details for the file pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2af2ec59d432d54daef9eff2ea675f5dfb46ae805ddbe273f31397eddff427fe
MD5 be2e3a9742db5348ea3f9fef566c818f
BLAKE2b-256 bc048fda3451475917f141f3c288b10dcd4c3a62476fce3499f5789d16837800

Provenance

The following attestation bundles were made for pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on pratyush618/pyfs-watcher

File details

Details for the file pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b4d3694d5e9d23b5df0e88814a8f1671c58fa76494f70d407abaf9a650826938
MD5 c295bf071c4ddb0f6fe9f0056cd0b1f9
BLAKE2b-256 fe9e7353d2fbe10b997b0980c6cd70a896ec85e358434599453d426dab998226

Provenance

The following attestation bundles were made for pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on pratyush618/pyfs-watcher
