pyfs_watcher

Rust-powered filesystem toolkit for Python. Fast recursive directory listing, parallel file hashing, bulk copy/move with progress, cross-platform file watching, file deduplication, content search, directory diff/sync, snapshots, disk usage, and batch rename.

Install

pip install pyfs_watcher

From source:

pip install maturin
maturin develop

Usage

Walk directories (parallel, faster than os.walk)

import pyfs_watcher

# Streaming iterator
for entry in pyfs_watcher.walk("/data", file_type="file", glob_pattern="*.py"):
    print(entry.path, entry.file_size)

# Bulk collect (faster when you need all results)
entries = pyfs_watcher.walk_collect("/data", max_depth=3, sort=True, skip_hidden=True)

Hash files (parallel SHA-256/BLAKE3)

# Single file
result = pyfs_watcher.hash_file("large.iso", algorithm="blake3")
print(result.hash_hex)

# Parallel batch hashing
paths = [e.path for e in pyfs_watcher.walk("/data", file_type="file")]
results = pyfs_watcher.hash_files(paths, algorithm="blake3", callback=lambda r: print(r.path))
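
For comparison, or to cross-check a SHA-256 digest independently, the same idea can be sketched with the standard library alone. This is a minimal baseline, not how pyfs_watcher hashes internally; `sha256_file` and `sha256_files` are hypothetical helper names, and the thread pool stands in for the library's Rust-side parallelism.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path, chunk_size=1 << 20):
    """Hash one file in fixed-size chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_files(paths, max_workers=4):
    """Hash several files concurrently and return {path: hex digest}."""
    paths = list(paths)  # materialize once so zip and map see the same items
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(sha256_file, paths)))
```

Assuming `hash_file(..., algorithm="sha256")` is accepted, its `hash_hex` should match `sha256_file` for the same file.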

Copy/move with progress

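def on_progress(p):
    # Avoid dividing by zero when the batch contains no bytes to copy
    pct = p.bytes_copied / p.total_bytes * 100 if p.total_bytes else 100.0
    print(f"{pct:.0f}% - {p.current_file}")

sources = ["/data/a.bin", "/data/b.bin"]  # example paths, replace with your own
pyfs_watcher.copy_files(sources, "/dest", progress_callback=on_progress)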
pyfs_watcher.move_files(sources, "/dest")  # rename if same fs, copy+delete otherwise
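
The "rename if same fs, copy+delete otherwise" behavior follows a common pattern that can be sketched with the standard library. This is only an illustration of the strategy, not pyfs_watcher's implementation; `move_one` is a hypothetical helper, and the README does not say how the library detects a cross-filesystem move.

```python
import errno
import os
import shutil


def move_one(src, dest_dir):
    """Move src into dest_dir: rename when possible, otherwise copy then delete."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    try:
        os.rename(src, dest)  # cheap when both paths are on one filesystem
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV signals a cross-device rename
            raise
        shutil.copy2(src, dest)  # copy contents and metadata
        os.remove(src)  # delete the source only after the copy succeeded
    return dest
```

The rename path moves no file data, which is why a same-filesystem move is typically much faster than a copy.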

Watch for file changes

# Sync
with pyfs_watcher.FileWatcher("/data", debounce_ms=500, ignore_patterns=["*.tmp"]) as w:
    for changes in w:
        for c in changes:
            print(c.path, c.change_type)  # "created", "modified", "deleted"

# Async (async for must run inside a coroutine, e.g. via asyncio.run(main()))
async def main():
    async for changes in pyfs_watcher.async_watch("/data"):
        for c in changes:
            print(c.path, c.change_type)
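
To illustrate what a debounce window such as `debounce_ms=500` generally does, here is a rough sketch that groups timestamped events into batches. It assumes a new batch starts once the gap since the previous event exceeds the window; the actual batching rules of pyfs_watcher's debouncer are not described in this README, and `debounce_batches` is a hypothetical name.

```python
def debounce_batches(events, window_ms):
    """Group (timestamp_ms, path) events; a gap longer than window_ms closes a batch."""
    batches = []
    current = []
    last_ts = None
    for ts, path in events:
        if current and ts - last_ts > window_ms:
            batches.append(current)  # quiet period elapsed, emit the batch
            current = []
        current.append(path)
        last_ts = ts
    if current:
        batches.append(current)  # flush whatever is still pending
    return batches


events = [(0, "a.txt"), (100, "b.txt"), (700, "a.txt"), (2000, "c.txt")]
batches = debounce_batches(events, window_ms=500)
```

A larger window yields fewer, bigger batches at the cost of higher latency before changes are reported.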

Find duplicate files

groups = pyfs_watcher.find_duplicates(
    ["/photos", "/backup"],
    min_size=1024,
    progress_callback=lambda stage, done, total: print(f"{stage}: {done}/{total}"),
)
for g in groups:
    print(f"{g.file_size}B x {len(g.paths)} copies = {g.wasted_bytes}B wasted")

Search file contents (parallel regex)

# Find all files containing "TODO" in Python files
results = pyfs_watcher.search("/project", r"TODO", glob_pattern="*.py")
for r in results:
    for m in r.matches:
        print(f"  {r.path}:{m.line_number}: {m.line_text.strip()}")

# Streaming mode
for r in pyfs_watcher.search_iter("/project", r"FIXME"):
    print(r.path, r.match_count)

Compare directories

diff = pyfs_watcher.diff_dirs("/original", "/copy", detect_moves=True)
print(f"Added: {len(diff.added)}, Removed: {len(diff.removed)}, "
      f"Modified: {len(diff.modified)}, Moved: {len(diff.moved)}")
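
One plausible way to detect moves is to match removed and added paths that share a content hash. The sketch below works on two plain `{relative_path: content_hash}` dictionaries; it is an assumption about the general technique, not a description of what `detect_moves=True` does internally, and `diff_maps` is a hypothetical name.

```python
def diff_maps(old, new):
    """Compare two {relative_path: content_hash} maps and pair up likely moves."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])

    # Index newly added paths by hash so removed files can be matched to them.
    by_hash = {}
    for path in sorted(added):
        by_hash.setdefault(new[path], []).append(path)

    moved = []
    for path in sorted(removed):
        candidates = by_hash.get(old[path])
        if candidates:
            moved.append((path, candidates.pop(0)))  # same content, new location

    # A moved file is neither a plain removal nor a plain addition.
    for src, dst in moved:
        removed.discard(src)
        added.discard(dst)

    return {
        "added": sorted(added),
        "removed": sorted(removed),
        "modified": modified,
        "moved": moved,
    }
```

Without the matching step, every move would show up as one removal plus one addition.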

Sync directories

result = pyfs_watcher.sync("/source", "/backup", delete_extra=True)
print(f"Copied: {len(result.copied)}, Deleted: {len(result.deleted)}, "
      f"Skipped: {len(result.skipped)}")

# Preview changes without writing
result = pyfs_watcher.sync("/source", "/backup", dry_run=True)

Snapshot and verify file integrity

# Take a snapshot
snap = pyfs_watcher.snapshot("/important_data")
snap.save("baseline.json")

# Later, verify nothing changed
result = pyfs_watcher.verify("baseline.json")
if not result.ok:
    for c in result.modified:
        print(f"Modified: {c.path}")
    for c in result.removed:
        print(f"Removed: {c.path}")
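
Conceptually, a snapshot records a fingerprint per file, and verification recomputes the fingerprints and compares them. The stdlib sketch below shows that idea under stated assumptions: the real snapshot file format and fields are not documented here, and `take_snapshot` and `verify_snapshot` are hypothetical names.

```python
import hashlib
import json
import os


def take_snapshot(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                # Reads the whole file at once; fine for a sketch, not for huge files.
                digest = hashlib.sha256(f.read()).hexdigest()
            snap[os.path.relpath(full, root)] = digest
    return snap


def verify_snapshot(root, baseline):
    """Compare the current state of root against a baseline snapshot dict."""
    current = take_snapshot(root)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(
            p for p in baseline if p in current and current[p] != baseline[p]
        ),
    }
```

The baseline is a plain dictionary, so it can be stored with `json.dump` and reloaded with `json.load`; keep the file outside the snapshotted directory so it does not show up as an added file.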

Disk usage

usage = pyfs_watcher.disk_usage("/data")
print(f"Total: {usage.total_size:,} bytes in {usage.total_files} files")
for child in usage.children[:5]:  # top 5 largest
    print(f"  {child.path}: {child.size:,} bytes")

Bulk rename

# Preview renames (dry_run=True by default)
result = pyfs_watcher.bulk_rename("/photos", r"IMG_(\d+)", r"photo_\1")
for entry in result.renamed:
    print(f"  {entry.old_name} -> {entry.new_name}")

# Apply renames
result = pyfs_watcher.bulk_rename("/photos", r"IMG_(\d+)", r"photo_\1", dry_run=False)
# Undo if needed
result.undo()
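
To reason about a pattern before touching any files, the substitution can be tried on plain strings with Python's `re` module. This is only an approximation: pyfs_watcher uses the Rust regex crate, whose syntax differs from `re` in places (for example, it has no lookaround), and `preview_renames` is a hypothetical helper.

```python
import re


def preview_renames(names, pattern, replacement):
    """Return (old, new) pairs for the names the substitution would change."""
    regex = re.compile(pattern)
    pairs = []
    for name in names:
        new = regex.sub(replacement, name)
        if new != name:  # skip names the pattern does not affect
            pairs.append((name, new))
    return pairs


pairs = preview_renames(["IMG_0042.jpg", "notes.txt"], r"IMG_(\d+)", r"photo_\1")
print(pairs)  # [('IMG_0042.jpg', 'photo_0042.jpg')]
```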

API

All functions raise typed exceptions inheriting from FsWatcherError:

  • WalkError - directory walk failures
  • HashError - hashing failures
  • CopyError - copy/move failures
  • WatchError - file watching failures
  • SearchError - content search failures
  • DirDiffError - directory diff failures
  • SyncError - sync failures
  • SnapshotError - snapshot/verify failures
  • DiskUsageError - disk usage failures
  • RenameError - bulk rename failures

Missing paths and permission problems raise the standard FileNotFoundError and PermissionError.
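
A handler might separate the standard I/O errors from the library's own hierarchy, as in the sketch below. It assumes the exception classes are exported at the top level of the package (for example `pyfs_watcher.WalkError`), which this README does not state; `list_python_files` is a hypothetical helper.

```python
def list_python_files(root):
    """Return walk entries for *.py files under root, or [] if the walk fails."""
    import pyfs_watcher  # imported here so the sketch loads without the extension

    try:
        return list(pyfs_watcher.walk(root, file_type="file", glob_pattern="*.py"))
    except (FileNotFoundError, PermissionError) as exc:
        print(f"cannot read {root}: {exc}")  # standard I/O errors
    except pyfs_watcher.WalkError as exc:  # assumed top-level export
        print(f"walk failed: {exc}")
    except pyfs_watcher.FsWatcherError as exc:  # assumed base class, catches the rest
        print(f"pyfs_watcher error: {exc}")
    return []
```

Catching the specific `WalkError` before the `FsWatcherError` base keeps the more precise message for the expected failure.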

Development

# Setup
uv venv && source .venv/bin/activate
uv pip install maturin pytest pytest-asyncio pytest-timeout

# Build
maturin develop

# Test
cargo test        # Rust tests
pytest tests/     # Python tests

# Benchmark
python benches/bench_walk.py
python benches/bench_hash.py

Tech

  • Rust + PyO3 for Python bindings
  • jwalk for parallel directory traversal
  • BLAKE3/SHA-256 for hashing with rayon parallelism
  • notify + debouncer for cross-platform file watching
  • Staged dedup pipeline: size grouping -> partial hash -> full hash
  • regex crate for parallel content search
  • serde/serde_json for snapshot serialization
  • chrono for timestamps
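
The staged dedup pipeline listed above (size grouping -> partial hash -> full hash) can be sketched in pure Python. Each stage only re-examines files that the previous stage could not tell apart, so most files are never read in full. This is an illustration, not the Rust implementation; the function names and the 4096-byte partial-hash size are assumptions.

```python
import hashlib
import os
from collections import defaultdict


def _digest(path, limit=None, chunk_size=1 << 20):
    """SHA-256 of the first `limit` bytes of path (the whole file if limit is None)."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def _regroup(groups, key):
    """Split each group by key(path) and keep only subgroups with 2+ files."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(g for g in buckets.values() if len(g) > 1)
    return out


def find_duplicates_sketch(paths, partial_bytes=4096):
    """Return groups of paths whose contents hash identically."""
    groups = [list(paths)]
    groups = _regroup(groups, os.path.getsize)  # stage 1: same size
    groups = _regroup(groups, lambda p: _digest(p, partial_bytes))  # stage 2: same prefix
    groups = _regroup(groups, _digest)  # stage 3: same full content hash
    return [sorted(g) for g in groups]
```

Files with a unique size drop out after a single `stat` call, and files that differ within the first few kilobytes drop out after one small read.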

Download files

Source Distribution

  • pyfs_watcher-0.2.0.tar.gz (122.0 kB): source

Built Distributions

  • pyfs_watcher-0.2.0-cp39-abi3-win_amd64.whl (1.4 MB): CPython 3.9+, Windows x86-64
  • pyfs_watcher-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (1.2 MB): CPython 3.9+, macOS 11.0+ ARM64
  • pyfs_watcher-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
