Rust-powered filesystem toolkit: fast walk, parallel hashing, bulk copy, file watching, deduplication, content search, directory diff/sync, snapshots, disk usage, and batch rename.
pyfs_watcher
Rust-powered filesystem toolkit for Python. Fast recursive directory listing, parallel file hashing, bulk copy/move with progress, cross-platform file watching, file deduplication, content search, directory diff/sync, snapshots, disk usage, and batch rename.
Install
pip install pyfs_watcher
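From source (requires a Rust toolchain; maturin develop installs into the currently active virtual environment):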
pip install maturin
maturin develop
Usage
Walk directories (parallel, faster than os.walk)
import pyfs_watcher
# Streaming iterator
for entry in pyfs_watcher.walk("/data", file_type="file", glob_pattern="*.py"):
    print(entry.path, entry.file_size)
# Bulk collect (faster when you need all results)
entries = pyfs_watcher.walk_collect("/data", max_depth=3, sort=True, skip_hidden=True)
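For comparison, here is a minimal pure-Python sketch of the same kind of filtering using only the standard library (os.walk and fnmatch). It assumes that glob_pattern is matched against the file name, that max_depth counts levels below the root (files directly in the root are depth 1), and that skip_hidden means names starting with "."; these are assumptions about the semantics, not documented behavior of walk_collect. The walk_files helper is hypothetical.

```python
import fnmatch
import os
from typing import List, Optional


def walk_files(root: str, glob_pattern: str = "*", max_depth: Optional[int] = None,
               skip_hidden: bool = False) -> List[str]:
    """Collect file paths under root whose names match glob_pattern, sorted."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1  # levels below root
        if skip_hidden:
            # Prune hidden dirs in place so os.walk never descends into them
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        if max_depth is not None:
            if depth + 1 >= max_depth:
                dirnames[:] = []  # files inside these subdirs would exceed max_depth
            if depth + 1 > max_depth:
                continue  # files in this directory are already too deep
        for name in filenames:
            if fnmatch.fnmatch(name, glob_pattern):
                results.append(os.path.join(dirpath, name))
    return sorted(results)
```

Pruning dirnames in place is what lets os.walk skip whole subtrees, which is the main cost saving when max_depth or skip_hidden is set.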
Hash files (parallel SHA-256/BLAKE3)
# Single file
result = pyfs_watcher.hash_file("large.iso", algorithm="blake3")
print(result.hash_hex)
# Parallel batch hashing over a list of file paths
paths = ["a.bin", "b.bin", "c.bin"]
results = pyfs_watcher.hash_files(paths, algorithm="blake3", callback=lambda r: print(r.path))
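A rough standard-library equivalent of batch hashing, shown only for comparison: each file is streamed in fixed-size chunks so memory stays flat, and CPython's hashlib releases the GIL while hashing large buffers, so a thread pool gives some real parallelism. This sketch supports SHA-256 only (the standard library has no BLAKE3), and sha256_file and sha256_files are hypothetical names, not part of pyfs_watcher.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so memory use stays flat


def sha256_file(path: str) -> str:
    """Stream one file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_files(paths: Iterable[str],
                 callback: Optional[Callable[[str, str], None]] = None,
                 workers: int = 8) -> Dict[str, str]:
    """Hash many files on a thread pool; callback(path, digest) runs as each one finishes."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(sha256_file, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            results[path] = fut.result()  # re-raises any I/O error from the worker
            if callback is not None:
                callback(path, results[path])
    return results
```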
Copy/move with progress
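sources = ["/data/a.txt", "/data/b.txt"]
def on_progress(p):
    # Guard against division by zero when every source file is empty
    pct = p.bytes_copied / p.total_bytes * 100 if p.total_bytes else 100.0
    print(f"{pct:.0f}% - {p.current_file}")
pyfs_watcher.copy_files(sources, "/dest", progress_callback=on_progress)
# Or move instead of copying (rename on the same filesystem, copy + delete otherwise)
pyfs_watcher.move_files(sources, "/dest")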
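The progress callback receives an object with bytes_copied, total_bytes, and current_file. To show how such numbers are typically produced, here is a standard-library sketch of a chunked copy that reports progress the same way. The Progress dataclass and copy_with_progress function are hypothetical stand-ins, not pyfs_watcher's implementation.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Callable, List, Optional

CHUNK_SIZE = 1 << 16  # 64 KiB per read


@dataclass
class Progress:
    bytes_copied: int   # bytes written so far across all files
    total_bytes: int    # combined size of all sources
    current_file: str   # source file being copied


def copy_with_progress(sources: List[str], dest_dir: str,
                       progress_callback: Optional[Callable[[Progress], None]] = None) -> None:
    total = sum(os.path.getsize(s) for s in sources)
    copied = 0
    os.makedirs(dest_dir, exist_ok=True)
    for src in sources:
        dst = os.path.join(dest_dir, os.path.basename(src))
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            for chunk in iter(lambda: fin.read(CHUNK_SIZE), b""):
                fout.write(chunk)
                copied += len(chunk)
                if progress_callback is not None:
                    progress_callback(Progress(copied, total, src))
        shutil.copystat(src, dst)  # carry over timestamps and permission bits
```

Reporting once per chunk keeps the callback cheap relative to the I/O; a real tool might throttle further (for example, once per percent).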
Watch for file changes
# Sync: each iteration yields a batch of changes
with pyfs_watcher.FileWatcher("/data", debounce_ms=500, ignore_patterns=["*.tmp"]) as w:
    for changes in w:
        for c in changes:
            print(c.path, c.change_type)  # "created", "modified", "deleted"
# Async: `async for` must run inside a coroutine
import asyncio
async def main():
    async for changes in pyfs_watcher.async_watch("/data"):
        for c in changes:
            print(c.path, c.change_type)
asyncio.run(main())
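Conceptually, each batch of changes is the difference between two views of the tree. The sketch below assigns the same "created" / "modified" / "deleted" labels by comparing two {path: (mtime_ns, size)} snapshots, which is roughly what a polling fallback does. pyfs_watcher itself builds on the notify crate (see Tech), which uses OS notification APIs; scan and classify are hypothetical helpers for illustration only.

```python
import os
from typing import Dict, List, Tuple

State = Dict[str, Tuple[int, int]]  # path -> (mtime_ns, size)


def scan(root: str) -> State:
    """Record modification time and size for every file under root."""
    state: State = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            st = os.stat(p)
            state[p] = (st.st_mtime_ns, st.st_size)
    return state


def classify(before: State, after: State) -> List[Tuple[str, str]]:
    """Return sorted (path, change_type) pairs describing before -> after."""
    changes = [(p, "created") for p in after.keys() - before.keys()]
    changes += [(p, "deleted") for p in before.keys() - after.keys()]
    changes += [(p, "modified") for p in after.keys() & before.keys() if after[p] != before[p]]
    return sorted(changes)
```

Debouncing (debounce_ms above) amounts to waiting until events stop arriving for that interval and then emitting one classified batch, so a burst of writes to one file shows up once.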
Find duplicate files
groups = pyfs_watcher.find_duplicates(
    ["/photos", "/backup"],
    min_size=1024,
    progress_callback=lambda stage, done, total: print(f"{stage}: {done}/{total}"),
)
for g in groups:
    print(f"{g.file_size}B x {len(g.paths)} copies = {g.wasted_bytes}B wasted")
Search file contents (parallel regex)
# Find "TODO" in all Python files
results = pyfs_watcher.search("/project", r"TODO", glob_pattern="*.py")
for r in results:
    for m in r.matches:
        print(f"  {r.path}:{m.line_number}: {m.line_text.strip()}")
# Streaming mode
for r in pyfs_watcher.search_iter("/project", r"FIXME"):
    print(r.path, r.match_count)
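Each search result pairs a path with its matching lines. A single-file, single-threaded sketch of that result shape, assuming 1-based line numbers (as in grep -n) and UTF-8 text with undecodable bytes replaced. Match, FileResult, and search_file are hypothetical names; the real search runs across many files in parallel and may count matches differently.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Match:
    line_number: int  # 1-based
    line_text: str


@dataclass
class FileResult:
    path: str
    matches: List[Match] = field(default_factory=list)

    @property
    def match_count(self) -> int:
        return len(self.matches)  # here: number of matching lines


def search_file(path: str, pattern: str) -> FileResult:
    """Scan one file line by line and collect lines where pattern matches."""
    regex = re.compile(pattern)
    result = FileResult(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if regex.search(line):
                result.matches.append(Match(number, line.rstrip("\n")))
    return result
```

Compiling the pattern once and reusing it per line is the important detail; parallelism then comes from running search_file over many paths at once.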
Compare directories
diff = pyfs_watcher.diff_dirs("/original", "/copy", detect_moves=True)
print(f"Added: {len(diff.added)}, Removed: {len(diff.removed)}, "
      f"Modified: {len(diff.modified)}, Moved: {len(diff.moved)}")
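One plausible way move detection can work (not necessarily how diff_dirs does it): index both trees by relative path and content hash, and report a path that exists only on the left as moved when its hash also appears among paths that exist only on the right. To keep the sketch short, trees are passed as {relative_path: content_hash} dicts; DirDiff and diff_trees are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DirDiff:
    added: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    moved: List[Tuple[str, str]] = field(default_factory=list)  # (old, new)


def diff_trees(left: Dict[str, str], right: Dict[str, str], detect_moves: bool = True) -> DirDiff:
    diff = DirDiff()
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    diff.modified = sorted(p for p in left.keys() & right.keys() if left[p] != right[p])
    if not detect_moves:
        diff.removed, diff.added = only_left, only_right
        return diff
    # Content hash -> right-only paths that could be the new location of a moved file
    by_hash: Dict[str, List[str]] = {}
    for p in only_right:
        by_hash.setdefault(right[p], []).append(p)
    for old in only_left:
        candidates = by_hash.get(left[old])
        if candidates:
            diff.moved.append((old, candidates.pop(0)))
        else:
            diff.removed.append(old)
    moved_to = {new for _, new in diff.moved}
    diff.added = [p for p in only_right if p not in moved_to]
    return diff
```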
Sync directories
result = pyfs_watcher.sync("/source", "/backup", delete_extra=True)
print(f"Copied: {len(result.copied)}, Deleted: {len(result.deleted)}, "
      f"Skipped: {len(result.skipped)}")
# Preview changes without writing
result = pyfs_watcher.sync("/source", "/backup", dry_run=True)
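With dry_run=True, a sync only has to compute a plan. A sketch of that planning step, assuming a file counts as up to date when its size and modification time match in both trees (a common heuristic; pyfs_watcher may compare differently, for example by hash). SyncPlan and plan_sync, and their {relative_path: (size, mtime)} inputs, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple

Meta = Tuple[int, float]  # (size, mtime)


class SyncPlan(NamedTuple):
    copy: List[str]
    delete: List[str]
    skip: List[str]


def plan_sync(source: Dict[str, Meta], target: Dict[str, Meta],
              delete_extra: bool = False) -> SyncPlan:
    copy, skip = [], []
    for path in sorted(source):
        if target.get(path) == source[path]:
            skip.append(path)  # same size and mtime: assume unchanged
        else:
            copy.append(path)  # missing from target or different
    # Files only in the target are removed only when explicitly requested
    delete = sorted(target.keys() - source.keys()) if delete_extra else []
    return SyncPlan(copy, delete, skip)
```

A non-dry run would then execute the copy and delete lists; separating planning from execution is what makes the preview essentially free.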
Snapshot and verify file integrity
# Take a snapshot
snap = pyfs_watcher.snapshot("/important_data")
snap.save("baseline.json")
# Later, verify nothing changed
result = pyfs_watcher.verify("baseline.json")
if not result.ok:
    for c in result.modified:
        print(f"Modified: {c.path}")
    for c in result.removed:
        print(f"Removed: {c.path}")
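A snapshot is essentially a manifest mapping each path to a content hash. A minimal standard-library sketch, assuming a JSON file of relative paths to SHA-256 digests; the real snapshot format (and whether it also records sizes or timestamps) may differ, and take_snapshot / verify_snapshot are hypothetical names.

```python
import hashlib
import json
import os
from typing import Dict, List


def take_snapshot(root: str) -> Dict[str, str]:
    """Map each file's root-relative path (with / separators) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                manifest[rel] = hashlib.sha256(f.read()).hexdigest()  # chunk large files in practice
    return manifest


def verify_snapshot(root: str, manifest_path: str) -> Dict[str, List[str]]:
    """Compare the current tree against a saved manifest."""
    with open(manifest_path, encoding="utf-8") as f:
        baseline = json.load(f)
    current = take_snapshot(root)
    return {
        "modified": sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p]),
        "removed": sorted(baseline.keys() - current.keys()),
        "added": sorted(current.keys() - baseline.keys()),
    }
```

Storing relative paths with "/" separators keeps the manifest portable if the data directory is moved or checked on another OS.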
Disk usage
usage = pyfs_watcher.disk_usage("/data")
print(f"Total: {usage.total_size:,} bytes in {usage.total_files} files")
for child in usage.children[:5]: # top 5 largest
    print(f"  {child.path}: {child.size:,} bytes")
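Per-child totals come from summing file sizes under each immediate child of the root. A standard-library sketch, assuming apparent size (st_size) rather than allocated disk blocks and that symlinks are not followed; tree_size and child_sizes are hypothetical helpers, and results are sorted largest first to match the "top 5" slice above.

```python
import os
from typing import List, Tuple


def tree_size(path: str) -> int:
    """Total apparent size in bytes of a file or directory tree (symlinks not followed)."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    for dirpath, _dirs, files in os.walk(path):  # os.walk does not follow dir symlinks by default
        for name in files:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def child_sizes(root: str) -> List[Tuple[str, int]]:
    """(path, size) for each immediate child of root, largest first."""
    children = [os.path.join(root, name) for name in os.listdir(root)]
    return sorted(((c, tree_size(c)) for c in children), key=lambda item: item[1], reverse=True)
```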
Bulk rename
# Preview renames (dry_run=True by default)
result = pyfs_watcher.bulk_rename("/photos", r"IMG_(\d+)", r"photo_\1")
for entry in result.renamed:
    print(f"  {entry.old_name} -> {entry.new_name}")
# Apply renames
result = pyfs_watcher.bulk_rename("/photos", r"IMG_(\d+)", r"photo_\1", dry_run=False)
# Undo if needed
result.undo()
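Conceptually, a bulk rename is a list of (old, new) pairs computed with re.sub, applied only outside dry-run mode, and undone by replaying the pairs in reverse. A sketch under these assumptions: only one directory is scanned (no recursion), names that would not change are skipped, and any rename that would overwrite an existing file raises an error. plan_renames, apply_renames, and undo_renames are hypothetical helpers, not the library's implementation.

```python
import os
import re
from typing import List, Tuple

Rename = Tuple[str, str]  # (old_name, new_name)


def plan_renames(directory: str, pattern: str, replacement: str) -> List[Rename]:
    """Dry run: compute renames without touching the filesystem."""
    regex = re.compile(pattern)
    plan = []
    for name in sorted(os.listdir(directory)):
        new = regex.sub(replacement, name)
        if new != name:
            plan.append((name, new))
    targets = [new for _, new in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would be renamed to the same name")
    return plan


def apply_renames(directory: str, plan: List[Rename]) -> None:
    for old, new in plan:
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            raise FileExistsError(dst)  # refuse to overwrite
        os.rename(os.path.join(directory, old), dst)


def undo_renames(directory: str, plan: List[Rename]) -> None:
    # Reverse both the order and the direction of every rename
    apply_renames(directory, [(new, old) for old, new in reversed(plan)])
```

Keeping the plan around is what makes undo possible: it is the only record of each file's original name.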
API
All functions raise typed exceptions inheriting from FsWatcherError:
- WalkError: directory walk failures
- HashError: hashing failures
- CopyError: copy/move failures
- WatchError: file watching failures
- SearchError: content search failures
- DirDiffError: directory diff failures
- SyncError: sync failures
- SnapshotError: snapshot/verify failures
- DiskUsageError: disk usage failures
- RenameError: bulk rename failures
The standard FileNotFoundError and PermissionError exceptions are raised for the corresponding I/O errors.
Development
# Setup
uv venv && source .venv/bin/activate
uv pip install maturin pytest pytest-asyncio pytest-timeout
# Build
maturin develop
# Test
cargo test # Rust tests
pytest tests/ # Python tests
# Benchmark
python benches/bench_walk.py
python benches/bench_hash.py
Tech
- Rust + PyO3 for Python bindings
- jwalk for parallel directory traversal
- BLAKE3/SHA-256 for hashing with rayon parallelism
- notify + debouncer for cross-platform file watching
- Staged dedup pipeline: size grouping -> partial hash -> full hash
- regex crate for parallel content search
- serde/serde_json for snapshot serialization
- chrono for timestamps
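The staged dedup pipeline in the list above can be sketched in pure Python: group files by size (a file with a unique size cannot have a duplicate), split those groups by a hash of the first few KiB, and fully hash only the files that still collide. The 4 KiB prefix and SHA-256 are assumptions here (the library may use other sizes and BLAKE3), and find_duplicate_groups is a hypothetical name.

```python
import hashlib
import os
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

PARTIAL_BYTES = 4096  # assumed prefix size for the cheap second-stage hash


def _digest(path: str, limit: int = -1) -> str:
    """SHA-256 of the first `limit` bytes, or of the whole file when limit < 0."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if limit >= 0:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def _regroup(groups: Iterable[List[str]], key: Callable[[str], object]) -> List[List[str]]:
    """Split each group by key(path), keeping only buckets with 2+ members."""
    out = []
    for group in groups:
        buckets: Dict[object, List[str]] = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(sorted(b) for b in buckets.values() if len(b) > 1)
    return out


def find_duplicate_groups(paths: Iterable[str], min_size: int = 1) -> List[Tuple[int, List[str]]]:
    by_size: Dict[int, List[str]] = defaultdict(list)
    for p in paths:
        size = os.path.getsize(p)
        if size >= min_size:
            by_size[size].append(p)
    stage1 = [g for g in by_size.values() if len(g) > 1]             # same size
    stage2 = _regroup(stage1, lambda p: _digest(p, PARTIAL_BYTES))  # same prefix hash
    stage3 = _regroup(stage2, _digest)                               # same full hash
    return [(os.path.getsize(g[0]), g) for g in stage3]
```

Each stage is more expensive than the last but runs on fewer files, so most non-duplicates are eliminated with a stat call or a 4 KiB read instead of a full read.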
Download files
File details
Details for the file pyfs_watcher-0.3.0.tar.gz.
File metadata
- Download URL: pyfs_watcher-0.3.0.tar.gz
- Upload date:
- Size: 253.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7a3e365db6c8430b4056fc37c39d785aa28980fdc1449dbd678c0036608c3e5 |
| MD5 | 05aa2625a2dde23e99cd3f2ebcecc577 |
| BLAKE2b-256 | 4fd595024ae7848ee015ad95148865b2364ef2633c88d2df39aaa6358f490819 |
Provenance
The following attestation bundles were made for pyfs_watcher-0.3.0.tar.gz:
Publisher: publish.yml on pratyush618/pyfs-watcher
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyfs_watcher-0.3.0.tar.gz
- Subject digest: f7a3e365db6c8430b4056fc37c39d785aa28980fdc1449dbd678c0036608c3e5
- Sigstore transparency entry: 1031092181
- Sigstore integration time:
- Permalink: pratyush618/pyfs-watcher@5ee25483909f277de406155dbce4064cc5f4bf54
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/pratyush618
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5ee25483909f277de406155dbce4064cc5f4bf54
- Trigger Event: release
File details
Details for the file pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2fa4fd944aa789b12c665bc3a68a40ec68e4dbdeb631c482f5fc3b77f842f873 |
| MD5 | 4ee0554cfdb26b98902efde139085c47 |
| BLAKE2b-256 | 22ba49363edce6e30d8dc3cbeabef341ff8bbd435782f0083a8a8fb4fa734432 |
Provenance
The following attestation bundles were made for pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl:
Publisher: publish.yml on pratyush618/pyfs-watcher
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyfs_watcher-0.3.0-cp39-abi3-win_amd64.whl
- Subject digest: 2fa4fd944aa789b12c665bc3a68a40ec68e4dbdeb631c482f5fc3b77f842f873
- Sigstore transparency entry: 1031092226
- Sigstore integration time:
- Permalink: pratyush618/pyfs-watcher@5ee25483909f277de406155dbce4064cc5f4bf54
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/pratyush618
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5ee25483909f277de406155dbce4064cc5f4bf54
- Trigger Event: release
File details
Details for the file pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2af2ec59d432d54daef9eff2ea675f5dfb46ae805ddbe273f31397eddff427fe |
| MD5 | be2e3a9742db5348ea3f9fef566c818f |
| BLAKE2b-256 | bc048fda3451475917f141f3c288b10dcd4c3a62476fce3499f5789d16837800 |
Provenance
The following attestation bundles were made for pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl:
Publisher: publish.yml on pratyush618/pyfs-watcher
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyfs_watcher-0.3.0-cp39-abi3-macosx_11_0_arm64.whl
- Subject digest: 2af2ec59d432d54daef9eff2ea675f5dfb46ae805ddbe273f31397eddff427fe
- Sigstore transparency entry: 1031092284
- Sigstore integration time:
- Permalink: pratyush618/pyfs-watcher@5ee25483909f277de406155dbce4064cc5f4bf54
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/pratyush618
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5ee25483909f277de406155dbce4064cc5f4bf54
- Trigger Event: release
File details
Details for the file pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4d3694d5e9d23b5df0e88814a8f1671c58fa76494f70d407abaf9a650826938 |
| MD5 | c295bf071c4ddb0f6fe9f0056cd0b1f9 |
| BLAKE2b-256 | fe9e7353d2fbe10b997b0980c6cd70a896ec85e358434599453d426dab998226 |
Provenance
The following attestation bundles were made for pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: publish.yml on pratyush618/pyfs-watcher
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyfs_watcher-0.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: b4d3694d5e9d23b5df0e88814a8f1671c58fa76494f70d407abaf9a650826938
- Sigstore transparency entry: 1031092326
- Sigstore integration time:
- Permalink: pratyush618/pyfs-watcher@5ee25483909f277de406155dbce4064cc5f4bf54
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/pratyush618
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5ee25483909f277de406155dbce4064cc5f4bf54
- Trigger Event: release