Modular Python tool for profiling files, analyzing directory structures, and inspecting image data
filoma
Fast, multi-backend Python tool for directory analysis and file profiling.
Analyze directory structures, profile files, and inspect image data with automatic performance optimization through Rust, fd, or Python backends.
Documentation: Installation • Backends • Advanced Usage • Benchmarks
Source Code: https://github.com/kalfasyan/filoma
Quick Start
# Install
uv add filoma # or: pip install filoma
from filoma.directories import DirectoryProfiler
profiler = DirectoryProfiler()
res = profiler.probe("/")
profiler.print_summary(res)
Example output:
Directory Analysis: / (🦀 Rust (Parallel)) - 29.56s
┌──────────────────────────┬───────────────────┐
│ Metric                   │ Value             │
├──────────────────────────┼───────────────────┤
│ Total Files              │ 2,186,785         │
│ Total Folders            │ 209,401           │
│ Total Size               │ 135,050,621.82 MB │
│ Average Files per Folder │ 10.44             │
│ Maximum Depth            │ 21                │
│ Empty Folders            │ 7,930             │
│ Analysis Time            │ 29.56 s           │
│ Processing Speed         │ 81,074 items/sec  │
└──────────────────────────┴───────────────────┘
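To make the metrics in this summary concrete, here is a minimal pure-Python sketch that computes comparable counts with `os.walk`. It is not filoma's implementation: `DirSummary` and `summarize` are hypothetical names, and details such as counting the root as a folder, measuring depth relative to the root, and skipping unreadable entries are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class DirSummary:
    """Simple counters for one directory tree (hypothetical, for illustration)."""
    total_files: int = 0
    total_folders: int = 0
    total_bytes: int = 0
    max_depth: int = 0
    empty_folders: int = 0

    @property
    def avg_files_per_folder(self) -> float:
        return self.total_files / self.total_folders if self.total_folders else 0.0


def summarize(root: str) -> DirSummary:
    """Walk `root` and collect counts; unreadable directories are skipped by os.walk."""
    summary = DirSummary()
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        summary.total_folders += 1  # assumption: the root itself counts as a folder
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        summary.max_depth = max(summary.max_depth, depth)
        if not dirnames and not filenames:
            summary.empty_folders += 1
        for name in filenames:
            summary.total_files += 1
            try:
                # lstat does not follow symlinks, so a link is sized as the link itself
                summary.total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; count it but skip its size
    return summary
```

Calling `summarize(".")` returns a `DirSummary` whose fields roughly correspond to the rows above, for example `avg_files_per_folder` is total files divided by total folders.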
Key Features
- 3 Performance Backends - Automatic selection: Rust (~2.3x faster *), fd (competitive), Python (baseline)
- Directory Analysis - File counts, extensions, empty folders, depth distribution, size statistics
- Smart File Search - Advanced patterns with regex/glob support via FdSearcher
- DataFrame Support - Build Polars DataFrames for advanced analysis and filtering
- Image Analysis - Profile .tif, .png, .npy, .zarr files with metadata and statistics
- File Profiling - System metadata, permissions, timestamps, symlink analysis
- Rich Terminal Output - Beautiful progress bars and formatted reports
* According to benchmarks
Examples
Directory Analysis
The simplest way to probe a directory and print a summary:
from filoma.directories import DirectoryProfiler
profiler = DirectoryProfiler()
res = profiler.probe("/", max_depth=3)
profiler.print_summary(res)
Async (opt-in): good for network filesystems
If traversing NFS/SMB, remote mounts, cloud-fuse, etc., enable async mode to parallelize filesystem calls and improve throughput.
# Async (opt-in) scanning for network / high-latency filesystems
# Enable when traversing NFS/SMB, remote mounts, cloud-fuse, etc.
from filoma.directories import DirectoryProfiler
profiler = DirectoryProfiler(
use_async=True,
network_concurrency=32, # Parallel in-flight filesystem ops
network_timeout_ms=3000, # Per op timeout (ms)
network_retries=2 # Retries for transient errors
)
result = profiler.probe("/mnt/nfs/share")
profiler.print_summary(result)
Tips:
- Lower network_concurrency if the server throttles you; raise for high-latency links.
- Increase network_timeout_ms for very slow metadata calls.
- Retries help with flaky mounts; set to 0 for strict mode.
- Fallback: omit use_async for local SSDs (sync is usually faster there).
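To illustrate what the three knobs above control, here is a small standard-library sketch of bounded-concurrency metadata calls with a per-call timeout and retries. It is only an analogy for the idea, not how filoma implements async mode; `stat_with_retry` and `stat_many` are hypothetical helpers, and treating any `OSError` as retryable is an assumption made for brevity.

```python
import asyncio
import os


async def stat_with_retry(path, sem, timeout_s, retries):
    """Stat one path in a worker thread, with a timeout and a bounded number of retries."""
    for attempt in range(retries + 1):
        async with sem:  # cap the number of in-flight filesystem operations
            try:
                return await asyncio.wait_for(asyncio.to_thread(os.stat, path), timeout_s)
            except (asyncio.TimeoutError, OSError):
                if attempt == retries:
                    return None  # give up after the last allowed attempt
    return None


async def stat_many(paths, concurrency=32, timeout_ms=3000, retries=2):
    """Stat many paths concurrently; failed paths yield None in the result list."""
    sem = asyncio.Semaphore(concurrency)
    tasks = [stat_with_retry(p, sem, timeout_ms / 1000, retries) for p in paths]
    return await asyncio.gather(*tasks)
```

Run it with `asyncio.run(stat_many(list_of_paths))`. On a high-latency mount, a larger `concurrency` keeps more requests in flight while each one waits; on a local SSD the thread hand-off overhead usually outweighs any gain, which matches the tip about staying synchronous there.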
Smart File Search
The FdSearcher class provides advanced file searching with regex and glob support, leveraging the high-performance fd tool when available.
from filoma.directories import FdSearcher
searcher = FdSearcher()
# Find Python files
python_files = searcher.find_files(pattern=r"\.py$", max_depth=2)
# Find by multiple extensions
code_files = searcher.find_by_extension(['py', 'rs', 'js'], directory=".")
# Glob patterns
config_files = searcher.find_files(pattern="*.{json,yaml}", use_glob=True)
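For comparison, a regex search with a depth limit can be sketched in plain Python when fd is not installed. This is a rough stand-in rather than filoma's fallback code: `find_files_py` is a hypothetical name, the pattern is matched against the file name only, and direct children of the root are treated as depth 1 (both are assumptions).

```python
import os
import re


def find_files_py(root, pattern, max_depth=None):
    """Return paths under `root` whose file name matches the regex `pattern`."""
    regex = re.compile(pattern)
    root = os.path.abspath(root)
    base = root.rstrip(os.sep).count(os.sep)
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth of the files that live directly inside `dirpath` (root's children are 1).
        depth = dirpath.rstrip(os.sep).count(os.sep) - base + 1
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune in place so os.walk does not descend further
        for name in filenames:
            if regex.search(name):
                matches.append(os.path.join(dirpath, name))
    return matches
```

For example, `find_files_py(".", r"\.py$", max_depth=2)` lists Python files in the current directory and its immediate subdirectories.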
DataFrame Analysis
filoma can build Polars DataFrames for advanced analysis and filtering, allowing you to leverage the full power of Polars for downstream tasks.
from filoma.directories import DirectoryProfiler
# Build DataFrame for advanced analysis
profiler = DirectoryProfiler(build_dataframe=True)
result = profiler.probe(".")
df = profiler.get_dataframe(result)
# Add path components and file stats
df = df.add_path_components().add_file_stats()
python_files = df.filter_by_extension('.py')
df.save_csv("analysis.csv")
File & Image Profiling
Individual file profiling with metadata and image analysis:
from filoma.files import FileProfiler
from filoma.images import PngProfiler
# File metadata
file_profiler = FileProfiler()
# 1) dict-style (legacy): returns the same report dict that print_report expects
report = file_profiler.probe("/path/to/file.txt")
file_profiler.print_report(report)
# 2) dataclass-style (recommended): returns a `Filo` dataclass with attribute access
# `compute_hash=True` will compute a SHA256 fingerprint (optional/expensive)
filo = file_profiler.probe_filo("/path/to/file.txt", compute_hash=True)
print(filo) # dataclass repr; access fields like filo.path, filo.sha256
print(filo.sha256) # full SHA256 (if computed)
print(filo.to_dict()) # convert to plain dict
# Image analysis
img_profiler = PngProfiler()
img_report = img_profiler.probe("/path/to/image.png")
print(img_report) # Shape, dtype, stats, etc.
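The comment above calls the SHA256 fingerprint optional and expensive. The reason is that a content hash has to read every byte of the file. A minimal standard-library sketch of such a hash follows; `sha256_of_file` is a hypothetical helper and the 1 MiB chunk size is an arbitrary choice, not something taken from filoma.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the cost grows with file size, it makes sense to request the hash only when you need it, for example for duplicate detection or integrity checks.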
Performance
Automatic backend selection for optimal speed:
| Backend | Speed | Use Case |
|---|---|---|
| Rust | ~70K files/sec | Large directories, DataFrame building |
| fd | ~46K files/sec | Pattern matching, network filesystems |
| Python | ~30K files/sec | Universal compatibility, reliable fallback |
Cold cache benchmarks on NVMe SSD. See benchmarks for detailed methodology.
System directories: filoma automatically handles permission errors for directories like /proc, /sys.
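The speeds in the table are throughput figures. A rough way to take a similar measurement on your own directory tree with the standard library is sketched below. `measure_throughput` is a hypothetical helper: it times a plain `os.walk` rather than any filoma backend, and it does nothing to clear the OS cache, so repeated runs measure a warm cache rather than the cold cache used for the numbers above.

```python
import os
import time


def measure_throughput(root):
    """Walk `root` once and return (items, seconds, items_per_second)."""
    start = time.perf_counter()
    items = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        items += len(dirnames) + len(filenames)  # every entry below root, root excluded
    elapsed = time.perf_counter() - start
    rate = items / elapsed if elapsed > 0 else float("inf")
    return items, elapsed, rate
```

Comparing the first run after a reboot with a second run on the same tree gives a feel for how much the filesystem cache affects results.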
Installation & Setup
See installation guide for:
- Quick setup with uv/pip
- Optional performance optimization (Rust/fd)
- Verification and troubleshooting
Documentation
- Installation Guide - Setup and optimization
- Backend Architecture - How the multi-backend system works
- Advanced Usage - DataFrame analysis, pattern matching, backend control
- Performance Benchmarks - Detailed performance analysis and methodology
Project Structure
src/filoma/
├── core/          # Backend integrations (fd, Rust)
├── directories/   # Directory analysis with 3 backends
├── files/         # File profiling and metadata
└── images/        # Image analysis (.tif, .png, .npy, .zarr)
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contributing
Contributions welcome! Please check the issues for planned features and bug reports.
filoma - Fast, multi-backend file and directory analysis for Python.