Modular Python tool for profiling files, analyzing directory structures, and inspecting image data

Project description

filoma

Fast, multi-backend Python tool for directory analysis and file profiling.

Analyze directory structures, profile files, and inspect image data with automatic performance optimization through Rust, fd, or Python backends.


Documentation: Installation • Backends • Advanced Usage • Benchmarks

Source Code: https://github.com/kalfasyan/filoma


Quick Start

# Install (shell)
uv add filoma  # or: pip install filoma

# Then, in Python:
from filoma.directories import DirectoryProfiler
profiler = DirectoryProfiler()
res = profiler.probe("/")
profiler.print_summary(res)

Example output:

Directory Analysis: / (🦀 Rust (Parallel)) - 29.56s
┌───────────────────────────┬──────────────────┐
│ Metric                    │ Value            │
├───────────────────────────┼──────────────────┤
│ Total Files               │ 2,186,785        │
│ Total Folders             │ 209,401          │
│ Total Size                │ 135,050,621.82 MB│
│ Average Files per Folder  │ 10.44            │
│ Maximum Depth             │ 21               │
│ Empty Folders             │ 7,930            │
│ Analysis Time             │ 29.56 s          │
│ Processing Speed          │ 81,074 items/sec │
└───────────────────────────┴──────────────────┘
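The derived rows in this summary can be re-checked from the raw counts. The sketch below is a hypothetical helper, not part of filoma, and it assumes (inferred from the numbers, not documented) that "items" means files plus folders and that the average is files divided by folders:

```python
# Hypothetical re-check of the summary's derived rows (not filoma's code).
# Assumption inferred from the numbers: "items" = files + folders.


def derived_metrics(total_files: int, total_folders: int, elapsed_s: float) -> dict:
    """Average files per folder and items processed per second."""
    avg = total_files / total_folders if total_folders else 0.0
    speed = (total_files + total_folders) / elapsed_s if elapsed_s > 0 else 0.0
    return {"avg_files_per_folder": round(avg, 2), "items_per_sec": round(speed)}


metrics = derived_metrics(2_186_785, 209_401, 29.56)
print(metrics)
```

With the example figures this gives 10.44 files per folder and about 81,062 items/sec; the small gap from the reported 81,074 is consistent with the 29.56 s time being rounded.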

Key Features

  • 🚀 3 Performance Backends - Automatic selection: Rust (~2.3x faster than Python*), fd (competitive), Python (baseline)
  • 📊 Directory Analysis - File counts, extensions, empty folders, depth distribution, size statistics
  • 🔍 Smart File Search - Advanced patterns with regex/glob support via FdSearcher
  • 📈 DataFrame Support - Build Polars DataFrames for advanced analysis and filtering
  • 🖼️ Image Analysis - Profile .tif, .png, .npy, .zarr files with metadata and statistics
  • 📁 File Profiling - System metadata, permissions, timestamps, symlink analysis
  • 🎨 Rich Terminal Output - Beautiful progress bars and formatted reports

* According to benchmarks

Examples

Directory Analysis

The simplest way to probe a directory and print a summary:

from filoma.directories import DirectoryProfiler

profiler = DirectoryProfiler()
res = profiler.probe("/", max_depth=3)
profiler.print_summary(res)
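To make the summary metrics concrete, here is a minimal standard-library sketch of what a depth-limited scan computes. It is a hypothetical stand-in for a pure-Python backend, not filoma's implementation; the function name probe_sketch, the returned keys, and the depth convention (the root itself is depth 0) are all assumptions:

```python
import os
from collections import Counter
from typing import Optional


def probe_sketch(root: str, max_depth: Optional[int] = None) -> dict:
    """Count files, folders, empty folders, depth and extensions under `root`.

    Hypothetical stand-in for a pure-Python backend; not filoma's implementation.
    Depth 0 is `root` itself; folders deeper than `max_depth` are not visited.
    """
    root = os.path.abspath(root)
    base = root.rstrip(os.sep).count(os.sep)
    files = folders = empty = deepest = 0
    extensions = Counter()
    for dirpath, dirnames, filenames in os.walk(root):  # unreadable dirs are skipped silently
        depth = dirpath.rstrip(os.sep).count(os.sep) - base
        folders += 1
        deepest = max(deepest, depth)
        if not dirnames and not filenames:
            empty += 1
        files += len(filenames)
        extensions.update(os.path.splitext(n)[1].lower() or "<none>" for n in filenames)
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune in place so os.walk does not descend further
    return {
        "total_files": files,
        "total_folders": folders,
        "empty_folders": empty,
        "max_depth": deepest,
        "extensions": dict(extensions),
    }
```

Pruning dirnames in place is the idiomatic way to stop os.walk from descending, which is what keeps a max_depth-limited scan cheap on large trees.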

Async (opt-in): good for network filesystems

If you are traversing NFS/SMB shares, remote mounts, cloud-fuse filesystems, or similar high-latency storage, enable async mode to parallelize filesystem calls and improve throughput.

# Async (opt-in) scanning for network / high-latency filesystems
# Enable when traversing NFS/SMB, remote mounts, cloud-fuse, etc.
from filoma.directories import DirectoryProfiler

profiler = DirectoryProfiler(
    use_async=True,
    network_concurrency=32,    # Parallel in-flight filesystem ops
    network_timeout_ms=3000,   # Per op timeout (ms)
    network_retries=2          # Retries for transient errors
)

result = profiler.probe("/mnt/nfs/share")
profiler.print_summary(result)

Tips:

  • Lower network_concurrency if the server throttles you; raise for high-latency links.
  • Increase network_timeout_ms for very slow metadata calls.
  • Retries help with flaky mounts; set to 0 for strict mode.
  • Fallback: omit use_async for local SSDs (sync is usually faster there).
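The three knobs above correspond to a common pattern: cap the number of in-flight calls, time out each call, and retry transient failures. The sketch below illustrates that pattern with the standard library only (asyncio.to_thread needs Python 3.9+); it is not how filoma implements use_async, and the function names and the set of "transient" errno values are assumptions:

```python
import asyncio
import errno
import os

# Hypothetical sketch: names mirror the documented knobs but are NOT filoma's code.
TRANSIENT_ERRNOS = {errno.EIO, errno.EAGAIN, errno.ETIMEDOUT}


async def stat_with_retries(path, sem, timeout_s, retries):
    """os.stat in a worker thread, capped by `sem`, with a per-call timeout and retries."""
    for attempt in range(retries + 1):
        async with sem:  # at most `concurrency` stat calls in flight
            try:
                return await asyncio.wait_for(asyncio.to_thread(os.stat, path), timeout_s)
            except asyncio.TimeoutError:
                pass  # slow call: retry (the worker thread itself cannot be cancelled)
            except OSError as exc:
                if exc.errno not in TRANSIENT_ERRNOS:
                    return None  # permanent error (missing file, permission denied): no retry
        if attempt < retries:
            await asyncio.sleep(0.05 * 2 ** attempt)  # back off outside the semaphore
    return None  # retries exhausted


async def stat_many(paths, concurrency=32, timeout_ms=3000, retries=2):
    sem = asyncio.Semaphore(concurrency)
    coros = [stat_with_retries(p, sem, timeout_ms / 1000, retries) for p in paths]
    return await asyncio.gather(*coros)


results = asyncio.run(stat_many([".", "does-not-exist"], concurrency=8))
```

Sleeping outside the semaphore means a backing-off call does not hold a concurrency slot, so other paths keep making progress while one flaky path waits.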

Smart File Search

The FdSearcher class provides advanced file searching with regex and glob support, leveraging the high-performance fd tool when available.

from filoma.directories import FdSearcher

searcher = FdSearcher()

# Find Python files
python_files = searcher.find_files(pattern=r"\.py$", max_depth=2)

# Find by multiple extensions
code_files = searcher.find_by_extension(['py', 'rs', 'js'], directory=".")

# Glob patterns
config_files = searcher.find_files(pattern="*.{json,yaml}", use_glob=True)
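For intuition about what these three searches match, here is a pure-Python approximation using re and fnmatch. It is a sketch under stated assumptions, not FdSearcher's code: it matches against file names only (fd's default), treats max_depth fd-style (1 means direct children), matches globs case-sensitively, and expands non-nested {a,b} groups by hand because fnmatch does not support braces. All function names are hypothetical:

```python
import fnmatch
import os
import re
from typing import List, Optional


def expand_braces(pattern: str) -> List[str]:
    """Expand non-nested {a,b} groups: '*.{json,yaml}' -> ['*.json', '*.yaml']."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[: m.start()], pattern[m.end():]
    expanded = []
    for alt in m.group(1).split(","):
        expanded.extend(expand_braces(head + alt + tail))
    return expanded


def find_files_sketch(directory: str = ".", pattern: str = ".", use_glob: bool = False,
                      max_depth: Optional[int] = None) -> List[str]:
    """Match file *names* against a regex (searched) or a glob (whole name).

    fd-style depth: max_depth=1 means direct children of `directory` only.
    """
    if use_glob:
        globs = expand_braces(pattern)

        def matches(name: str) -> bool:
            return any(fnmatch.fnmatchcase(name, g) for g in globs)
    else:
        regex = re.compile(pattern)

        def matches(name: str) -> bool:
            return regex.search(name) is not None

    base = os.path.abspath(directory).rstrip(os.sep).count(os.sep)
    hits = []
    for dirpath, dirnames, filenames in os.walk(directory):
        depth = os.path.abspath(dirpath).rstrip(os.sep).count(os.sep) - base  # 0 for `directory`
        hits.extend(os.path.join(dirpath, n) for n in filenames if matches(n))
        if max_depth is not None and depth + 1 >= max_depth:
            dirnames[:] = []  # files in subfolders would exceed max_depth
    return sorted(hits)


def find_by_extension_sketch(extensions: List[str], directory: str = ".") -> List[str]:
    """Combine the extensions into a single alternation regex and search with it."""
    alternatives = "|".join(re.escape(e.lstrip(".")) for e in extensions)
    return find_files_sketch(directory, pattern=rf"\.(?:{alternatives})$")
```

A regex is searched anywhere in the name (hence the explicit $ anchor), while a glob must match the whole name; that difference is why r"\.py$" and "*.py" select the same files.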

DataFrame Analysis

filoma can build Polars DataFrames for advanced analysis and filtering, allowing you to leverage the full power of Polars for downstream tasks.

from filoma.directories import DirectoryProfiler

# Build DataFrame for advanced analysis
profiler = DirectoryProfiler(build_dataframe=True)
result = profiler.probe(".")
df = profiler.get_dataframe(result)

# Add path components and file stats, then filter and export
df = df.add_path_components().add_file_stats()
python_files = df.filter_by_extension('.py')
df.save_csv("analysis.csv")
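To show the kind of per-file table this produces, here is a standard-library sketch that collects one row per file with path components and basic stats, filters by extension, and writes a CSV. The function names and column names are hypothetical and need not match filoma's DataFrame schema:

```python
import csv
import os
from pathlib import Path
from typing import Dict, List


def file_rows(root: str = ".") -> List[Dict[str, object]]:
    """One row per file: path components plus size and modification time."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath, name)
            try:
                st = path.lstat()  # lstat so a broken symlink does not raise
            except OSError:
                continue  # file vanished or is unreadable: skip it
            rows.append({
                "path": str(path),
                "parent": str(path.parent),
                "stem": path.stem,
                "suffix": path.suffix.lower(),
                "size_bytes": st.st_size,
                "modified": st.st_mtime,
            })
    return rows


def save_csv(rows: List[Dict[str, object]], out_path: str) -> None:
    """Write rows to CSV with a header taken from the first row."""
    if not rows:
        return
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Filtering is then a list comprehension such as [r for r in rows if r["suffix"] == ".py"]; if Polars is installed, the same list of dicts can be passed to polars.DataFrame(rows) for vectorized filtering.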

File & Image Profiling

Individual file profiling with metadata and image analysis:

from filoma.files import FileProfiler
from filoma.images import PngProfiler

# File metadata
file_profiler = FileProfiler()

# 1) dict-style (legacy): returns the same report dict that print_report expects
report = file_profiler.probe("/path/to/file.txt")
file_profiler.print_report(report)

# 2) dataclass-style (recommended): returns a `Filo` dataclass with attribute access
#    `compute_hash=True` will compute a SHA256 fingerprint (optional/expensive)
filo = file_profiler.probe_filo("/path/to/file.txt", compute_hash=True)
print(filo)               # dataclass repr; access fields like filo.path, filo.sha256
print(filo.sha256)        # full SHA256 (if computed)
print(filo.to_dict())     # convert to plain dict

# Image analysis
img_profiler = PngProfiler()
img_report = img_profiler.probe("/path/to/image.png")
print(img_report)  # Shape, dtype, stats, etc.
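The file-level metadata listed above (permissions, timestamps, symlink status, optional SHA256) maps onto standard os.lstat and hashlib calls. This is a minimal sketch of that idea, not FileProfiler's implementation; the function name and dictionary keys are hypothetical:

```python
import hashlib
import os
import stat
from datetime import datetime, timezone


def file_metadata(path: str, compute_hash: bool = False) -> dict:
    """Basic metadata for `path`, similar in spirit to a file profile (hypothetical keys)."""
    st = os.lstat(path)  # lstat: describe a symlink itself rather than its target
    info = {
        "path": os.path.abspath(path),
        "size": st.st_size,
        "mode": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "is_symlink": stat.S_ISLNK(st.st_mode),
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }
    if info["is_symlink"]:
        info["target"] = os.readlink(path)
    if compute_hash and stat.S_ISREG(st.st_mode):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks keep memory flat
                h.update(chunk)
        info["sha256"] = h.hexdigest()
    return info
```

Hashing is opt-in here for the same reason as compute_hash=True above: it reads the whole file, which dominates the cost for large files.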

Performance

Automatic backend selection for optimal speed:

Backend     Speed            Use Case
🦀 Rust     ~70K files/sec   Large directories, DataFrame building
🔍 fd       ~46K files/sec   Pattern matching, network filesystems
🐍 Python   ~30K files/sec   Universal compatibility, reliable fallback

Speeds are from cold-cache benchmarks on an NVMe SSD. See Benchmarks for the detailed methodology.
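A quick back-of-envelope reading of the table: dividing the approximate rates gives relative speedups and rough scan times. This only restates the table's numbers; real timings depend on hardware, cache state, and tree shape:

```python
# Approximate cold-cache throughputs from the table above (files/sec).
RATES = {"rust": 70_000, "fd": 46_000, "python": 30_000}


def speedup(backend: str, baseline: str = "python") -> float:
    """How many times faster `backend` is than `baseline`."""
    return RATES[backend] / RATES[baseline]


def est_seconds(n_files: int, backend: str) -> float:
    """Rough time to scan `n_files` at the backend's table rate."""
    return n_files / RATES[backend]


for name in RATES:
    print(f"{name:>6}: {speedup(name):.2f}x vs Python, ~{est_seconds(1_000_000, name):.0f} s per 1M files")
```

Rust comes out at about 2.33x and fd at about 1.53x the Python rate; a million files would take roughly 14 s, 22 s and 33 s respectively.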

System directories: filoma automatically handles permission errors for directories such as /proc and /sys.
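For comparison, the standard library's os.walk already skips directories it cannot list; passing an onerror callback lets you record them instead of losing them silently. A minimal sketch (the function name is hypothetical and this is not how filoma handles the errors internally):

```python
import os
from typing import List, Tuple


def walk_skipping_errors(root: str) -> Tuple[int, List[str]]:
    """Count files under `root`, recording directories that could not be listed."""
    skipped: List[str] = []

    def on_error(err: OSError) -> None:
        skipped.append(err.filename)  # e.g. a PermissionError under /proc or /sys

    file_count = 0
    for _dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        file_count += len(filenames)
    return file_count, skipped
```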

Installation & Setup

See the installation guide for:

  • Quick setup with uv/pip
  • Optional performance optimization (Rust/fd)
  • Verification and troubleshooting

Documentation

Project Structure

src/filoma/
├── core/          # Backend integrations (fd, Rust)
├── directories/   # Directory analysis with 3 backends
├── files/         # File profiling and metadata
└── images/        # Image analysis (.tif, .png, .npy, .zarr)

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contributing

Contributions welcome! Please check the issues for planned features and bug reports.


filoma - Fast, multi-backend file and directory analysis for Python.

Download files

Source Distribution

filoma-1.4.0.tar.gz (126.0 kB)

Uploaded Source

Built Distributions

filoma-1.4.0-cp311-cp311-win_amd64.whl (364.1 kB)

Uploaded CPython 3.11, Windows x86-64

filoma-1.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (545.5 kB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

filoma-1.4.0-cp311-cp311-macosx_11_0_arm64.whl (491.1 kB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

File details

Details for the file filoma-1.4.0.tar.gz.

File metadata

  • Download URL: filoma-1.4.0.tar.gz
  • Upload date:
  • Size: 126.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for filoma-1.4.0.tar.gz
Algorithm Hash digest
SHA256 5829f98b19ee17bf1aaacfd92f3061bd1b50aefdc983ba5da069476458e8000c
MD5 bdb1179b50ce7f3f4045f262702c006b
BLAKE2b-256 76cad7fea875cf434051476b257fcc8a6a4b8cbbacfd4eee9802209c7bd4844a

See more details on using hashes here.
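A downloaded file can be checked against the digests listed in these tables with hashlib. A minimal sketch (the helper names are hypothetical; pip also offers a built-in hash-checking mode via --require-hashes). The only non-obvious mapping is that "BLAKE2b-256" means BLAKE2b with a 32-byte (256-bit) digest:

```python
import hashlib
from typing import Dict


def _hasher(algorithm: str):
    """Map the algorithm names used in the hash tables to hashlib objects."""
    name = algorithm.lower()
    if name == "blake2b-256":
        return hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit (32-byte) digest
    return hashlib.new(name)  # "sha256", "md5"


def verify_file(path: str, expected: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Return True only if every listed digest matches the file's contents."""
    hashers = {algo: _hasher(algo) for algo in expected}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return all(hashers[a].hexdigest() == d.lower() for a, d in expected.items())
```

For example, after downloading the source distribution, verify_file("filoma-1.4.0.tar.gz", {"SHA256": "5829f98b19ee17bf1aaacfd92f3061bd1b50aefdc983ba5da069476458e8000c"}) should return True.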

Provenance

The following attestation bundles were made for filoma-1.4.0.tar.gz:

Publisher: publish.yml on kalfasyan/filoma

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file filoma-1.4.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: filoma-1.4.0-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 364.1 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for filoma-1.4.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 ebb2acdc091f526943161c608564c7c8b3e5d28e7d75d11abc1384efc006997c
MD5 82a7b222e32b5e8ecd62334e5418d9c0
BLAKE2b-256 cf3c4898e01fc249836371c98690ff4700ca3f579b0c19d26810ed698212df74

Provenance

The following attestation bundles were made for filoma-1.4.0-cp311-cp311-win_amd64.whl:

Publisher: publish.yml on kalfasyan/filoma

File details

Details for the file filoma-1.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for filoma-1.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d4762acb2ca4b59e3e86c633a33ba68f8877c7b271a2ae5347f2883ce2992584
MD5 aa35ac2384ac91bf2e445052dec40938
BLAKE2b-256 dd924db05f7dd983bcb8a1947ff5e2b1921b02eea7dd4e4c8fa2a04354c97e91

Provenance

The following attestation bundles were made for filoma-1.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on kalfasyan/filoma

File details

Details for the file filoma-1.4.0-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for filoma-1.4.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6d54f46fade321a9dd55ca078540a791598357db188ab3102dc36c959b69dc89
MD5 47277fdac3d7bc4311ce5a477f93bed9
BLAKE2b-256 8f1a1e78b6ab87f39e4c7790e6e39d33edc430e60113550713525ee2a2b83b0b

Provenance

The following attestation bundles were made for filoma-1.4.0-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on kalfasyan/filoma
