Modular Python tool for profiling files, analyzing directory structures, and inspecting image data

filoma

filoma is a modular Python tool for profiling files, analyzing directory structures, and inspecting image data (e.g., .tif, .png, .npy, .zarr). It provides detailed reports on filename patterns, inconsistencies, file counts, empty folders, file system metadata, and image data statistics. The project is designed for easy expansion, testing, CI/CD, Dockerization, and database integration.

Installation

# 🚀 RECOMMENDED: Using uv (modern, fast Python package manager)
# Install uv first if you don't have it: curl -LsSf https://astral.sh/uv/install.sh | sh

# For uv projects (recommended - manages dependencies in pyproject.toml):
uv add filoma

# For scripts or non-project environments:
uv pip install filoma

# Traditional method:
pip install filoma

# For maximum performance, also install Rust toolchain:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Then reinstall to build Rust extension:
uv add filoma --force  # or: uv pip install --force-reinstall filoma

Note: Installing Rust is optional. filoma works in pure Python; with the Rust backend, directory analysis runs roughly 5-20x faster.

Which Installation Method to Choose?

  • uv add filoma → Use this if you have a pyproject.toml file (most Python projects)
  • uv pip install filoma → Use for standalone scripts or when you don't want project dependency management
  • pip install filoma → Traditional method for older Python environments

Features

  • Directory analysis: Comprehensive directory tree analysis including file counts, folder patterns, empty directories, extension analysis, size statistics, and depth distribution
  • Progress bar & timing: See real-time progress and timing for large directory scans, with beautiful terminal output (using rich).
  • 📊 DataFrame support: Build Polars DataFrames with all file paths for advanced analysis, filtering, and data manipulation
  • 🦀 Rust acceleration: Optional Rust backend for 5-20x faster directory analysis, selected automatically when available
  • Image analysis: Analyze .tif, .png, .npy, .zarr files for metadata, stats (min, max, mean, NaNs, etc.), and irregularities
  • File profiling: System metadata (size, permissions, owner, group, timestamps, symlink targets, etc.)
  • Modular, extensible codebase
  • CLI entry point (planned)
  • Ready for testing, CI/CD, Docker, and database integration

Progress Bar & Timing Features

filoma provides a real-time progress bar and timing details for directory analysis, making it easy to track progress on large scans. The progress bar is enabled by default and uses the rich library for beautiful terminal output.

Example:

from filoma.directories import DirectoryProfiler

profiler = DirectoryProfiler(show_progress=True)
result = profiler.analyze("/path/to/large/directory")
profiler.print_summary(result)

# Output includes a progress bar and timing details:
#
# Directory Analysis: /path/to/large/directory (🦀 Rust) - 0.12s
# ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
# ┃ Metric                   ┃ Value         ┃
# ...
# ┃ Analysis Time            ┃ 0.12s         ┃
# ┃ Processing Speed         ┃ 8,000 items/s ┃
# ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━┛

Performance Note:

The progress bar introduces minimal overhead (especially when updated every 100 items, as in the default implementation). For benchmarking or maximum speed, you can disable it with show_progress=False.
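To illustrate why batched updates keep the overhead low, here is a minimal standard-library sketch. It is not filoma's implementation: count_entries and its on_progress callback are hypothetical names, and the only detail taken from the text above is the idea of reporting once every 100 items.

```python
import os
import tempfile


def count_entries(root, update_every=100, on_progress=None):
    """Count files and folders under root, reporting progress in batches.

    Hypothetical helper for illustration only; not part of filoma.
    """
    seen = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        for _name in dirnames + filenames:
            seen += 1
            # Throttle: call back once per batch instead of once per item.
            if on_progress is not None and seen % update_every == 0:
                on_progress(seen)
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(250):
            open(os.path.join(tmp, f"file_{i}.txt"), "w").close()
        updates = []
        total = count_entries(tmp, update_every=100, on_progress=updates.append)
        print(f"{total} entries, {len(updates)} progress updates")
```

With 250 entries the callback fires only twice, so the cost of redrawing a progress bar is paid a small fraction of the time.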

🚀 Automatic Performance Acceleration

filoma includes automatic Rust acceleration for directory analysis:

  • ⚡ 5-20x faster than pure Python (depending on directory size)
  • 🔧 Zero configuration - works automatically when Rust toolchain is available
  • 🐍 Graceful fallback - uses pure Python when Rust isn't available
  • 📊 Transparent - same API, same results, just faster!
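A common way to get this kind of graceful fallback is to try importing the compiled extension and switch to pure Python when the import fails. The sketch below shows that general pattern only; the module name hypothetical_rust_ext is an assumption made for illustration and is not the name of filoma's extension module.

```python
import importlib


def select_backend(module_name="hypothetical_rust_ext"):
    """Return ("rust", module) if the extension imports, else ("python", None).

    module_name is a placeholder; filoma's real extension name may differ.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Extension not built or not installed: fall back to pure Python.
        return "python", None
    return "rust", module


backend, _module = select_backend()
print(f"Using backend: {backend}")
```

Because the choice is made once at import or construction time, calling code can use the same API regardless of which backend was selected.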

Quick Setup for Maximum Performance

# Install Rust (one-time setup)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

# Install filoma with Rust acceleration
uv add filoma          # For uv projects (recommended)
# or: uv pip install filoma  # For scripts/non-project environments
# or: pip install filoma     # Traditional method
# The Rust extension builds automatically during installation!

Performance Examples

from filoma.directories import DirectoryProfiler

profiler = DirectoryProfiler()
# The output shows which backend is used:
# "Directory Analysis: /path (🦀 Rust)" or "Directory Analysis: /path (🐍 Python)"

result = profiler.analyze("/large/directory")
# Typical speedups:
# - Small dirs (<1K files): 2-5x faster
# - Medium dirs (1K-10K files): 5-10x faster  
# - Large dirs (>10K files): 10-20x faster

No code changes needed - your existing code automatically gets faster! 🎉

Quick Check: Is Rust Working?

from filoma.directories import DirectoryProfiler

profiler = DirectoryProfiler()
result = profiler.analyze(".")

# Look for the 🦀 Rust emoji in the report title:
profiler.print_summary(result)
# Output shows: "Directory Analysis: . (🦀 Rust)" or "Directory Analysis: . (🐍 Python)"

# Or check programmatically:
print(f"Rust acceleration: {'✅ Active' if profiler.use_rust else '❌ Not available'}")

Quick Installation Verification

import filoma
from filoma.directories import DirectoryProfiler

# Check version and basic functionality
print(f"filoma version: {filoma.__version__}")

profiler = DirectoryProfiler()
print(f"Rust acceleration: {'✅ Active' if profiler.use_rust else '❌ Not available'}")

Pro tip:

  • Working on a project? → Use uv add filoma (manages your pyproject.toml automatically)
  • Running standalone scripts? → Use uv pip install filoma
  • Need compatibility? → Use pip install filoma
  • Want the fastest experience? → Install uv first!

Simple Examples

Directory Analysis

from filoma.directories import DirectoryProfiler

# Automatically uses Rust acceleration when available (🦀 Rust)
# Falls back to Python implementation when needed (🐍 Python)
profiler = DirectoryProfiler()
result = profiler.analyze("/path/to/directory", max_depth=3)

# Print comprehensive report with rich formatting
# The report title shows which backend was used!
profiler.print_report(result)

# Or access specific data
print(f"Total files: {result['summary']['total_files']}")
print(f"Total folders: {result['summary']['total_folders']}")
print(f"Empty folders: {result['summary']['empty_folder_count']}")
print(f"File extensions: {result['file_extensions']}")
print(f"Common folder names: {result['common_folder_names']}")

DataFrame Analysis (Advanced)

from filoma.directories import DirectoryProfiler
from filoma import DataFrame

# Enable DataFrame building for advanced analysis
profiler = DirectoryProfiler(build_dataframe=True)
result = profiler.analyze("/path/to/directory")

# Get the DataFrame with all file paths
df = profiler.get_dataframe(result)
print(f"Found {len(df)} paths")

# Add path components (parent, name, stem, suffix)
df_enhanced = df.add_path_components()
print(df_enhanced.head())

# Filter by file type
python_files = df.filter_by_extension('.py')
image_files = df.filter_by_extension(['.jpg', '.png', '.tif'])

# Group and analyze
extension_counts = df.group_by_extension()
directory_counts = df.group_by_directory()

# Add file statistics
df_with_stats = df.add_file_stats()  # size, timestamps, etc.

# Add depth information
df_with_depth = df.add_depth_column()

# Export for further analysis
df.save_csv("file_analysis.csv")
df.save_parquet("file_analysis.parquet")

File Profiling

from filoma.files import FileProfiler
profiler = FileProfiler()
report = profiler.profile("/path/to/file.txt")
profiler.print_report(report)  # Rich table output in your terminal
# Output: (Rich table with file metadata and access rights)
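For readers curious what kind of system metadata such a profile contains, the following standard-library sketch gathers comparable fields with os.lstat. It is an approximation under stated assumptions: profile_file and its dictionary keys are hypothetical and do not reflect FileProfiler's actual report format.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def profile_file(path):
    """Collect basic system metadata for one path (illustrative helper only)."""
    st = os.lstat(path)  # lstat describes a symlink itself instead of following it
    is_link = stat.S_ISLNK(st.st_mode)
    return {
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "is_symlink": is_link,
        "symlink_target": os.readlink(path) if is_link else None,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w") as handle:
            handle.write("hello")
        for key, value in profile_file(sample).items():
            print(f"{key}: {value}")
```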

Image Analysis

from filoma.images import PngProfiler
profiler = PngProfiler()
report = profiler.analyze("/path/to/image.png")
print(report)
# Output: {'shape': ..., 'dtype': ..., 'min': ..., 'max': ..., 'nans': ..., ...}

Directory Analysis Features

The DirectoryProfiler provides comprehensive analysis of directory structures:

  • Statistics: Total files, folders, size calculations, and depth distribution
  • File Extension Analysis: Count and percentage breakdown of file types
  • Folder Patterns: Identification of common folder naming patterns
  • Empty Directory Detection: Find directories with no files or subdirectories
  • Depth Control: Limit analysis depth with max_depth parameter
  • Rich Output: Beautiful terminal reports with tables and formatting
  • 📊 DataFrame Support: Optional Polars DataFrame with all file paths for advanced analysis
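The statistics listed above can be approximated in pure Python with os.walk. The sketch below is a simplified illustration, not filoma's code: summarize_directory is a hypothetical name, and it assumes that the root counts as a folder at depth 0 and that an empty folder is one with no files and no subdirectories.

```python
import os
from collections import Counter


def summarize_directory(root, max_depth=None):
    """Walk root and collect counts similar to the features listed above."""
    root = os.path.abspath(root)
    total_files = 0
    total_folders = 0
    empty_folders = []
    extensions = Counter()
    depth_distribution = Counter()

    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        total_folders += 1
        depth_distribution[depth] += 1
        # Check emptiness before pruning dirnames for depth control.
        if not dirnames and not filenames:
            empty_folders.append(dirpath)
        for name in filenames:
            total_files += 1
            ext = os.path.splitext(name)[1].lower()
            extensions[ext or "<no extension>"] += 1
        # Depth control: clearing dirnames in place stops os.walk from descending.
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []

    return {
        "root_path": root,
        "summary": {
            "total_files": total_files,
            "total_folders": total_folders,
            "empty_folder_count": len(empty_folders),
        },
        "file_extensions": dict(extensions),
        "empty_folders": empty_folders,
        "depth_distribution": dict(depth_distribution),
    }


print(summarize_directory(".", max_depth=1)["summary"])
```

Pruning dirnames in place is the documented way to limit os.walk in top-down mode, which is why the emptiness check has to happen before it.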

DataFrame Features

When enabled with build_dataframe=True, you get access to powerful data analysis capabilities:

  • Path Analysis: Automatic extraction of path components (parent, name, stem, suffix)
  • File Statistics: Size, modification times, creation times, file type detection
  • Advanced Filtering: Filter by extensions, patterns, or custom conditions
  • Grouping & Aggregation: Group by extension, directory, or custom fields
  • Export Options: Save results as CSV, Parquet, or access the underlying Polars DataFrame
  • Performance: Works with both Python and Rust implementations seamlessly
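As a rough picture of what path analysis, filtering, and grouping involve, the sketch below uses only pathlib and collections on a plain list of paths. The helper names mirror the features above but are hypothetical stand-ins written for illustration; they are not the filoma.DataFrame methods and do not use Polars.

```python
from collections import Counter
from pathlib import PurePosixPath


def path_components(path):
    """Split a path into the parent, name, stem, and suffix parts."""
    p = PurePosixPath(path)
    return {
        "path": str(p),
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
    }


def filter_by_extension(paths, extensions):
    """Keep paths whose suffix matches one extension or a list of them."""
    if isinstance(extensions, str):
        extensions = [extensions]
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]


def count_by_extension(paths):
    """Count paths per (lower-cased) file extension."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)


paths = ["/data/run1/image.tif", "/data/run1/notes.txt", "/data/run2/mask.PNG"]
print(path_components(paths[0]))
print(filter_by_extension(paths, [".tif", ".png"]))
print(count_by_extension(paths))
```

Matching is case-insensitive in this sketch, which is a design choice made here and may differ from how filoma treats extensions.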

Analysis Output Structure

{
    "root_path": "/analyzed/path",
    "summary": {
        "total_files": 150,
        "total_folders": 25,
        "total_size_bytes": 1048576,
        "total_size_mb": 1.0,
        "avg_files_per_folder": 6.0,
        "max_depth": 3,
        "empty_folder_count": 2
    },
    "file_extensions": {".py": 45, ".txt": 30, ".md": 10},
    "common_folder_names": {"src": 3, "tests": 2, "docs": 1},
    "empty_folders": ["/path/to/empty1", "/path/to/empty2"],
    "top_folders_by_file_count": [("/path/with/most/files", 25)],
    "depth_distribution": {0: 1, 1: 5, 2: 12, 3: 7},
    "dataframe": filoma.DataFrame  # When build_dataframe=True
}
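Because the result is a plain dictionary, it is easy to post-process. As one example, this sketch turns the "file_extensions" counts into percentages. It relies only on the keys shown above; extension_percentages is a hypothetical helper, and the percentages are taken over the extensions present in the dictionary.

```python
def extension_percentages(result):
    """Return {extension: percent} sorted by descending count."""
    counts = result["file_extensions"]
    total = sum(counts.values())
    if total == 0:
        return {}
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {ext: round(100 * n / total, 1) for ext, n in ordered}


# Sample values copied from the structure shown above.
sample = {"file_extensions": {".py": 45, ".txt": 30, ".md": 10}}
for ext, pct in extension_percentages(sample).items():
    print(f"{ext}: {pct}%")
```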

DataFrame API Reference

The filoma.DataFrame class provides:

# Path manipulation
df.add_path_components()     # Add parent, name, stem, suffix columns
df.add_depth_column()        # Add directory depth column
df.add_file_stats()          # Add size, timestamps, file type info

# Filtering
df.filter_by_extension('.py')              # Filter by single extension
df.filter_by_extension(['.jpg', '.png'])   # Filter by multiple extensions
df.filter_by_pattern('test')               # Filter by path pattern

# Analysis
df.group_by_extension()      # Group and count by file extension
df.group_by_directory()      # Group and count by parent directory

# Export
df.save_csv("analysis.csv")           # Export to CSV
df.save_parquet("analysis.parquet")   # Export to Parquet
df.to_polars()                        # Get underlying Polars DataFrame

Project Structure

  • src/filoma/directories/: Directory analysis and structure profiling
  • src/filoma/images/: Image profilers and analysis
  • src/filoma/files/: File profiling (system metadata)
  • tests/: All tests (unit, integration, and scripts) are in this folder

🔧 Advanced: Rust Acceleration Details

For users who want to understand or customize the Rust acceleration:

  • How it works: Core directory traversal implemented in Rust using walkdir crate
  • Compatibility: Same API and output format as Python implementation
  • Setup guide: See RUST_ACCELERATION.md for detailed setup instructions
  • Benchmarking: Includes benchmark tool to test performance on your system
  • Development: Hybrid architecture allows Python-only development while keeping Rust acceleration

Manual Control (Advanced)

import time

from filoma.directories import DirectoryProfiler

# Force Python implementation (useful for debugging)
profiler = DirectoryProfiler(use_rust=False)

# Check which backend is being used
print(f"Using Rust: {profiler.use_rust}")

# Time one analysis run; repeat with the default profiler to compare backends
start = time.perf_counter()
result = profiler.analyze("/path/to/directory")
print(f"Analysis took {time.perf_counter() - start:.3f}s")

Future TODO

  • CLI tool for all features
  • More image format support and advanced checks
  • Database integration for storing reports
  • Dockerization and deployment guides
  • CI/CD workflows and badges

filoma is under active development. Contributions and suggestions are welcome!
