Modular Python tool for profiling files, analyzing directory structures, and inspecting image data


Fast, multi-backend file/directory profiling and data preparation.

pip install filoma

import filoma as flm

Installation | Documentation | Interactive CLI | Quickstart | Cookbook | Roboflow Dataset Demo | Source Code

📖 New to Filoma? Check out the Cookbook for practical, copy-paste recipes for common tasks!


filoma helps you analyze directory trees, inspect file metadata, and prepare your data for exploration. It automatically uses the fastest available backend (Rust, fd, or pure Python) ⚡🍃
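
To make the "best available backend" idea concrete, here is a minimal sketch of a Rust → fd → Python fallback chain using only the standard library. It is an illustration, not filoma's actual selection logic: the function name pick_backend and the module name hypothetical_rust_ext are assumptions made for this example.

```python
import importlib.util
import shutil


def pick_backend(rust_module: str = "hypothetical_rust_ext") -> str:
    """Return the first available backend in the order rust -> fd -> python.

    `rust_module` is a placeholder name for a compiled extension module;
    it is NOT the real name of filoma's Rust extension.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(rust_module) is not None:
        return "rust"
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("fd") is not None:
        return "fd"
    # Pure Python is always available, so it is the final fallback.
    return "python"


print(pick_backend())
```

The ordering puts the fastest option first and the most portable option last, so the same call works on machines with or without the optional components.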

Filoma Package Overview

Key Features

  • 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
  • 📈 DataFrame Integration: Convert scan results to Polars (or pandas) DataFrames for powerful analysis.
  • 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
  • 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
  • 🖼️ File/Image Profiling: Extract metadata and statistics from various file formats.
  • 🏗️ Architectural Clarity: High-level visual flows for discovery and processing. 📖 Architecture Documentation →
  • 🖥️ Interactive CLI: Beautiful terminal interface for filesystem exploration and DataFrame analysis. 📖 CLI Documentation →


⚡ Quick Start

filoma provides a unified API for all your filesystem analysis needs. Whether you're inspecting a single file or a million-file directory, it stays fast and intuitive.

1. Simple File & Image Profiling

Extract rich metadata and statistics from any file or image with a single call.

import filoma as flm

# Profile any file
info = flm.probe_file("README.md")
print(info)
📄 Metadata output:
Filo(
    path=PosixPath('README.md'), 
    size=12237, 
    mode_str='-rw-rw-r--', 
    owner='user', 
    modified=datetime.datetime(2025, 12, 30, 22, 45, 53), 
    is_file=True,
    ...
)
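
For readers curious where fields such as size, mode_str, and modified come from, the sketch below gathers similar values with the standard library alone. It is an assumption-laden illustration, not the implementation behind flm.probe_file; the helper name basic_file_info is hypothetical, and owner lookup is left out because it is platform-specific.

```python
import stat
import tempfile
from datetime import datetime
from pathlib import Path


def basic_file_info(path: str) -> dict:
    """Collect a few stat-based fields similar to those shown above."""
    p = Path(path)
    st = p.stat()
    return {
        "path": p,
        "size": st.st_size,                        # size in bytes
        "mode_str": stat.filemode(st.st_mode),     # e.g. '-rw-r--r--'
        "modified": datetime.fromtimestamp(st.st_mtime),
        "is_file": p.is_file(),
    }


# Demo on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "README.md"
    sample.write_text("hello")
    info = basic_file_info(str(sample))

print(info)
```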

For images, probe_image automatically extracts shapes, types, and pixel statistics.

2. Blazingly Fast Directory Analysis

Scan entire directory trees quickly: the sample run below covers about 60,000 files and folders in 0.60s. filoma automatically picks the fastest available backend (Rust → fd → Python).

# Analyze a directory
analysis = flm.probe('.')

# Print a high-level summary
analysis.print_summary()
📂 Directory summary table:
 Directory Analysis: /project (🦀 Rust (Parallel)) - 0.60s
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                   ┃ Value                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ Total Files              │ 57,225               │
│ Total Folders            │ 3,427                │
│ Total Size               │ 2,084.90 MB          │
│ Average Files per Folder │ 16.70                │
│ Maximum Depth            │ 14                   │
│ Empty Folders            │ 103                  │
│ Analysis Time            │ 0.60s                │
│ Processing Speed         │ 102,114 items/sec    │
└──────────────────────────┴──────────────────────┘
# Or get a detailed report with extensions and folder stats
analysis.print_report()
📊 Detailed directory report:
          File Extensions
┏━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┓
┃ Extension  ┃ Count  ┃ Percentage ┃
┡━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━┩
│ .py        │ 240    │ 12.8%      │
│ .jpg       │ 1,204  │ 64.2%      │
│ .json      │ 431    │ 23.0%      │
│ .svg       │ 28,674 │ 50.1%      │
└────────────┴────────┴────────────┘

          Common Folder Names
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Folder Name   ┃ Occurrences ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ src           │ 1           │
│ tests         │ 1           │
│ docs          │ 1           │
│ notebooks     │ 1           │
└───────────────┴─────────────┘

          Empty Folders (3 found)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Path                                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ /project/data/raw/empty_set_A              │
│ /project/logs/old/unused                   │
│ /project/temp/scratch                      │
└────────────────────────────────────────────┘
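
The metrics in the summary and report above can be approximated in pure Python with a single os.walk pass. The sketch below is a simplified stand-in, not filoma's implementation: summarize_tree is a hypothetical helper, depth is measured as directory levels below the scan root, and there is no handling of symlinks or permission errors.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def summarize_tree(root):
    """Walk `root` once and collect summary metrics like those shown above."""
    root_path = Path(root)
    files = folders = size = empty = max_depth = 0
    extensions = Counter()
    for dirpath, dirnames, filenames in os.walk(root_path):
        current = Path(dirpath)
        # Depth of this directory relative to the scan root (root itself is 0).
        max_depth = max(max_depth, len(current.relative_to(root_path).parts))
        folders += len(dirnames)
        if not dirnames and not filenames:
            empty += 1
        for name in filenames:
            files += 1
            size += (current / name).stat().st_size
            extensions[Path(name).suffix] += 1
    return {
        "total_files": files,
        "total_folders": folders,
        "total_size_bytes": size,
        "avg_files_per_folder": files / folders if folders else float(files),
        "max_depth": max_depth,
        "empty_folders": empty,
        "extensions": extensions,
    }


# Build a tiny throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a").mkdir()
    (base / "b").mkdir()  # left empty on purpose
    (base / "c" / "d").mkdir(parents=True)
    for rel in ("a/x.py", "a/y.py", "c/d/z.json"):
        (base / rel).write_text("abc")
    summary = summarize_tree(base)

print(summary)
```

A single-threaded walk like this is the kind of baseline that the Rust and fd backends are meant to outperform on large trees.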

3. DataFrames & Enrichment

Convert scan results to Polars DataFrames for advanced analysis. Pass enrich=True (or call .enrich()) to add path components, file stats, and hierarchy data.

# Scan and get an enriched filoma.DataFrame (Polars)
df = flm.probe_to_df('src', enrich=True)

print(df.head(2))
📊 Enriched DataFrame output:
filoma.DataFrame with 2 rows
shape: (2, 18)
┌───────────────────┬───────┬────────┬───────────────┬───┬─────────┬───────┬────────┬────────┐
│ path              ┆ depth ┆ parent ┆ name          ┆ … ┆ inode   ┆ nlink ┆ sha256 ┆ xattrs │
│ ---               ┆ ---   ┆ ---    ┆ ---           ┆   ┆ ---     ┆ ---   ┆ ---    ┆ ---    │
│ str               ┆ i64   ┆ str    ┆ str           ┆   ┆ i64     ┆ i64   ┆ str    ┆ str    │
╞═══════════════════╪═══════╪════════╪═══════════════╪═══╪═════════╪═══════╪════════╪════════╡
│ src/async_scan.rs ┆ 1     ┆ src    ┆ async_scan.rs ┆ … ┆ 7601121 ┆ 1     ┆ null   ┆ {}     │
│ src/filoma        ┆ 1     ┆ src    ┆ filoma        ┆ … ┆ 7603126 ┆ 8     ┆ null   ┆ {}     │
└───────────────────┴───────┴────────┴───────────────┴───┴─────────┴───────┴────────┴────────┘

✨ Enriched columns added: parent, name, stem, suffix, size_bytes, modified_time, 
   created_time, is_file, is_dir, owner, group, mode_str, inode, nlink, sha256, xattrs, depth
  • Seamless Pandas Integration: Just use df.pandas for instant conversion.
  • Lazy Loading: import filoma is cheap; heavy dependencies load only when needed.
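
Several of the enriched columns (parent, name, stem, suffix, depth) can be derived from the path string alone. The sketch below shows that derivation with pathlib; it is an illustration under assumptions, not filoma's enrichment code. The helper path_components is hypothetical, and depth is assumed to mean the number of path parts below the first one, which matches the two sample rows above. Columns such as size_bytes, owner, or inode need a stat call and are not covered here.

```python
from pathlib import PurePosixPath


def path_components(path: str) -> dict:
    """Derive the path-only columns for one row."""
    p = PurePosixPath(path)
    return {
        "path": path,
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        # Assumed definition: levels below the first path component.
        "depth": len(p.parts) - 1,
    }


rows = [path_components(p) for p in ["src/async_scan.rs", "src/filoma"]]
for row in rows:
    print(row)
```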

4. Specialized DataFrame Operations

Filoma's DataFrame extends Polars with specialized filesystem operations, providing quick ways to filter and summarize your data.

# Filter by extensions
df.filter_by_extension([".py", ".rs"])

# Quick frequency analysis (counts)
df.extension_counts()
df.directory_counts()
🔍 Operation examples:

filter_by_extension([".py", ".rs"]): keeps only the rows whose file extension is in the given list.

shape: (3, 1)
┌─────────────────────┐
│ path                │
│ ---                 │
│ str                 │
╞═════════════════════╡
│ src/async_scan.rs   │
│ src/lib.rs          │
│ src/filoma/dedup.py │
└─────────────────────┘

extension_counts(): groups files by extension and returns counts.

shape: (3, 2)
┌────────────┬─────┐
│ extension  ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ .py        ┆ 240 │
│ .jpg       ┆ 124 │
│ .json      ┆ 43  │
└────────────┴─────┘

directory_counts(): summarizes file distribution across parent directories.

shape: (3, 2)
┌────────────┬─────┐
│ parent_dir ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ src/filoma ┆ 12  │
│ tests      ┆ 8   │
│ docs       ┆ 5   │
└────────────┴─────┘
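
The three operations above boil down to a filter and two group-by counts. The following standard-library sketch shows the underlying logic on a plain list of paths; the function names keep_extensions, count_extensions, and count_parents are hypothetical and are not part of filoma's API, which works on DataFrames rather than lists.

```python
from collections import Counter
from pathlib import PurePosixPath

paths = ["src/async_scan.rs", "src/lib.rs", "src/filoma/dedup.py", "README.md"]


def keep_extensions(paths, extensions):
    """Keep only paths whose suffix is in `extensions`."""
    wanted = set(extensions)
    return [p for p in paths if PurePosixPath(p).suffix in wanted]


def count_extensions(paths):
    """Count how many paths share each suffix."""
    return Counter(PurePosixPath(p).suffix for p in paths)


def count_parents(paths):
    """Count how many paths sit directly under each parent directory."""
    return Counter(str(PurePosixPath(p).parent) for p in paths)


print(keep_extensions(paths, [".py", ".rs"]))
print(count_extensions(paths))
print(count_parents(paths))
```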

Performance & Benchmarks

Need to compare backend performance? Check out the comprehensive Benchmarks Guide!

Latest Results:

  • Local SSD (1M files, MacBook Air M4):

    • 🦀 Rust: 7.3s (136K files/sec) - fastest for metadata collection
    • Async: 11.5s (87K files/sec) - strong alternative
    • 🐍 Python: 35.5s (28K files/sec) - reliable baseline
    • os.walk (discovery-only): 0.565s (1.77M files/sec)
  • Network Storage (200k files, cold cache):

    • 🦀 Rust: 2.3s (86K files/sec)
    • Async: 2.8s (70K files/sec)
    • 🐍 Python: 15.1s (13K files/sec)
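
The files/sec figures follow from files scanned divided by elapsed seconds. The short check below recomputes them from the numbers listed above; the recomputed rates land within about 2% of the listed ones, which appear to be rounded. It also derives the relative slowdown of the Python backend: roughly 4.9 times the Rust time on local SSD and roughly 6.6 times on network storage. The helper name files_per_second is introduced only for this example.

```python
def files_per_second(files: int, seconds: float) -> float:
    """Throughput as files scanned per second of wall-clock time."""
    return files / seconds


# (files scanned, elapsed seconds, listed rate) copied from the results above.
results = {
    "local/rust": (1_000_000, 7.3, 136_000),
    "local/async": (1_000_000, 11.5, 87_000),
    "local/python": (1_000_000, 35.5, 28_000),
    "local/os.walk": (1_000_000, 0.565, 1_770_000),
    "network/rust": (200_000, 2.3, 86_000),
    "network/async": (200_000, 2.8, 70_000),
    "network/python": (200_000, 15.1, 13_000),
}

for name, (files, seconds, listed) in results.items():
    rate = files_per_second(files, seconds)
    print(f"{name:15s} {rate:>12,.0f} files/sec (listed: {listed:,})")

# How many times longer the Python backend takes than Rust.
slowdown_local = 35.5 / 7.3
slowdown_network = 15.1 / 2.3
print(f"local: {slowdown_local:.1f}x, network: {slowdown_network:.1f}x")
```

Note that the os.walk row measures discovery only, so it is not directly comparable to the backends that also collect metadata.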

The Benchmarks Guide includes:

  • 📊 Detailed results across backends and storage types
  • 🔧 Testing methodology and best practices
  • 💡 Backend selection recommendations for your use case

Run your own benchmarks:

python benchmarks/benchmark.py --path /your/directory -n 3 --backend profiling

License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Contributing

Contributions welcome! Please check the issues for planned features and bug reports.



Download files

Download the file for your platform.

Source Distribution

  • filoma-1.11.1.tar.gz (2.2 MB), Source

Built Distributions

  • filoma-1.11.1-cp311-cp311-win_amd64.whl (432.4 kB), CPython 3.11, Windows x86-64
  • filoma-1.11.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (611.8 kB), CPython 3.11, manylinux: glibc 2.17+ x86-64
  • filoma-1.11.1-cp311-cp311-macosx_11_0_arm64.whl (554.6 kB), CPython 3.11, macOS 11.0+ ARM64
