
Modular Python tool for profiling files, analyzing directory structures, and inspecting image data


Fast, multi-backend file/directory profiling and data preparation.

pip install filoma

import filoma as flm


📖 New to Filoma? Check out the Cookbook for practical, copy-paste recipes for common tasks!


filoma helps you analyze directory trees, inspect file metadata, and prepare your data for exploration. It stays fast by automatically using the best available backend (Rust, fd, or pure Python) ⚡🍃

Filoma Package Overview

Key Features

  • 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
  • 📈 DataFrame Integration: Convert scan results to Polars (or pandas) DataFrames for powerful analysis.
  • 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
  • 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
  • 🖼️ File/Image Profiling: Extract metadata and statistics from various file formats.
  • 🧠 Agentic Analysis: Natural language interface for file discovery, deduplication, and metadata inspection. 📖 Brain Guide →
  • 🏗️ Architectural Clarity: High-level visual flows for discovery and processing. 📖 Architecture Documentation →
  • 🖥️ Interactive CLI: Beautiful terminal interface for filesystem exploration and DataFrame analysis. 📖 CLI Documentation →



⚡ Quick Start

filoma provides a unified API for all your filesystem analysis needs. Whether you're inspecting a single file or a million-file directory, it stays fast and intuitive.

1. Simple File & Image Profiling

Extract rich metadata and statistics from any file or image with a single call.

import filoma as flm

# Profile any file
info = flm.probe_file("README.md")
print(info)
📄 See Metadata Output
Filo(
    path=PosixPath('README.md'), 
    size=12237, 
    mode_str='-rw-rw-r--', 
    owner='user', 
    modified=datetime.datetime(2025, 12, 30, 22, 45, 53), 
    is_file=True,
    ...
)

For images, probe_image automatically extracts shapes, types, and pixel statistics.
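To make the fields in the sample output concrete, here is a minimal standard-library sketch that gathers a few of them (size, mode string, modification time, file flag). It is not filoma's implementation; `basic_file_info` is a hypothetical helper introduced only for illustration.

```python
import datetime
import stat
import tempfile
from pathlib import Path


def basic_file_info(path):
    """Collect a few of the fields shown in the sample output, stdlib only."""
    p = Path(path)
    st = p.stat()
    return {
        "path": p,
        "size": st.st_size,                      # size in bytes
        "mode_str": stat.filemode(st.st_mode),   # e.g. '-rw-rw-r--'
        "modified": datetime.datetime.fromtimestamp(st.st_mtime),
        "is_file": p.is_file(),
    }


# Demo on a throwaway file so the snippet runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "README.md"
    demo.write_bytes(b"hello filoma\n")
    info = basic_file_info(demo)
    print(info["size"], info["mode_str"], info["is_file"])
```

Fields such as `owner` need platform-specific lookups, which is one reason a single profiling call is convenient.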

2. Blazingly Fast Directory Analysis

Scan entire directory trees quickly: the sample run below covers about 57,000 files in 0.60 s. filoma automatically picks the fastest available backend (Rust → fd → Python).
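A fallback chain of this kind (Rust → fd → Python) can be sketched as follows. This only illustrates the idea and is not filoma's actual selection logic; the module name `filoma_rust_ext` is a hypothetical placeholder, and the check for `fd` assumes the binary is on `PATH` under that name.

```python
import importlib.util
import shutil


def pick_backend(rust_module="filoma_rust_ext", fd_binary="fd"):
    """Return the first available backend in the order Rust -> fd -> Python.

    `rust_module` is a hypothetical extension-module name used only here.
    """
    if importlib.util.find_spec(rust_module) is not None:
        return "rust"      # compiled extension is importable
    if shutil.which(fd_binary) is not None:
        return "fd"        # external fd executable found on PATH
    return "python"        # always-available fallback


print(pick_backend())
```

Probing for availability up front keeps the calling code identical whichever backend ends up doing the work.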

# Analyze a directory
analysis = flm.probe('.')

# Print a high-level summary
analysis.print_summary()
📂 See Directory Summary Table
 Directory Analysis: /project (🦀 Rust (Parallel)) - 0.60s
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                   ┃ Value                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ Total Files              │ 57,225               │
│ Total Folders            │ 3,427                │
│ Total Size               │ 2,084.90 MB          │
│ Average Files per Folder │ 16.70                │
│ Maximum Depth            │ 14                   │
│ Empty Folders            │ 103                  │
│ Analysis Time            │ 0.60s                │
│ Processing Speed         │ 102,114 items/sec    │
└──────────────────────────┴──────────────────────┘
# Or get a detailed report with extensions and folder stats
analysis.print_report()
📊 See Detailed Directory Report
          File Extensions
┏━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┓
┃ Extension  ┃ Count  ┃ Percentage ┃
┡━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━┩
│ .py        │ 240    │ 12.8%      │
│ .jpg       │ 1,204  │ 64.2%      │
│ .json      │ 431    │ 23.0%      │
│ .svg       │ 28,674 │ 50.1%      │
└────────────┴────────┴────────────┘

          Common Folder Names
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Folder Name   ┃ Occurrences ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ src           │ 1           │
│ tests         │ 1           │
│ docs          │ 1           │
│ notebooks     │ 1           │
└───────────────┴─────────────┘

          Empty Folders (3 found)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Path                                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ /project/data/raw/empty_set_A              │
│ /project/logs/old/unused                   │
│ /project/temp/scratch                      │
└────────────────────────────────────────────┘
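For readers curious how the metrics in the summary and report above can be derived, here is a pure-Python sketch using `os.walk`. The counting conventions are assumptions made for this example (the root counts as a folder, depth is measured relative to the root) and may differ from filoma's; `summarize_tree` is a hypothetical helper.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def summarize_tree(root):
    """Walk `root` once and compute summary metrics for the tree."""
    root = Path(root)
    total_files = total_folders = max_depth = 0
    empty_folders = []
    extensions = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        depth = len(Path(dirpath).relative_to(root).parts)
        max_depth = max(max_depth, depth)
        total_folders += 1                      # assumption: root is counted too
        if not dirnames and not filenames:
            empty_folders.append(dirpath)
        total_files += len(filenames)
        extensions.update(Path(name).suffix for name in filenames)
    return {
        "total_files": total_files,
        "total_folders": total_folders,
        "avg_files_per_folder": total_files / total_folders,
        "max_depth": max_depth,
        "empty_folders": empty_folders,
        "extensions": extensions,
    }


# Build a small throwaway tree and summarize it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["src/a.py", "src/b.py", "src/pkg/c.py", "docs/readme.md"]:
        f = root / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("x")
    (root / "empty").mkdir()
    summary = summarize_tree(root)

print(summary["total_files"], summary["total_folders"], summary["max_depth"])
```

A single-threaded walk like this corresponds to the Python baseline; the Rust backend is reported as running in parallel.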

3. DataFrames & Enrichment

Convert scan results to Polars DataFrames for advanced analysis. Use .enrich() (or pass enrich=True when scanning, as below) to add path components, file stats, and hierarchy data.

# Scan and get an enriched filoma.DataFrame (Polars)
df = flm.probe_to_df('src', enrich=True)

print(df.head(2))
📊 See Enriched DataFrame Output
filoma.DataFrame with 2 rows
shape: (2, 18)
┌───────────────────┬───────┬────────┬───────────────┬───┬─────────┬───────┬────────┬────────┐
│ path              ┆ depth ┆ parent ┆ name          ┆ … ┆ inode   ┆ nlink ┆ sha256 ┆ xattrs │
│ ---               ┆ ---   ┆ ---    ┆ ---           ┆   ┆ ---     ┆ ---   ┆ ---    ┆ ---    │
│ str               ┆ i64   ┆ str    ┆ str           ┆   ┆ i64     ┆ i64   ┆ str    ┆ str    │
╞═══════════════════╪═══════╪════════╪═══════════════╪═══╪═════════╪═══════╪════════╪════════╡
│ src/async_scan.rs ┆ 1     ┆ src    ┆ async_scan.rs ┆ … ┆ 7601121 ┆ 1     ┆ null   ┆ {}     │
│ src/filoma        ┆ 1     ┆ src    ┆ filoma        ┆ … ┆ 7603126 ┆ 8     ┆ null   ┆ {}     │
└───────────────────┴───────┴────────┴───────────────┴───┴─────────┴───────┴────────┴────────┘

✨ Enriched columns added: parent, name, stem, suffix, size_bytes, modified_time, 
   created_time, is_file, is_dir, owner, group, mode_str, inode, nlink, sha256, xattrs, depth
  • Seamless Pandas Integration: Just use df.pandas for instant conversion.
  • Lazy Loading: import filoma is cheap; heavy dependencies load only when needed.
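The path-component columns listed above (parent, name, stem, suffix, depth) can be derived from the path string alone. The sketch below shows one way to do that with `pathlib`; it is an illustration, not filoma's enrichment code, and the depth convention (number of path parts minus one) is an assumption chosen to match the sample rows.

```python
from pathlib import PurePosixPath


def enrich_path(path):
    """Derive path-component columns for one path (string work, no disk access)."""
    p = PurePosixPath(path)
    return {
        "path": str(p),
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        "depth": len(p.parts) - 1,   # assumption: depth counts parent components
    }


rows = [enrich_path(p) for p in ["src/async_scan.rs", "src/filoma"]]
for row in rows:
    print(row)
```

Columns such as size_bytes, owner, or inode need a stat call per file, so they cost more to compute than the purely string-based ones.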

4. Specialized DataFrame Operations

Filoma's DataFrame extends Polars with specialized filesystem operations, providing quick ways to filter and summarize your data.

# Filter by extensions
df.filter_by_extension([".py", ".rs"])

# Quick frequency analysis (counts)
df.extension_counts()
df.directory_counts()
🔍 See Operation Examples

filter_by_extension([".py", ".rs"])

shape: (3, 1)
┌─────────────────────┐
│ path                │
│ ---                 │
│ str                 │
╞═════════════════════╡
│ src/async_scan.rs   │
│ src/lib.rs          │
│ src/filoma/dedup.py │
└─────────────────────┘

extension_counts(): groups files by extension and returns counts.

shape: (3, 2)
┌────────────┬─────┐
│ extension  ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ .py        ┆ 240 │
│ .jpg       ┆ 124 │
│ .json      ┆ 43  │
└────────────┴─────┘

directory_counts(): summarizes file distribution across parent directories.

shape: (3, 2)
┌────────────┬─────┐
│ parent_dir ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ src/filoma ┆ 12  │
│ tests      ┆ 8   │
│ docs       ┆ 5   │
└────────────┴─────┘
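Conceptually, these three operations are a filter and two group-by counts. The standard-library sketch below mirrors that logic on a plain list of paths. The helper names are hypothetical and chosen to differ from filoma's method names, since this is not how filoma implements them.

```python
from collections import Counter
from pathlib import PurePosixPath

paths = ["src/async_scan.rs", "src/lib.rs", "src/filoma/dedup.py", "README.md"]


def filter_paths_by_extension(paths, extensions):
    """Keep paths whose suffix is in `extensions` (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]


def count_extensions(paths):
    """Count files per extension, most frequent first."""
    return Counter(PurePosixPath(p).suffix for p in paths).most_common()


def count_parent_dirs(paths):
    """Count files per parent directory, most frequent first."""
    return Counter(str(PurePosixPath(p).parent) for p in paths).most_common()


print(filter_paths_by_extension(paths, [".py", ".rs"]))
print(count_extensions(paths))
print(count_parent_dirs(paths))
```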

5. 🧠 Filoma Brain (Agentic Analysis)

Connect a "brain" to your filesystem. Filoma integrates with PydanticAI to allow you to interact with your files using natural language. The agent has tools to scan directories, find duplicates, and inspect metadata.

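import asyncio
from filoma.brain import get_agent

async def main():
    agent = get_agent()
    # Natural-language prompt; the agent decides which tools to call
    result = await agent.run("Find duplicate images in ./data and tell me how many groups you found")
    print(result)

asyncio.run(main())  # in a notebook, use `await main()` instead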

Or chat directly from the terminal:

filoma brain chat

📖 Read the Agentic Analysis Guide →
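To give a sense of what a duplicate-finding tool might do behind a prompt like the one above, here is a minimal sketch that groups files by the SHA-256 of their content. It only detects byte-identical files, and the agent's actual tools may work differently; `find_duplicate_groups` is a hypothetical helper.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_groups(root):
    """Group files under `root` by content hash; keep groups with 2+ files."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [group for group in by_hash.values() if len(group) > 1]


# Demo: two identical files and one different file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").write_bytes(b"same-bytes")
    (root / "b.jpg").write_bytes(b"same-bytes")
    (root / "c.jpg").write_bytes(b"other-bytes")
    groups = find_duplicate_groups(root)
    group_names = [[p.name for p in group] for group in groups]

print(len(groups), group_names)
```

Reading whole files into memory is fine for a sketch; for large files, hashing in chunks would be the safer choice.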

Performance & Benchmarks

Need to compare backend performance? Check out the comprehensive Benchmarks Guide!

Latest Results:

  • Local SSD (1M files, MacBook Air M4):

    • 🦀 Rust: 7.3s (136K files/sec) - fastest for metadata collection
    • Async: 11.5s (87K files/sec) - strong alternative
    • 🐍 Python: 35.5s (28K files/sec) - reliable baseline
    • os.walk (discovery-only): 0.565s (1.77M files/sec)
  • Network Storage (200k files, cold cache):

    • 🦀 Rust: 2.3s (86K files/sec)
    • Async: 2.8s (70K files/sec)
    • 🐍 Python: 15.1s (13K files/sec)
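The files/sec figures above are simply file count divided by wall-clock time. The short sketch below redoes that arithmetic for the local SSD run and adds the speedup relative to the Python baseline. Results can differ slightly from the listed values because the published timings are rounded.

```python
def files_per_second(n_files, seconds):
    """Throughput as files processed per second of wall-clock time."""
    return n_files / seconds


# Timings for the 1M-file local SSD run, taken from the list above.
local_ssd = {"Rust": 7.3, "Async": 11.5, "Python": 35.5}
baseline = local_ssd["Python"]

for backend, seconds in local_ssd.items():
    rate = files_per_second(1_000_000, seconds)
    speedup = baseline / seconds
    print(f"{backend}: {rate / 1000:.0f}K files/sec, {speedup:.1f}x vs Python")
```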

The Benchmarks Guide includes:

  • 📊 Detailed results across backends and storage types
  • 🔧 Testing methodology and best practices
  • 💡 Backend selection recommendations for your use case

Run your own benchmarks:

python benchmarks/benchmark.py --path /your/directory -n 3 --backend profiling

License


This work is licensed under a Creative Commons Attribution 4.0 International License.


Contributing

Contributions welcome! Please check the issues for planned features and bug reports.


