Modular Python tool for profiling files, analyzing directory structures, and inspecting image data

Fast, multi-backend file/directory profiling and data preparation.

pip install filoma

import filoma as flm

Installation | Documentation | Agentic Analysis | Interactive CLI | Quickstart | Cookbook | Roboflow Dataset Demo | Source Code

📖 New to Filoma? Check out the Cookbook for practical, copy-paste recipes for common tasks!


filoma helps you analyze directory trees, inspect file metadata, and prepare your data for exploration. It automatically uses the fastest available backend (Rust, fd, or pure Python) ⚡🍃

Key Features

  • 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
  • 📈 DataFrame Integration: Convert scan results to Polars (or pandas) DataFrames for powerful analysis.
  • 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
  • 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
  • 🖼️ File/Image Profiling: Extract metadata and statistics from various file formats.
  • 🧠 Agentic Analysis: Natural language interface for file discovery, deduplication, and metadata inspection. 📖 Brain Guide →
  • 🏗️ Architectural Clarity: High-level visual flows for discovery and processing. 📖 Architecture Documentation →
  • 🖥️ Interactive CLI: Beautiful terminal interface for filesystem exploration and DataFrame analysis 📖 CLI Documentation →
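The automatic backend selection named in the first feature above can be pictured as a simple priority fallback. The sketch below is illustrative only: `pick_backend`, `detect_available`, and the `rust_extension_present` flag are hypothetical names rather than filoma's API, and the library's real detection logic may differ. The only real library call is `shutil.which`, which checks whether an `fd` executable is on the PATH.

```python
import shutil

# Priority order taken from the feature list: Rust, then fd, then pure Python.
BACKEND_PRIORITY = ("rust", "fd", "python")


def pick_backend(available):
    """Return the first backend in priority order that is marked available.

    `available` maps backend name -> bool. Hypothetical helper, not part of filoma.
    """
    for name in BACKEND_PRIORITY:
        if available.get(name, False):
            return name
    raise RuntimeError("no backend available")


def detect_available(rust_extension_present=False):
    """Build the availability map (assumption: Rust support is passed in as a flag)."""
    return {
        "rust": rust_extension_present,
        "fd": shutil.which("fd") is not None,  # is an `fd` binary on PATH?
        "python": True,  # the pure-Python fallback is always usable
    }


print(pick_backend(detect_available()))
```

The printed backend depends on the machine: it is `fd` when that binary is installed and `python` otherwise, since the Rust flag defaults to `False` in this sketch.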

⚡ Quick Start

filoma provides a unified API for all your filesystem analysis needs. Whether you're inspecting a single file or a million-file directory, it stays fast and intuitive.

1. Simple File & Image Profiling

Extract rich metadata and statistics from any file or image with a single call.

import filoma as flm

# Profile any file
info = flm.probe_file("README.md")
print(info)
Example metadata output:
Filo(
    path=PosixPath('README.md'), 
    size=12237, 
    mode_str='-rw-rw-r--', 
    owner='user', 
    modified=datetime.datetime(2025, 12, 30, 22, 45, 53), 
    is_file=True,
    ...
)

For images, probe_image automatically extracts shapes, types, and pixel statistics.
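Several fields in the metadata output above (size, mode string, modification time) correspond to what the standard library's `os.stat` reports. As a rough, standard-library-only sketch of where such values come from, and not a description of how `probe_file` is implemented, the hypothetical helper `describe_file` below collects a few of them:

```python
import datetime
import stat
import tempfile
from pathlib import Path


def describe_file(path):
    """Collect a few stat-based fields similar to the ones shown above.

    Hypothetical helper for illustration; not part of filoma.
    """
    p = Path(path)
    st = p.stat()
    return {
        "path": p,
        "size": st.st_size,  # size in bytes
        "mode_str": stat.filemode(st.st_mode),  # e.g. a '-rw-r--r--' style string
        "modified": datetime.datetime.fromtimestamp(st.st_mtime),
        "is_file": p.is_file(),
    }


# Demo on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "README.md"
    sample.write_text("hello")
    info = describe_file(sample)
    print(info["size"], info["mode_str"], info["is_file"])
```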

2. Blazingly Fast Directory Analysis

Scan entire directory trees quickly: the sample run below covers about 60,000 files and folders in 0.60 s. filoma automatically picks the fastest available backend (Rust → fd → Python).

# Analyze a directory
analysis = flm.probe('.')

# Print a high-level summary
analysis.print_summary()
Example directory summary:
 Directory Analysis: /project (🦀 Rust (Parallel)) - 0.60s
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                   ┃ Value                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ Total Files              │ 57,225               │
│ Total Folders            │ 3,427                │
│ Total Size               │ 2,084.90 MB          │
│ Average Files per Folder │ 16.70                │
│ Maximum Depth            │ 14                   │
│ Empty Folders            │ 103                  │
│ Analysis Time            │ 0.60s                │
│ Processing Speed         │ 102,114 items/sec    │
└──────────────────────────┴──────────────────────┘
# Or get a detailed report with extensions and folder stats
analysis.print_report()
Example detailed report:
          File Extensions
┏━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┓
┃ Extension  ┃ Count  ┃ Percentage ┃
┡━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━┩
│ .py        │ 240    │ 12.8%      │
│ .jpg       │ 1,204  │ 64.2%      │
│ .json      │ 431    │ 23.0%      │
│ .svg       │ 28,674 │ 50.1%      │
└────────────┴────────┴────────────┘

          Common Folder Names
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Folder Name   ┃ Occurrences ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ src           │ 1           │
│ tests         │ 1           │
│ docs          │ 1           │
│ notebooks     │ 1           │
└───────────────┴─────────────┘

          Empty Folders (3 found)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Path                                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ /project/data/raw/empty_set_A              │
│ /project/logs/old/unused                   │
│ /project/temp/scratch                      │
└────────────────────────────────────────────┘
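To make the summary metrics above concrete, here is a minimal pure-Python sketch that computes a few of them with `os.walk`. It is not filoma's implementation: `summarize_tree` is a hypothetical helper, and the conventions it uses (the root is not counted as a folder, and depth is counted in path components below the root) are assumptions that may differ from what `probe` reports.

```python
import os
import tempfile
from pathlib import Path


def summarize_tree(root):
    """Walk `root` and compute a few of the summary metrics shown above."""
    root = os.path.abspath(root)
    total_files = total_folders = total_bytes = max_depth = 0
    empty_folders = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        max_depth = max(max_depth, depth)
        if dirpath != root:
            total_folders += 1  # assumption: the root itself is not counted
        if not dirnames and not filenames:
            empty_folders.append(dirpath)
        total_files += len(filenames)
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return {
        "total_files": total_files,
        "total_folders": total_folders,
        "total_bytes": total_bytes,
        "max_depth": max_depth,
        "empty_folders": empty_folders,
    }


# Build a tiny throwaway tree: top.txt, a/b/file.txt, and an empty folder c/.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a" / "b").mkdir(parents=True)
    (base / "c").mkdir()
    (base / "top.txt").write_text("hi")
    (base / "a" / "b" / "file.txt").write_text("abc")
    summary = summarize_tree(base)
    print(summary["total_files"], summary["total_folders"], summary["max_depth"])
```

A single-threaded walk like this corresponds to the slowest tier of the backend chain; the Rust and fd backends exist to avoid exactly this kind of per-file Python overhead.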

3. DataFrames & Enrichment

Convert scan results to Polars DataFrames for advanced analysis. Use .enrich() (or pass enrich=True, as in the call below) to add path components, file stats, and hierarchy data.

# Scan and get an enriched filoma.DataFrame (Polars)
df = flm.probe_to_df('src', enrich=True)

print(df.head(2))
Example enriched DataFrame output:
filoma.DataFrame with 2 rows
shape: (2, 18)
┌───────────────────┬───────┬────────┬───────────────┬───┬─────────┬───────┬────────┬────────┐
│ path              ┆ depth ┆ parent ┆ name          ┆ … ┆ inode   ┆ nlink ┆ sha256 ┆ xattrs │
│ ---               ┆ ---   ┆ ---    ┆ ---           ┆   ┆ ---     ┆ ---   ┆ ---    ┆ ---    │
│ str               ┆ i64   ┆ str    ┆ str           ┆   ┆ i64     ┆ i64   ┆ str    ┆ str    │
╞═══════════════════╪═══════╪════════╪═══════════════╪═══╪═════════╪═══════╪════════╪════════╡
│ src/async_scan.rs ┆ 1     ┆ src    ┆ async_scan.rs ┆ … ┆ 7601121 ┆ 1     ┆ null   ┆ {}     │
│ src/filoma        ┆ 1     ┆ src    ┆ filoma        ┆ … ┆ 7603126 ┆ 8     ┆ null   ┆ {}     │
└───────────────────┴───────┴────────┴───────────────┴───┴─────────┴───────┴────────┴────────┘

✨ Enriched columns added: parent, name, stem, suffix, size_bytes, modified_time, 
   created_time, is_file, is_dir, owner, group, mode_str, inode, nlink, sha256, xattrs, depth
  • Seamless Pandas Integration: Just use df.pandas for instant conversion.
  • Lazy Loading: import filoma is cheap; heavy dependencies load only when needed.
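The path-derived columns in the enriched output above (parent, name, stem, suffix, depth) can be reproduced from a path string alone. The sketch below uses only `pathlib`; `enrich_path` is a hypothetical helper, not filoma's `.enrich()`, and its depth definition (number of path components minus one) is an assumption chosen because it matches the two sample rows shown above.

```python
from pathlib import PurePosixPath


def enrich_path(path_str):
    """Derive path-component columns from a POSIX-style path string."""
    p = PurePosixPath(path_str)
    return {
        "path": path_str,
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        "depth": len(p.parts) - 1,  # assumption: components below the first one
    }


row = enrich_path("src/async_scan.rs")
print(row)
```

Columns such as size_bytes, owner, or inode cannot be derived this way; they require a stat call per file, which is why enrichment is an explicit step.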

4. Specialized DataFrame Operations

Filoma's DataFrame extends Polars with specialized filesystem operations, providing quick ways to filter and summarize your data.

# Filter by extensions
df.filter_by_extension([".py", ".rs"])

# Quick frequency analysis (counts)
df.extension_counts()
df.directory_counts()
Example outputs for each operation:

filter_by_extension([".py", ".rs"])

shape: (3, 1)
┌─────────────────────┐
│ path                │
│ ---                 │
│ str                 │
╞═════════════════════╡
│ src/async_scan.rs   │
│ src/lib.rs          │
│ src/filoma/dedup.py │
└─────────────────────┘

extension_counts(): groups files by extension and returns counts.

shape: (3, 2)
┌────────────┬─────┐
│ extension  ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ .py        ┆ 240 │
│ .jpg       ┆ 124 │
│ .json      ┆ 43  │
└────────────┴─────┘

directory_counts(): summarizes file distribution across parent directories.

shape: (3, 2)
┌────────────┬─────┐
│ parent_dir ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ src/filoma ┆ 12  │
│ tests      ┆ 8   │
│ docs       ┆ 5   │
└────────────┴─────┘
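For readers who want to see what these three operations compute, the sketch below gives plain-Python equivalents built on `collections.Counter`. The function names mirror the DataFrame methods above, but these are standalone stand-ins and not filoma's implementation; exact-match, case-sensitive suffix comparison is an assumption, and `docs/index.md` is a made-up sample path added so the filter has something to exclude.

```python
from collections import Counter
from pathlib import PurePosixPath

# The first three paths come from the example output above; the last is invented.
paths = [
    "src/async_scan.rs",
    "src/lib.rs",
    "src/filoma/dedup.py",
    "docs/index.md",
]


def filter_by_extension(paths, extensions):
    """Keep only paths whose suffix is in `extensions` (exact match)."""
    wanted = set(extensions)
    return [p for p in paths if PurePosixPath(p).suffix in wanted]


def extension_counts(paths):
    """Count paths per file extension."""
    return Counter(PurePosixPath(p).suffix for p in paths)


def directory_counts(paths):
    """Count paths per parent directory."""
    return Counter(str(PurePosixPath(p).parent) for p in paths)


print(filter_by_extension(paths, [".py", ".rs"]))
print(extension_counts(paths).most_common())
print(directory_counts(paths).most_common())
```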

5. 🧠 Filoma Brain (Agentic Analysis)

Connect a "brain" to your filesystem. Filoma integrates with PydanticAI to allow you to interact with your files using natural language. The agent has tools to scan directories, find duplicates, and inspect metadata.

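import asyncio

from filoma.brain import get_agent


async def main():
    # "Find duplicate images in ./data and tell me how many groups you found"
    agent = get_agent()
    # agent.run is a coroutine, so it must be awaited inside an async function
    result = await agent.run("Find duplicate images...")
    print(result)


asyncio.run(main())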

Or chat directly from the terminal:

filoma brain chat

📖 Read the Agentic Analysis Guide →
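One of the agent's tools finds duplicates. As a minimal sketch of the general technique, assuming duplicates means files with byte-identical content, the code below groups files by their SHA-256 digest using only the standard library. `sha256_of` and `find_duplicate_groups` are hypothetical helpers; filoma's own deduplication may use different or additional methods.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Return lists of files under `root` that share identical content."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [group for group in by_hash.values() if len(group) > 1]


# Demo: two identical files and one different file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"same bytes")
    (root / "b.bin").write_bytes(b"same bytes")
    (root / "c.bin").write_bytes(b"different")
    groups = find_duplicate_groups(root)
    print(len(groups), [p.name for p in groups[0]])
```

Exact hashing only catches byte-for-byte copies; visually similar images that were resized or re-encoded would need a different approach.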

Performance & Benchmarks

Need to compare backend performance? Check out the comprehensive Benchmarks Guide!

Latest Results:

  • Local SSD (1M files, MacBook Air M4):

    • 🦀 Rust: 7.3s (136K files/sec) - fastest for metadata collection
    • Async: 11.5s (87K files/sec) - strong alternative
    • 🐍 Python: 35.5s (28K files/sec) - reliable baseline
    • os.walk (discovery-only): 0.565s (1.77M files/sec)
  • Network Storage (200k files, cold cache):

    • 🦀 Rust: 2.3s (86K files/sec)
    • Async: 2.8s (70K files/sec)
    • 🐍 Python: 15.1s (13K files/sec)
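The throughput figures above follow directly from files divided by seconds. The short calculation below reproduces the local SSD numbers from the listed timings and derives the Rust-over-Python speedup, which the list does not state explicitly. Note that normal rounding gives 137K files/sec for Rust, while the list shows 136K, which suggests the published figure was truncated or came from an unrounded timing.

```python
# (files, seconds) for the local SSD run, copied from the list above.
results = {
    "Rust": (1_000_000, 7.3),
    "Async": (1_000_000, 11.5),
    "Python": (1_000_000, 35.5),
}


def files_per_second(files, seconds):
    """Throughput is simply the item count divided by wall-clock time."""
    return files / seconds


rates = {name: files_per_second(*r) for name, r in results.items()}
speedup = results["Python"][1] / results["Rust"][1]

for name, rate in rates.items():
    print(f"{name}: {rate / 1000:.0f}K files/sec")
print(f"Rust vs Python speedup: {speedup:.1f}x")
```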

The Benchmarks Guide includes:

  • 📊 Detailed results across backends and storage types
  • 🔧 Testing methodology and best practices
  • 💡 Backend selection recommendations for your use case

Run your own benchmarks:

python benchmarks/benchmark.py --path /your/directory -n 3 --backend profiling
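If the benchmark script is not at hand, a rough discovery-only baseline can be timed with the standard library. The sketch below assumes that `-n 3` in the command above means three repetitions, which the source does not state; `time_scan` is a hypothetical helper and measures only `os.walk`, not any filoma backend.

```python
import os
import tempfile
import time
from pathlib import Path


def time_scan(path, repeats=3):
    """Run a discovery-only os.walk pass `repeats` times.

    Returns the file count from the last pass and the list of timings in seconds.
    """
    count = 0
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        count = sum(len(files) for _, _, files in os.walk(path))
        timings.append(time.perf_counter() - start)
    return count, timings


# Demo on a throwaway directory with 50 small files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(50):
        (Path(tmp) / f"file_{i}.txt").write_text("x")
    count, timings = time_scan(tmp, repeats=3)
    print(f"{count} files, best of {len(timings)}: {min(timings):.6f}s")
```

Reporting the best of several runs reduces noise from caching and background load; on a cold cache the first run is usually the slowest.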

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contributing

Contributions welcome! Please check the issues for planned features and bug reports.

