Modular Python tool for profiling files, analyzing directory structures, and inspecting image data

Fast, multi-backend file/directory profiling and data preparation for machine learning workflows.

🚧 Filoma is under active development — new features are being added regularly, APIs may evolve, and I'm always looking for feedback! Think of it as your friendly neighborhood file analysis toolkit that's still learning new tricks. Contributions, bug reports, and feature requests are more than welcome! 🎉

Installation | Documentation | Interactive CLI | Quickstart | Cookbook | Source Code


filoma helps you analyze directory trees, inspect file metadata, and prepare your data for exploration and modelling. It automatically uses the fastest available backend (Rust, fd, or pure Python) ⚡🍃

Key Features

  • 🖥️ Interactive CLI: Beautiful terminal interface for filesystem exploration and DataFrame analysis 📖 CLI Documentation →
  • 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
  • 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
  • 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
  • 📈 DataFrame Integration: Convert scan results to Polars (or pandas) DataFrames for powerful analysis.
  • 🖼️ File/Image Profiling: Extract metadata and statistics from various file formats.
  • 🔀 ML-Ready Splits: Create deterministic train/validation/test datasets with ease.

Scope of filoma

filoma workflow diagram

CLI Demo

filoma CLI screenshot

Feature Highlights

Quick, copyable examples showing filoma's standout capabilities and where to learn more.

  • Automatic multi-backend scanning: filoma picks the fastest available backend (Rust → fd → pure Python). You can also force a backend for reproducibility. See the backends docs: docs/backends.md.
import filoma as flm

# filoma will pick Rust > fd > Python depending on availability
analysis = flm.probe('.')
analysis.print_summary()  # Pretty Rich table output
  • Polars-first DataFrame wrapper & enrichment: probe_to_df returns a filoma.DataFrame (Polars) with helpers to add path components, depth, and file stats for immediate analysis. Docs: docs/dataframe.md.
df = flm.probe_to_df('.', enrich=True)  # returns a filoma.DataFrame
print(df.head())
  • Ultra-fast discovery with fd: When fd is available, filoma uses it for very fast file discovery. Advanced usage and patterns: docs/advanced-usage.md.
from filoma.directories.fd_finder import FdFinder

finder = FdFinder()
if finder.is_available():
    files = finder.find_files(pattern=r"\.py$", path='src', max_depth=3)
    print(len(files), 'python files found')
  • ML-ready, deterministic splits: Group-aware, reproducible train/validation/test splitting to avoid leakage. See docs/ml.md for grouping options and examples.
df = flm.probe_to_df('.', enrich=False)
train, val, test = flm.ml.split_data(df, train_val_test=(70,15,15), seed=42)
  • Lightweight, lazy top-level API: Importing filoma is cheap; heavy dependencies load only when used. Quickstart and one-line helpers: docs/quickstart.md.
info = flm.probe_file('README.md')
df = flm.probe_to_df('.')
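
The backend fallback described in the first highlight (Rust, then fd, then pure Python, with the option to force one) can be illustrated with the standard library alone. This is a minimal sketch of the idea and not filoma's actual implementation: the module name `hypothetical_rust_ext` and the functions `available_backends` and `choose_backend` are made up for illustration.

```python
import importlib.util
import shutil
from typing import Optional

# Preference order taken from the text above: Rust, then fd, then pure Python.
PREFERENCE = ("rust", "fd", "python")


def available_backends(rust_module: str = "hypothetical_rust_ext") -> set:
    """Detect which backends could run on this machine.

    `rust_module` is a made-up module name used only for illustration.
    """
    found = {"python"}  # the pure-Python fallback is always available
    if importlib.util.find_spec(rust_module) is not None:
        found.add("rust")
    if shutil.which("fd") is not None:  # is an `fd` executable on PATH?
        found.add("fd")
    return found


def choose_backend(available: set, forced: Optional[str] = None) -> str:
    """Return the forced backend if given, else the first available one."""
    if forced is not None:
        if forced not in available:
            raise ValueError(f"backend {forced!r} is not available")
        return forced
    for name in PREFERENCE:
        if name in available:
            return name
    raise RuntimeError("no backend available")


print(choose_backend(available_backends()))
```

Forcing a backend, for example `choose_backend(available_backends(), forced="python")`, keeps results comparable across machines where different tools are installed.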

Installation

Install filoma using uv or pip:

uv pip install filoma
pip install filoma

Workflow Demo

This guide follows a typical filoma workflow, from basic file profiling to creating machine learning datasets.

1. Profile a Single File

Start by inspecting a single file. filoma provides a detailed dataclass with metadata.

import filoma as flm

# Profile a file
file_info = flm.probe_file("README.md")

print(f"Path: {file_info.path}")
print(f"Size: {file_info.size} bytes")
print(f"Modified: {file_info.modified}")
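
For intuition about where such metadata comes from, the sketch below collects the same three fields with the standard library only. It is an assumption-laden stand-in, not filoma's code: the `FileInfo` dataclass and `stat_file` helper are hypothetical, and the real object returned by probe_file may expose more fields or different types.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class FileInfo:
    """Hypothetical stand-in for a file-metadata record."""
    path: str
    size: int           # size in bytes
    modified: datetime  # last modification time (local time)


def stat_file(path) -> FileInfo:
    """Read basic metadata for one file via os.stat (through pathlib)."""
    p = Path(path)
    st = p.stat()
    return FileInfo(
        path=str(p),
        size=st.st_size,
        modified=datetime.fromtimestamp(st.st_mtime),
    )


# Demonstrate on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello")
    info = stat_file(sample)
    print(f"Path: {info.path}")
    print(f"Size: {info.size} bytes")
    print(f"Modified: {info.modified}")
```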

For images, probe_image gives you additional details like shape and pixel statistics.

# Profile an image
img_info = flm.probe_image("images/logo.png")
print(f"Type: {img_info.file_type}")
print(f"Shape: {img_info.shape}")

2. Analyze a Directory

Scan an entire directory to get a high-level overview.

# Analyze the current directory
analysis = flm.probe('.')

# Print a beautiful summary table
analysis.print_summary()

Example output:

Directory Analysis: /project (🦀 Rust (Parallel)) - 0.27s
Total Files: 17,330    Total Folders: 2,427    Analysis Time: 0.27 s
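
To make the numbers in such a summary concrete, here is a pure-Python sketch of how file, folder, and extension counts can be gathered with `os.walk`. It only illustrates the kind of statistics involved. The `summarize_tree` helper is hypothetical, it does not count the root folder itself, and filoma's own counting rules may differ.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def summarize_tree(root) -> dict:
    """Count files, folders, and file extensions below `root`."""
    total_files = 0
    total_folders = 0
    extensions = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        total_folders += len(dirnames)
        total_files += len(filenames)
        for name in filenames:
            # Files without a suffix are counted under "<none>".
            extensions[Path(name).suffix.lower() or "<none>"] += 1
    return {
        "files": total_files,
        "folders": total_folders,
        "extensions": dict(extensions),
    }


# Build a tiny throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "main.py").write_text("print('hi')\n")
    (Path(tmp) / "README.md").write_text("# demo\n")
    (Path(tmp) / "LICENSE").write_text("demo\n")
    summary = summarize_tree(tmp)
    print(summary)
```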

3. Convert to a DataFrame

For detailed analysis, convert the scan results into a Polars DataFrame.

# Scan a directory and get a DataFrame
df = flm.probe_to_df('.')

print(df.head())

4. Enrich Your Data

Add more context to your DataFrame, like file depth and path components, with the enrich() method.

# The DataFrame returned by flm.probe_to_df is a filoma.DataFrame
# with extra capabilities.
df_enriched = df.enrich()

print(df_enriched.head())
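
As a rough picture of what path-based enrichment adds, the sketch below derives path components and depth from plain path strings using `pathlib`. The `enrich_row` helper and its column names are assumptions made for illustration. The columns that enrich() actually produces are described in the filoma docs.

```python
from pathlib import PurePosixPath


def enrich_row(path: str) -> dict:
    """Derive path components and depth from a path string."""
    p = PurePosixPath(path)
    return {
        "path": path,
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        # Number of directories above the file, for a relative path.
        "depth": len(p.parts) - 1,
    }


rows = [enrich_row(p) for p in ["README.md", "src/pkg/module.py"]]
for row in rows:
    print(row)
```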

5. Create ML-Ready Splits

filoma makes it easy to split your files into training, validation, and test sets for machine learning. You can even group files by parts of their path to prevent data leakage.

# Split the data, grouping by parent directory
train, val, test = flm.ml.split_data(df, feature='path_parts', path_parts=(-2,), seed=42)

print(f"Train: {len(train)}, Validation: {len(val)}, Test: {len(test)}")
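
The leakage point is easier to see in code. The sketch below shows one common way to get a deterministic, group-aware split: hash each group key together with the seed and map the hash to a split. It is not filoma's algorithm. The `assign_split` and `split_paths` helpers are hypothetical, and the parent-directory grouping simply mirrors the comment in the example above.

```python
import hashlib
from pathlib import PurePosixPath


def assign_split(group: str, seed: int = 42, ratios=(70, 15, 15)) -> str:
    """Map a group key to 'train', 'val', or 'test' deterministically."""
    digest = hashlib.sha256(f"{seed}:{group}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % sum(ratios)
    if bucket < ratios[0]:
        return "train"
    if bucket < ratios[0] + ratios[1]:
        return "val"
    return "test"


def split_paths(paths, seed: int = 42, ratios=(70, 15, 15)) -> dict:
    """Split paths so that files sharing a parent directory stay together."""
    splits = {"train": [], "val": [], "test": []}
    for path in paths:
        parts = PurePosixPath(path).parts
        # parts[-2] is the parent directory name; it serves as the group key.
        group = parts[-2] if len(parts) >= 2 else ""
        splits[assign_split(group, seed, ratios)].append(path)
    return splits


paths = [f"data/subject_{i:02d}/img_{j}.png" for i in range(20) for j in range(3)]
splits = split_paths(paths)
print({name: len(items) for name, items in splits.items()})
```

Because the assignment depends only on the seed and the group key, every file from one directory lands in the same split and reruns give identical results. The realised proportions only approximate the requested ratios, especially when there are few groups.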

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contributing

Contributions welcome! Please check the issues for planned features and bug reports.


filoma - Fast, multi-backend file/directory profiling and data preparation for Python.
