Modular Python tool for profiling files, analyzing directory structures, and inspecting image data
Fast, multi-backend file/directory profiling and data preparation for machine learning workflows.
🚧 Filoma is under active development — new features are being added regularly, APIs may evolve, and I'm always looking for feedback! Think of it as your friendly neighborhood file analysis toolkit that's still learning new tricks. Contributions, bug reports, and feature requests are more than welcome! 🎉
filoma helps you analyze file directory trees, inspect file metadata, and prepare your data for exploration and modelling. It can achieve this blazingly fast using the best available backend (Rust, fd, or pure Python) ⚡🍃
Key Features
- 🖥️ Interactive CLI: Beautiful terminal interface for filesystem exploration and DataFrame analysis. See the CLI documentation.
- 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
- 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
- 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
- 📈 DataFrame Integration: Convert scan results to Polars (or pandas) DataFrames for powerful analysis.
- 🖼️ File/Image Profiling: Extract metadata and statistics from various file formats.
- 🔀 ML-Ready Splits: Create deterministic train/validation/test datasets with ease.
Scope of filoma
CLI Demo
Feature Highlights
Quick, copyable examples showing filoma's standout capabilities and where to learn more.
- Automatic multi-backend scanning: filoma picks the fastest available backend (Rust → fd → pure Python). You can also force a backend for reproducibility. See the backends docs: docs/backends.md.
import filoma as flm
# filoma will pick Rust > fd > Python depending on availability
analysis = flm.probe('.')
analysis.print_summary()
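The fallback order described above can be sketched in a few lines. This is a minimal illustration and not filoma's actual selection logic: `pick_backend`, `PRIORITY`, and the availability mapping are hypothetical names, and only the `fd` lookup is a real probe.

```python
import shutil

# Priority order taken from the text: Rust first, then fd, then pure Python.
PRIORITY = ("rust", "fd", "python")


def pick_backend(available, forced=None):
    """Return the first available backend, or the forced one if it is usable."""
    if forced is not None:
        if not available.get(forced, False):
            raise RuntimeError(f"backend {forced!r} is not available")
        return forced
    for name in PRIORITY:
        if available.get(name, False):
            return name
    raise RuntimeError("no backend available")


# Hypothetical availability probe; the "rust" entry is an assumption.
available = {
    "rust": False,                         # assume no compiled extension here
    "fd": shutil.which("fd") is not None,  # is an `fd` executable on PATH?
    "python": True,                        # the pure-Python path always works
}
print(pick_backend(available))
print(pick_backend(available, forced="python"))
```

Forcing a backend trades speed for reproducibility: the same code path runs on every machine, whatever happens to be installed.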
- Polars-first DataFrame wrapper & enrichment: Returns a filoma.DataFrame (Polars) with helpers to add path components, depth, and file stats for immediate analysis. Docs: docs/dataframe.md.
df = flm.probe_to_df('.', enrich=True) # returns a filoma.DataFrame
print(df.head())
- Ultra-fast discovery with fd: When fd is available, filoma uses it for very fast file discovery. Advanced usage and patterns: docs/advanced-usage.md.
if flm.fd.is_available():
    # Raw string: a single backslash escapes the dot, so this matches names ending in ".py"
    files = flm.fd.find(pattern=r"\.py$", path='src', max_depth=3)
    print(len(files), 'python files found')
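When `fd` is not installed, a similar search can be approximated with the standard library. The sketch below is an assumption-laden stand-in, not filoma's pure-Python backend: `find_files` is a hypothetical helper, and unlike `fd` it does not skip hidden files or honor `.gitignore`.

```python
import os
import re


def find_files(pattern, path=".", max_depth=None):
    """Return paths under `path` whose file name matches the regex `pattern`.

    Depth is counted like fd's --max-depth: direct children of `path` are depth 1.
    """
    regex = re.compile(pattern)
    root = os.path.abspath(path)
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        dir_depth = 0 if rel == "." else rel.count(os.sep) + 1
        # Files in this directory sit one level below the directory itself.
        if max_depth is not None and dir_depth + 1 > max_depth:
            dirnames[:] = []  # prune: do not descend any further
            continue
        matches.extend(
            os.path.join(dirpath, name) for name in filenames if regex.search(name)
        )
    return sorted(matches)


files = find_files(r"\.py$", path=".", max_depth=3)
print(len(files), "python files found")
```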
- ML-ready, deterministic splits: Group-aware, reproducible train/validation/test splitting to avoid leakage. See docs/ml.md for grouping options and examples.
df = flm.probe_to_df('.', enrich=False)
train, val, test = flm.ml.split_data(df, train_val_test=(70,15,15), seed=42)
- Lightweight, lazy top-level API: Importing filoma is cheap; heavy dependencies load only when used. Quickstart and one-line helpers: docs/quickstart.md.
info = flm.probe_file('README.md')
df = flm.probe_to_df('.')
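One common way to get the "import is cheap, heavy dependencies load on use" behavior is a small proxy that defers the real import. The sketch below shows that general pattern only; `LazyModule` is a hypothetical class, and filoma may implement lazy loading differently.

```python
import importlib


class LazyModule:
    """Proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing imported yet

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Creating the proxy is cheap; the import happens on first use.
stats = LazyModule("statistics")
print(stats._module is None)      # True: not loaded yet
print(stats.mean([1, 2, 3]))      # first attribute access triggers the import
print(stats._module is not None)  # True: now loaded
```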
Installation
Install filoma using uv or pip:
uv pip install filoma
pip install filoma
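After installing, one quick way to confirm that the package is visible to the current interpreter is to query its metadata. This is a generic check using the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if filoma is installed in this environment, else None.
print(installed_version("filoma"))
```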
Workflow Demo
This guide follows a typical filoma workflow, from basic file profiling to creating machine learning datasets.
1. Profile a Single File
Start by inspecting a single file. filoma provides a detailed dataclass with metadata.
import filoma as flm
# Profile a file
file_info = flm.probe_file("README.md")
print(f"Path: {file_info.path}")
print(f"Size: {file_info.size_str}")
print(f"Modified: {file_info.modified}")
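To make the idea of a metadata dataclass concrete, here is a standard-library sketch of how such a record could be built. It is not filoma's implementation: `FileInfo` and `stat_file` are hypothetical names, and only the attribute names `path`, `size_str`, and `modified` mirror the example above.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class FileInfo:
    path: str
    size: int           # size in bytes
    modified: datetime  # last modification time (local time)

    @property
    def size_str(self):
        """Human-readable size using binary (1024-based) units."""
        size = float(self.size)
        for unit in ("B", "KiB", "MiB", "GiB"):
            if size < 1024 or unit == "GiB":
                return f"{size:.1f} {unit}"
            size /= 1024


def stat_file(path):
    """Collect a few metadata fields for one file via os.stat."""
    p = Path(path)
    st = p.stat()
    return FileInfo(str(p), st.st_size, datetime.fromtimestamp(st.st_mtime))


# Usage on a throwaway file so the example does not depend on the project layout.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "README.md"
    target.write_bytes(b"x" * 2048)
    info = stat_file(target)
    print(f"Size: {info.size_str}")
    print(f"Modified: {info.modified}")
```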
For images, probe_image gives you additional details like shape and pixel statistics.
# Profile an image
img_info = flm.probe_image("images/logo.png")
print(f"Type: {img_info.file_type}")
print(f"Shape: {img_info.shape}")
2. Analyze a Directory
Scan an entire directory to get a high-level overview.
# Analyze the current directory
analysis = flm.probe('.')
# Print a summary report
analysis.print_summary()
Directory Analysis: /project (🦀 Rust (Parallel)) - 0.27s
Total Files: 17,330 Total Folders: 2,427 Analysis Time: 0.27 s
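The totals in a summary like this come from a single traversal of the tree. As a rough pure-Python sketch of that counting step (not filoma's Rust or fd code path; `summarize` is a hypothetical helper):

```python
import os
from collections import Counter


def summarize(root):
    """Walk `root` once and tally files, folders, and file extensions."""
    n_files = n_folders = 0
    extensions = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        n_folders += len(dirnames)
        n_files += len(filenames)
        for name in filenames:
            # Files without an extension are grouped under "<none>".
            ext = os.path.splitext(name)[1].lower() or "<none>"
            extensions[ext] += 1
    return {"files": n_files, "folders": n_folders, "extensions": extensions}


report = summarize(".")
print(f"Total Files: {report['files']:,} Total Folders: {report['folders']:,}")
print(report["extensions"].most_common(5))
```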
3. Convert to a DataFrame
For detailed analysis, convert the scan results into a Polars DataFrame.
# Scan a directory and get a DataFrame
df = flm.probe_to_df('.')
print(df.head())
4. Enrich Your Data
Add more context to your DataFrame, like file depth and path components, with the enrich() method.
# The DataFrame returned by flm.probe_to_df is a filoma.DataFrame
# with extra capabilities.
df_enriched = df.enrich()
print(df_enriched.head())
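Enrichment of this kind boils down to deriving extra columns from each path. The sketch below shows the idea on plain dictionaries; the column names and the `enrich_row` helper are assumptions for illustration, and the columns produced by filoma's enrich() may differ.

```python
from pathlib import PurePosixPath


def enrich_row(path):
    """Derive path components and depth from a relative POSIX-style path."""
    p = PurePosixPath(path)
    return {
        "path": str(p),
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        # Depth here means the number of directories above the file.
        "depth": len(p.parent.parts),
    }


for row in map(enrich_row, ["README.md", "data/cats/img1.png"]):
    print(row)
```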
5. Create ML-Ready Splits
filoma makes it easy to split your files into training, validation, and test sets for machine learning. You can even group files by parts of their path to prevent data leakage.
# Split the data, grouping by parent directory
train, val, test = flm.ml.split_data(df, how='parts', parts=(-2,), seed=42)
print(f"Train: {len(train)}, Validation: {len(val)}, Test: {len(test)}")
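To see why grouping prevents leakage, it helps to look at one simple way a deterministic, group-aware split can work: hash the group key (here, the parent directory name) together with the seed, and let the hash decide the split. This is a sketch of that general technique under stated assumptions, not the algorithm behind flm.ml.split_data; `assign_split` and its parameters are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def assign_split(path, ratios=(0.8, 0.1, 0.1), part=-2, seed=42):
    """Map a path to 'train', 'val', or 'test' based on one path component.

    Assumes the path has enough components for the index `part` to exist.
    """
    group = PurePosixPath(path).parts[part]  # -2 is the parent directory name
    digest = hashlib.sha256(f"{seed}:{group}".encode()).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < ratios[0]:
        return "train"
    if u < ratios[0] + ratios[1]:
        return "val"
    return "test"


paths = [
    "data/patient_01/scan_a.png",
    "data/patient_01/scan_b.png",
    "data/patient_02/scan_a.png",
]
for p in paths:
    print(p, "->", assign_split(p))
```

Because the assignment depends only on the group name and the seed, every file in the same directory lands in the same split, and reruns give identical results. The trade-off is that the ratios hold only approximately, especially when there are few groups.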
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contributing
Contributions welcome! Please check the issues for planned features and bug reports.
filoma - Fast, multi-backend file/directory profiling and data preparation for Python.