Modular Python tool for profiling files, analyzing directory structures, and inspecting image data
Fast, multi-backend file/directory profiling and data preparation.
pip install filoma
Installation • Documentation • Interactive CLI • Quickstart • Cookbook • Roboflow Dataset Demo • Source Code
📖 New to Filoma? Check out the Cookbook for practical, copy-paste recipes for common tasks!
filoma helps you analyze directory trees, inspect file metadata, and prepare your data for exploration. It automatically uses the fastest available backend (Rust, fd, or pure Python) ⚡🍃
Key Features
- 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
- 📈 DataFrame Integration: Convert scan results to Polars (or pandas) DataFrames for powerful analysis.
- 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
- 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
- 🖼️ File/Image Profiling: Extract metadata and statistics from various file formats.
- 🏗️ Architectural Clarity: High-level visual flows for discovery and processing. 📖 Architecture Documentation →
- 🖥️ Interactive CLI: Beautiful terminal interface for filesystem exploration and DataFrame analysis 📖 CLI Documentation →
⚡ Quick Start & Capabilities
filoma provides a unified API for all your filesystem analysis needs. Whether you're inspecting a single file or a million-file directory, it stays fast and intuitive.
1. Simple File & Image Profiling
Extract rich metadata and statistics from any file or image with a single call.
import filoma as flm
# Profile any file
info = flm.probe_file("README.md")
print(info)
📄 See Metadata Output
Filo(
path=PosixPath('README.md'),
size=12237,
mode_str='-rw-rw-r--',
owner='user',
modified=datetime.datetime(2025, 12, 30, 22, 45, 53),
is_file=True,
...
)
For images, probe_image automatically extracts shapes, types, and pixel statistics.
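To give a feel for where fields such as size, mode_str, and modified come from, here is a minimal standard-library sketch. It is not filoma's implementation: basic_file_info is a hypothetical helper name, and it covers only a subset of the fields shown in the metadata output above.

```python
import datetime
import stat
from pathlib import Path


def basic_file_info(path):
    """Collect a few stat-based fields for one path (illustrative helper only)."""
    p = Path(path)
    st = p.stat()
    return {
        "path": p,
        "size": st.st_size,                       # size in bytes
        "mode_str": stat.filemode(st.st_mode),    # e.g. '-rw-rw-r--'
        "modified": datetime.datetime.fromtimestamp(st.st_mtime),
        "is_file": p.is_file(),
    }
```

Calling basic_file_info("README.md") returns a plain dict, whereas flm.probe_file returns the richer Filo object shown above.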
2. Blazingly Fast Directory Analysis
Scan entire directory trees quickly: the sample run below covers about 60,000 files and folders in 0.60 s. filoma automatically picks the fastest available backend (Rust → fd → Python).
# Analyze a directory
analysis = flm.probe('.')
# Print a high-level summary
analysis.print_summary()
📂 See Directory Summary Table
Directory Analysis: /project (🦀 Rust (Parallel)) - 0.60s
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                   ┃ Value                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ Total Files              │ 57,225               │
│ Total Folders            │ 3,427                │
│ Total Size               │ 2,084.90 MB          │
│ Average Files per Folder │ 16.70                │
│ Maximum Depth            │ 14                   │
│ Empty Folders            │ 103                  │
│ Analysis Time            │ 0.60s                │
│ Processing Speed         │ 102,114 items/sec    │
└──────────────────────────┴──────────────────────┘
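The totals in a summary like the one above can be approximated with a single os.walk pass. The sketch below is a pure-Python illustration, not how filoma's Rust or fd backends work. summarize_tree is a hypothetical name, and two choices in it are assumptions: the root itself is excluded from the folder count, and depth is measured relative to the root.

```python
import os


def summarize_tree(root):
    """Walk `root` once and return summary totals (illustrative helper only)."""
    root = os.path.abspath(root)
    stats = {"files": 0, "folders": 0, "size_bytes": 0,
             "max_depth": 0, "empty_folders": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root:
            stats["folders"] += 1
            # Depth of a direct child of root is 1 (assumption).
            depth = os.path.relpath(dirpath, root).count(os.sep) + 1
            stats["max_depth"] = max(stats["max_depth"], depth)
        if not dirnames and not filenames:
            stats["empty_folders"] += 1
        for name in filenames:
            stats["files"] += 1
            try:
                stats["size_bytes"] += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip its size
    folders = stats["folders"]
    stats["avg_files_per_folder"] = stats["files"] / folders if folders else 0.0
    return stats
```

The average is files divided by folders, which agrees with the sample table (57,225 / 3,427 ≈ 16.70).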
# Or get a detailed report with extensions and folder stats
analysis.print_report()
📊 See Detailed Directory Report
File Extensions
┏━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┓
┃ Extension  ┃ Count  ┃ Percentage ┃
┡━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━┩
│ .py        │ 240    │ 12.8%      │
│ .jpg       │ 1,204  │ 64.2%      │
│ .json      │ 431    │ 23.0%      │
│ .svg       │ 28,674 │ 50.1%      │
└────────────┴────────┴────────────┘
Common Folder Names
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Folder Name   ┃ Occurrences ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ src           │ 1           │
│ tests         │ 1           │
│ docs          │ 1           │
│ notebooks     │ 1           │
└───────────────┴─────────────┘
Empty Folders (3 found)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Path                                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ /project/data/raw/empty_set_A              │
│ /project/logs/old/unused                   │
│ /project/temp/scratch                      │
└────────────────────────────────────────────┘
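The Rust → fd → Python preference described in this section can be pictured as a simple fallback chain. The sketch below only illustrates that idea and is not filoma's actual selection code: pick_backend is a hypothetical function, and filoma_rust_backend is a placeholder module name, not the real name of filoma's compiled extension.

```python
import importlib.util
import shutil


def pick_backend(rust_module="filoma_rust_backend", fd_binary="fd"):
    """Return 'rust', 'fd', or 'python', whichever is available first.

    Both default names are placeholders chosen for illustration.
    """
    # 1. Prefer a compiled extension module if it can be imported.
    if importlib.util.find_spec(rust_module) is not None:
        return "rust"
    # 2. Otherwise fall back to the fd command-line tool if it is on PATH.
    if shutil.which(fd_binary) is not None:
        return "fd"
    # 3. Pure Python is always available.
    return "python"
```

The point of ordering the checks this way is that the caller never has to name a backend: the most capable option that is actually installed wins, and the pure-Python path guarantees a result.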
3. DataFrames & Enrichment
Convert scan results to Polars DataFrames for advanced analysis. Use .enrich() to instantly add path components, file stats, and hierarchy data.
# Scan and get an enriched filoma.DataFrame (Polars)
df = flm.probe_to_df('src', enrich=True)
print(df.head(2))
📊 See Enriched DataFrame Output
filoma.DataFrame with 2 rows
shape: (2, 18)
┌───────────────────┬───────┬────────┬───────────────┬───┬─────────┬───────┬────────┬────────┐
│ path              ┆ depth ┆ parent ┆ name          ┆ … ┆ inode   ┆ nlink ┆ sha256 ┆ xattrs │
│ ---               ┆ ---   ┆ ---    ┆ ---           ┆   ┆ ---     ┆ ---   ┆ ---    ┆ ---    │
│ str               ┆ i64   ┆ str    ┆ str           ┆   ┆ i64     ┆ i64   ┆ str    ┆ str    │
╞═══════════════════╪═══════╪════════╪═══════════════╪═══╪═════════╪═══════╪════════╪════════╡
│ src/async_scan.rs ┆ 1     ┆ src    ┆ async_scan.rs ┆ … ┆ 7601121 ┆ 1     ┆ null   ┆ {}     │
│ src/filoma        ┆ 1     ┆ src    ┆ filoma        ┆ … ┆ 7603126 ┆ 8     ┆ null   ┆ {}     │
└───────────────────┴───────┴────────┴───────────────┴───┴─────────┴───────┴────────┴────────┘
✨ Enriched columns added: parent, name, stem, suffix, size_bytes, modified_time,
created_time, is_file, is_dir, owner, group, mode_str, inode, nlink, sha256, xattrs, depth
- Seamless Pandas Integration: Just use df.pandas for instant conversion.
- Lazy Loading: import filoma is cheap; heavy dependencies load only when needed.
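Several of the enriched columns (parent, name, stem, suffix, depth) are pure functions of the path string. A minimal sketch of that part of enrichment, using only pathlib, might look like the following. enrich_path is a hypothetical helper, and the depth rule (number of components below the first one) is an assumption that happens to match the sample rows above; filoma's own definition may differ.

```python
from pathlib import PurePosixPath


def enrich_path(path):
    """Derive path-component columns for one path string (illustrative only)."""
    p = PurePosixPath(path)
    return {
        "path": str(p),
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        "depth": len(p.parts) - 1,  # assumed definition, see lead-in
    }
```

For "src/async_scan.rs" this yields parent "src", name "async_scan.rs", and depth 1. Columns such as size_bytes, owner, or inode need a stat call on the real file and are not covered here.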
4. Specialized DataFrame Operations
Filoma's DataFrame extends Polars with specialized filesystem operations, providing quick ways to filter and summarize your data.
# Filter by extensions
df.filter_by_extension([".py", ".rs"])
# Quick frequency analysis (counts)
df.extension_counts()
df.directory_counts()
🔍 See Operation Examples
filter_by_extension([".py", ".rs"])
shape: (3, 1)
┌─────────────────────┐
│ path                │
│ ---                 │
│ str                 │
╞═════════════════════╡
│ src/async_scan.rs   │
│ src/lib.rs          │
│ src/filoma/dedup.py │
└─────────────────────┘
extension_counts()
Groups files by extension and returns counts.
shape: (3, 2)
┌────────────┬─────┐
│ extension  ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ .py        ┆ 240 │
│ .jpg       ┆ 124 │
│ .json      ┆ 43  │
└────────────┴─────┘
directory_counts()
Summarizes file distribution across parent directories.
shape: (3, 2)
┌────────────┬─────┐
│ parent_dir ┆ len │
│ ---        ┆ --- │
│ str        ┆ u32 │
╞════════════╪═════╡
│ src/filoma ┆ 12  │
│ tests      ┆ 8   │
│ docs       ┆ 5   │
└────────────┴─────┘
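Conceptually, these three operations are a suffix filter and two group-by counts. The standard-library sketch below shows the same logic on a plain list of path strings. The function names (keep_extensions, count_by_extension, count_by_parent) are hypothetical stand-ins, not filoma methods, and the case-insensitive suffix match is an assumption.

```python
from collections import Counter
from pathlib import PurePosixPath


def keep_extensions(paths, extensions):
    """Keep only paths whose suffix is in `extensions` (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]


def count_by_extension(paths):
    """Count how many paths share each suffix."""
    return Counter(PurePosixPath(p).suffix for p in paths)


def count_by_parent(paths):
    """Count how many paths sit directly under each parent directory."""
    return Counter(str(PurePosixPath(p).parent) for p in paths)
```

On a real scan, the DataFrame methods shown above do this work inside Polars, which is the better choice once the listing grows beyond a few thousand rows.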
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contributing
Contributions welcome! Please check the issues for planned features and bug reports.