Modular Python tool for profiling files, analyzing directory structures, and inspecting image data

Project description

Fast, multi-backend file/directory profiling and data preparation for machine learning workflows.


filoma helps you analyze directory trees, inspect file metadata, and prepare your data for exploration and modelling. It automatically picks the fastest available backend (Rust, fd, or pure Python) ⚡🍃


Key Features

  • 🚀 High-Performance Backends: Automatic selection of Rust, fd, or Python for the best performance.
  • 📊 Rich Directory Analysis: Get detailed statistics on file counts, extensions, sizes, and more.
  • 🔍 Smart File Search: Use regex and glob patterns to find files with FdFinder.
  • 📈 DataFrame Integration: Convert scan results to Polars DataFrames for powerful analysis.
  • 🖼️ Image Profiling: Extract metadata and statistics from various image formats.
  • 🔀 ML-Ready Splits: Create deterministic train/validation/test datasets with ease.
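
As a rough illustration of what "automatic selection" of a backend can look like, the sketch below tries a compiled extension first, then the fd command-line tool, then falls back to pure Python. It is not filoma's actual selection logic: the function pick_backend and the module name hypothetical_rust_scanner are made up for this example.

```python
import importlib.util
import shutil


def pick_backend(rust_module: str = "hypothetical_rust_scanner") -> str:
    """Return the name of the fastest backend that appears to be usable."""
    # 1. A compiled extension module, if importable (the default name is made up).
    if importlib.util.find_spec(rust_module) is not None:
        return "rust"
    # 2. The fd command-line tool, if on PATH (some distros install it as fdfind).
    if shutil.which("fd") or shutil.which("fdfind"):
        return "fd"
    # 3. Pure Python is always available.
    return "python"


backend = pick_backend()
print(f"Selected backend: {backend}")
```

The order encodes a preference from fastest to slowest, so the caller never has to know which tools are installed.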

Installation

Install filoma using uv or pip:

# with uv
uv pip install filoma

# or with pip
pip install filoma

Workflow Demo

This guide follows a typical filoma workflow, from basic file profiling to creating machine learning datasets.

1. Profile a Single File

Start by inspecting a single file. probe_file returns a dataclass holding the file's metadata.

from filoma import probe_file

# Profile a file
file_info = probe_file("README.md")

print(f"Path: {file_info.path}")
print(f"Size: {file_info.size_str}")
print(f"Modified: {file_info.modified}")
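
To make the kind of metadata involved more concrete, here is a minimal standard-library sketch that collects similar fields. SimpleFileInfo and simple_probe_file are hypothetical stand-ins written for this example. They are not filoma's classes, and the real dataclass likely carries more fields.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class SimpleFileInfo:  # hypothetical stand-in, not filoma's dataclass
    path: str
    size: int            # size in bytes
    modified: datetime   # last modification time

    @property
    def size_str(self) -> str:
        """Human-readable size using 1024-based units."""
        size = float(self.size)
        for unit in ("B", "KB", "MB", "GB"):
            if size < 1024:
                return f"{size:.1f} {unit}"
            size /= 1024
        return f"{size:.1f} TB"


def simple_probe_file(path) -> SimpleFileInfo:
    """Collect basic metadata for one file via a single stat() call."""
    p = Path(path)
    st = p.stat()
    return SimpleFileInfo(
        path=str(p.resolve()),
        size=st.st_size,
        modified=datetime.fromtimestamp(st.st_mtime),
    )


# Demo on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_text("hello filoma")
    info = simple_probe_file(demo)
    print(info.path, info.size_str, info.modified)
```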

For images, probe_image gives you additional details like shape and pixel statistics.

from filoma import probe_image

# Profile an image
img_info = probe_image("images/logo.png")
print(f"Type: {img_info.file_type}")
print(f"Shape: {img_info.shape}")

2. Analyze a Directory

Scan an entire directory to get a high-level overview.

from filoma import probe

# Analyze the current directory
analysis = probe('.')

# Print a summary report
analysis.print_summary()

Example output:

Directory Analysis: /project (🦀 Rust (Parallel)) - 0.27s
Total Files: 17,330    Total Folders: 2,427    Analysis Time: 0.27 s
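
For intuition about what such a summary aggregates, the following pure-Python sketch walks a tree with os.walk and counts files, folders, extensions, and bytes. It is an assumption-laden illustration of the idea, not filoma's implementation, and summarize_tree is a name invented here.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def summarize_tree(root) -> dict:
    """Count files, folders, extensions, and total bytes under root."""
    files = folders = total_bytes = 0
    extensions = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            files += 1
            # Normalize case so ".PNG" and ".png" are counted together.
            extensions[Path(name).suffix.lower() or "<none>"] += 1
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or is unreadable; skip its size
    return {
        "files": files,
        "folders": folders,
        "bytes": total_bytes,
        "extensions": dict(extensions),
    }


# Demo on a small temporary tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "b").mkdir()
    (root / "a" / "x.txt").write_text("one")
    (root / "a" / "y.txt").write_text("two")
    (root / "b" / "z.PNG").write_bytes(b"\x89PNG")
    (root / "README").write_text("hi")
    summary = summarize_tree(root)
    print(summary)
```

A single-threaded walk like this is the slow baseline that compiled or parallel backends aim to beat.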

3. Convert to a DataFrame

For detailed analysis, convert the scan results into a Polars DataFrame.

from filoma import probe_to_df

# Scan a directory and get a DataFrame
df = probe_to_df('.')

print(df.head())

4. Enrich Your Data

Add more context to your DataFrame, like file depth and path components, with the enrich() method.

# The DataFrame returned by probe_to_df is a filoma.DataFrame
# with extra capabilities.
df_enriched = df.enrich()

print(df_enriched.head())
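
The sketch below shows, with the standard library only, how depth and path components can be derived from a path string. The column names and the enrich_row helper are assumptions for illustration. The columns that enrich() actually adds may differ.

```python
from pathlib import PurePosixPath


def enrich_row(path: str) -> dict:
    """Derive depth and path components from a POSIX-style relative path."""
    p = PurePosixPath(path)
    return {
        "path": path,
        "depth": len(p.parts) - 1,  # number of directories above the file
        "parent": p.parent.name,    # immediate parent folder ("" at top level)
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
    }


rows = [enrich_row(p) for p in ["data/cats/001.jpg", "data/dogs/002.jpg", "README.md"]]
for row in rows:
    print(row)

# A list of dicts like this can be passed to polars.DataFrame(rows)
# if Polars is installed.
```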

5. Create ML-Ready Splits

filoma can split your files into training, validation, and test sets for machine learning. Grouping files by parts of their path keeps related files (for example, everything in one parent directory) in the same split, which helps prevent data leakage.

from filoma import ml

# Split the data, grouping by parent directory
train, val, test = ml.split_data(df, how='parts', parts=(-2,), seed=42)

print(f"Train: {len(train)}, Validation: {len(val)}, Test: {len(test)}")
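
One common way to get deterministic, leakage-free splits is to hash a group key taken from the path and map the hash to a split. The sketch below illustrates that technique under stated assumptions: assign_split is a hypothetical helper, the 80/10/10 ratios are chosen for the example, and nothing here is claimed to match how ml.split_data works internally.

```python
import hashlib
from pathlib import PurePosixPath


def assign_split(path, parts=(-2,), seed=42, ratios=(0.8, 0.1, 0.1)) -> str:
    """Map a file to train/val/test based on a hash of selected path parts."""
    components = PurePosixPath(path).parts
    # Group key: the chosen path components, e.g. (-2,) is the parent directory.
    key = "/".join(components[i] for i in parts)
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    if fraction < ratios[0]:
        return "train"
    if fraction < ratios[0] + ratios[1]:
        return "val"
    return "test"


paths = [
    "data/patient_01/scan_a.png",
    "data/patient_01/scan_b.png",
    "data/patient_02/scan_a.png",
    "data/patient_03/scan_a.png",
]
splits = {p: assign_split(p) for p in paths}
for p, s in splits.items():
    print(f"{s:5s} {p}")
```

Because the split depends only on the seed and the group key, files from the same parent directory always land together and reruns give identical results. The realized proportions only approximate the requested ratios, especially with few groups.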

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contributing

Contributions welcome! Please check the issues for planned features and bug reports.


filoma - Fast, multi-backend file/directory profiling and data preparation for Python.
