
Batch image directory processing pipelines with reusable steps

flowimds


flowimds delivers reusable image-processing pipelines for entire directories: compose the steps once and let the tool handle the batch work for you.

Japanese version

✨ Highlights

  • ♻️ Batch processing at scale — Traverse entire directories with optional recursive scanning.
  • 🗂️ Structure-aware outputs — Mirror the input folder layout when preserving directory structures.
  • 🧩 Rich step library — Combine resizing, grayscale conversion, rotations, flips, binarisation, denoising, and custom steps.
  • 🔄 Flexible execution modes — Operate on folders, explicit file lists, or in-memory NumPy arrays.
  • 🧪 Deterministic fixtures — Recreate test data whenever needed for reproducible pipelines.
  • 🤖 Expanding step roadmap — More transformations, including AI-assisted steps, are planned.
  • 📁 Flattened outputs available — Optionally disable structure preservation to write everything into a single directory.
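The structure-preserving output mode corresponds to a standard relative-path mapping. A minimal stand-alone sketch of the idea (plain pathlib, illustrative only, not flowimds code):

```python
from pathlib import Path

def mirrored_output(input_root: Path, output_root: Path, src: Path) -> Path:
    """Map an input file to the same relative location under output_root."""
    return output_root / src.relative_to(input_root)

out = mirrored_output(Path("samples/input"), Path("samples/output"),
                      Path("samples/input/cats/a.png"))
print(out.as_posix())  # -> samples/output/cats/a.png
```

Disabling structure preservation simply drops the relative part and writes every file directly under the output root.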

🚀 Quick start

All primary classes are re-exported from the package root, so pipelines can be described through a concise namespace:

# Import the flowimds package
import flowimds as fi

# Define the pipeline
# Args:
#   steps: sequence of pipeline steps
#   worker_count: number of parallel workers (default: ~70% of CPU cores)
#   log: whether to show progress bar (default: False)
pipeline = fi.Pipeline(
    steps=[
        fi.ResizeStep((128, 128)),
        fi.GrayscaleStep(),
    ],
)

# Run the pipeline
# Args:
#   input_path: directory to scan for images
#   recursive: whether to traverse subdirectories (default: False)
result = pipeline.run(input_path="samples/input", recursive=True)

# Save the results
# Args:
#   output_path: destination directory
#   preserve_structure: whether to mirror the input tree (default: False)
result.save("samples/output", preserve_structure=True)

# Inspect the result
# Fields:
#   processed_count: number of successfully processed images
#   failed_count: number of images that failed to process
#   failed_files: paths of the images that failed
print(f"Processed {result.processed_count} images")
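Conceptually, a step pipeline of this shape reduces to applying each step in order. A stand-alone sketch of the composition pattern, using plain Python with strings standing in for images (this illustrates the idea only, not flowimds internals):

```python
from dataclasses import dataclass, field
from typing import Callable, List

Step = Callable[[str], str]  # stand-in: a real step maps image -> image

@dataclass
class MiniPipeline:
    """Toy illustration of step composition; not flowimds's implementation."""
    steps: List[Step] = field(default_factory=list)

    def run_one(self, image: str) -> str:
        for step in self.steps:  # apply steps left to right
            image = step(image)
        return image

pipe = MiniPipeline(steps=[str.strip, str.upper])
print(pipe.run_one("  cat "))  # -> CAT
```

Each flowimds step behaves analogously: the output of one step feeds the next, which is why step order matters (e.g., resizing before or after binarisation produces different results).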

📦 Installation

  • Python 3.12+
  • uv (recommended) or pip for dependency management

uv

uv add flowimds

pip

pip install flowimds

From source

git clone https://github.com/mori-318/flowimds.git
cd flowimds
uv sync

📚 Documentation

🔬 Benchmarks

Compare the legacy (v0.2.1 and earlier) and current (v1.0.2 and later) pipeline implementations with the bundled helper script. Running it via uv keeps dependencies and the virtual environment consistent:

uv run python scripts/benchmark_pipeline.py --count 5000 --workers 8

  • --count: number of synthetic images to generate (default 5000).
  • --workers: maximum worker threads (0 auto-detects CPU cores).
  • --seed: random seed (default 42) for reproducible comparisons.

The script prints processing times for each pipeline variant and cleans up temporary outputs afterward.
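At its core, such a comparison is plain wall-clock timing over a synthetic workload. An illustrative sketch of the pattern (the real logic lives in scripts/benchmark_pipeline.py; the names below are hypothetical):

```python
import time

def time_workload(process, items, repeats=3):
    """Best-of-N wall-clock time (seconds) for running `process` over `items`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in items:
            process(item)
        best = min(best, time.perf_counter() - start)
    return best

# Synthetic stand-ins for the generated images:
fake_images = [bytes(256) for _ in range(1000)]
print(f"best of {3}: {time_workload(len, fake_images):.6f}s")
```

Taking the best of several repeats reduces noise from caches and background load, which matters when comparing two implementations of the same pipeline.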

🆘 Support

Questions and bug reports are welcome via the GitHub issue tracker.

🤝 Contributing

We follow a GitFlow-based workflow to keep the library stable while enabling parallel development:

  • main — release-ready code (tagged as vX.Y.Z).
  • develop — staging area for the next release.
  • feature/ — focused branches for scoped work.
  • release/ — branches dedicated to preparing releases.
  • hotfix/ — branches for urgent fixes.
  • docs/ — branches for documentation updates.

For contribution flow details, see docs/CONTRIBUTING.md or the Japanese guide docs/CONTRIBUTING_ja.md.

🛠️ Development

# Install dependencies
uv sync --all-extras --dev

# Lint and format (apply fixes when needed)
uv run black .
uv run ruff format .

# Lint and format (verify)
uv run black --check .
uv run ruff check .
uv run ruff format --check .

# Regenerate deterministic fixtures when needed
uv run python scripts/generate_test_data.py

# Run tests
uv run pytest

Docker-powered environment

You can standardize the development environment inside containers built from docker/Dockerfile. Dependencies are installed with uv sync --all-extras --dev during build, so any uv command (e.g., uv run pytest) is reproducible.

Two typical workflows exist:

  1. Run the suite once in a disposable container (container exits when tests finish):

    docker compose -f docker/docker-compose.yml up --build
    
  2. Open an interactive shell for iterative work (recommended while developing):

    # Build the image (no-op if cached)
    docker compose -f docker/docker-compose.yml build
    
    # Start an interactive container with a clean shell
    docker compose -f docker/docker-compose.yml run --rm app bash
    
    # Inside the container (already at /app)
    uv sync --all-extras --dev   # install deps into the mounted .venv
    uv run pytest
    uv run black --check .
    

docker compose exec app ... works only while a container started with up is still running. Because the default command runs uv run pytest and exits immediately, use run --rm app bash whenever you need an interactive session.

Dev Container

A VS Code Dev Container configuration is provided under .devcontainer/. If you use the Dev Containers extension, you can open this repository in a container and work inside a reproducible Docker-based development environment.

Using with VS Code

  1. Install and start Docker.

  2. Install the "Dev Containers" extension in VS Code (if you do not have it yet).

  3. Open this repository in VS Code and run "Dev Containers: Reopen in Container" from the command palette.

  4. Inside the container, install dependencies and run the usual development commands:

    uv sync --all-extras --dev
    uv run pytest
    uv run black --check .
    

📄 License

This project is released under the MIT License.

📌 Project status

Stable releases are published on PyPI (v1.0.2 at the time of writing), and development is ongoing. Watch the repository for new tags and changelog announcements.
