Image Filename AI

AI-powered image filename generator using Google Gemini - transform generic image files into descriptive, SEO-friendly names.
Overview
This application uses AI (Gemini) to automatically rename image files based on their content and generate descriptive alt text. It supports both flat and nested folder structures, making it perfect for organizing project-based image collections.
Features
- AI-powered image analysis: Uses Google's Gemini model to understand image content
- Intelligent filename generation: Creates descriptive, SEO-friendly filenames
- Alt text generation: Generates accessible alt text for images
- Nested folder support: Preserves directory structure for project-based organization
- Image processing: Resize and reformat images during processing
- Multiple logging modes: Flexible logging options for different use cases
- Language support: Generate filenames and alt text in multiple languages
Requirements
- Python: 3.11+ (tested on 3.11, 3.12, 3.13)
- Google Cloud Platform: Project with Vertex AI enabled
- Service Account: With required permissions (see Authentication section)
Installation
Option 1: Install from PyPI (Recommended)
# Install the core CLI tool
pip install image-filename-ai
# Or install with API dependencies
pip install "image-filename-ai[api]"
# Or install with development dependencies
pip install "image-filename-ai[dev]"
Option 2: Local Development
- Clone the repository:
git clone https://github.com/matija2209/image-filename-ai.git
cd image-filename-ai
- Create virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install in development mode:
pip install -e ".[dev,api]"
- Set up credentials (see Authentication section below)
Option 3: Docker (Recommended for API)
- Clone the repository
- Copy .env.example to .env and configure
- Run with Docker Compose:
docker-compose up --build
Authentication & Credentials
Choose one of the following methods:
Method 1: Environment Variable (Recommended)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"
Method 2: Place credentials in repo root
Place your serviceAccountKey.json file in the project root directory (automatically gitignored).
For Docker Usage
Uncomment the volume mount in compose.yml:
volumes:
- ./serviceAccountKey.json:/app/credentials/credentials.json:ro
Required GCP Permissions
Your service account needs:
- aiplatform.endpoints.predict (Vertex AI predictions)
- storage.objects.get (read images from GCS)
- storage.objects.create (create processed images)
- firestore.documents.read/write (if using job tracking)
Usage
CLI Usage (Local Processing)
For a full, step-by-step CLI tutorial, see: CLI_GUIDE.md
For minimal GCP setup steps, see: GCP_SETUP.md
Basic command:
python cli.py --input-dir input --output-dir output --lang en
With custom settings:
python cli.py \
--input-dir ./images \
--output-dir ./processed \
--lang de \
--log-mode nested \
  --max-width 1920 \
--format webp
API Usage (Docker/Server)
Start the API server:
# Using Docker Compose (recommended)
docker-compose up
# Or locally
uvicorn app.main:app --host 0.0.0.0 --port 8000
Access the API:
- Interactive docs: http://localhost:8000/docs
- API endpoint: http://localhost:8000/api/v1/process
- Health check: http://localhost:8000/
⚠️ Note: The API is currently unauthenticated - suitable for development only.
Environment Configuration
Copy .env.example to .env and adjust:
cp .env.example .env
# Edit .env with your settings
Key environment variables:
# Core GCP settings (used by both CLI and API)
PROJECT_ID=your-gcp-project-id
LOCATION=us-central1
MODEL_NAME=gemini-2.0-flash-exp
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
# CLI-specific settings (optional)
MAX_RETRIES=5 # Number of retry attempts
BASE_RETRY_DELAY=10 # Base delay between retries (seconds)
MAX_RETRY_DELAY=300 # Maximum delay cap (seconds)
RATE_LIMIT_DELAY=60 # Extra delay for rate limit errors
# Docker settings
COMPOSE_PORT_API=8000 # Port mapping for Docker Compose
Note: The CLI automatically loads a .env file from the project root if present.
Advanced Options
python cli.py \
--input-dir input/laneks \
--output-dir output/laneks \
--lang sl \
--format webp \
--max-width 1920 \
--log-mode project_level
Arguments
- --input-dir: Directory containing input images (default: "input")
- --output-dir: Base directory for processed images and logs (default: "output")
- --lang: Target language code (e.g., 'en', 'sl', 'de') (default: "en")
- --format: Output image format - jpg, png, webp, avif (default: original format)
- --max-width: Maximum width in pixels for output images (default: original size)
- --log-mode: Logging mode for results (default: "per_folder")
Logging Modes
The application supports four logging modes to suit different organizational needs:
per_folder (Default)
Creates results.json and results.csv files in each folder where images are processed.
output/
├── project1/
│   ├── results.json
│   ├── results.csv
│   └── renamed-images...
└── project2/
    ├── results.json
    ├── results.csv
    └── renamed-images...
project_level
Creates one log file per top-level project folder.
output/
├── project1/
│   ├── results.json
│   ├── results.csv
│   ├── subfolder1/renamed-images...
│   └── subfolder2/renamed-images...
└── project2/
    ├── results.json
    ├── results.csv
    └── renamed-images...
central
Creates a single log file in the main output directory.
output/
├── results.json
├── results.csv
├── project1/renamed-images...
└── project2/renamed-images...
flat
Flattens the output structure - all processed images go directly to the main output directory with a single central log file. Perfect for processing deeply nested input folders when you want a simple flat output structure.
output/
├── results.json
├── results.csv
├── descriptive-name-1.webp
├── descriptive-name-2.webp
├── descriptive-name-3.webp
└── descriptive-name-4.webp
Note: In flat mode, filename conflicts are automatically resolved by adding a counter suffix (e.g., name-1.webp, name-2.webp).
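The counter-suffix resolution described above can be sketched in a few lines of Python (an illustrative sketch; `resolve_conflict` is a hypothetical helper name, not the project's actual code):

```python
from pathlib import Path


def resolve_conflict(directory: Path, stem: str, suffix: str) -> str:
    """Return a filename that does not collide with files already in
    `directory`: try `stem.webp` first, then `stem-1.webp`,
    `stem-2.webp`, and so on."""
    candidate = f"{stem}{suffix}"
    counter = 1
    while (directory / candidate).exists():
        candidate = f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```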
Nested Folder Support
The application automatically preserves your input directory structure in the output:
Input Structure:
input/
├── laneks/
│   ├── projekt1/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── projekt2/
│       └── image3.jpg
└── other-client/
    └── flat-images/
        └── image4.jpg
Output Structure:
output/
├── laneks/
│   ├── projekt1/
│   │   ├── descriptive-name-1.webp
│   │   └── descriptive-name-2.webp
│   └── projekt2/
│       └── descriptive-name-3.webp
└── other-client/
    └── flat-images/
        └── descriptive-name-4.webp
This makes it perfect for:
- Project-based workflows: Each client/project maintains its own folder structure
- Mixed structures: Support both flat folders and deeply nested hierarchies
- Team collaboration: Preserve organizational structure that teams are familiar with
Authentication
Set up Google Cloud authentication by placing your service account key file as serviceAccountKey.json in the project root, or use other Google Cloud authentication methods.
API Documentation
For web API usage, see API_DOCUMENTATION.md.
Examples
See EXAMPLE_DATA.json for sample API responses and data structures.
FastAPI Application
A FastAPI application for processing images stored in Google Cloud Storage.
Features (FastAPI)
- Process images from Google Cloud Storage
- Generate descriptive, SEO-friendly filenames
- Create alt text for accessibility and SEO
- Support for multiple languages
- REST API for easy integration
Docker Setup
- Build and start the container:
  docker compose build
  docker compose up -d
- Alternatively, export the credentials path before starting (Compose picks it up from the environment):
  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"
  docker compose up
- To run with specific environment variables:
  docker compose run -e GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json" api
Requirements
- Python 3.9+
- Google Cloud Project with Vertex AI API enabled
- Google Cloud credentials configured
Setup
- Clone the repository
- Install dependencies:
  pip install -r requirements.txt
- Configure the application (optional): create a .env file in the project root with:
  PROJECT_ID=your-gcp-project-id
  LOCATION=us-central1
  MODEL_NAME=gemini-2.0-flash-exp
Usage
- Start the server:
  python run.py
- Access the API documentation at http://localhost:8000/docs
- Make API requests:
  curl -X POST http://localhost:8000/api/v1/process \
    -H "Content-Type: application/json" \
    -d '{"gcs_input_path": "gs://your-bucket/images", "language_code": "en"}'
API Endpoints
- GET / - Health check endpoint
- POST /api/v1/process - Process images from GCS bucket
Configuration (FastAPI)
The application can be configured using environment variables or a .env file:
- PROJECT_ID - Google Cloud project ID
- LOCATION - Google Cloud region
- MODEL_NAME - Gemini model to use
- HOST - Server host (default: 0.0.0.0)
- PORT - Server port (default: 8000)
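A minimal sketch of reading these settings with the documented defaults (`load_config` is a hypothetical helper; the application may well use a settings library instead):

```python
import os


def load_config() -> dict:
    """Read the variables listed above from the environment,
    falling back to the documented defaults where one exists."""
    return {
        "project_id": os.environ.get("PROJECT_ID"),
        "location": os.environ.get("LOCATION", "us-central1"),
        "model_name": os.environ.get("MODEL_NAME", "gemini-2.0-flash-exp"),
        "host": os.environ.get("HOST", "0.0.0.0"),
        "port": int(os.environ.get("PORT", "8000")),
    }
```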
Command-Line Interface (CLI)
A CLI script (cli.py) for processing local image files.
Features (CLI)
- Process images recursively from a local input directory.
- Generate descriptive, SEO-friendly filenames using Vertex AI Gemini.
- Create alt text for accessibility and SEO using Vertex AI Gemini.
- Support for multiple languages for filenames and alt text.
- Optionally convert images to different formats (JPG, PNG, WEBP, AVIF).
- Optionally resize images to a maximum width, preserving aspect ratio.
- Mirrors the input directory structure in the output directory.
- Logs processing results to JSON and CSV files within each output subdirectory.
Requirements (CLI)
- Python 3.9+
- Google Cloud Project with Vertex AI API enabled
- Google Cloud credentials configured (e.g., via gcloud auth application-default login)
- Dependencies installed: pip install -r requirements.txt (ensure Pillow is included for image processing)
Usage (CLI)
Run the script from the project root directory.
python cli.py --input-dir <path/to/input> --output-dir <path/to/output> [options]
Arguments:
- --input-dir: Path to the directory containing input images (default: input).
- --output-dir: Path to the base directory for processed images and logs (default: output). The script will maintain the subdirectory structure from the input directory.
- --lang: Target language code for filename/alt text (e.g., 'en', 'sl', 'de') (default: en).
- --format: Optional output image format ('jpg', 'png', 'webp', 'avif'). If omitted, the original format is kept.
- --max-width: Optional maximum width in pixels for output images. Aspect ratio is preserved. If omitted, the original size is kept.
Examples:

- Basic usage (English, keep original format/size):
  python cli.py --input-dir path/to/your/images --output-dir processed/images
- Process images, translate to German, resize to 800px max width:
  python cli.py --input-dir images_raw --output-dir images_processed --lang de --max-width 800
- Process images, convert to WEBP format:
  python cli.py --input-dir photos --output-dir web_ready --format webp
- Process a specific subfolder, convert to AVIF (see Known Issues), max width 900px:
  python cli.py --input-dir input/specific_folder --output-dir output --format avif --max-width 900
Known Issues
- AVIF Conversion: There is a known issue when using the --format avif option with the CLI tool (cli.py). The underlying Pillow library might raise an error (Error processing image: 'AVIF') during the save operation, causing images to be skipped. This may be related to specific image modes (e.g., RGBA) or Pillow's AVIF encoder capabilities.
- Troubleshooting (macOS): AVIF support in Pillow often depends on the libavif system library. If you encounter errors with AVIF:
  1. Install the library using Homebrew: brew install libavif
  2. Reinstall Pillow from source within your virtual environment so it detects libavif: pip install --force-reinstall --no-cache-dir --no-binary Pillow Pillow
- Using other formats like JPG, PNG, or WEBP is recommended if AVIF conversion fails or the troubleshooting steps are not feasible.
License
MIT
Practical Examples
Example 1: Process a single project folder
# Process images from a specific project, resize to max 1920px width, convert to WebP
python cli.py \
--input-dir input/laneks/projekt2 \
--output-dir output/laneks/projekt2 \
--lang en \
--format webp \
--max-width 1920 \
--log-mode per_folder
Example 2: Process all projects for a client with project-level logs
# Process all projects for the 'laneks' client, create one log per project
python cli.py \
--input-dir input/laneks \
--output-dir output/laneks \
--lang sl \
--format webp \
--max-width 1920 \
--log-mode project_level
Example 3: Batch process multiple clients with central logging
# Process everything with a single centralized log file
python cli.py \
--input-dir input \
--output-dir output \
--lang en \
--format avif \
--max-width 1600 \
--log-mode central
Example 4: Keep original format but resize
# Just resize images without changing format
python cli.py \
--input-dir input/large-images \
--output-dir output/resized \
--max-width 800 \
--log-mode per_folder
Example 5: Flatten deeply nested structure
# Process deeply nested folders but output everything to a flat structure
python cli.py \
--input-dir input/complex-nested-structure \
--output-dir output/flattened \
--lang en \
--format webp \
--max-width 1600 \
--log-mode flat
Common Use Cases
Photography Studios
- Input: Client folders with project subfolders
- Settings: --log-mode project_level --format webp --max-width 2048
- Result: Each project gets its own log, images optimized for web
E-commerce
- Input: Product category folders
- Settings: --log-mode central --format webp --max-width 1200
- Result: All products processed with central tracking
Web Development
- Input: Mixed folder structures
- Settings: --format avif --max-width 1920 --log-mode per_folder
- Result: Modern format with excellent compression, detailed logs
Digital Asset Management
- Input: Complex nested folder structures from various sources
- Settings: --log-mode flat --format webp --max-width 1600
- Result: All assets in one flat directory with descriptive names, single tracking log
# Simple renaming in English
python cli.py --input-dir input/photos --output-dir output/renamed --lang en
# German language with WebP conversion and resizing
python cli.py --input-dir input/photos --output-dir output/optimized \
--lang de --format webp --max-width 1024
# Project-level logging for organized results
python cli.py --input-dir input/company-photos --output-dir output/processed \
--lang en --log-mode project_level
Command Line Options

| Option | Description | Default |
|---|---|---|
| --input-dir | Directory containing input images | input |
| --output-dir | Base directory for processed images | output |
| --lang | Target language code (en, de, sl, fr, etc.) | en |
| --format | Output format (jpg, png, webp, avif) | Original |
| --max-width | Maximum width in pixels | Original |
| --log-mode | Logging mode (central, project_level, per_folder, flat) | per_folder |
| --max-retries | Maximum retry attempts for API calls | 5 |
Logging Modes
per_folder (Default)
Creates results.json and results.csv in each output subdirectory.
project_level
Creates one log file per top-level project folder.
central
Single log file in the main output directory.
flat
Flattens directory structure with central logging.
Resume Functionality
The tool automatically resumes interrupted processing:
- Scans existing logs: Checks all results.json files in the output directory
- Identifies processed files: Uses the original_filename field for tracking
- Skips completed work: Only processes new or failed images
- Handles rate limits: Exponential backoff with up to 5 retry attempts
Example resume scenario:
# First run - processes 20 files, hits rate limit
python cli.py --input-dir photos --output-dir output --lang de
# Resume run - skips 20 completed files, continues with remaining
python cli.py --input-dir photos --output-dir output --lang de
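The scan step can be pictured as follows, assuming each results.json holds a list of entries with an original_filename field as shown in the Log Entry example (a sketch, not the tool's actual code; `already_processed` is a hypothetical name):

```python
import json
from pathlib import Path


def already_processed(output_dir: Path) -> set:
    """Collect every original_filename recorded in any results.json
    under output_dir, so a re-run can skip those images."""
    done = set()
    for log_path in output_dir.rglob("results.json"):
        for entry in json.loads(log_path.read_text()):
            name = entry.get("original_filename")
            if name:
                done.add(name)
    return done
```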
Advanced Configuration
Retry Logic
- Base delay: 10 seconds, doubles with each retry
- Rate limit delay: Additional 60 seconds for quota errors
- Maximum delay: Capped at 5 minutes
- Smart detection: Recognizes various rate limiting error messages
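Taken together, those defaults imply a delay schedule like the following (an illustrative calculation; `retry_delay` is a hypothetical name, not the tool's actual function):

```python
def retry_delay(attempt: int, rate_limited: bool = False,
                base: float = 10.0, cap: float = 300.0,
                rate_limit_extra: float = 60.0) -> float:
    """Delay before retry `attempt` (0-based): the base delay doubles
    with each attempt and is capped at 5 minutes; rate-limit errors
    add a fixed extra 60 seconds, per the defaults documented above."""
    delay = min(base * (2 ** attempt), cap)
    if rate_limited:
        delay += rate_limit_extra
    return delay
```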
Image Processing
- Supported formats: JPG, JPEG, PNG, WebP
- Output formats: JPG, PNG, WebP, AVIF
- Resizing: Maintains aspect ratio when using --max-width
- Quality: WebP output at 90% quality
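A minimal Pillow sketch of the resize-and-convert step under the documented defaults (`process_image` is a hypothetical helper, not the project's actual image_processor.py):

```python
from pathlib import Path
from typing import Optional

from PIL import Image


def process_image(src: Path, dst: Path,
                  max_width: Optional[int] = None,
                  quality: int = 90) -> None:
    """Resize to at most max_width (preserving aspect ratio) and save
    in the format implied by dst's extension; quality defaults to the
    documented 90% used for WebP output."""
    with Image.open(src) as img:
        if max_width and img.width > max_width:
            new_height = round(img.height * max_width / img.width)
            img = img.resize((max_width, new_height))
        if dst.suffix.lower() in {".jpg", ".jpeg"} and img.mode != "RGB":
            img = img.convert("RGB")  # JPEG cannot store an alpha channel
        img.save(dst, quality=quality)
```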
Project Structure
image-filename-ai/
├── cli.py                      # Main application
├── app/
│   └── utils/
│       ├── ai_handler.py       # Gemini AI integration
│       ├── file_utils.py       # File operations and logging
│       └── image_processor.py  # Image processing and conversion
├── input/                      # Your source images
└── output/                     # Generated results
    ├── project1/
    │   ├── results.json        # Processing log
    │   ├── results.csv         # CSV export
    │   └── *.webp              # Renamed images
    └── project2/
        └── ...
Example Output
Generated Filenames
- IMG_1234.jpg → sunset-mountain-landscape-golden-hour.webp
- photo.png → office-desk-computer-workspace-clean.webp
- image.jpg → family-portrait-garden-summer-happy.webp
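Names of this shape can be derived from a model's description with a standard slugify pass (an illustrative sketch; the tool's actual normalisation may differ):

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase, strip accents, and collapse every non-alphanumeric
    run into a single hyphen, producing an SEO-friendly filename stem,
    e.g. 'Sunset over Mountains!' becomes 'sunset-over-mountains'."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```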
Log Entry
{
"timestamp": "2025-05-25 09:21:31",
"original_path": "input/photos/IMG_1234.jpg",
"new_path": "output/photos/sunset-mountain-landscape.webp",
"original_filename": "IMG_1234.jpg",
"new_filename": "sunset-mountain-landscape.webp",
"alt_text": "A beautiful sunset over mountain peaks with golden light illuminating the landscape."
}
Language Support
The tool supports any language supported by Gemini AI. Common examples:
- --lang en - English
- --lang de - German (Deutsch)
- --lang sl - Slovenian
- --lang fr - French
- --lang es - Spanish
- --lang it - Italian
- --lang pt - Portuguese
Development & Testing
Running Tests
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=app --cov=cli
# Test specific module
pytest tests/test_cli.py -v
Code Quality
# Format code
black .
# Lint code
ruff check .
# Fix linting issues
ruff check . --fix
Development Setup
# Install development dependencies (included in requirements.txt)
pip install -r requirements.txt
# Run API in development mode with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Architecture
CLI Mode: Direct local processing using Gemini API
- Input: Local image directories
- Output: Processed images with generated names
- Use case: Batch processing, one-time organization
API Mode: Web service for on-demand processing
- Input: GCS bucket URLs or direct uploads
- Output: Background job processing with status tracking
- Use case: Integration with other systems, web applications
Production TODO
- Add API authentication (API keys, JWT, OAuth)
- Add rate limiting per client/endpoint
- Add input validation and sanitization
- Add comprehensive logging and monitoring
- Add image virus scanning before processing
- Add batch processing for large image sets
- Add webhook notifications for job completion
- Add cost monitoring for Vertex AI usage
- Package CLI as standalone executable (PyInstaller)
- Add retry logic for failed AI requests
- Add progress bars for CLI processing