AI-powered image filename generator using Google Gemini - Transform generic image files into descriptive, SEO-friendly names

Image Filename AI

Overview

This application uses AI (Gemini) to automatically rename image files based on their content and generate descriptive alt text. It supports both flat and nested folder structures, making it perfect for organizing project-based image collections.

Features

  • AI-powered image analysis: Uses Google's Gemini model to understand image content
  • Intelligent filename generation: Creates descriptive, SEO-friendly filenames
  • Alt text generation: Generates accessible alt text for images
  • Nested folder support: Preserves directory structure for project-based organization
  • Image processing: Resize and reformat images during processing
  • Multiple logging modes: Flexible logging options for different use cases
  • Language support: Generate filenames and alt text in multiple languages

Requirements

  • Python: 3.11+ (tested on 3.11, 3.12, 3.13)
  • Google Cloud Platform: Project with Vertex AI enabled
  • Service Account: With required permissions (see Authentication section)

Installation

Option 1: Install from PyPI (Recommended)

# Install the core CLI tool
pip install image-filename-ai

# Or install with API dependencies
pip install "image-filename-ai[api]"

# Or install with development dependencies  
pip install "image-filename-ai[dev]"

Option 2: Local Development

  1. Clone the repository:
git clone https://github.com/matija2209/image-filename-ai.git
cd image-filename-ai
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install in development mode:
pip install -e ".[dev,api]"
  4. Set up credentials (see the Authentication & Credentials section below)

Option 3: Docker (Recommended for API)

  1. Clone the repository
  2. Copy .env.example to .env and configure
  3. Run with Docker Compose:
docker-compose up --build

Authentication & Credentials

Choose one of the following methods:

Method 1: Environment Variable (Recommended)

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"

Method 2: Place credentials in repo root

Place your serviceAccountKey.json file in the project root directory (automatically gitignored).
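
As a rough illustration of how the two methods can coexist, the sketch below checks the environment variable first and falls back to the key file in the project root. The helper name resolve_credentials_path and the precedence order are assumptions made for this example, not the project's confirmed behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_credentials_path(
    project_root: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the service account key path, or None if nothing is configured.

    Hypothetical helper: the precedence (environment variable first, then
    the project root) is an assumption, not taken from the project's source.
    """
    env = os.environ if env is None else env

    # Method 1: explicit path via GOOGLE_APPLICATION_CREDENTIALS.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)

    # Method 2: serviceAccountKey.json placed in the project root.
    fallback = project_root / "serviceAccountKey.json"
    if fallback.is_file():
        return fallback

    return None
```

Returning None rather than raising leaves room for other Google Cloud authentication methods, such as application default credentials.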

For Docker Usage

Uncomment the volume mount in compose.yml:

volumes:
  - ./serviceAccountKey.json:/app/credentials/credentials.json:ro

Required GCP Permissions

Your service account needs:

  • aiplatform.endpoints.predict (Vertex AI predictions)
  • storage.objects.get (read images from GCS)
  • storage.objects.create (create processed images)
  • firestore.documents.read/write (if using job tracking)

Usage

CLI Usage (Local Processing)

For a full, step-by-step CLI tutorial, see: CLI_GUIDE.md

For minimal GCP setup steps, see: GCP_SETUP.md

Basic command:

python cli.py --input-dir input --output-dir output --lang en

With custom settings:

python cli.py \
  --input-dir ./images \
  --output-dir ./processed \
  --lang de \
  --log-mode per_folder \
  --max-width 1920 \
  --format webp

API Usage (Docker/Server)

Start the API server:

# Using Docker Compose (recommended)
docker-compose up

# Or locally
uvicorn app.main:app --host 0.0.0.0 --port 8000

Access the interactive API documentation at http://localhost:8000/docs.

Note: The API is currently unauthenticated and is suitable for development only.

Environment Configuration

Copy .env.example to .env and adjust:

cp .env.example .env
# Edit .env with your settings

Key environment variables:

# Core GCP settings (used by both CLI and API)
PROJECT_ID=your-gcp-project-id
LOCATION=us-central1
MODEL_NAME=gemini-2.0-flash-exp
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

# CLI-specific settings (optional)
MAX_RETRIES=5           # Number of retry attempts
BASE_RETRY_DELAY=10     # Base delay between retries (seconds)
MAX_RETRY_DELAY=300     # Maximum delay cap (seconds)
RATE_LIMIT_DELAY=60     # Extra delay for rate limit errors

# Docker settings
COMPOSE_PORT_API=8000   # Port mapping for Docker Compose

Note: The CLI automatically loads the .env file from the project root if present.

Advanced Options

python cli.py \
  --input-dir input/laneks \
  --output-dir output/laneks \
  --lang sl \
  --format webp \
  --max-width 1920 \
  --log-mode project_level

Arguments

  • --input-dir: Directory containing input images (default: "input")
  • --output-dir: Base directory for processed images and logs (default: "output")
  • --lang: Target language code (e.g., 'en', 'sl', 'de') (default: "en")
  • --format: Output image format - jpg, png, webp, avif (default: original format)
  • --max-width: Maximum width in pixels for output images (default: original size)
  • --log-mode: Logging mode for results (default: "per_folder")
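
The arguments above could be declared with argparse roughly as follows. This is a minimal sketch built only from the options and defaults listed here; the function name build_parser and the help strings are assumptions, and the real cli.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI arguments."""
    parser = argparse.ArgumentParser(
        description="Rename images using AI-generated descriptive filenames."
    )
    parser.add_argument("--input-dir", default="input",
                        help="Directory containing input images")
    parser.add_argument("--output-dir", default="output",
                        help="Base directory for processed images and logs")
    parser.add_argument("--lang", default="en",
                        help="Target language code, e.g. en, sl, de")
    # None means "keep the original format" / "keep the original size".
    parser.add_argument("--format", default=None,
                        choices=["jpg", "png", "webp", "avif"],
                        help="Output image format")
    parser.add_argument("--max-width", type=int, default=None,
                        help="Maximum width in pixels for output images")
    parser.add_argument("--log-mode", default="per_folder",
                        choices=["per_folder", "project_level", "central", "flat"],
                        help="Logging mode for results")
    return parser
```

Calling build_parser().parse_args([]) yields the documented defaults, which makes the default behavior easy to check in a test.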

Logging Modes

The application supports four logging modes to suit different organizational needs:

per_folder (Default)

Creates results.json and results.csv files in each folder where images are processed.

output/
├── project1/
│   ├── results.json
│   ├── results.csv
│   └── renamed-images...
└── project2/
    ├── results.json
    ├── results.csv
    └── renamed-images...

project_level

Creates one log file per top-level project folder.

output/
├── project1/
│   ├── results.json
│   ├── results.csv
│   ├── subfolder1/renamed-images...
│   └── subfolder2/renamed-images...
└── project2/
    ├── results.json
    ├── results.csv
    └── renamed-images...

central

Creates a single log file in the main output directory.

output/
├── results.json
├── results.csv
├── project1/renamed-images...
└── project2/renamed-images...

flat

Flattens the output structure - all processed images go directly to the main output directory with a single central log file. Perfect for processing deeply nested input folders when you want a simple flat output structure.

output/
├── results.json
├── results.csv
├── descriptive-name-1.webp
├── descriptive-name-2.webp
├── descriptive-name-3.webp
└── descriptive-name-4.webp

Note: In flat mode, filename conflicts are automatically resolved by adding a counter suffix (e.g., name-1.webp, name-2.webp).
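
A counter suffix of this kind can be sketched as below. The function name unique_name is hypothetical, and it is an assumption that the first file keeps the plain name while later duplicates receive -1, -2, and so on; the tool's actual numbering may differ.

```python
from pathlib import PurePath
from typing import Set


def unique_name(filename: str, taken: Set[str]) -> str:
    """Return filename, adding a counter suffix if the name is already taken.

    `taken` is updated in place so consecutive calls see earlier results.
    """
    if filename not in taken:
        taken.add(filename)
        return filename

    path = PurePath(filename)
    counter = 1
    while True:
        # Insert the counter between the stem and the extension.
        candidate = f"{path.stem}-{counter}{path.suffix}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        counter += 1
```

Tracking used names in a set keeps each lookup cheap even when many nested folders are flattened into one directory.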

Nested Folder Support

The application automatically preserves your input directory structure in the output:

Input Structure:

input/
├── laneks/
│   ├── projekt1/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── projekt2/
│       └── image3.jpg
└── other-client/
    └── flat-images/
        └── image4.jpg

Output Structure:

output/
├── laneks/
│   ├── projekt1/
│   │   ├── descriptive-name-1.webp
│   │   └── descriptive-name-2.webp
│   └── projekt2/
│       └── descriptive-name-3.webp
└── other-client/
    └── flat-images/
        └── descriptive-name-4.webp

This makes it perfect for:

  • Project-based workflows: Each client/project maintains its own folder structure
  • Mixed structures: Support both flat folders and deeply nested hierarchies
  • Team collaboration: Preserve organizational structure that teams are familiar with
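
Mirroring the input structure comes down to reusing each image's path relative to the input root. The sketch below shows the idea with a hypothetical helper, mirrored_output_path; it uses PurePosixPath so the example behaves the same on every platform, whereas real code would more likely use pathlib.Path.

```python
from pathlib import PurePosixPath


def mirrored_output_path(
    input_root: str,
    output_root: str,
    image_path: str,
    new_stem: str,
    extension: str,
) -> PurePosixPath:
    """Place a renamed image under output_root, keeping its subfolders."""
    # Folder of the image relative to the input root, e.g. laneks/projekt1.
    relative_dir = PurePosixPath(image_path).parent.relative_to(
        PurePosixPath(input_root)
    )
    return PurePosixPath(output_root) / relative_dir / f"{new_stem}.{extension}"
```

For the structure shown above, input/laneks/projekt1/image1.jpg with the generated stem descriptive-name-1 maps to output/laneks/projekt1/descriptive-name-1.webp.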

Authentication

Set up Google Cloud authentication by placing your service account key file as serviceAccountKey.json in the project root, or use other Google Cloud authentication methods.

API Documentation

For web API usage, see API_DOCUMENTATION.md.

Examples

See EXAMPLE_DATA.json for sample API responses and data structures.

git pull && docker compose build && export GOOGLE_APPLICATION_CREDENTIALS='filename-ai-21694d9b8f6c.json' && docker compose up

FastAPI Application

A FastAPI application for processing images stored in Google Cloud Storage.

Features (FastAPI)

  • Process images from Google Cloud Storage
  • Generate descriptive, SEO-friendly filenames
  • Create alt text for accessibility and SEO
  • Support for multiple languages
  • REST API for easy integration

Docker Setup

  1. Build and start the container:

    docker compose build
    docker compose up -d
    
  2. Alternatively, export the credentials path before starting the container, so that Compose can substitute it wherever compose.yml references the variable:

    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"
    docker compose up -d
    
  3. To run with specific environment variables:

    docker compose run -e GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json" api
    

Requirements

  • Python 3.11+
  • Google Cloud Project with Vertex AI API enabled
  • Google Cloud credentials configured

Setup

  1. Clone the repository
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Configure the application (optional): Create a .env file in the project root with:
    PROJECT_ID=your-gcp-project-id
    LOCATION=us-central1
    MODEL_NAME=gemini-2.0-flash-exp
    

Usage

  1. Start the server:
    python run.py
    
  2. Access the API documentation at http://localhost:8000/docs
  3. Make API requests:
    curl -X POST http://localhost:8000/api/v1/process \
      -H "Content-Type: application/json" \
      -d '{
        "gcs_input_path": "gs://your-bucket/images",
        "language_code": "en"
      }'
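
The same request can be built from Python with only the standard library. This is a sketch under the assumption that the server runs at http://localhost:8000; the helper name build_process_request is hypothetical, and the response format is not shown because it is not described here.

```python
import json
import urllib.request


def build_process_request(
    base_url: str,
    gcs_input_path: str,
    language_code: str = "en",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/process."""
    payload = {"gcs_input_path": gcs_input_path, "language_code": language_code}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires a running server:
# request = build_process_request("http://localhost:8000", "gs://your-bucket/images")
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect without a live server.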
    

API Endpoints

  • GET / - Health check endpoint
  • POST /api/v1/process - Process images from GCS bucket

Configuration (FastAPI)

The application can be configured using environment variables or a .env file:

  • PROJECT_ID - Google Cloud project ID
  • LOCATION - Google Cloud region
  • MODEL_NAME - Gemini model to use
  • HOST - Server host (default: 0.0.0.0)
  • PORT - Server port (default: 8000)
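
A small settings loader for these variables might look like the sketch below. The Settings class and load_settings function are hypothetical; the HOST and PORT defaults come from the list above, while the LOCATION and MODEL_NAME defaults are assumptions borrowed from the example .env values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    project_id: str
    location: str
    model_name: str
    host: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from environment variables with fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        project_id=env.get("PROJECT_ID", ""),  # no sensible default exists
        location=env.get("LOCATION", "us-central1"),  # assumed default
        model_name=env.get("MODEL_NAME", "gemini-2.0-flash-exp"),  # assumed default
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a mapping instead of reading os.environ directly keeps the loader testable without touching the real environment.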

Command-Line Interface (CLI)

A CLI script (cli.py) for processing local image files.

Features (CLI)

  • Process images recursively from a local input directory.
  • Generate descriptive, SEO-friendly filenames using Vertex AI Gemini.
  • Create alt text for accessibility and SEO using Vertex AI Gemini.
  • Support for multiple languages for filenames and alt text.
  • Optionally convert images to different formats (JPG, PNG, WEBP, AVIF).
  • Optionally resize images to a maximum width, preserving aspect ratio.
  • Mirrors the input directory structure in the output directory.
  • Logs processing results to JSON and CSV files within each output subdirectory.

Requirements (CLI)

  • Python 3.11+
  • Google Cloud Project with Vertex AI API enabled
  • Google Cloud credentials configured (e.g., via gcloud auth application-default login)
  • Dependencies installed: pip install -r requirements.txt (Ensure Pillow is included for image processing)

Usage (CLI)

Run the script from the project root directory.

python cli.py --input-dir <path/to/input> --output-dir <path/to/output> [options]

Arguments:

  • --input-dir: Path to the directory containing input images (default: input).
  • --output-dir: Path to the base directory for processed images and logs (default: output). The script will maintain the subdirectory structure from the input directory.
  • --lang: Target language code for filename/alt text (e.g., 'en', 'sl', 'de') (default: en).
  • --format: Optional output image format ('jpg', 'png', 'webp', 'avif'). If omitted, the original format is kept.
  • --max-width: Optional maximum width in pixels for output images. Aspect ratio is preserved. If omitted, the original size is kept.

Examples:

  1. Basic usage (English, keep original format/size):

    python cli.py --input-dir path/to/your/images --output-dir processed/images
    
  2. Process images, translate to German, resize to 800px max width:

    python cli.py --input-dir images_raw --output-dir images_processed --lang de --max-width 800
    
  3. Process images, convert to WEBP format:

    python cli.py --input-dir photos --output-dir web_ready --format webp
    
  4. Process specific subfolder, convert to AVIF (see Known Issues), max width 900px:

    python cli.py --input-dir input/specific_folder --output-dir output --format avif --max-width 900
    

Known Issues

  • AVIF Conversion: There is a known issue when using the --format avif option with the CLI tool (cli.py). The underlying Pillow library might raise an error (Error processing image: 'AVIF') during the save operation, causing images to be skipped. This might be related to specific image modes (e.g., RGBA) or Pillow's AVIF encoder capabilities.
    • Troubleshooting (macOS): AVIF support in Pillow often depends on the libavif system library. If you encounter errors with AVIF:
      1. Install the library using Homebrew: brew install libavif
      2. Reinstall Pillow from source within your virtual environment to ensure it detects libavif: pip install --force-reinstall --no-cache-dir --no-binary Pillow Pillow
    • Using other formats like JPG, PNG, or WEBP is recommended if AVIF conversion fails or the troubleshooting steps are not feasible.
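
Before starting a long AVIF run, it can help to check whether the installed Pillow recognizes the format at all. The sketch below is one way to do that; both function names are hypothetical, the WebP fallback is an assumption, and a registered extension does not guarantee that every image mode will save successfully.

```python
def avif_available() -> bool:
    """Report whether the installed Pillow has an AVIF plugin registered."""
    try:
        from PIL import Image
    except ImportError:
        return False  # Pillow itself is not installed

    # registered_extensions() loads all plugins and maps extensions to formats.
    return ".avif" in Image.registered_extensions()


def pick_output_format(requested: str, fallback: str = "webp") -> str:
    """Use the requested format unless it is AVIF and AVIF is unavailable."""
    requested = requested.lower()
    if requested == "avif" and not avif_available():
        return fallback
    return requested
```

Falling back to WebP follows the recommendation above to prefer JPG, PNG, or WEBP when AVIF conversion fails.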

License

MIT

Practical Examples

Example 1: Process a single project folder

# Process images from a specific project, resize to max 1920px width, convert to WebP
python cli.py \
  --input-dir input/laneks/projekt2 \
  --output-dir output/laneks/projekt2 \
  --lang en \
  --format webp \
  --max-width 1920 \
  --log-mode per_folder

Example 2: Process all projects for a client with project-level logs

# Process all projects for the 'laneks' client, create one log per project
python cli.py \
  --input-dir input/laneks \
  --output-dir output/laneks \
  --lang sl \
  --format webp \
  --max-width 1920 \
  --log-mode project_level

Example 3: Batch process multiple clients with central logging

# Process everything with a single centralized log file
python cli.py \
  --input-dir input \
  --output-dir output \
  --lang en \
  --format avif \
  --max-width 1600 \
  --log-mode central

Example 4: Keep original format but resize

# Just resize images without changing format
python cli.py \
  --input-dir input/large-images \
  --output-dir output/resized \
  --max-width 800 \
  --log-mode per_folder

Example 5: Flatten deeply nested structure

# Process deeply nested folders but output everything to a flat structure
python cli.py \
  --input-dir input/complex-nested-structure \
  --output-dir output/flattened \
  --lang en \
  --format webp \
  --max-width 1600 \
  --log-mode flat

Common Use Cases

Photography Studios

  • Input: Client folders with project subfolders
  • Settings: --log-mode project_level --format webp --max-width 2048
  • Result: Each project gets its own log, images optimized for web

E-commerce

  • Input: Product category folders
  • Settings: --log-mode central --format webp --max-width 1200
  • Result: All products processed with central tracking

Web Development

  • Input: Mixed folder structures
  • Settings: --format avif --max-width 1920 --log-mode per_folder
  • Result: Modern format with excellent compression, detailed logs

Digital Asset Management

  • Input: Complex nested folder structures from various sources
  • Settings: --log-mode flat --format webp --max-width 1600
  • Result: All assets in one flat directory with descriptive names, single tracking log

Quick examples:

# Simple renaming in English
python cli.py --input-dir input/photos --output-dir output/renamed --lang en

# German language with WebP conversion and resizing
python cli.py --input-dir input/photos --output-dir output/optimized \
  --lang de --format webp --max-width 1024

# Project-level logging for organized results
python cli.py --input-dir input/company-photos --output-dir output/processed \
  --lang en --log-mode project_level

Command Line Options

Option          Description                                               Default
--input-dir     Directory containing input images                         input
--output-dir    Base directory for processed images                       output
--lang          Target language code (en, de, sl, fr, etc.)               en
--format        Output format (jpg, png, webp, avif)                      Original
--max-width     Maximum width in pixels                                   Original
--log-mode      Logging mode (central, project_level, per_folder, flat)   per_folder
--max-retries   Maximum retry attempts for API calls                      5

Logging Modes

per_folder (Default)

Creates results.json and results.csv in each output subdirectory.

project_level

Creates one log file per top-level project folder.

central

Single log file in the main output directory.

flat

Flattens directory structure with central logging.
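
The four modes differ mainly in which directory receives results.json and results.csv. The sketch below captures that mapping with a hypothetical helper, log_directory; it assumes that the project folder is the first path component below the output root, and it uses PurePosixPath only to keep the example platform-independent.

```python
from pathlib import PurePosixPath


def log_directory(mode: str, output_root: str, image_output_path: str) -> PurePosixPath:
    """Return the directory that should hold the log for one processed image."""
    root = PurePosixPath(output_root)
    image_dir = PurePosixPath(image_output_path).parent

    if mode in ("central", "flat"):
        return root  # one log in the main output directory
    if mode == "per_folder":
        return image_dir  # one log next to the images
    if mode == "project_level":
        relative = image_dir.relative_to(root)
        # The first component below the root is treated as the project folder.
        return root / relative.parts[0] if relative.parts else root
    raise ValueError(f"Unknown log mode: {mode}")
```

For an image written to output/project1/subfolder1/a.webp, per_folder logs to output/project1/subfolder1, project_level to output/project1, and central to output.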

Resume Functionality

The tool automatically resumes interrupted processing:

  1. Scans existing logs: Checks all results.json files in output directory
  2. Identifies processed files: Uses original_filename field for tracking
  3. Skips completed work: Only processes new or failed images
  4. Handles rate limits: Exponential backoff with up to 5 retry attempts

Example resume scenario:

# First run - processes 20 files, hits rate limit
python cli.py --input-dir photos --output-dir output --lang de

# Resume run - skips 20 completed files, continues with remaining
python cli.py --input-dir photos --output-dir output --lang de
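
The scan-and-skip steps can be sketched as follows. This assumes each results.json holds a JSON list of entries with an original_filename field, as in the log entry format; the function name already_processed is hypothetical and the tool's real bookkeeping may differ.

```python
import json
from pathlib import Path
from typing import Set


def already_processed(output_dir: Path) -> Set[str]:
    """Collect original filenames recorded in any results.json under output_dir."""
    done: Set[str] = set()
    for log_file in output_dir.rglob("results.json"):
        try:
            entries = json.loads(log_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or partially written log: no record
        if not isinstance(entries, list):
            continue
        for entry in entries:
            if isinstance(entry, dict) and entry.get("original_filename"):
                done.add(entry["original_filename"])
    return done
```

A run would then skip any input file whose name is in the returned set. Note that keying on the bare filename treats identically named files in different folders as the same image; keying on original_path would avoid that.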

Advanced Configuration

Retry Logic

  • Base delay: 10 seconds, doubles with each retry
  • Rate limit delay: Additional 60 seconds for quota errors
  • Maximum delay: Capped at 5 minutes
  • Smart detection: Recognizes various rate limiting error messages
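
The delay schedule described above can be expressed as a small function. This is a sketch, not the tool's code: the name retry_delay is hypothetical, attempts are counted from zero, and applying the cap after adding the rate limit delay is an assumption.

```python
def retry_delay(
    attempt: int,
    rate_limited: bool = False,
    base_delay: float = 10.0,
    rate_limit_delay: float = 60.0,
    max_delay: float = 300.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0 = first retry)."""
    # Exponential backoff: 10, 20, 40, 80, ... seconds.
    delay = base_delay * (2 ** attempt)
    if rate_limited:
        delay += rate_limit_delay  # extra wait for quota errors
    return min(delay, max_delay)  # never exceed the 5 minute cap
```

With the default values, five consecutive retries wait 10, 20, 40, 80, and 160 seconds, and any computed delay above 300 seconds is reduced to 300.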

Image Processing

  • Supported input formats: JPG, JPEG, PNG, WebP
  • Output formats: JPG, PNG, WebP, AVIF
  • Resizing: Maintains aspect ratio when using --max-width
  • Quality: WebP output at 90% quality
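
Preserving the aspect ratio under a maximum width is simple arithmetic, sketched below. The helper name target_size is hypothetical, and leaving smaller images untouched (no upscaling) is an assumption based on the word "maximum".

```python
from typing import Optional, Tuple


def target_size(width: int, height: int, max_width: Optional[int]) -> Tuple[int, int]:
    """Return output dimensions that fit max_width and keep the aspect ratio."""
    if max_width is None or width <= max_width:
        return width, height  # keep the original size

    scale = max_width / width
    # Round to whole pixels and keep at least one pixel of height.
    return max_width, max(1, round(height * scale))
```

For example, a 4000x3000 image with --max-width 1920 becomes 1920x1440, while an 800x600 image is left unchanged.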

Project Structure

image-filename-ai/
├── cli.py                     # Main application
├── app/
│   └── utils/
│       ├── ai_handler.py      # Gemini AI integration
│       ├── file_utils.py      # File operations and logging
│       └── image_processor.py # Image processing and conversion
├── input/                     # Your source images
└── output/                    # Generated results
    ├── project1/
    │   ├── results.json       # Processing log
    │   ├── results.csv        # CSV export
    │   └── *.webp             # Renamed images
    └── project2/
        └── ...

Example Output

Generated Filenames

  • IMG_1234.jpg → sunset-mountain-landscape-golden-hour.webp
  • photo.png → office-desk-computer-workspace-clean.webp
  • image.jpg → family-portrait-garden-summer-happy.webp
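
Turning a model-generated description into a filename of this shape usually involves a slug step. The sketch below is one possible approach; the function name to_filename, the ASCII-only normalization, and the "image" fallback are all assumptions, not the tool's confirmed behavior.

```python
import re
import unicodedata


def to_filename(description: str, extension: str = "webp") -> str:
    """Convert free text into a lowercase, hyphen-separated filename."""
    # Decompose accented letters, then drop the combining marks (é -> e).
    normalized = unicodedata.normalize("NFKD", description)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")

    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

    # Fall back to a generic name if nothing usable remains.
    return f"{slug or 'image'}.{extension.lstrip('.')}"
```

Lowercase, hyphen-separated ASCII names are a common choice for URLs, which fits the SEO-friendly goal stated in the features.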

Log Entry

{
  "timestamp": "2025-05-25 09:21:31",
  "original_path": "input/photos/IMG_1234.jpg",
  "new_path": "output/photos/sunset-mountain-landscape.webp",
  "original_filename": "IMG_1234.jpg",
  "new_filename": "sunset-mountain-landscape.webp",
  "alt_text": "A beautiful sunset over mountain peaks with golden light illuminating the landscape."
}
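
Entries of this shape can be written to both log files with the standard library. The sketch below assumes that results.json stores a list of such entries and that results.csv uses the same six fields as columns; write_logs and FIELDS are hypothetical names.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List

# Column order follows the fields of the example log entry.
FIELDS = [
    "timestamp",
    "original_path",
    "new_path",
    "original_filename",
    "new_filename",
    "alt_text",
]


def write_logs(entries: List[Dict[str, str]], log_dir: Path) -> None:
    """Write entries to results.json and results.csv inside log_dir."""
    log_dir.mkdir(parents=True, exist_ok=True)

    # ensure_ascii=False keeps non-English alt text readable in the file.
    (log_dir / "results.json").write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    with (log_dir / "results.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(entries)
```

Writing both formats from the same list keeps the JSON and CSV logs in sync.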

Language Support

The tool supports any language supported by Gemini AI. Common examples:

  • --lang en - English
  • --lang de - German (Deutsch)
  • --lang sl - Slovenian
  • --lang fr - French
  • --lang es - Spanish
  • --lang it - Italian
  • --lang pt - Portuguese

Development & Testing

Running Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=app --cov=cli

# Test specific module
pytest tests/test_cli.py -v

Code Quality

# Format code
black .

# Lint code
ruff check .

# Fix linting issues
ruff check . --fix

Development Setup

# Install development dependencies (included in requirements.txt)
pip install -r requirements.txt

# Run API in development mode with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Architecture

CLI Mode: Direct local processing using Gemini API

  • Input: Local image directories
  • Output: Processed images with generated names
  • Use case: Batch processing, one-time organization

API Mode: Web service for on-demand processing

  • Input: GCS bucket URLs or direct uploads
  • Output: Background job processing with status tracking
  • Use case: Integration with other systems, web applications

Production TODO

  • Add API authentication (API keys, JWT, OAuth)
  • Add rate limiting per client/endpoint
  • Add input validation and sanitization
  • Add comprehensive logging and monitoring
  • Add image virus scanning before processing
  • Add batch processing for large image sets
  • Add webhook notifications for job completion
  • Add cost monitoring for Vertex AI usage
  • Package CLI as standalone executable (PyInstaller)
  • Add retry logic for failed AI requests
  • Add progress bars for CLI processing

