Fast image processing CLI with AI-powered tagging and embeddings

Photon

AI-powered image processing pipeline written in Rust
Analyze, embed, and tag images locally using SigLIP — no cloud required.

Quick Start  •  Usage  •  How It Works  •  Configuration  •  Library Usage


Photon takes images as input and outputs structured JSON: 768-dim vector embeddings, semantic tags, EXIF metadata, content hashes, and thumbnails. It's a pure processing pipeline — no database, no server, no cloud dependency. Process locally, store wherever you want.

image.jpg ──▶ Photon ──▶ { embedding, tags, metadata, hash, thumbnail }

Features

  • SigLIP Embeddings — 768-dimensional vectors for semantic similarity search, powered by ONNX Runtime
  • Zero-Shot Tagging — 68,000+ term vocabulary (WordNet + curated visual terms) scored locally via SigLIP
  • EXIF Extraction — Camera, GPS coordinates, datetime, ISO, aperture, focal length
  • Content Hashing — BLAKE3 cryptographic hash + perceptual hash for deduplication and similarity
  • Thumbnails — WebP generation with configurable size and quality
  • LLM Descriptions — BYOK enrichment via Ollama, Anthropic, OpenAI, Hyperbolic
  • Batch Processing — Parallel workers with progress bar and skip-existing support
  • Single Binary — No Python, no Docker, no runtime dependencies
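
As a rough illustration of how the perceptual hash from the feature list can support near-duplicate detection, the sketch below compares two hashes by Hamming distance. It assumes the hash is a hex string (as in the sample output) and that bitwise distance is a meaningful similarity measure for it; the function names and the threshold are hypothetical and not part of Photon.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_duplicate(hash_a: str, hash_b: str, max_bits: int = 8) -> bool:
    # max_bits is a hypothetical threshold; tune it against your own images.
    return hamming_distance(hash_a, hash_b) <= max_bits
```

The BLAKE3 content hash, by contrast, is only useful for exact-match deduplication: any change to the file bytes produces an unrelated digest.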

Quick Start

# Build from source
git clone https://github.com/hejijunhao/photon.git
cd photon
cargo build --release

# Download the SigLIP model (~350 MB, one-time)
cargo run --release -- models download

# Process a single image
cargo run --release -- process photo.jpg

# Process an entire directory
cargo run --release -- process ./photos/ --format jsonl --output results.jsonl

Usage

Process Images

# Single image → JSON to stdout
photon process image.jpg

# Directory → JSONL file (one JSON object per line)
photon process ./photos/ --format jsonl --output results.jsonl

# Parallel processing with 8 workers
photon process ./photos/ --parallel 8 --output results.jsonl

# Skip already-processed images on re-runs
photon process ./photos/ --output results.jsonl --skip-existing

# Higher quality embeddings (384px model, slower but more detailed)
photon process image.jpg --quality high

LLM Descriptions (BYOK)

# Local via Ollama
photon process image.jpg --llm ollama --llm-model llama3.2-vision

# Anthropic API
photon process image.jpg --llm anthropic --llm-model claude-sonnet-4-5-20250929

# OpenAI API
photon process image.jpg --llm openai --llm-model gpt-4o-mini

# Batch with LLM enrichment
photon process ./photos/ --format jsonl --output results.jsonl --llm anthropic

Control What Gets Generated

# Metadata and hashes only (no AI)
photon process image.jpg --no-embedding --no-tagging

# Skip thumbnail generation
photon process image.jpg --no-thumbnail

# Custom thumbnail size
photon process image.jpg --thumbnail-size 128

Manage Models

photon models download    # Download SigLIP models from HuggingFace
photon models list        # Show installed models and status
photon models path        # Show model storage directory

Configuration

photon config init        # Create config file with defaults
photon config show        # Display current settings
photon config path        # Show config file location

How It Works

Photon runs a sequential pipeline where each stage is independent and optional:

 Input        ┌──────────┐ ┌──────┐ ┌──────┐ ┌───────────┐ ┌───────┐ ┌───────┐
 image.jpg ──▶│ Validate │▶│Decode│▶│ EXIF │▶│   Hash    │▶│Thumb- │▶│ Embed │──▶ ...
              │          │ │      │ │      │ │BLAKE3+pHash│ │ nail  │ │SigLIP │
              └──────────┘ └──────┘ └──────┘ └───────────┘ └───────┘ └───────┘

 ... ──▶ ┌──────────┐ ┌─────────────┐        Output
         │Zero-Shot │▶│  LLM Enrich │──▶  Structured JSON
         │  Tags    │ │  (BYOK)     │     { embedding, tags,
         │ (SigLIP) │ │             │       metadata, hash, ... }
         └──────────┘ └─────────────┘

Stage      What it does                                                      Speed
Validate   Check file exists, size limits, format detection via magic bytes  <1ms
Decode     Load image pixels (JPEG, PNG, WebP, GIF, TIFF, BMP, AVIF)         ~5ms
EXIF       Extract camera, GPS, datetime, shooting parameters                ~2ms
Hash       BLAKE3 content hash (dedup) + perceptual hash (similarity)        ~3ms
Thumbnail  Aspect-preserving resize to WebP, base64 encoded                  ~5ms
Embed      SigLIP vision encoder → 768-dim L2-normalized vector              ~200ms
Tag        Dot product against 68K vocabulary, SigLIP sigmoid scoring        ~2ms
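
To make the Tag stage more concrete, here is a minimal sketch of sigmoid scoring over dot products. It is not Photon's implementation. SigLIP-style models apply a learned scale and bias to the dot product before the sigmoid; the `scale` and `bias` values below are placeholders, and `score_tags` and its parameters are hypothetical names.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_tags(image_vec, term_vecs, scale=10.0, bias=-5.0, max_tags=15):
    """Rank vocabulary terms by sigmoid(scale * dot(image, term) + bias).

    image_vec and every vector in term_vecs are assumed to be L2-normalized.
    scale and bias are placeholder values, not the model's learned ones.
    """
    scored = []
    for name, vec in term_vecs.items():
        dot = sum(a * b for a, b in zip(image_vec, vec))
        scored.append((name, sigmoid(scale * dot + bias)))
    # Highest confidence first, truncated to the requested number of tags.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_tags]
```

Because each term gets an independent sigmoid rather than a softmax over the whole vocabulary, the scores do not sum to one, and several tags can score highly for the same image.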

Output Format

Each processed image produces a JSON object:

{
  "file_path": "/photos/beach.jpg",
  "file_name": "beach.jpg",
  "content_hash": "a7f3b2c1d4e5...",
  "width": 4032,
  "height": 3024,
  "format": "jpeg",
  "file_size": 2458624,
  "embedding": [0.023, -0.156, 0.089, "... 768 floats"],
  "tags": [
    { "name": "beach", "confidence": 0.94, "category": "scene" },
    { "name": "ocean", "confidence": 0.87, "category": "scene" },
    { "name": "tropical", "confidence": 0.76, "category": "style" }
  ],
  "exif": {
    "captured_at": "2024-07-15T14:32:00",
    "camera_model": "iPhone 15 Pro",
    "gps_latitude": 25.7617,
    "gps_longitude": -80.1918
  },
  "thumbnail": "base64-encoded-webp...",
  "perceptual_hash": "d4c3b2a1..."
}
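
Since the embeddings are L2-normalized, cosine similarity between two images reduces to a plain dot product. The sketch below ranks records shaped like the JSON above against a query vector. The helper names are hypothetical, and a real deployment would more likely use a vector index than a linear scan.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def most_similar(query, records, top_k=5):
    """Return (file_path, similarity) pairs, most similar first.

    records are dicts with "file_path" and "embedding" keys; both the query
    and the stored embeddings are assumed to be L2-normalized.
    """
    scored = [(r["file_path"], dot(query, r["embedding"])) for r in records]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```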

Use --format jsonl for batch processing — one JSON object per line, streamed as each image completes.
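
A consumer can read JSONL output one line at a time without loading the whole file. The following is a minimal sketch; `iter_records` is a hypothetical helper, not part of Photon.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)
```

Passing `sys.stdin` or an open file handle as `lines` lets the same function serve both a piped stream and a saved results.jsonl.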

Configuration

Photon uses a layered configuration system: code defaults < config file < CLI flags.

photon config init    # Creates ~/.photon/config.toml (or platform-appropriate path)

Key settings in config.toml:

[processing]
parallel_workers = 4
supported_formats = ["jpg", "jpeg", "png", "webp", "heic", "raw", "cr2", "nef", "arw"]

[limits]
max_file_size_mb = 100
max_image_dimension = 10000
embed_timeout_ms = 30000

[embedding]
model = "siglip-base-patch16"         # or "siglip-base-patch16-384" for higher quality

[thumbnail]
enabled = true
size = 256

[tagging]
enabled = true
max_tags = 15

[logging]
level = "info"                        # error, warn, info, debug, trace
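
The layering order (code defaults < config file < CLI flags) can be pictured as a nested merge in which later layers override earlier ones. This is an illustrative sketch only and does not reflect how Photon implements it; `merge_layers` and the sample dictionaries are hypothetical.

```python
from copy import deepcopy


def merge_layers(*layers: dict) -> dict:
    """Merge nested dicts left to right; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are sections: merge them key by key.
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


defaults = {"processing": {"parallel_workers": 4},
            "thumbnail": {"enabled": True, "size": 256}}
config_file = {"thumbnail": {"size": 128}}
cli_flags = {"processing": {"parallel_workers": 8}}

effective = merge_layers(defaults, config_file, cli_flags)
```

Here the config file changes only the thumbnail size and the CLI changes only the worker count; every other setting keeps its default.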

Library Usage

Photon's processing engine lives in the photon-core crate and can be embedded directly in Rust applications:

use photon_core::{Config, ImageProcessor};
use std::path::Path;

#[tokio::main]
async fn main() -> photon_core::Result<()> {
    let config = Config::load()?;
    let mut processor = ImageProcessor::new(&config);

    // Load AI components (optional — pipeline works without them)
    processor.load_embedding(&config)?;
    processor.load_tagging(&config)?;

    let result = processor.process(Path::new("photo.jpg")).await?;

    println!("Hash:      {}", result.content_hash);
    println!("Embedding: {} dimensions", result.embedding.len());
    println!("Tags:      {:?}", result.tags.iter().map(|t| &t.name).collect::<Vec<_>>());

    Ok(())
}

Add to your Cargo.toml:

[dependencies]
photon-core = { git = "https://github.com/hejijunhao/photon.git" }
tokio = { version = "1", features = ["full"] }

Integrating with Your Backend

Photon is designed to feed into your own storage and search infrastructure. Pipe the output to your ingestion scripts:

# Stream results into your backend
photon process ./photos/ --format jsonl | your-ingestion-script

# Or process to file, then ingest
photon process ./photos/ --format jsonl --output results.jsonl
python ingest.py results.jsonl

Example — storing embeddings in PostgreSQL with pgvector:

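import json
import subprocess

# Run Photon on one image; check=True raises if the CLI exits with an error.
result = subprocess.run(
    ["photon", "process", "photo.jpg"],
    capture_output=True, text=True, check=True
)
data = json.loads(result.stdout)

# `db` is assumed to be an open database cursor (for example, from psycopg)
# on a table whose `embedding` column uses the pgvector `vector` type.
db.execute(
    "INSERT INTO images (path, hash, embedding, tags) VALUES (%s, %s, %s, %s)",
    [data["file_path"], data["content_hash"], data["embedding"], json.dumps(data["tags"])]
)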

Architecture

photon/
├── crates/
│   ├── photon/              # CLI binary (thin clap wrapper)
│   └── photon-core/         # Embeddable library
│       └── src/
│           ├── pipeline/    # Processing stages (decode, metadata, hash, thumbnail)
│           ├── embedding/   # SigLIP vision encoder (ONNX Runtime)
│           ├── tagging/     # Zero-shot classification (68K vocabulary)
│           └── output.rs    # JSON/JSONL serialization
├── data/vocabulary/         # WordNet nouns + supplemental visual terms
├── tests/fixtures/          # Test images
└── docs/                    # Phase plans and changelogs

Two-crate design: photon-core contains all processing logic and can be used as a library. photon is a thin CLI that calls into it. This means you can embed Photon's pipeline directly in your Rust application without pulling in CLI dependencies.

Project Status

Phase                                                       Status
Foundation (CLI, config, logging)                           Complete
Image pipeline (decode, EXIF, hashing, thumbnails)          Complete
SigLIP embedding (768-dim vectors via ONNX)                 Complete
Zero-shot tagging (68K vocabulary, self-organizing pools)   Complete
LLM enrichment (BYOK descriptions)                          Complete
Polish & release (progress bar, skip-existing, benchmarks)  Complete

Requirements

  • Rust 2021 edition (stable)
  • ~350 MB of disk space for the SigLIP model (a one-time download via photon models download)
  • Tested on macOS (Apple Silicon) and Linux (aarch64/x86_64)

Contributing

Contributions are welcome. Please open an issue to discuss significant changes before submitting a PR.

cargo test              # Run all tests (120+ across workspace)
cargo clippy            # Lint
cargo fmt               # Format
cargo bench -p photon-core  # Run benchmarks

License

Dual-licensed under MIT or Apache 2.0, at your option.

Download files

Built distributions (no source distribution is available for this release):

  • photon_imager-0.7.10-py3-none-manylinux_2_39_x86_64.whl (15.8 MB): Python 3, manylinux (glibc 2.39+), x86-64
  • photon_imager-0.7.10-py3-none-manylinux_2_39_aarch64.whl (14.9 MB): Python 3, manylinux (glibc 2.39+), ARM64
  • photon_imager-0.7.10-py3-none-macosx_11_0_arm64.whl (12.8 MB): Python 3, macOS 11.0+, ARM64

File hashes

photon_imager-0.7.10-py3-none-manylinux_2_39_x86_64.whl
  SHA256       20488b396f47af31cc3d1130968e234dd20e80b0479fc7b012c38795a7d0e7a5
  MD5          757c9af041729851ee9fb82f641de1f5
  BLAKE2b-256  963f2c86c6888f00c490f26b0c08c9f2fd452d09d5d847e820e00fbde7effb25

photon_imager-0.7.10-py3-none-manylinux_2_39_aarch64.whl
  SHA256       2330e6c52c9ace49691834cee494a51a5533aa2628618a9bb12399410bf4e4ef
  MD5          9a993029d13fd6402624cdd3d004ca33
  BLAKE2b-256  a7730a64194d7c68780e0e9df346de0138a535420ba76b861aeb5530debec4de

photon_imager-0.7.10-py3-none-macosx_11_0_arm64.whl
  SHA256       e6efb7943a19d6410f02349ba173013e9e71ea334f1f8db67652dfb548b72e80
  MD5          38f99a65e0d8ccda1de05a8e234bc5f0
  BLAKE2b-256  8a5446caf04f1c658ff7a3ed6a8dacc2ff24f925adc1372acd9a4806c3c1be87

Provenance

Attestation bundles were made for all three files. Publisher: pypi.yml on hejijunhao/photon. Attestation values reflect the state when the release was signed and may no longer be current.
