Fast image processing CLI with AI-powered tagging and embeddings
Photon
AI-powered image processing pipeline written in Rust
Analyze, embed, and tag images locally using SigLIP — no cloud required.
Quick Start • Usage • How It Works • Configuration • Library Usage
Photon takes images as input and outputs structured JSON: 768-dim vector embeddings, semantic tags, EXIF metadata, content hashes, and thumbnails. It's a pure processing pipeline — no database, no server, no cloud dependency. Process locally, store wherever you want.
image.jpg ──▶ Photon ──▶ { embedding, tags, metadata, hash, thumbnail }
Features
- SigLIP Embeddings — 768-dimensional vectors for semantic similarity search, powered by ONNX Runtime
- Zero-Shot Tagging — 68,000+ term vocabulary (WordNet + curated visual terms) scored locally via SigLIP
- EXIF Extraction — Camera, GPS coordinates, datetime, ISO, aperture, focal length
- Content Hashing — BLAKE3 cryptographic hash + perceptual hash for deduplication and similarity
- Thumbnails — WebP generation with configurable size and quality
- LLM Descriptions — BYOK enrichment via Ollama, Anthropic, OpenAI, Hyperbolic
- Batch Processing — Parallel workers with progress bar and skip-existing support
- Single Binary — No Python, no Docker, no runtime dependencies
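The two hashes serve different purposes downstream: identical content hashes indicate byte-identical files, while perceptual hashes that are close to each other suggest visually similar images. The following is a minimal sketch of how a consumer might use both. It assumes the perceptual hash is a hex string compared by Hamming distance, which the source does not confirm; the records and hash values are made up for illustration.

```python
from collections import defaultdict


def exact_duplicates(records):
    """Group file paths by content_hash; return groups with more than one file."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["content_hash"]].append(rec["file_path"])
    return [paths for paths in groups.values() if len(paths) > 1]


def hamming_distance(hex_a, hex_b):
    """Count differing bits between two equal-length hex strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Made-up records; the hash values are NOT real Photon output.
records = [
    {"file_path": "a.jpg", "content_hash": "aa11", "perceptual_hash": "d4c3b2a1"},
    {"file_path": "a_copy.jpg", "content_hash": "aa11", "perceptual_hash": "d4c3b2a1"},
    {"file_path": "a_resized.jpg", "content_hash": "bb22", "perceptual_hash": "d4c3b2a3"},
]

print(exact_duplicates(records))
print(hamming_distance(records[0]["perceptual_hash"], records[2]["perceptual_hash"]))
```

What counts as "close enough" for near-duplicates is a threshold you would tune on your own data.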
Quick Start
# Build from source
git clone https://github.com/hejijunhao/photon.git
cd photon
cargo build --release
# Download the SigLIP model (~350 MB, one-time)
cargo run --release -- models download
# Process a single image
cargo run --release -- process photo.jpg
# Process an entire directory
cargo run --release -- process ./photos/ --format jsonl --output results.jsonl
Usage
Process Images
# Single image → JSON to stdout
photon process image.jpg
# Directory → JSONL file (one JSON object per line)
photon process ./photos/ --format jsonl --output results.jsonl
# Parallel processing with 8 workers
photon process ./photos/ --parallel 8 --output results.jsonl
# Skip already-processed images on re-runs
photon process ./photos/ --output results.jsonl --skip-existing
# Higher quality embeddings (384px model, slower but more detailed)
photon process image.jpg --quality high
LLM Descriptions (BYOK)
# Local via Ollama
photon process image.jpg --llm ollama --llm-model llama3.2-vision
# Anthropic API
photon process image.jpg --llm anthropic --llm-model claude-sonnet-4-5-20250929
# OpenAI API
photon process image.jpg --llm openai --llm-model gpt-4o-mini
# Batch with LLM enrichment
photon process ./photos/ --format jsonl --output results.jsonl --llm anthropic
Control What Gets Generated
# Metadata and hashes only (no AI)
photon process image.jpg --no-embedding --no-tagging
# Skip thumbnail generation
photon process image.jpg --no-thumbnail
# Custom thumbnail size
photon process image.jpg --thumbnail-size 128
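When thumbnails are enabled, the JSON output carries them in the thumbnail field as base64-encoded WebP. A consumer could decode that field roughly as follows. This is a sketch that assumes the field holds plain base64 with no data-URI prefix; the sample record uses a fake 12-byte WebP header rather than a real image.

```python
import base64


def decode_thumbnail(record):
    """Decode the base64 thumbnail and check the RIFF/WEBP container signature."""
    data = base64.b64decode(record["thumbnail"])
    if data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("thumbnail is not a WebP image")
    return data


# Fake header for illustration only; this is not a decodable image.
fake_webp = b"RIFF" + (4).to_bytes(4, "little") + b"WEBP"
record = {
    "file_name": "beach.jpg",
    "thumbnail": base64.b64encode(fake_webp).decode("ascii"),
}

thumb = decode_thumbnail(record)
print(f"{record['file_name']}: {len(thumb)} thumbnail bytes")
```

The returned bytes can be written to a .webp file or handed to an image library as-is.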
Manage Models
photon models download # Download SigLIP models from HuggingFace
photon models list # Show installed models and status
photon models path # Show model storage directory
Configuration
photon config init # Create config file with defaults
photon config show # Display current settings
photon config path # Show config file location
How It Works
Photon runs a sequential pipeline where each stage is independent and optional:
Input        ┌──────────┐ ┌──────┐ ┌──────┐ ┌────────────┐ ┌───────┐ ┌───────┐
image.jpg ──▶│ Validate │▶│Decode│▶│ EXIF │▶│    Hash    │▶│Thumb- │▶│ Embed │──▶ ...
             │          │ │      │ │      │ │BLAKE3+pHash│ │ nail  │ │SigLIP │
             └──────────┘ └──────┘ └──────┘ └────────────┘ └───────┘ └───────┘

... ──▶ ┌──────────┐ ┌─────────────┐    Output
        │Zero-Shot │▶│ LLM Enrich  │──▶ Structured JSON
        │   Tags   │ │   (BYOK)    │    { embedding, tags,
        │ (SigLIP) │ │             │      metadata, hash, ... }
        └──────────┘ └─────────────┘
| Stage | What it does | Speed |
|---|---|---|
| Validate | Check file exists, size limits, format detection via magic bytes | <1ms |
| Decode | Load image pixels (JPEG, PNG, WebP, GIF, TIFF, BMP, AVIF) | ~5ms |
| EXIF | Extract camera, GPS, datetime, shooting parameters | ~2ms |
| Hash | BLAKE3 content hash (dedup) + perceptual hash (similarity) | ~3ms |
| Thumbnail | Aspect-preserving resize to WebP, base64 encoded | ~5ms |
| Embed | SigLIP vision encoder → 768-dim L2-normalized vector | ~200ms |
| Tag | Dot product against 68K vocabulary, SigLIP sigmoid scoring | ~2ms |
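The Tag stage can be pictured as follows: take the dot product of the image embedding with each vocabulary term's text embedding, pass a scaled and shifted version of it through a sigmoid, and keep the highest-scoring terms. The sketch below illustrates that idea only and is not Photon's implementation. The 3-dimensional toy vectors, the `scale` and `bias` values, and the function names are all assumptions chosen for readability.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_tags(image_vec, vocab, scale, bias, max_tags=15):
    """Score every vocabulary term and return the top max_tags as (term, confidence)."""
    scored = []
    for term, text_vec in vocab.items():
        dot = sum(a * b for a, b in zip(image_vec, text_vec))
        scored.append((term, sigmoid(scale * dot + bias)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_tags]


# Toy vectors for illustration; real embeddings are 768-dimensional.
image_vec = [1.0, 0.0, 0.0]
vocab = {
    "beach": [0.9, 0.1, 0.0],
    "ocean": [0.5, 0.5, 0.0],
    "car": [-0.2, 0.9, 0.1],
}

# scale and bias are placeholder values, not the model's learned parameters.
top = score_tags(image_vec, vocab, scale=10.0, bias=-4.0, max_tags=2)
for term, confidence in top:
    print(f"{term}: {confidence:.2f}")
```

Because each term is scored independently through a sigmoid rather than a softmax, confidences do not have to sum to 1, so several tags can score highly for the same image.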
Output Format
Each processed image produces a JSON object:
{
  "file_path": "/photos/beach.jpg",
  "file_name": "beach.jpg",
  "content_hash": "a7f3b2c1d4e5...",
  "width": 4032,
  "height": 3024,
  "format": "jpeg",
  "file_size": 2458624,
  "embedding": [0.023, -0.156, 0.089, "... 768 floats"],
  "tags": [
    { "name": "beach", "confidence": 0.94, "category": "scene" },
    { "name": "ocean", "confidence": 0.87, "category": "scene" },
    { "name": "tropical", "confidence": 0.76, "category": "style" }
  ],
  "exif": {
    "captured_at": "2024-07-15T14:32:00",
    "camera_model": "iPhone 15 Pro",
    "gps_latitude": 25.7617,
    "gps_longitude": -80.1918
  },
  "thumbnail": "base64-encoded-webp...",
  "perceptual_hash": "d4c3b2a1..."
}
Use --format jsonl for batch processing — one JSON object per line, streamed as each image completes.
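Since the embeddings are L2-normalized, the dot product of two vectors equals their cosine similarity, so a small similarity search over JSONL output needs no extra normalization step. The following sketch parses JSONL lines and ranks records against a query image. The sample lines use 3-dimensional unit vectors instead of real 768-dimensional output, and the helper names are hypothetical.

```python
import json


def load_records(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def most_similar(query, records, top_k=3):
    """Rank other records by dot product with the query embedding."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    ranked = sorted(
        ((dot(query["embedding"], rec["embedding"]), rec["file_name"])
         for rec in records if rec is not query),
        reverse=True,
    )
    return ranked[:top_k]


# Illustrative lines only; real records carry many more fields.
jsonl = [
    '{"file_name": "beach.jpg", "embedding": [0.6, 0.8, 0.0]}',
    '{"file_name": "shore.jpg", "embedding": [0.8, 0.6, 0.0]}',
    '{"file_name": "desk.jpg", "embedding": [0.0, 0.0, 1.0]}',
]

records = load_records(jsonl)
results = most_similar(records[0], records, top_k=2)
for score, name in results:
    print(f"{name}: {score:.2f}")
```

A linear scan like this is fine for a few thousand images; larger collections are better served by a vector index in whatever store you ingest into.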
Configuration
Photon uses a layered configuration system in which each layer overrides the one before it: code defaults < config file < CLI flags.
photon config init # Creates ~/.photon/config.toml (or platform-appropriate path)
Key settings in config.toml:
[processing]
parallel_workers = 4
supported_formats = ["jpg", "jpeg", "png", "webp", "heic", "raw", "cr2", "nef", "arw"]
[limits]
max_file_size_mb = 100
max_image_dimension = 10000
embed_timeout_ms = 30000
[embedding]
model = "siglip-base-patch16" # or "siglip-base-patch16-384" for higher quality
[thumbnail]
enabled = true
size = 256
[tagging]
enabled = true
max_tags = 15
[logging]
level = "info" # error, warn, info, debug, trace
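To make the layering concrete, here is a minimal sketch of how defaults, a config file, and CLI flags could be merged so that later layers win on a per-key basis. It is an illustration of the precedence rule, not Photon's actual implementation; `merge_layers` and the sample values in `config_file` and `cli_flags` are hypothetical.

```python
def merge_layers(*layers):
    """Merge nested dicts left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Recurse so a layer can override one key without dropping siblings.
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = value
    return merged


defaults = {
    "processing": {"parallel_workers": 4},
    "thumbnail": {"enabled": True, "size": 256},
}
config_file = {"thumbnail": {"size": 128}}           # e.g. parsed from config.toml
cli_flags = {"processing": {"parallel_workers": 8}}  # e.g. from --parallel 8

effective = merge_layers(defaults, config_file, cli_flags)
print(effective)
```

Here the config file changes only the thumbnail size, the CLI flag changes only the worker count, and every other default is left in place.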
Library Usage
Photon's processing engine lives in the photon-core crate and can be embedded directly in Rust applications:
use photon_core::{Config, ImageProcessor};
use std::path::Path;

#[tokio::main]
async fn main() -> photon_core::Result<()> {
    let config = Config::load()?;
    let mut processor = ImageProcessor::new(&config);

    // Load AI components (optional; the pipeline works without them)
    processor.load_embedding(&config)?;
    processor.load_tagging(&config)?;

    let result = processor.process(Path::new("photo.jpg")).await?;
    println!("Hash: {}", result.content_hash);
    println!("Embedding: {} dimensions", result.embedding.len());
    println!("Tags: {:?}", result.tags.iter().map(|t| &t.name).collect::<Vec<_>>());
    Ok(())
}
Add to your Cargo.toml:
[dependencies]
photon-core = { git = "https://github.com/hejijunhao/photon.git" }
tokio = { version = "1", features = ["full"] }
Integrating with Your Backend
Photon is designed to feed into your own storage and search infrastructure. Pipe the output to your ingestion scripts:
# Stream results into your backend
photon process ./photos/ --format jsonl | your-ingestion-script
# Or process to file, then ingest
photon process ./photos/ --format jsonl --output results.jsonl
python ingest.py results.jsonl
Example — storing embeddings in PostgreSQL with pgvector:
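import json
import subprocess

# Run Photon on one image; check=True raises if the CLI exits with an error.
result = subprocess.run(
    ["photon", "process", "photo.jpg"],
    capture_output=True, text=True, check=True
)
data = json.loads(result.stdout)

# `db` is assumed to be an open database cursor (e.g. from psycopg)
# on a table whose embedding column uses the pgvector type.
db.execute(
    "INSERT INTO images (path, hash, embedding, tags) VALUES (%s, %s, %s, %s)",
    [data["file_path"], data["content_hash"], data["embedding"], json.dumps(data["tags"])]
)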
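For the streaming form, an ingestion script typically reads one JSON object per line and writes to the backend in batches. The sketch below shows that reading and batching logic with hypothetical helper names (`iter_records`, `batched`). It runs against an in-memory sample so it is self-contained; a real script would pass `sys.stdin` instead and replace the print with a database write.

```python
import io
import json
import sys


def iter_records(stream):
    """Yield one parsed record per non-empty line, skipping malformed lines."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"skipping line {line_no}: {exc}", file=sys.stderr)


def batched(records, size):
    """Group records into lists of at most `size` items."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# In-memory stand-in for `photon process ... --format jsonl` output.
sample = io.StringIO(
    '{"file_name": "a.jpg"}\n'
    "\n"
    '{"file_name": "b.jpg"}\n'
    "not json\n"
    '{"file_name": "c.jpg"}\n'
)

batches = list(batched(iter_records(sample), size=2))
for batch in batches:
    print("would ingest:", [rec["file_name"] for rec in batch])
```

Both helpers are generators, so records are handled as they arrive rather than after the whole run finishes.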
Architecture
photon/
├── crates/
│ ├── photon/ # CLI binary (thin clap wrapper)
│ └── photon-core/ # Embeddable library
│ └── src/
│ ├── pipeline/ # Processing stages (decode, metadata, hash, thumbnail)
│ ├── embedding/ # SigLIP vision encoder (ONNX Runtime)
│ ├── tagging/ # Zero-shot classification (68K vocabulary)
│ └── output.rs # JSON/JSONL serialization
├── data/vocabulary/ # WordNet nouns + supplemental visual terms
├── tests/fixtures/ # Test images
└── docs/ # Phase plans and changelogs
Two-crate design: photon-core contains all processing logic and can be used as a library. photon is a thin CLI that calls into it. This means you can embed Photon's pipeline directly in your Rust application without pulling in CLI dependencies.
Project Status
| Phase | Status |
|---|---|
| Foundation (CLI, config, logging) | Complete |
| Image pipeline (decode, EXIF, hashing, thumbnails) | Complete |
| SigLIP embedding (768-dim vectors via ONNX) | Complete |
| Zero-shot tagging (68K vocabulary, self-organizing pools) | Complete |
| LLM enrichment (BYOK descriptions) | Complete |
| Polish & release (progress bar, skip-existing, benchmarks) | Complete |
Requirements
- Rust 2021 edition (stable)
- ~350 MB disk for SigLIP model (downloaded on first models download)
- Tested on macOS (Apple Silicon) and Linux (aarch64/x86_64)
Contributing
Contributions are welcome. Please open an issue to discuss significant changes before submitting a PR.
cargo test # Run all tests (120+ across workspace)
cargo clippy # Lint
cargo fmt # Format
cargo bench -p photon-core # Run benchmarks
License
Dual-licensed under MIT or Apache 2.0, at your option.