CLIP-powered multimodal image search engine with web interface and CLI tools

Folder Vision - CLIP Image Search

A powerful multimodal image search engine built with OpenAI's CLIP (Contrastive Language-Image Pre-training) model. Search through your image collections using natural language descriptions or find similar images using other images as queries.

Features

  • Text-to-Image Search: Find images using natural language descriptions
  • Image-to-Image Search: Find similar images using another image as a query
  • Web Interface: Beautiful, intuitive web UI for easy searching
  • CLI Interface: Command-line tools for batch processing and automation
  • Smart Caching: Automatic embedding caching for fast subsequent searches
  • FastAPI Backend: Modern, high-performance web API
  • Responsive Design: Works on desktop, tablet, and mobile devices

Quick Start

Installation

# Clone or download the repository
cd folder-vision

# Install dependencies
pip install -e .

Web Interface

Start the web server:

fv serve --port 8000

Then open your browser to http://localhost:8000 and enjoy the visual interface!

Command Line Usage

Index your images:

fv index /path/to/your/images

Search with text:

fv search-text "a red car in the city"

Search with an image:

fv search-image /path/to/query_image.jpg

How It Works

Folder Vision uses OpenAI's CLIP model to embed both images and text in the same semantic space. A search then runs in four steps:

  1. Image Indexing: Convert all your images into high-dimensional vector embeddings
  2. Text Understanding: Convert your search queries into comparable vectors
  3. Similarity Matching: Find the most similar images using cosine similarity
  4. Fast Retrieval: Use cached embeddings for instant search results
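The similarity-matching step can be sketched in a few lines. This is a minimal illustration using plain Python lists, not the project's actual code: the `index` dictionary, the `top_k` helper, and the toy 3-dimensional vectors are all assumptions made for the example (real CLIP ViT-B/32 embeddings have 512 dimensions and would normally live in NumPy or PyTorch arrays).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, index, k=5):
    """Return the k (path, score) pairs most similar to the query.

    `index` maps an image path to its embedding (hypothetical structure).
    """
    q = normalize(query)
    scored = []
    for path, embedding in index.items():
        e = normalize(embedding)
        score = sum(a * b for a, b in zip(q, e))  # cosine similarity
        scored.append((path, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for cached image vectors.
index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

Because both text queries and images are mapped into the same space, the same ranking function would serve text-to-image and image-to-image search; only the way the query vector is produced differs.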

Web Interface Features

The web interface provides:

  • Folder Indexing: Point to any folder and index all images automatically
  • Text Search: Type natural language descriptions to find matching images
  • Visual Search: Upload an image to find similar ones in your collection
  • Statistics: View indexing statistics and model information
  • Visual Results: See thumbnail previews with similarity scores
  • Responsive Design: Works on desktop, tablet, and mobile devices

CLI Commands

Serve Web Interface

# Start web server (default: http://0.0.0.0:8000)
fv serve

# Custom host and port
fv serve --host localhost --port 3000

# Development mode with auto-reload
fv serve --reload

Index Images

# Index all images in a folder
fv index /path/to/images

# Index without saving cache
fv index /path/to/images --no-cache

Search Commands

# Text search (natural language)
fv search-text "sunset over mountains"
fv search-text "a cat sleeping on a couch" --top-k 5

# Image search (visual similarity)
fv search-image /path/to/query.jpg
fv search-image query.png --top-k 20

# JSON output for scripting
fv search-text "dogs playing" --format json
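For scripting, the JSON output could be consumed from Python roughly as follows. This sketch uses only the flags shown above; it assumes that `--top-k` and `--format json` can be combined and that the command prints nothing but JSON to standard output. The helper names are hypothetical, and the shape of the returned payload is not documented here, so the code makes no assumptions about it.

```python
import json
import subprocess


def build_search_command(query, top_k=10):
    """Build the argv list for a text search with JSON output."""
    return ["fv", "search-text", query, "--top-k", str(top_k), "--format", "json"]


def run_search(query, top_k=10):
    """Run the search and return the decoded JSON payload, whatever its shape.

    Raises subprocess.CalledProcessError if `fv` exits with a non-zero status.
    """
    completed = subprocess.run(
        build_search_command(query, top_k),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


command = build_search_command("dogs playing", top_k=5)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the query contains spaces or quotes.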

Statistics

# View search engine statistics
fv stats

API Endpoints

The FastAPI backend provides these endpoints:

  • GET / - Web interface
  • POST /index - Index a folder
  • GET /search/text - Search by text query
  • POST /search/image - Search by image upload
  • GET /image/{path} - Serve image files
  • GET /stats - Get statistics
  • GET /health - Health check
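A client for these endpoints might look like the sketch below, which uses only the Python standard library. The query parameter names `query` and `top_k` are assumptions (the endpoint list does not document them), as is the idea that responses are JSON; check the server's generated API documentation before relying on either.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # matches the Quick Start example


def text_search_url(query, top_k=10, base_url=BASE_URL):
    """Build the URL for GET /search/text.

    The parameter names `query` and `top_k` are assumed, not documented.
    """
    params = urlencode({"query": query, "top_k": top_k})
    return f"{base_url}/search/text?{params}"


def get_json(url, timeout=10.0):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = text_search_url("a red car in the city", top_k=5)
```

With the server running, `get_json(url)` would issue the request, and `get_json(BASE_URL + "/health")` could serve as a simple readiness check.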

Supported Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • BMP (.bmp)
  • GIF (.gif)
  • TIFF (.tiff)
  • WebP (.webp)
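To check in advance which files in a folder would be picked up, the extension list above can be applied with `pathlib`. This is a sketch, not the indexer's own logic: case-insensitive matching and recursive traversal are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}


def is_supported(path):
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_images(folder):
    """Recursively collect supported image files under `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and is_supported(p)
    )
```

For example, `find_images("/path/to/images")` returns a sorted list of `Path` objects, which makes it easy to confirm that a folder is not empty before running `fv index`.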

Performance & Optimization

  • GPU Support: Automatically uses GPU if available (CUDA)
  • Batch Processing: Efficient batch encoding of images
  • Smart Caching: Embeddings are cached to disk for instant reloading
  • Memory Management: Processes large collections without memory issues
  • Concurrent Processing: Handles multiple search requests simultaneously
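Batch processing, in general terms, means encoding a fixed number of images at a time so that memory use stays bounded regardless of collection size. The sketch below shows the pattern with a stand-in encoder; `batched`, `encode_all`, and `fake_encode` are hypothetical names, and the default batch size of 32 is an assumption rather than the project's setting.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def encode_all(paths, encode_batch, batch_size=32):
    """Encode paths batch by batch; `encode_batch` is supplied by the caller."""
    embeddings = []
    for batch in batched(paths, batch_size):
        embeddings.extend(encode_batch(batch))
    return embeddings


def fake_encode(batch):
    """Stand-in encoder: maps each path to a one-element 'embedding'."""
    return [[float(len(path))] for path in batch]


paths = [f"img{i}.jpg" for i in range(5)]
batches = list(batched(paths, 2))
vectors = encode_all(paths, fake_encode, batch_size=2)
```

Lowering `batch_size` trades throughput for a smaller peak memory footprint, which is the usual first remedy for out-of-memory errors during indexing.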

Example Use Cases

Personal Photo Management

# Index your photo library
fv index ~/Pictures

# Find vacation photos
fv search-text "beach vacation sunset"

# Find similar photos to a favorite shot
fv search-image ~/Pictures/favorite_sunset.jpg

Digital Asset Management

# Index product images
fv index /company/product_photos

# Find specific product types
fv search-text "red athletic shoes"
fv search-text "office furniture desk"

Creative Workflows

# Index design assets
fv index /projects/design_assets

# Find inspiration
fv search-text "minimalist logo design"
fv search-text "modern interior architecture"

System Requirements

  • Python: 3.9 or higher
  • Memory: 4GB RAM minimum, 8GB+ recommended for large collections
  • Storage: Additional space for embedding cache files
  • GPU: Optional but recommended for faster processing (CUDA-compatible)

Model Information

  • Base Model: OpenAI CLIP ViT-B/32
  • Embedding Dimension: 512
  • Input Resolution: 224x224 pixels
  • Vocabulary: 49,408 tokens

Advanced Configuration

Environment Variables

# Set custom cache directory
export CLIP_CACHE_DIR=/path/to/cache

# Disable GPU usage
export CUDA_VISIBLE_DEVICES=""
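Code that needs to locate the cache could resolve `CLIP_CACHE_DIR` along these lines. This is a sketch only: the fallback path `~/.cache/folder_vision` is a hypothetical default chosen for illustration, not the location the project actually uses.

```python
import os
from pathlib import Path

# Hypothetical fallback; the project's real default location may differ.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "folder_vision"


def resolve_cache_dir(environ=None):
    """Return the cache directory from CLIP_CACHE_DIR, or the fallback if unset."""
    env = os.environ if environ is None else environ
    value = env.get("CLIP_CACHE_DIR", "").strip()
    return Path(value).expanduser() if value else DEFAULT_CACHE_DIR
```

Accepting the environment as an optional argument keeps the function easy to test, for example `resolve_cache_dir({"CLIP_CACHE_DIR": "/path/to/cache"})`.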

Custom Model

You can use different CLIP models by modifying the code:

# In clip_search.py
search_engine = CLIPImageSearch(model_name="openai/clip-vit-large-patch14")

Troubleshooting

Common Issues

"No images indexed" error:

  • Make sure to run fv index <folder_path> first
  • Check that the folder contains supported image formats

Slow indexing:

  • Enable GPU acceleration if available
  • Process smaller batches of images
  • Use SSD storage for better I/O performance

Memory errors:

  • Reduce batch size in the code
  • Process images in smaller folders
  • Increase system RAM

Web interface not loading:

  • Check if port is already in use
  • Try a different port: fv serve --port 8080
  • Check firewall settings
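Whether a port is already in use can be checked from Python before starting the server. The following is a small diagnostic sketch using the standard `socket` module; the helper names are hypothetical, and the check covers IPv4 on the given host only.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first free port among `candidates`, or None if all are taken."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

For instance, `first_free_port([8000, 8080, 3000])` yields a value that can be passed to `fv serve --port`. The result is only a snapshot: another process could claim the port between the check and the server start.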

Development

Project Structure

folder-vision/
├── folder_vision/
│   ├── __init__.py
│   ├── app.py           # FastAPI web application
│   ├── cli.py           # Command-line interface
│   └── clip_search.py   # CLIP search engine
├── requirements.txt     # Python dependencies
├── pyproject.toml       # Project configuration
└── README.md            # This file

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • OpenAI for the incredible CLIP model
  • Hugging Face for the transformers library
  • FastAPI for the excellent web framework
  • PyTorch for the deep learning foundation

Support

If you encounter any issues or have questions:

  1. Check the troubleshooting section above
  2. Search existing issues on GitHub
  3. Create a new issue with detailed information
  4. Include system information and error messages

Made with ❤️ by the Folder Vision Team

Start exploring your images in a whole new way!
