CLIP-powered multimodal image search engine with web interface and CLI tools
Folder Vision - CLIP Image Search
A powerful multimodal image search engine built with OpenAI's CLIP (Contrastive Language-Image Pre-training) model. Search through your image collections using natural language descriptions or find similar images using other images as queries.
Features
- Text-to-Image Search: Find images using natural language descriptions
- Image-to-Image Search: Find similar images using another image as a query
- Web Interface: Beautiful, intuitive web UI for easy searching
- CLI Interface: Command-line tools for batch processing and automation
- Smart Caching: Automatic embedding caching for fast subsequent searches
- FastAPI Backend: Modern, high-performance web API
- Responsive Design: Works on desktop, tablet, and mobile devices
Quick Start
Installation
# Clone or download the repository
cd folder-vision
# Install dependencies
pip install -e .
Web Interface
Start the web server:
fv serve --port 8000
Then open your browser to http://localhost:8000 and enjoy the visual interface!
Command Line Usage
Index your images:
fv index /path/to/your/images
Search with text:
fv search-text "a red car in the city"
Search with an image:
fv search-image /path/to/query_image.jpg
How It Works
Folder Vision uses OpenAI's CLIP model to understand both images and text in the same semantic space. This allows for:
- Image Indexing: Convert all your images into high-dimensional vector embeddings
- Text Understanding: Convert your search queries into comparable vectors
- Similarity Matching: Find the most similar images using cosine similarity
- Fast Retrieval: Use cached embeddings for instant search results
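The similarity-matching step can be sketched in a few lines of plain Python. This is a minimal illustration, not Folder Vision's actual implementation: it assumes the query and image embeddings are already available as lists of floats (real CLIP ViT-B/32 vectors would have 512 entries) and ranks image paths by cosine similarity. The names `cosine_similarity` and `rank_images` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid dividing by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_images(
    query: List[float], index: Dict[str, List[float]], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Return the top_k (path, score) pairs, most similar first.

    `index` maps image paths to precomputed (cached) embeddings.
    """
    scored = [(path, cosine_similarity(query, emb)) for path, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" standing in for real CLIP vectors.
toy_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.0, 1.0, 0.2],
    "sunset.jpg": [0.8, 0.3, 0.1],
}
for path, score in rank_images([1.0, 0.2, 0.0], toy_index, top_k=2):
    print(f"{path}: {score:.3f}")
```

When embeddings are L2-normalized once at indexing time, as is common with CLIP, cosine similarity reduces to a plain dot product, which is part of why searching cached embeddings is cheap.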
Web Interface Features
The web interface provides:
- Folder Indexing: Point to any folder and index all images automatically
- Text Search: Type natural language descriptions to find matching images
- Visual Search: Upload an image to find similar ones in your collection
- Statistics: View indexing statistics and model information
- Visual Results: See thumbnail previews with similarity scores
- Responsive Design: Works on desktop, tablet, and mobile devices
CLI Commands
Serve Web Interface
# Start web server (default: http://0.0.0.0:8000)
fv serve
# Custom host and port
fv serve --host localhost --port 3000
# Development mode with auto-reload
fv serve --reload
Index Images
# Index all images in a folder
fv index /path/to/images
# Index without saving cache
fv index /path/to/images --no-cache
Search Commands
# Text search (natural language)
fv search-text "sunset over mountains"
fv search-text "a cat sleeping on a couch" --top-k 5
# Image search (visual similarity)
fv search-image /path/to/query.jpg
fv search-image query.png --top-k 20
# JSON output for scripting
fv search-text "dogs playing" --format json
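For scripting, the JSON output can be consumed from Python via `subprocess`. The sketch below assumes the `fv` command is on your `PATH` and that `--format json` prints a JSON array of objects with `path` and `score` fields. Those field names are an assumption, not documented behavior, so inspect the real output and adjust the hypothetical `parse_results` helper to match.

```python
import json
import subprocess
from typing import List, Tuple


def parse_results(raw: str) -> List[Tuple[str, float]]:
    """Turn the CLI's JSON text into (path, score) pairs.

    ASSUMPTION: the output is a JSON list of objects with "path" and "score" keys.
    """
    data = json.loads(raw)
    return [(item["path"], float(item["score"])) for item in data]


def search_text(query: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Run `fv search-text` with JSON output and parse the result."""
    completed = subprocess.run(
        ["fv", "search-text", query, "--top-k", str(top_k), "--format", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if fv exits with an error
    )
    return parse_results(completed.stdout)


# Example (requires an indexed collection):
#   for path, score in search_text("dogs playing"):
#       print(f"{score:.3f}  {path}")
```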
Statistics
# View search engine statistics
fv stats
API Endpoints
The FastAPI backend provides these endpoints:
- GET / - Web interface
- POST /index - Index a folder
- GET /search/text - Search by text query
- POST /search/image - Search by image upload
- GET /image/{path} - Serve image files
- GET /stats - Get statistics
- GET /health - Health check
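A small standard-library client for the read-only endpoints might look like the sketch below. It assumes a server started with `fv serve` on the default port 8000. The query-parameter names `query` and `top_k` for `GET /search/text` are hypothetical, chosen for illustration; check the server's actual parameters (FastAPI apps typically expose interactive docs at `/docs`) before relying on them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default port used by `fv serve`


def text_search_url(base_url: str, query: str, top_k: int = 10) -> str:
    """Build the URL for GET /search/text.

    ASSUMPTION: the endpoint takes `query` and `top_k` query parameters;
    these names are illustrative and may differ in the real API.
    """
    params = urllib.parse.urlencode({"query": query, "top_k": top_k})
    return f"{base_url.rstrip('/')}/search/text?{params}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage against a running `fv serve` instance:
#   print(get_json(f"{BASE_URL}/health"))
#   print(get_json(text_search_url(BASE_URL, "a red car in the city", top_k=5)))
```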
Supported Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- GIF (.gif)
- TIFF (.tiff)
- WebP (.webp)
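To check whether a folder contains anything indexable before running `fv index`, you can filter files by the extensions listed above. This is a hypothetical helper, not part of Folder Vision; whether the real indexer recurses into subfolders and matches extensions case-insensitively is an assumption here.

```python
from pathlib import Path
from typing import List

# Extensions from the "Supported Image Formats" list.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}


def find_images(folder: str) -> List[Path]:
    """Recursively collect files whose extension is supported.

    ASSUMPTIONS: extension matching is case-insensitive and subfolders are included.
    """
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Example:
#   images = find_images("/path/to/images")
#   print(f"{len(images)} indexable images found")
```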
Performance & Optimization
- GPU Support: Automatically uses GPU if available (CUDA)
- Batch Processing: Efficient batch encoding of images
- Smart Caching: Embeddings are cached to disk for instant reloading
- Memory Management: Processes large collections without memory issues
- Concurrent Processing: Handles multiple search requests simultaneously
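Batch encoding and bounded memory use usually rest on the same idea: encode images in fixed-size chunks instead of all at once. The generic helper below sketches that pattern; it is not the project's code, and the default batch size of 32 is a hypothetical value. Lowering it trades speed for a smaller peak memory footprint.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items.

    The default of 32 is illustrative, not Folder Vision's actual setting.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Example: encode 100 image paths in chunks of 32, 32, 32 and 4.
#   for chunk in batched(image_paths, 32):
#       embeddings = encode(chunk)  # hypothetical encoder call
```

In a PyTorch pipeline, each batch would then be moved to the device picked with `torch.cuda.is_available()`, which is how "use the GPU if available" is commonly implemented.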
Example Use Cases
Personal Photo Management
# Index your photo library
fv index ~/Pictures
# Find vacation photos
fv search-text "beach vacation sunset"
# Find similar photos to a favorite shot
fv search-image ~/Pictures/favorite_sunset.jpg
Digital Asset Management
# Index product images
fv index /company/product_photos
# Find specific product types
fv search-text "red athletic shoes"
fv search-text "office furniture desk"
Creative Workflows
# Index design assets
fv index /projects/design_assets
# Find inspiration
fv search-text "minimalist logo design"
fv search-text "modern interior architecture"
System Requirements
- Python: 3.9 or higher
- Memory: 4GB RAM minimum, 8GB+ recommended for large collections
- Storage: Additional space for embedding cache files
- GPU: Optional but recommended for faster processing (CUDA-compatible)
Model Information
- Base Model: OpenAI CLIP ViT-B/32
- Embedding Dimension: 512
- Input Resolution: 224x224 pixels
- Vocabulary: 49,408 tokens
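These figures also let you estimate the size of the embedding cache. Assuming embeddings are stored as 32-bit floats (an assumption; the on-disk cache format is not documented here, and overhead such as stored file paths is ignored), each image costs 512 × 4 = 2,048 bytes, so 10,000 images need about 20.5 MB. The same arithmetic shows that ViT-B/32 splits a 224x224 input into (224 / 32)² = 49 patches.

```python
def num_patches(resolution: int = 224, patch_size: int = 32) -> int:
    """Image patches seen by a ViT: (resolution / patch_size) squared."""
    return (resolution // patch_size) ** 2


def cache_size_bytes(num_images: int, dim: int = 512, bytes_per_value: int = 4) -> int:
    """Raw embedding storage, ASSUMING float32 (4-byte) values and no overhead."""
    return num_images * dim * bytes_per_value


print(num_patches())                                  # 49
print(cache_size_bytes(10_000))                       # 20480000
print(f"{cache_size_bytes(10_000) / 2**20:.1f} MiB")  # 19.5 MiB
```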
Advanced Configuration
Environment Variables
# Set custom cache directory
export CLIP_CACHE_DIR=/path/to/cache
# Disable GPU usage
export CUDA_VISIBLE_DEVICES=""
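An environment-variable override like `CLIP_CACHE_DIR` is typically resolved with a fallback, as in the sketch below. The fallback path `~/.cache/folder_vision` is hypothetical; the real default location is not documented here.

```python
import os
from pathlib import Path

# HYPOTHETICAL fallback location; the real default is not documented.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "folder_vision"


def resolve_cache_dir() -> Path:
    """Use CLIP_CACHE_DIR when it is set and non-empty, else the fallback."""
    value = os.environ.get("CLIP_CACHE_DIR")
    if value:
        return Path(value).expanduser()
    return DEFAULT_CACHE_DIR


print(resolve_cache_dir())
```

The GPU switch works at a lower level: with `CUDA_VISIBLE_DEVICES` set to an empty string, PyTorch sees no GPUs, so `torch.cuda.is_available()` returns `False` and processing runs on the CPU.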
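Custom Model
You can use a different CLIP model by changing the checkpoint passed to CLIPImageSearch:
# In clip_search.py, or in your own script
from folder_vision.clip_search import CLIPImageSearch

# Larger checkpoint: 768-dimensional embeddings instead of ViT-B/32's 512
search_engine = CLIPImageSearch(model_name="openai/clip-vit-large-patch14")
Embeddings from different models are not comparable (they differ even in dimension), so re-index your folders after switching models.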
Troubleshooting
Common Issues
"No images indexed" error:
- Make sure to run fv index <folder_path> first
- Check that the folder contains supported image formats
Slow indexing:
- Enable GPU acceleration if available
- Process smaller batches of images
- Use SSD storage for better I/O performance
Memory errors:
- Reduce batch size in the code
- Process images in smaller folders
- Increase system RAM
Web interface not loading:
- Check if port is already in use
- Try a different port: fv serve --port 8080
- Check firewall settings
Development
Project Structure
folder-vision/
├── folder_vision/
│   ├── __init__.py
│   ├── app.py            # FastAPI web application
│   ├── cli.py            # Command-line interface
│   └── clip_search.py    # CLIP search engine
├── requirements.txt      # Python dependencies
├── pyproject.toml        # Project configuration
└── README.md             # This file
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- OpenAI for the incredible CLIP model
- Hugging Face for the transformers library
- FastAPI for the excellent web framework
- PyTorch for the deep learning foundation
Support
If you encounter any issues or have questions:
- Check the troubleshooting section above
- Search existing issues on GitHub
- Create a new issue with detailed information
- Include system information and error messages
Made with ❤️ by the Folder Vision Team
Start exploring your images in a whole new way!
Download files
File details
Details for the file folder_vision-1.0.0.tar.gz.
File metadata
- Download URL: folder_vision-1.0.0.tar.gz
- Size: 69.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de91188956f3684fca31100e85e9f2eae51123cbf5aa5cff0a18126ed6496ea4 |
| MD5 | 4dc9be2fa7757fdf1cf0b34a17c86bf9 |
| BLAKE2b-256 | a71c7557c4432956565d657eb96dec011e4dbcf6a547b2e218f10aa6d39728c8 |
File details
Details for the file folder_vision-1.0.0-py3-none-any.whl.
File metadata
- Download URL: folder_vision-1.0.0-py3-none-any.whl
- Size: 70.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61a59102bfa3095042734d4a627939b1c4a2775246113f34b4056d0228d70e73 |
| MD5 | 348727b3f8476ceb5c82954961d29acf |
| BLAKE2b-256 | a32ff1fbdef6e20360944b433552bb63b4d1e616f9d6022b4a13889157ab29f3 |
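To confirm that a downloaded file matches its published SHA256 digest, you can hash it with Python's standard `hashlib` module. The helper names `sha256_of` and `verify` are illustrative; the file is read in 1 MiB chunks so large archives need not fit in memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare a file against a published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example: check the source distribution against its published digest.
#   verify("folder_vision-1.0.0.tar.gz",
#          "de91188956f3684fca31100e85e9f2eae51123cbf5aa5cff0a18126ed6496ea4")
```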