Face detection and LLM-based face refinement using YOLO and multimodal LLMs (Gemini, LLaVA).
MMLLM Face Refinement
This project implements a novel approach to face detection by combining traditional face detection methods with multimodal large language models (MMLLM) for refinement and false positive elimination.
Approach
- Initial Detection: Use YOLOv11-Face to detect potential faces in images
- Refinement: Each detected face is then analyzed using multimodal LLMs:
  - Gemini API for cloud-based analysis
  - LLaVA-NeXT (local model) for on-device analysis
- False Positive Elimination: The LLMs determine if the detection is actually a face
- Bounding Box Refinement: The LLMs can suggest refinements to the bounding boxes
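The four steps above reduce to a filter-and-refine loop over detector output. The sketch below assumes a few things for illustration only: a `verify` callable standing in for the Gemini or LLaVA-NeXT call, and hypothetical names `Detection`, `Verdict`, `refine_detections` and `min_confidence`. It is not the package's actual API. Low-confidence boxes are dropped before any LLM call. A box is discarded when the verifier says it is not a face, and it is replaced when the verifier returns a refined box.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    confidence: float


@dataclass
class Verdict:
    is_face: bool
    refined_box: Optional[Box] = None  # None = keep the detector's box


def refine_detections(
    detections: List[Detection],
    verify: Callable[[Detection], Verdict],
    min_confidence: float = 0.25,  # hypothetical default threshold
) -> List[Detection]:
    """Threshold detector output, then let a verifier reject or adjust each box."""
    kept: List[Detection] = []
    for det in detections:
        if det.confidence < min_confidence:
            continue  # skipped before spending an LLM call on it
        verdict = verify(det)
        if not verdict.is_face:
            continue  # false positive eliminated
        box = verdict.refined_box if verdict.refined_box is not None else det.box
        kept.append(Detection(box=box, confidence=det.confidence))
    return kept


# Stand-in for a Gemini / LLaVA-NeXT call: rejects very narrow boxes
# and shrinks the others by 2 px on each side.
def fake_verifier(det: Detection) -> Verdict:
    x1, y1, x2, y2 = det.box
    if x2 - x1 < 10:
        return Verdict(is_face=False)
    return Verdict(is_face=True, refined_box=(x1 + 2, y1 + 2, x2 - 2, y2 - 2))


detections = [
    Detection((0, 0, 50, 60), 0.9),   # kept and tightened
    Detection((5, 5, 10, 10), 0.8),   # rejected by the verifier
    Detection((0, 0, 40, 40), 0.1),   # below the confidence threshold
]
result = refine_detections(detections, fake_verifier)
# result == [Detection(box=(2, 2, 48, 58), confidence=0.9)]
```

Keeping the verifier as a plain callable lets a cloud backend and a local backend be swapped without touching the filtering logic.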
Requirements
- Python 3.8+
- See requirements.txt for all dependencies
- YOLOv11-Face model file (yolov11l-face.pt) in the models directory
Installation
Automatic Installation
Linux/macOS
# Clone the repository
git clone https://github.com/JonathanLehner/mmllm-face-refinement.git
cd mmllm-face-refinement
# Run the installation script
chmod +x install.sh
./install.sh
Windows
# Clone the repository
git clone https://github.com/JonathanLehner/mmllm-face-refinement.git
cd mmllm-face-refinement
# Run the installation script
install.bat
Manual Installation
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate.bat
- Install the package:
  pip install -e .
- Download the YOLOv11-Face model:
  - Get the model from akanametov/yolo-face
  - Place it in the models directory as yolov11l-face.pt
- Set up API keys:
  cp .env.example .env
  # Edit .env file with your API keys
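The API keys in .env are read at runtime. The package may well use a dedicated loader such as python-dotenv. The stdlib-only sketch below shows the same idea under that caveat: parse `KEY=VALUE` lines and export them to the environment without overriding variables that are already set. The helper names are hypothetical, and so is the variable name `GEMINI_API_KEY`; check .env.example for the real names.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks, comments and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes; inline comments are NOT handled.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read a .env file (if present) and copy its entries into os.environ."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values


# GEMINI_API_KEY is a hypothetical variable name; use whatever .env.example lists.
# load_env_file()
# api_key = os.environ.get("GEMINI_API_KEY")
```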
Usage
- Place input images in the input folder
- Run the main script:
  python main.py
- Results will be saved in the output folder
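The input/output folder convention amounts to pairing each image in input with a result path in output. Here is a minimal sketch of that batching step. The accepted extensions and the `_refined` file-name suffix are assumptions for illustration, not what main.py necessarily does.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical list of accepted extensions; the real script may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def is_image(path: Path) -> bool:
    return path.suffix.lower() in IMAGE_EXTENSIONS


def output_path_for(image_path: Path, output_dir: str = "output") -> Path:
    # Hypothetical naming scheme: input/cat.jpg -> output/cat_refined.jpg
    return Path(output_dir) / f"{image_path.stem}_refined{image_path.suffix}"


def plan_batch(input_dir: str = "input", output_dir: str = "output") -> List[Tuple[Path, Path]]:
    """Pair every image file in input_dir with its result path in output_dir."""
    src = Path(input_dir)
    if not src.is_dir():
        return []
    return [
        (p, output_path_for(p, output_dir))
        for p in sorted(src.iterdir())
        if p.is_file() and is_image(p)
    ]
```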
For debugging purposes, you can run the test script with debug enabled:
python test.py --image input/your_image.jpg --debug
Configuration
You can adjust parameters in config.yaml:
- YOLO confidence threshold
- Bounding box padding
- LLM endpoint selection
- Output formatting options
- Debug settings
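Bounding box padding usually means enlarging each detection before cropping, so the LLM sees some context around the face. The sketch below assumes padding is a fraction of the box's width and height; config.yaml may instead express it in pixels. It clips the result to the image bounds so the crop never leaves the frame. `pad_box` is a hypothetical name.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def pad_box(box: Box, padding: float, image_width: int, image_height: int) -> Box:
    """Expand a box by `padding` times its width/height on each side,
    then clip it to the image so the crop never leaves the frame."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * padding
    pad_y = (y2 - y1) * padding
    return (
        max(0, round(x1 - pad_x)),
        max(0, round(y1 - pad_y)),
        min(image_width, round(x2 + pad_x)),
        min(image_height, round(y2 + pad_y)),
    )


print(pad_box((100, 100, 200, 180), 0.1, 640, 480))  # (90, 92, 210, 188)
print(pad_box((5, 5, 55, 55), 0.2, 60, 60))          # (0, 0, 60, 60), clipped
```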
Debug Mode
The system includes a debug mode that saves intermediate results:
debug:
  enabled: true                  # Enable debug mode
  save_raw_detections: true      # Save initial YOLO detections
  save_intermediate_steps: true  # Save cropped faces before LLM analysis
When debug mode is enabled, the following files are saved to the output/debug directory:
- Images with YOLO detections visualized
- JSON files with detection coordinates
- Cropped face images before LLM processing
- JSON files with detection metadata
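The debug flags gate which intermediate files are written. The sketch below uses a dict that mirrors the debug block of config.yaml to decide whether to dump the raw YOLO detections as JSON under output/debug. The JSON layout, the `_detections.json` file name and the function names are assumptions, not the project's actual output format.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence, Tuple

Detection = Tuple[Sequence[float], float]  # (bbox as x1, y1, x2, y2; confidence)


def build_debug_payload(image_name: str, detections: List[Detection]) -> Dict[str, Any]:
    """Collect detection coordinates and metadata in a JSON-friendly dict."""
    return {
        "image": image_name,
        "num_detections": len(detections),
        "detections": [
            {"id": i, "bbox": [float(v) for v in bbox], "confidence": float(conf)}
            for i, (bbox, conf) in enumerate(detections)
        ],
    }


def save_raw_detections(
    config: Dict[str, Any],
    image_name: str,
    detections: List[Detection],
    debug_dir: str = "output/debug",
) -> Optional[Path]:
    """Write the payload only when debug.enabled and debug.save_raw_detections are set."""
    debug_cfg = config.get("debug", {})
    if not (debug_cfg.get("enabled") and debug_cfg.get("save_raw_detections")):
        return None
    out_dir = Path(debug_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{Path(image_name).stem}_detections.json"
    payload = build_debug_payload(image_name, detections)
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out_path
```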
References
This project utilizes the following models and repositories:
- YOLOv11-Face: State-of-the-art face detection model from akanametov/yolo-face
- LLaVA-NeXT: Local multimodal LLM from LLaVA-VL/LLaVA-NeXT
- Gemini API: Google's multimodal generative AI model
Upload to PyPI
- python -m build
- pip install dist/mmllm_face_refinement-0.1.0-py3-none-any.whl
- python -m twine upload dist/*
Note
This is a research project demonstrating the use of multimodal LLMs for improving traditional computer vision tasks.
Download files
File details
Details for the file mmllm_face_refinement-0.1.12.tar.gz.
File metadata
- Download URL: mmllm_face_refinement-0.1.12.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 155c1d03121d8ee4b7f7aeec47676807d6ac0c947e5e9bc4ec22cda42a74a0aa |
| MD5 | 967d4c71d24e974b3296a0b1063418e3 |
| BLAKE2b-256 | 32e3797f1ae6aaee940b2526e7bd6bdba5b35599cfe6dadfb93f9438d59f24cd |
File details
Details for the file mmllm_face_refinement-0.1.12-py3-none-any.whl.
File metadata
- Download URL: mmllm_face_refinement-0.1.12-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe4be56a1492e23bd9fdad57eaed4099c272e327b54298f7df88f48b7460e3cb |
| MD5 | 10909592724da2bbbc69729b3b1241ae |
| BLAKE2b-256 | f6bc29a04d42d98113ae62a4bec4b68a58327519e65a64e86decc2cb2a579ec9 |
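You can check a downloaded distribution file against the SHA256 digests listed above with the standard library's hashlib. The sketch below hashes the file in chunks so large files need not fit in memory. The helper names are hypothetical.

```python
import hashlib
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded file against a published SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected_sha256.strip().lower()
```

For example, `verify_download("mmllm_face_refinement-0.1.12.tar.gz", "155c1d03121d8ee4b7f7aeec47676807d6ac0c947e5e9bc4ec22cda42a74a0aa")` should return True only if the file matches the SHA256 digest published for the source distribution. pip can enforce the same check through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.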