Modular multimodal pipeline for vision-to-LLM integration
🧠 ModuMuse
Modular Multimodal Intelligence
Plug any Hugging Face LLM and vision encoder together via a learnable projector.
Zero-shot inference is supported today; adapter-based fine-tuning is planned.
🚀 Features
- 🔌 Plug-and-play architecture for combining LLMs and vision encoders
- 🧠 Supports popular models such as Qwen, Mistral, LLaMA, CLIP, XCLIP, and SAM
- 🧪 Zero-shot inference with learnable projector modules
- 🛠️ Adapter-based fine-tuning (coming soon)
- 📊 Easy benchmarking and visualization tools
📦 Installation
pip install modu-muse
🧬 Quick Start
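from modu_muse import Pipeline

# Pair a Hugging Face LLM with a vision encoder by model ID.
pipe = Pipeline(
    llm_name="mistralai/Mistral-7B-Instruct-v0.2",
    vision_name="openai/clip-vit-base-patch16",
)

# Run zero-shot inference on one image with a text prompt.
result = pipe.infer("path/to/image.jpg", "Describe the scene.")
print(result)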
🧠 Architecture
[Image/Video] → [Vision Encoder] → [Projector] → [LLM]
- Vision encoder extracts features
- Projector maps visual features to LLM-compatible embeddings
- LLM generates text conditioned on visual context
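The three stages above can be illustrated with a small, dependency-free sketch of the shapes involved. This is not ModuMuse's implementation: the helper names (`make_projector`, `project`, `build_llm_input`), the toy dimensions, and the choice to prepend visual tokens to the prompt embeddings are all assumptions made for illustration. A real projector would typically be a trainable layer such as `torch.nn.Linear` operating on tensors.

```python
import random

def make_projector(vision_dim, llm_dim, seed=0):
    """Create a randomly initialised linear map (hypothetical helper)."""
    rng = random.Random(seed)
    scale = vision_dim ** -0.5
    weights = [
        [rng.uniform(-scale, scale) for _ in range(vision_dim)]
        for _ in range(llm_dim)
    ]
    bias = [0.0] * llm_dim
    return weights, bias

def project(features, projector):
    """Map each visual feature vector to an LLM-sized embedding."""
    weights, bias = projector
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]
        for vec in features
    ]

def build_llm_input(visual_embeddings, text_embeddings):
    """Prepend projected visual tokens to the prompt embeddings (assumed layout)."""
    return visual_embeddings + text_embeddings

# Toy sizes, chosen only to keep the example readable.
VISION_DIM, LLM_DIM = 8, 16
patch_features = [[0.1 * i] * VISION_DIM for i in range(4)]  # 4 image patches
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]        # 3 prompt tokens

projector = make_projector(VISION_DIM, LLM_DIM)
visual_tokens = project(patch_features, projector)
llm_input = build_llm_input(visual_tokens, text_embeddings)

print(len(llm_input), len(llm_input[0]))  # sequence length, embedding size
```

The point of the sketch is the dimension change: the encoder emits vectors of size `VISION_DIM`, and the projector turns each into a vector of size `LLM_DIM` so the LLM can treat it like any other token embedding.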
🛠️ Fine-Tuning (Coming Soon)
Train your own projector using paired image-text datasets:
python train_adapter.py \
  --model llm=Qwen1.5 vision=xclip \
  --dataset_path ./data/relevance_dataset \
  --output_dir ./checkpoints
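Since `train_adapter.py` is not yet released, its argument handling is unknown. As a hedged illustration only, the sketch below shows one way a script could accept the `key=value` pairs passed to `--model` in the command above, using the standard `argparse` module. The function names `parse_model_spec` and `build_parser` are hypothetical, and the default for `--output_dir` is an assumption.

```python
import argparse

def parse_model_spec(pairs):
    """Turn ['llm=Qwen1.5', 'vision=xclip'] into a dict (hypothetical helper)."""
    spec = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {pair!r}")
        spec[key] = value
    return spec

def build_parser():
    """Build a parser mirroring the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Train a projector (sketch).")
    # nargs="+" collects every token up to the next flag.
    parser.add_argument("--model", nargs="+", required=True, metavar="KEY=VALUE")
    parser.add_argument("--dataset_path", required=True)
    parser.add_argument("--output_dir", default="./checkpoints")  # assumed default
    return parser

args = build_parser().parse_args([
    "--model", "llm=Qwen1.5", "vision=xclip",
    "--dataset_path", "./data/relevance_dataset",
    "--output_dir", "./checkpoints",
])
model_spec = parse_model_spec(args.model)
print(model_spec)  # {'llm': 'Qwen1.5', 'vision': 'xclip'}
```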
📁 Project Structure
modu_muse/
├── pipeline.py            # Main multimodal pipeline
├── projector.py           # Vision-to-LLM projector
├── models/
│   ├── llm.py             # LLM loader
│   └── vision.py          # Vision encoder loader
└── examples/
    └── quick_start.py     # Demo script
🤝 Contributing
We welcome contributions, whether new model support, training scripts, or documentation improvements. Open a PR or start a discussion.
📜 License
This project is licensed under the MIT License.
© 2025 Wissem Elkarous
🌐 Resources
ModuMuse: Where vision meets language.
File details
Details for the file modu_muse-0.1.0.tar.gz.
File metadata
- Download URL: modu_muse-0.1.0.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ad692f8580492ab734f302adfc19faab9a425b50ec01027da21dd905c88e382 |
| MD5 | e14ce93ded59090d446a06adf1db2f40 |
| BLAKE2b-256 | 76b1d45fbfaca3fbf78637edd98b533ae9728869cf2ae9120032914d4074247b |
File details
Details for the file modu_muse-0.1.0-py3-none-any.whl.
File metadata
- Download URL: modu_muse-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7375cf81b0c70ecc392b601cb9fe25a7fcccdd2168d56248f54b985bce35134b |
| MD5 | 914e111876df1ddd364955dae7ca32de |
| BLAKE2b-256 | a99347fa8ad6c0f745b3bf7dc51c1bc67523b073f07e2bacedd3342cbfb6d8f3 |
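A downloaded archive can be checked against the SHA256 digests listed above with the standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the commented usage assumes the source distribution was saved to the current directory.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path, expected_hex):
    """Compare the file's SHA256 with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())

# Usage (assumes the file exists locally):
# matches_published_digest(
#     "modu_muse-0.1.0.tar.gz",
#     "3ad692f8580492ab734f302adfc19faab9a425b50ec01027da21dd905c88e382",
# )
```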