
Modular multimodal pipeline for vision-to-LLM integration

Project description

🧠 ModuMuse

Modular Multimodal Intelligence
Plug a Hugging Face LLM and a vision encoder together via a learnable projector.
Supports zero-shot inference today; adapter-based fine-tuning is coming soon.



🚀 Features

  • 🔌 Plug-and-play architecture for combining LLMs and vision encoders
  • 🧠 Supports popular models such as Qwen, Mistral, and LLaMA (LLMs) and CLIP, X-CLIP, and SAM (vision encoders)
  • 🧪 Zero-shot inference with learnable projector modules
  • 🛠️ Adapter-based fine-tuning (coming soon)
  • 📊 Easy benchmarking and visualization tools

📦 Installation

pip install modu-muse

🧬 Quick Start

from modu_muse import Pipeline

pipe = Pipeline(
    llm_name="mistralai/Mistral-7B-Instruct-v0.2",
    vision_name="openai/clip-vit-base-patch16"
)

result = pipe.infer("path/to/image.jpg", "Describe the scene.")
print(result)

🧠 Architecture

[Image/Video] → [Vision Encoder] → [Projector] → [LLM]
  • Vision encoder extracts features
  • Projector maps visual features to LLM-compatible embeddings
  • LLM generates text conditioned on visual context
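The three stages above can be sketched in plain Python. This is a minimal illustration, not ModuMuse's actual `projector.py`: it assumes a two-layer MLP projector (Linear → GELU → Linear, the design popularized by LLaVA-1.5), and the names `MLPProjector` and `build_llm_inputs` and the toy dimensions are hypothetical. The projected visual tokens are prepended to the prompt's token embeddings, which is one common way to condition an LLM on visual context.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b, with W stored as out_dim rows of in_dim columns."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small uniform random weights; a real model would use a framework initializer.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MLPProjector:
    """Hypothetical 2-layer projector: vision_dim -> llm_dim -> llm_dim."""

    def __init__(self, vision_dim: int, llm_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = init_matrix(llm_dim, vision_dim, rng)
        self.b1 = [0.0] * llm_dim
        self.w2 = init_matrix(llm_dim, llm_dim, rng)
        self.b2 = [0.0] * llm_dim

    def __call__(self, features: List[Vector]) -> List[Vector]:
        # Each visual feature vector (e.g. one per image patch) is projected independently.
        return [
            linear([gelu(h) for h in linear(f, self.w1, self.b1)], self.w2, self.b2)
            for f in features
        ]


def build_llm_inputs(visual_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    """Prepend the projected visual tokens to the prompt's token embeddings."""
    return visual_tokens + text_embeddings


# Toy sizes: 4 patch features of width 8 mapped into a width-16 "LLM" embedding space.
vision_features = [[0.1 * (i + j) for j in range(8)] for i in range(4)]
prompt_embeddings = [[0.0] * 16 for _ in range(3)]  # stand-in for 3 embedded prompt tokens

projector = MLPProjector(vision_dim=8, llm_dim=16)
visual_tokens = projector(vision_features)
llm_inputs = build_llm_inputs(visual_tokens, prompt_embeddings)
print(len(llm_inputs), len(llm_inputs[0]))  # 7 16
```

In a real pipeline the two widths would come from the models' configs (for example, Mistral-7B's hidden size is 4096), and the projector would be a framework module so its weights can be trained.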

🛠️ Fine-Tuning (Coming Soon)

Train your own projector using paired image-text datasets:

python train_adapter.py \
  --model llm=Qwen1.5 vision=xclip \
  --dataset_path ./data/relevance_dataset \
  --output_dir ./checkpoints
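Until that script ships, the core idea of projector-only training can be illustrated with a toy sketch. It assumes, as is common for this kind of training, that both encoders stay frozen so only the projector's weights change. It also swaps in a deliberately simple objective: mean squared error between projected image features and paired caption embeddings, fitted by full-batch gradient descent. LLaVA-style training instead backpropagates the LLM's next-token loss. The functions `train_linear_projector` and `mse` and the synthetic data are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Pair = Tuple[Vector, Vector]  # (frozen image feature, frozen caption embedding)


def project(weight: Matrix, x: Vector) -> Vector:
    """Linear projector without bias: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def mse(weight: Matrix, pairs: List[Pair]) -> float:
    """Mean squared error over all pairs and output dimensions."""
    total = 0.0
    for x, target in pairs:
        total += sum((p - t) ** 2 for p, t in zip(project(weight, x), target))
    return total / (len(pairs) * len(weight))


def train_linear_projector(
    pairs: List[Pair], in_dim: int, out_dim: int, lr: float = 0.5, epochs: int = 300
) -> Matrix:
    """Fit W by full-batch gradient descent on mse(); only W is ever updated."""
    weight = [[0.0] * in_dim for _ in range(out_dim)]
    scale = 2.0 / (len(pairs) * out_dim)
    for _ in range(epochs):
        grad = [[0.0] * in_dim for _ in range(out_dim)]
        for x, target in pairs:
            err = [p - t for p, t in zip(project(weight, x), target)]
            # d mse / d W[i][j] = 2 * err[i] * x[j] / (n_pairs * out_dim)
            for i in range(out_dim):
                for j in range(in_dim):
                    grad[i][j] += scale * err[i] * x[j]
        for i in range(out_dim):
            for j in range(in_dim):
                weight[i][j] -= lr * grad[i][j]
    return weight


# Toy data: targets are an unknown linear map of the features, so a perfect fit exists.
rng = random.Random(0)
true_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
pairs: List[Pair] = []
for _ in range(20):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    pairs.append((x, project(true_w, x)))

learned_w = train_linear_projector(pairs, in_dim=3, out_dim=2)
zero_w = [[0.0] * 3 for _ in range(2)]
print(f"loss before: {mse(zero_w, pairs):.4f}, after: {mse(learned_w, pairs):.2e}")
```

Keeping the encoders frozen means the paired features can be computed once and cached, so each training step only touches the small projector.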

📁 Project Structure

modu_muse/
├── pipeline.py          # Main multimodal pipeline
├── projector.py         # Vision-to-LLM projector
├── models/
│   ├── llm.py           # LLM loader
│   └── vision.py        # Vision encoder loader
└── examples/
    └── quick_start.py   # Demo script

🤝 Contributing

We welcome contributions, whether new model support, training scripts, or documentation improvements. Open a PR or start a discussion.


📜 License

This project is licensed under the MIT License.
© 2025 Wissem Elkarous


🌐 Resources


ModuMuse: Where vision meets language.




Download files

Download the file for your platform.

Source Distribution

modu_muse-0.1.1.tar.gz (3.9 kB)

Uploaded Source

Built Distribution


modu_muse-0.1.1-py3-none-any.whl (5.0 kB)

Uploaded Python 3

File details

Details for the file modu_muse-0.1.1.tar.gz.

File metadata

  • Download URL: modu_muse-0.1.1.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for modu_muse-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        035041f5aeb85b356bcfed81e3a4f0da89e7029841c45245fe639824b981f6db
MD5           ef08b83e9bd193e25ea18efd0ee04b68
BLAKE2b-256   6b4eb220ba1644c1f110286891525ecb7b22264f0e37f202dca41c6f8c636491


File details

Details for the file modu_muse-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: modu_muse-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for modu_muse-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        30a7be162c75d733f3bf047fe0c00c70cbb00fdd6b0c01beb673da7355f2f226
MD5           614b010379bb0d70b31c1f6e9be8ab9a
BLAKE2b-256   d80e3d2df58b73ef8471c7c65c8aac98e6e8fb22318004760ea6a9a25681bd82
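To check a downloaded file against the SHA256 digests listed for version 0.1.1, compute its hash with Python's standard `hashlib` and compare. The helpers `sha256_of` and `verify_download` are illustrative names, not part of ModuMuse; pip can enforce the same check through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digests published for modu-muse 0.1.1.
EXPECTED_SHA256 = {
    "modu_muse-0.1.1.tar.gz": "035041f5aeb85b356bcfed81e3a4f0da89e7029841c45245fe639824b981f6db",
    "modu_muse-0.1.1-py3-none-any.whl": "30a7be162c75d733f3bf047fe0c00c70cbb00fdd6b0c01beb673da7355f2f226",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of(path), expected)
```

For example, `verify_download(Path("modu_muse-0.1.1-py3-none-any.whl"))` should return `True` for an untampered wheel.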

