

MMIRAGE

MMIRAGE, which stands for Modular Multimodal Intelligent Reformatting and Augmentation Generation Engine, is an advanced platform designed to streamline the processing of datasets using generative models, including vision-language models (VLMs). It is engineered to handle large-scale data reformatting and augmentation tasks with efficiency and precision. By leveraging state-of-the-art generative models, MMIRAGE enables users to perform complex dataset transformations, ensuring compatibility across various formats and schemas. Its multi-node support and parallel processing capabilities make it an ideal choice for scenarios demanding substantial computational power, such as distributed training and inference workflows. MMIRAGE not only simplifies the integration of powerful language models but also provides a customizable framework for diverse use cases, from reformatting conversational datasets to generating Q/A pairs from plain text.

How to install

To install the library, clone it from GitHub and install it with pip. Installing torch and sglang beforehand is recommended to take advantage of GPU acceleration.

git clone git@github.com:EPFLiGHT/MMIRAGE.git
pip install -e ./MMIRAGE

For testing and scripts that make use of the library, it is advised to create a .env file:

./scripts/generate_env.sh

Key features

  • Multimodal Support: Process both text and images with vision-language models
  • Easily configurable with a single YAML file, which specifies:
    • The prompt sent to the LLM (using Jinja2 templating)
    • Input variables, each with a name and a JMESPath key into the source JSON
    • Image inputs for multimodal processing
  • Parallelizable with multi-node support
    • The training pipeline uses distributed inference with sharding
  • Supports a variety of LLMs and VLMs
  • Supports arbitrary dataset schemas (configurable in the YAML file)
  • Outputs either JSON (or any other structured format) or plain text
  • Modular architecture with pluggable processors, loaders, and writers

Example usage

Text-only: Reformatting dataset

Suppose you have a dataset with samples of the following format:

{
    "conversations": [
        {"role": "user", "content": "Describe the image"},
        {"role": "assistant", "content": "This is a badly formatted answer"}
    ],
    "modalities": ["<the images>"]
}

The dataset contains assistant answers that are badly formatted. The goal is to use an LLM to reformat these answers in Markdown. With MMIRAGE, this is as simple as defining a YAML configuration file:

processors:
  - type: llm
    server_args:
      model_path: Qwen/Qwen3-8B
      tp_size: 4
      trust_remote_code: true
    default_sampling_params:
      temperature: 0.1
      top_p: 1.0
      max_new_tokens: 384

loading_params:
  datasets:
    - path: /path/to/dataset
      type: loadable
      output_dir: /path/to/output/shards
  num_shards: "$SLURM_ARRAY_TASK_COUNT"
  shard_id: "$SLURM_ARRAY_TASK_ID"
  batch_size: 64

processing_params:
  inputs:
    - name: assistant_answer
      key: conversations[1].content
    - name: user_prompt
      key: conversations[0].content
    - name: modalities
      key: modalities

  outputs:
    - name: formatted_answer
      type: llm
      output_type: plain
      prompt: | 
        Reformat the answer in a markdown format without adding anything else:
        {{ assistant_answer }}
      
  remove_columns: false
  output_schema:
    conversations:
      - role: user
        content: "{{ user_prompt }}"
      - role: assistant
        content: "{{ formatted_answer }}"
    modalities: "{{ modalities }}"

Configuration explanation:

  • processors: List of processor configurations. Currently supports llm type for LLM-based generation.
  • loading_params: Parameters for loading and sharding datasets.
    • datasets: List of dataset configurations with path, type, and output directory.
  • processing_params:
    • inputs: Variables extracted from the input dataset using JMESPath queries.
    • outputs: Variables created by processors. Prompts use Jinja2 templating ({{ variable }}).
    • output_schema: Defines the structure of output samples.
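
To make the output_schema semantics concrete, here is a hedged sketch of how such a schema can be rendered: every string leaf is treated as a Jinja2 template and evaluated against the input and output variables. This is an illustration under that assumption, not MMIRAGE's internal writer code:

```python
# Recursively render every string leaf of an output schema with Jinja2.
from jinja2 import Template  # pip install Jinja2

def render_schema(node, variables):
    """Walk the schema; strings are Jinja2 templates, containers recurse."""
    if isinstance(node, str):
        return Template(node).render(**variables)
    if isinstance(node, list):
        return [render_schema(item, variables) for item in node]
    if isinstance(node, dict):
        return {key: render_schema(value, variables) for key, value in node.items()}
    return node

variables = {
    "user_prompt": "Describe the image",
    "formatted_answer": "**This** is a *well formatted* answer.",
}

schema = {
    "conversations": [
        {"role": "user", "content": "{{ user_prompt }}"},
        {"role": "assistant", "content": "{{ formatted_answer }}"},
    ],
}
print(render_schema(schema, variables))
```

Note that plain Jinja2 stringifies non-string values such as lists; the real engine may preserve them differently, so treat this purely as a mental model.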

Multimodal: Processing images with VLMs

MMIRAGE supports multimodal processing with vision-language models:

processors:
  - type: llm
    server_args:
      model_path: Qwen/Qwen2-VL-7B-Instruct
      tp_size: 4
      trust_remote_code: true
    chat_template: qwen2-vl  # Required for VLMs
    default_sampling_params:
      temperature: 0.1
      top_p: 0.95
      max_new_tokens: 768

loading_params:
  datasets:
    - path: /path/to/image/dataset
      type: loadable
      output_dir: /path/to/output/shards
  num_shards: "$SLURM_ARRAY_TASK_COUNT"
  shard_id: "$SLURM_ARRAY_TASK_ID"
  batch_size: 32

processing_params:
  inputs:
    - name: medical_image
      key: image
      type: image  # Mark as image input
      image_base_path: /path/to/images  # Base directory for relative paths
    - name: original_caption
      key: caption
      type: text

  outputs:
    - name: enhanced_caption
      type: llm
      output_type: plain
      prompt: |
        Describe the medical image in detail.
        Original caption for context: {{ original_caption }}
        
  remove_columns: false
  output_schema:
    image: "{{ medical_image }}"
    caption: "{{ enhanced_caption }}"
    original_caption: "{{ original_caption }}"

Key multimodal features:

  • chat_template: Specify the VLM chat template (e.g., qwen2-vl)
  • type: image: Mark input variables as images
  • image_base_path: Base directory for resolving relative image paths
  • Supports PIL Images, URLs, and file paths
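
A plausible sketch of how an image input might be resolved (an assumption for illustration, not the library's actual loader): relative paths are joined with image_base_path, already-loaded PIL images pass through, and the result is opened with PIL:

```python
# Hypothetical image resolution helper: joins relative paths against a
# base directory and opens them with Pillow; PIL images pass through.
import os
from PIL import Image  # pip install Pillow

def resolve_image(value, image_base_path=None):
    """Return a PIL image for a path-like or PIL input (URLs handled upstream)."""
    if isinstance(value, Image.Image):
        return value
    if image_base_path is not None and not os.path.isabs(value):
        value = os.path.join(image_base_path, value)
    return Image.open(value)
```

In the config above, image_base_path: /path/to/images would mean a sample value of scan_001.png resolves to /path/to/images/scan_001.png before being handed to the VLM.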

Architecture

MMIRAGE uses a modular architecture:

mmirage/
├── config/           # Configuration loading and validation
├── core/
│   ├── loader/       # Dataset loaders (JSONL, HuggingFace)
│   ├── process/      # Processors (LLM, etc.) and variable system
│   │   └── processors/
│   │       └── llm/  # LLM processor with multimodal support
│   └── writer/       # Output rendering with Jinja2
├── shard_process.py  # Main processing script
└── merge_shards.py   # Shard merging utility
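
The sharding arithmetic behind num_shards and shard_id can be sketched as follows. This is a hypothetical illustration of the common pattern (shard_process.py may partition differently): each SLURM array task owns a contiguous slice of the dataset, and merge_shards.py concatenates the per-shard outputs afterwards:

```python
# Hypothetical sharding sketch: one contiguous slice per SLURM array task.
import os

def shard_indices(n_samples, num_shards, shard_id):
    """Contiguous slice of sample indices owned by one shard."""
    per_shard = -(-n_samples // num_shards)  # ceiling division
    start = shard_id * per_shard
    return range(start, min(start + per_shard, n_samples))

# In a SLURM array job, the YAML placeholders would resolve as:
#   num_shards = int(os.environ["SLURM_ARRAY_TASK_COUNT"])
#   shard_id   = int(os.environ["SLURM_ARRAY_TASK_ID"])
print(list(shard_indices(10, 4, 3)))  # → [9] (the last shard takes what remains)
```

Every sample lands in exactly one shard, so merging the shard outputs reproduces the full dataset.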

Useful tools

  • Jinja2 for template processing: link
  • JMESPath for JSON queries: link
  • SGLang for fast inference: link
  • Performance paper: link
