OmniTrace: multimodal attribution for generative models


OmniTrace

This is the official repository for the paper: "OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs"


🧠 Overview

OmniTrace is a plug-and-play framework for generation-time attribution in multimodal large language models (text, image, audio, video).

Works for decoder-only multimodal LLMs
Supports text, image, audio, and video
Provides generation-time attribution
Plug-and-play across different backends (Qwen, MiniCPM)
No retraining required

🚀 Installation

🔧 Step 1: Install Backend Environments (Required)

OmniTrace relies on multimodal backends. Please follow the official setup instructions:

Qwen2.5-Omni: https://github.com/QwenLM/Qwen2.5-Omni
MiniCPM-o: https://github.com/OpenBMB/MiniCPM-o

We recommend creating a dedicated conda environment for each backend. Then install dependencies following the official repositories above.

📦 Step 2: Install OmniTrace

Option 1: Install from PyPI (recommended)

pip install omnitrace

Option 2: Install from GitHub

git clone https://github.com/Jackie-2000/OmniTrace.git
cd OmniTrace
pip install -e .

⚡ Quick Start (Python API)

from omnitrace import OmniTracer

tracer = OmniTracer(
    model_name="qwen",      # or "minicpm"
    method="attmean"        # attribution method, choose from "attmean", "attraw", "attgrads"
)

# One sample per modality; each uses its own variable so none overwrites another.

# visual-text input
image_sample = {
    "prompt": "Answer the question based on the images provided. Explain your reasoning step by step.",
    "question": [
        {"text": "Is the time shown in clock or watch in both <image> and <image> the same?\n(A) Yes, they are both at 9 o'clock\n(B) Yes, they are both at 12 o'clock\n(C) No, they show different time"},
        {"image": "examples/media/262_0.jpg"},
        {"image": "examples/media/262_1.jpg"}
    ],
}

# audio input
audio_sample = {
    "prompt": "Answer the question based on the audio provided. Explain your reasoning step by step.\n",
    "question": [
        {"text": "What was the last sound in the sequence?\nA. footsteps\nB. dog_barking\nC. camera_shutter_clicking\nD. tapping_on_glass"},
        {"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"}
    ],
}

# video input
video_sample = {
    "prompt": "Answer the question based on the video provided. Explain your reasoning step by step.\n",
    "question": [
        {"text": "What type of weapon does the slain legend retrieve?\nA. Sword\nB. Axe\nC. Gun\nD. Spear"},
        {"video": "examples/media/6Z_XNM_iT4g.mp4"}
    ],
}

# Trace one sample; pass audio_sample or video_sample instead for the other modalities.
result = tracer.trace(image_sample)
print(result)
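
To trace several samples and keep track of which result belongs to which sample, a thin wrapper around `tracer.trace` may help. The sketch below relies only on what the Quick Start shows (an object with a `trace(sample)` method). The helper name `trace_many` and the `StubTracer` class are hypothetical; the stub stands in for a real `OmniTracer` so the snippet runs without a backend installed.

```python
from typing import Any, Dict, Iterable, List


def trace_many(tracer: Any, samples: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run tracer.trace on each sample and pair every result with a sample id.

    `tracer` is any object exposing trace(sample), e.g. an OmniTracer instance.
    Samples without an "id" field fall back to their position in the input.
    """
    outputs = []
    for index, sample in enumerate(samples):
        outputs.append({"id": sample.get("id", index), "result": tracer.trace(sample)})
    return outputs


class StubTracer:
    """Hypothetical stand-in for OmniTracer, used only to exercise the loop."""

    def trace(self, sample: Dict[str, Any]) -> str:
        return f"traced {len(sample['question'])} question parts"


demo_samples = [
    {"id": 7, "question": [{"text": "What is shown?"}, {"image": "examples/media/262_0.jpg"}]},
    {"question": [{"text": "What was the last sound?"}]},
]
demo_outputs = trace_many(StubTracer(), demo_samples)
```

With a real backend, `StubTracer()` would be replaced by the `tracer` created in the Quick Start.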

🖥️ Command Line Usage

Run OmniTrace on a dataset file:

python scripts/run_demo.py trace \
  --questions_path examples/question_visual_text.json \
  --model_name qwen \
  --method attmean

📂 Input Format

The input file should be a JSON list of samples:

[
  {
    "id": 0,
    "prompt": "Answer the question based on the images provided. Explain your reasoning step by step.\n",
    "question": [
        {"text": "Is the time shown in clock or watch in both <image> and <image> the same?\n(A) Yes, they are both at 9 o'clock\n(B) Yes, they are both at 12 o'clock\n(C) No, they show different time"},
        {"image": "examples/media/262_0.jpg"},
        {"image": "examples/media/262_1.jpg"}
    ]
  },
  {
    "id": 1,
    "prompt": "Answer the question based on the audio provided. Explain your reasoning step by step.\n",
    "question": [
        {"text": "What was the last sound in the sequence?\nA. footsteps\nB. dog_barking\nC. camera_shutter_clicking\nD. tapping_on_glass"},
        {"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"}
    ]
  },
  {
    "id": 2,
    "prompt": "Answer the question based on the video provided. Explain your reasoning step by step.\n",
    "question": [
        {"text": "What type of weapon does the slain legend retrieve?\nA. Sword\nB. Axe\nC. Gun\nD. Spear"},
        {"video": "examples/media/6Z_XNM_iT4g.mp4"}
    ]
  }
]
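
Before handing a questions file to OmniTrace, it can be worth checking that it parses and has the expected top-level shape. The following is a minimal sketch, not part of the package: `load_samples` is a hypothetical helper, and the only structure it assumes is the one described here (a JSON list whose items each carry a `question` list). Python's `json` module is strict, so a file with trailing commas will fail to parse.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_samples(path: str) -> List[Dict[str, Any]]:
    """Read a questions file and check that it is a JSON list of sample objects."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list of samples")
    for position, sample in enumerate(data):
        if not isinstance(sample, dict) or not isinstance(sample.get("question"), list):
            raise ValueError(f"sample {position} needs a 'question' list")
    return data


# Round-trip a one-sample file through a temporary directory.
example = [{"id": 0, "prompt": "Describe the image.\n",
            "question": [{"text": "What is shown in <image>?"},
                         {"image": "examples/media/262_0.jpg"}]}]
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "questions.json")
    with open(file_path, "w", encoding="utf-8") as handle:
        json.dump(example, handle, indent=2)
    loaded = load_samples(file_path)
```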

🧩 Supported Modalities

OmniTrace supports multimodal inputs with the following structure:

🔹 Text + Image (Interleaved)

You can provide multiple text and image inputs, interleaved:

Field	Description
`text`	Input text (string or list)
`image`	Path(s) to image(s)

Example:

{
    "prompt": "Summarize the conversation.\n",
    "question": [
        { "text": "<TURN> \"I have most enjoyed painting poor, delicate children. I didn't know whether that will interest anyone.\" - Helene Schjerfbeck (1862-1946). The Convalescent (1888) is her most famous example of this. It shows the girl getting her energy back."},
        {"image": "examples/media/-288980723939800020.jpg"},
        {"text": "<TURN> Thank you for sharing this. 'The Wounded Angel' is my favourite painting in AteneumMuseum"},
        {"image": "examples/media/8846049217870534914.jpg"},
        {"image": "examples/media/-4402135406098345009.jpg"},
        {"text": "<TURN> That's a very nice indeed!"}
    ]
}

🔹 Audio/Video

Field	Description
`audio` / `video`	Path to a single audio file (`audio`) or video file (`video`)
`question` / `text`	Prompt related to the audio/video

Example:

{
    "question": [
        {"text": "What was the last sound in the sequence?\nA. footsteps\nB. dog_barking\nC. camera_shutter_clicking\nD. tapping_on_glass"},
        {"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"}
    ]
}

{
    "question": [
        {"text": "What type of weapon does the slain legend retrieve?\nA. Sword\nB. Axe\nC. Gun\nD. Spear"},
        {"video": "examples/media/6Z_XNM_iT4g.mp4"}
    ]
}

⚠️ Notes

Text + Image supports multiple inputs and interleaving.
Audio and Video currently support only one file per sample.
Each sample should include text describing the task, either in the `prompt` field or as a `text` entry in `question`.
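
The notes above can be turned into a quick pre-flight check. This is a hedged sketch rather than OmniTrace's own validation: `check_sample` is a hypothetical helper, and it reads "one file per sample" as at most one audio or video file in total, which is an assumption about the intended rule.

```python
from typing import Any, Dict, List

MEDIA_KEYS = ("image", "audio", "video")


def check_sample(sample: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a sample (empty list if none)."""
    problems = []
    counts = {key: 0 for key in ("text",) + MEDIA_KEYS}
    for position, part in enumerate(sample.get("question", [])):
        keys = [key for key in part if key in counts]
        if len(keys) != 1:
            problems.append(
                f"question[{position}] should hold exactly one of text/image/audio/video"
            )
            continue
        counts[keys[0]] += 1
    # Assumed reading of the note: audio and video share a one-file limit.
    if counts["audio"] + counts["video"] > 1:
        problems.append("only one audio or video file is supported per sample")
    # Some task description must be present, as a prompt or as question text.
    if not sample.get("prompt") and counts["text"] == 0:
        problems.append("sample has no prompt or question text")
    return problems


ok_sample = {
    "prompt": "Compare the images.\n",
    "question": [{"text": "Same time?"}, {"image": "a.jpg"}, {"image": "b.jpg"}],
}
bad_sample = {"question": [{"audio": "a.wav"}, {"audio": "b.wav"}]}
ok_problems = check_sample(ok_sample)
bad_problems = check_sample(bad_sample)
```

Multiple images pass the check, while the second sample is flagged both for its two audio files and for the missing task text.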

⚙️ Arguments

`--questions_path`

Path to input JSON file.

`--model_name`

Supported:

qwen
minicpm

`--method`

Attribution method:

attmean
attraw
attgrads
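
When launching the demo script from another Python program, it may be convenient to build the argument list in one place and reject unsupported values before a model is loaded. The sketch below uses only the flags and values listed above; `build_trace_command` is a hypothetical helper, not something shipped with OmniTrace, and it does not check whether every method is available for every backend.

```python
import sys
from typing import List

SUPPORTED_MODELS = ("qwen", "minicpm")
SUPPORTED_METHODS = ("attmean", "attraw", "attgrads")


def build_trace_command(questions_path: str, model_name: str, method: str) -> List[str]:
    """Assemble the argument list for scripts/run_demo.py, rejecting unknown values early."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model_name: {model_name!r}")
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return [
        sys.executable, "scripts/run_demo.py", "trace",
        "--questions_path", questions_path,
        "--model_name", model_name,
        "--method", method,
    ]


command = build_trace_command("examples/question_audio.json", "minicpm", "attraw")
# To launch it from the repo root: subprocess.run(command, check=True)
```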

📁 Example Files

We provide ready-to-run examples:

examples/question_visual_text.json
examples/question_audio.json
examples/question_video.json

🧪 Minimal Test

Run this to verify everything works:

python scripts/run_demo.py trace \
  --questions_path examples/question_visual_text.json \
  --model_name qwen \
  --method attmean

📊 Attribution Performance

Attribution performance across omni-modal models and tasks.
OT_AttMean, OT_RawAtt, and OT_AttGrads denote OmniTrace instantiated with mean-pooled attention, raw attention, and gradient-based scoring signals, respectively.
† indicates results not reported due to computational constraints.
× indicates the method is not applicable.

Qwen2.5-Omni-7B

Method	Text F1 (Summ.)	Image F1 (Summ.)	Image F1 (QA)	Time F1 (Audio Summ.)	Time F1 (Audio QA)	Time F1 (Video QA)
OT_AttMean	75.66	76.59	56.60	83.12	49.90	40.16
OT_RawAtt	72.51	51.82	65.44	76.69	47.64	36.53
OT_AttGrads	67.70	42.24	65.02	†	47.56	†
Self-Attribution	9.25	40.60	61.03	4.43	29.01	13.67
Embed_processor	17.30	14.55	36.88	×	×	×
Embed_CLIP	17.20	3.54	6.32	×	×	×
Random	10.98	8.38	24.70	×	×	×

MiniCPM-o 4.5-9B

Method	Text F1 (Summ.)	Image F1 (Summ.)	Image F1 (QA)	Time F1 (Audio Summ.)	Time F1 (Audio QA)	Time F1 (Video QA)
OT_AttMean	30.57	75.43	37.00	33.52	46.94	22.85
OT_RawAtt	37.32	76.46	45.41	49.21	41.06	21.59
Self-Attribution	9.06	66.53	39.39	0.08	34.66	18.26
Embed_processor	18.02	7.14	5.98	×	×	×
Embed_CLIP	17.98	5.55	5.32	×	×	×
Random	12.05	10.03	22.96	×	×	×
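
The tables report F1 scores on a 0 to 100 scale. The exact evaluation protocol (in particular how the time-based F1 for audio and video is matched) is defined in the paper and is not reproduced here. As a reminder of the underlying metric only, the sketch below computes a set-based F1 between predicted and gold source identifiers; the function name `set_f1` and the identifiers are illustrative assumptions.

```python
from typing import Hashable, Iterable


def set_f1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """F1 between a predicted set of source ids and a gold set, as a percentage."""
    predicted_set, gold_set = set(predicted), set(gold)
    overlap = len(predicted_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_set)
    recall = overlap / len(gold_set)
    # Harmonic mean of precision and recall, scaled to 0-100.
    return 100.0 * 2 * precision * recall / (precision + recall)


# One of two predicted sources is correct, and one of two gold sources is found.
score = set_f1(predicted=["img_0", "img_2"], gold=["img_0", "img_1"])
```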

💡 Tips

Always run from the repo root
Use relative paths for media files
attgrads may require high-memory GPUs (e.g., H100/H200)
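
Because media paths are relative, a sample that works from the repo root can fail from another directory. A small check like the one below can catch that before a model is loaded. It is a sketch under assumptions: `missing_media` is a hypothetical helper, and it accepts either a single path or a list of paths per field.

```python
import os
import tempfile
from typing import Any, Dict, List


def missing_media(sample: Dict[str, Any], root: str = ".") -> List[str]:
    """List media paths in a sample that do not exist relative to `root`."""
    missing = []
    for part in sample.get("question", []):
        for key in ("image", "audio", "video"):
            value = part.get(key)
            if value is None:
                continue
            for path in value if isinstance(value, list) else [value]:
                if not os.path.isfile(os.path.join(root, path)):
                    missing.append(path)
    return missing


# Demonstrate with a temporary directory standing in for the repo root.
with tempfile.TemporaryDirectory() as repo_root:
    os.makedirs(os.path.join(repo_root, "media"))
    open(os.path.join(repo_root, "media", "a.jpg"), "wb").close()
    demo = {"question": [{"text": "Compare."},
                         {"image": "media/a.jpg"},
                         {"audio": "media/b.wav"}]}
    not_found = missing_media(demo, root=repo_root)
```

Calling `missing_media(sample)` with the default `root="."` checks paths against the current working directory, which should be the repo root.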

📌 Citation

(To be added)

Release history

0.1.1 (this version): Mar 20, 2026
0.1.0: Mar 20, 2026

