OmniTrace: multimodal attribution for generative models
OmniTrace
This is the official repository for the paper: "OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs"
🧠 Overview
OmniTrace is a plug-and-play framework for generation-time attribution in multimodal large language models (text, image, audio, video).
- Works for decoder-only multimodal LLMs
- Supports text, image, audio, and video
- Provides generation-time attribution
- Plug-and-play across different backends (Qwen, MiniCPM)
- No retraining required
🚀 Installation
🔧 Step 1: Install Backend Environments (Required)
OmniTrace relies on multimodal backends. Please follow the official setup instructions:
- Qwen2.5-Omni: https://github.com/QwenLM/Qwen2.5-Omni
- MiniCPM-o: https://github.com/OpenBMB/MiniCPM-o
We recommend creating a dedicated conda environment for each backend, then installing that backend's dependencies by following its official repository above.
📦 Step 2: Install OmniTrace
Option 1: Install from PyPI (recommended)
pip install omnitrace
Option 2: Install from GitHub
git clone https://github.com/Jackie-2000/OmniTrace.git
cd OmniTrace
pip install -e .
⚡ Quick Start (Python API)
from omnitrace import OmniTracer
tracer = OmniTracer(
model_name="qwen", # or "minicpm"
method="attmean" # attribution method, choose from "attmean", "attraw", "attgrads"
)
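# Define one sample. The three assignments below are alternatives; each one overwrites the previous.
# visual-text input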
sample = {
"prompt": "Answer the question based on the images provided. Explain your reasoning step by step.",
"question": [
{"text": "Is the time shown in clock or watch in both <image> and <image> the same?\n(A) Yes, they are both at 9 o'clock\n(B) Yes, they are both at 12 o'clock\n(C) No, they show different time"},
{"image": "examples/media/262_0.jpg"},
{"image": "examples/media/262_1.jpg"}
],
}
# audio input
sample = {
"prompt": "Answer the question based on the audio provided. Explain your reasoning step by step.\n",
"question": [
{"text": "What was the last sound in the sequence?\nA. footsteps\nB. dog_barking\nC. camera_shutter_clicking\nD. tapping_on_glass"},
{"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"}
],
}
# video input
sample = {
"prompt": "Answer the question based on the video provided. Explain your reasoning step by step.\n",
"question": [
{"text": "What type of weapon does the slain legend retrieve?\nA. Sword\nB. Axe\nC. Gun\nD. Spear"},
{"video": "examples/media/6Z_XNM_iT4g.mp4"}
],
}
result = tracer.trace(sample)
print(result)
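The sample dictionaries above share one shape: a `prompt` string plus a `question` list of single-key items. The sketch below assembles that shape programmatically. `make_sample` is a hypothetical convenience function, not part of the OmniTrace API, and it only builds the dictionary without calling the tracer.

```python
from typing import Dict, List, Optional, Sequence


def make_sample(
    prompt: str,
    text: str,
    images: Sequence[str] = (),
    audio: Optional[str] = None,
    video: Optional[str] = None,
) -> Dict[str, object]:
    """Build a sample dict in the format shown above (hypothetical helper)."""
    question: List[Dict[str, str]] = [{"text": text}]
    # Images follow the text item, in the order given.
    question.extend({"image": path} for path in images)
    if audio is not None:
        question.append({"audio": audio})
    if video is not None:
        question.append({"video": video})
    return {"prompt": prompt, "question": question}


sample = make_sample(
    prompt="Answer the question based on the audio provided.\n",
    text="What was the last sound in the sequence?",
    audio="examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav",
)
print(sample["question"])
```

The resulting dictionary can then be passed to `tracer.trace(sample)` as in the Quick Start.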
🖥️ Command Line Usage
Run OmniTrace on a dataset file:
python scripts/run_demo.py trace \
--questions_path examples/question_visual_text.json \
--model_name qwen \
--method attmean
📂 Input Format
The input file should be a JSON list of samples:
[
  {
    "id": 0,
    "prompt": "Answer the question based on the images provided. Explain your reasoning step by step.\n",
    "question": [
      {"text": "Is the time shown in clock or watch in both <image> and <image> the same?\n(A) Yes, they are both at 9 o'clock\n(B) Yes, they are both at 12 o'clock\n(C) No, they show different time"},
      {"image": "examples/media/262_0.jpg"},
      {"image": "examples/media/262_1.jpg"}
    ]
  },
  {
    "id": 1,
    "prompt": "Answer the question based on the audio provided. Explain your reasoning step by step.\n",
    "question": [
      {"text": "What was the last sound in the sequence?\nA. footsteps\nB. dog_barking\nC. camera_shutter_clicking\nD. tapping_on_glass"},
      {"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"}
    ]
  },
  {
    "id": 2,
    "prompt": "Answer the question based on the video provided. Explain your reasoning step by step.\n",
    "question": [
      {"text": "What type of weapon does the slain legend retrieve?\nA. Sword\nB. Axe\nC. Gun\nD. Spear"},
      {"video": "examples/media/6Z_XNM_iT4g.mp4"}
    ]
  }
]
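Strict JSON does not allow trailing commas, so generating the input file with Python's `json` module is a safe way to produce it. The following is a minimal sketch under that assumption. The file name `my_questions.json` and the temporary output directory are hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path

samples = [
    {
        "id": 0,
        "prompt": "Answer the question based on the audio provided. Explain your reasoning step by step.\n",
        "question": [
            {"text": "What was the last sound in the sequence?"},
            {"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"},
        ],
    },
]

# "my_questions.json" is a hypothetical file name used only for this sketch.
out_path = Path(tempfile.mkdtemp()) / "my_questions.json"
out_path.write_text(json.dumps(samples, indent=2), encoding="utf-8")

# Round-trip check: the file parses as a JSON list of sample objects.
loaded = json.loads(out_path.read_text(encoding="utf-8"))
assert isinstance(loaded, list) and loaded == samples
print(f"wrote {len(loaded)} sample(s) to {out_path}")
```

The written file can then be supplied through `--questions_path`.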
🧩 Supported Modalities
OmniTrace supports multimodal inputs with the following structure:
🔹 Text + Image (Interleaved)
You can provide multiple text and image inputs, interleaved:
| Field | Description |
|---|---|
| `text` | Input text (string or list) |
| `image` | Path(s) to image(s) |
Example:
{
"prompt": "Summarize the conversation.\n",
"question": [
{ "text": "<TURN> \"I have most enjoyed painting poor, delicate children. I didn't know whether that will interest anyone.\" - Helene Schjerfbeck (1862-1946). The Convalescent (1888) is her most famous example of this. It shows the girl getting her energy back."},
{"image": "examples/media/-288980723939800020.jpg"},
{"text": "<TURN> Thank you for sharing this. 'The Wounded Angel' is my favourite painting in AteneumMuseum"},
{"image": "examples/media/8846049217870534914.jpg"},
{"image": "examples/media/-4402135406098345009.jpg"},
{"text": "<TURN> That's a very nice indeed!"}
]
}
🔹 Audio/Video
| Field | Description |
|---|---|
| `audio` / `video` | Path to a single audio or video file |
| `question` / `text` | Prompt related to the audio/video |
Example:
{
  "question": [
    {"text": "What was the last sound in the sequence?\nA. footsteps\nB. dog_barking\nC. camera_shutter_clicking\nD. tapping_on_glass"},
    {"audio": "examples/media/b7701ab1-c37e-49f2-8ad9-7177fe0465e9.wav"}
  ]
}

{
  "question": [
    {"text": "What type of weapon does the slain legend retrieve?\nA. Sword\nB. Axe\nC. Gun\nD. Spear"},
    {"video": "examples/media/6Z_XNM_iT4g.mp4"}
  ]
}
⚠️ Notes
- Text + Image supports multiple inputs and interleaving.
- Audio and Video currently support only one file per sample.
- Each sample should include a prompt (`text` or `question`) describing the task.
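The notes above can be checked before running attribution. The sketch below is one possible pre-flight check and is not part of OmniTrace: `check_sample` is a hypothetical helper, and it assumes that a sample satisfies the "prompt" requirement when it has either a `prompt` string or at least one `text` item.

```python
from typing import Any, Dict, List

MEDIA_KEYS = ("image", "audio", "video")


def check_sample(sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one sample (empty if none)."""
    problems: List[str] = []
    counts = {key: 0 for key in ("text",) + MEDIA_KEYS}
    for item in sample.get("question", []):
        for key in item:
            if key in counts:
                counts[key] += 1
            else:
                problems.append(f"unknown field: {key!r}")
    # Assumption: a "prompt" string or any "text" item counts as the task description.
    if counts["text"] == 0 and not sample.get("prompt"):
        problems.append("no text or prompt describing the task")
    # Audio and video are limited to one file per sample; images are not.
    for key in ("audio", "video"):
        if counts[key] > 1:
            problems.append(f"more than one {key} file")
    return problems


ok = {"question": [{"text": "What was the last sound?"}, {"audio": "a.wav"}]}
bad = {"question": [{"audio": "a.wav"}, {"audio": "b.wav"}]}
print(check_sample(ok))
print(check_sample(bad))
```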
⚙️ Arguments
--questions_path
Path to input JSON file.
--model_name
Supported:
- qwen
- minicpm
--method
Attribution method:
- attmean
- attraw
- attgrads
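To make the argument structure concrete, here is a minimal `argparse` sketch that accepts the same `trace` subcommand and flags. It illustrates how such a command line could be parsed and is not the actual contents of `scripts/run_demo.py`. The default values and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run_demo.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    trace = subparsers.add_parser("trace", help="run attribution on a question file")
    trace.add_argument("--questions_path", required=True, help="path to input JSON file")
    # Defaults below are assumptions made for this sketch.
    trace.add_argument("--model_name", choices=["qwen", "minicpm"], default="qwen")
    trace.add_argument("--method", choices=["attmean", "attraw", "attgrads"], default="attmean")
    return parser


args = build_parser().parse_args(
    ["trace", "--questions_path", "examples/question_visual_text.json", "--method", "attraw"]
)
print(args.command, args.model_name, args.method)
```

Restricting `--model_name` and `--method` with `choices` makes a mistyped value fail at parse time rather than after a model has been loaded.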
📁 Example Files
We provide ready-to-run examples:
examples/question_visual_text.json
examples/question_audio.json
examples/question_video.json
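To run all three example files in one go, the command line from the usage section can be assembled in Python. The sketch below only builds and prints the commands. `build_command` is a hypothetical helper, and it assumes the same flags apply to the audio and video example files.

```python
import shlex
from typing import List

EXAMPLE_FILES = [
    "examples/question_visual_text.json",
    "examples/question_audio.json",
    "examples/question_video.json",
]


def build_command(questions_path: str, model_name: str = "qwen", method: str = "attmean") -> List[str]:
    """Return the argv list for one run of the demo script (hypothetical helper)."""
    return [
        "python", "scripts/run_demo.py", "trace",
        "--questions_path", questions_path,
        "--model_name", model_name,
        "--method", method,
    ]


commands = [build_command(path) for path in EXAMPLE_FILES]
for cmd in commands:
    # Printed only; to execute, pass `cmd` to subprocess.run(cmd, check=True) from the repo root.
    print(shlex.join(cmd))
```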
🧪 Minimal Test
Run this to verify everything works:
python scripts/run_demo.py trace \
--questions_path examples/question_visual_text.json \
--model_name qwen \
--method attmean
📊 Attribution Performance
Attribution performance across omni-modal models and tasks.
OTAttMean, OTRawAtt, and OTAttGrads denote OmniTrace instantiated with mean-pooled attention, raw attention, and gradient-based scoring signals, respectively.
† indicates results not reported due to computational constraints.
× indicates the method is not applicable.
Qwen2.5-Omni-7B
| Method | Text F1 (Summ.) | Image F1 (Summ.) | Image F1 (QA) | Time F1 (Audio Summ.) | Time F1 (Audio QA) | Time F1 (Video QA) |
|---|---|---|---|---|---|---|
| OTAttMean | 75.66 | 76.59 | 56.60 | 83.12 | 49.90 | 40.16 |
| OTRawAtt | 72.51 | 51.82 | 65.44 | 76.69 | 47.64 | 36.53 |
| OTAttGrads | 67.70 | 42.24 | 65.02 | † | 47.56 | † |
| Self-Attribution | 9.25 | 40.60 | 61.03 | 4.43 | 29.01 | 13.67 |
| Embedprocessor | 17.30 | 14.55 | 36.88 | × | × | × |
| EmbedCLIP | 17.20 | 3.54 | 6.32 | × | × | × |
| Random | 10.98 | 8.38 | 24.70 | × | × | × |
MiniCPM-o 4.5-9B
| Method | Text F1 (Summ.) | Image F1 (Summ.) | Image F1 (QA) | Time F1 (Audio Summ.) | Time F1 (Audio QA) | Time F1 (Video QA) |
|---|---|---|---|---|---|---|
| OTAttMean | 30.57 | 75.43 | 37.00 | 33.52 | 46.94 | 22.85 |
| OTRawAtt | 37.32 | 76.46 | 45.41 | 49.21 | 41.06 | 21.59 |
| Self-Attribution | 9.06 | 66.53 | 39.39 | 0.08 | 34.66 | 18.26 |
| Embedprocessor | 18.02 | 7.14 | 5.98 | × | × | × |
| EmbedCLIP | 17.98 | 5.55 | 5.32 | × | × | × |
| Random | 12.05 | 10.03 | 22.96 | × | × | × |
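As a reminder of how an F1 score combines precision and recall, here is a set-based sketch over attributed source identifiers. This is an assumption made for illustration: the exact matching criteria behind the numbers above, especially for the time-based F1 on audio and video, are not specified in this README and may differ.

```python
from typing import Hashable, Iterable


def set_f1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Set-based F1 between predicted and gold source identifiers (illustrative only)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the method attributes to images 0 and 2; the gold source is image 0.
score = set_f1(predicted=[0, 2], gold=[0])
print(f"{100 * score:.2f}")
```

In this example precision is 1/2 and recall is 1, so F1 is 2/3.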
💡 Tips
- Always run from the repo root
- Use relative paths for media files
- attgrads may require high-memory GPUs (e.g., H100/H200)
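Because media paths are relative, a run started outside the repo root will not find the files. The sketch below lists media paths in a sample that do not resolve from a given directory. `missing_media` is a hypothetical helper and not part of OmniTrace.

```python
from pathlib import Path
from typing import Any, Dict, List

MEDIA_KEYS = ("image", "audio", "video")


def missing_media(sample: Dict[str, Any], root: Path = Path(".")) -> List[str]:
    """Return media paths in a sample that do not exist relative to `root`."""
    missing: List[str] = []
    for item in sample.get("question", []):
        for key in MEDIA_KEYS:
            if key in item and not (root / item[key]).is_file():
                missing.append(item[key])
    return missing


sample = {
    "question": [
        {"text": "What type of weapon does the slain legend retrieve?"},
        {"video": "examples/media/6Z_XNM_iT4g.mp4"},
    ]
}
# Empty when every media file resolves from the current directory; otherwise lists the unresolved paths.
print(missing_media(sample))
```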
📌 Citation
(To be added)