
🔍 Transformers Attention Viz

Interactive attention visualization for multi-modal transformer models


Visualize and understand cross-modal attention in vision-language models like BLIP and CLIP

🚀 Try it Now!

Open the Full Demo in Colab to explore all features.

🎯 Features

  • 📊 Cross-Modal Attention: Visualize how text tokens attend to image regions in BLIP
  • 🔄 Multi-Layer Support: Analyze attention patterns across all transformer layers
  • 📈 Attention Statistics: Compute entropy, concentration, and top attended regions
  • 🎨 Publication Ready: Export high-quality figures for papers (PNG, PDF, SVG)
  • 🚀 Easy Integration: Works seamlessly with HuggingFace models
  • 🖥️ Interactive Dashboard: Explore attention patterns in real-time (local only)

📦 Installation

pip install transformers-attention-viz
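
To confirm the install picked up the expected release, you can query the package metadata with the standard library (0.1.15 is the version this page documents):

from importlib.metadata import version

# Check the installed release against the documented version (0.1.15)
print(version("transformers-attention-viz"))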

💡 Basic Usage

from transformers_attention_viz import AttentionVisualizer
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load BLIP model (supports cross-modal attention)
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

# Create visualizer
visualizer = AttentionVisualizer(model, processor)

# Load your image
image = Image.open("cat.jpg")
text = "a fluffy orange cat"

# Visualize cross-modal attention
viz = visualizer.visualize(
    image=image,
    text=text,
    visualization_type="heatmap",
    attention_type="cross"  # text -> image attention
)
viz.show()

# Get attention statistics
stats = visualizer.get_attention_stats(image, text, attention_type="cross")
print(f"Average entropy: {stats['entropy'].mean():.3f}")
print(f"Top attended regions: {stats['top_tokens'][:3]}")

📸 Example Visualizations

Cross-Modal Attention (BLIP)

Each text token gets its own heatmap showing attention to image patches:

# Visualizing "a fluffy orange cat sitting on a surface"
# Generates separate heatmaps for each token

[Figure: BLIP cross-modal attention, one heatmap per text token]

Attention Statistics

# Example output:
Average entropy: 4.251
Top attended regions: 
  1. Patch_(24,4): 0.0429
  2. Patch_(20,1): 0.0395
  3. Patch_(23,4): 0.0391
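
Patch coordinates index into the 24×24 grid of image patches described below. Assuming the standard softmax normalization, the weights sum to 1 across the grid, so a top score near 0.043 gives that single patch roughly 25× the weight it would receive under uniform attention (1/576 ≈ 0.0017).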

🛠️ Advanced Features

Multi-Layer Analysis

# Visualize attention at different layers
viz = visualizer.visualize(
    image=image,
    text=text,
    layer_indices=[0, 5, 11],  # First, middle, last
    attention_type="cross"
)
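
To see how focus changes with depth, one option is to render and save each layer separately; the sketch below uses only calls already shown (the layer choices and file names are illustrative):

# Render one figure per layer for side-by-side comparison
for layer in [0, 5, 11]:  # first, middle, last of a 12-layer stack
    viz = visualizer.visualize(
        image=image,
        text=text,
        layer_indices=[layer],
        attention_type="cross",
    )
    viz.save(f"attention_layer_{layer:02d}.png", dpi=300)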

Export for Publications

# Save high-quality figures
viz.save("attention_figure.png", dpi=300)  # For papers
viz.save("attention_figure.pdf")           # For LaTeX
viz.save("attention_figure.svg")           # For web

Interactive Dashboard

from transformers_attention_viz import launch_dashboard

# Launch interactive exploration tool (requires local environment)
launch_dashboard(model, processor)
# Opens at http://localhost:7860
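
If the model runs on a remote machine, standard SSH port forwarding (for example, ssh -L 7860:localhost:7860 user@remote-host, with a placeholder hostname) makes the dashboard reachable from a local browser at the same address.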

🤖 Supported Models

Model      Cross-Modal Attention   Self-Attention   Status
BLIP       ✅                      ✅               Fully Supported
CLIP       ❌                      ✅               Self-Attention Only
BLIP-2     —                       —                Coming Soon
Flamingo   —                       —                In Development

📊 Understanding the Visualizations

  • BLIP Cross-Modal Attention: Shows how each text token attends to the 24×24 grid of image patches
  • Attention Entropy: Lower entropy means attention is concentrated on a few patches; higher entropy means it is spread more uniformly (see the sketch after this list)
  • Diffuse Attention: BLIP often shows near-uniform attention, especially on simple images; this is normal behavior
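
As a concrete reference for the entropy numbers above, here is an illustrative computation over a single token's 24×24 attention map, using the standard Shannon definition (the library's internal implementation may differ):

import numpy as np

# One token's attention over the 24x24 patch grid, normalized to sum to 1
attn = np.random.rand(24, 24)
attn = attn / attn.sum()

# Shannon entropy in nats; a perfectly uniform map gives log(576) ≈ 6.36,
# so the 4.251 reported above corresponds to moderately focused attention
entropy = -np.sum(attn * np.log(attn + 1e-12))
print(f"Entropy: {entropy:.3f}")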

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Clone the repo
git clone https://github.com/sisird864/transformers-attention-viz.git
cd transformers-attention-viz

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/

📖 Citation

If you use this tool in your research, please cite:

@software{transformers-attention-viz,
  author = {Sisir Doppalapudi},
  title = {Transformers Attention Viz: Interactive Attention Visualization for Multi-Modal Transformers},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/sisird864/transformers-attention-viz}
}

🚧 Known Limitations

  • v0.1.15:
    • Individual attention head visualization (aggregate_heads=False) not fully supported
    • Flow visualization has dimension compatibility issues with BLIP
    • BLIP text self-attention not captured (cross-modal and vision self-attention work fine)

🛤️ Roadmap

  • Full support for individual attention heads
  • Fix flow visualization for BLIP
  • Add BLIP-2 support
  • Add LLaVA support
  • 3D attention visualization
  • Attention pattern export to TensorBoard
  • Real-time video attention tracking

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • HuggingFace team for the amazing Transformers library
  • Salesforce Research for BLIP
  • OpenAI for CLIP
  • All contributors and users of this tool

⭐ Support

If you find this tool useful, please consider:

  • Starring this repository
  • Sharing it with colleagues
  • Contributing improvements
  • Citing it in your research

Made with ❤️ for the ML research community
