pixelrag

Visual Retrieval-Augmented Generation — render, embed, index, search, train

Maintainers

andylizf

PixelRAG — Visual Retrieval-Augmented Generation

Search any document by how it looks, not just the text it contains.

What it is · Give Claude eyes · How it works · Pipelines

pip install pixelrag

The two core operations — render a page to screenshots, search a visual index:

# Render any page or document to screenshot tiles
pixelshot https://en.wikipedia.org/wiki/Python --output ./tiles

# Search a hosted index of 8.28M Wikipedia pages — no setup, runs against the live API
curl -X POST http://api.pixelrag.ai:30001/search \
  -H "Content-Type: application/json" \
  -d '{"queries": [{"text": "What is the capital of France?"}], "n_docs": 5}'

Or try it in the browser at pixelrag.ai.

What it is

PixelRAG renders documents — web pages, PDFs, images — as screenshots and retrieves over the images directly. Visual structure that HTML parsing throws away — tables, charts, layout, infographics — stays intact, so the reader model can actually answer questions about it. Wikipedia's 8.28M articles ship as a pre-built index; the pipeline itself is general-purpose.

Give Claude eyes

The renderer also ships as a Claude Code plugin — the pixelbrowse skill. Instead of fetching raw HTML, Claude screenshots a page with pixelshot and reads the image, so it sees charts, diagrams, tables, and layout the way a person does.

# One-time setup
./plugin/setup.sh

# Then run it in one shot — claude -p with the plugin:
claude --plugin-dir ./plugin -p "screenshot https://news.ycombinator.com and summarize the top stories"
claude --plugin-dir ./plugin -p "screenshot https://arxiv.org/abs/2404.12387 and explain the key findings"
claude --plugin-dir ./plugin -p "screenshot http://localhost:3000 and tell me if anything looks broken"

Or interactively — claude --plugin-dir ./plugin, then /screenshot https://example.com. No MCP server, no backend: the skill just calls pixelshot (Playwright/CDP) on your machine.

How it works

Figure: Text-based RAG parses to text and loses the table; PixelRAG renders to screenshot tiles and keeps it.

Text-based RAG parses the page to text chunks and loses the table — the reader can't find the answer. PixelRAG renders the page to screenshot tiles, retrieves the right tile, and the reader reads the number straight off the image.
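To make "screenshot tiles" concrete, here is a minimal sketch of how a tall page screenshot could be cut into fixed-height tiles. This is an illustration only: the function name `tile_boxes`, the tile height, and the overlap parameter are assumptions, not pixelshot's actual tiling logic or defaults.

```python
from typing import List, Tuple


def tile_boxes(page_height: int, tile_height: int, overlap: int = 0) -> List[Tuple[int, int]]:
    """Return (top, bottom) pixel ranges that together cover the page.

    Hypothetical helper: pixelshot's real tile geometry is not documented here.
    """
    if tile_height <= 0:
        raise ValueError("tile_height must be positive")
    if not 0 <= overlap < tile_height:
        raise ValueError("overlap must be in [0, tile_height)")

    boxes: List[Tuple[int, int]] = []
    step = tile_height - overlap  # how far each new tile starts below the previous one
    top = 0
    while top < page_height:
        bottom = min(top + tile_height, page_height)
        boxes.append((top, bottom))
        if bottom == page_height:
            break  # the last tile reached the end of the page
        top += step
    return boxes
```

A small overlap keeps a table row that straddles a tile boundary fully visible in at least one tile, at the cost of a few extra tiles per page.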

Two pieces make this work: (1) rendering documents to images instead of parsing them to text, and (2) a Qwen3-VL-Embedding model, LoRA-fine-tuned on screenshot data, that embeds page images into a space where visual content is retrievable.
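The retrieval step can be pictured as a nearest-neighbor search over tile embeddings. The sketch below uses tiny hand-written vectors and brute-force cosine similarity in pure Python. It is a stand-in for illustration: the real embeddings come from the Qwen3-VL-Embedding model and the real search goes through a FAISS index, and the tile names and vectors here are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], tiles: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Score every tile against the query and return the k best matches."""
    scored = [(name, cosine(query, vector)) for name, vector in tiles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings (illustrative values, not model output).
tile_vectors = {
    "tile_table.png": [0.9, 0.1, 0.0],
    "tile_intro.png": [0.1, 0.9, 0.0],
    "tile_footer.png": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.0]

best = top_k(query_vector, tile_vectors, k=2)
```

The reader model then receives the retrieved tile images rather than extracted text.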

Pipelines

Capture is the standalone pixelshot command; the rest of the pipeline runs through the pixelrag umbrella — pixelrag <stage>. Install only the stages you need:

Command	What it does	Install
`pixelshot`	Document → image tiles (Playwright CDP, PDF)	`pip install pixelrag`
`pixelrag chunk` · `embed` · `build-index`	Tiles → vectors → FAISS index	`pip install 'pixelrag[embed]'`
`pixelrag index`	Orchestrates the full pipeline: source → ingest → embed → index	`pip install 'pixelrag[index]'`
`pixelrag serve`	FAISS search API (FastAPI, CPU or GPU)	`pip install 'pixelrag[serve]'`
`pixelrag-train`	LoRA fine-tuning for Qwen3-VL-Embedding	`cd train && uv sync`

render ←── index ──→ embed       serve (independent)       train → serve (HTTP)

train is a separate uv project with its own pinned env (torch==2.9.1+cu129, transformers==4.57.1, cuDNN 9.20) — install it from inside train/, not from the root.

Search a pre-built index

pip install 'pixelrag[serve]'

# Download the pre-built Wikipedia index (8.28M pages) from Hugging Face
# TODO: publish the index to Hugging Face, then replace <HF_REPO> below
huggingface-cli download <HF_REPO> --repo-type dataset --local-dir ./index

# Serve, then query
pixelrag serve --index-dir ./index --port 30001

curl -X POST http://localhost:30001/search \
  -H "Content-Type: application/json" \
  -d '{"queries": [{"text": "What is the capital of France?"}], "n_docs": 5}'
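The same query can be sent from Python with only the standard library. This is a sketch under assumptions: the request body mirrors the curl call above, but the response schema is not described here, so the code assumes the server replies with JSON and returns it unparsed beyond that. The helper names `build_search_request` and `search` are illustrative.

```python
import json
import urllib.request
from typing import Any, List


def build_search_request(base_url: str, questions: List[str], n_docs: int = 5) -> urllib.request.Request:
    """Build the POST /search request with the same body as the curl example."""
    payload = {"queries": [{"text": question} for question in questions], "n_docs": n_docs}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, questions: List[str], n_docs: int = 5, timeout: float = 30.0) -> Any:
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_search_request(base_url, questions, n_docs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with a server from `pixelrag serve` listening locally:
#   results = search("http://localhost:30001", ["What is the capital of France?"])
```

Several questions can go into one call, since `queries` is a list.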

Build an index from your own documents

pip install 'pixelrag[index]'

# Create pixelrag.yaml
cat > pixelrag.yaml << 'EOF'
source:
  type: local
  path: ./my_docs

embed:
  model: Qwen/Qwen3-VL-Embedding-2B
  device: cuda
  gpu_ids: [0]

output: ./my_index
EOF

# Build, then serve
pixelrag index build
pixelrag serve --index-dir ./my_index --port 30001
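If you build several indexes, it may be handy to generate pixelrag.yaml from a script. The sketch below writes the same keys as the heredoc above and nothing else. The function names are illustrative, and paths are written unquoted, so this assumes they contain no characters that are special in YAML.

```python
from pathlib import Path
from typing import Sequence


def render_config(
    docs_dir: str,
    output_dir: str,
    model: str = "Qwen/Qwen3-VL-Embedding-2B",
    device: str = "cuda",
    gpu_ids: Sequence[int] = (0,),
) -> str:
    """Return the text of a pixelrag.yaml with the keys shown in the example above."""
    ids = ", ".join(str(gpu_id) for gpu_id in gpu_ids)
    return (
        "source:\n"
        "  type: local\n"
        f"  path: {docs_dir}\n"
        "\n"
        "embed:\n"
        f"  model: {model}\n"
        f"  device: {device}\n"
        f"  gpu_ids: [{ids}]\n"
        "\n"
        f"output: {output_dir}\n"
    )


def write_config(config_path: str, docs_dir: str, output_dir: str, **options) -> Path:
    """Write the config file and return its path."""
    path = Path(config_path)
    path.write_text(render_config(docs_dir, output_dir, **options), encoding="utf-8")
    return path


# Usage:
#   write_config("pixelrag.yaml", "./my_docs", "./my_index", gpu_ids=[0, 1])
# then run `pixelrag index build` in the same directory.
```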

Render a page programmatically

from pixelrag_render import render_url

# render a single page to tiles — e.g. for an agent to read
tiles = render_url("https://en.wikipedia.org/wiki/Python", "./tiles")
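To render several pages, one option is to give each URL its own output directory. The sketch below assumes only what the example above shows, namely that `render_url` takes a URL and an output directory; its return value is passed through untouched because its type is not documented here. `slug_for` and `render_many` are hypothetical helpers, and the render function is injected so the code can be tried without pixelrag installed.

```python
import re
from pathlib import Path
from typing import Any, Callable, Dict, Iterable
from urllib.parse import urlparse


def slug_for(url: str) -> str:
    """Turn a URL into a filesystem-friendly directory name."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    return slug or "page"


def render_many(
    urls: Iterable[str],
    out_root: str,
    render: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Call render(url, output_dir) once per URL, each into its own subdirectory."""
    results: Dict[str, Any] = {}
    for url in urls:
        out_dir = Path(out_root) / slug_for(url)
        out_dir.mkdir(parents=True, exist_ok=True)
        results[url] = render(url, str(out_dir))
    return results


# Usage (requires pixelrag):
#   from pixelrag_render import render_url
#   render_many(["https://en.wikipedia.org/wiki/Python"], "./tiles", render_url)
```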

Embed tools (standalone)

Each stage runs independently, without the orchestrator:

pip install 'pixelrag[embed]'

pixelrag chunk --tiles-dir ./tiles
pixelrag embed --shard-dir ./tiles --output-dir ./embeddings --gpu-ids 0,1
pixelrag build-index --embeddings-dir ./embeddings --output-dir ./index
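The three stages can also be chained from a script that stops at the first failure. The sketch below only assembles the commands and flags shown above and runs them with `subprocess`; the helper names are illustrative, and it assumes the `pixelrag` executable is on your PATH.

```python
import subprocess
from typing import List, Sequence


def stage_commands(
    tiles_dir: str,
    embeddings_dir: str,
    index_dir: str,
    gpu_ids: Sequence[int],
) -> List[List[str]]:
    """Return the chunk, embed, and build-index commands in pipeline order."""
    gpus = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    return [
        ["pixelrag", "chunk", "--tiles-dir", tiles_dir],
        ["pixelrag", "embed", "--shard-dir", tiles_dir,
         "--output-dir", embeddings_dir, "--gpu-ids", gpus],
        ["pixelrag", "build-index", "--embeddings-dir", embeddings_dir,
         "--output-dir", index_dir],
    ]


def run_stages(commands: List[List[str]]) -> None:
    """Run each stage in order; check=True raises if a stage exits non-zero."""
    for command in commands:
        subprocess.run(command, check=True)


# Usage:
#   run_stages(stage_commands("./tiles", "./embeddings", "./index", [0, 1]))
```

Passing each command as a list avoids shell quoting problems with paths that contain spaces.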

Training

pixelrag-train LoRA fine-tunes Qwen/Qwen3-VL-Embedding-2B for webpage retrieval. See train/README.md for the full recipe.

You don't need to retrain to use the model — the trained adapters are published at Chrisyichuan/wiki-screenshot-embedding-lora.

We also release the full training set (Chrisyichuan/screenshot-training-natural-filtered-v2), so you can adapt other backbones yourself — a larger Qwen, or any other embedding model.

Acknowledgments

Thanks to Rulin Shao for support.

Thanks also to Claude Code and OpenAI Codex for supporting open-source contributors with credits and plans, which we earned by working on LEANN.

This work was done at the Berkeley Sky Computing Lab, BAIR, and the Berkeley NLP Group.

License

Apache-2.0

Release history

0.2.1 (Jun 1, 2026)
0.2.0 (Jun 1, 2026)
0.1.0 (May 31, 2026), this version
