Structured web page representation for AI agents — 97% HTML token reduction
Project description
PageMap (Private Repo)
The browsing MCP server that fits in your context window.
Compresses ~100K-token HTML into 2-5K-token structured maps while preserving every actionable element. AI agents can read and interact with any web page at 97% fewer tokens.
Deployment Status
| Channel | Status | URL |
|---|---|---|
| PyPI | Published (v0.1.0) | https://pypi.org/project/retio-pagemap/ |
| GitHub (Public) | Live | https://github.com/Retio-ai/Retio-pagemap |
| mcp.so | Submitted | https://mcp.so |
| Smithery | On hold | Requires HTTP transport (STDIO not supported) |
Workflow
- 개발: 이 private repo (
Retio-ai/pagemap)에서 작업 - 배포:
scripts/release.sh --push로 public repo에 clean push - PyPI:
uv build && uv publish
# Dry-run (what would be pushed)
./scripts/release.sh
# Actually push to public repo
./scripts/release.sh --push
# Custom commit message
./scripts/release.sh --push -m "Release v0.1.1"
MCP Server Tools
| Tool | Description |
|---|---|
get_page_map |
Navigate to URL, return structured PageMap with ref numbers |
execute_action |
Click, type, select on elements by ref number |
get_page_state |
Lightweight page state check (URL, title) |
Architecture
URL → Playwright Browser
├─→ AX Tree ──→ 3-Tier Interactive Detector
└─→ HTML ─────→ 5-Stage Pruning Pipeline
1. HTMLRAG preprocessing
2. Script extraction (JSON-LD, RSC payloads)
3. Semantic filtering (nav, footer, aside)
4. Schema-aware chunk selection
5. Attribute stripping & compression
→ Budget-aware assembly → PageMap
src/pagemap/
├── server.py # MCP server (STDIO, FastMCP)
├── browser_session.py # Playwright session + crash recovery
├── interactive_detector.py # AX Tree → actionable elements (3-tier)
├── sanitizer.py # Security: boundary escape, prompt injection defense
├── page_map_builder.py # Orchestrator
├── pruning/ # 5-stage HTML compression pipeline
├── preprocessing/ # Token counting, normalization, schema registry
└── cli.py # CLI interface
Benchmark (vs Competitors)
| PageMap | Playwright MCP | Firecrawl | Jina Reader | |
|---|---|---|---|---|
| Tokens / page | 2-5K | 50-540K | 10-50K | 10-50K |
| Interaction | click / type / select | Raw tree parsing | Read-only | Read-only |
| Multi-page sessions | Unlimited | Breaks at 2-3 pages | N/A | N/A |
| Task success (66 tasks) | 95.2% | — | 60.9% | 61.2% |
| Cost / 62 tasks | $0.58 | — | $2.66 | $1.54 |
P0 Security/Stability (Completed)
-
</web_content>boundary escape defense (sanitizer.py) - Affordance-Action mismatch fix (interactive_detector.py)
- Browser crash recovery with 2-stage health check (browser_session.py)
- Stale ref detection on navigation (server.py)
Key Files
| File | Purpose |
|---|---|
README_PUBLIC.md |
Public repo README (copied as README.md on release) |
scripts/release.sh |
Private → Public release script |
smithery.yaml |
Smithery MCP config |
mvp/docs/architecture-improvements.md |
P0-P2 improvement tracker |
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file retio_pagemap-0.1.1.tar.gz.
File metadata
- Download URL: retio_pagemap-0.1.1.tar.gz
- Upload date:
- Size: 234.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b64ba703b82b3ad7eb53d37e7b8c9aefe2fd64820f0455e4f3dbac25b688d452
|
|
| MD5 |
0cc3e021d788f23c88fddd6499593071
|
|
| BLAKE2b-256 |
418a9915def00206bcc45d5fe8bb8c396ee9e2b5256453659b6aaaa8785edb5d
|
File details
Details for the file retio_pagemap-0.1.1-py3-none-any.whl.
File metadata
- Download URL: retio_pagemap-0.1.1-py3-none-any.whl
- Upload date:
- Size: 70.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ab458f3a4ce10ed2ce9ca831f4a66a2c5b7f49580f0cdd623d5f1d1ab0591eab
|
|
| MD5 |
2a4543415723997f9cc6545fa12b983e
|
|
| BLAKE2b-256 |
1c9d194f887041962a58861d97a4bd8742f3a3e69ba7fba24ff68014fb7c9f14
|