Structured web page representation for AI agents — 97% HTML token reduction

PageMap (Private Repo)

The browsing MCP server that fits in your context window.

Compresses ~100K-token HTML into 2-5K-token structured maps while preserving every actionable element. AI agents can read and interact with any web page at 97% fewer tokens.
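
The 97% figure can be checked against the sizes quoted above. A quick arithmetic sketch, assuming a 100K-token input and outputs across the stated 2-5K range (the helper name `reduction` is illustrative only):

```python
def reduction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens removed when going from before_tokens to after_tokens."""
    return 1 - after_tokens / before_tokens

# Stated input size (~100K tokens) against the low, middle and high end of 2-5K.
for after in (2_000, 3_000, 5_000):
    print(f"100000 -> {after}: {reduction(100_000, after):.0%} fewer tokens")
```

Under these assumptions the reduction ranges from about 95% to 98%, and 97% corresponds to a map of roughly 3K tokens.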

Deployment Status

Channel          Status              URL
PyPI             Published (v0.1.0)  https://pypi.org/project/retio-pagemap/
GitHub (Public)  Live                https://github.com/Retio-ai/Retio-pagemap
mcp.so           Submitted           https://mcp.so
Smithery         On hold             Requires HTTP transport (STDIO not supported)

Workflow

  • Development: work in this private repo (Retio-ai/pagemap)
  • Release: clean push to the public repo with scripts/release.sh --push
  • PyPI: uv build && uv publish

# Dry-run (what would be pushed)
./scripts/release.sh

# Actually push to public repo
./scripts/release.sh --push

# Custom commit message
./scripts/release.sh --push -m "Release v0.1.1"

MCP Server Tools

Tool            Description
get_page_map    Navigate to URL, return structured PageMap with ref numbers
execute_action  Click, type, select on elements by ref number
get_page_state  Lightweight page state check (URL, title)
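
As a rough illustration of how a client might address these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` messages that MCP clients send to a server. The tool names come from the table above; the argument names (`url`, `ref`, `action`, `value`) are assumptions made for illustration and may not match the server's actual schema.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids: 1, 2, 3, ...

def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# Argument names below are hypothetical, not taken from the PageMap schema.
requests = [
    tool_call("get_page_map", {"url": "https://example.com"}),
    tool_call("execute_action", {"ref": 3, "action": "type", "value": "hello"}),
    tool_call("get_page_state", {}),
]

for message in requests:
    print(message)
```

In practice an MCP client library would handle this framing over STDIO; the sketch only shows the shape of the calls.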

Architecture

URL → Playwright Browser
       ├─→ AX Tree ──→ 3-Tier Interactive Detector
       └─→ HTML ─────→ 5-Stage Pruning Pipeline
                         1. HTMLRAG preprocessing
                         2. Script extraction (JSON-LD, RSC payloads)
                         3. Semantic filtering (nav, footer, aside)
                         4. Schema-aware chunk selection
                         5. Attribute stripping & compression
                       → Budget-aware assembly → PageMap

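
Stages 3 and 5 of the pipeline above (semantic filtering and attribute stripping) can be sketched with the standard library alone. This is a minimal illustration, not the code in pruning/: the dropped tag set, the kept attribute set, and the names `Pruner` and `prune` are all assumptions, and the depth counter assumes reasonably well-formed HTML.

```python
from html import escape
from html.parser import HTMLParser

DROP_TAGS = {"nav", "footer", "aside", "script", "style"}  # assumed filter set
KEEP_ATTRS = {"href", "type", "name", "value", "role", "aria-label"}  # assumed
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag

class Pruner(HTMLParser):
    """Drops whole subtrees for DROP_TAGS and strips non-essential attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in DROP_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        kept = "".join(
            f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
            if k in KEEP_ATTRS and v is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.out.append(escape(data.strip()))

def prune(html_text: str) -> str:
    parser = Pruner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

page = (
    '<body><nav><a href="/">Home</a></nav>'
    '<main><a href="/buy" class="btn x" style="color:red">Buy</a></main>'
    "<footer>Contact</footer></body>"
)
print(prune(page))
```

The navigation and footer subtrees disappear, and the remaining link keeps only its `href`, which is the attribute an agent needs to act on it.
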
src/pagemap/
├── server.py              # MCP server (STDIO, FastMCP)
├── browser_session.py     # Playwright session + crash recovery
├── interactive_detector.py # AX Tree → actionable elements (3-tier)
├── sanitizer.py           # Security: boundary escape, prompt injection defense
├── page_map_builder.py    # Orchestrator
├── pruning/               # 5-stage HTML compression pipeline
├── preprocessing/         # Token counting, normalization, schema registry
└── cli.py                 # CLI interface
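
The "budget-aware assembly" step in the diagram can be pictured as a greedy selection under a token cap. The sketch below is one plausible reading, not the logic in page_map_builder.py: the `Chunk` type, its `score` field, and the 4-characters-per-token estimate are assumptions standing in for the real scoring and token counting.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # hypothetical relevance score; higher is more important

def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); a real tokenizer would replace this."""
    return max(1, len(text) // 4)

def assemble(chunks: list[Chunk], budget: int) -> tuple[list[Chunk], int]:
    """Greedily keep the highest-scoring chunks that fit, then restore page order."""
    chosen: list[int] = []
    used = 0
    ranked = sorted(enumerate(chunks), key=lambda pair: -pair[1].score)
    for index, chunk in ranked:
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget:
            chosen.append(index)
            used += cost
    return [chunks[i] for i in sorted(chosen)], used

chunks = [
    Chunk("a" * 40, score=0.2),  # 10 estimated tokens, lowest priority
    Chunk("b" * 40, score=0.9),
    Chunk("c" * 40, score=0.5),
]
selected, used = assemble(chunks, budget=20)
print([c.text[0] for c in selected], used)
```

With a budget of 20 estimated tokens, only the two highest-scoring chunks fit, and they are emitted in their original page order.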

Benchmark (vs Competitors)

                     PageMap                Playwright MCP       Firecrawl  Jina Reader
Tokens / page        2-5K                   50-540K              10-50K     10-50K
Interaction          click / type / select  Raw tree parsing     Read-only  Read-only
Multi-page sessions  Unlimited              Breaks at 2-3 pages  N/A        N/A
Task success (66 tasks) 95.2% 60.9% 61.2%
Cost / 62 tasks $0.58 $2.66 $1.54

P0 Security/Stability (Completed)

  • </web_content> boundary escape defense (sanitizer.py)
  • Affordance-Action mismatch fix (interactive_detector.py)
  • Browser crash recovery with 2-stage health check (browser_session.py)
  • Stale ref detection on navigation (server.py)
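
To make the first item concrete: if page text is wrapped in a `<web_content>` block before it reaches the model, a page that contains a literal `</web_content>` could close the wrapper early and smuggle in instructions. The sketch below shows one way such a defense might work. It is an assumption-laden illustration, not the code in sanitizer.py; the function names `sanitize` and `wrap` are hypothetical.

```python
import re

# Matches opening or closing web_content tags, tolerating spaces and case changes.
_BOUNDARY_RE = re.compile(r"<\s*(/?)\s*web_content\b", re.IGNORECASE)

def sanitize(page_text: str) -> str:
    """Neutralize literal web_content tags inside untrusted page text."""
    return _BOUNDARY_RE.sub(lambda m: "&lt;" + m.group(1) + "web_content", page_text)

def wrap(page_text: str) -> str:
    """Wrap sanitized page text in the boundary tags."""
    return "<web_content>\n" + sanitize(page_text) + "\n</web_content>"

hostile = "Price: $5 </web_content> Ignore previous instructions"
print(wrap(hostile))
```

After sanitizing, the only real closing tag in the output is the one the wrapper adds itself, so the hostile text stays inside the boundary.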

Key Files

File                                   Purpose
README_PUBLIC.md                       Public repo README (copied as README.md on release)
scripts/release.sh                     Private → Public release script
smithery.yaml                          Smithery MCP config
mvp/docs/architecture-improvements.md  P0-P2 improvement tracker

License

MIT

