Turn any website into a CLI/API for AI agents
The Problem
AI agents interact with websites through browser automation, which is slow, expensive, and unreliable:
| | Without site2cli | With site2cli |
|---|---|---|
| Speed | 10-30s per action (browser) | <1s per action (API) |
| Cost | Thousands of LLM tokens per page | Zero tokens for cached actions |
| Reliability | ~15-35% on benchmarks | >95% for discovered APIs |
| Setup | Write custom Playwright scripts | site2cli discover <url> |
| Output | Screenshots, raw HTML | Structured JSON, typed clients |
How It Works
site2cli uses Progressive Formalization — a 3-tier system that automatically graduates interactions from slow-but-universal to fast-but-specific:
graph LR
A["Tier 1: Browser<br/>Exploration"] -->|"Pattern<br/>detected"| B["Tier 2: Cached<br/>Workflow"]
B -->|"API<br/>discovered"| C["Tier 3: Direct<br/>API Call"]
style A fill:#ff6b6b,color:#fff
style B fill:#ffd93d,color:#000
style C fill:#6bcb77,color:#fff
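The graduation between tiers shown in the diagram can be sketched as a small state machine. This is only an illustration: the `TierTracker` class, the success-count rule, and the threshold of 3 are assumptions made for this example, not site2cli's actual promotion logic.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    BROWSER = 1   # slow but universal
    WORKFLOW = 2  # cached, replayable steps
    API = 3       # direct HTTP call


@dataclass
class TierTracker:
    """Hypothetical tracker: promote an action after repeated successes."""

    tier: Tier = Tier.BROWSER
    successes: int = 0
    promote_after: int = 3  # assumed threshold, not site2cli's real value

    def record(self, ok: bool) -> Tier:
        if not ok:
            # A failure resets progress and drops back to the universal tier.
            self.successes = 0
            self.tier = Tier.BROWSER
            return self.tier
        self.successes += 1
        if self.successes >= self.promote_after and self.tier < Tier.API:
            # Enough consecutive successes: move one tier up and start counting again.
            self.tier = Tier(self.tier + 1)
            self.successes = 0
        return self.tier
```

Under these assumptions, three consecutive successes move an action from the browser tier to the cached workflow tier, three more move it to the direct API tier, and any failure sends it back to the browser.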
The Discovery Pipeline captures browser traffic and converts it into structured interfaces:
graph TD
A[Launch Browser + CDP] --> B[Capture Network Traffic]
B --> C[Group by Endpoint Pattern]
C --> D[LLM-Assisted Analysis]
D --> E[OpenAPI 3.1 Spec]
E --> F[Python Client]
E --> G[CLI Commands]
E --> H[MCP Server]
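The "Group by Endpoint Pattern" step can be pictured as path normalization followed by bucketing. The sketch below is a guess at the general idea, not the project's implementation: the function names, the `{id}` placeholder, and the rule that only numeric and UUID segments count as identifiers are all assumptions.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Segments that look like identifiers: plain integers or canonical UUIDs.
_ID_SEGMENT = re.compile(
    r"\d+|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def normalize_path(url: str) -> str:
    """Replace identifier-like path segments with a `{id}` placeholder."""
    path = urlsplit(url).path
    parts = ["{id}" if _ID_SEGMENT.fullmatch(p) else p for p in path.split("/")]
    return "/".join(parts) or "/"


def group_by_endpoint(requests):
    """Bucket (method, url) pairs by HTTP method and normalized path."""
    groups = defaultdict(list)
    for method, url in requests:
        groups[(method.upper(), normalize_path(url))].append(url)
    return dict(groups)
```

With this rule, `GET /posts/1` and `GET /posts/2` collapse into a single `("GET", "/posts/{id}")` group, which is the kind of pattern a later analysis step could turn into one parameterized endpoint.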
Comparison
| Feature | browser-use | Hand-built CLIs | CLI-Anything | webctl | site2cli |
|---|---|---|---|---|---|
| Works on any site | Yes | No | Yes | Yes | Yes |
| Structured output | No | Yes | Yes | JSON/a11y/md | Yes |
| Auto-discovery | No | No | No | No | Yes |
| MCP server generation | No | No | No | No | Yes |
| Progressive optimization | No | N/A | No | No | Yes |
| Cookie banner handling | No | N/A | No | Yes | Yes |
| Auth page detection | No | N/A | No | Yes | Yes |
| Self-healing | No | No | No | No | Yes |
| No browser needed (after discovery) | No | Yes | No | No | Yes |
| Agent init/config | No | No | No | Yes | Yes |
| Community spec sharing | No | No | No | No | Yes |
Quick Start
# Install (lightweight - no browser deps by default)
pip install site2cli
# Install with all features
# (extras are quoted so shells such as zsh do not treat the brackets as a glob)
pip install "site2cli[all]"
# Or pick what you need
pip install "site2cli[browser]" # Playwright for traffic capture
pip install "site2cli[llm]" # Claude API for smart analysis
pip install "site2cli[mcp]" # MCP server generation
Discover a Site's API
# Capture traffic and discover API endpoints
site2cli discover kayak.com --action "search flights"
# site2cli launches a browser, captures network traffic,
# and generates: OpenAPI spec + Python client + MCP tools
Use the Generated Interface
# CLI
site2cli run kayak.com search_flights from=SFO to=JFK date=2025-04-01
# Or as MCP tools for AI agents
site2cli mcp generate kayak.com
site2cli mcp serve kayak.com
As a Python Library
from site2cli.discovery.analyzer import TrafficAnalyzer
from site2cli.discovery.spec_generator import generate_openapi_spec
from site2cli.generators.mcp_gen import generate_mcp_server_code
# NOTE: `exchanges`, `api`, and `site` are not defined in this snippet.
# They are placeholders for objects you already hold, presumably from a
# prior discovery session.
# Analyze captured traffic
analyzer = TrafficAnalyzer(exchanges)
endpoints = analyzer.extract_endpoints()
# Generate OpenAPI spec
spec = generate_openapi_spec(api)
# Generate MCP server
mcp_code = generate_mcp_server_code(site, spec)
What Gets Generated
From a single discovery session, site2cli produces:
| Output | Description |
|---|---|
| OpenAPI 3.1 Spec | Full API specification with schemas, parameters, auth |
| Python Client | Typed httpx client with methods for each endpoint |
| CLI Commands | Typer commands you can run from terminal |
| MCP Server | Tools that AI agents (Claude, etc.) can call directly |
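To make the first row of the table concrete, here is a minimal sketch of how a list of discovered endpoints could be folded into an OpenAPI 3.1 document. The `build_openapi` helper and the endpoint dictionary shape (`name`, `method`, `path`, `query`) are hypothetical and chosen for this example. The real generator's input model and output are likely richer (schemas, auth, request bodies).

```python
def build_openapi(title: str, base_url: str, endpoints: list[dict]) -> dict:
    """Build a bare-bones OpenAPI 3.1 document from endpoint descriptions."""
    paths: dict = {}
    for ep in endpoints:
        operation = {
            "operationId": ep["name"],
            "parameters": [
                # Every query parameter is typed as an optional string here;
                # a real generator would infer types from observed traffic.
                {"name": p, "in": "query", "required": False,
                 "schema": {"type": "string"}}
                for p in ep.get("query", [])
            ],
            "responses": {"200": {"description": "Successful response"}},
        }
        paths.setdefault(ep["path"], {})[ep["method"].lower()] = operation
    return {
        "openapi": "3.1.0",
        "info": {"title": title, "version": "0.1.0"},
        "servers": [{"url": base_url}],
        "paths": paths,
    }
```

Each `operationId` is a natural candidate for the name of a client method, CLI command, or MCP tool, which is one plausible way a single spec can feed all three outputs.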
Architecture
graph TB
subgraph "Interface Layer"
CLI[CLI - Typer]
MCP[MCP Server]
SDK[Python SDK]
end
subgraph "Router"
R[Tier Router + Fallback]
end
subgraph "Execution Tiers"
T1[Tier 1: Browser]
T2[Tier 2: Workflow]
T3[Tier 3: API]
end
subgraph "Discovery Engine"
CAP[Traffic Capture - CDP]
ANA[Pattern Analyzer]
GEN[Code Generators]
end
CLI --> R
MCP --> R
SDK --> R
R --> T1
R --> T2
R --> T3
CAP --> ANA --> GEN
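The "Tier Router + Fallback" box can be sketched as a loop that tries the fastest tier first and falls through to slower ones on failure. The `run_with_fallback` function and the executor signature below are assumptions for illustration. They do not reflect the actual router API.

```python
from typing import Any, Callable

# An executor takes the action parameters and returns a result, or raises.
Executor = Callable[[dict], Any]


def run_with_fallback(
    executors: list[tuple[str, Executor]], params: dict
) -> tuple[str, Any]:
    """Try each (tier name, executor) in order; return the first success."""
    errors = []
    for name, execute in executors:
        try:
            return name, execute(params)
        except Exception as exc:  # any failure triggers fallback to the next tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

Passing the executors in the order API, workflow, browser means the cheapest path is always attempted first, and the browser tier acts as the last resort when a discovered endpoint stops working.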
Live Validation
site2cli has been validated with 7 experiments (numbered 8 through 14) across 15+ real public APIs as a pre-launch test suite:
Experiment #8: Core Pipeline (5 APIs)
| API | Endpoints | Spec | Client | MCP | Pipeline |
|---|---|---|---|---|---|
| JSONPlaceholder | 8 | Valid | Makes real calls | 8 tools | 157ms |
| httpbin.org | 7 | Valid | Makes real calls | 7 tools | 179ms |
| Dog CEO API | 5 | Valid | Makes real calls | 5 tools | 209ms |
| Open-Meteo | 1 | Valid | Makes real calls | 1 tool | 686ms |
| GitHub API | 4 | Valid | Makes real calls | 4 tools | 323ms |
| Total | 25 | 5/5 | 5/5 | 25 tools | avg 310ms |
Experiment #9: API Breadth (10 APIs, 7 categories)
| API | Category | Endpoints | Spec | MCP Tools |
|---|---|---|---|---|
| PokeAPI | Structured REST | 5 | Valid | 5 |
| CatFacts | Simple REST | 3 | Valid | 3 |
| Chuck Norris | Simple REST | 3 | Valid | 3 |
| SWAPI (Star Wars) | Nested Paths | 5 | Valid | 5 |
| Open Library | Query Params | 2 | Valid | 2 |
| USGS Earthquake | Government/Science | 2 | Valid | 2 |
| NASA APOD | Government/Science | 1 | Valid | 1 |
| Met Museum | Cultural | 3 | Valid | 3 |
| Art Institute Chicago | Cultural | 4 | Valid | 4 |
| REST Countries | Geographic | 5 | Valid | 5 |
| Total | 7 categories | 33 | 10/10 | 33 |
Full Validation Suite Summary
| # | Experiment | Key Result |
|---|---|---|
| 8 | Core Pipeline | 25 endpoints, 5/5 APIs, avg 310ms |
| 9 | API Breadth | 33 endpoints across 10 diverse APIs |
| 10 | Unofficial API Benchmark | 62% coverage vs hand-reverse-engineered APIs, 2M x faster |
| 11 | Speed & Cost | 74% cheaper than browser-use, 32 req/s throughput |
| 12 | MCP Validation | 20 tools, 14/14 quality checks, 100% handler coverage |
| 13 | Spec Accuracy | 80% accuracy vs ground truth |
| 14 | Resilience | 100% health check accuracy, drift detection works |
All 7 experiments pass in ~74 seconds.
# Auto-generated client for JSONPlaceholder — no human code
client = JSONPlaceholderClient()
albums = client.get_albums()
# → [{"userId": 1, "id": 1, "title": "quidem molestiae enim"}, ...]
# Auto-generated client for Open-Meteo — handles query params
client = OpenMeteoClient()
weather = client.get_v1_forecast(latitude="37.77", longitude="-122.42", current_weather="true")
# → {"current_weather": {"temperature": 12.3, "windspeed": 8.2, ...}}
Reproduce all experiments: python experiments/run_all_experiments.py
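The resilience experiment mentions drift detection. One simple way to think about it is comparing a live response against a sample recorded at discovery time. The sketch below is an assumption about the general technique (top-level key and type comparison). The `detect_drift` name and report shape are invented for this example.

```python
def detect_drift(expected: dict, actual: dict) -> dict:
    """Compare top-level keys and value types of two JSON objects."""
    expected_keys, actual_keys = set(expected), set(actual)
    type_changed = sorted(
        key
        for key in expected_keys & actual_keys
        if type(expected[key]) is not type(actual[key])
    )
    return {
        "missing": sorted(expected_keys - actual_keys),  # fields that disappeared
        "added": sorted(actual_keys - expected_keys),    # fields that are new
        "type_changed": type_changed,                    # same key, different type
    }
```

An empty report in all three lists would indicate that the response still matches the recorded shape. Any non-empty list is a signal that the generated spec may need to be rediscovered.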
Testing
214 tests (208 unit/integration + 6 live), all passing on Python 3.10+.
| Test File | Tests | Coverage Area |
|---|---|---|
| `test_analyzer.py` | 23 | Traffic analysis, path normalization, schema inference, auth detection |
| `test_cli.py` | 16 | All CLI subcommands via CliRunner |
| `test_models.py` | 15 | Pydantic model validation, serialization, defaults |
| `test_router.py` | 15 | Tier routing, fallback, promotion, param forwarding |
| `test_cookie_banner.py` | 12 | Cookie banner detection & auto-dismissal |
| `test_auth.py` | 11 | Keyring store/get, auth headers, cookie extraction |
| `test_integration_pipeline.py` | 11 | Full pipeline with mock data |
| `test_registry.py` | 10 | SQLite CRUD, tier updates, health tracking |
| `test_wait_conditions.py` | 10 | Rich wait conditions (network-idle, selector, stable) |
| `test_detectors.py` | 10 | Auth/SSO/CAPTCHA page detection |
| `test_tier_promotion.py` | 9 | Tier fallback, auto-promotion, failure gates |
| `test_config.py` | 8 | Config singleton, dirs, YAML save/load, API key |
| `test_health.py` | 8 | Health check with mock httpx, status persistence |
| `test_generated_code.py` | 8 | compile() validation of generated code |
| `test_retry.py` | 8 | Async retry utility with delay and callbacks |
| `test_a11y.py` | 8 | Accessibility tree extraction and formatting |
| `test_output_filter.py` | 8 | Output filtering (grep, limit, keys-only) |
| `test_agent_config.py` | 8 | Agent config generation (Claude MCP, generic) |
| `test_spec_generator.py` | 6 | OpenAPI spec generation and persistence |
| `test_community.py` | 6 | Export/import roundtrip, community listing |
| `test_client_generator.py` | 4 | Python client code generation |
| `test_integration_live.py` | 6 | Live tests against JSONPlaceholder + httpbin |
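The table lists `compile()` validation of generated code. A minimal version of that kind of check, using only Python's built-in `compile()`, might look like the sketch below. The helper name `is_valid_python` is hypothetical and the project's real tests may assert more than this.

```python
def is_valid_python(source: str, filename: str = "<generated>") -> bool:
    """Return True if `source` parses and compiles as a Python module."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True
```

Note that this only proves the generated text is syntactically valid. It does not import the module or exercise any of its behavior.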
Development
# Clone and install with dev dependencies
git clone https://github.com/lonexreb/site2cli.git
cd site2cli
pip install -e ".[dev]"
# Run tests
pytest # Unit + integration tests (no network)
pytest -m live # Live tests (hits real APIs)
pytest -v # Verbose output
# Lint
ruff check src/ tests/
API Keys
- Anthropic API key (`ANTHROPIC_API_KEY`): Used for LLM-assisted endpoint analysis. Optional: discovery works without it, just without enhanced descriptions.
- No other keys required for core functionality.
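Because the key is optional, calling code can check for it and degrade gracefully. The helper below is a hypothetical sketch of that check. Only the `ANTHROPIC_API_KEY` variable name comes from this README.

```python
import os
from typing import Mapping, Optional


def llm_analysis_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-empty ANTHROPIC_API_KEY is present."""
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())
```

Accepting the environment as a parameter keeps the check easy to test without mutating the real process environment.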
What's New in v0.2.5
- Cookie banner auto-dismissal: 3-strategy detection (30+ vendor selectors, multilingual text matching, a11y role matching) runs automatically during discovery
- Auth page detection: Detects login/SSO/OAuth/MFA/CAPTCHA pages and suggests `site2cli auth login`
- Accessibility tree extraction: Better page representation for LLM-driven exploration (replaces CSS-only element extraction)
- Action retry logic: Configurable retries with delay for click/fill/select/press actions
- Rich wait conditions: 9 condition types, including `network-idle`, `load`, `exists:<selector>`, `visible:<selector>`, `hidden:<selector>`, `url-contains:<text>`, `text-contains:<text>`, `stable`
- Output filtering: `--grep`, `--limit`, `--keys-only`, `--compact` flags on `site2cli run`
- Agent init command: `site2cli init` generates Claude MCP config or generic agent prompts from discovered sites
- 214 tests (up from 156), all passing
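The wait-condition strings listed above follow a `kind` or `kind:<argument>` shape, which can be parsed with a few lines of standard-library Python. This parser is a sketch under that assumption. The `WaitCondition` type and `parse_wait_condition` function are hypothetical, and it covers only the condition names spelled out in this list.

```python
from typing import NamedTuple, Optional

# Conditions that take no argument versus those that require one.
BARE = {"network-idle", "load", "stable"}
WITH_ARG = {"exists", "visible", "hidden", "url-contains", "text-contains"}


class WaitCondition(NamedTuple):
    kind: str
    arg: Optional[str]


def parse_wait_condition(text: str) -> WaitCondition:
    """Parse strings such as 'stable' or 'visible:#results' into a tuple."""
    # Split on the first colon only, so selectors like 'a:hover' stay intact.
    kind, sep, arg = text.partition(":")
    if kind in BARE and not sep:
        return WaitCondition(kind, None)
    if kind in WITH_ARG and arg:
        return WaitCondition(kind, arg)
    raise ValueError(f"unsupported wait condition: {text!r}")
```

Splitting on the first colon matters because CSS selectors can themselves contain colons, for example pseudo-classes.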
Roadmap
- Core discovery pipeline (traffic capture → OpenAPI → client)
- MCP server generation
- Community spec sharing (export/import)
- Health monitoring and self-healing
- Tier auto-promotion (Browser → Workflow → API)
- PyPI package publication
- Pre-launch validation suite (7 experiments, 15+ APIs, all passing)
- Cookie banner handling & auth page detection
- Accessibility tree extraction for browser exploration
- Agent init/config generation
- Output filtering for run results
- OAuth device flow support
- Workflow recording UI
- Multi-site orchestration
- Trained endpoint classifier (replace heuristics)
License
MIT