Python SDK for the UnWeb API — convert HTML to Markdown for AI pipelines
UnWeb Python SDK
Python SDK for the UnWeb API — convert HTML to clean, LLM-ready Markdown for RAG pipelines, AI agents, and documentation ingestion.
Installation
pip install unweb
Quick Start
from unweb import UnWebClient
client = UnWebClient(api_key="unweb_your_key_here")
# Convert HTML to Markdown
result = client.convert.paste("<h1>Hello World</h1><p>Clean markdown output.</p>")
print(result.markdown) # "# Hello World\n\nClean markdown output."
print(result.quality_score) # 100
# Convert a webpage
result = client.convert.url("https://example.com/article")
print(result.markdown)
# Upload an HTML file
result = client.convert.upload("page.html")
print(result.markdown)
Get your free API key at app.unweb.info (500 credits/month, no credit card required).
Features
- Conversions - Paste HTML, fetch URLs, or upload files. Returns clean CommonMark with quality scores.
- Web Crawler - Crawl entire documentation sites with BFS traversal. Export as raw Markdown, LangChain JSONL, or LlamaIndex JSON.
- Webhook Notifications - Get notified when crawl jobs complete via HTTPS webhooks.
- Dashboard Access - Manage API keys, view usage, and handle subscriptions programmatically.
- Quality Scores - Every conversion returns a 0-100 quality score that flags JS-rendered pages and content-extraction problems.
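The crawler itself runs server-side, but it can help to picture how `allowed_path` and `max_pages` bound a breadth-first crawl. The sketch below is a conceptual illustration, not UnWeb's implementation: the `links` dict stands in for fetching and parsing each page, and the rules it applies (same host only, path-prefix match on `allowed_path`, start URL always included) are assumptions.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def bfs_crawl_order(start_url, links, allowed_path="/", max_pages=100):
    """Return URLs in breadth-first order, restricted to allowed_path.

    `links` maps a URL to the hrefs found on that page; it is a stand-in
    for actually fetching and parsing HTML (assumption for this sketch).
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()  # FIFO queue gives breadth-first order
        order.append(url)
        for href in links.get(url, []):
            absolute = urljoin(url, href)  # resolve relative links
            parsed = urlparse(absolute)
            # Assumed rule: stay on the same host and under allowed_path.
            if parsed.netloc != start_host or not parsed.path.startswith(allowed_path):
                continue
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order
```

Pages at depth 1 are all visited before any page at depth 2, so a low `max_pages` keeps the crawl close to the start URL.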
API Reference
Conversions
All conversion methods return a ConversionResult with markdown, warnings, and quality_score.
from unweb import UnWebClient
client = UnWebClient(api_key="unweb_...")
# Paste raw HTML
result = client.convert.paste("<h1>Title</h1><p>Content</p>")
result.markdown # "# Title\n\nContent"
result.quality_score # 0-100
result.warnings # ["Content auto-detected using: <main> element"]
# Convert from URL (fetches and converts server-side)
result = client.convert.url("https://docs.python.org/3/tutorial/index.html")
# Upload an HTML file
result = client.convert.upload("./downloaded-page.html")
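Because every result carries a `quality_score`, a pipeline can gate what it ingests. The sketch below is a hypothetical helper, not part of the SDK: `StubResult` only mimics the three `ConversionResult` fields named above so the example runs without an API key, and the threshold of 70 is an arbitrary choice for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StubResult:
    """Hypothetical stand-in with the same fields as the SDK's ConversionResult."""
    markdown: str
    quality_score: int
    warnings: list = field(default_factory=list)


def accept_result(result, min_score=70):
    """Return (accepted, reason) for a conversion result.

    min_score=70 is an illustrative threshold, not an UnWeb recommendation.
    """
    if not result.markdown.strip():
        return False, "empty markdown"
    if result.quality_score < min_score:
        return False, f"quality_score {result.quality_score} < {min_score}"
    return True, "ok"
```

With a real client this would be called as `accepted, reason = accept_result(client.convert.url(url))`, logging `result.warnings` for rejected pages to see why extraction struggled.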
Web Crawler
Crawl documentation sites and download results as a ZIP archive.
import time
# Start a crawl job
job = client.crawl.start(
    "https://docs.example.com",
    allowed_path="/docs/",  # Only crawl URLs under this path
    max_pages=100,  # Page limit
    export_format="raw-md",  # "raw-md", "langchain", or "llamaindex"
    webhook_url="https://your-app.com/hooks/crawl",  # Optional completion webhook
)
print(f"Job started: {job.job_id}")  # Job ID for polling
# Poll until complete
while not job.is_complete:
    time.sleep(5)
    job = client.crawl.status(job.job_id)
    print(f" {job.status}: {job.pages_crawled} pages crawled")
# Download results
if job.status == "Completed":
    download = client.crawl.download(job.job_id)
    print(f"Download ZIP: {download.download_url}")
    print(f"Size: {download.size_bytes} bytes")
# List all your crawl jobs
jobs = client.crawl.list(status="Completed")
for j in jobs.jobs:
    print(f" {j.job_id}: {j.pages_crawled} pages")
# Cancel a running job
client.crawl.cancel(job.job_id)
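The polling loop above waits indefinitely at a fixed 5-second interval. A minimal sketch of a bounded alternative, assuming you want a hard timeout and exponential backoff; `wait_for` is a hypothetical helper, not an SDK method, and it takes plain callables so it can be tested without the API.

```python
import time


def wait_for(fetch, is_done, timeout=600.0, interval=5.0, max_interval=60.0, sleep=time.sleep):
    """Call fetch() until is_done(result) is true or `timeout` seconds pass.

    The wait between polls doubles after each attempt, capped at max_interval.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    result = fetch()
    while not is_done(result):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not done after {timeout} seconds")
        sleep(interval)
        interval = min(interval * 2, max_interval)
        result = fetch()
    return result
```

With the SDK objects shown above, usage would look like `job = wait_for(lambda: client.crawl.status(job.job_id), lambda j: j.is_complete)`. For long crawls, the `webhook_url` option avoids polling altogether.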
Export formats:
| Format | Output | Use case |
|---|---|---|
| raw-md | ZIP with .md files + manifest | General purpose |
| langchain | JSONL compatible with LangChain document loaders | RAG with LangChain |
| llamaindex | JSON compatible with LlamaIndex readers | RAG with LlamaIndex |
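Once a job completes, the ZIP at `download.download_url` can be fetched (for example with `urllib.request.urlretrieve`) and read locally. The sketch below assumes that Markdown entries in a raw-md export end in `.md` and that the LangChain export is standard JSONL (one JSON object per line); the internal layout of the archive and the manifest format are not documented here, so treat both helpers as hypothetical.

```python
import io
import json
import zipfile


def read_markdown_files(source):
    """Map each .md entry in an export ZIP to its decoded text.

    `source` may be a file path, a file-like object, or raw ZIP bytes.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if name.endswith(".md")  # skips the manifest and other files
        }


def read_jsonl(text):
    """Parse JSONL text (one JSON object per non-blank line) into dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

If the langchain export follows LangChain's document shape, each parsed record would carry `page_content` and `metadata` keys; check a real export before relying on that.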
Authentication
The SDK uses API keys for conversion and crawler endpoints (set once in the constructor). For dashboard endpoints (usage, keys, subscription), authenticate with email/password to get a JWT:
# API key auth (conversions + crawler) - set in constructor
client = UnWebClient(api_key="unweb_...")
# JWT auth (dashboard endpoints) - login first
client.auth.login("you@example.com", "your-password")
# Now dashboard endpoints work
usage = client.usage.current()
keys = client.keys.list()
# Register a new account
token = client.auth.register("new@example.com", "password", "First", "Last")
# Get current user profile
profile = client.auth.me()
print(f"{profile.first_name} ({profile.email})")
# Update profile
client.auth.update_profile(first_name="NewName")
# Change password
client.auth.change_password("old-password", "new-password")
API Key Management
Requires JWT auth (client.auth.login(...) first).
# List API keys
keys = client.keys.list()
for key in keys:
    print(f" {key.name}: {key.key_prefix}...")
# Create a new API key (full key only shown once)
new_key = client.keys.create("Production Key")
print(f"Key: {new_key.key}") # Save this — not retrievable later
# Revoke an API key
client.keys.revoke(key_id="...")
Usage Tracking
Requires JWT auth.
usage = client.usage.current()
print(f"Credits used: {usage.credits_used}/{usage.credits_limit}")
print(f"Overage: {usage.overage_credits_used}")
print(f"Billing cycle: {usage.billing_cycle_start} - {usage.billing_cycle_end}")
# Detailed stats and history (each returns a raw dict)
stats = client.usage.stats()
history = client.usage.history()
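The `credits_used` and `credits_limit` fields are enough to derive remaining credits and a usage percentage. This is a hypothetical formatting helper, assuming both fields are integers; it reports overage separately because the README does not say whether `credits_used` already includes overage credits.

```python
def usage_summary(credits_used, credits_limit, overage_used=0):
    """Summarize a billing cycle's credit usage as a short string."""
    remaining = max(credits_limit - credits_used, 0)
    # Guard against a zero limit to avoid ZeroDivisionError.
    percent = 100.0 * credits_used / credits_limit if credits_limit else 0.0
    summary = f"{credits_used}/{credits_limit} credits used ({percent:.1f}%), {remaining} remaining"
    if overage_used:
        summary += f", {overage_used} overage"
    return summary
```

With the SDK this would be `print(usage_summary(usage.credits_used, usage.credits_limit, usage.overage_credits_used))`.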
Subscription
Requires JWT auth.
sub = client.subscription.get()
print(f"Tier: {sub.tier}") # Free, Starter, Pro, Scale
print(f"Credits: {sub.credits_used}/{sub.monthly_credits}")
print(f"Overage allowed: {sub.allows_overage}")
# Get a checkout URL to upgrade
url = client.subscription.checkout("Pro")
print(f"Upgrade: {url}")
# Cancel subscription
client.subscription.cancel()
Error Handling
The SDK raises typed exceptions for API errors:
from unweb import UnWebClient, UnWebError, AuthError, QuotaExceededError, ValidationError, NotFoundError
client = UnWebClient(api_key="unweb_...")
try:
    result = client.convert.paste("")
except ValidationError as e:
    print(f"Bad request: {e}")  # 400
except AuthError as e:
    print(f"Auth failed: {e}")  # 401/403
except QuotaExceededError as e:
    print(f"Quota exceeded: {e}")  # 429
except NotFoundError as e:
    print(f"Not found: {e}")  # 404
except UnWebError as e:
    print(f"API error ({e.status_code}): {e}")
# All exceptions have:
# e.status_code - HTTP status code
# e.response - Raw response body dict
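When converting many pages, one bad URL should usually not abort the whole batch, while an auth or quota failure should. A minimal sketch of that policy; `convert_many` is a hypothetical helper that accepts any single-argument converter, so it works with `client.convert.url` but is tested here with a fake.

```python
def convert_many(convert, inputs, error_types=(Exception,), stop_on=()):
    """Convert each input, collecting per-item failures instead of stopping.

    convert:     callable taking one input, e.g. client.convert.url
    error_types: exceptions to record and skip, e.g. (UnWebError,)
    stop_on:     exceptions to re-raise immediately, e.g. (AuthError, QuotaExceededError)
    Returns (results, failures); both map input -> value or exception.
    """
    results, failures = {}, {}
    for item in inputs:
        try:
            results[item] = convert(item)
        except stop_on:
            raise  # checked first, so it wins over the broader error_types
        except error_types as exc:
            failures[item] = exc
    return results, failures
```

With the SDK's exception classes this might be called as `convert_many(client.convert.url, urls, error_types=(UnWebError,), stop_on=(AuthError, QuotaExceededError))`, since further requests are unlikely to succeed after those errors.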
Configuration
client = UnWebClient(
    api_key="unweb_...",  # API key for conversions/crawler
    base_url="https://api.unweb.info",  # Default API URL
    timeout=30.0,  # Request timeout in seconds
)
# Use as context manager for automatic cleanup
with UnWebClient(api_key="unweb_...") as client:
    result = client.convert.paste("<h1>Hello</h1>")
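Hard-coding the key, as in the examples above, is convenient for a quick test but risky in committed code. The sketch below reads it from an environment variable instead. Both the variable name `UNWEB_API_KEY` and the `unweb_` prefix check are assumptions (the prefix is inferred from the example keys in this README); the SDK may not read any environment variable on its own.

```python
import os


def api_key_from_env(var="UNWEB_API_KEY", environ=None):
    """Read an UnWeb API key from an environment variable.

    UNWEB_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"set {var} to your UnWeb API key")
    # Assumed format: example keys in the README start with "unweb_".
    if not key.startswith("unweb_"):
        raise ValueError(f"{var} does not look like an UnWeb key (expected 'unweb_' prefix)")
    return key
```

Usage: `with UnWebClient(api_key=api_key_from_env()) as client: ...`.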
Pricing
| Tier | Credits/month | Price |
|---|---|---|
| Free | 500 | $0 |
| Starter | 2,000 | $12/mo |
| Pro | 15,000 | $39/mo |
| Scale | 60,000 | $99/mo |
Different operations cost different numbers of credits. Paid plans include overage billing, so usage beyond the monthly credit allowance is billed rather than blocked. See unweb.info for details.
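A quick way to compare the paid tiers is the effective price per 1,000 included credits, computed from the pricing table above. This ignores overage rates and per-operation credit costs, which the README does not list, so it compares tiers only, not the cost of a specific workload.

```python
# Monthly credits and prices (USD) from the pricing table.
TIERS = {
    "Starter": (2_000, 12),
    "Pro": (15_000, 39),
    "Scale": (60_000, 99),
}


def price_per_1k(credits, price):
    """Effective price in dollars per 1,000 included credits."""
    return price * 1_000 / credits


for name, (credits, price) in TIERS.items():
    print(f"{name}: ${price_per_1k(credits, price):.2f} per 1,000 credits")
```

By this measure Starter works out to $6.00, Pro to $2.60, and Scale to $1.65 per 1,000 credits.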
Links
- UnWeb Homepage
- API Documentation
- Dashboard (get your API key)
- Report Issues
License
MIT - see LICENSE for details.
Download files
Source Distribution
- unweb-0.1.0.tar.gz (14.2 kB)
Built Distribution
- unweb-0.1.0-py3-none-any.whl (12.5 kB)
File details
Details for the file unweb-0.1.0.tar.gz.
File metadata
- Download URL: unweb-0.1.0.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77aec91e5707facb206c9a4be28cc5349c244bf63a956470ca044652121009ce |
| MD5 | 571ab23d53c71baefb0a7aa2d270d40a |
| BLAKE2b-256 | ccde90210d8d3be50b16bceadf3a896243098c3efb8b9f1386a91c8aa5f1f58d |
Provenance
The following attestation bundles were made for unweb-0.1.0.tar.gz:
- Publisher: publish.yml on mbsoft-systems/unweb-python
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: unweb-0.1.0.tar.gz
  - Subject digest: 77aec91e5707facb206c9a4be28cc5349c244bf63a956470ca044652121009ce
- Sigstore transparency entry: 1201997840
- Sigstore integration time:
- Permalink: mbsoft-systems/unweb-python@600b6574ebf12d7e3241fca4f60f089cc148bd59
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mbsoft-systems
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@600b6574ebf12d7e3241fca4f60f089cc148bd59
- Trigger Event: push
File details
Details for the file unweb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: unweb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4002e2e27296417b70e432a0277dd0f172c16e6d1bd7e2f6357546bd104a8571 |
| MD5 | 8c24ffd0e715c65da0a261dcf3d92b45 |
| BLAKE2b-256 | 6d5055d872701f56573a17f02c0ba0ec5037ac03f40b3fb3782284cfcf3165a0 |
Provenance
The following attestation bundles were made for unweb-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on mbsoft-systems/unweb-python
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: unweb-0.1.0-py3-none-any.whl
  - Subject digest: 4002e2e27296417b70e432a0277dd0f172c16e6d1bd7e2f6357546bd104a8571
- Sigstore transparency entry: 1201997845
- Sigstore integration time:
- Permalink: mbsoft-systems/unweb-python@600b6574ebf12d7e3241fca4f60f089cc148bd59
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mbsoft-systems
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@600b6574ebf12d7e3241fca4f60f089cc148bd59
- Trigger Event: push