UnWeb Python SDK

Requires Python 3.9+. License: MIT.

Python SDK for the UnWeb API — convert HTML to clean, LLM-ready Markdown for RAG pipelines, AI agents, and documentation ingestion.

Installation

pip install unweb

Quick Start

from unweb import UnWebClient

client = UnWebClient(api_key="unweb_your_key_here")

# Convert HTML to Markdown
result = client.convert.paste("<h1>Hello World</h1><p>Clean markdown output.</p>")
print(result.markdown)       # "# Hello World\n\nClean markdown output."
print(result.quality_score)  # 100

# Convert a webpage
result = client.convert.url("https://example.com/article")
print(result.markdown)

# Upload an HTML file
result = client.convert.upload("page.html")
print(result.markdown)

Get your free API key at app.unweb.info (500 credits/month, no credit card required).

Features

  • Conversions - Paste HTML, fetch URLs, or upload files. Returns clean CommonMark with quality scores.
  • Web Crawler - Crawl entire documentation sites with BFS traversal. Export as raw Markdown, LangChain JSONL, or LlamaIndex JSON.
  • Webhook Notifications - Get notified when crawl jobs complete via HTTPS webhooks.
  • Dashboard Access - Manage API keys, view usage, and handle subscriptions programmatically.
  • Quality Scores - Every conversion returns a 0-100 quality score detecting JS-rendered pages and content extraction issues.
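The quality score can be used to gate what enters a downstream index. The following is a minimal sketch, assuming only that a result exposes `quality_score` and `warnings` as described in the API reference. The `FakeResult` stand-in, the `needs_review` helper, and the threshold of 60 are illustrative and not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeResult:
    """Stand-in for a conversion result (hypothetical, for illustration only)."""
    markdown: str
    quality_score: int
    warnings: List[str] = field(default_factory=list)


def needs_review(result, threshold: int = 60) -> bool:
    """Return True when the score falls below the chosen threshold.

    The threshold is an arbitrary example value, not an SDK recommendation.
    """
    return result.quality_score < threshold


good = FakeResult("# Title\n\nContent", 100)
poor = FakeResult("", 20, ["page may be rendered client-side"])

for item in (good, poor):
    label = "review" if needs_review(item) else "accept"
    print(label, item.quality_score, item.warnings)
```

With a real client, the same helper could be applied to the object returned by `client.convert.url(...)`.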

API Reference

Conversions

All conversion methods return a ConversionResult with markdown, warnings, and quality_score.

from unweb import UnWebClient

client = UnWebClient(api_key="unweb_...")

# Paste raw HTML
result = client.convert.paste("<h1>Title</h1><p>Content</p>")
result.markdown       # "# Title\n\nContent"
result.quality_score  # 0-100
result.warnings       # ["Content auto-detected using: <main> element"]

# Convert from URL (fetches and converts server-side)
result = client.convert.url("https://docs.python.org/3/tutorial/index.html")

# Upload an HTML file
result = client.convert.upload("./downloaded-page.html")
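When converting many URLs, each result usually needs a stable file name. The sketch below derives one from the source URL using only the standard library. The `url_to_filename` helper and its naming scheme are assumptions made for illustration; the SDK does not prescribe how output is stored.

```python
import re
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-friendly Markdown file name."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Drop a trailing .htm/.html extension before slugging.
    raw = re.sub(r"\.html?$", "", raw)
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'index'}.md"


print(url_to_filename("https://docs.python.org/3/tutorial/index.html"))
```

The returned name can then be used to write `result.markdown` to disk, for example with `pathlib.Path(name).write_text(result.markdown, encoding="utf-8")`.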

Web Crawler

Crawl documentation sites and download results as a ZIP archive.

import time

# Start a crawl job
job = client.crawl.start(
    "https://docs.example.com",
    allowed_path="/docs/",      # Only crawl URLs under this path
    max_pages=100,               # Page limit
    export_format="raw-md",      # "raw-md", "langchain", or "llamaindex"
    webhook_url="https://your-app.com/hooks/crawl",  # Optional completion webhook
)
print(f"Job started: {job.job_id}")  # Job ID for polling

# Poll until complete
while not job.is_complete:
    time.sleep(5)
    job = client.crawl.status(job.job_id)
    print(f"  {job.status}: {job.pages_crawled} pages crawled")

# Download results
if job.status == "Completed":
    download = client.crawl.download(job.job_id)
    print(f"Download ZIP: {download.download_url}")
    print(f"Size: {download.size_bytes} bytes")

# List all your crawl jobs
jobs = client.crawl.list(status="Completed")
for j in jobs.jobs:
    print(f"  {j.job_id}: {j.pages_crawled} pages")

# Cancel a running job
client.crawl.cancel(job.job_id)
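The polling loop above runs until the job finishes, however long that takes. A bounded variant could look like the sketch below. It assumes only that the status call returns an object with an `is_complete` attribute, as in the example above. The `wait_for_job` helper, its parameters, and the default timeout are hypothetical and not part of the SDK.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], object],
    job_id: str,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Poll fetch_status(job_id) until the job completes or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job.is_complete:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawl job {job_id} still running after {timeout} seconds"
            )
        sleep(interval)
```

With a real client this would be called as `job = wait_for_job(client.crawl.status, job.job_id)`. Passing the status function in as an argument keeps the helper testable without network access.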

Export formats:

Format      Output                                              Use case
raw-md      ZIP with .md files + manifest                       General purpose
langchain   JSONL compatible with LangChain document loaders    RAG with LangChain
llamaindex  JSON compatible with LlamaIndex readers             RAG with LlamaIndex
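A JSONL export stores one JSON object per line, so it can be inspected with the standard library before being handed to a framework loader. The sketch below makes no assumption about field names, since the export schema is not described here. The `iter_jsonl` helper and the sample data are illustrative only.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_number}") from exc


# Sample data standing in for a file extracted from the downloaded ZIP.
sample = io.StringIO('{"a": 1}\n\n{"a": 2}\n')
records = list(iter_jsonl(sample))
print(len(records), "records")
```

For a real export, pass an open file instead, for example `open(path, encoding="utf-8")`, where `path` points at a JSONL file from the archive.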

Authentication

The SDK uses API keys for conversion and crawler endpoints (set once in the constructor). For dashboard endpoints (usage, keys, subscription), authenticate with email/password to get a JWT:

# API key auth (conversions + crawler) - set in constructor
client = UnWebClient(api_key="unweb_...")

# JWT auth (dashboard endpoints) - login first
client.auth.login("you@example.com", "your-password")

# Now dashboard endpoints work
usage = client.usage.current()
keys = client.keys.list()

# Register a new account
token = client.auth.register("new@example.com", "password", "First", "Last")

# Get current user profile
profile = client.auth.me()
print(f"{profile.first_name} ({profile.email})")

# Update profile
client.auth.update_profile(first_name="NewName")

# Change password
client.auth.change_password("old-password", "new-password")
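Hard-coding an API key in source files makes it easy to leak. One common alternative is to read it from the environment. This is a minimal sketch: the variable name `UNWEB_API_KEY` and the `load_api_key` helper are assumptions made here, not something the SDK defines or reads on its own.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var: str = "UNWEB_API_KEY",  # hypothetical variable name
) -> str:
    """Return the API key stored in the given environment mapping."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

The result can be passed straight to the constructor: `client = UnWebClient(api_key=load_api_key())`.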

API Key Management

Requires JWT auth (client.auth.login(...) first).

# List API keys
keys = client.keys.list()
for key in keys:
    print(f"  {key.name}: {key.key_prefix}...")

# Create a new API key (the full key is shown only once)
new_key = client.keys.create("Production Key")
print(f"Key: {new_key.key}")  # Save this now; it cannot be retrieved later

# Revoke an API key
client.keys.revoke(key_id="...")

Usage Tracking

Requires JWT auth.

usage = client.usage.current()
print(f"Credits used: {usage.credits_used}/{usage.credits_limit}")
print(f"Overage: {usage.overage_credits_used}")
print(f"Billing cycle: {usage.billing_cycle_start} - {usage.billing_cycle_end}")

# Detailed stats and history (returns raw dict)
stats = client.usage.stats()
history = client.usage.history()
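The usage fields can feed a simple budget check before a large batch is started. The sketch below works on plain integers, so it assumes nothing beyond the `credits_used` and `credits_limit` values shown above. Both helper functions are illustrative and not part of the SDK.

```python
def credits_remaining(credits_used: int, credits_limit: int) -> int:
    """Credits left in the current cycle, never negative."""
    return max(credits_limit - credits_used, 0)


def percent_used(credits_used: int, credits_limit: int) -> float:
    """Share of the monthly allowance consumed, as a percentage."""
    if credits_limit <= 0:
        return 0.0
    return round(100.0 * credits_used / credits_limit, 1)


print(credits_remaining(120, 500), percent_used(120, 500))
```

With a real client the inputs would come from `usage.credits_used` and `usage.credits_limit`.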

Subscription

Requires JWT auth.

sub = client.subscription.get()
print(f"Tier: {sub.tier}")            # Free, Starter, Pro, Scale
print(f"Credits: {sub.credits_used}/{sub.monthly_credits}")
print(f"Overage: {sub.allows_overage}")

# Get a checkout URL to upgrade
url = client.subscription.checkout("Pro")
print(f"Upgrade: {url}")

# Cancel subscription
client.subscription.cancel()

Error Handling

The SDK raises typed exceptions for API errors:

from unweb import (
    AuthError,
    NotFoundError,
    QuotaExceededError,
    UnWebClient,
    UnWebError,
    ValidationError,
)

client = UnWebClient(api_key="unweb_...")

try:
    result = client.convert.paste("")  # empty input, used here to provoke an error
except ValidationError as e:
    print(f"Bad request: {e}")           # 400
except AuthError as e:
    print(f"Auth failed: {e}")           # 401/403
except QuotaExceededError as e:
    print(f"Quota exceeded: {e}")        # 429
except NotFoundError as e:
    print(f"Not found: {e}")             # 404
except UnWebError as e:
    print(f"API error ({e.status_code}): {e}")

# All exceptions have:
# e.status_code  - HTTP status code
# e.response     - Raw response body dict
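Some failures may be worth retrying after a pause. The sketch below shows a generic retry wrapper with exponential backoff. Whether a 429 from this API is transient (rate limiting) or permanent (monthly quota exhausted) is not stated here, so treating it as retryable is an assumption. The `call_with_retry` helper and its parameters are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)
```

A call with a real client might look like `call_with_retry(lambda: client.convert.url(url), (QuotaExceededError,))`. Validation and authentication errors are deliberately left out of `retry_on`, since repeating the same request would not change the outcome.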

Configuration

client = UnWebClient(
    api_key="unweb_...",                       # API key for conversions/crawler
    base_url="https://api.unweb.info",         # Default API URL
    timeout=30.0,                              # Request timeout in seconds
)

# Use as context manager for automatic cleanup
with UnWebClient(api_key="unweb_...") as client:
    result = client.convert.paste("<h1>Hello</h1>")

Pricing

Tier     Credits/month  Price
Free     500            $0
Starter  2,000          $12/mo
Pro      15,000         $39/mo
Scale    60,000         $99/mo

Different operations cost different credits. Paid plans include overage billing so your pipelines never stop. See unweb.info for details.
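As a rough planning aid, the table can be turned into a lookup that finds the smallest tier whose monthly allowance covers an expected volume. This sketch copies the figures from the table above and ignores overage billing, whose rates are not given here. The `smallest_tier` helper is illustrative only.

```python
from typing import Optional

# (tier name, credits per month, price in USD per month), from the table above
TIERS = [
    ("Free", 500, 0),
    ("Starter", 2_000, 12),
    ("Pro", 15_000, 39),
    ("Scale", 60_000, 99),
]


def smallest_tier(credits_needed: int) -> Optional[str]:
    """Return the first tier whose allowance covers the need, or None."""
    for name, credits, _price in TIERS:
        if credits_needed <= credits:
            return name
    return None  # more than the largest listed allowance


print(smallest_tier(1_800))
```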

License

MIT - see LICENSE for details.
