AlterLab Python SDK
Official Python SDK for the AlterLab Web Scraping API. Extract data from any website with intelligent anti-bot bypass, JavaScript rendering, and structured extraction.
Features
- Simple API: 3 lines of code to scrape any website
- Intelligent Anti-Bot Bypass: Automatic tier escalation (curl → HTTP → stealth → browser)
- JavaScript Rendering: Full Playwright browser for JS-heavy sites
- Structured Extraction: JSON Schema, prompts, and pre-built profiles
- BYOP Support: Bring Your Own Proxy for a 20% discount
- Async Support: Native asyncio for concurrent scraping
- Type Hints: Full typing support for IDE autocomplete
- Cost Controls: Set budgets, prefer cost/speed, fail-fast options
Installation
pip install alterlab
Quick Start
from alterlab import AlterLab
# Initialize client
client = AlterLab(api_key="sk_live_...") # or set ALTERLAB_API_KEY env var
# Scrape a website
result = client.scrape("https://example.com")
print(result.text) # Extracted text
print(result.json) # Structured JSON (Schema.org, metadata)
print(result.billing.cost_dollars) # Cost breakdown
Pricing
Pay-as-you-go pricing with no subscriptions. $1 = 5,000 scrapes (Tier 1).
| Tier | Name | Price | Per $1 | Use Case |
|---|---|---|---|---|
| 1 | Curl | $0.0002 | 5,000 | Static HTML sites |
| 2 | HTTP | $0.0003 | 3,333 | Sites with TLS fingerprinting |
| 3 | Stealth | $0.0005 | 2,000 | Sites with browser checks |
| 4 | Browser | $0.001 | 1,000 | JS-heavy SPAs |
| 5 | Captcha | $0.02 | 50 | Sites with CAPTCHAs |
The API automatically escalates through the tiers until a request succeeds, and charges only for the tier that succeeded.
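The arithmetic behind the table can be checked with a short sketch. The prices are copied from the table above; the helper names (`scrapes_per_dollar`, `batch_cost`) and the example batch are hypothetical and are not part of the SDK.

```python
from decimal import Decimal

# Per-request prices in USD, copied from the pricing table above.
TIER_PRICES = {
    1: Decimal("0.0002"),  # Curl
    2: Decimal("0.0003"),  # HTTP
    3: Decimal("0.0005"),  # Stealth
    4: Decimal("0.001"),   # Browser
    5: Decimal("0.02"),    # Captcha
}


def scrapes_per_dollar(tier: int) -> int:
    """Whole number of scrapes that one dollar buys at the given tier."""
    return int(Decimal("1") // TIER_PRICES[tier])


def batch_cost(tier_counts: dict) -> Decimal:
    """Total cost of a batch, given how many requests succeeded at each tier."""
    return sum((TIER_PRICES[t] * n for t, n in tier_counts.items()), Decimal("0"))


# Hypothetical batch: 800 pages succeed at tier 1, 150 at tier 3, 50 at tier 4.
example = {1: 800, 3: 150, 4: 50}
print(scrapes_per_dollar(2))  # 3333
print(batch_cost(example))    # 0.2850
```

`Decimal` is used instead of `float` so that small per-request prices add up without rounding drift.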
Usage Examples
Basic Scraping
from alterlab import AlterLab
client = AlterLab(api_key="sk_live_...")
# Auto mode - intelligent tier escalation
result = client.scrape("https://example.com")
# Force HTML-only (fastest, cheapest)
result = client.scrape_html("https://example.com")
# JavaScript rendering
result = client.scrape_js("https://spa-app.com", screenshot=True)
print(result.screenshot_url)
Structured Extraction
# Extract specific fields with JSON Schema
result = client.scrape(
    "https://store.com/product/123",
    extraction_schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"}
        }
    }
)
print(result.json)  # {"name": "...", "price": 29.99, "in_stock": true}
# Or use a pre-built profile
result = client.scrape(
    "https://store.com/product/123",
    extraction_profile="product"
)
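Extracted data may be worth checking before it is used downstream. The following is a minimal sketch that runs without the SDK: `matches_schema` is a hypothetical helper that only checks the three primitive types used in the schema above, and the sample dictionaries are made up. For full JSON Schema validation a dedicated validator library would be the better choice.

```python
def matches_schema(data: dict, schema: dict) -> bool:
    """Check top-level fields of `data` against a flat JSON Schema (types only)."""
    checks = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "boolean": lambda v: isinstance(v, bool),
    }
    for field, spec in schema.get("properties", {}).items():
        if field not in data:
            continue  # properties are optional unless listed under "required"
        check = checks.get(spec.get("type"))
        if check is not None and not check(data[field]):
            return False
    return True


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}

# Made-up payloads standing in for result.json
good = {"name": "Widget", "price": 29.99, "in_stock": True}
bad = {"name": "Widget", "price": "29.99", "in_stock": True}

print(matches_schema(good, schema))  # True
print(matches_schema(bad, schema))   # False
```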
Cost Controls
from alterlab import AlterLab, CostControls
client = AlterLab(api_key="sk_live_...")
# Limit to cheap tiers only
result = client.scrape(
    "https://example.com",
    cost_controls=CostControls(
        max_tier="2",      # Don't go above HTTP tier
        prefer_cost=True,  # Optimize for lowest cost
        fail_fast=True     # Error instead of escalating
    )
)
# Estimate cost before scraping
estimate = client.estimate_cost("https://linkedin.com")
print(f"Estimated: ${estimate.estimated_cost_dollars:.4f}")
print(f"Confidence: {estimate.confidence}")
Advanced Options
from alterlab import AlterLab, AdvancedOptions
client = AlterLab(api_key="sk_live_...")
# Full browser with screenshot and PDF
result = client.scrape(
    "https://example.com",
    mode="js",
    advanced=AdvancedOptions(
        render_js=True,
        screenshot=True,
        generate_pdf=True,
        markdown=True,
        wait_condition="networkidle"
    )
)
print(result.screenshot_url)
print(result.pdf_url)
print(result.markdown_content)
BYOP (Bring Your Own Proxy)
Get a 20% discount when using your own proxy:
from alterlab import AlterLab, AdvancedOptions
client = AlterLab(api_key="sk_live_...")
# Use your configured proxy integration
result = client.scrape(
    "https://example.com",
    advanced=AdvancedOptions(
        use_own_proxy=True,
        proxy_country="US"  # Optional: request specific geo
    )
)
# Check if BYOP was applied
if result.billing.byop_applied:
    print(f"Saved {result.billing.byop_discount_percent}%!")
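As a rough illustration of what the discount means per request, the sketch below applies 20% to the list prices from the Pricing section. It assumes the discount is a flat percentage off the per-request tier price, which this README does not spell out; `discounted_price` is a hypothetical helper, not an SDK function.

```python
from decimal import Decimal

BYOP_DISCOUNT_PERCENT = Decimal("20")  # figure quoted in this section


def discounted_price(list_price: Decimal,
                     discount_percent: Decimal = BYOP_DISCOUNT_PERCENT) -> Decimal:
    """Price after a flat percentage discount (assumed billing model)."""
    return list_price * (Decimal("100") - discount_percent) / Decimal("100")


# Tier 4 (Browser) list price from the pricing table: $0.001 per request.
browser_price = discounted_price(Decimal("0.001"))
print(f"Browser tier with BYOP: ${browser_price:.4f}")
```

The authoritative figure for any given request is `result.billing.cost_dollars`.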
Async Support
import asyncio
from alterlab import AsyncAlterLab
async def main():
    async with AsyncAlterLab(api_key="sk_live_...") as client:
        # Single request
        result = await client.scrape("https://example.com")
        # Concurrent requests
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3",
        ]
        results = await asyncio.gather(*[client.scrape(url) for url in urls])
        for r in results:
            print(r.title, r.billing.cost_dollars)

asyncio.run(main())
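With long URL lists it can help to cap how many requests are in flight at once. The sketch below shows one common way to do that with `asyncio.Semaphore`. It runs without the SDK: `fake_scrape` is a stand-in for `client.scrape`, and `bounded_gather` and the limit of 2 are illustrative choices, not part of AlterLab.

```python
import asyncio


async def bounded_gather(fetch, urls, limit=2):
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url):
        async with semaphore:  # wait here while `limit` calls are already running
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(url) for url in urls))


async def fake_scrape(url):
    """Stand-in for `await client.scrape(url)` so the sketch runs offline."""
    await asyncio.sleep(0.01)
    return f"scraped {url}"


urls = [f"https://example.com/page{i}" for i in range(1, 6)]
results = asyncio.run(bounded_gather(fake_scrape, urls, limit=2))
print(results)
```

In real use, `fetch` would be the `scrape` method of an `AsyncAlterLab` client opened with `async with`.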
Caching
# Enable caching (opt-in)
result = client.scrape(
    "https://example.com",
    cache=True,      # Enable caching
    cache_ttl=3600,  # Cache for 1 hour
)
if result.cached:
    print("Cache hit - no credits charged!")
# Force refresh
result = client.scrape(
    "https://example.com",
    cache=True,
    force_refresh=True  # Bypass cache
)
PDF and Image Extraction
# Extract text from PDF
result = client.scrape_pdf(
    "https://example.com/document.pdf",
    format="markdown"
)
print(result.text)
# OCR for images
result = client.scrape_ocr(
    "https://example.com/image.png",
    language="eng"
)
print(result.text)
Error Handling
from alterlab import (
    AlterLab,
    AuthenticationError,
    InsufficientCreditsError,
    RateLimitError,
    ScrapeError,
    TimeoutError
)
client = AlterLab(api_key="sk_live_...")
try:
    result = client.scrape("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientCreditsError:
    print("Please top up your balance")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except ScrapeError as e:
    print(f"Scraping failed: {e.message}")
except TimeoutError:
    print("Request timed out")
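Instead of only printing the `retry_after` value, a caller might wait and try again. The sketch below shows that pattern and runs without the SDK: `FakeRateLimitError` is a stand-in for `RateLimitError`, and `call_with_rate_limit_retry` is a hypothetical helper. Whether the client's built-in `max_retries` already covers rate limiting is not stated here, so treat this as an optional outer layer.

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for alterlab.RateLimitError so the sketch runs without the SDK."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(func, max_attempts=3, sleep=time.sleep,
                               error_type=FakeRateLimitError):
    """Call func(); on a rate-limit error, wait retry_after seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except error_type as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(exc.retry_after)


# Demo: a function that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError(retry_after=0.01)
    return "ok"


waits = []  # record the waits instead of actually sleeping
print(call_with_rate_limit_retry(flaky, sleep=waits.append))  # ok
print(waits)  # [0.01, 0.01]
```

With the real SDK, `func` could be `lambda: client.scrape(url)` and `error_type` would be `RateLimitError`.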
Check Usage & Balance
usage = client.get_usage()
print(f"Balance: ${usage.balance_dollars:.2f}")
print(f"Used this month: {usage.credits_used_month} credits")
API Reference
AlterLab Client
AlterLab(
    api_key: str = None,      # API key (or ALTERLAB_API_KEY env var)
    base_url: str = None,     # Custom API URL
    timeout: int = 120,       # Request timeout in seconds
    max_retries: int = 3,     # Retry count for transient failures
    retry_delay: float = 1.0  # Initial retry delay (exponential backoff)
)
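To give a feel for how `max_retries` and `retry_delay` interact, the sketch below lists the waits an exponential backoff would produce. The doubling factor and the absence of jitter are assumptions: the signature above only says "exponential backoff", and `backoff_delays` is a hypothetical helper rather than SDK code.

```python
def backoff_delays(max_retries=3, retry_delay=1.0, factor=2.0):
    """Delay in seconds before each retry: retry_delay * factor ** attempt."""
    return [retry_delay * factor ** attempt for attempt in range(max_retries)]


print(backoff_delays())                  # [1.0, 2.0, 4.0]
print(sum(backoff_delays()))             # 7.0 seconds of waiting in the worst case
print(backoff_delays(retry_delay=0.5))   # [0.5, 1.0, 2.0]
```

Under these assumptions the defaults add at most 7 seconds of waiting on top of the request time itself.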
scrape() Method
client.scrape(
    url: str,                             # URL to scrape
    mode: str = "auto",                   # "auto", "html", "js", "pdf", "ocr"
    sync: bool = True,                    # Wait for result vs return job ID
    advanced: AdvancedOptions = None,     # Advanced scraping options
    cost_controls: CostControls = None,   # Budget and optimization settings
    cache: bool = False,                  # Enable response caching
    cache_ttl: int = None,                # Cache TTL in seconds (60-86400)
    formats: list = None,                 # Output formats: ["text", "json", "html", "markdown"]
    extraction_schema: dict = None,       # JSON Schema for structured extraction
    extraction_prompt: str = None,        # Natural language extraction instructions
    extraction_profile: str = None,       # Pre-built profile: "product", "article", etc.
    wait_for: str = None,                 # CSS selector to wait for (JS mode)
    screenshot: bool = False,             # Capture screenshot (JS mode)
) -> ScrapeResult
ScrapeResult
result.url                   # Scraped URL
result.status_code           # HTTP status
result.text                  # Extracted text content
result.html                  # HTML content
result.json                  # Structured JSON content
result.title                 # Page title
result.author                # Author (if detected)
result.billing               # BillingDetails object
result.billing.tier_used     # Tier that succeeded
result.billing.cost_dollars  # Final cost in USD
result.screenshot_url        # Screenshot URL (if requested)
result.pdf_url               # PDF URL (if requested)
result.cached                # Whether result was from cache
Environment Variables
| Variable | Description |
|---|---|
| ALTERLAB_API_KEY | Your API key (alternative to passing it to the constructor) |
Requirements
- Python 3.8+
- httpx >= 0.24.0
Support
- Documentation: https://alterlab.io/docs
- API Status: https://status.alterlab.io
- Support: support@alterlab.io
- Issues: GitHub Issues
License
MIT License - see LICENSE for details.