AlterLab Python SDK

Official Python SDK for the AlterLab Web Scraping API. Extract data from any website with intelligent anti-bot bypass, JavaScript rendering, and structured extraction.

Features

  • Simple API: 3 lines of code to scrape any website
  • Intelligent Anti-Bot Bypass: Automatic tier escalation (curl → HTTP → stealth → browser)
  • JavaScript Rendering: Full Playwright browser for JS-heavy sites
  • Structured Extraction: JSON Schema, prompts, and pre-built profiles
  • BYOP Support: Bring Your Own Proxy for a 20% discount
  • Async Support: Native asyncio for concurrent scraping
  • Type Hints: Full typing support for IDE autocomplete
  • Cost Controls: Set budgets, prefer cost/speed, fail-fast options

Installation

pip install alterlab

Quick Start

from alterlab import AlterLab

# Initialize client
client = AlterLab(api_key="sk_live_...")  # or set ALTERLAB_API_KEY env var

# Scrape a website
result = client.scrape("https://example.com")
print(result.text)          # Extracted text
print(result.json)          # Structured JSON (Schema.org, metadata)
print(result.billing.cost_dollars)  # Cost breakdown

Pricing

Pay-as-you-go pricing with no subscriptions. $1 = 5,000 scrapes (Tier 1).

Tier  Name     Price per scrape  Scrapes per $1  Use Case
1     Curl     $0.0002           5,000           Static HTML sites
2     HTTP     $0.0003           3,333           Sites with TLS fingerprinting
3     Stealth  $0.0005           2,000           Sites with browser checks
4     Browser  $0.001            1,000           JS-heavy SPAs
5     Captcha  $0.02             50              Sites with CAPTCHAs

The API automatically escalates through the tiers until a request succeeds, and charges only for the tier that succeeded.
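
The per-dollar figures in the table follow directly from the per-scrape prices. A minimal sketch of that arithmetic, assuming the prices listed above stay as shown; `TIER_PRICES`, `scrapes_per_dollar`, and `worst_case_cost` are illustrative names and not part of the SDK:

```python
from decimal import Decimal

# Per-scrape prices in USD, copied from the pricing table.
TIER_PRICES = {
    1: Decimal("0.0002"),  # Curl
    2: Decimal("0.0003"),  # HTTP
    3: Decimal("0.0005"),  # Stealth
    4: Decimal("0.001"),   # Browser
    5: Decimal("0.02"),    # Captcha
}


def scrapes_per_dollar(tier: int) -> int:
    """Whole number of scrapes that $1 buys at the given tier."""
    return int(Decimal(1) / TIER_PRICES[tier])


def worst_case_cost(num_urls: int, max_tier: int) -> Decimal:
    """Upper bound on spend if every request escalates to max_tier."""
    return TIER_PRICES[max_tier] * num_urls


if __name__ == "__main__":
    for tier in TIER_PRICES:
        print(tier, scrapes_per_dollar(tier))
    print(worst_case_cost(10_000, max_tier=2))
```

Because only the successful tier is billed, multiplying the price of the highest allowed tier by the number of URLs gives a conservative ceiling for a batch. `Decimal` is used here to avoid floating-point rounding on small prices.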

Usage Examples

Basic Scraping

from alterlab import AlterLab

client = AlterLab(api_key="sk_live_...")

# Auto mode - intelligent tier escalation
result = client.scrape("https://example.com")

# Force HTML-only (fastest, cheapest)
result = client.scrape_html("https://example.com")

# JavaScript rendering
result = client.scrape_js("https://spa-app.com", screenshot=True)
print(result.screenshot_url)

Structured Extraction

# Extract specific fields with JSON Schema
result = client.scrape(
    "https://store.com/product/123",
    extraction_schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"}
        }
    }
)
print(result.json)  # {"name": "...", "price": 29.99, "in_stock": true}

# Or use a pre-built profile
result = client.scrape(
    "https://store.com/product/123",
    extraction_profile="product"
)
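
Extraction output depends on the page, so it can be worth checking that the returned fields have the expected types before using them. A minimal sketch, assuming `result.json` is a plain dict: `check_fields` is a hypothetical helper, it handles only the flat schema used above rather than the full JSON Schema specification, and it treats every listed property as required (JSON Schema itself does not).

```python
from typing import Any, Dict, List

# JSON Schema type names mapped to Python types (flat schemas only).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def check_fields(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the data matches."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for "number".
        if spec["type"] == "number" and isinstance(value, bool):
            problems.append(f"wrong type for {name}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {name}")
    return problems
```

Calling `check_fields(result.json, schema)` with the schema passed as `extraction_schema` would then flag missing or mistyped fields before they reach downstream code.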

Cost Controls

from alterlab import AlterLab, CostControls

client = AlterLab(api_key="sk_live_...")

# Limit to cheap tiers only
result = client.scrape(
    "https://example.com",
    cost_controls=CostControls(
        max_tier="2",       # Don't go above HTTP tier
        prefer_cost=True,   # Optimize for lowest cost
        fail_fast=True      # Error instead of escalating
    )
)

# Estimate cost before scraping
estimate = client.estimate_cost("https://linkedin.com")
print(f"Estimated: ${estimate.estimated_cost_dollars:.4f}")
print(f"Confidence: {estimate.confidence}")

Advanced Options

from alterlab import AlterLab, AdvancedOptions

client = AlterLab(api_key="sk_live_...")

# Full browser with screenshot and PDF
result = client.scrape(
    "https://example.com",
    mode="js",
    advanced=AdvancedOptions(
        render_js=True,
        screenshot=True,
        generate_pdf=True,
        markdown=True,
        wait_condition="networkidle"
    )
)

print(result.screenshot_url)
print(result.pdf_url)
print(result.markdown_content)

BYOP (Bring Your Own Proxy)

Get a 20% discount when using your own proxy:

from alterlab import AlterLab, AdvancedOptions

client = AlterLab(api_key="sk_live_...")

# Use your configured proxy integration
result = client.scrape(
    "https://example.com",
    advanced=AdvancedOptions(
        use_own_proxy=True,
        proxy_country="US"  # Optional: request specific geo
    )
)

# Check if BYOP was applied
if result.billing.byop_applied:
    print(f"Saved {result.billing.byop_discount_percent}%!")

Async Support

import asyncio
from alterlab import AsyncAlterLab

async def main():
    async with AsyncAlterLab(api_key="sk_live_...") as client:
        # Single request
        result = await client.scrape("https://example.com")

        # Concurrent requests
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3",
        ]
        results = await asyncio.gather(*[client.scrape(url) for url in urls])

        for r in results:
            print(r.title, r.billing.cost_dollars)

asyncio.run(main())
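
With long URL lists, an unbounded `asyncio.gather` starts every request at once, which may run into rate limits. One common way to cap the number of in-flight requests is an `asyncio.Semaphore`. The sketch below is an assumption-laden illustration: `gather_limited` is a hypothetical helper, and `fake_scrape` stands in for `client.scrape` so the example runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_limited(urls: List[str],
                         fetch: Callable[[str], Awaitable[str]],
                         limit: int = 5) -> List[str]:
    """Run fetch(url) for every URL with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_scrape(url: str) -> str:
    """Stand-in for client.scrape(); replace with the real call."""
    await asyncio.sleep(0.01)
    return f"scraped {url}"


if __name__ == "__main__":
    pages = [f"https://example.com/page{i}" for i in range(1, 4)]
    print(asyncio.run(gather_limited(pages, fake_scrape, limit=2)))
```

In real use, `fetch` would be the `scrape` method of an open `AsyncAlterLab` client, and `limit` would be tuned to the account's rate limits.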

Caching

# Enable caching (opt-in)
result = client.scrape(
    "https://example.com",
    cache=True,          # Enable caching
    cache_ttl=3600,      # Cache for 1 hour
)

if result.cached:
    print("Cache hit - no credits charged!")

# Force refresh
result = client.scrape(
    "https://example.com",
    cache=True,
    force_refresh=True   # Bypass cache
)
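
The API reference lists `cache_ttl` as a value in seconds between 60 and 86400. When the TTL comes from user input or configuration, it may help to bring it into that range first. A small sketch, assuming out-of-range values should be clamped rather than rejected (how the API itself treats them is not stated here); `clamp_cache_ttl` is a hypothetical helper:

```python
MIN_CACHE_TTL = 60      # seconds, lower bound from the API reference
MAX_CACHE_TTL = 86400   # seconds (24 hours), upper bound from the API reference


def clamp_cache_ttl(seconds: int) -> int:
    """Force a TTL into the documented 60-86400 second range."""
    return max(MIN_CACHE_TTL, min(MAX_CACHE_TTL, seconds))


if __name__ == "__main__":
    for requested in (30, 3600, 1_000_000):
        print(requested, clamp_cache_ttl(requested))
```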

PDF and Image Extraction

# Extract text from PDF
result = client.scrape_pdf(
    "https://example.com/document.pdf",
    format="markdown"
)
print(result.text)

# OCR for images
result = client.scrape_ocr(
    "https://example.com/image.png",
    language="eng"
)
print(result.text)

Error Handling

from alterlab import (
    AlterLab,
    AuthenticationError,
    InsufficientCreditsError,
    RateLimitError,
    ScrapeError,
    TimeoutError
)

client = AlterLab(api_key="sk_live_...")

try:
    result = client.scrape("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientCreditsError:
    print("Please top up your balance")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except ScrapeError as e:
    print(f"Scraping failed: {e.message}")
except TimeoutError:
    print("Request timed out")
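
Since `RateLimitError` exposes `retry_after`, a caller can wait for that long and try again instead of only printing a message. The sketch below shows the pattern with a stand-in exception so it runs offline. `FakeRateLimitError` and `call_with_rate_limit_retry` are hypothetical names, and whether the client's built-in `max_retries` already covers rate limits is not stated here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeRateLimitError(Exception):
    """Stand-in for alterlab.RateLimitError so the sketch runs offline."""

    def __init__(self, retry_after: float):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(func: Callable[[], T], max_attempts: int = 3,
                               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(), sleeping for retry_after seconds when rate limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except FakeRateLimitError as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            sleep(exc.retry_after)
    raise ValueError("max_attempts must be at least 1")
```

With the real SDK, the `except` clause would catch `RateLimitError` and `func` would be something like `lambda: client.scrape(url)`. The injectable `sleep` argument keeps the helper testable without real waiting.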

Check Usage & Balance

usage = client.get_usage()
print(f"Balance: ${usage.balance_dollars:.2f}")
print(f"Used this month: {usage.credits_used_month} credits")

API Reference

AlterLab Client

AlterLab(
    api_key: str = None,           # API key (or ALTERLAB_API_KEY env var)
    base_url: str = None,          # Custom API URL
    timeout: int = 120,            # Request timeout in seconds
    max_retries: int = 3,          # Retry count for transient failures
    retry_delay: float = 1.0       # Initial retry delay (exponential backoff)
)
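
The `retry_delay` comment mentions exponential backoff. As a rough illustration of what such a schedule looks like, the sketch below assumes the delay doubles after each failed attempt. The SDK's actual multiplier and any jitter are not documented here, and `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(max_retries: int = 3, retry_delay: float = 1.0,
                   factor: float = 2.0) -> List[float]:
    """Delay in seconds before each retry, growing geometrically.

    `factor` is an assumption; the SDK may use a different multiplier.
    """
    return [retry_delay * factor ** attempt for attempt in range(max_retries)]


if __name__ == "__main__":
    print(backoff_delays())
```

Under this assumption, the defaults of `max_retries=3` and `retry_delay=1.0` would add up to seven seconds of waiting on top of the request time, which is worth keeping in mind when choosing `timeout`.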

scrape() Method

client.scrape(
    url: str,                              # URL to scrape
    mode: str = "auto",                    # "auto", "html", "js", "pdf", "ocr"
    sync: bool = True,                     # Wait for result vs return job ID
    advanced: AdvancedOptions = None,      # Advanced scraping options
    cost_controls: CostControls = None,    # Budget and optimization settings
    cache: bool = False,                   # Enable response caching
    cache_ttl: int = None,                 # Cache TTL in seconds (60-86400)
    formats: list = None,                  # Output formats: ["text", "json", "html", "markdown"]
    extraction_schema: dict = None,        # JSON Schema for structured extraction
    extraction_prompt: str = None,         # Natural language extraction instructions
    extraction_profile: str = None,        # Pre-built profile: "product", "article", etc.
    wait_for: str = None,                  # CSS selector to wait for (JS mode)
    screenshot: bool = False,              # Capture screenshot (JS mode)
) -> ScrapeResult

ScrapeResult

result.url                # Scraped URL
result.status_code        # HTTP status
result.text               # Extracted text content
result.html               # HTML content
result.json               # Structured JSON content
result.title              # Page title
result.author             # Author (if detected)
result.billing            # BillingDetails object
result.billing.tier_used  # Tier that succeeded
result.billing.cost_dollars  # Final cost in USD
result.screenshot_url     # Screenshot URL (if requested)
result.pdf_url            # PDF URL (if requested)
result.cached             # Whether result was from cache
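
The billing fields make it straightforward to see where a batch's spend went. A minimal sketch, assuming each result exposes `billing.tier_used` and `billing.cost_dollars` as listed above; `cost_by_tier` is a hypothetical helper, and `SimpleNamespace` objects stand in for real `ScrapeResult` instances so the example runs offline.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable


def cost_by_tier(results: Iterable) -> Dict[str, float]:
    """Sum cost_dollars per tier across a batch of results."""
    totals = defaultdict(float)
    for result in results:
        # str() keeps the keys uniform whether tier_used is an int or a string.
        totals[str(result.billing.tier_used)] += result.billing.cost_dollars
    return dict(totals)


def fake_result(tier: int, cost: float) -> SimpleNamespace:
    """Stand-in for a ScrapeResult with only the billing fields set."""
    return SimpleNamespace(
        billing=SimpleNamespace(tier_used=tier, cost_dollars=cost)
    )


if __name__ == "__main__":
    batch = [fake_result(1, 0.0002), fake_result(1, 0.0002), fake_result(4, 0.001)]
    print(cost_by_tier(batch))
```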

Environment Variables

Variable          Description
ALTERLAB_API_KEY  Your API key (alternative to passing it to the constructor)
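
The lookup described above can be pictured as follows. This is a sketch of the general pattern, not the SDK's actual code: `resolve_api_key` is a hypothetical function, and giving the explicit argument precedence over the environment variable is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Prefer an explicit key, then fall back to ALTERLAB_API_KEY."""
    key = api_key or os.environ.get("ALTERLAB_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set ALTERLAB_API_KEY")
    return key
```

Keeping the key in the environment avoids hard-coding secrets in source files, which is why the constructor argument is optional.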

Requirements

  • Python 3.8+
  • httpx >= 0.24.0

License

MIT License - see LICENSE for details.
