Python SDK for the DAWG platform: managed headless browsers via CDP, plus HTTP scraping

dawg-sdk-python

Python SDK for BaaS (Browser as a Service).

Two tools in one SDK:

  • Baas — cloud browser via CDP WebSocket (Playwright, Puppeteer, Selenium)
  • Scraper — fast HTTP scraping with content extraction (no browser needed)

Installation

pip install dawg-sdk-python

Scraper — HTTP scraping

Extract clean content from web pages without a browser. Fast, cheap, TLS-fingerprinted.

from dawg_sdk import Scraper

with Scraper(api_key="your_key") as s:
    # Single page → markdown
    result = s.scrape("https://example.com")
    print(result.content)

    # Crawl a site
    job = s.crawl("https://example.com", max_depth=2, max_pages=20)
    job.wait()
    for page in job.pages:
        print(page.url, len(page.content))

    # Batch scrape
    job = s.batch(["https://a.com", "https://b.com"])
    job.wait()
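
Once a crawl finishes, the pages usually need to be stored somewhere. The sketch below is not part of the SDK; it writes each page to its own Markdown file and relies only on the `url` and `content` attributes used in the example above. The `save_pages` and `slug_for` helpers and the file-naming scheme are assumptions made for illustration.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def slug_for(url: str) -> str:
    """Turn a URL into a filesystem-safe name (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", raw)
    return slug or "index"


def save_pages(pages, out_dir):
    """Write each page's content to <out_dir>/<slug>.md and return the paths.

    `pages` is any iterable of objects exposing `.url` and `.content`,
    such as the items of `job.pages` once `job.wait()` has returned.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        path = out / f"{slug_for(page.url)}.md"
        path.write_text(page.content, encoding="utf-8")
        written.append(path)
    return written
```

After `job.wait()`, calling `save_pages(job.pages, "crawl_output")` would write one file per page. Query strings are ignored by this naming scheme, so two URLs that differ only in their query would overwrite each other.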

Scraper Methods

  • scrape(url, format="markdown", main_content=False, include_links=False) -> ScrapeResult
  • crawl(url, max_depth=2, max_pages=50, concurrency=3) -> ScrapeJob
  • batch(urls, concurrency=5) -> ScrapeJob
  • get_job(job_id) -> ScrapeJob
  • cancel_job(job_id)

Formats: "markdown", "text", "html"

Crawl and batch jobs run asynchronously: call job.wait() to block until the job finishes, or job.refresh() to poll manually.
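
Manual polling is useful when you want to report progress or enforce your own timeout between refreshes. The sketch below is a hedged illustration rather than SDK code: `poll_job` is a hypothetical helper, and because the job's status field is not documented here, the completion check is passed in by the caller instead of being guessed. Only `job.refresh()` comes from the SDK description.

```python
import time


def poll_job(job, is_done, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call job.refresh() until is_done(job) is true or the timeout expires.

    `is_done` is supplied by the caller because the job's status field is
    not documented here; a check such as `lambda j: j.status == "completed"`
    would be a guess, not a confirmed attribute.
    """
    deadline = time.monotonic() + timeout
    while True:
        job.refresh()  # fetch the latest job state
        if is_done(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)
```

The `sleep` parameter exists only so the loop can be exercised in tests without waiting; in normal use the default `time.sleep` is fine.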

Browser — CDP access

Get a cloud browser via WebSocket. Use with any automation framework.

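from playwright.sync_api import sync_playwright
from dawg_sdk import Baas

# Lease a cloud browser and attach Playwright to it over CDP
with Baas(api_key="your_key") as ws_url, sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_url)
    # ... your code ...
# auto-released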

With Proxy

from dawg_sdk import Baas

baas = Baas(api_key="your_key")
ws_url = baas.create(proxy="socks5://user:pass@host:port")
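
Proxy credentials that contain characters such as `@` or `:` make a hand-written proxy URL ambiguous. The sketch below uses only the standard library to assemble a URL in the `scheme://user:pass@host:port` shape shown above, percent-encoding the credentials. `build_proxy_url` is a hypothetical helper, and it assumes the service decodes percent-encoded credentials the way standard URL parsers do.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="socks5"):
    """Assemble scheme://user:pass@host:port, percent-encoding the credentials."""
    auth = ""
    if username is not None:
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"
```

For example, `build_proxy_url("proxy.example.net", 1080, "user", "p@ss:word")` returns `socks5://user:p%40ss%3Aword@proxy.example.net:1080`, which can then be passed as the `proxy` argument to `create()`.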

Async

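from playwright.async_api import async_playwright
from dawg_sdk import AsyncBaas

async def main():  # run with asyncio.run(main())
    async with AsyncBaas(api_key="your_key") as ws_url, async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)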

Browser Methods

  • create(proxy=None, geo=None) -> str — returns ws_url
  • release() — release browser back to pool
  • close() — close HTTP session
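
When `create()` is called directly, for example to pass `proxy` or `geo`, returning the browser to the pool presumably becomes the caller's responsibility. One way to make that hard to forget is a small context manager around `create()` and `release()`. This is a sketch, not SDK code: `leased_browser` is a hypothetical helper that works with any object exposing those two methods.

```python
from contextlib import contextmanager


@contextmanager
def leased_browser(client, **create_kwargs):
    """Yield a ws_url from client.create() and always call client.release().

    `client` is expected to expose create(...) and release(), as listed
    under Browser Methods; keyword arguments are forwarded to create().
    """
    ws_url = client.create(**create_kwargs)
    try:
        yield ws_url
    finally:
        # Runs even if the body raises, so the browser is not left leased.
        client.release()
```

With a `Baas` instance named `baas`, this could be used as `with leased_browser(baas, proxy="socks5://user:pass@host:port") as ws_url:`. The HTTP session is left open, so `close()` still has to be called once the client is no longer needed.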

Exceptions

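from dawg_sdk import Scraper, BaasError, AuthError, RateLimitError

scraper = Scraper(api_key="your_key")
try:
    result = scraper.scrape("https://example.com")
except AuthError:
    print("Invalid API key")
except RateLimitError as e:
    # retry_after is the suggested wait in seconds
    print(f"Rate limit, retry after {e.retry_after}s")
finally:
    scraper.close()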
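
Rather than only printing the rate-limit message, a caller may want to wait and try again. The sketch below is a generic retry wrapper, not something the SDK provides: `call_with_retry` is a hypothetical helper that takes the exception class as an argument and reads the `retry_after` attribute shown above, falling back to a fixed delay when it is missing.

```python
import time


def call_with_retry(func, retry_on, max_attempts=3, default_delay=1.0, sleep=time.sleep):
    """Call func(); on a `retry_on` exception, wait and try again.

    The wait is the exception's `retry_after` attribute (in seconds) when
    present, otherwise `default_delay`. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            delay = getattr(exc, "retry_after", None)
            sleep(default_delay if delay is None else delay)
```

With the SDK installed, a call might look like `call_with_retry(lambda: scraper.scrape("https://example.com"), RateLimitError)`. Other exceptions, such as `AuthError`, are not retried and propagate immediately.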

Examples

Scrape to markdown

from dawg_sdk import Scraper

s = Scraper(api_key="your_key")
result = s.scrape("https://news.ycombinator.com", format="markdown", main_content=True)
print(result.metadata["title"])
print(result.content)
s.close()

Playwright browser

from playwright.sync_api import sync_playwright
from dawg_sdk import Baas

with Baas(api_key="your_key") as ws_url:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        page = browser.contexts[0].pages[0]
        page.goto("https://example.com")
        print(page.title())
        browser.close()

License

MIT
