Python SDK for BaaS (Browser as a Service) - managed headless browsers and HTTP scraping via CDP

dawg-baas

Python SDK for BaaS (Browser as a Service).

Two tools in one SDK:

  • Baas — cloud browser via CDP WebSocket (Playwright, Puppeteer, Selenium)
  • Scraper — fast HTTP scraping with content extraction (no browser needed)

Installation

pip install dawg-baas

Scraper — HTTP scraping

Extract clean content from web pages without a browser. Fast, cheap, TLS-fingerprinted.

from dawg_baas import Scraper

with Scraper(api_key="your_key") as s:
    # Single page → markdown
    result = s.scrape("https://example.com")
    print(result.content)

    # Crawl a site
    job = s.crawl("https://example.com", max_depth=2, max_pages=20)
    job.wait()
    for page in job.pages:
        print(page.url, len(page.content))

    # Batch scrape
    job = s.batch(["https://a.com", "https://b.com"])
    job.wait()

Scraper Methods

  • scrape(url, format="markdown", main_content=False, include_links=False) -> ScrapeResult
  • crawl(url, max_depth=2, max_pages=50, concurrency=3) -> ScrapeJob
  • batch(urls, concurrency=5) -> ScrapeJob
  • get_job(job_id) -> ScrapeJob
  • cancel_job(job_id)

Formats: "markdown", "text", "html"

Jobs (crawl/batch) are async — use job.wait() to block until done, or job.refresh() to poll manually.
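
For manual polling, a loop along these lines may work. This is a sketch, not the SDK's own implementation: it relies only on job.refresh() as described above. The is_done callback is a hypothetical hook, since the attribute that signals completion is not documented here, and FakeJob (with its status attribute) is a stand-in so the snippet runs without an API key.

```python
import time
from typing import Callable, Optional


def poll_job(job, is_done: Callable[[object], bool],
             interval: float = 2.0, timeout: Optional[float] = 60.0):
    """Call job.refresh() until is_done(job) is true or the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_done(job):
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval)
        job.refresh()  # the SDK's manual-poll call
    return job


class FakeJob:
    """Stand-in for ScrapeJob; `status` is a hypothetical attribute."""

    def __init__(self, refreshes_needed: int):
        self._left = refreshes_needed
        self.status = "running"

    def refresh(self):
        # Pretend the job finishes after a fixed number of refreshes.
        self._left -= 1
        if self._left <= 0:
            self.status = "done"


job = poll_job(FakeJob(3), is_done=lambda j: j.status == "done", interval=0.01)
print(job.status)
```

With a real job, pass the object returned by crawl() or batch() and a predicate that matches whatever completion field the SDK exposes.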

Browser — CDP access

Get a cloud browser via WebSocket. Use with any automation framework.

from playwright.sync_api import sync_playwright
from dawg_baas import Baas

with Baas(api_key="your_key") as ws_url, sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_url)
    # ... your code ...
# auto-released

With Proxy

from dawg_baas import Baas

baas = Baas(api_key="your_key")
ws_url = baas.create(proxy="socks5://user:pass@host:port")
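
If the proxy username or password contains characters such as @ or :, the URL likely needs percent-encoding before it is passed to create(proxy=...). A minimal sketch using only the standard library: build_proxy_url is a hypothetical helper, not part of dawg-baas, and whether the service accepts percent-encoded credentials is an assumption.

```python
from urllib.parse import quote


def build_proxy_url(scheme: str, host: str, port: int,
                    user: str = "", password: str = "") -> str:
    """Assemble scheme://user:pass@host:port, percent-encoding the credentials."""
    auth = ""
    if user:
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


proxy = build_proxy_url("socks5", "proxy.example.com", 1080, "user", "p@ss:word")
print(proxy)  # socks5://user:p%40ss%3Aword@proxy.example.com:1080
```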

Async

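import asyncio
from playwright.async_api import async_playwright
from dawg_baas import AsyncBaas

async def main():
    async with AsyncBaas(api_key="your_key") as ws_url, async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(ws_url)
asyncio.run(main())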

Browser Methods

  • create(proxy=None, geo=None) -> str — returns ws_url
  • release() — release browser back to pool
  • close() — close HTTP session
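
When calling create() directly instead of using the with form, pairing it with release() and close() in try/finally keeps a browser from staying checked out after an error. The following is a sketch under assumptions: browser_session is a hypothetical wrapper, the release-then-close order is a guess, and StubBaas only mimics the three methods listed above so the example runs offline.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(client, **create_kwargs):
    """Yield a ws_url from client.create(); always release and close afterwards."""
    ws_url = client.create(**create_kwargs)
    try:
        yield ws_url
    finally:
        try:
            client.release()  # hand the browser back to the pool
        finally:
            client.close()    # then close the HTTP session


class StubBaas:
    """Offline stand-in exposing create/release/close; records the call order."""

    def __init__(self):
        self.calls = []

    def create(self, proxy=None, geo=None):
        self.calls.append("create")
        return "ws://stub.invalid/devtools"

    def release(self):
        self.calls.append("release")

    def close(self):
        self.calls.append("close")


client = StubBaas()
try:
    with browser_session(client, proxy=None) as ws_url:
        raise RuntimeError("automation failed")
except RuntimeError:
    pass

print(client.calls)  # ['create', 'release', 'close']
```

With the real SDK, a Baas instance would take the place of StubBaas.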

Exceptions

from dawg_baas import Scraper, BaasError, AuthError, RateLimitError

scraper = Scraper(api_key="your_key")
try:
    result = scraper.scrape("https://example.com")
except AuthError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limit, retry after {e.retry_after}s")
finally:
    scraper.close()  # close the HTTP session
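
Because RateLimitError carries retry_after, a caller can wait that long and try again. A minimal sketch, assuming retry_after is a number of seconds: call_with_retry is a hypothetical helper, and the RateLimitError class below is a local stand-in for the SDK's exception so the snippet runs offline.

```python
import time


class RateLimitError(Exception):
    """Stand-in for dawg_baas.RateLimitError so the sketch runs offline."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(fn, max_attempts: int = 3, sleep=time.sleep):
    """Call fn(); on RateLimitError wait e.retry_after seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(e.retry_after)


attempts = []


def flaky_scrape():
    """Fails twice with a rate limit, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError(retry_after=0.01)
    return "content"


waits = []
result = call_with_retry(flaky_scrape, sleep=waits.append)
print(result, waits)
```

With the real SDK, import RateLimitError from dawg_baas instead of defining it, and wrap the call as call_with_retry(lambda: scraper.scrape(url)).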

Examples

Scrape to markdown

from dawg_baas import Scraper

s = Scraper(api_key="your_key")
result = s.scrape("https://news.ycombinator.com", format="markdown", main_content=True)
print(result.metadata["title"])
print(result.content)
s.close()

Playwright browser

from playwright.sync_api import sync_playwright
from dawg_baas import Baas

with Baas(api_key="your_key") as ws_url:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        page = browser.contexts[0].pages[0]
        page.goto("https://example.com")
        print(page.title())
        browser.close()

License

MIT

Source Distributions

No source distribution files available for this release.

Built Distribution

dawg_baas-0.2.1-py3-none-any.whl (10.7 kB)

Uploaded Python 3

File details

Details for the file dawg_baas-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: dawg_baas-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for dawg_baas-0.2.1-py3-none-any.whl:

  • SHA256: d5d8c5852937f111542b8dfefbbaa7f809303b6537f312adbfb57eba18c8035f
  • MD5: 8a76430943ccf906a9fcf723e60a66a5
  • BLAKE2b-256: 81973d51d0e37771783b00fcda9eb549ca241d0252f48a907edadf9d0e8a76ac
