Python SDK for BaaS (Browser as a Service) - managed headless browsers and HTTP scraping via CDP
dawg-baas
Python SDK for BaaS (Browser as a Service).
Two tools in one SDK:
- Baas — cloud browser via CDP WebSocket (Playwright, Puppeteer, Selenium)
- Scraper — fast HTTP scraping with content extraction (no browser needed)
Installation
pip install dawg-baas
Scraper — HTTP scraping
Extract clean content from web pages without a browser. Fast, cheap, TLS-fingerprinted.
from dawg_baas import Scraper

with Scraper(api_key="your_key") as s:
    # Single page → markdown
    result = s.scrape("https://example.com")
    print(result.content)

    # Crawl a site
    job = s.crawl("https://example.com", max_depth=2, max_pages=20)
    job.wait()
    for page in job.pages:
        print(page.url, len(page.content))

    # Batch scrape
    job = s.batch(["https://a.com", "https://b.com"])
    job.wait()
Scraper Methods
- scrape(url, format="markdown", main_content=False, include_links=False) → ScrapeResult
- crawl(url, max_depth=2, max_pages=50, concurrency=3) → ScrapeJob
- batch(urls, concurrency=5) → ScrapeJob
- get_job(job_id) → ScrapeJob
- cancel_job(job_id)
Formats: "markdown", "text", "html"
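When scraped content is written to disk, it can help to tie the file extension to the requested format. The following is a minimal sketch: `save_content` and `EXTENSIONS` are hypothetical names, not part of the SDK, and a `SimpleNamespace` stands in for a ScrapeResult so the snippet runs without an API key. Only the `.content` attribute is taken from this README.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

# File extension for each documented format value.
EXTENSIONS = {"markdown": ".md", "text": ".txt", "html": ".html"}


def save_content(content: str, fmt: str, stem: str, out_dir: Path) -> Path:
    """Write content to out_dir/stem with an extension matching fmt."""
    if fmt not in EXTENSIONS:
        raise ValueError(
            f"unknown format {fmt!r}; expected one of {sorted(EXTENSIONS)}"
        )
    path = out_dir / f"{stem}{EXTENSIONS[fmt]}"
    path.write_text(content, encoding="utf-8")
    return path


# Stand-in for a ScrapeResult (assumption: only .content is used here).
result = SimpleNamespace(content="# Example Domain\n")
out_dir = Path(tempfile.mkdtemp())
saved = save_content(result.content, "markdown", "example", out_dir)
print(saved.name)  # example.md
```

With the real SDK, the same `fmt` value would presumably be passed both to `scrape(url, format=fmt)` and to the helper.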
Crawl and batch jobs are asynchronous: call job.wait() to block until the job finishes, or job.refresh() to poll manually.
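For manual polling, a loop along these lines may be useful. It is a sketch under stated assumptions: `poll_job` is a hypothetical helper, and because this README does not document which ScrapeJob attribute signals completion, the caller supplies that check as `is_done`. `FakeJob` is a stand-in so the snippet runs offline; only `refresh()` comes from the SDK description.

```python
import time
from typing import Any, Callable


def poll_job(job: Any, is_done: Callable[[Any], bool],
             interval: float = 2.0, timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call job.refresh() until is_done(job) is true or the timeout elapses.

    Hypothetical helper: the completion check is supplied by the caller.
    """
    deadline = time.monotonic() + timeout
    while not is_done(job):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout}s")
        sleep(interval)
        job.refresh()
    return job


class FakeJob:
    """Stand-in for ScrapeJob that counts how often it was refreshed."""

    def __init__(self) -> None:
        self.refreshes = 0

    def refresh(self) -> None:
        self.refreshes += 1


# Treat the fake job as finished after three refreshes.
job = poll_job(FakeJob(), is_done=lambda j: j.refreshes >= 3, interval=0.0)
print(job.refreshes)  # 3
```

Passing a timeout keeps a stalled job from blocking the caller indefinitely, which is the main reason to poll by hand rather than call job.wait().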
Browser — CDP access
Get a cloud browser via WebSocket. Use with any automation framework.
from dawg_baas import Baas

with Baas(api_key="your_key") as ws_url:
    # `playwright` is an already started sync_playwright() instance
    browser = playwright.chromium.connect_over_cdp(ws_url)
    # ... your code ...
# auto-released
With Proxy
from dawg_baas import Baas

baas = Baas(api_key="your_key")
ws_url = baas.create(proxy="socks5://user:pass@host:port")
Async
from dawg_baas import AsyncBaas

async def main():
    async with AsyncBaas(api_key="your_key") as ws_url:
        # `playwright` is an already started async_playwright() instance
        browser = await playwright.chromium.connect_over_cdp(ws_url)
Browser Methods
- create(proxy=None, geo=None) -> str: returns ws_url
- release(): release browser back to pool
- close(): close HTTP session
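When create() is called directly instead of through a `with` block, the browser is presumably not released automatically, so pairing it with release() and close() in a `finally` clause seems prudent. The sketch below assumes only the three methods listed above; `browser_session` is a hypothetical helper and `FakeBaas` is a stand-in that records calls so the snippet runs without an API key.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def browser_session(client: Any, **create_kwargs: Any) -> Iterator[str]:
    """Yield a ws_url from client.create() and always clean up afterwards.

    Hypothetical helper: relies only on create(), release() and close().
    """
    try:
        ws_url = client.create(**create_kwargs)
        try:
            yield ws_url
        finally:
            client.release()  # hand the browser back to the pool
    finally:
        client.close()  # close the HTTP session even if create() failed


class FakeBaas:
    """Stand-in for Baas that records which methods were called."""

    def __init__(self) -> None:
        self.calls = []

    def create(self, proxy=None, geo=None) -> str:
        self.calls.append(("create", proxy, geo))
        return "ws://example.invalid/devtools/browser/fake"

    def release(self) -> None:
        self.calls.append(("release",))

    def close(self) -> None:
        self.calls.append(("close",))


fake = FakeBaas()
with browser_session(fake, proxy="socks5://user:pass@host:port") as ws_url:
    print(ws_url)
print([call[0] for call in fake.calls])  # ['create', 'release', 'close']
```

With the real client, `browser_session(Baas(api_key="your_key"), proxy=...)` would be the equivalent call, assuming the constructor shown earlier.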
Exceptions
from dawg_baas import Scraper, BaasError, AuthError, RateLimitError

scraper = Scraper(api_key="your_key")
try:
    result = scraper.scrape("https://example.com")
except AuthError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limit, retry after {e.retry_after}s")
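The `retry_after` value can drive a simple retry loop. The following is a minimal sketch, not SDK functionality: `retry_on_rate_limit` is a hypothetical helper, `FakeRateLimitError` stands in for RateLimitError so the snippet runs offline, and the 1-second fallback for a missing `retry_after` is an arbitrary assumption.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on_rate_limit(call: Callable[[], T],
                        rate_limit_exc: Type[Exception],
                        max_attempts: int = 3,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), sleeping for exc.retry_after seconds between attempts.

    Hypothetical helper; waits 1 second if retry_after is missing or zero.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except rate_limit_exc as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(float(getattr(exc, "retry_after", None) or 1.0))
    raise ValueError("max_attempts must be at least 1")


class FakeRateLimitError(Exception):
    """Stand-in for RateLimitError carrying a retry_after attribute."""

    def __init__(self, retry_after: float) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


attempts = []
waits = []


def flaky_scrape() -> str:
    """Fail twice with a rate limit, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimitError(retry_after=2)
    return "ok"


# Record waits instead of sleeping so the demo finishes instantly.
result = retry_on_rate_limit(flaky_scrape, FakeRateLimitError,
                             sleep=waits.append)
print(result, waits)  # ok [2.0, 2.0]
```

With the real SDK this would presumably be called as `retry_on_rate_limit(lambda: scraper.scrape(url), RateLimitError)`.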
Examples
Scrape to markdown
from dawg_baas import Scraper
s = Scraper(api_key="your_key")
result = s.scrape("https://news.ycombinator.com", format="markdown", main_content=True)
print(result.metadata["title"])
print(result.content)
s.close()
Playwright browser
from playwright.sync_api import sync_playwright
from dawg_baas import Baas

with Baas(api_key="your_key") as ws_url:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        page = browser.contexts[0].pages[0]
        page.goto("https://example.com")
        print(page.title())
        browser.close()
License
MIT
File details
Details for the file dawg_baas-0.2.1-py3-none-any.whl.
File metadata
- Download URL: dawg_baas-0.2.1-py3-none-any.whl
- Upload date:
- Size: 10.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5d8c5852937f111542b8dfefbbaa7f809303b6537f312adbfb57eba18c8035f |
| MD5 | 8a76430943ccf906a9fcf723e60a66a5 |
| BLAKE2b-256 | 81973d51d0e37771783b00fcda9eb549ca241d0252f48a907edadf9d0e8a76ac |