Python SDK for the DAWG platform: managed headless browsers via CDP, plus HTTP scraping
dawg-sdk-python
Python SDK for BaaS (Browser as a Service).
Two tools in one SDK:
- Baas — cloud browser via CDP WebSocket (Playwright, Puppeteer, Selenium)
- Scraper — fast HTTP scraping with content extraction (no browser needed)
Installation
pip install dawg-sdk-python
Scraper — HTTP scraping
Extract clean content from web pages without a browser. Fast, cheap, TLS-fingerprinted.
from dawg_sdk import Scraper

with Scraper(api_key="your_key") as s:
    # Single page → markdown
    result = s.scrape("https://example.com")
    print(result.content)

    # Crawl a site
    job = s.crawl("https://example.com", max_depth=2, max_pages=20)
    job.wait()
    for page in job.pages:
        print(page.url, len(page.content))

    # Batch scrape
    job = s.batch(["https://a.com", "https://b.com"])
    job.wait()
Scraper Methods
- scrape(url, format="markdown", main_content=False, include_links=False) → ScrapeResult
- crawl(url, max_depth=2, max_pages=50, concurrency=3) → ScrapeJob
- batch(urls, concurrency=5) → ScrapeJob
- get_job(job_id) → ScrapeJob
- cancel_job(job_id)
Formats: "markdown", "text", "html"
Jobs (crawl/batch) are async — use job.wait() to block until done, or job.refresh() to poll manually.
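Manual polling with job.refresh() could look roughly like the sketch below. The README does not say which attribute of ScrapeJob reports completion, so the check is passed in as a callable. The names `poll_job` and `is_done` are hypothetical helpers, not part of the SDK.

```python
import time


def poll_job(job, is_done, interval=2.0, timeout=300.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call job.refresh() repeatedly until is_done(job) returns True.

    `job` is any object with a refresh() method (for example a ScrapeJob).
    `is_done` is supplied by the caller because the completion field of
    ScrapeJob is not documented here (assumption).
    Raises TimeoutError if the job is still unfinished after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        job.refresh()
        if is_done(job):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout}s")
        sleep(interval)
```

Compared with job.wait(), a loop like this leaves room to log progress or cancel between refreshes.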
Browser — CDP access
Get a cloud browser via WebSocket. Use with any automation framework.
from dawg_sdk import Baas

with Baas(api_key="your_key") as ws_url:
    # `playwright` is an already-started Playwright instance
    browser = playwright.chromium.connect_over_cdp(ws_url)
    # ... your code ...
# auto-released on exit
With Proxy
baas = Baas(api_key="your_key")
ws_url = baas.create(proxy="socks5://user:pass@host:port")
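Proxy credentials that contain characters such as `@` or `:` would break a hand-written proxy URL. The sketch below builds the URL with percent-encoded credentials using the standard library. `build_proxy_url` is a hypothetical helper, and it is an assumption that the service decodes percent-encoded credentials the way standard URL parsers do.

```python
from urllib.parse import quote


def build_proxy_url(scheme, host, port, user=None, password=None):
    """Return scheme://[user[:password]@]host:port with encoded credentials."""
    auth = ""
    if user is not None:
        # safe="" forces reserved characters like '@' and ':' to be encoded
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


# Example: a password containing '@' and ':'
proxy = build_proxy_url("socks5", "proxy.example.com", 1080, "user", "p@ss:word")
```

The result can then be passed as `baas.create(proxy=proxy)`.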
Async
from dawg_sdk import AsyncBaas

async with AsyncBaas(api_key="your_key") as ws_url:
    browser = await playwright.chromium.connect_over_cdp(ws_url)
Browser Methods
- create(proxy=None, geo=None) → str: returns ws_url
- release(): release browser back to pool
- close(): close HTTP session
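When create() is called directly (for example to pass a proxy), the browser is not released automatically. One way to pair create() with release() is a small context manager, sketched below. `leased_browser` is a hypothetical helper, not part of the SDK; it relies only on the create() and release() methods listed above.

```python
from contextlib import contextmanager


@contextmanager
def leased_browser(baas, **create_kwargs):
    """Yield a ws_url from baas.create() and always call baas.release().

    `baas` is any object exposing create(...) and release(), such as a
    Baas instance. Keyword arguments (proxy, geo) are forwarded to create().
    """
    ws_url = baas.create(**create_kwargs)
    try:
        yield ws_url
    finally:
        # Runs on normal exit and when the body raises
        baas.release()
```

Usage would look like `with leased_browser(baas, proxy="socks5://user:pass@host:port") as ws_url: ...`. Calling close() on the client is still left to the caller.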
Exceptions
from dawg_sdk import Scraper, BaasError, AuthError, RateLimitError

scraper = Scraper(api_key="your_key")
try:
    result = scraper.scrape("https://example.com")
except AuthError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limit, retry after {e.retry_after}s")
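Since RateLimitError carries a retry_after value, a caller can wait and retry instead of only printing. The following is a minimal sketch of that idea. `call_with_rate_limit_retry` is a hypothetical helper; the exception class is passed in so the sketch does not depend on the SDK being installed, and treating retry_after as a number of seconds follows the message in the example above.

```python
import time


def call_with_rate_limit_retry(fn, rate_limit_exc, max_attempts=3,
                               default_delay=1.0, sleep=time.sleep):
    """Call fn(); on rate_limit_exc, sleep for exc.retry_after and retry.

    Re-raises the last exception once max_attempts calls have failed.
    Falls back to default_delay if the exception has no retry_after.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except rate_limit_exc as exc:
            if attempt == max_attempts:
                raise
            delay = getattr(exc, "retry_after", None)
            sleep(default_delay if delay is None else float(delay))
```

With the SDK, this could be used as `call_with_rate_limit_retry(lambda: scraper.scrape(url), RateLimitError)`.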
Examples
Scrape to markdown
from dawg_sdk import Scraper
s = Scraper(api_key="your_key")
result = s.scrape("https://news.ycombinator.com", format="markdown", main_content=True)
print(result.metadata["title"])
print(result.content)
s.close()
Playwright browser
from playwright.sync_api import sync_playwright
from dawg_sdk import Baas

with Baas(api_key="your_key") as ws_url:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        page = browser.contexts[0].pages[0]
        page.goto("https://example.com")
        print(page.title())
        browser.close()
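The example indexes `browser.contexts[0].pages[0]` directly, which raises IndexError if the remote browser has no open context or page. Whether a DAWG browser always starts with one is not stated here, so a defensive variant might look like the sketch below. `first_page` is a hypothetical helper; it uses the sync Playwright Browser and BrowserContext members `contexts`, `pages`, `new_context()` and `new_page()`.

```python
def first_page(browser):
    """Return an existing page from a connected browser, or create one.

    `browser` is a Playwright Browser, e.g. the result of
    chromium.connect_over_cdp(ws_url).
    """
    contexts = browser.contexts
    # Reuse the default context if the remote browser exposes one
    context = contexts[0] if contexts else browser.new_context()
    pages = context.pages
    # Reuse the first open page, otherwise open a fresh one
    return pages[0] if pages else context.new_page()
```

In the example above, `page = first_page(browser)` would replace the direct indexing.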
License
MIT