Playwright-shape Python client for the paprika browser fleet
Project description
paprika-client
Playwright-shape async Python client for the paprika browser fleet.
import asyncio
from paprika_client import async_paprika
async def main():
async with async_paprika.connect() as cli:
async with cli.session(initial_url="https://news.ycombinator.com") as page:
await page.locator(".athing .titleline > a").click()
state = await page.state()
print(state["url"], "->", state["title"])
await page.back()
await page.screenshot(path="hn.png")
asyncio.run(main())
What is this?
paprika is a hub-and-spoke browser fleet: a hub orchestrates many workers, each of which runs N pre-warmed Chrome lanes. This client opens a Session against the hub — the hub reserves a Lane on some Worker, attaches CDP, and the client drives the Chrome action-by-action over HTTP.
The action surface mirrors Playwright's so existing browser-automation intuition transfers. You don't get a local Chrome — you get a shared fleet, viewable over noVNC for free.
Install
From the paprika source tree:
pip install -e ./client/python
Or pull just this directory into your own project. The only runtime
dependency is httpx.
API
Connect
cli = async_paprika.connect() # PAPRIKA_HUB env -> localhost:8000
async with cli: # opens an httpx.AsyncClient
...
# Or pass an explicit URL:
cli = async_paprika.connect("http://hub.lan")
Open a Session
A session reserves one Lane (one Chrome process) for the lifetime of
the async with block.
async with cli.session(initial_url="https://example.com") as page:
...
Or manually:
page = await cli.open_session(initial_url="https://example.com")
try:
...
finally:
await page.close()
Actions
| Playwright | paprika |
|---|---|
page.goto(url) |
page.goto(url) |
page.click(selector) |
page.click(selector) |
page.fill(selector, value) |
page.fill(selector, value) |
page.press(key) |
page.press(key) |
page.go_back() |
page.back() (alias go_back) |
page.screenshot(path=...) |
page.screenshot(path=...) |
page.title() |
page.title() |
page.url |
page.url (sync, cached) |
page.locator(sel) |
page.locator(sel) |
page.get_by_role(role) |
page.get_by_role(role) |
page.get_by_text(text) |
page.get_by_text(text) |
paprika-only:
await page.outline() # text view with [@N] ids
await page.visited_urls() # list of canonical URLs the session opened
await page.capture("snap") # persist HTML + PNG + outline server-side
await page.assets() # URLs of images captured on this page (see below)
await page.save_assets("out/") # download those captured images to disk
await page.cookies() # current cookie jar (CDP-shaped), host-filtered
await page.save_cookies_to_host() # promote those cookies into the Host registry
await page.network() # media network log (image/audio/video responses)
await page.close_popups() # close all non-default tabs (after a popup-spawning click)
page.novnc_url # live noVNC URL for this session's lane
Captured assets (page.assets / page.save_assets)
paprika passively records every resource the page loads (images, video,
…) — the same machinery Fetch mode uses. page.assets() flushes the
worker's capture buffer and lists what was captured; it's the one-call
equivalent of wiring up Playwright's page.on("response") yourself.
async with cli.session(
"https://example.com/article",
parent_job_id="my-crawl", # assets need a job dir to land in
) as page:
await page.scroll() # trigger lazy-loaded images
srcs = await page.assets() # -> ["http://hub/jobs/.../img_001.jpg", ...]
rows = await page.assets(details=True) # -> list of dicts w/ size, source_url, mime
await page.save_assets("out/images") # download them to disk
| arg | default | meaning |
|---|---|---|
kind |
"image" |
image / video / audio / other, or None for all |
absolute |
True |
absolute URLs (False -> hub-relative href) |
refresh |
True |
flush newly-captured assets off the worker first |
details |
False |
return full metadata dicts instead of URL strings |
Job-bound session required. Like
get_state/set_state, the passive capture needs a parent job to store assets under. Scripts run by paprika-runner are bound automatically (PAPRIKA_JOB_ID); a rawcli.session(...)must passparent_job_id=..., elsepage.assets()raisesPaprikaActionError.
Multi-tab gotcha.
await sess[-1].close()is NOT how you close a popup tab. Popups spawned by worker-side clicks are not in the SDK's local_pagescache, sosess[-1]resolves tosessitself, andSession.close()is unconditional — it kills the whole session. Useawait sess.close_popups()instead.
Jobs & assets (non-session API)
The session API drives a live browser. The job API is the other half: submit a one-shot fetch / codegen job, poll it, and read its captured assets after the fact. These are methods on the client, not the page:
async with async_paprika.connect() as cli:
# fetch() = submit a fetch-mode job + wait for it to finish
job = await cli.fetch("https://example.com/article", scroll=True)
print(job["status"]) # "completed"
# collect the captured images (assets.json, kind=image)
imgs = await cli.job_images(job["job_id"]) # -> [url, url, ...]
rows = await cli.job_assets(job["job_id"], details=True) # full metadata
await cli.download_job_assets(job["job_id"], "out/images")
| method | endpoint | purpose |
|---|---|---|
cli.create_job(url, **opts) |
POST /jobs |
submit (returns immediately) |
cli.fetch(url, wait=True, **opts) |
POST /jobs (+poll) |
submit fetch + wait |
cli.get_job(id) / cli.list_jobs() |
GET /jobs[/{id}] |
status / listing |
cli.wait_job(id) |
poll GET /jobs/{id} |
block until terminal |
cli.job_result(id) |
GET /jobs/{id}/result |
final JobResult |
cli.cancel_job(id) / cli.delete_job(id) |
POST cancel / DELETE |
lifecycle |
cli.job_assets(id, kind=, details=) |
GET /jobs/{id}/assets.json |
captured assets |
cli.job_images(id) |
(assets, kind="image") |
shorthand |
cli.download_job_assets(id, dir) |
GET /jobs/{id}/assets/* |
save to disk |
**opts flow into JobOptions (mode=, scroll=, scroll_max=,
use_profile=, cookies_from=, goal= for codegen / vision modes, …).
Anything not wrapped here (hosts / profiles / engines / settings / …) is still reachable via
await cli._json("GET", "/hosts")etc. — the same thin HTTP helper every wrapper uses.
Keep the session alive after the script exits
Call page.keepalive() (alias detach) before leaving the async with block. The hub keeps the lane held and the browser open so a
human can take over via noVNC; the session auto-closes after
idle_ttl_s seconds of no operator activity (mouse / key / clipboard
through the noVNC viewer).
async with cli.session(initial_url="https://example.com") as page:
await page.get_by_text("Login").click()
await page.fill("input[name=user]", "alice")
await page.keepalive(idle_ttl_s=120) # default: 120s
# leaving the `with` block here no longer kills the session.
The hub's screenshot grid shows three states:
- RUNNING (red) — a script action or noVNC interaction is in flight
- KEEPALIVE (orange) — alive but nobody is touching it
- IDLE — closed and reaped (lane freed)
Fetch jobs submitted with options.keep_session=true (server-side
crawl + human handoff) use a 60 s default idle TTL; SDK keepalive()
defaults to 120 s and can be set per-call.
Locators
page.locator(selector) returns a Locator. Like Playwright, it's
lazy — the selector is resolved each time you call .click() etc:
btn = page.locator("button.primary")
await btn.click()
get_by_text("…") walks the current page outline to find the first
interactive element with that visible label, then clicks the matching
[data-paprika-id="N"]:
await page.get_by_text("Login").click()
Errors
| Exception | When |
|---|---|
PaprikaError |
HTTP-level error (404, 5xx, network) |
PaprikaActionError |
The hub returned 200 but the action returned NO_MATCH or ERR: ... |
PaprikaActionError.status carries the raw string from browser_ops.
Now implemented (built on page.evaluate)
page.evaluate(js) landed and the DOM surface is built on top of it
(see REFERENCE.ja.md §7): wait_for_selector,
text_content / inner_text / get_attribute / input_value /
count / is_visible / is_checked / …, the JS-dispatched inputs
(hover / dblclick / select_option / check / uncheck / focus),
set_input_files (CDP), cookies() (read), and the locator chain
(first / last / nth / all / count) plus get_by_test_id /
get_by_placeholder / get_by_title / get_by_alt_text.
Still deferred (V1 → V2)
| Feature | Why deferred |
|---|---|
page.wait_for_url, wait_for_load_state |
Navigation-event hooks not wired |
locator.bounding_box(), element screenshot() |
Need a geometry/CDP path |
page.context.add_cookies() |
Use the Host registry + use_profile / cookies_from |
iframe / frame_locator / multiple BrowserContext |
Worker is single-frame, 1 session = 1 context |
Real (trusted) input events / route() interception |
Synthetic events + network() polling are the V1 stand-ins |
Sync API (sync_paprika) |
Async-first; sync wrapper can come later |
⚠️ Note:
page.evaluateruns arbitrary JS in the browser. paprika is LAN-trusted (same model as cookie injection / profile upload), so it's exposed without the RFC-001 §12 auth gate.
See also
- API.ja.md — 全公開関数の API リファレンス(関数ごとの引数・戻り値・例)
- REFERENCE.ja.md — 画像・動画取得の実践リファレンス (単発ページ取得 / Recent jobs からの取得 / 動画 / ログイン必須サイト)
- RFC-001 for the protocol design
- The paprika hub admin UI at
http://hub:8000/for live VNC viewers
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file paprika_client-0.1.0.tar.gz.
File metadata
- Download URL: paprika_client-0.1.0.tar.gz
- Upload date:
- Size: 71.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
76eb251c1c2c7e487a2c182b4eee83faa55c79741f7acdf1d32d46368c15ccba
|
|
| MD5 |
36bff5d3e2d7c5faecd3f638a800d0b5
|
|
| BLAKE2b-256 |
5a7dfdbf706ca821526b6acd4caca0d0151e36b58b1b1c1f8d95e916be35f65d
|
File details
Details for the file paprika_client-0.1.0-py3-none-any.whl.
File metadata
- Download URL: paprika_client-0.1.0-py3-none-any.whl
- Upload date:
- Size: 71.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a00daeb0185d73c2d3ef70a5bfb1f7045b69df75d74f9a1c886df45e6ec403f8
|
|
| MD5 |
90941c520914a0a43b57ec265ed73a97
|
|
| BLAKE2b-256 |
80d3ebf93acb06748defd7b44bcdb9152350bbe8146f5d330498732dde5b704a
|