Client for the Azure-hosted screenshot service, with polling, batch orchestration, and a CLI.
Screenshot Client Batch
Complete screenshot service client and batch orchestration package. Provides both a
low-level API client (ScreenshotServiceClient) and high-level coordinators for
batch processing of screenshot jobs via the Azure-hosted screenshot service.
Built on top of the auto-generated screenshot-client-core, this package adds:
- Polling with configurable timeout/interval (using infra_core.polling)
- Environment-based configuration (from_env())
- Custom exception hierarchy for better error handling
- Batch coordination with chunking, parallel execution, and progress tracking
- CLI tool for processing JSONL files
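The polling behaviour in the list above can be pictured with a small generic loop. This is a minimal sketch, not the implementation in infra_core.polling: `poll_until`, its parameters, and the fake status source are hypothetical names used only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 600.0,
    interval: float = 5.0,
) -> T:
    """Call fetch() every `interval` seconds until is_done(result) or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next attempt would start after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"not done after {timeout} seconds")
        time.sleep(interval)


# Hypothetical status source: reports "running" twice, then "succeeded".
_states = iter(["running", "running", "succeeded"])
final = poll_until(
    fetch=lambda: {"status": next(_states)},
    is_done=lambda r: r["status"] in {"succeeded", "failed"},
    timeout=2.0,
    interval=0.01,
)
print(final["status"])
```

Using a monotonic clock keeps the deadline stable even if the system clock is adjusted while a long batch is running.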
Installation
pip install screenshot-client-batch
Requirements
- Python 3.11+
- Access to a screenshot service endpoint (Azure Function)
Environment Variables
| Variable | Required | Description |
|---|---|---|
| SCREENSHOT_FUNC_BASE_URL | Yes | Base URL of the screenshot service |
| SCREENSHOT_FUNC_API_KEY | No | API key for authenticated endpoints |
| SCREENSHOT_AZURE_STORAGE_CONNECTION_STRING | No | Azure Storage connection for upload/download |
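To make the required/optional split in the table concrete, here is a minimal sketch of reading these variables. It is not how `from_env()` is implemented: `ScreenshotEnv` and `load_screenshot_env` are hypothetical names, and treating an empty value as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScreenshotEnv:
    """Hypothetical container for the three documented variables."""

    base_url: str
    api_key: Optional[str] = None
    storage_connection_string: Optional[str] = None


def load_screenshot_env(environ: Mapping[str, str] = os.environ) -> ScreenshotEnv:
    # The base URL is the only required variable.
    base_url = environ.get("SCREENSHOT_FUNC_BASE_URL", "").strip()
    if not base_url:
        raise ValueError("SCREENSHOT_FUNC_BASE_URL is required")
    # Assumption: empty strings for the optional variables count as unset.
    return ScreenshotEnv(
        base_url=base_url,
        api_key=environ.get("SCREENSHOT_FUNC_API_KEY") or None,
        storage_connection_string=(
            environ.get("SCREENSHOT_AZURE_STORAGE_CONNECTION_STRING") or None
        ),
    )


# Pass a plain dict instead of os.environ to try it without touching the shell.
config = load_screenshot_env({"SCREENSHOT_FUNC_BASE_URL": "https://example.invalid"})
print(config.base_url, config.api_key)
```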
Quick Start: API Client
from screenshot_batch import ScreenshotServiceClient
# Create client from environment variables
client = ScreenshotServiceClient.from_env() # Uses SCREENSHOT_FUNC_BASE_URL, SCREENSHOT_FUNC_API_KEY
# Start a batch
handle = client.start_batch(
    "my-batch",
    {"jobs": [{"job_id": "1", "url": "https://example.com"}]}
)
# Poll until completion
result = client.poll_run(handle.batch_id, handle.job_id, timeout=600)
print(f"Status: {result['status']}")
# Get results
manifest = client.get_result(handle.batch_id, handle.job_id)
Batch Orchestration
For single batch runs with automatic polling and optional artifact download:
from pathlib import Path
from screenshot import ScreenshotOptions, CaptureOptions
from screenshot_batch import ScreenshotBatchCoordinator, ScreenshotJobSpec
coordinator = ScreenshotBatchCoordinator(
    base_url="https://site-screenshot.azurewebsites.net",
    batch_id="marketing-refresh",
    store_dir=Path("data/site-screens"),
    download_results=True,
)
jobs = [
    ScreenshotJobSpec.from_url(
        "https://example.com",
        options=ScreenshotOptions(
            capture=CaptureOptions(enabled=True, max_pages=3, depth=1),
        ),
    ),
]
result = coordinator.run_batch_sync(jobs)
print(result.status, result.job_completed)
Chunked Runs
For large collections (e.g., 4,000 URLs), use ScreenshotMultiRunCoordinator to fan out
multiple API calls while keeping results grouped in per-run folders:
from pathlib import Path
from screenshot_batch import (
    ScreenshotMultiRunCoordinator,
    ScreenshotJobSpec,
)
runner = ScreenshotMultiRunCoordinator(
    base_url="https://site-screenshot.azurewebsites.net",
    batch_id="yc-companies",
    store_dir=Path("data/site-screens"),
    download_results=True,
    upload_results=True,  # Optional: upload to Azure Storage
)
# my_options is a ScreenshotOptions instance defined elsewhere
specs = [
    ScreenshotJobSpec.from_url("https://example.com", options=my_options),
    # ... more specs
]
summaries = runner.run_jobs(specs, chunk_size=4, max_parallel_runs=4)
for summary in summaries:
    print(summary.chunk_index, summary.result.status, summary.result.job_failed)
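The fan-out described above comes down to two steps: split the job list into chunks, then run a bounded number of chunks at once. The following is a sketch of that idea only. Whether the coordinator uses threads, asyncio, or something else is not stated here, and `chunked` and `run_chunk` are hypothetical helpers in which `run_chunk` stands in for one API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def run_chunk(indexed_chunk: Tuple[int, Sequence[str]]) -> dict:
    # Stand-in for one API call: a real run would submit the chunk and poll it.
    index, urls = indexed_chunk
    return {"chunk_index": index, "job_count": len(urls)}


urls = [f"https://example.com/page/{n}" for n in range(10)]
chunks = list(enumerate(chunked(urls, chunk_size=4)))

# max_workers plays the role of max_parallel_runs in this sketch.
with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = list(pool.map(run_chunk, chunks))

print([s["job_count"] for s in summaries])
```

`pool.map` returns results in submission order, so the summaries line up with chunk indexes even when chunks finish out of order.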
CLI
Install the package and run the CLI to process a JSONL file in chunks:
screenshot-client-batch input.jsonl my-batch-id \
    --base-url https://site-screenshot.azurewebsites.net \
    --chunk-size 4 \
    --max-parallel-runs 32 \
    --per-run-concurrency 4 \
    --store-dir data/screenshots \
    --download-results \
    --upload-results \
    --state-path data/state.json \
    --metadata '{"source":"my-project"}' \
    --output data/results.jsonl
CLI Options
| Option | Default | Description |
|---|---|---|
| --base-url | $SCREENSHOT_FUNC_BASE_URL | Screenshot service URL |
| --chunk-size | 4 | Jobs per API call |
| --max-parallel-runs | 32 | Concurrent API calls |
| --per-run-concurrency | 4 | Concurrency per run |
| --store-dir | None | Directory for downloaded artifacts |
| --download-results | False | Download artifacts after completion |
| --upload-results | False | Upload artifacts to Azure Storage |
| --state-path | None | JSON file for resumable state |
| --metadata | None | JSON metadata merged into all jobs |
| --output | {input}.results.jsonl | Output file path |
| --sample N | None | Limit to first N jobs (for testing) |
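As a rough illustration of what --metadata and --sample do to the job list, the sketch below merges a shared JSON object into each job and keeps only the first N jobs. `prepare_jobs` is a hypothetical helper, and the precedence rule (per-job keys win over shared keys) is an assumption; the CLI may resolve conflicts differently.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Optional


def prepare_jobs(
    jobs: Iterable[Dict[str, Any]],
    metadata_json: Optional[str] = None,
    sample: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Apply shared metadata to each job and optionally keep the first N."""
    shared = json.loads(metadata_json) if metadata_json else {}
    if not isinstance(shared, dict):
        raise ValueError("--metadata must be a JSON object")
    selected = islice(jobs, sample) if sample is not None else jobs
    for job in selected:
        # Assumption: keys already on the job override the shared keys.
        merged = {**shared, **job.get("metadata", {})}
        yield {**job, "metadata": merged}


jobs = [
    {"url": "https://example.com", "job_id": "site-1", "metadata": {"category": "tech"}},
    {"url": "https://another.com", "job_id": "site-2"},
    {"url": "https://example.org", "job_id": "site-3"},
]
prepared = list(prepare_jobs(jobs, '{"source":"my-project"}', sample=2))
for job in prepared:
    print(job["job_id"], job["metadata"])
```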
Input Format
The input JSONL file should contain one JSON object per line:
{"url": "https://example.com", "job_id": "site-1", "metadata": {"category": "tech"}}
{"url": "https://another.com", "job_id": "site-2"}
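A file in this shape can be read with the standard library alone. The sketch below assumes that `url` is the only mandatory key and that blank lines may be skipped; neither rule is confirmed by the CLI, and `read_jobs` is a hypothetical helper.

```python
import io
import json
from typing import Any, Iterator, TextIO


def read_jobs(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one job dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # Assumption: blank lines are tolerated.
        record = json.loads(line)
        # Assumption: every job needs at least a "url" key.
        if not isinstance(record, dict) or "url" not in record:
            raise ValueError(
                f"line {line_number}: expected a JSON object with a 'url' key"
            )
        yield record


sample = io.StringIO(
    '{"url": "https://example.com", "job_id": "site-1", "metadata": {"category": "tech"}}\n'
    "\n"
    '{"url": "https://another.com", "job_id": "site-2"}\n'
)
jobs = list(read_jobs(sample))
print(len(jobs), jobs[0]["job_id"])
```

Reporting the line number in the error makes it much easier to find a bad record in a file with thousands of entries.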
Output Format
The output JSONL mirrors the input with added status fields:
{"url": "https://example.com", "job_id": "site-1", "status": "succeeded", "blob_url": "https://..."}
{"url": "https://another.com", "job_id": "site-2", "status": "failed", "errors": ["timeout"]}
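Because each output line carries a `status`, a results file can be summarised in a few lines. This is a sketch based only on the fields shown in the two sample records; `summarize_results` is a hypothetical helper and other fields or status values may exist.

```python
import io
import json
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def summarize_results(stream: TextIO) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count records per status and collect the errors of failed jobs."""
    counts: Counter = Counter()
    failures: Dict[str, List[str]] = {}
    for line in stream:
        if not line.strip():
            continue
        record = json.loads(line)
        status = record.get("status", "unknown")
        counts[status] += 1
        if status == "failed":
            # Fall back to the URL when a record has no job_id.
            key = record.get("job_id", record.get("url", "?"))
            failures[key] = record.get("errors", [])
    return counts, failures


results = io.StringIO(
    '{"url": "https://example.com", "job_id": "site-1", "status": "succeeded", "blob_url": "https://..."}\n'
    '{"url": "https://another.com", "job_id": "site-2", "status": "failed", "errors": ["timeout"]}\n'
)
counts, failures = summarize_results(results)
print(dict(counts), failures)
```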
Exception Handling
The package provides a custom exception hierarchy:
from screenshot_batch import (
    ScreenshotServiceError,  # Base exception
    ScreenshotConfigError,  # Configuration/environment errors
    ScreenshotAPIError,  # API communication errors
    ScreenshotTimeoutError,  # Polling timeout
)
try:
    result = client.poll_run(batch_id, job_id, timeout=60)
except ScreenshotTimeoutError:
    print("Batch did not complete in time")
except ScreenshotAPIError as e:
    print(f"API error: {e}")
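A split hierarchy like this makes it possible to retry only the errors that are likely to be transient. The sketch below defines local stand-in classes with the same names so that it runs without the package installed. It assumes the three specific errors each derive directly from ScreenshotServiceError, which the comments above suggest but do not confirm, and `call_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins for the package's exceptions (assumed shape of the hierarchy).
class ScreenshotServiceError(Exception): ...
class ScreenshotConfigError(ScreenshotServiceError): ...
class ScreenshotAPIError(ScreenshotServiceError): ...
class ScreenshotTimeoutError(ScreenshotServiceError): ...


def call_with_retries(operation: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Retry on API errors only; timeouts and config errors propagate at once."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ScreenshotAPIError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last API error.
            time.sleep(delay)
    raise AssertionError("unreachable")


attempts_seen = []


def flaky_operation() -> str:
    # Fails twice with an API error, then succeeds.
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ScreenshotAPIError("temporary failure")
    return "ok"


outcome = call_with_retries(flaky_operation, attempts=3)
print(outcome, len(attempts_seen))
```

Catching ScreenshotServiceError at the outermost level would then act as a catch-all for anything the package raises, under the same assumption about the hierarchy.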
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Type checking
mypy src/screenshot_batch
# Linting
ruff check src/ tests/
ruff format src/ tests/
License
MIT
File details
Details for the file screenshot_client_batch-0.1.0.tar.gz.
File metadata
- Download URL: screenshot_client_batch-0.1.0.tar.gz
- Upload date:
- Size: 38.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e43ed4fb02275b57b9437a92e26d73738a00ced928e5d77eb3d0a56fb544928 |
| MD5 | 1166a1ecc2804f198e2ee329bbdfa99e |
| BLAKE2b-256 | 25018187a4bdaaf49fcdbc9e2736f136ca18ed7c9216d6f7b704a49672794e35 |
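A published digest is only useful if it is compared against the file that was actually downloaded. The sketch below shows one way to do that with `hashlib`. The helper names are hypothetical, and the demonstration hashes a small temporary file so it can run anywhere; for a real check, pass the path of the downloaded archive and the SHA256 value from the table above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    # Hex digests are case-insensitive, so normalise before comparing.
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration on a throwaway file with known contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    ok = verify_sha256(sample, hashlib.sha256(b"hello").hexdigest())
    bad = verify_sha256(sample, "0" * 64)

print(ok, bad)
```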
Provenance
The following attestation bundles were made for screenshot_client_batch-0.1.0.tar.gz:
Publisher: publish-batch-client.yml on pj-ms/screenshot-service
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: screenshot_client_batch-0.1.0.tar.gz
  - Subject digest: 4e43ed4fb02275b57b9437a92e26d73738a00ced928e5d77eb3d0a56fb544928
- Sigstore transparency entry: 720369709
- Sigstore integration time:
- Permalink: pj-ms/screenshot-service@80023e36ab9c706aab8fdc31b93403d9a2002dc1
- Branch / Tag: refs/tags/screenshot-client-batch-v0.1.0
- Owner: https://github.com/pj-ms
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-batch-client.yml@80023e36ab9c706aab8fdc31b93403d9a2002dc1
- Trigger Event: push
File details
Details for the file screenshot_client_batch-0.1.0-py3-none-any.whl.
File metadata
- Download URL: screenshot_client_batch-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 057e070567ef43346a77e4cced2bb226964809dba1e5a7767fcbd7cfeb911cc4 |
| MD5 | bdc6ceb0a211b982795d4a5c5babf50c |
| BLAKE2b-256 | 71443141f7ba26e097fb8eb886d48973ff03f2e95c8518be5cf51474afc18f0e |
Provenance
The following attestation bundles were made for screenshot_client_batch-0.1.0-py3-none-any.whl:
Publisher: publish-batch-client.yml on pj-ms/screenshot-service
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: screenshot_client_batch-0.1.0-py3-none-any.whl
  - Subject digest: 057e070567ef43346a77e4cced2bb226964809dba1e5a7767fcbd7cfeb911cc4
- Sigstore transparency entry: 720369748
- Sigstore integration time:
- Permalink: pj-ms/screenshot-service@80023e36ab9c706aab8fdc31b93403d9a2002dc1
- Branch / Tag: refs/tags/screenshot-client-batch-v0.1.0
- Owner: https://github.com/pj-ms
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-batch-client.yml@80023e36ab9c706aab8fdc31b93403d9a2002dc1
- Trigger Event: push