Screenshot service client with polling, batch orchestration, and CLI for Azure screenshot service.

Screenshot Client Batch

Complete screenshot service client and batch orchestration package. Provides both a low-level API client (ScreenshotServiceClient) and high-level coordinators for batch processing of screenshot jobs via the Azure-hosted screenshot service.

Built on top of the auto-generated screenshot-client-core, this package adds:

  • Polling with configurable timeout/interval (using infra_core.polling)
  • Environment-based configuration (from_env())
  • Custom exception hierarchy for better error handling
  • Batch coordination with chunking, parallel execution, and progress tracking
  • CLI tool for processing JSONL files

Installation

pip install screenshot-client-batch

Requirements

  • Python 3.11+
  • Access to a screenshot service endpoint (Azure Function)

Environment Variables

Variable                                     Required  Description
SCREENSHOT_FUNC_BASE_URL                     Yes       Base URL of the screenshot service
SCREENSHOT_FUNC_API_KEY                      No        API key for authenticated endpoints
SCREENSHOT_AZURE_STORAGE_CONNECTION_STRING   No        Azure Storage connection string for upload/download
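Before calling from_env(), export the variables in your shell. The values below are placeholders, not real endpoints or keys:

```shell
# Required: base URL of the deployed screenshot service
export SCREENSHOT_FUNC_BASE_URL="https://site-screenshot.azurewebsites.net"

# Optional: only needed for authenticated endpoints
export SCREENSHOT_FUNC_API_KEY="your-api-key"

# Optional: only needed when uploading/downloading artifacts via Azure Storage
export SCREENSHOT_AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=..."
```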

Quick Start: API Client

from screenshot_batch import ScreenshotServiceClient

# Create client from environment variables
client = ScreenshotServiceClient.from_env()  # Uses SCREENSHOT_FUNC_BASE_URL, SCREENSHOT_FUNC_API_KEY

# Start a batch
handle = client.start_batch(
    "my-batch",
    {"jobs": [{"job_id": "1", "url": "https://example.com"}]}
)

# Poll until completion
result = client.poll_run(handle.batch_id, handle.job_id, timeout=600)
print(f"Status: {result['status']}")

# Get results
manifest = client.get_result(handle.batch_id, handle.job_id)
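The exact shape of the returned manifest is defined by the service; as a rough, hedged illustration only, assuming each job record carries a status field like the output JSONL shown later, a caller might tally outcomes like this:

```python
# Hypothetical manifest shape: a list of per-job records with a "status" field.
# This structure is an assumption for illustration, not the documented schema.
manifest = [
    {"job_id": "1", "url": "https://example.com", "status": "succeeded"},
    {"job_id": "2", "url": "https://another.com", "status": "failed"},
]

# Tally job outcomes by status.
counts: dict[str, int] = {}
for record in manifest:
    counts[record["status"]] = counts.get(record["status"], 0) + 1

print(counts)  # {'succeeded': 1, 'failed': 1}
```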

Batch Orchestration

For single batch runs with automatic polling and optional artifact download:

from pathlib import Path

from screenshot import ScreenshotOptions, CaptureOptions
from screenshot_batch import ScreenshotBatchCoordinator, ScreenshotJobSpec

coordinator = ScreenshotBatchCoordinator(
    base_url="https://site-screenshot.azurewebsites.net",
    batch_id="marketing-refresh",
    store_dir=Path("data/site-screens"),
    download_results=True,
)

jobs = [
    ScreenshotJobSpec.from_url(
        "https://example.com",
        options=ScreenshotOptions(
            capture=CaptureOptions(enabled=True, max_pages=3, depth=1),
        ),
    ),
]

result = coordinator.run_batch_sync(jobs)
print(result.status, result.job_completed)

Chunked Runs

For large collections (e.g., 4,000 URLs), use ScreenshotMultiRunCoordinator to fan out multiple API calls while keeping results grouped in per-run folders:

from pathlib import Path

from screenshot_batch import (
    ScreenshotMultiRunCoordinator,
    ScreenshotJobSpec,
)

runner = ScreenshotMultiRunCoordinator(
    base_url="https://site-screenshot.azurewebsites.net",
    batch_id="yc-companies",
    store_dir=Path("data/site-screens"),
    download_results=True,
    upload_results=True,  # Optional: upload to Azure Storage
)

specs = [
    ScreenshotJobSpec.from_url("https://example.com", options=my_options),
    # ... more specs
]

summaries = runner.run_jobs(specs, chunk_size=4, max_parallel_runs=4)
for summary in summaries:
    print(summary.chunk_index, summary.result.status, summary.result.job_failed)
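The chunking itself is plain list slicing. A standalone sketch of how chunk_size divides a job list (independent of the package; chunked is our own helper, not part of the API):

```python
def chunked(items: list, size: int) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]

# 10 jobs with chunk_size=4 fan out into runs of 4, 4, and 2 jobs.
jobs = [f"job-{n}" for n in range(10)]
runs = chunked(jobs, 4)
print([len(run) for run in runs])  # [4, 4, 2]
```

With max_parallel_runs=4, up to four of these runs would be in flight at once.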

CLI

Install the package and run the CLI to process a JSONL file in chunks:

screenshot-client-batch input.jsonl my-batch-id \
  --base-url https://site-screenshot.azurewebsites.net \
  --chunk-size 4 \
  --max-parallel-runs 32 \
  --per-run-concurrency 4 \
  --store-dir data/screenshots \
  --download-results \
  --upload-results \
  --state-path data/state.json \
  --metadata '{"source":"my-project"}' \
  --output data/results.jsonl

CLI Options

Option                   Default                     Description
--base-url               $SCREENSHOT_FUNC_BASE_URL   Screenshot service URL
--chunk-size             4                           Jobs per API call
--max-parallel-runs      32                          Concurrent API calls
--per-run-concurrency    4                           Concurrency within each run
--store-dir              None                        Directory for downloaded artifacts
--download-results       False                       Download artifacts after completion
--upload-results         False                       Upload artifacts to Azure Storage
--state-path             None                        JSON file for resumable state
--metadata               None                        JSON metadata merged into all jobs
--output                 {input}.results.jsonl       Output file path
--sample N               None                        Limit to first N jobs (for testing)

Input Format

The input JSONL file should contain one JSON object per line:

{"url": "https://example.com", "job_id": "site-1", "metadata": {"category": "tech"}}
{"url": "https://another.com", "job_id": "site-2"}

Output Format

The output JSONL mirrors the input with added status fields:

{"url": "https://example.com", "job_id": "site-1", "status": "succeeded", "blob_url": "https://..."}
{"url": "https://another.com", "job_id": "site-2", "status": "failed", "errors": ["timeout"]}

Exception Handling

The package provides a custom exception hierarchy:

from screenshot_batch import (
    ScreenshotServiceError,  # Base exception
    ScreenshotConfigError,   # Configuration/environment errors
    ScreenshotAPIError,      # API communication errors
    ScreenshotTimeoutError,  # Polling timeout
)

try:
    result = client.poll_run(batch_id, job_id, timeout=60)
except ScreenshotTimeoutError:
    print("Batch did not complete in time")
except ScreenshotAPIError as e:
    print(f"API error: {e}")

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Type checking
mypy src/screenshot_batch

# Linting
ruff check src/ tests/
ruff format src/ tests/

License

MIT
