lobstrio-sdk

Python SDK for the Lobstr.io API — web scraping automation platform


  • Sync + async clients with the same API surface
  • Typed dataclass models for all responses
  • Lazy auto-pagination
  • Automatic token resolution from CLI config or environment

Installation

pip install lobstrio-sdk

Requires Python 3.10+. The only runtime dependency is httpx.

Authentication

The client resolves your API token in this order:

  1. Explicit: LobstrClient(token="your-token")
  2. Environment variable: LOBSTR_TOKEN
  3. CLI config file: ~/.config/lobstr/config.toml (the same file used by the lobstr CLI)

If you already have the CLI set up, the SDK works with no configuration:

from lobstrio import LobstrClient

client = LobstrClient()  # token auto-resolved
user = client.me()
print(user.email)

Quick Start

from lobstrio import LobstrClient

with LobstrClient() as client:
    # Account info
    user = client.me()
    balance = client.balance()
    print(f"{user.email}: {balance.credits} credits")

    # List crawlers
    for crawler in client.crawlers.list():
        print(f"{crawler.name} ({crawler.id})")

    # Create a squid, add tasks, run it
    squid = client.squids.create("google-maps-scraper", name="My Scrape")
    client.squids.update(squid.id, params={"language": "English (United States)"})
    client.tasks.add(squid=squid.id, tasks=[{"url": "https://maps.google.com/..."}])
    run = client.runs.start(squid=squid.id)

    # Wait for completion with progress callback
    final = client.runs.wait(run.id, callback=lambda s: print(f"{s.percent_done}%"))

    # Download results
    client.runs.download(run.id, "results.csv")

Resources

All API operations are organized under resource namespaces on the client.

User
user = client.me()           # User profile
balance = client.balance()   # Account balance (credits, subscription)
Crawlers — browse scraper templates
crawlers = client.crawlers.list()              # All crawlers
crawler = client.crawlers.get("crawler-id")    # Single crawler
params = client.crawlers.params("crawler-id")  # Parameter schema
attrs = client.crawlers.attributes("crawler-id")  # Result columns

Models: Crawler, CrawlerAttribute, CrawlerParams

Squids — manage scraper instances
# List & iterate
squids = client.squids.list(limit=50, page=1)
for squid in client.squids.iter():     # auto-paginate all squids
    print(squid.name)

# CRUD
squid = client.squids.create("crawler-id", name="My Project")
squid = client.squids.get("squid-id")
squid = client.squids.update("squid-id", name="Renamed", concurrency=2,
                              params={"language": "English"})
client.squids.empty("squid-id")        # remove all tasks
client.squids.delete("squid-id")

Model: Squid (id, name, crawler, is_active, concurrency, params, created_at, ...)

Tasks — manage input URLs and keywords
# List & iterate
tasks = client.tasks.list(squid="squid-id")
for task in client.tasks.iter(squid="squid-id"):
    print(task.id)

# Add tasks
result = client.tasks.add(
    squid="squid-id",
    tasks=[
        {"url": "https://maps.google.com/maps?cid=123"},
        {"url": "https://maps.google.com/maps?cid=456"},
    ],
)
print(f"Added {len(result.tasks)}, {result.duplicated_count} duplicates")

# Upload from CSV/TSV
resp = client.tasks.upload(squid="squid-id", file="tasks.csv")
status = client.tasks.upload_status(resp["id"])

# Get & delete
task = client.tasks.get("task-hash")
client.tasks.delete("task-hash")

Models: Task, TaskStatus, AddTasksResult, UploadStatus, UploadMeta
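
If you would rather build the tasks.add payload yourself than use tasks.upload, the list of {"url": ...} dicts is easy to assemble from a CSV with the standard library. The tasks_from_csv helper below is hypothetical, not part of the SDK:

```python
import csv
import io


def tasks_from_csv(text: str, url_column: str = "url") -> list[dict]:
    """Build a tasks payload (list of {"url": ...} dicts) from CSV text,
    skipping rows where the URL column is empty."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"url": row[url_column]} for row in reader if row.get(url_column)]
```

The resulting list can be passed directly as the tasks argument of client.tasks.add.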

Runs — start, monitor, and download
# Start a run
run = client.runs.start(squid="squid-id")

# List runs
runs = client.runs.list(squid="squid-id")
for run in client.runs.iter(squid="squid-id"):
    print(run.id, run.status)

# Monitor
run = client.runs.get("run-id")
stats = client.runs.stats("run-id")
print(f"{stats.percent_done}% done, {stats.total_results} results")

# Wait for completion (blocking, with optional progress callback)
final = client.runs.wait("run-id", poll_interval=5.0,
                          callback=lambda s: print(f"{s.percent_done}%"))

# Download results
url = client.runs.download_url("run-id")   # signed S3 URL
client.runs.download("run-id", "output.csv")  # download to file

# Abort
client.runs.abort("run-id")

# Tasks within a run
tasks = client.runs.tasks("run-id")

Models: Run, RunStats

Results — fetch scraped data
results = client.results.list(squid="squid-id", page_size=100)

# Auto-paginate all results
for row in client.results.iter(squid="squid-id"):
    print(row)  # dict

Results are returned as plain dict objects (the schema depends on the crawler).

Accounts — manage connected platform accounts
accounts = client.accounts.list()
account = client.accounts.get("account-id")
types = client.accounts.types()     # available account types

# Sync account with cookies
resp = client.accounts.sync(type="google", cookies={"SID": "...", "HSID": "..."})
status = client.accounts.sync_status(resp["id"])

# Update limits
client.accounts.update("account-id", type="google", params={"daily_limit": 100})

# Delete
client.accounts.delete("account-id")

Models: Account, AccountType, SyncStatus

Delivery — configure result delivery
# Email
client.delivery.email("squid-id", email="you@example.com")
client.delivery.test_email(email="you@example.com")

# Google Sheets
client.delivery.google_sheet("squid-id", url="https://docs.google.com/spreadsheets/d/...", append=True)
client.delivery.test_google_sheet(url="https://docs.google.com/spreadsheets/d/...")

# Webhook
client.delivery.webhook("squid-id", url="https://your-server.com/hook",
                         on_done=True, on_error=True)
client.delivery.test_webhook(url="https://your-server.com/hook")

# S3
client.delivery.s3("squid-id", bucket="my-bucket", target_path="scrapes/",
                    aws_access_key="...", aws_secret_key="...")
client.delivery.test_s3(bucket="my-bucket")

# SFTP
client.delivery.sftp("squid-id", host="ftp.example.com", username="user",
                      password="pass", directory="/uploads")
client.delivery.test_sftp(host="ftp.example.com", username="user",
                           password="pass", directory="/uploads")

Models: EmailDelivery, GoogleSheetDelivery, S3Delivery, WebhookDelivery, SFTPDelivery

Async Client

The async client mirrors the sync API exactly, using async/await:

from lobstrio import AsyncLobstrClient

async def main():
    async with AsyncLobstrClient() as client:
        user = await client.me()
        print(user.email)

        crawlers = await client.crawlers.list()
        for c in crawlers:
            print(c.name)

        squid = await client.squids.create("crawler-id", name="Async Scrape")
        await client.tasks.add(squid=squid.id, tasks=[{"url": "..."}])
        run = await client.runs.start(squid=squid.id)
        final = await client.runs.wait(run.id)
        await client.runs.download(run.id, "results.csv")

All resource methods (client.crawlers.*, client.squids.*, etc.) work identically — just add await.
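
A coroutine like main() above must be driven by an event loop; in a standalone script the usual entry point is asyncio.run. A self-contained sketch with a trivial stand-in coroutine:

```python
import asyncio


async def main() -> str:
    # Stand-in for the async workflow shown above.
    await asyncio.sleep(0)
    return "done"


# asyncio.run creates an event loop, runs main() to completion, then closes it.
result = asyncio.run(main())
```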

Pagination

Resources that return lists support two patterns:

Single page (.list()) — returns one page of results:

page1 = client.squids.list(limit=10, page=1)
page2 = client.squids.list(limit=10, page=2)

Auto-pagination (.iter()) — lazy iterator that fetches pages on demand:

for squid in client.squids.iter(limit=50):
    print(squid.name)  # automatically fetches next pages

The async client provides AsyncPageIterator for use with async for.
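
The lazy iterator pattern behind .iter() can be sketched in a few lines. This is an illustrative generator, not the SDK's implementation; iter_pages is a hypothetical name, and it assumes a short page signals the final page:

```python
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int, int], list],
               limit: int = 50) -> Iterator:
    """Yield items one at a time, fetching the next page only when the
    current one is exhausted."""
    page = 1
    while True:
        items = fetch_page(page, limit)
        yield from items
        if len(items) < limit:  # short page => no more pages
            return
        page += 1
```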

Error Handling

All API errors raise typed exceptions with status_code, message, and body:

from lobstrio import LobstrClient, AuthError, NotFoundError, RateLimitError, APIError

try:
    client.squids.get("nonexistent")
except NotFoundError as e:
    print(f"Not found: {e.message}")
except AuthError:
    print("Invalid or expired token")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except APIError as e:
    print(f"API error [{e.status_code}]: {e.message}")

| Exception | HTTP status | When |
| --- | --- | --- |
| AuthError | 401 | Invalid or missing token |
| NotFoundError | 404 | Resource doesn't exist |
| RateLimitError | 429 | Too many requests (has retry_after) |
| APIError | 4xx/5xx | All other API errors |

CLI vs SDK

| | CLI (pip install lobstrio) | SDK (pip install lobstrio-sdk) |
| --- | --- | --- |
| Use case | Terminal workflows, quick scrapes, cron jobs | Scripts, pipelines, applications |
| Interface | Shell commands | Python API |
| Output | Rich tables, progress bars, CSV files | Typed dataclass models |
| Async | No | Yes (AsyncLobstrClient) |
| Pagination | Manual (--page, --limit) | Auto (client.squids.iter()) |

For terminal workflows, see lobstrio — the companion CLI tool.

FAQ

Where do I get an API token?

Go to Dashboard → API to find your token. It is pre-generated and always available there.

Do I need the CLI installed for the SDK to work?

No. The SDK is standalone. However, if you have the CLI configured (lobstr config set-token), the SDK will automatically pick up the token from ~/.config/lobstr/config.toml — no code changes needed.

How do I handle rate limiting?

Catch RateLimitError and use its retry_after attribute:

from lobstrio import RateLimitError
import time

try:
    results = client.results.list(squid="squid-id")
except RateLimitError as e:
    time.sleep(float(e.retry_after or 5))
    results = client.results.list(squid="squid-id")
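
For repeated calls, the retry-once pattern above generalizes to a small helper. A sketch — with_retries is a hypothetical function, written exception-agnostic so it can be tested without the SDK; in real code you would pass lobstrio's RateLimitError as exc_type:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], exc_type: type[Exception],
                 max_attempts: int = 3, default_wait: float = 5.0) -> T:
    """Retry `call` when it raises `exc_type`, sleeping for the exception's
    retry_after (falling back to default_wait) between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except exc_type as e:
            if attempt == max_attempts:
                raise
            time.sleep(float(getattr(e, "retry_after", None) or default_wait))
    raise RuntimeError("unreachable")
```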
Can I use the async client with Django/FastAPI?

Yes. Use AsyncLobstrClient in any async context:

from lobstrio import AsyncLobstrClient

async def scrape_view(request):
    async with AsyncLobstrClient() as client:
        results = await client.results.list(squid="squid-id")
        return results

Development

# Clone and install
git clone https://github.com/lobstrio/lobstrio-sdk.git
cd lobstrio-sdk
pip install -e ".[dev]"

# Run unit tests
pytest

# Run live tests (requires API token)
pytest tests/test_live.py -v

# Lint & type check
ruff check src/ tests/
mypy src/lobstrio/

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, code style, and versioning guidelines.

Changelog

See CHANGELOG.md for release history.

License

Apache 2.0
