# lobstrio-sdk

Python SDK for the Lobstr.io API — web scraping automation platform.
- Sync + async clients with the same API surface
- Typed dataclass models for all responses
- Lazy auto-pagination
- Automatic token resolution from CLI config or environment
## Installation

```bash
pip install lobstrio-sdk
```

Requires Python 3.10+. The only runtime dependency is `httpx`.
## Authentication

The client resolves your API token in this order:

1. Explicit — `LobstrClient(token="your-token")`
2. Environment variable — `LOBSTR_TOKEN`
3. CLI config file — `~/.config/lobstr/config.toml` (the same file used by the `lobstr` CLI)
If you already have the CLI set up, the SDK works with no configuration:

```python
from lobstrio import LobstrClient

client = LobstrClient()  # token auto-resolved
user = client.me()
print(user.email)
## Quick Start

```python
from lobstrio import LobstrClient

with LobstrClient() as client:
    # Account info
    user = client.me()
    balance = client.balance()
    print(f"{user.email} — {balance.credits} credits")

    # List crawlers
    for crawler in client.crawlers.list():
        print(f"{crawler.name} ({crawler.id})")

    # Create a squid, add tasks, run it
    squid = client.squids.create("google-maps-scraper", name="My Scrape")
    client.squids.update(squid.id, params={"language": "English (United States)"})
    client.tasks.add(squid=squid.id, tasks=[{"url": "https://maps.google.com/..."}])
    run = client.runs.start(squid=squid.id)

    # Wait for completion with progress callback
    final = client.runs.wait(run.id, callback=lambda s: print(f"{s.percent_done}%"))

    # Download results
    client.runs.download(run.id, "results.csv")
```
## Resources

All API operations are organized under resource namespaces on the client.
### User

```python
user = client.me()           # User profile
balance = client.balance()   # Account balance (credits, subscription)
```
### Crawlers — browse scraper templates

```python
crawlers = client.crawlers.list()                 # All crawlers
crawler = client.crawlers.get("crawler-id")       # Single crawler
params = client.crawlers.params("crawler-id")     # Parameter schema
attrs = client.crawlers.attributes("crawler-id")  # Result columns
```

Models: `Crawler`, `CrawlerAttribute`, `CrawlerParams`
### Squids — manage scraper instances

```python
# List & iterate
squids = client.squids.list(limit=50, page=1)
for squid in client.squids.iter():  # auto-paginate all squids
    print(squid.name)

# CRUD
squid = client.squids.create("crawler-id", name="My Project")
squid = client.squids.get("squid-id")
squid = client.squids.update("squid-id", name="Renamed", concurrency=2,
                             params={"language": "English"})
client.squids.empty("squid-id")   # remove all tasks
client.squids.delete("squid-id")
```

Model: `Squid` (id, name, crawler, is_active, concurrency, params, created_at, ...)
### Tasks — manage input URLs and keywords

```python
# List & iterate
tasks = client.tasks.list(squid="squid-id")
for task in client.tasks.iter(squid="squid-id"):
    print(task.id)

# Add tasks
result = client.tasks.add(
    squid="squid-id",
    tasks=[
        {"url": "https://maps.google.com/maps?cid=123"},
        {"url": "https://maps.google.com/maps?cid=456"},
    ],
)
print(f"Added {len(result.tasks)}, {result.duplicated_count} duplicates")

# Upload from CSV/TSV
resp = client.tasks.upload(squid="squid-id", file="tasks.csv")
status = client.tasks.upload_status(resp["id"])

# Get & delete
task = client.tasks.get("task-hash")
client.tasks.delete("task-hash")
```

Models: `Task`, `TaskStatus`, `AddTasksResult`, `UploadStatus`, `UploadMeta`
### Runs — start, monitor, and download

```python
# Start a run
run = client.runs.start(squid="squid-id")

# List runs
runs = client.runs.list(squid="squid-id")
for run in client.runs.iter(squid="squid-id"):
    print(run.id, run.status)

# Monitor
run = client.runs.get("run-id")
stats = client.runs.stats("run-id")
print(f"{stats.percent_done}% done, {stats.total_results} results")

# Wait for completion (blocking, with optional progress callback)
final = client.runs.wait("run-id", poll_interval=5.0,
                         callback=lambda s: print(f"{s.percent_done}%"))

# Download results
url = client.runs.download_url("run-id")      # signed S3 URL
client.runs.download("run-id", "output.csv")  # download to file

# Abort
client.runs.abort("run-id")

# Tasks within a run
tasks = client.runs.tasks("run-id")
```

Models: `Run`, `RunStats`
### Results — fetch scraped data

```python
results = client.results.list(squid="squid-id", page_size=100)

# Auto-paginate all results
for row in client.results.iter(squid="squid-id"):
    print(row)  # dict
```

Results are returned as plain `dict` objects (the schema depends on the crawler).
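Because each row is a plain dict, results plug straight into the standard library's `csv` module. A minimal sketch — the field names here are hypothetical, since the real columns depend on the crawler:

```python
import csv
import io

# Hypothetical rows, shaped like what client.results.iter() yields.
rows = [
    {"name": "Cafe A", "rating": 4.5},
    {"name": "Cafe B", "rating": 4.1},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```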
### Accounts — manage connected platform accounts

```python
accounts = client.accounts.list()
account = client.accounts.get("account-id")
types = client.accounts.types()  # available account types

# Sync account with cookies
resp = client.accounts.sync(type="google", cookies={"SID": "...", "HSID": "..."})
status = client.accounts.sync_status(resp["id"])

# Update limits
client.accounts.update("account-id", type="google", params={"daily_limit": 100})

# Delete
client.accounts.delete("account-id")
```

Models: `Account`, `AccountType`, `SyncStatus`
### Delivery — configure result delivery

```python
# Email
client.delivery.email("squid-id", email="you@example.com")
client.delivery.test_email(email="you@example.com")

# Google Sheets
client.delivery.google_sheet("squid-id", url="https://docs.google.com/spreadsheets/d/...", append=True)
client.delivery.test_google_sheet(url="https://docs.google.com/spreadsheets/d/...")

# Webhook
client.delivery.webhook("squid-id", url="https://your-server.com/hook",
                        on_done=True, on_error=True)
client.delivery.test_webhook(url="https://your-server.com/hook")

# S3
client.delivery.s3("squid-id", bucket="my-bucket", target_path="scrapes/",
                   aws_access_key="...", aws_secret_key="...")
client.delivery.test_s3(bucket="my-bucket")

# SFTP
client.delivery.sftp("squid-id", host="ftp.example.com", username="user",
                     password="pass", directory="/uploads")
client.delivery.test_sftp(host="ftp.example.com", username="user",
                          password="pass", directory="/uploads")
```

Models: `EmailDelivery`, `GoogleSheetDelivery`, `S3Delivery`, `WebhookDelivery`, `SFTPDelivery`
## Async Client

The async client mirrors the sync API exactly, using async/await:

```python
from lobstrio import AsyncLobstrClient

async def main():
    async with AsyncLobstrClient() as client:
        user = await client.me()
        print(user.email)

        crawlers = await client.crawlers.list()
        for c in crawlers:
            print(c.name)

        squid = await client.squids.create("crawler-id", name="Async Scrape")
        await client.tasks.add(squid=squid.id, tasks=[{"url": "..."}])
        run = await client.runs.start(squid=squid.id)
        final = await client.runs.wait(run.id)
        await client.runs.download(run.id, "results.csv")
```

All resource methods (`client.crawlers.*`, `client.squids.*`, etc.) work identically — just add `await`.
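One practical payoff of the async client is running independent calls concurrently with `asyncio.gather`. The coroutines below are hypothetical stand-ins (they just sleep) so the sketch runs without the SDK; with the real client you would gather calls like `client.runs.wait(run_id)` instead:

```python
import asyncio

async def wait_for_run(run_id: str, delay: float) -> str:
    # Stand-in for `await client.runs.wait(run_id)` — simulates polling.
    await asyncio.sleep(delay)
    return f"{run_id}: done"

async def main() -> list[str]:
    # With the real client, e.g.:
    #   await asyncio.gather(*(client.runs.wait(r.id) for r in runs))
    return await asyncio.gather(
        wait_for_run("run-a", 0.02),
        wait_for_run("run-b", 0.01),
    )

results = asyncio.run(main())
print(results)  # gather preserves call order: ['run-a: done', 'run-b: done']
```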
## Pagination

Resources that return lists support two patterns:

**Single page (`.list()`)** — returns one page of results:

```python
page1 = client.squids.list(limit=10, page=1)
page2 = client.squids.list(limit=10, page=2)
```

**Auto-pagination (`.iter()`)** — a lazy iterator that fetches pages on demand:

```python
for squid in client.squids.iter(limit=50):
    print(squid.name)  # automatically fetches next pages
```

The async client provides `AsyncPageIterator` for use with `async for`.
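Conceptually, `.iter()` behaves like a generator that keeps requesting pages until one comes back short. A generic sketch with a stubbed fetch function — this is an illustration of the pattern, not the SDK's actual implementation:

```python
from typing import Callable, Iterator

def iter_pages(fetch: Callable[[int, int], list], limit: int = 2) -> Iterator:
    """Yield items lazily, fetching the next page only when needed."""
    page = 1
    while True:
        items = fetch(page, limit)
        yield from items
        if len(items) < limit:  # a short page means we've reached the end
            return
        page += 1

# Stubbed "API": five items served in pages of two.
DATA = ["a", "b", "c", "d", "e"]
fetch = lambda page, limit: DATA[(page - 1) * limit : page * limit]

print(list(iter_pages(fetch)))  # ['a', 'b', 'c', 'd', 'e']
```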
## Error Handling

All API errors raise typed exceptions with `status_code`, `message`, and `body`:

```python
from lobstrio import LobstrClient, AuthError, NotFoundError, RateLimitError, APIError

try:
    client.squids.get("nonexistent")
except NotFoundError as e:
    print(f"Not found: {e.message}")
except AuthError:
    print("Invalid or expired token")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except APIError as e:
    print(f"API error [{e.status_code}]: {e.message}")
```

| Exception | HTTP status | When |
|---|---|---|
| `AuthError` | 401 | Invalid or missing token |
| `NotFoundError` | 404 | Resource doesn't exist |
| `RateLimitError` | 429 | Too many requests (has `retry_after`) |
| `APIError` | 4xx/5xx | All other API errors |
## CLI vs SDK

| | CLI (`pip install lobstrio`) | SDK (`pip install lobstrio-sdk`) |
|---|---|---|
| Use case | Terminal workflows, quick scrapes, cron jobs | Scripts, pipelines, applications |
| Interface | Shell commands | Python API |
| Output | Rich tables, progress bars, CSV files | Typed dataclass models |
| Async | No | Yes (`AsyncLobstrClient`) |
| Pagination | Manual (`--page`, `--limit`) | Auto (`client.squids.iter()`) |

For terminal workflows, see `lobstrio` — the companion CLI tool.
## FAQ

### Where do I get an API token?

Go to Dashboard → API to find your token. It's always available there, pre-generated.

### Do I need the CLI installed for the SDK to work?

No. The SDK is standalone. However, if you have the CLI configured (`lobstr config set-token`), the SDK will automatically pick up the token from `~/.config/lobstr/config.toml` — no code changes needed.
### How do I handle rate limiting?

Catch `RateLimitError` and use its `retry_after` attribute:

```python
import time

from lobstrio import RateLimitError

try:
    results = client.results.list(squid="squid-id")
except RateLimitError as e:
    time.sleep(float(e.retry_after or 5))
    results = client.results.list(squid="squid-id")
```
### Can I use the async client with Django/FastAPI?

Yes. Use `AsyncLobstrClient` in any async context:

```python
from lobstrio import AsyncLobstrClient

async def scrape_view(request):
    async with AsyncLobstrClient() as client:
        results = await client.results.list(squid="squid-id")
        return results
```
## Development

```bash
# Clone and install
git clone https://github.com/lobstrio/lobstrio-sdk.git
cd lobstrio-sdk
pip install -e ".[dev]"

# Run unit tests
pytest

# Run live tests (requires API token)
pytest tests/test_live.py -v

# Lint & type check
ruff check src/ tests/
mypy src/lobstrio/
```
## Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, code style, and versioning guidelines.

## Changelog

See CHANGELOG.md for release history.

## License