Python SDK for the OpenData API
tryopendata
Python client for the OpenData API. Query, search, and analyze open datasets with a few lines of code.
Installation
pip install tryopendata
With DataFrame support:
pip install "tryopendata[pandas]" # pandas
pip install "tryopendata[polars]" # polars
pip install "tryopendata[all]" # both (quotes keep shells such as zsh from expanding the brackets)
Quick Start
from opendata_sdk import OpenData
client = OpenData() # or OpenData(api_key="od_live_...")
df = client.load("bls/cpi-u").to_pandas()
That's it. load() handles pagination automatically and returns a DataResult that you can convert to a pandas or polars DataFrame.
Querying Data
Three entry points, each for a different use case:
| Method | Returns | When |
|---|---|---|
| client.load(path) | All rows, paginated automatically | Exploration, notebooks, analysis |
| client.query(path) | First page only (up to 10,000 rows) | Sampling, dashboards |
| client.datasets.query_iter(path) | One page at a time | Very large datasets |
from opendata_sdk import OpenData
client = OpenData()
# Load everything -- the most common case
df = client.load("bls/cpi-u").to_pandas()
# Sample the first page for quick exploration
result = client.query("bls/cpi-u")
print(result.rows[:5]) # list of dicts, no extra deps needed
df = result.to_pandas()
df = result.to_polars()
# Process a huge dataset without loading it all into memory
for page in client.datasets.query_iter("bls/cpi-u", page_size=5000):
    df = page.to_pandas()
    process_batch(df)  # process_batch is your own function
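To make the difference between collecting everything and iterating page by page concrete, here is a minimal sketch that does not call the API. `fake_pages`, `load_all`, and `streaming_mean` are hypothetical helpers written for this illustration, and the rows are made-up plain dicts; the SDK's own internals may work differently.

```python
from typing import Dict, Iterator, List

Row = Dict[str, float]


def fake_pages(rows: List[Row], page_size: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for a paged endpoint: yields rows in fixed-size chunks."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def load_all(pages: Iterator[List[Row]]) -> List[Row]:
    """Roughly what a load()-style call does: collect every page into one list."""
    collected: List[Row] = []
    for page in pages:
        collected.extend(page)
    return collected


def streaming_mean(pages: Iterator[List[Row]], column: str) -> float:
    """Roughly what a query_iter()-style loop allows: only one page in memory at a time."""
    total, count = 0.0, 0
    for page in pages:
        total += sum(row[column] for row in page)
        count += len(page)
    return total / count if count else float("nan")


rows = [{"value": float(v)} for v in range(1, 11)]  # made-up values 1..10
all_rows = load_all(fake_pages(rows, page_size=4))
mean_value = streaming_mean(fake_pages(rows, page_size=4), "value")
```

The streaming version keeps a running total instead of the rows themselves, which is the pattern that makes page-at-a-time iteration worthwhile for very large datasets.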
Schema Inspection
# Inspect schema without converting to pandas
result = client.query("bls/cpi-u")
print(result.dtypes) # {'year': 'BIGINT', 'value': 'DOUBLE', ...}
print(result.schema) # [{'name': 'year', 'type': 'BIGINT'}, ...]
print(result.warnings) # API warnings (e.g. truncated results)
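Judging only from the example output in the comments above, `dtypes` looks like a name-to-type view of the same information that `schema` holds as a list. The sketch below reproduces that relationship with plain Python data; the set of numeric type names is an assumption made for illustration, not a list taken from the API.

```python
# Schema in the list-of-dicts shape shown in the comment above.
schema = [
    {"name": "year", "type": "BIGINT"},
    {"name": "value", "type": "DOUBLE"},
]

# Name -> type mapping, the shape shown for result.dtypes.
dtypes = {column["name"]: column["type"] for column in schema}

# Assumed set of numeric type names, used here only to pick columns to aggregate.
NUMERIC_TYPES = {"BIGINT", "DOUBLE"}
numeric_columns = [name for name, sql_type in dtypes.items() if sql_type in NUMERIC_TYPES]
```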
Query Builder
The Query class gives you a fluent interface for filtering, sorting, and aggregating. Each method returns a new immutable query, so you can safely reuse and extend them.
from opendata_sdk import OpenData, Query
client = OpenData()
q = (
    Query()
    .filter("year", "gte", 2020)
    .filter("state", "eq", "California")
    .sort("year", desc=True)
    .fields("year", "state", "population")
    .limit(100)
)
df = client.load("census/population", q).to_pandas()
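The practical benefit of an immutable builder is that a shared base query can be extended in several directions without the branches affecting each other. The sketch below shows the pattern with a frozen dataclass; `SketchQuery` is a hypothetical stand-in written for this illustration and is not the SDK's Query class.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical immutable builder: every method returns a modified copy."""
    filters: Tuple[Tuple[str, str, Any], ...] = ()
    row_limit: Optional[int] = None

    def filter(self, column: str, op: str, value: Any) -> "SketchQuery":
        # Build a new object with one more filter; self is left untouched.
        return replace(self, filters=self.filters + ((column, op, value),))

    def limit(self, n: int) -> "SketchQuery":
        return replace(self, row_limit=n)


base = SketchQuery().filter("year", "gte", 2020)
california = base.filter("state", "eq", "California")
texas = base.filter("state", "eq", "Texas").limit(100)
```

Here `base` still has a single filter and no limit after both branches are built, so it can be reused safely.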
Filter Operators
q = Query()  # each call below returns a new Query; q itself is unchanged
q.eq("state", "Texas") # state = 'Texas'
q.ne("status", "draft") # status != 'draft'
q.gt("year", 2020) # year > 2020
q.gte("year", 2020) # year >= 2020
q.lt("value", 1000) # value < 1000
q.lte("value", 1000) # value <= 1000
q.like("name", "%energy%") # name LIKE '%energy%'
q.isin("state", ["CA", "TX", "NY"]) # state IN ('CA', 'TX', 'NY')
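As a reading aid for the SQL equivalents in the comments above, the sketch below evaluates the same operators against a plain Python row. It is an illustration of the semantics only: filtering happens on the server, `matches` and `like` are hypothetical helpers, and whether the real LIKE is case-sensitive is not stated in this document (this sketch is case-sensitive).

```python
import re
from typing import Any, Callable, Dict


def like(value: str, pattern: str) -> bool:
    """SQL LIKE semantics: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "gte": lambda a, b: a >= b,
    "lt": lambda a, b: a < b,
    "lte": lambda a, b: a <= b,
    "like": like,
    "isin": lambda a, b: a in b,
}


def matches(row: Dict[str, Any], column: str, op: str, value: Any) -> bool:
    """Apply one named operator to one column of a row."""
    return OPERATORS[op](row[column], value)


row = {"state": "TX", "year": 2020, "name": "renewable energy report"}
keep = matches(row, "state", "isin", ["CA", "TX", "NY"]) and matches(row, "name", "like", "%energy%")
```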
Aggregation
q = (
    Query()
    .group_by("state")
    .aggregate("sum:population", "count:id")
    .sort("sum_population", desc=True)
)
df = client.load("census/population", q).to_pandas()
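For readers who want to see what this query computes, here is a rough pure-Python equivalent over a few made-up rows. The `sum_population` column name comes from the sort call above; `count_id` is an assumed name formed by analogy, and the sample data is invented.

```python
from collections import defaultdict

rows = [  # made-up sample rows, not real census data
    {"id": 1, "state": "CA", "population": 100},
    {"id": 2, "state": "TX", "population": 80},
    {"id": 3, "state": "CA", "population": 50},
]

# One accumulator per group, keyed by the group_by column.
totals = defaultdict(lambda: {"sum_population": 0, "count_id": 0})
for row in rows:
    group = totals[row["state"]]
    group["sum_population"] += row["population"]
    if row["id"] is not None:  # a SQL-style count skips nulls
        group["count_id"] += 1

# Equivalent of .sort("sum_population", desc=True).
result = sorted(
    ({"state": state, **agg} for state, agg in totals.items()),
    key=lambda r: r["sum_population"],
    reverse=True,
)
```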
Views
Some datasets have pre-configured views with computed columns or joins:
q = Query().view("annual").filter("year", "gte", 2015)
df = client.load("bls/cpi-u", q).to_pandas()
Auto-Pagination
client.datasets.list() returns a PaginatedList that fetches pages transparently as you iterate:
# Iterates through all datasets, fetching pages as needed
for dataset in client.datasets.list():
    print(f"{dataset.path}: {dataset.rows} rows")
# Or get one page at a time
paginated = client.datasets.list(limit=50)
for page in paginated.pages():
    print(f"Page with {len(page)} datasets")
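The lazy behaviour described above can be sketched with a small class that requests a page only when iteration reaches it. `SketchPaginatedList` and `fetch_page` are hypothetical names for this illustration, the offset/limit paging scheme is an assumption, and the catalog is made up; the SDK's PaginatedList may be implemented differently.

```python
from typing import Callable, Iterator, List


class SketchPaginatedList:
    """Hypothetical lazy list: pages are fetched only when iteration reaches them."""

    def __init__(self, fetch_page: Callable[[int, int], List[str]], limit: int) -> None:
        self._fetch_page = fetch_page  # (offset, limit) -> items
        self._limit = limit

    def pages(self) -> Iterator[List[str]]:
        offset = 0
        while True:
            page = self._fetch_page(offset, self._limit)
            if not page:
                return
            yield page
            if len(page) < self._limit:  # a short page means there is nothing after it
                return
            offset += len(page)

    def __iter__(self) -> Iterator[str]:
        for page in self.pages():
            yield from page


CATALOG = [f"provider/dataset-{i}" for i in range(7)]  # made-up paths
calls: List[int] = []


def fetch_page(offset: int, limit: int) -> List[str]:
    calls.append(offset)  # record each simulated request
    return CATALOG[offset:offset + limit]


paginated = SketchPaginatedList(fetch_page, limit=3)
first_two = [item for _, item in zip(range(2), paginated)]
```

Taking only the first two items triggers a single simulated request, which is the point of fetching pages on demand.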
Search
results = client.search("inflation consumer prices")
for hit in results.results:
    print(f"{hit.path} (relevance: {hit.relevance})")
# Filter by provider or category
results = client.search("population", provider="census", limit=5)
# Autocomplete suggestions
suggestions = client.suggest("infla")
Dataset Metadata
# Full metadata for a specific dataset (shortcut: client.meta())
meta = client.meta("census/population")
print(meta.description)
print(meta.rows)
print(meta.available_views)
# Column statistics
columns = client.datasets.columns("census/population")
for col in columns:
    print(f"{col.name}: {col.type} ({col.distinct_count} distinct)")
# Available views
views = client.datasets.views("bls/cpi-u")
for view in views:
    print(f"{view.name}: {view.description}")
Providers and Categories
# List all data providers
for provider in client.providers.list():
    print(f"{provider.slug}: {provider.dataset_count} datasets")
# List categories
for category in client.categories.list():
    print(f"{category.slug}: {category.name}")
Async Usage
The async client mirrors the sync API exactly. Import from opendata_sdk.aio:
import asyncio
from opendata_sdk.aio import OpenData
async def main():
    async with OpenData() as client:
        result = await client.datasets.query("census/population")
        df = result.to_pandas()
        # Auto-pagination works with async for
        async for dataset in await client.datasets.list():
            print(dataset.name)
asyncio.run(main())
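A common reason to reach for an async client is to issue several requests concurrently. The sketch below shows that pattern with `asyncio.gather`; `fake_query` is a hypothetical coroutine that sleeps instead of performing network I/O, so the example runs without the SDK or an API key.

```python
import asyncio
from typing import Dict, List


async def fake_query(path: str, delay: float = 0.01) -> Dict[str, object]:
    """Hypothetical stand-in for an awaitable query call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return {"path": path, "rows": []}


async def fetch_many(paths: List[str]) -> List[Dict[str, object]]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_query(p) for p in paths))


results = asyncio.run(fetch_many(["census/population", "bls/cpi-u"]))
```

With a real client, the same shape would apply by awaiting the client's query coroutine inside `fetch_many` in place of `fake_query`.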
Error Handling
All errors inherit from OpenDataError, so you can catch broadly or target specific cases:
from opendata_sdk import OpenData, NotFoundError, RateLimitError, OpenDataError
client = OpenData()
try:
    result = client.datasets.query("nonexistent/dataset")
except NotFoundError:
    print("Dataset doesn't exist")
except RateLimitError as e:
    print(f"Throttled. Retry after {e.retry_after}s")
except OpenDataError as e:
    print(f"API error {e.status_code}: {e.message}")
The full exception hierarchy:
| Exception | Status Code | When |
|---|---|---|
| AuthenticationError | 401 | Missing or invalid API key |
| ForbiddenError | 403 | Insufficient permissions |
| NotFoundError | 404 | Dataset or resource not found |
| InvalidRequestError | 400, 422 | Bad request parameters |
| RateLimitError | 429 | Too many requests |
| APIError | 5xx | Server error |
| APIConnectionError | - | Network or timeout failure |
The SDK automatically retries on 429 and 5xx responses with exponential backoff (configurable via max_retries).
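For intuition about what retrying with exponential backoff means, here is a minimal generic sketch. The base delay of 0.5 s, the 8 s cap, and the absence of jitter are assumptions chosen for illustration; this document does not state the SDK's actual delays, and `call_with_retries` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Callable, List


def is_retryable(status_code: int) -> bool:
    """429 and 5xx are retried, as stated above; other statuses are returned as-is."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delay before each retry: base * 2**attempt, capped (base and cap are assumed values)."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_retries(
    send: Callable[[], int],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or the retries run out."""
    status = send()
    for delay in backoff_delays(max_retries):
        if not is_retryable(status):
            break
        sleep(delay)
        status = send()
    return status


# Simulated sequence of responses; sleep is replaced so the example runs instantly.
responses = iter([503, 429, 200])
slept: List[float] = []
final_status = call_with_retries(lambda: next(responses), sleep=slept.append)
```

Doubling the delay after each failure gives a struggling server progressively more time to recover, and the cap keeps the longest wait bounded.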
Configuration
client = OpenData(
    api_key="od_live_...", # or set OPENDATA_API_KEY env var
    base_url="https://...", # default: https://api.tryopendata.ai/v1
    timeout=60.0, # request timeout in seconds (default: 30)
    max_retries=5, # retry attempts on 429/5xx (default: 3)
)
The API key can also be set via the OPENDATA_API_KEY environment variable. If both are provided, the constructor argument takes priority.
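The precedence rule above can be written down as a few lines of Python. `resolve_api_key` is a hypothetical helper that illustrates the rule and is not part of the SDK; the key values are made up.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Explicit argument wins; otherwise fall back to OPENDATA_API_KEY; otherwise None."""
    if api_key is not None:
        return api_key
    return env.get("OPENDATA_API_KEY")


fake_env = {"OPENDATA_API_KEY": "od_live_from_env"}  # made-up value
from_argument = resolve_api_key("od_live_from_argument", env=fake_env)
from_environment = resolve_api_key(env=fake_env)
anonymous = resolve_api_key(env={})
```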
Context Manager
Both sync and async clients support context managers to ensure the HTTP connection is properly closed:
# Sync
with OpenData() as client:
    result = client.datasets.query("census/population")
# Async (inside a coroutine, with OpenData imported from opendata_sdk.aio)
async with OpenData() as client:
    result = await client.datasets.query("census/population")
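What a context manager guarantees is that cleanup runs even when the body raises. The sketch below demonstrates this with a hypothetical `SketchClient` that only tracks a `closed` flag; it stands in for a client holding an HTTP connection and is not the SDK's implementation.

```python
class SketchClient:
    """Hypothetical client that records whether close() has been called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "SketchClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions raised inside the with block


error_seen = False
try:
    with SketchClient() as sketch_client:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    error_seen = True
```

After the block exits, `sketch_client.closed` is true even though the body raised, and the exception still propagates to the caller.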
License
Apache-2.0