Call REST APIs from a Polars DataFrame, one row at a time, using native Polars expressions. Sync and async GET, POST, PUT, PATCH, DELETE, and HEAD requests with per-row URLs, params, and bodies.
polars-api
polars-api registers an .api namespace on Polars expressions so you can issue HTTP GET and POST requests for every row of a DataFrame — synchronously or asynchronously — and pipe the responses straight back into your data pipeline.
```python
import polars as pl
import polars_api  # noqa: F401 - registers the `.api` namespace

(
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts/1"]})
    .with_columns(
        pl.col("url").api.get().str.json_decode().alias("response")
    )
)
```
- Repository: https://github.com/diegoglozano/polars-api
- Documentation: https://diegoglozano.github.io/polars-api/
- PyPI: https://pypi.org/project/polars-api/
Why polars-api?
- Expression-native: works inside `with_columns`, `select`, and any other Polars expression context. No `for` loops, no manual `apply`.
- Sync and async out of the box: async variants (`aget`/`apost`) fan out requests with `asyncio.gather` for high-throughput enrichment.
- Per-row URLs, params, and bodies: every argument can be a Polars expression, so you can build them from other columns.
- Built on httpx (sync) and aiohttp (async): async fan-out uses aiohttp for roughly 10× the throughput of the sync path at high concurrency.
- Small surface area: one method per HTTP verb (`get`, `post`, `put`, `patch`, `delete`, `head`), each with an async twin prefixed by `a` (`aget`, `apost`, and so on), named after the verbs you already know.
Common use cases:
- Enrich a DataFrame with data from a REST API (geocoding, currency rates, user profiles…).
- Score rows against an ML inference endpoint.
- Hit an internal microservice in batch from a notebook or ETL job.
- Quickly prototype API-driven data pipelines without writing async boilerplate.
Installation
```shell
# uv
uv add polars-api

# pip
pip install polars-api

# poetry
poetry add polars-api
```
Requires Python 3.9+ and Polars 1.0+.
Quickstart
1. GET request per row
```python
import polars as pl
import polars_api  # noqa: F401

df = (
    pl.DataFrame({"id": [1, 2, 3]})
    .with_columns(
        ("https://jsonplaceholder.typicode.com/posts/" + pl.col("id").cast(pl.Utf8)).alias("url")
    )
    .with_columns(
        pl.col("url").api.get().str.json_decode().alias("response")
    )
)
```
2. GET with query parameters
Pass any Polars expression that resolves to a struct as params:
```python
df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 3})
    .with_columns(
        pl.struct(userId=pl.Series([1, 2, 3])).alias("params"),
    )
    .with_columns(
        pl.col("url").api.get(params=pl.col("params")).str.json_decode().alias("response")
    )
)
```
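Conceptually, each row's `params` struct becomes that row's query string. The minimal standard-library sketch below shows the mapping, assuming the struct value arrives as a flat Python dict; `with_query` is a hypothetical helper, not part of polars-api (whose sync path hands the actual encoding to httpx).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_query(url: str, params: dict) -> str:
    """Hypothetical helper: append one row's params to its URL as a query string."""
    scheme, netloc, path, existing, fragment = urlsplit(url)
    query = urlencode(params)
    # Keep any query string that is already part of the URL.
    combined = "&".join(part for part in (existing, query) if part)
    return urlunsplit((scheme, netloc, path, combined, fragment))


# Each dict plays the role of one row's `params` struct value.
rows = [{"userId": 1}, {"userId": 2}, {"userId": 3}]
urls = [with_query("https://jsonplaceholder.typicode.com/posts", p) for p in rows]
print(urls[0])  # https://jsonplaceholder.typicode.com/posts?userId=1
```

Because the three rows carry different `userId` values, the same base URL yields three different requests.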
3. POST with a JSON body
```python
df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 3})
    .with_columns(
        pl.struct(
            title=pl.lit("foo"),
            body=pl.lit("bar"),
            userId=pl.Series([1, 2, 3]),
        ).alias("body"),
    )
    .with_columns(
        pl.col("url").api.post(body=pl.col("body")).str.json_decode().alias("response")
    )
)
```
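The argument list distinguishes `body` (a struct serialized as a JSON body) from `data` (a struct serialized as `application/x-www-form-urlencoded`). The sketch below shows, with the standard library only, how one row's struct value would look under each encoding; `encode_row` is a hypothetical helper for illustration, not polars-api's serializer.

```python
import json
from urllib.parse import urlencode


def encode_row(row: dict, as_form: bool = False) -> tuple:
    """Hypothetical helper: return (Content-Type, payload bytes) for one row's struct value."""
    if as_form:
        # `data=`-style: percent-encoded key=value pairs joined with '&'.
        return "application/x-www-form-urlencoded", urlencode(row).encode()
    # `body=`-style: the struct serialized as a JSON object.
    return "application/json", json.dumps(row).encode()


row = {"title": "foo", "body": "bar", "userId": 1}
print(encode_row(row))
print(encode_row(row, as_form=True))
```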
4. Async requests for throughput
aget and apost fan out with aiohttp and asyncio.gather, so requests run concurrently per batch. In benchmarks against a local server, this is roughly an order of magnitude faster than the sync path at high concurrency:
```python
df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 100})
    .with_columns(
        pl.col("url").api.aget().str.json_decode().alias("response")
    )
)
```
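The concurrency model described above (one coroutine per row, gathered with `asyncio.gather`, optionally capped by an `asyncio.Semaphore` as `max_concurrency` does) can be sketched with the standard library alone. `fake_fetch` is a hypothetical stand-in for an aiohttp request that just sleeps for 0.1 s; it is not part of polars-api. With 100 rows, that is about 10 s sequentially, about 0.1 s fully concurrent, and about 1 s with at most 10 requests in flight.

```python
import asyncio
import time
from typing import List, Optional


async def fake_fetch(url: str) -> str:
    # Stand-in for an HTTP request: ~0.1 s of simulated network latency.
    await asyncio.sleep(0.1)
    return f"response for {url}"


async def fetch_all(urls: List[str], max_concurrency: Optional[int] = None) -> List[str]:
    # One coroutine per row; the optional semaphore caps how many are in flight.
    sem = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def one(url: str) -> str:
        if sem is None:
            return await fake_fetch(url)
        async with sem:
            return await fake_fetch(url)

    # gather preserves input order, so results line up with the DataFrame rows.
    return list(await asyncio.gather(*(one(u) for u in urls)))


urls = [f"https://example.com/item/{i}" for i in range(100)]

start = time.perf_counter()
results = asyncio.run(fetch_all(urls))
unbounded = time.perf_counter() - start

start = time.perf_counter()
capped = asyncio.run(fetch_all(urls, max_concurrency=10))
bounded = time.perf_counter() - start

print(f"unbounded: {unbounded:.2f}s, max_concurrency=10: {bounded:.2f}s")
```

The cap trades some latency for politeness toward the server: the capped run takes roughly ten waves of 0.1 s instead of one.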
5. Timeouts
Every method accepts a timeout (in seconds). Sync verbs forward it to httpx; async verbs wrap it in aiohttp.ClientTimeout(total=...).
```python
pl.col("url").api.get(timeout=5.0)
pl.col("url").api.apost(body=pl.col("body"), timeout=10.0)
```
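A `total` timeout in aiohttp bounds the whole operation (connection, sending, and reading the response) rather than any single step. The standard-library analogue is `asyncio.wait_for`; the sketch below uses a hypothetical `slow_request` coroutine to show those semantics only, not how polars-api wires up `aiohttp.ClientTimeout`. Mapping a timeout to `None` is likewise an illustrative choice; in polars-api the outcome depends on `on_error` / `with_metadata`.

```python
import asyncio
from typing import Optional


async def slow_request(delay: float) -> str:
    # Simulated request that takes `delay` seconds end to end.
    await asyncio.sleep(delay)
    return "ok"


async def request_with_total_timeout(delay: float, timeout: float) -> Optional[str]:
    try:
        # One deadline covers the entire call, like ClientTimeout(total=...).
        return await asyncio.wait_for(slow_request(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Illustrative choice: surface a timeout as a missing value.
        return None


print(asyncio.run(request_with_total_timeout(0.05, timeout=1.0)))  # ok
print(asyncio.run(request_with_total_timeout(1.0, timeout=0.05)))  # None
```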
API reference
All methods live under the .api namespace on any Polars expression that resolves to a URL string.
| Method | HTTP verb | Mode |
|---|---|---|
| `get` / `aget` | GET | sync / async |
| `post` / `apost` | POST | sync / async |
| `put` / `aput` | PUT | sync / async |
| `patch` / `apatch` | PATCH | sync / async |
| `delete` / `adelete` | DELETE | sync / async |
| `head` / `ahead` | HEAD | sync / async |
Arguments (`params` and `body` may be passed positionally; all other arguments are keyword-only):
- `params`: Polars expression yielding a struct of query-string parameters per row.
- `body` (POST/PUT/PATCH only): Polars expression yielding a struct serialized as a JSON body per row.
- `data`: Polars expression yielding a struct serialized as `application/x-www-form-urlencoded`.
- `headers`: Polars expression yielding a struct of headers per row (e.g. tenant IDs, custom auth).
- `client`: preconfigured `httpx.Client` (sync verbs) or `aiohttp.ClientSession` (async verbs) for connection reuse, custom timeouts, base URLs, cookies, etc.
- `timeout`: request timeout in seconds.
- `retries` (int, default 0): retry on connection errors, timeouts, 5xx, and 429.
- `backoff` (float, default 0.0): exponential backoff base (seconds). 429s respect `Retry-After` if present.
- `max_concurrency` (async only): cap on in-flight requests via `asyncio.Semaphore`.
- `cache` (bool, default False): memoize identical `(method, url, params, body, data, headers)` tuples within a batch.
- `with_metadata` (bool, default False): return a struct `{body, status, elapsed_ms, error}` per row instead of just the body.
- `with_response_headers` (bool, default False): when `with_metadata=True`, also include `response_headers: List[Struct{name, value}]` on the struct.
- `on_error` (`"null"` | `"raise"` | `"return"`): when `with_metadata=False`, what to do on non-2xx responses and network errors.
- `on_request`, `on_response`: observability hooks.
  - Sync verbs: receive `httpx.Request` and `httpx.Response`.
  - Async verbs: `on_request(method, url, kwargs)` receives the arguments about to be sent; `on_response` receives the `aiohttp.ClientResponse`.
- `auth=("user", "pass")`: basic auth.
- `bearer=pl.col("token")`: per-row bearer token (also accepts a literal string).
- `api_key=...`, `api_key_header="X-API-Key"`: shorthand for an API-key header.
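The retry policy listed above (retry on connection errors, timeouts, 5xx, and 429; exponential backoff; `Retry-After` honored on 429) can be sketched as a plain loop. This is an illustration under assumptions, not polars-api's code: the delay formula `backoff * 2**attempt`, seconds-only `Retry-After` parsing, and the hypothetical `send` callable returning `(status, headers, body)` are all choices made for the sketch. Here `retries=3` means up to four attempts in total.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical response shape: (status, headers, body). `send` raises
# ConnectionError or TimeoutError for network-level failures.
Response = Tuple[int, Dict[str, str], str]


def is_retryable(status: int) -> bool:
    # Mirrors the documented policy: retry on 429 and any 5xx.
    return status == 429 or 500 <= status < 600


def delay_for(attempt: int, backoff: float, headers: Optional[Dict[str, str]]) -> float:
    # On 429, honor a numeric Retry-After; otherwise back off exponentially.
    if headers and "Retry-After" in headers:
        try:
            return float(headers["Retry-After"])
        except ValueError:
            pass  # HTTP-date values are not handled in this sketch
    return backoff * 2 ** attempt


def send_with_retries(
    send: Callable[[], Response],
    retries: int = 0,
    backoff: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Make up to 1 + retries attempts; return the last response or re-raise."""
    for attempt in range(retries + 1):
        last_attempt = attempt == retries
        try:
            status, headers, body = send()
        except (ConnectionError, TimeoutError):
            if last_attempt:
                raise
            sleep(delay_for(attempt, backoff, None))
            continue
        if last_attempt or not is_retryable(status):
            return status, headers, body
        sleep(delay_for(attempt, backoff, headers if status == 429 else None))
    raise RuntimeError("unreachable: the loop always returns or raises")


# Simulated server: 503, then 429 with Retry-After, then success.
responses = iter([
    (503, {}, ""),
    (429, {"Retry-After": "2"}, ""),
    (200, {}, '{"ok": true}'),
])
waits = []
result = send_with_retries(lambda: next(responses), retries=3, backoff=0.5, sleep=waits.append)
print(result, waits)  # (200, {}, '{"ok": true}') [0.5, 2.0]
```

Injecting `sleep` keeps the sketch testable without real waiting.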
By default, each method returns a pl.Expr of Utf8. With with_metadata=True, it returns a struct column with the schema:
{"body": Utf8, "status": Int64, "elapsed_ms": Float64, "error": Utf8}
Use .str.json_decode() to parse JSON responses.
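To make the `with_metadata=True` record concrete, the sketch below assembles a `{body, status, elapsed_ms, error}` dict around one simulated request. It is a hypothetical illustration, not the library's implementation: the `send` callable is a stand-in, and both the error text for non-2xx responses and the null `body`/`status` on network failures are assumptions.

```python
import time
from typing import Callable, Optional, Tuple, TypedDict


class RowMeta(TypedDict):
    # Mirrors the documented schema: body Utf8, status Int64, elapsed_ms Float64, error Utf8.
    body: Optional[str]
    status: Optional[int]
    elapsed_ms: float
    error: Optional[str]


def request_with_metadata(send: Callable[[], Tuple[int, str]]) -> RowMeta:
    """Hypothetical wrapper: time one request and package the outcome as a record."""
    start = time.perf_counter()
    body_out: Optional[str]
    status_out: Optional[int]
    try:
        status_out, body_out = send()
        # Assumption: non-2xx responses keep their body but also get an error message.
        error = None if 200 <= status_out < 300 else f"HTTP {status_out}"
    except Exception as exc:
        # Assumption: on a network-level failure there is no status and no body.
        body_out, status_out, error = None, None, repr(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"body": body_out, "status": status_out, "elapsed_ms": elapsed_ms, "error": error}


def refused() -> Tuple[int, str]:
    # Stand-in for a connection that cannot be established.
    raise ConnectionRefusedError("connection refused")


records = [
    request_with_metadata(lambda: (200, '{"id": 1}')),
    request_with_metadata(lambda: (503, "unavailable")),
    request_with_metadata(refused),
]
for r in records:
    print(r["status"], r["error"])
```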
Examples
```python
import aiohttp
import polars as pl
import polars_api  # noqa: F401

# Per-row bearer auth + retries + concurrency cap
pl.col("url").api.aget(
    bearer=pl.col("token"),
    retries=3,
    backoff=0.5,
    max_concurrency=10,
)

# Inspect status, timing, errors and response headers
pl.col("url").api.get(with_metadata=True, with_response_headers=True)

# Bring your own session (connector tuning, base_url, cookies, etc.)
session = aiohttp.ClientSession(base_url="https://api.example.com")
pl.col("path").api.aget(client=session)

# Skip duplicate requests (same method, URL, params, body, data, headers) within a batch,
# e.g. after a join/explode
pl.col("url").api.aget(cache=True)

# Follow Link: rel="next" pagination (`df` is an existing DataFrame with a "url" column)
df.with_columns(
    pl.col("url").api.paginate(max_pages=20).alias("pages")
).explode("pages")
```
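The pagination example follows `Link: <...>; rel="next"` response headers and yields a list of pages per row, which is why the result is exploded. Below is a standard-library sketch of that loop under stated assumptions: `fetch` and the in-memory `FAKE_SERVER` are hypothetical stand-ins for HTTP, the header parsing is deliberately simplified (entries are split on commas, so URLs containing commas are unsupported), and none of this is polars-api's implementation.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    # Simplification: split entries on commas.
    for entry in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", entry)
        if not match:
            continue
        url, params = match.groups()
        # rel may be quoted or bare and may list several space-separated values.
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', params)
        if rel and "next" in rel.group(1).split():
            return url
    return None


def paginate(
    fetch: Callable[[str], Tuple[str, Dict[str, str]]],
    url: str,
    max_pages: int = 20,
) -> List[str]:
    """Collect page bodies by following rel="next" links, stopping at max_pages."""
    pages: List[str] = []
    seen: Set[str] = set()  # guard against servers that link back to an earlier page
    next_url: Optional[str] = url
    while next_url is not None and len(pages) < max_pages and next_url not in seen:
        seen.add(next_url)
        body, headers = fetch(next_url)
        pages.append(body)
        next_url = next_link(headers.get("Link"))
    return pages


# Hypothetical in-memory "server": URL -> (body, response headers).
BASE = "https://api.example.com/items?page="
FAKE_SERVER = {
    BASE + "1": ("[1, 2]", {"Link": f'<{BASE}2>; rel="next"'}),
    BASE + "2": ("[3, 4]", {"Link": f'<{BASE}1>; rel="prev", <{BASE}3>; rel="next"'}),
    BASE + "3": ("[5]", {"Link": f'<{BASE}2>; rel="prev"'}),
}

pages = paginate(lambda u: FAKE_SERVER[u], BASE + "1")
print(pages)  # ['[1, 2]', '[3, 4]', '[5]']
```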
Tips and patterns
- Decode JSON immediately: chain `.str.json_decode()` and then `.struct.unnest()` (or `pl.col("response").struct.field("…")`) to flatten the result.
- Build URLs from columns: use Polars string concatenation or `pl.format("https://api.example.com/users/{}", pl.col("user_id"))` to build per-row URLs.
- Prefer `aget`/`apost` for many rows: async variants run requests concurrently and are typically much faster for I/O-bound workloads.
- Inspect failures: the sync helpers return `null` for non-2xx responses; check for nulls in the resulting column before decoding.
FAQ
Can I make HTTP requests from a Polars DataFrame?
Yes — that is exactly what polars-api is for. Import the package and call .api.get() / .api.post() on a URL column.
How do I call a REST API for every row of a Polars DataFrame?
Place the URLs in a column and use pl.col("url").api.get() (or aget for async). Optional params and body arguments accept Polars expressions, so they can vary by row.
Does it support async / concurrent requests?
Yes. aget and apost issue requests concurrently with asyncio.gather, which is significantly faster than the sync variants when you have more than a handful of rows.
Is it lazy-frame compatible?
Yes — because everything is built on Polars expressions, you can use it in LazyFrame.with_columns(...) pipelines.
What does it return?
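By default, a Utf8 column with the raw response body; pipe it through .str.json_decode() to parse JSON responses. With with_metadata=True, you get a struct column with body, status, elapsed_ms, and error fields instead.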
Contributing
Contributions are welcome — see CONTRIBUTING.md. Please open an issue before starting on larger changes.
License
MIT © Diego Garcia Lozano
Repository initiated with fpgmaas/cookiecutter-uv.
Download files
File details
Details for the file polars_api-0.3.0.tar.gz.
File metadata
- Download URL: polars_api-0.3.0.tar.gz
- Upload date:
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `27998b070ac9a893aea42c1950634527704f8d9af65b561976d5f4ae12b654a0` |
| MD5 | `53dea3e7fa22ff394f249e8e5e31ecbc` |
| BLAKE2b-256 | `263cc9e69f7b3038bb7f0aa9d9024c2a2e98355c6434fe70ac81ef6996a1d1e2` |
File details
Details for the file polars_api-0.3.0-py3-none-any.whl.
File metadata
- Download URL: polars_api-0.3.0-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `db1e7f1c0bed5b87d6ba0ccd2044c36677b3af8646a326cab8d191ee39ed2c7e` |
| MD5 | `be39c9927990862164f39e2202529e3f` |
| BLAKE2b-256 | `86e0f3f29a7103ecb4bc8b3191eab9f001bcda2225e96877fe0963d260eb3839` |