polars-api

Call REST APIs from a Polars DataFrame, one row at a time, using native Polars expressions. Sync and async GET/POST with per-row URLs, params, and bodies.

polars-api registers an .api namespace on Polars expressions so you can issue HTTP GET and POST requests for every row of a DataFrame — synchronously or asynchronously — and pipe the responses straight back into your data pipeline.

import polars as pl
import polars_api  # noqa: F401  — registers the `.api` namespace

post = pl.Struct({"userId": pl.Int64, "id": pl.Int64, "title": pl.Utf8, "body": pl.Utf8})

(
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts/1"]})
      .with_columns(
          pl.col("url").api.get().str.json_decode(post).alias("response")
      )
)

In an expression, str.json_decode() needs an explicit dtype (recent Polars made it required). See Decoding JSON responses for the schema-free, eager alternative.

Repository: https://github.com/diegoglozano/polars-api
Documentation: https://diegoglozano.github.io/polars-api/
PyPI: https://pypi.org/project/polars-api/

Why polars-api?

Expression-native — works inside with_columns, select, and any other Polars expression context. No for loops, no manual apply.
Sync and async out of the box — async variants (aget / apost) fan out requests with asyncio.gather for high-throughput enrichment.
Per-row URLs, params, and bodies — every argument can be a Polars expression, so you can build them from other columns.
Built on httpx (sync) and aiohttp (async) — async fan-out uses aiohttp for ~10× higher throughput at high concurrency.
Small surface area: one method per HTTP verb (get, post, put, patch, delete, head), each with an a-prefixed async variant such as aget or apost.

Common use cases:

Enrich a DataFrame with data from a REST API (geocoding, currency rates, user profiles…).
Score rows against an ML inference endpoint.
Hit an internal microservice in batch from a notebook or ETL job.
Quickly prototype API-driven data pipelines without writing async boilerplate.

Installation

# uv
uv add polars-api

# pip
pip install polars-api

# poetry
poetry add polars-api

Requires Python 3.9+ and Polars 1.0+.

Quickstart

1. GET request per row

import polars as pl
import polars_api  # noqa: F401

post = pl.Struct({"userId": pl.Int64, "id": pl.Int64, "title": pl.Utf8, "body": pl.Utf8})

df = (
    pl.DataFrame({"id": [1, 2, 3]})
      .with_columns(
          ("https://jsonplaceholder.typicode.com/posts/" + pl.col("id").cast(pl.Utf8)).alias("url")
      )
      .with_columns(
          pl.col("url").api.get().str.json_decode(post).alias("response")
      )
)

2. GET with query parameters

Pass any Polars expression that resolves to a struct as params. Here the endpoint returns a JSON array, so the decode schema is pl.List(post):

df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 3})
      .with_columns(
          pl.struct(userId=pl.Series([1, 2, 3])).alias("params"),
      )
      .with_columns(
          pl.col("url").api.get(params=pl.col("params")).str.json_decode(pl.List(post)).alias("response")
      )
)
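To make the per-row mapping concrete, the sketch below builds the URL that one row's params struct would correspond to, using only the standard library. It assumes the struct fields are sent as ordinary query-string pairs. `build_url` is a hypothetical helper for illustration and is not part of polars-api.

```python
from urllib.parse import urlencode


def build_url(base_url: str, params: dict) -> str:
    """Append one row's params (field name -> value) as a query string."""
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


# One dict per row, mirroring the struct column built above.
rows = [{"userId": 1}, {"userId": 2}, {"userId": 3}]
urls = [build_url("https://jsonplaceholder.typicode.com/posts", row) for row in rows]
```

Each row ends up with its own URL, which is why the endpoint can return a different JSON array per row.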

3. POST with a JSON body

post = pl.Struct({"userId": pl.Int64, "id": pl.Int64, "title": pl.Utf8, "body": pl.Utf8})

df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 3})
      .with_columns(
          pl.struct(
              title=pl.lit("foo"),
              body=pl.lit("bar"),
              userId=pl.Series([1, 2, 3]),
          ).alias("body"),
      )
      .with_columns(
          pl.col("url").api.post(body=pl.col("body")).str.json_decode(post).alias("response")
      )
)

4. Async requests for throughput

aget and apost fan out with aiohttp and asyncio.gather, so requests run concurrently per batch. In benchmarks against a local server, this is roughly an order of magnitude faster than the sync path at high concurrency:

post = pl.Struct({"userId": pl.Int64, "id": pl.Int64, "title": pl.Utf8, "body": pl.Utf8})

df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 100})
      .with_columns(
          pl.col("url").api.aget().str.json_decode(pl.List(post)).alias("response")
      )
)
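The fan-out pattern described above can be sketched with the standard library alone. This is not polars-api's implementation: `fake_get` stands in for a real aiohttp request, and `fetch_all` and its `max_concurrency` argument are hypothetical names that mirror the option of the same name in the API reference.

```python
import asyncio


async def fake_get(url: str, delay: float = 0.05) -> str:
    """Stand-in for one HTTP request: waits, then returns a fake body."""
    await asyncio.sleep(delay)
    return f"body of {url}"


async def fetch_all(urls: list[str], max_concurrency: int = 10) -> list[str]:
    """Run all requests concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_get(url)

    # gather returns results in input order, so row order is preserved.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.invalid/posts/{i}" for i in range(20)]
bodies = asyncio.run(fetch_all(urls, max_concurrency=10))
```

Because the waits overlap, the batch finishes in roughly the time of a few requests rather than the sum of all of them, while the results still line up with the input rows.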

5. Timeouts

Every method accepts a timeout (in seconds). Sync verbs forward it to httpx; async verbs wrap it in aiohttp.ClientTimeout(total=...).

pl.col("url").api.get(timeout=5.0)
pl.col("url").api.apost(body=pl.col("body"), timeout=10.0)

6. Decoding JSON responses

Every verb returns a Utf8 column of raw response bodies. There are two ways to parse it, depending on the context:

In an expression (works in both DataFrame and LazyFrame) — pass an explicit dtype. Recent versions of Polars made the dtype argument of Expr.str.json_decode() required, because the lazy engine needs to know the output schema up front:

post = pl.Struct({"userId": pl.Int64, "id": pl.Int64, "title": pl.Utf8, "body": pl.Utf8})

df.with_columns(
    pl.col("response").str.json_decode(post)
)

If the endpoint returns a JSON array, wrap the element schema in pl.List(...) (e.g. pl.List(post)).

On a materialized Series (eager DataFrame only) — Series.str.json_decode() can still infer the schema from the data, so you can skip the explicit dtype. This is convenient for quick, interactive exploration:

df = df.with_columns(
    df["response"].str.json_decode().alias("response")
)

Inference only works on an already-collected DataFrame; inside a LazyFrame pipeline you must use the expression form with an explicit dtype.

7. Global defaults (set options once)

Talking to an authenticated API means passing the same client=, bearer=, or auth= to every call. Register them once with set_defaults(...) and every subsequent .api call falls back to them for any argument you don't pass explicitly:

import httpx
import polars as pl
import polars_api

polars_api.set_defaults(
    client=httpx.Client(
        base_url="https://api.example.com",
        headers={"Authorization": "Bearer my-token"},
    ),
    retries=3,
    backoff=0.5,
)

# No need to repeat client=/retries=/backoff= on each call:
df.with_columns(pl.col("path").api.get().alias("res"))

Explicit per-call arguments always win over the configured default. Use the defaults(...) context manager to scope overrides to a block, and reset_defaults() to clear them:

# Scope a client to one block; previous defaults are restored on exit.
# `session` is assumed to be an aiohttp.ClientSession created earlier (aget is an async verb).
with polars_api.defaults(client=session, max_concurrency=10):
    df.with_columns(pl.col("path").api.aget().alias("res"))

polars_api.reset_defaults()            # clear everything
polars_api.reset_defaults("client")    # clear just one option
polars_api.get_defaults()              # inspect the current config

Note: client is shared across the sync (httpx.Client) and async (aiohttp.ClientSession) paths, which need different client types. If you mix sync and async verbs, set client per call (or via defaults(...)) so each path gets the right client.

API reference

All methods live under the .api namespace on any Polars expression that resolves to a URL string.

Method	HTTP verb	Mode
`get` / `aget`	GET	sync / async
`post` / `apost`	POST	sync / async
`put` / `aput`	PUT	sync / async
`patch` / `apatch`	PATCH	sync / async
`delete` / `adelete`	DELETE	sync / async
`head` / `ahead`	HEAD	sync / async

Arguments (all keyword-only after the positional params / body):

params — Polars expression yielding a struct of query-string parameters per row.
body (POST/PUT/PATCH only) — Polars expression yielding a struct serialized as a JSON body per row.
data — Polars expression yielding a struct serialized as application/x-www-form-urlencoded.
headers — Polars expression yielding a struct of headers per row (e.g. tenant IDs, custom auth).
client — preconfigured httpx.Client (sync verbs) or aiohttp.ClientSession (async verbs) for connection reuse, custom timeouts, base URLs, cookies, etc.
timeout — request timeout in seconds.
retries (int, default 0) — retry on connection errors, timeouts, 5xx, and 429.
backoff (float, default 0.0) — exponential backoff base (seconds). 429s respect Retry-After if present.
max_concurrency (async only) — cap on in-flight requests via asyncio.Semaphore.
cache (bool, default False) — memoize identical (method, url, params, body, data, headers) tuples within a batch.
with_metadata (bool, default False) — return a struct {body, status, elapsed_ms, error} per row instead of just the body.
with_response_headers (bool, default False) — when with_metadata=True, also include response_headers: List[Struct{name, value}] on the struct.
on_error ("null" | "raise" | "return") — when with_metadata=False, what to do on non-2xx / network errors.
on_request, on_response — observability hooks.
- Sync verbs: receive httpx.Request and httpx.Response.
- Async verbs: on_request(method, url, kwargs) (the args about to be sent) and on_response(aiohttp.ClientResponse).
auth=("user", "pass") — basic auth.
bearer=pl.col("token") — per-row bearer token (also accepts a literal string).
api_key=..., api_key_header="X-API-Key" — shorthand for an API-key header.
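The retries and backoff options can be pictured with a small schedule calculation. The source only says that backoff is an "exponential backoff base"; the formula below (delay = backoff * 2**attempt) is an assumption, and `backoff_delays` and `next_delay` are hypothetical helpers, not polars-api functions.

```python
from typing import Optional


def backoff_delays(retries: int, backoff: float) -> list[float]:
    """Delay before each retry, ASSUMING delay = backoff * 2**attempt."""
    return [backoff * (2 ** attempt) for attempt in range(retries)]


def next_delay(attempt: int, backoff: float, retry_after: Optional[str] = None) -> float:
    """Prefer a numeric Retry-After header (in seconds) over the computed delay."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    return backoff * (2 ** attempt)


delays = backoff_delays(retries=3, backoff=0.5)
```

Under that assumption, retries=3 with backoff=0.5 waits 0.5 s, 1 s, and 2 s between attempts, unless a 429 response supplies its own Retry-After value.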

By default, each method returns a pl.Expr of Utf8. With with_metadata=True, it returns a struct column with the schema:

{"body": Utf8, "status": Int64, "elapsed_ms": Float64, "error": Utf8}

See Decoding JSON responses for how to parse the body — pass an explicit dtype in an expression, or call .str.json_decode() on the materialized Series to infer the schema.

Examples

# Per-row bearer auth + retries + concurrency cap
pl.col("url").api.aget(
    bearer=pl.col("token"),
    retries=3,
    backoff=0.5,
    max_concurrency=10,
)

# Inspect status, timing, errors and response headers
pl.col("url").api.get(with_metadata=True, with_response_headers=True)

# Bring your own session (connector tuning, base_url, cookies, etc.)
session = aiohttp.ClientSession(base_url="https://api.example.com")
pl.col("path").api.aget(client=session)

# Skip duplicate URLs within a batch (e.g. after a join/explode)
pl.col("url").api.aget(cache=True)

# Follow Link: rel="next" pagination
df.with_columns(
    pl.col("url").api.paginate(max_pages=20).alias("pages")
).explode("pages")
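For readers unfamiliar with Link-header pagination, the loop that a call like paginate presumably automates can be sketched with the standard library. This is not the library's code: `next_page_url`, `collect_pages`, and the fake two-page API are hypothetical, and the regular expression only handles the simple case where rel directly follows the URL.

```python
import re
from typing import Callable, Optional, Tuple

_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="?([^",;]+)"?')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" target from a Link header, or None if absent."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def collect_pages(
    first_url: str,
    fetch: Callable[[str], Tuple[str, Optional[str]]],
    max_pages: int = 20,
) -> list[str]:
    """Follow rel="next" links until they run out or max_pages is reached."""
    bodies: list[str] = []
    url: Optional[str] = first_url
    while url is not None and len(bodies) < max_pages:
        body, link_header = fetch(url)
        bodies.append(body)
        url = next_page_url(link_header)
    return bodies


# Fake two-page API, used only to exercise the loop.
fake_api = {
    "https://api.example.com/items?page=1": ("page-1", '<https://api.example.com/items?page=2>; rel="next"'),
    "https://api.example.com/items?page=2": ("page-2", None),
}
pages = collect_pages("https://api.example.com/items?page=1", lambda u: fake_api[u])
```

The max_pages cap is what keeps a misbehaving API (for example, one whose "next" link loops back) from running forever.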

Global configuration

Module-level helpers let you set request options once instead of repeating them on every call. Anything left unset on a call falls back to the configured default, then to the built-in default; explicit per-call arguments always win.

Function	Purpose
`polars_api.set_defaults(**o)`	Register persistent defaults for any request option.
`polars_api.get_defaults()`	Return a copy of the currently configured defaults.
`polars_api.reset_defaults(*names)`	Clear all defaults, or only the named ones.
`polars_api.defaults(**o)`	Context manager that applies defaults within a block, then restores.

Configurable options mirror the per-call keyword arguments: client, headers, timeout, retries, backoff, max_concurrency, cache, with_metadata, with_response_headers, on_error, on_request, on_response, auth, bearer, api_key, and api_key_header.
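The precedence described above (explicit argument, then configured default, then built-in default) can be modelled in a few lines. The code below is a toy re-implementation for illustration only, not polars-api's source; it reuses the names set_defaults and defaults so the parallel is clear, and `resolve` is a hypothetical helper.

```python
from contextlib import contextmanager

# Built-in defaults as listed in the API reference for these three options.
BUILTIN = {"retries": 0, "backoff": 0.0, "cache": False}
_configured: dict = {}


def set_defaults(**options) -> None:
    """Register persistent defaults."""
    _configured.update(options)


@contextmanager
def defaults(**options):
    """Apply options inside the block, then restore the previous state."""
    previous = dict(_configured)
    _configured.update(options)
    try:
        yield
    finally:
        _configured.clear()
        _configured.update(previous)


def resolve(name: str, explicit=None):
    """Explicit argument, else configured default, else built-in default."""
    if explicit is not None:
        return explicit
    if name in _configured:
        return _configured[name]
    return BUILTIN[name]


set_defaults(retries=3)
with defaults(retries=5):
    inside = resolve("retries")          # scoped override applies
after = resolve("retries")               # previous default restored
explicit = resolve("retries", explicit=1)  # per-call argument wins
untouched = resolve("backoff")           # falls through to the built-in
```

The try/finally in the context manager is the part that guarantees the previous defaults come back even if the block raises.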

Benchmarks

benchmarks/bench.py spins up a local aiohttp echo server on 127.0.0.1 and issues N concurrent GETs against it with each client. Running over the local loopback isolates client-side overhead: it is not a model of real network latency, but it is useful for comparing the per-request cost of each path.

Reproducing

# default scenarios: 100/50, 500/100, 1000/100, 2000/200, repeats=5
just bench

# or with custom scenarios (N/concurrency pairs) and repeats
uv run python benchmarks/bench.py --scenarios 100/50,1000/100 --repeats 7

The script writes benchmarks/results.json (raw timings + environment) and benchmarks/results.md (Markdown table) for inspection or sharing. Both files are gitignored — re-run locally to refresh. The numbers below are a reference run on the environment described.

Latest results

Median of 5 runs, on Linux 6.18 / x86_64 / 4 cores, Python 3.11.15, polars 1.19.0, httpx 0.28.1, aiohttp 3.13.5. Higher rps is better.

Scenario (N / concurrency)	polars-api default	polars-api shared client	bare httpx (default)	bare httpx (tuned)	bare aiohttp
100 / 50	3,471 rps	4,122 rps	538 rps	235 rps	3,041 rps
500 / 100	3,728 rps	4,284 rps	401 rps	88 rps	4,352 rps
1000 / 100	4,012 rps	3,734 rps	355 rps	137 rps	4,701 rps
2000 / 200	3,980 rps	3,772 rps	125 rps	123 rps	4,477 rps

Takeaways:

The async aget / apost path runs at roughly the same throughput as bare aiohttp, with a small overhead for the Polars expression plumbing.
The async path is ~10–35× faster than bare httpx at the concurrencies tested, which is why it was migrated to aiohttp.
Bringing your own aiohttp.ClientSession via client= shaves a little more off small batches (no per-call session setup) and is recommended for long-running pipelines.

These numbers measure client overhead, not API latency. With a real remote endpoint, network RTT will dominate and the gap between clients shrinks.
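To read the table as ratios, the figures can be divided directly. The snippet below copies three columns from the table above and computes how the default polars-api path compares with bare httpx (default settings) and with bare aiohttp; the exact ratios depend on which httpx column is used as the baseline.

```python
# Requests per second, copied from the "Latest results" table.
results = {
    "100 / 50": {"polars_api_default": 3471, "bare_httpx_default": 538, "bare_aiohttp": 3041},
    "500 / 100": {"polars_api_default": 3728, "bare_httpx_default": 401, "bare_aiohttp": 4352},
    "1000 / 100": {"polars_api_default": 4012, "bare_httpx_default": 355, "bare_aiohttp": 4701},
    "2000 / 200": {"polars_api_default": 3980, "bare_httpx_default": 125, "bare_aiohttp": 4477},
}

# How many times faster the default polars-api path is than bare httpx.
speedup_vs_httpx = {
    scenario: row["polars_api_default"] / row["bare_httpx_default"]
    for scenario, row in results.items()
}

# Throughput relative to bare aiohttp (1.0 means identical).
relative_to_aiohttp = {
    scenario: row["polars_api_default"] / row["bare_aiohttp"]
    for scenario, row in results.items()
}

for scenario in results:
    print(f"{scenario}: {speedup_vs_httpx[scenario]:.1f}x httpx, "
          f"{relative_to_aiohttp[scenario]:.2f}x aiohttp")
```

The gap to httpx widens as the batch size and concurrency grow, while the ratio to bare aiohttp stays close to 1 throughout.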

Tips and patterns

Decode JSON immediately: chain .str.json_decode(dtype) (an explicit schema is required in an expression — see Decoding JSON responses) and then .struct.unnest() (or pl.col("response").struct.field("…")) to flatten the result.
Build URLs from columns: use Polars string concatenation or pl.format("https://api.example.com/users/{}", pl.col("user_id")) to build per-row URLs.
Prefer aget / apost for many rows: async variants run requests concurrently and are typically much faster for I/O-bound workloads.
Inspect failures: the sync helpers return null for non-2xx responses, so check for nulls in the resulting column before decoding, or pass with_metadata=True to get the status and error for each row.

FAQ

Can I make HTTP requests from a Polars DataFrame? Yes — that is exactly what polars-api is for. Import the package and call .api.get() / .api.post() on a URL column.

How do I call a REST API for every row of a Polars DataFrame? Place the URLs in a column and use pl.col("url").api.get() (or aget for async). Optional params and body arguments accept Polars expressions, so they can vary by row.

Does it support async / concurrent requests? Yes. aget and apost issue requests concurrently with asyncio.gather, which is significantly faster than the sync variants when you have more than a handful of rows.

Is it lazy-frame compatible? Yes — because everything is built on Polars expressions, you can use it in LazyFrame.with_columns(...) pipelines.

What does it return? A Utf8 column with the raw response body. Parse it with .str.json_decode(dtype) in an expression, or call .str.json_decode() on the materialized Series to infer the schema — see Decoding JSON responses.

Contributing

Contributions are welcome — see CONTRIBUTING.md. Please open an issue before starting on larger changes.

License

MIT © Diego Garcia Lozano

Repository initiated with fpgmaas/cookiecutter-uv.

Release history

0.4.0 (this version): Jun 20, 2026
0.3.0: Apr 30, 2026
0.2.0: Apr 30, 2026
0.1.6: Jan 28, 2025
0.1.5: Jan 17, 2025
0.1.4: Jan 17, 2025
0.1.3: Jan 13, 2025
0.1.2: Jan 9, 2025
0.1.1: Jan 9, 2025

