Call REST APIs from a Polars DataFrame, one row at a time, using native Polars expressions. Sync and async GET/POST with per-row URLs, params, and bodies.

polars-api

polars-api registers an .api namespace on Polars expressions so you can issue HTTP GET and POST requests for every row of a DataFrame — synchronously or asynchronously — and pipe the responses straight back into your data pipeline.

import polars as pl
import polars_api  # noqa: F401  — registers the `.api` namespace

(
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts/1"]})
      .with_columns(
          pl.col("url").api.get().str.json_decode().alias("response")
      )
)

Why polars-api?

  • Expression-native: works inside with_columns, select, and any other Polars expression context. No for loops, no manual apply.
  • Sync and async out of the box: async variants (aget / apost) fan out requests with asyncio.gather for high-throughput enrichment.
  • Per-row URLs, params, and bodies: URLs, params, bodies, and headers can each be Polars expressions, so you can build them from other columns.
  • Powered by httpx: a modern, dependable HTTP client with timeout support and optional HTTP/2.
  • Small surface area: get, aget, post, and apost cover most workloads, and the other verbs listed in the API reference follow the same sync / async naming.

Common use cases:

  • Enrich a DataFrame with data from a REST API (geocoding, currency rates, user profiles…).
  • Score rows against an ML inference endpoint.
  • Hit an internal microservice in batch from a notebook or ETL job.
  • Quickly prototype API-driven data pipelines without writing async boilerplate.

Installation

# uv
uv add polars-api

# pip
pip install polars-api

# poetry
poetry add polars-api

Requires Python 3.9+ and Polars 1.0+.

Quickstart

1. GET request per row

import polars as pl
import polars_api  # noqa: F401

df = (
    pl.DataFrame({"id": [1, 2, 3]})
      .with_columns(
          ("https://jsonplaceholder.typicode.com/posts/" + pl.col("id").cast(pl.Utf8)).alias("url")
      )
      .with_columns(
          pl.col("url").api.get().str.json_decode().alias("response")
      )
)

2. GET with query parameters

Pass any Polars expression that resolves to a struct as params:

df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 3})
      .with_columns(
          pl.struct(userId=pl.Series([1, 2, 3])).alias("params"),
      )
      .with_columns(
          pl.col("url").api.get(params=pl.col("params")).str.json_decode().alias("response")
      )
)
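
To make the per-row params idea concrete, here is a minimal standard-library sketch of what a struct of query parameters amounts to once it is attached to a URL. It is an illustration only: build_url is a hypothetical helper, not part of polars-api, and how the library encodes parameters internally is an assumption here.

```python
from urllib.parse import urlencode


def build_url(base_url: str, params: dict) -> str:
    """Hypothetical helper: append params to base_url as a query string."""
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


# One dict per row, mirroring the struct column built above.
rows = [{"userId": 1}, {"userId": 2}, {"userId": 3}]
urls = [build_url("https://jsonplaceholder.typicode.com/posts", p) for p in rows]

print(urls[0])  # https://jsonplaceholder.typicode.com/posts?userId=1
```

Each row ends up with its own URL, which is why the params column must have one struct per row.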

3. POST with a JSON body

df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 3})
      .with_columns(
          pl.struct(
              title=pl.lit("foo"),
              body=pl.lit("bar"),
              userId=pl.Series([1, 2, 3]),
          ).alias("body"),
      )
      .with_columns(
          pl.col("url").api.post(body=pl.col("body")).str.json_decode().alias("response")
      )
)

4. Async requests for throughput

aget and apost use asyncio.gather under the hood, so requests run concurrently per batch:

df = (
    pl.DataFrame({"url": ["https://jsonplaceholder.typicode.com/posts"] * 100})
      .with_columns(
          pl.col("url").api.aget().str.json_decode().alias("response")
      )
)
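
For readers unfamiliar with asyncio.gather, the sketch below shows the property that makes it a good fit for row-wise requests: coroutines run concurrently, yet results come back in input order, so they line up with the original rows. fake_fetch is a stand-in that sleeps instead of calling a server; it is not how polars-api issues requests.

```python
import asyncio


async def fake_fetch(url: str, delay: float) -> str:
    """Stand-in for an HTTP call: wait, then return a fake body."""
    await asyncio.sleep(delay)
    return f"body of {url}"


async def fetch_all(urls):
    # Later URLs finish first, but gather still returns results in input order.
    delays = [0.03, 0.02, 0.01]
    return await asyncio.gather(
        *(fake_fetch(url, delay) for url, delay in zip(urls, delays))
    )


urls = ["/posts/1", "/posts/2", "/posts/3"]
bodies = asyncio.run(fetch_all(urls))
print(bodies)
```

Total wall time is close to the slowest single request rather than the sum of all of them.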

5. Timeouts

Every method accepts a timeout (in seconds), forwarded to httpx:

pl.col("url").api.get(timeout=5.0)
pl.col("url").api.apost(body=pl.col("body"), timeout=10.0)

API reference

All methods live under the .api namespace on any Polars expression that resolves to a URL string.

Method             HTTP verb   Mode
get / aget         GET         sync / async
post / apost       POST        sync / async
put / aput         PUT         sync / async
patch / apatch     PATCH       sync / async
delete / adelete   DELETE      sync / async
head / ahead       HEAD        sync / async

Arguments (params and body may be passed positionally; everything else is keyword-only):

  • params — Polars expression yielding a struct of query-string parameters per row.
  • body (POST/PUT/PATCH only) — Polars expression yielding a struct serialized as a JSON body per row.
  • data — Polars expression yielding a struct serialized as application/x-www-form-urlencoded.
  • headers — Polars expression yielding a struct of headers per row (e.g. tenant IDs, custom auth).
  • client — preconfigured httpx.Client / httpx.AsyncClient to enable connection reuse, HTTP/2, base_url, cookies, and custom transports.
  • timeout — request timeout in seconds.
  • retries (int, default 0) — retry on connection errors, timeouts, 5xx, and 429.
  • backoff (float, default 0.0) — exponential backoff base (seconds). 429s respect Retry-After if present.
  • max_concurrency (async only) — cap on in-flight requests via asyncio.Semaphore.
  • cache (bool, default False) — memoize identical (method, url, params, body, data, headers) tuples within a batch.
  • with_metadata (bool, default False) — return a struct {body, status, elapsed_ms, error} per row instead of just the body.
  • with_response_headers (bool, default False) — when with_metadata=True, also include response_headers: List[Struct{name, value}] on the struct.
  • on_error ("null" | "raise" | "return") — when with_metadata=False, what to do on non-2xx / network errors.
  • on_request, on_response — callables that receive the httpx.Request / httpx.Response. Useful for logging, metrics, and tracing.
  • auth=("user", "pass") — basic auth.
  • bearer=pl.col("token") — per-row bearer token (also accepts a literal string).
  • api_key=..., api_key_header="X-API-Key" — shorthand for an API-key header.
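
The retries and backoff options above can be pictured with a small sketch. It assumes the common formula backoff * 2 ** attempt; the source only says "exponential backoff base", so the exact schedule polars-api uses may differ. retry_delay and should_retry are hypothetical names introduced for illustration.

```python
from typing import Optional


def should_retry(status: int) -> bool:
    """Retry on 429 and any 5xx status, as the retries option describes."""
    return status == 429 or 500 <= status <= 599


def retry_delay(attempt: int, backoff: float, retry_after: Optional[float] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Assumed schedule: backoff * 2 ** attempt, unless the server sent
    a Retry-After value, which takes precedence.
    """
    if retry_after is not None:
        return retry_after
    return backoff * (2 ** attempt)


delays = [retry_delay(attempt, backoff=0.5) for attempt in range(3)]
print(delays)  # [0.5, 1.0, 2.0]
```

With the default backoff of 0.0 every delay is zero, so retries fire immediately unless a Retry-After value is present.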

By default, each method returns a pl.Expr of Utf8. With with_metadata=True, it returns a struct column with the schema:

{"body": Utf8, "status": Int64, "elapsed_ms": Float64, "error": Utf8}

Use .str.json_decode() to parse JSON responses.

Examples

# Per-row bearer auth + retries + concurrency cap
pl.col("url").api.aget(
    bearer=pl.col("token"),
    retries=3,
    backoff=0.5,
    max_concurrency=10,
)

# Inspect status, timing, errors and response headers
pl.col("url").api.get(with_metadata=True, with_response_headers=True)

# Bring your own client (HTTP/2, keep-alive, base_url, etc.)
import httpx  # http2=True needs the optional extra: pip install "httpx[http2]"

client = httpx.AsyncClient(http2=True, base_url="https://api.example.com")
pl.col("path").api.aget(client=client)

# Skip duplicate URLs within a batch (e.g. after a join/explode)
pl.col("url").api.aget(cache=True)

# Follow Link: rel="next" pagination
df.with_columns(
    pl.col("url").api.paginate(max_pages=20).alias("pages")
).explode("pages")
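
The max_concurrency cap used in the first example is described as an asyncio.Semaphore. The sketch below shows that pattern in isolation, using only the standard library: bounded_gather and fake_request are hypothetical names, and the peak counter exists only to demonstrate that the cap holds.

```python
import asyncio


async def bounded_gather(coro_fns, max_concurrency: int):
    """Run zero-argument coroutine functions with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def run(fn):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fn()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(run(fn) for fn in coro_fns))
    return results, peak


async def fake_request(i: int) -> int:
    """Stand-in for an HTTP call."""
    await asyncio.sleep(0.01)
    return i


async def main():
    jobs = [lambda i=i: fake_request(i) for i in range(20)]
    return await bounded_gather(jobs, max_concurrency=5)


results, peak = asyncio.run(main())
print(len(results), peak)
```

All 20 results still arrive in input order, but no more than five requests are ever active at once, which helps when an API enforces rate limits.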

Tips and patterns

  • Decode JSON immediately: chain .str.json_decode() and then .struct.unnest() (or pl.col("response").struct.field("…")) to flatten the result.
  • Build URLs from columns: use Polars string concatenation or pl.format("https://api.example.com/users/{}", pl.col("user_id")) to build per-row URLs.
  • Prefer aget / apost for many rows: async variants run requests concurrently and are typically much faster for I/O-bound workloads.
  • Inspect failures: the sync helpers return null for non-2xx responses; check for nulls in the resulting column before decoding.

FAQ

Can I make HTTP requests from a Polars DataFrame? Yes — that is exactly what polars-api is for. Import the package and call .api.get() / .api.post() on a URL column.

How do I call a REST API for every row of a Polars DataFrame? Place the URLs in a column and use pl.col("url").api.get() (or aget for async). Optional params and body arguments accept Polars expressions, so they can vary by row.

Does it support async / concurrent requests? Yes. aget and apost issue requests concurrently with asyncio.gather, which is significantly faster than the sync variants when you have more than a handful of rows.

Is it lazy-frame compatible? Yes — because everything is built on Polars expressions, you can use it in LazyFrame.with_columns(...) pipelines.

What does it return? A Utf8 column with the raw response body. Pipe it through .str.json_decode() to parse JSON responses.

Contributing

Contributions are welcome — see CONTRIBUTING.md. Please open an issue before starting on larger changes.

License

MIT © Diego Garcia Lozano


Repository initiated with fpgmaas/cookiecutter-uv.
