Fast opinionated multi-backend (DuckDB / DataFusion) dataset HTTP server.

datap-rs

██████╗  █████╗ ████████╗ █████╗ ██████╗       ██████╗ ███████╗
██╔══██╗██╔══██╗╚══██╔══╝██╔══██╗██╔══██╗      ██╔══██╗██╔════╝
██║  ██║███████║   ██║   ███████║██████╔╝█████╗██████╔╝███████╗
██║  ██║██╔══██║   ██║   ██╔══██║██╔═══╝ ╚════╝██╔══██╗╚════██║
██████╔╝██║  ██║   ██║   ██║  ██║██║           ██║  ██║███████║
╚═════╝ ╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝           ╚═╝  ╚═╝╚══════╝

Documentation · Presentation · Source · PyPI

A fast opinionated multi-backend dataset HTTP server, built in Rust and driven from Python.

datap-rs (datapress) exposes one or more Parquet or Delta datasets over a small JSON HTTP API. It ships with two pluggable engines bundled into a single wheel — pick one at runtime:

DuckDB — battle-tested SQL, lazy parquet reads, low startup.
DataFusion — pure-Rust, in-memory RecordBatch + equality index for low-latency point lookups.

Identical request/response shapes across both, so you can A/B them under your real workload.

Install

pip install datap-rs
# or
uv pip install datap-rs

Wheels are published for Linux (x86_64/aarch64), macOS (arm64), and Windows (x86_64) against CPython 3.9+ (abi3).

Quick start

The examples below use the Kaggle US Accidents (2016-2023) dataset.

import asyncio
from datap_rs.datapress import DataPress, DataPressConfig, DatasetConfig

async def main() -> None:
    ds = DatasetConfig(
        name="accidents",
        source="data/accidents.parquet",
        format="parquet",          # or "delta"
        mode="auto",               # eq-index policy: "auto" | "none" | "list"
        description="US accidents 2016-2023",
    )
    cfg = DataPressConfig(
        backend="datafusion",      # or "duckdb"
        listen="0.0.0.0",
        port=8000,
        workers=8,
    )
    server = DataPress(cfg, datasets=[ds])
    await server.run()              # blocks until SIGINT

if __name__ == "__main__":
    asyncio.run(main())

Hit it:

curl http://localhost:8000/api/v1/datasets
curl http://localhost:8000/api/v1/datasets/accidents/schema
curl -X POST http://localhost:8000/api/v1/datasets/accidents/query \
  -H 'Content-Type: application/json' \
  -d '{
    "columns": ["ID","Severity","City","State"],
    "predicates": [
      { "col": "State",    "op": "eq",  "val": "TX" },
      { "col": "Severity", "op": "gte", "val": 3   }
    ],
    "page": 1, "page_size": 50
  }'
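The same query can be sent from Python without extra dependencies. The sketch below uses only the standard library and assumes the quick-start server is listening on localhost:8000; `build_query_request` and `run_query` are hypothetical helper names, not part of datap-rs.

```python
import json
import urllib.request


def build_query_request(base_url: str, dataset: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the /query route."""
    url = f"{base_url.rstrip('/')}/api/v1/datasets/{dataset}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(base_url: str, dataset: str, body: dict) -> dict:
    """Send the request and decode the JSON response envelope."""
    request = build_query_request(base_url, dataset, body)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


QUERY = {
    "columns": ["ID", "Severity", "City", "State"],
    "predicates": [
        {"col": "State", "op": "eq", "val": "TX"},
        {"col": "Severity", "op": "gte", "val": 3},
    ],
    "page": 1,
    "page_size": 50,
}

# With the server running:
# result = run_query("http://localhost:8000", "accidents", QUERY)
```

Keeping request construction separate from sending makes the body easy to inspect or log before it reaches the server.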

API surface

Seven public classes, no module-level state:

Class	Purpose
`DataPressConfig`	Server tuning: `backend`, `listen`, `port`, `workers`, `prefix`, `compress`, `max_body_bytes`, `max_page_size`, `force_lazy_above_mb`, `request_timeout_ms`, `shutdown_timeout_secs`, `metrics_enabled`, `metrics_path`, `sql_enabled`, `sql_max_rows`, `admin_token`, `datafusion_pushdown_filters`, `datafusion_reorder_filters`, `datafusion_list_files_cache`, `datafusion_list_files_cache_mb`, `datafusion_list_files_cache_ttl_secs`, `pgwire_enabled`, `pgwire_listen`, `pgwire_port`, `pgwire_username`, `pgwire_password`, `pgwire_tls_cert`, `pgwire_tls_key`.
`DatasetConfig`	One dataset: `name`, `source`, `format`, `mode`, optional S3 + index.
`S3Config`	S3 / S3-compatible credentials and endpoint config.
`HMACKeyPair`	Access/secret key pair returned by an `S3Config` credentials provider.
`DataPress`	Built from a `DataPressConfig` + list of `DatasetConfig` + optional `AuthConfig`. `await .run()`.
`AuthConfig`	OIDC / OAuth2 bearer enforcement (requires the `auth` feature in the wheel).
`DataPressClient`	Sync HTTP client for talking to a running server (stdlib + lazy pyarrow).

Hover over any of them in your IDE for full kwarg docs.

S3 / S3-compatible sources

from datap_rs.datapress import DataPress, DataPressConfig, DatasetConfig, S3Config

s3 = S3Config(
    region="us-east-1",
    endpoint="http://localhost:9000",   # MinIO / R2 / Wasabi / Backblaze
    addressing_style="path",            # or "virtual"
    allow_http=True,                    # only for non-https endpoints
)

ds = DatasetConfig(
    name="events",
    source="s3://events/2025/",
    format="parquet",                    # or "delta"
    s3=s3,
)

Credentials fall back to the standard AWS env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION) when not set inline.

Dynamic credentials provider

Pass a zero-argument callable returning an HMACKeyPair to resolve credentials at startup (e.g. from a secrets manager). It is invoked once when DataPress(...) is constructed, the result is cached indefinitely, and it takes precedence over any inline access_key_id / secret_access_key:

from datap_rs.datapress import S3Config, HMACKeyPair

def fetch_creds() -> HMACKeyPair:
    # my_secrets_client is a placeholder for your own secrets-manager client.
    secret = my_secrets_client.get("datapress/s3")
    return HMACKeyPair(
        access_key=secret["access_key_id"],
        secret_key=secret["secret_access_key"],
    )

s3 = S3Config(
    region="us-east-1",
    endpoint="http://localhost:9000",
    allow_http=True,
    credentials_provider=fetch_creds,
)
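The precedence described in this section (provider first, then inline keys, then the AWS environment variables) can be pictured with a small standalone sketch. It illustrates the documented order only and is not the library's internal code; `KeyPair` is a stand-in for `HMACKeyPair` so the example runs without datap-rs, and `resolve_credentials` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class KeyPair:
    """Stand-in for HMACKeyPair so the sketch runs without datap-rs."""
    access_key: str
    secret_key: str


def resolve_credentials(
    provider: Optional[Callable[[], KeyPair]],
    inline: Optional[KeyPair],
    env: Mapping[str, str],
) -> Optional[KeyPair]:
    """Pick credentials in the documented order: provider, inline, environment."""
    if provider is not None:
        # Invoked once; the caller is expected to keep (cache) the result.
        return provider()
    if inline is not None:
        return inline
    access = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if access and secret:
        return KeyPair(access, secret)
    return None
```

Passing the environment in as a mapping (for example `os.environ`) keeps the function easy to test.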

Behind a reverse proxy

Set prefix to mount every route under a URL path — handy when nginx / Traefik / Caddy forwards the prefix verbatim:

DataPressConfig(backend="datafusion", port=8000, prefix="/datapress")
# → GET /datapress/api/v1/datasets, GET /datapress/health, ...

prefix must start with / and not end with /. Empty string (default) mounts at the root.
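As a rough illustration of that rule, the following sketch validates a prefix and joins it with a route. `validate_prefix` and `mounted_path` are hypothetical helpers written for this example; the server performs its own validation.

```python
def validate_prefix(prefix: str) -> str:
    """Accept "" (root) or a path that starts with "/" and does not end with "/"."""
    if prefix == "":
        return prefix
    if not prefix.startswith("/"):
        raise ValueError("prefix must start with '/'")
    if prefix.endswith("/"):
        raise ValueError("prefix must not end with '/'")
    return prefix


def mounted_path(prefix: str, route: str) -> str:
    """Return the externally visible path for a route under the given prefix."""
    return validate_prefix(prefix) + route
```

For example, `mounted_path("/datapress", "/health")` yields `/datapress/health`, while a bare `"/"` is rejected because it ends with a slash.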

Response compression

Compression is on by default and negotiated per request via the Accept-Encoding header (gzip, brotli, zstd). Clients that want raw JSON send Accept-Encoding: identity or omit the header. Turn it off at the server when sitting behind a proxy that already compresses, or to save CPU on a trusted LAN:

DataPressConfig(backend="datafusion", port=8000, compress=False)
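On the client side, negotiation amounts to sending Accept-Encoding and undoing whatever the server reports back. The sketch below uses only the standard library, so it handles gzip and identity; it assumes the server labels compressed responses with the standard Content-Encoding header. `decode_body` and `get_json` are hypothetical helper names.

```python
import gzip
import json
import urllib.request
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the response compression named by the Content-Encoding header."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return raw
    if encoding == "gzip":
        return gzip.decompress(raw)
    # brotli and zstd need third-party packages, so this sketch stops here.
    raise ValueError(f"unsupported Content-Encoding: {encoding}")


def get_json(url: str) -> object:
    """GET a JSON route, asking the server for gzip only."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=30) as response:
        raw = response.read()
        return json.loads(decode_body(raw, response.headers.get("Content-Encoding")))


# With the server running:
# datasets = get_json("http://localhost:8000/api/v1/datasets")
```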

Request limits & timeouts

Three server-side guardrails (body size, page size, and request timeout) are on by default:

DataPressConfig(
    backend="datafusion",
    port=8000,
    max_body_bytes=1_048_576,    # 413 above this; default 1 MiB
    max_page_size=100_000,       # clamp query page_size above this
    force_lazy_above_mb=0,       # >0: force lazy for datasets larger than this (MiB)
    request_timeout_ms=30_000,   # 504 above this; 0 disables; default 30s
    shutdown_timeout_secs=30,    # SIGTERM/SIGINT grace period, in seconds
)

Bodies larger than max_body_bytes are rejected with 413 Payload Too Large. Query page_size values larger than max_page_size are clamped before the backend runs. Handlers that take longer than request_timeout_ms are cancelled and the client sees 504 Gateway Timeout. Set the timeout to 0 to disable it entirely (useful behind a proxy that already enforces one).
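A client can mirror the first two limits before sending anything, which avoids a round trip that would end in 413. This is a minimal sketch that assumes the documented defaults; `preflight_query` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
import json


def preflight_query(
    body: dict,
    max_body_bytes: int = 1_048_576,
    max_page_size: int = 100_000,
) -> bytes:
    """Clamp page_size like the server would, then check the encoded size."""
    checked = dict(body)
    if "page_size" in checked:
        checked["page_size"] = min(checked["page_size"], max_page_size)
    encoded = json.dumps(checked).encode("utf-8")
    if len(encoded) > max_body_bytes:
        # The server would answer 413 Payload Too Large for this body.
        raise ValueError(f"body is {len(encoded)} bytes, limit is {max_body_bytes}")
    return encoded
```

The returned bytes can be passed directly as the request body.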

DataFusion performance tuning

When backend="datafusion", five optional kwargs tune the parquet scan and object-store listing cache (they mirror the TOML [datafusion] block and are ignored by the DuckDB backend). All are off by default:

DataPressConfig(
    backend="datafusion",
    port=8000,
    datafusion_pushdown_filters=True,        # decode-time row filtering
    datafusion_reorder_filters=True,         # reorder predicates by selectivity
    datafusion_list_files_cache=True,        # cache S3/object-store LISTs
    datafusion_list_files_cache_mb=64,       # listing-cache budget (MiB)
    datafusion_list_files_cache_ttl_secs=60, # 0 = never expire
)

datafusion_pushdown_filters pushes row-level predicates into the parquet decoder so rows that fail a filter are never materialised (on top of the row-group / page-index pruning that always happens). Best for selective filters over large row groups.
datafusion_reorder_filters lets the scan reorder those pushed-down predicates by estimated selectivity — only has an effect together with datafusion_pushdown_filters.
datafusion_list_files_cache caches object-store file listings so repeated lazy queries reuse LIST results instead of re-listing the source prefix every time — the dominant per-query cost on S3. *_mb bounds the cache and *_ttl_secs bounds how long before newly written files become visible (0 = never expire). Note: this cache does not help delta sources (their file list comes from the transaction log, not an object-store LIST).

Graceful shutdown

On SIGTERM or SIGINT (Ctrl+C) the server stops accepting new connections, then waits up to shutdown_timeout_secs seconds for in-flight requests to finish before stopping workers. Set it lower for faster restarts, higher for long-running query handlers.

Client

A small sync client is bundled for talking to a running server:

from datap_rs import DataPressClient

c = DataPressClient("http://127.0.0.1:8000")
c.healthz()                                  # -> {"status": "ok"}
c.readyz()                                   # -> {"status": "ready", "datasets": N}
c.datasets()                                 # -> ["accidents", ...]
c.schema("accidents")                        # -> dict
c.count("accidents")                         # -> int
table = c.query("accidents", {               # -> pyarrow.Table
    "columns":   ["State", "Severity"],
    "page_size": 10_000,
})

query() requests Arrow IPC and returns a pyarrow.Table (pyarrow is imported lazily). For the JSON envelope verbatim, use query_json(). On non-2xx responses a DataPressHTTPError is raised with .status, .body and .payload.

Equality-index policy (DataFusion only)

DatasetConfig(
    name="big",
    source="data/big.parquet",
    mode="list",                                  # "auto" | "none" | "list"
    index_columns=["State", "Severity"],          # required for "list"
    index_max_cardinality=100_000,                # used by "auto"
)

auto — index every column whose distinct count stays below index_max_cardinality.
none — skip the index; every query goes through DataFusion SQL.
list — index only index_columns. Best for very wide datasets.

DuckDB ignores this block.
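To make the three modes concrete, here is a conceptual sketch of an equality index over rows held as plain dictionaries. It is an illustration of the idea only and says nothing about how datap-rs stores its index internally; `choose_index_columns` and `build_eq_index` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def choose_index_columns(
    rows: Sequence[Row],
    mode: str,
    index_columns: Optional[Sequence[str]] = None,
    index_max_cardinality: int = 100_000,
) -> List[str]:
    """Apply the "auto" | "none" | "list" policy to pick columns to index."""
    if mode == "none":
        return []
    if mode == "list":
        if not index_columns:
            raise ValueError('mode="list" requires index_columns')
        return list(index_columns)
    if mode == "auto":
        columns = list(rows[0]) if rows else []
        # Keep columns whose distinct count stays below the threshold.
        return [c for c in columns if len({row[c] for row in rows}) < index_max_cardinality]
    raise ValueError(f"unknown mode: {mode!r}")


def build_eq_index(rows: Sequence[Row], columns: Sequence[str]) -> Dict[str, Dict[Any, List[int]]]:
    """Map each value of each indexed column to the row positions holding it."""
    index: Dict[str, Dict[Any, List[int]]] = {col: defaultdict(list) for col in columns}
    for position, row in enumerate(rows):
        for col in columns:
            index[col][row[col]].append(position)
    return index
```

An `eq` lookup then becomes a dictionary access (`index["State"]["TX"]`) instead of a scan, which is why low-cardinality columns are the natural candidates.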

HTTP API

Same core routes for both backends.

Method	Path	Purpose
GET	`/health`	Liveness probe.
GET	`/api/v1/datasets`	List configured datasets.
GET	`/api/v1/datasets/{name}/schema`	Inferred columns + sample row.
POST	`/api/v1/datasets/{name}/query`	Filter + paginate.
POST	`/api/v1/datasets/{name}/count`	Total or filtered row count.
POST	`/api/v1/datasets/{name}/reload`	Atomic dataset reload (requires admin token).
POST	`/api/v1/sql`	Raw read-only SQL over one dataset (opt-in).

Query body

{
  "columns":   ["ID","City","State","Severity"],
  "predicates": [
    { "col": "State",    "op": "eq",  "val": "TX" },
    { "col": "Severity", "op": "gte", "val": 3   }
  ],
  "order_by": [ { "col": "Severity", "dir": "desc" } ],
  "limit":     1000,
  "page":      1,
  "page_size": 50
}

Field	Type	Default	Notes
`columns`	`string[]`	`[]`	Empty = all columns.
`predicates`	`Predicate[]`	`[]`	ANDed together.
`order_by`	`OrderBy[]`	`[]`	`{ col, dir? }`; `dir` is `asc` (default) or `desc`.
`group_by`	`string[]`	`[]`	Group-by columns; when set, `columns` is ignored.
`aggregations`	`Aggregation[]`	`[]`	`{ col?, op, alias? }`; ops: `count|sum|avg|min|max`. Requires `group_by`.
`distinct`	`bool`	`false`	Dedup the projected columns. Mutually exclusive with `group_by` / `aggregations`.
`limit`	`int` or null	`null`	Hard cap on total rows across pages.
`page`	`int >= 1`	`1`	1-based.
`page_size`	`int >= 1`	`1000`	Clamped to `DataPressConfig.max_page_size` (`100_000` by default).

Predicate operators

`op`	`val`	Meaning
`eq`	scalar	`col = val`
`neq`	scalar	`col <> val`
`gt` / `gte`	number / string	`col > val` / `col >= val`
`lt` / `lte`	number / string	`col < val` / `col <= val`
`like`	string with `%`/`_`	SQL `LIKE`
`ilike`	string with `%`/`_`	Case-insensitive `LIKE`
`in`	non-empty array	`col IN (v1, v2, …)`
`is_null`	omit	`col IS NULL`
`is_not_null`	omit	`col IS NOT NULL`
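The table can be read as a mapping from each predicate to a SQL fragment. The sketch below renders predicates into a parameterised WHERE clause to show that mapping; it is an assumption-laden illustration, not the server's actual translation, and `predicate_to_sql` and `where_clause` are hypothetical names.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Operators that take exactly one value.
_COMPARISONS = {
    "eq": "=", "neq": "<>",
    "gt": ">", "gte": ">=",
    "lt": "<", "lte": "<=",
    "like": "LIKE", "ilike": "ILIKE",
}


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def predicate_to_sql(pred: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Return a SQL fragment with ? placeholders plus its parameter list."""
    col, op = quote_ident(pred["col"]), pred["op"]
    if op in _COMPARISONS:
        return f"{col} {_COMPARISONS[op]} ?", [pred["val"]]
    if op == "in":
        values = list(pred["val"])
        if not values:
            raise ValueError("'in' needs a non-empty array")
        return f"{col} IN ({', '.join('?' for _ in values)})", values
    if op == "is_null":
        return f"{col} IS NULL", []
    if op == "is_not_null":
        return f"{col} IS NOT NULL", []
    raise ValueError(f"unknown op: {op!r}")


def where_clause(predicates: Sequence[Dict[str, Any]]) -> Tuple[str, List[Any]]:
    """AND all predicates together, as the query body does."""
    parts: List[str] = []
    params: List[Any] = []
    for pred in predicates:
        sql, values = predicate_to_sql(pred)
        parts.append(sql)
        params.extend(values)
    return " AND ".join(parts), params
```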

Grouping / aggregation

curl -X POST http://localhost:8000/api/v1/datasets/accidents/query \
  -H 'Content-Type: application/json' \
  -d '{
    "group_by": ["State"],
    "aggregations": [
      { "op":  "count" },
      { "col": "Severity", "op": "avg", "alias": "avg_sev" }
    ],
    "order_by": [{ "col": "count", "dir": "desc" }],
    "page_size": 10
  }'

When group_by is non-empty the SELECT list is derived from the group columns plus each aggregation's alias; the top-level columns field is ignored. aggregations without group_by returns 400. order_by keys must be a group column or aggregation alias.

Distinct

curl -X POST http://localhost:8000/api/v1/datasets/accidents/query \
  -H 'Content-Type: application/json' \
  -d '{ "columns": ["State"], "distinct": true, "order_by": [{"col":"State"}] }'

Mutually exclusive with group_by / aggregations.
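The combination rules for `group_by`, `aggregations` and `distinct` can be checked client-side before a request is sent. The following is a minimal sketch of those documented rules; `validate_query` is a hypothetical helper, and the server's own 400 responses remain authoritative.

```python
ALLOWED_AGG_OPS = {"count", "sum", "avg", "min", "max"}


def validate_query(body: dict) -> None:
    """Raise ValueError for combinations the query body rules forbid."""
    group_by = body.get("group_by") or []
    aggregations = body.get("aggregations") or []
    distinct = bool(body.get("distinct", False))

    if aggregations and not group_by:
        raise ValueError("aggregations requires group_by")
    if distinct and (group_by or aggregations):
        raise ValueError("distinct is mutually exclusive with group_by / aggregations")
    for agg in aggregations:
        if agg.get("op") not in ALLOWED_AGG_OPS:
            raise ValueError(f"unknown aggregation op: {agg.get('op')!r}")
    if body.get("page", 1) < 1:
        raise ValueError("page must be >= 1")
    if body.get("page_size", 1000) < 1:
        raise ValueError("page_size must be >= 1")
```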

Arrow IPC responses

Opt in per-request with the Accept header (or ?format=arrow) to skip the JSON envelope and receive an Arrow IPC stream instead:

import polars as pl
import pyarrow.ipc as ipc
import requests

r = requests.post(
    "http://localhost:8000/api/v1/datasets/accidents/query",
    json={"columns": ["ID","State"], "page_size": 1000},
    headers={"Accept": "application/vnd.apache.arrow.stream"},
)
table = ipc.open_stream(r.content).read_all()   # pyarrow.Table
df    = pl.from_arrow(table)                    # zero-copy → Polars
page, page_size = r.headers["X-Page"], r.headers["X-Page-Size"]

To read the complete result set into Polars, walk pages until the server returns fewer rows than requested:

import pyarrow as pa
import pyarrow.ipc as ipc
import polars as pl
import requests

ARROW = "application/vnd.apache.arrow.stream"


def query_all_polars(
    base_url: str,
    dataset: str,
    body: dict,
    page_size: int = 100_000,
) -> pl.DataFrame:
    tables: list[pa.Table] = []
    page = 1

    with requests.Session() as session:
        while True:
            response = session.post(
                f"{base_url.rstrip('/')}/api/v1/datasets/{dataset}/query",
                json={**body, "page": page, "page_size": page_size},
                headers={"Accept": ARROW},
            )
            response.raise_for_status()

            table = ipc.open_stream(response.content).read_all()
            tables.append(table)

            if table.num_rows < page_size:
                break
            page += 1

    table = tables[0] if len(tables) == 1 else pa.concat_tables(tables)
    return pl.from_arrow(table)

Use a deterministic order_by for full exports from datasets that may be reloaded while you page through results. Arrow IPC is supported by both backends.

Count body

Same predicate shape, no projection or pagination:

{ "predicates": [ { "col": "State", "op": "eq", "val": "TX" } ] }

Response: { "count": <int> }. Empty body ({}) counts every row. On materialised DataFusion datasets, the no-predicate case is O(1) and indexed eq / in predicates short-circuit through the equality index.

curl -X POST http://localhost:8000/api/v1/datasets/accidents/count \
  -H 'Content-Type: application/json' -d '{}'
# → { "count": 7728394 }

Raw SQL

POST /api/v1/sql runs a single read-only SELECT (or WITH … SELECT), or a DESCRIBE/DESC <table>, referencing exactly one registered dataset. It is disabled by default because it presents a larger attack surface than the structured query API: you opt in explicitly, and the server parses and validates every statement before any engine sees it. While disabled, the route returns 404, so its existence is not revealed.

Enable it on DataPressConfig (mirrors the TOML [sql] block):

cfg = DataPressConfig(
    backend="datafusion",
    port=8000,
    sql_enabled=True,        # exposes POST /api/v1/sql (default False)
    sql_max_rows=100_000,    # server-side hard cap on rows per query
)

Field	Default	Notes
`sql_enabled`	`False`	When `False`, the route responds `404`.
`sql_max_rows`	`100_000`	Hard cap; the result is wrapped in an outer `LIMIT` so it always applies.

Request body:

{
  "sql": "SELECT State, COUNT(*) AS n FROM accidents GROUP BY State ORDER BY n DESC",
  "max_rows": 500
}

max_rows is clamped into [1, sql_max_rows] — it can never raise the server cap. Omit it to use the configured cap. The dataset is named directly in the FROM clause using its configured name (case-insensitive). A CTE name is local to the query and is not treated as a dataset.
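The clamping rule is small enough to state as a function. This is a sketch of the documented behaviour, with `effective_max_rows` as a hypothetical name:

```python
from typing import Optional


def effective_max_rows(requested: Optional[int], sql_max_rows: int = 100_000) -> int:
    """Clamp a request's max_rows into [1, sql_max_rows]; None means "use the cap"."""
    if requested is None:
        return sql_max_rows
    return max(1, min(requested, sql_max_rows))
```

A request asking for more than the cap is reduced to the cap, and a request for zero or a negative value is raised to 1.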

From the bundled client:

from datap_rs import DataPressClient

c = DataPressClient("http://127.0.0.1:8000")
rows = c.sql(
    "SELECT State, COUNT(*) AS n "
    "FROM accidents GROUP BY State ORDER BY n DESC",
    max_rows=10,
)
# -> [{"State": "CA", "n": 1234}, {"State": "TX", "n": 987}, ...]

Like /query, the response is content-negotiated: send Accept: application/vnd.apache.arrow.stream (or ?format=arrow) to receive an Arrow IPC stream instead of the JSON envelope.

curl -s http://localhost:8000/api/v1/sql \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT Severity, COUNT(*) AS n FROM accidents GROUP BY Severity", "max_rows": 100}'
# → { "data": [ { "Severity": 1, "n": 123 }, ... ], "max_rows": 100 }

Admin reload

POST /api/v1/datasets/{name}/reload rebuilds a dataset from its source and atomically swaps it in. It requires the X-Admin-Token header to match the configured admin token. The endpoint is disabled when no token is set (a secure default).

Set the token either with the ADMIN_TOKEN env var or the admin_token kwarg on DataPressConfig (the kwarg wins when both are set):

cfg = DataPressConfig(
    backend="datafusion",
    port=8000,
    admin_token="supersecret",     # or set the ADMIN_TOKEN env var instead
)

curl -X POST -H "X-Admin-Token: supersecret" \
  http://localhost:8000/api/v1/datasets/accidents/reload
# → { "dataset": "accidents", "rows": 7728394, "elapsed_ms": 1842 }
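The token rules (kwarg over environment variable, endpoint disabled when neither is set) can be sketched as follows. The function names are hypothetical, and the constant-time comparison via `hmac.compare_digest` is a choice made for this example; the source does not say how the server compares tokens.

```python
import hmac
from typing import Mapping, Optional


def resolve_admin_token(kwarg: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """The admin_token kwarg wins; otherwise fall back to ADMIN_TOKEN."""
    if kwarg:
        return kwarg
    return env.get("ADMIN_TOKEN") or None


def reload_allowed(configured: Optional[str], header_value: Optional[str]) -> bool:
    """Allow a reload only when a token is configured and the header matches it."""
    if configured is None or header_value is None:
        return False  # no token configured means the endpoint is disabled
    return hmac.compare_digest(configured.encode("utf-8"), header_value.encode("utf-8"))
```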

In the Swagger UI, enter the token via the Authorize button (🔒) in the AdminToken field rather than typing the header by hand.

Reload publication is backend-specific. DataFusion uses a service-level double buffer: it builds a new DatasetState off to the side, then publishes it with an ArcSwap snapshot update. In-flight queries keep using the old Arrow buffers; later queries see the new state. Peak RSS can approach roughly twice the materialised dataset size during reload.

DuckDB delegates the heavy publication step to the engine with CREATE OR REPLACE TABLE ... AS SELECT .... DuckDB handles that as an ACID transaction over the table/catalog replacement: failures leave the existing table live, and successful reloads become visible atomically to later queries while in-flight queries continue against their starting snapshot. DataPress then refreshes its small cached schema and row-count metadata. Per-dataset reloads are serialised by an async mutex; reloads of different datasets run in parallel.
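The snapshot-swap idea behind the DataFusion reload can be illustrated in a few lines of Python. This is a conceptual analogue only (the real server uses Rust and ArcSwap); `DatasetHolder` is a hypothetical class, and the sketch relies on attribute assignment being atomic in CPython.

```python
import threading
from typing import Callable, Iterable, Tuple


class DatasetHolder:
    """Holds an immutable snapshot that readers share and reloads replace."""

    def __init__(self, rows: Iterable) -> None:
        self._state: Tuple = tuple(rows)
        self._reload_lock = threading.Lock()  # serialises reloads of this dataset

    def snapshot(self) -> Tuple:
        """Readers take a reference; it stays valid even after a reload."""
        return self._state

    def reload(self, loader: Callable[[], Iterable]) -> int:
        with self._reload_lock:
            # Build the new state off to the side. If the loader raises,
            # the old snapshot stays published.
            new_state = tuple(loader())
            # Publishing is a single reference assignment.
            self._state = new_state
            return len(new_state)
```

Because both snapshots exist until the last reader of the old one finishes, memory use can briefly approach twice the dataset size, which matches the peak-RSS note above.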

Authentication (OIDC / OAuth2)

Optional bearer-token enforcement against any OpenID Connect issuer (Keycloak, Auth0, Entra ID, Okta, Zitadel, …). Requires a wheel built with the auth Cargo feature:

maturin build --release --features auth

Pre-built PyPI wheels include it by default.

from datap_rs.datapress import (
    DataPress, DataPressConfig, DatasetConfig, AuthConfig,
)

auth = AuthConfig(
    enabled=True,
    issuer="https://issuer.example.com",
    audience="datapress-api",
    read_scopes=["datasets:read"],
    reload_scopes=["datasets:reload"],
    # anonymous_read=False,
    # algorithms=["RS256"],
    # leeway_secs=60,
    # jwks_refresh_secs=3600,
    # tenant_claim="/tenant_id",
    # allowed_tenants=["acme"],
    # admin_token_fallback=True,    # honour legacy X-Admin-Token
    # start_degraded=True,          # boot even if JWKS fetch fails
)

# Inside an async function, with cfg and ds defined as in Quick start:
server = DataPress(cfg, datasets=[ds], auth=auth)
await server.run()

When enabled=False (default) all other fields are ignored and the server behaves as if no AuthConfig had been passed. Validation errors (missing issuer, malformed tenant_claim, …) raise ValueError at construction time.

Call any endpoint with Authorization: Bearer <jwt>. Reload endpoints require reload_scopes; read endpoints require read_scopes unless anonymous_read=True.
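The scope rules can be pictured as a small decision function. This sketch assumes that every configured scope must be present on the token (the source does not say whether "all" or "any" applies) and that token validation has already happened; `is_authorized` is a hypothetical name.

```python
from typing import AbstractSet, Iterable, Optional


def is_authorized(
    action: str,
    token_scopes: Optional[AbstractSet[str]],
    *,
    read_scopes: Iterable[str],
    reload_scopes: Iterable[str],
    anonymous_read: bool = False,
) -> bool:
    """Decide whether a request may proceed.

    token_scopes is None when no valid bearer token was presented.
    """
    if action == "read":
        if anonymous_read:
            return True
        required = set(read_scopes)
    elif action == "reload":
        required = set(reload_scopes)
    else:
        raise ValueError(f"unknown action: {action!r}")
    if token_scopes is None:
        return False
    # Assumption: all configured scopes must be granted.
    return required <= set(token_scopes)
```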

Use your provider's issuer URL exactly as it appears in the discovery document or JWT iss claim. /realms/<realm> is Keycloak-specific; many providers use URLs such as https://tenant.us.auth0.com/, https://login.microsoftonline.com/<tenant-id>/v2.0, or an Okta authorization-server URL.

AuthConfig applies to one server instance. For strict per-dataset scope boundaries from Python, run one DataPress instance per dataset or access domain and use scopes such as datasets:accidents:read / datasets:accidents:reload on that instance.

Try it locally

The repo ships a one-command Keycloak stack at examples/keycloak/ with a pre-provisioned realm, service-account client, scopes and a test user. Run docker compose up -d and point issuer at http://localhost:8080/realms/datapress.

PostgreSQL wire protocol (pgwire)

When the datap-rs wheel is built with the pgwire feature (the default for published PyPI wheels), the datafusion backend can expose a native PostgreSQL wire-protocol endpoint alongside the HTTP API. Any PostgreSQL client — psql, JDBC/ODBC drivers, BI tools like Power BI and Tableau — can then query your datasets directly without knowing anything about the DataPress HTTP API.

cfg = DataPressConfig(
    backend="datafusion",
    pgwire_enabled=True,
    pgwire_listen="127.0.0.1",   # loopback-only unless password + TLS set
    pgwire_port=5432,
    pgwire_username="datapress",
    # pgwire_password="change-me",   # required for non-loopback binds
    # pgwire_tls_cert="/etc/datapress/pg.crt",
    # pgwire_tls_key="/etc/datapress/pg.key",
)

Security rules enforced at startup (the server refuses to start otherwise):

A loopback bind (127.0.0.1 / ::1) may omit the password.
Any non-loopback bind (e.g. 0.0.0.0) requires both a password and TLS — credentials must never cross the network in the clear.
pgwire_tls_cert and pgwire_tls_key must be set together (both or neither).
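These startup rules can be expressed as a pre-flight check for your own configuration code. The sketch below is an approximation that accepts IP literals only (a hostname such as `localhost` raises `ValueError` from `ipaddress`); `check_pgwire_bind` is a hypothetical helper, and the server's own startup check is what actually applies.

```python
import ipaddress
from typing import Optional


def check_pgwire_bind(
    listen: str,
    password: Optional[str] = None,
    tls_cert: Optional[str] = None,
    tls_key: Optional[str] = None,
) -> None:
    """Raise ValueError when the bind address and credentials break the rules."""
    if (tls_cert is None) != (tls_key is None):
        raise ValueError("TLS cert and key must be set together")
    if ipaddress.ip_address(listen).is_loopback:
        return  # loopback binds may omit the password
    if not password or tls_cert is None:
        raise ValueError("non-loopback binds require both a password and TLS")
```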

JDBC and ODBC driver compatibility

Two independent JDBC paths are available:

Dedicated JDBC driver (datapress-jdbc) — a pure-Java Type-4 driver that talks directly to the DataPress HTTP API (Arrow IPC), without going through pgwire. It is the recommended choice for JVM applications and SQL tools like DBeaver and DataGrip. Requires sql_enabled=True on the server. Available on Maven Central:

<dependency>
  <groupId>org.datap-rs</groupId>
  <artifactId>datapress-jdbc</artifactId>
  <version>0.1.0</version>
</dependency>

See central.sonatype.com/artifact/org.datap-rs/datapress-jdbc for the latest version and Gradle coordinates.

PostgreSQL wire protocol (JDBC/ODBC) — any standard PostgreSQL JDBC driver (org.postgresql:postgresql) or ODBC driver (psqlODBC) also works once pgwire is enabled. Use a standard PostgreSQL connection string:

jdbc:postgresql://127.0.0.1:5432/datapress?user=datapress&password=change-me&sslmode=disable

Power BI Desktop connects via its built-in "PostgreSQL database" connector, which uses the ODBC driver under the hood on Windows.

Note: The pgwire endpoint is experimental. It emulates pg_catalog and information_schema to satisfy client introspection queries (schema browsing, type loading), but some advanced driver features — server-side cursors, COPY, write statements — are not supported. Standard SELECT / WITH … SELECT queries work across all tested clients. See the full pgwire client guide for per-client setup, TLS, and known limitations.

Choosing a backend

DuckDB — the safe default. Handles arbitrary SQL well, manages its own buffer pool, starts up in milliseconds because it lazily reads parquet pages on demand.
DataFusion — pick when the data fits in RAM and you repeatedly query the same columns with equality / IN predicates; the eq-index turns those into O(1) lookups. Also produces a leaner static binary (no vendored C++).

Both engines are compiled into the same wheel — switching is one keyword argument away.

Logging

datapress initialises env_logger on import. Control verbosity with the standard RUST_LOG variable:

RUST_LOG=info  python example.py
RUST_LOG=debug python example.py

License

MIT. See LICENSE in the source repo.

Source, issue tracker and Rust crates: https://github.com/jeroenflvr/datapress

Release history

Version	Release date
0.8.5 (this version)	Jul 22, 2026
0.7.2	Jul 21, 2026
0.7.0	Jul 17, 2026
0.6.3	Jul 9, 2026
0.6.2	Jul 8, 2026
0.6.1	Jul 7, 2026
0.6.0	Jul 6, 2026
0.5.6	Jul 4, 2026
0.4.27	Jun 30, 2026
0.4.13	Jun 8, 2026
0.4.11	Jun 7, 2026
0.4.10	Jun 7, 2026
0.4.9	Jun 6, 2026
0.4.0	Jun 4, 2026
0.3.3	Jun 3, 2026
0.3.2	Jun 3, 2026
0.3.0	Jun 2, 2026
0.2.18	Jun 2, 2026
0.2.17	Jun 1, 2026
0.2.16	May 31, 2026
0.2.15	May 31, 2026
0.2.14	May 31, 2026
0.2.13	May 31, 2026
0.2.12	May 31, 2026
0.2.11	May 31, 2026
0.2.10	May 31, 2026
0.2.9	May 31, 2026
0.2.8	May 31, 2026
0.2.7	May 31, 2026
0.2.6	May 30, 2026
0.2.5	May 30, 2026
0.2.2	May 28, 2026

Download files

All 0.8.5 files were uploaded on Jul 22, 2026 via twine/6.1.0 (CPython/3.13.14), without Trusted Publishing.

File	Type	Size	Tags
`datap_rs-0.8.5.tar.gz`	Source distribution	445.5 kB	Source
`datap_rs-0.8.5-cp39-abi3-win_amd64.whl`	Built distribution	88.1 MB	CPython 3.9+, Windows x86-64
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`	Built distribution	97.0 MB	CPython 3.9+, manylinux: glibc 2.17+ x86-64
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl`	Built distribution	92.1 MB	CPython 3.9+, manylinux: glibc 2.17+ ARM64
`datap_rs-0.8.5-cp39-abi3-macosx_11_0_arm64.whl`	Built distribution	89.0 MB	CPython 3.9+, macOS 11.0+ ARM64

File hashes

File	Algorithm	Hash digest
`datap_rs-0.8.5.tar.gz`	SHA256	`71cd7722a17c83027f9c2a3b1b9607bd7bc1a964308a1dfaf7d51b950a82de50`
`datap_rs-0.8.5.tar.gz`	MD5	`b379ffc4efe804d5d827deab9ef4fc19`
`datap_rs-0.8.5.tar.gz`	BLAKE2b-256	`585bc2cad2ad75b909b97bfb878ef7792b22f1cfdeab62298405102975ae4f32`
`datap_rs-0.8.5-cp39-abi3-win_amd64.whl`	SHA256	`988beb0aa24862d404105be43f737f195d47e07e045d1397a8ec35524484f60d`
`datap_rs-0.8.5-cp39-abi3-win_amd64.whl`	MD5	`eea07b273ddf8b64825535faa917013c`
`datap_rs-0.8.5-cp39-abi3-win_amd64.whl`	BLAKE2b-256	`ffe3412e1fbafbdf71e62009a65afea40ef1f1f77ee4429f5ba14fa3fdc3239c`
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`	SHA256	`29f88b56b350b0084686400aea7c622b0f3b3e1af3fbab52dc204a8071242da2`
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`	MD5	`b7f81fea062e12e5fa6a2a1f56fe753d`
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`	BLAKE2b-256	`5caa878fcd7899556b6416163bf837e3d0fe8bc346130d4b377eaff677f070f4`
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl`	SHA256	`1e821b28df2ba7f884b9e59e9ad750f1792c9a75af5b0208641581e6fc61d7d5`
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl`	MD5	`f6d26550248634416ea776ec52d59ca9`
`datap_rs-0.8.5-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl`	BLAKE2b-256	`1d772487bc2f31fc68904d453af8d570d2169a11ef40eacf9c8a33b21f5238b5`
`datap_rs-0.8.5-cp39-abi3-macosx_11_0_arm64.whl`	SHA256	`ba3cfd142cb2880cb7f90eabb0e2d5e2bc95976fce362b985c6a4ca31b2b86dd`
`datap_rs-0.8.5-cp39-abi3-macosx_11_0_arm64.whl`	MD5	`32f567be9a86089b0891076a35bc09a6`
`datap_rs-0.8.5-cp39-abi3-macosx_11_0_arm64.whl`	BLAKE2b-256	`79bdd819770f0aa8724df1277a57dba334428b2965d185408f32603a6d7350de`