Fast multi-backend (DuckDB / DataFusion) dataset HTTP server.

datapress

A fast multi-backend dataset HTTP server, built in Rust and driven from Python.

datapress exposes one or more Parquet or Delta datasets over a small JSON HTTP API. It ships with two pluggable engines bundled into a single wheel — pick one at runtime:

DuckDB — battle-tested SQL, lazy parquet reads, low startup.
DataFusion — pure-Rust, in-memory RecordBatch + equality index for low-latency point lookups.

Both engines use identical request and response shapes, so you can A/B test them under your real workload.

Install

pip install datapress
# or
uv pip install datapress

Wheels are published for macOS (arm64/x86_64), Linux (x86_64/aarch64) and Windows (x86_64) against CPython 3.9+ (abi3).

Quick start

import asyncio
from datapress import DataPress, DataPressConfig, DatasetConfig

async def main() -> None:
    ds = DatasetConfig(
        name="accidents",
        source="data/accidents.parquet",
        format="parquet",          # or "delta"
        mode="auto",               # eq-index policy: "auto" | "none" | "list"
        description="US accidents 2016-2023",
    )
    cfg = DataPressConfig(
        backend="datafusion",      # or "duckdb"
        listen="0.0.0.0",
        port=8000,
        workers=8,
    )
    server = DataPress(cfg, datasets=[ds])
    await server.run()              # blocks until SIGINT

if __name__ == "__main__":
    asyncio.run(main())

Hit it:

curl http://localhost:8000/api/datasets
curl http://localhost:8000/api/datasets/accidents/schema
curl -X POST http://localhost:8000/api/datasets/accidents/query \
  -H 'Content-Type: application/json' \
  -d '{
    "columns": ["ID","Severity","City","State"],
    "predicates": [
      { "col": "State",    "op": "eq",  "val": "TX" },
      { "col": "Severity", "op": "gte", "val": 3   }
    ],
    "page": 1, "page_size": 50
  }'
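
The same query can be sent from Python with only the standard library. This is a minimal sketch, not part of datapress: `build_query` and `post_query` are hypothetical helper names, and `BASE_URL` assumes the quick-start server is listening on localhost:8000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: the quick-start server


def build_query(columns, predicates, page=1, page_size=50):
    """Build the JSON body for POST /api/datasets/{name}/query."""
    return {
        "columns": list(columns),
        "predicates": list(predicates),
        "page": page,
        "page_size": page_size,
    }


def post_query(dataset, body, base_url=BASE_URL):
    """POST the body to the query route and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/datasets/{dataset}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_query(
    ["ID", "Severity", "City", "State"],
    [
        {"col": "State", "op": "eq", "val": "TX"},
        {"col": "Severity", "op": "gte", "val": 3},
    ],
)
# result = post_query("accidents", body)  # requires a running server
```

The response shape is not documented above, so the sketch returns whatever JSON the server sends back without interpreting it.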

API surface

Four classes, no module-level state:

Class	Purpose
`DataPressConfig`	Server tuning: `backend`, `listen`, `port`, `workers`, `prefix`.
`DatasetConfig`	One dataset: `name`, `source`, `format`, `mode`, optional S3 + index.
`S3Config`	S3 / S3-compatible credentials and endpoint config.
`DataPress`	Built from a `DataPressConfig` + list of `DatasetConfig`. `await .run()`.

Hover over any of them in your IDE for the full keyword-argument docs.

S3 / S3-compatible sources

from datapress import DataPress, DataPressConfig, DatasetConfig, S3Config

s3 = S3Config(
    region="us-east-1",
    endpoint="http://localhost:9000",   # MinIO / R2 / Wasabi / Backblaze
    addressing_style="path",            # or "virtual"
    allow_http=True,                    # only for non-https endpoints
)

ds = DatasetConfig(
    name="events",
    source="s3://events/2025/",
    format="parquet",                    # or "delta"
    s3=s3,
)

Credentials fall back to the standard AWS env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION) when not set inline.
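
The fallback order can be pictured roughly as below. This is an illustrative sketch only: `resolve_s3_settings` and the dictionary keys are hypothetical (the actual credential keyword names on S3Config are not shown here); only the environment variable names come from the text above.

```python
import os

# Hypothetical setting names mapped to the env vars listed above.
ENV_VARS = {
    "access_key_id": "AWS_ACCESS_KEY_ID",
    "secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "session_token": "AWS_SESSION_TOKEN",
    "region": "AWS_REGION",
}


def resolve_s3_settings(inline, env=None):
    """Prefer inline values; fall back to the environment when one is missing."""
    env = os.environ if env is None else env
    resolved = {}
    for key, var in ENV_VARS.items():
        value = inline.get(key)
        if value is None:
            value = env.get(var)  # stays None if the env var is unset too
        resolved[key] = value
    return resolved


settings = resolve_s3_settings(
    {"region": "eu-west-1"},
    env={"AWS_REGION": "us-east-1", "AWS_ACCESS_KEY_ID": "example-key"},
)
```

Here the inline region wins over AWS_REGION, while the access key comes from the environment because it was not set inline.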

Behind a reverse proxy

Set prefix to mount every route under a URL path — handy when nginx / Traefik / Caddy forwards the prefix verbatim:

DataPressConfig(backend="datafusion", port=8000, prefix="/datapress")
# → GET /datapress/api/datasets, GET /datapress/health, ...

prefix must start with / and not end with /. Empty string (default) mounts at the root.
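
If the prefix comes from deployment config, it can help to check it against this rule before starting the server. A small sketch, assuming nothing about how datapress validates internally; `validate_prefix` and `route` are hypothetical helpers.

```python
def validate_prefix(prefix: str) -> str:
    """Return the prefix unchanged if it follows the rule above, else raise."""
    if prefix == "":
        return prefix  # default: mount at the root
    if not prefix.startswith("/"):
        raise ValueError(f"prefix must start with '/': {prefix!r}")
    if prefix.endswith("/"):
        raise ValueError(f"prefix must not end with '/': {prefix!r}")
    return prefix


def route(prefix: str, path: str) -> str:
    """Show where a route ends up once the prefix is applied."""
    return validate_prefix(prefix) + path


mounted = route("/datapress", "/api/datasets")
```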

Equality-index policy (DataFusion only)

DatasetConfig(
    name="big",
    source="data/big.parquet",
    mode="list",                                  # "auto" | "none" | "list"
    index_columns=["State", "Severity"],          # required for "list"
    index_max_cardinality=100_000,                # used by "auto"
)

auto — index every column whose distinct count stays below index_max_cardinality.
none — skip the index; every query goes through DataFusion SQL.
list — index only index_columns. Best for very wide datasets.

DuckDB ignores this block.
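
The three modes boil down to a column-selection rule. The sketch below illustrates that rule in plain Python; it is not the library's implementation. `choose_index_columns` is a hypothetical name, and the strict less-than comparison is an assumption based on "stays below".

```python
def choose_index_columns(mode, distinct_counts, index_columns=None,
                         index_max_cardinality=100_000):
    """Pick the columns to index, given per-column distinct counts."""
    if mode == "none":
        return []
    if mode == "list":
        if not index_columns:
            raise ValueError('mode="list" requires index_columns')
        return list(index_columns)
    if mode == "auto":
        # Assumption: "below" means strictly less than the threshold.
        return [col for col, n in distinct_counts.items()
                if n < index_max_cardinality]
    raise ValueError(f"unknown mode: {mode!r}")


counts = {"ID": 7_000_000, "State": 49, "Severity": 4}
auto_cols = choose_index_columns("auto", counts)
list_cols = choose_index_columns("list", counts, index_columns=["State"])
```

With these made-up counts, auto skips the high-cardinality ID column and indexes the other two.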

HTTP API

Same six routes for both backends.

Method	Path	Purpose
GET	`/health`	Liveness probe.
GET	`/api/datasets`	List configured datasets.
GET	`/api/datasets/{name}/schema`	Inferred columns + sample row.
POST	`/api/datasets/{name}/query`	Filter + paginate.
POST	`/api/datasets/{name}/count`	Total or filtered row count.
POST	`/api/datasets/{name}/reload`	Atomic dataset reload (requires admin token).

Query body

{
  "columns":   ["ID","City","State","Severity"],
  "predicates": [
    { "col": "State",    "op": "eq",  "val": "TX" },
    { "col": "Severity", "op": "gte", "val": 3   }
  ],
  "page":      1,
  "page_size": 50
}

Field	Type	Default	Notes
`columns`	`string[]`	`[]`	Empty = all columns.
`predicates`	`Predicate[]`	`[]`	ANDed together.
`page`	`int >= 1`	`1`	1-based.
`page_size`	`int 1..=1000`	`100`	Clamped.
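
For client code that wants to predict which rows a page covers, the arithmetic can be sketched as follows. `page_window` is a hypothetical helper; the offset formula and the rejection of page < 1 are assumptions, since only the clamping of page_size is stated above.

```python
def page_window(page=1, page_size=100):
    """Return (offset, limit) for a 1-based page with a clamped page size."""
    if page < 1:
        raise ValueError("page is 1-based")  # assumption: reject, not clamp
    limit = max(1, min(page_size, 1000))     # clamp to 1..=1000
    offset = (page - 1) * limit              # assumption: plain offset paging
    return offset, limit


first = page_window()          # defaults: page 1, 100 rows
clamped = page_window(3, 5000) # oversized page_size is clamped to 1000
```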

Predicate operators

`op`	`val`	Meaning
`eq`	scalar	`col = val`
`neq`	scalar	`col <> val`
`gt` / `gte`	number / string	`col > val` / `col >= val`
`lt` / `lte`	number / string	`col < val` / `col <= val`
`like`	string with `%`/`_`	SQL `LIKE`
`ilike`	string with `%`/`_`	Case-insensitive `LIKE`
`in`	non-empty array	`col IN (v1, v2, …)`
`is_null`	omit	`col IS NULL`
`is_not_null`	omit	`col IS NOT NULL`
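
One way to read this table is as a mapping from predicates to a parameterised SQL WHERE clause. The sketch below is an illustration under that reading, not how datapress builds queries; `to_where` and `quote_ident` are hypothetical names, and ILIKE is assumed to be available in the target SQL dialect.

```python
SQL_OPS = {
    "eq": "=", "neq": "<>",
    "gt": ">", "gte": ">=",
    "lt": "<", "lte": "<=",
    "like": "LIKE", "ilike": "ILIKE",
}


def quote_ident(name):
    """Double-quote a column name, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def to_where(predicates):
    """Return (where_clause, params); predicates are ANDed together."""
    clauses, params = [], []
    for p in predicates:
        col, op = quote_ident(p["col"]), p["op"]
        if op in SQL_OPS:
            clauses.append(f"{col} {SQL_OPS[op]} ?")
            params.append(p["val"])
        elif op == "in":
            vals = p["val"]
            if not vals:
                raise ValueError("'in' needs a non-empty array")
            clauses.append(f"{col} IN ({', '.join('?' * len(vals))})")
            params.extend(vals)
        elif op == "is_null":
            clauses.append(f"{col} IS NULL")
        elif op == "is_not_null":
            clauses.append(f"{col} IS NOT NULL")
        else:
            raise ValueError(f"unknown op: {op!r}")
    return " AND ".join(clauses), params


where, params = to_where([
    {"col": "State", "op": "eq", "val": "TX"},
    {"col": "Severity", "op": "gte", "val": 3},
])
```

Values travel as bound parameters rather than being spliced into the SQL text, which keeps the sketch safe against injection through `val`.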

Count body

Same predicate shape, no projection or pagination:

{ "predicates": [ { "col": "State", "op": "eq", "val": "TX" } ] }

The response is { "count": <int> }. An empty body ({}) counts every row. On materialised DataFusion datasets, the no-predicate case is O(1), and eq / in predicates on indexed columns short-circuit through the equality index.

curl -X POST http://localhost:8000/api/datasets/accidents/count \
  -H 'Content-Type: application/json' -d '{}'
# → { "count": 7728394 }
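
To see why indexed eq / in counts can avoid a scan, here is a toy equality index in plain Python. It is a conceptual sketch only: the real index lives in Rust and its layout is not described here, and `build_eq_index`, `count_eq` and `count_in` are hypothetical names.

```python
from collections import defaultdict


def build_eq_index(values):
    """Map each distinct value to the list of row ids that hold it."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return dict(index)


def count_eq(index, val):
    """Count rows where col = val with one dictionary lookup."""
    return len(index.get(val, ()))


def count_in(index, vals):
    """Count rows where col IN vals; duplicates in vals are ignored."""
    return sum(len(index.get(v, ())) for v in set(vals))


states = ["TX", "CA", "TX", "NY"]  # made-up column data
idx = build_eq_index(states)
tx_rows = count_eq(idx, "TX")
tx_or_ny = count_in(idx, ["TX", "NY", "TX"])
```

Building the index costs one pass over the column; after that, each count is a hash lookup per requested value.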

Admin reload

POST /api/datasets/{name}/reload rebuilds a dataset from its source and atomically swaps it in. The request must carry an X-Admin-Token header that matches the ADMIN_TOKEN env var. The endpoint is disabled when ADMIN_TOKEN is unset (secure default).

import os
os.environ["ADMIN_TOKEN"] = "supersecret"     # before constructing DataPress

curl -X POST -H "X-Admin-Token: supersecret" \
  http://localhost:8000/api/datasets/accidents/reload
# → { "dataset": "accidents", "rows": 7728394, "elapsed_ms": 1842 }
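
The gate described above can be modelled as follows. This is a sketch of the rule, not the server's code: `reload_allowed` is a hypothetical name, and the constant-time comparison via `hmac.compare_digest` is a choice made for this sketch, since the text does not say how the server compares tokens.

```python
import hmac
import os


def reload_allowed(header_token, env=None):
    """Allow a reload only if ADMIN_TOKEN is set and the header matches it."""
    env = os.environ if env is None else env
    expected = env.get("ADMIN_TOKEN")
    if not expected or header_token is None:
        return False  # unset token disables the endpoint
    # Constant-time comparison (an assumption of this sketch).
    return hmac.compare_digest(header_token.encode(), expected.encode())


ok = reload_allowed("supersecret", env={"ADMIN_TOKEN": "supersecret"})
wrong = reload_allowed("guess", env={"ADMIN_TOKEN": "supersecret"})
disabled = reload_allowed("supersecret", env={})
```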

Choosing a backend

DuckDB — the safe default. Handles arbitrary SQL well, manages its own buffer pool, starts up in milliseconds because it lazily reads parquet pages on demand.
DataFusion — pick when the data fits in RAM and you repeatedly query the same columns with equality / IN predicates; the eq-index turns those into O(1) lookups. Also produces a leaner static binary (no vendored C++).

Both engines are compiled into the same wheel — switching is one keyword argument away.

Logging

datapress initialises env_logger on import. Control verbosity with the standard RUST_LOG variable:

RUST_LOG=info  python example.py
RUST_LOG=debug python example.py

License

MIT. See LICENSE in the source repo.

Source, issue tracker and Rust crates: https://github.com/jeroenflvr/fast-api

datap-rs 0.1.11