Opteryx Query Engine

Opteryx Core

Opteryx Core is the SQL execution engine behind opteryx.app. It is a fork of Opteryx with a smaller, more opinionated API and configuration surface, shaped around the workloads used by the hosted service.

This library is designed for fast, read-heavy analytical queries over Parquet-backed data. It handles SQL parsing, planning, predicate pushdown, projection pruning, and execution so you can query datasets from Python without standing up a separate warehouse.
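As a rough conceptual illustration of the two scan-level optimizations named above, the sketch below filters rows while reading them (predicate pushdown) and keeps only the columns the query asks for (projection pruning). It is not Opteryx Core's implementation: the `scan` function and the sample rows are made up for this example.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]


def scan(
    rows: Iterable[Row],
    columns: list[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Hypothetical scan: filter and project while reading, not afterwards."""
    for row in rows:
        # Predicate pushdown: rows that fail the filter never leave the scan.
        if predicate(row):
            # Projection pruning: only the requested columns are materialized.
            yield {name: row[name] for name in columns}


# Illustrative sample rows, not data shipped with the library.
planets = [
    {"id": 1, "name": "Mercury", "mass": 0.330},
    {"id": 5, "name": "Jupiter", "mass": 1898.0},
]

# Roughly the shape of: SELECT id, name FROM planets WHERE id < 5
result = list(scan(planets, ["id", "name"], lambda row: row["id"] < 5))
print(result)
```

In a real engine the same idea is applied at the file and row-group level, so data that cannot match is skipped before it is decoded.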

This project is opinionated toward the needs of opteryx.app. It is still useful as a standalone library if you want to query local Parquet, NDJSON, and CSV datasets, embed SQL into a Python service or notebook, or experiment with engine internals directly.

Requirements

  • Python 3.13
  • A C/C++ toolchain for local source builds
  • Rust/Cargo for the Rust extension in src/

Install

pip install opteryx-core

Import it as:

import opteryx

Quick Start: Query Local Files

If your current working directory contains local Parquet data, the simplest way to use Opteryx Core is to register a local workspace and query it with dot-separated names.

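import opteryx
from opteryx.connectors import DiskConnector

# Register ./data as a workspace backed by the local disk connector.
opteryx.register_workspace("data", DiskConnector)

# data.planets refers to the dataset stored under ./data/planets.
session = opteryx.session()
result = session.execute_to_arrow(
    "SELECT id, name FROM data.planets WHERE id < 5"
)

print(result)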

In this model, dataset names are resolved relative to the current working directory. For example, data.planets resolves to ./data/planets, and Opteryx Core reads the Parquet files it finds there.
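The mapping from a dot-separated name to a directory can be pictured with a few lines of standard-library Python. This is only a sketch of the naming convention described above, assuming each name segment becomes one path segment; `resolve_dataset` is a hypothetical helper, not part of the Opteryx Core API.

```python
from pathlib import Path


def resolve_dataset(name: str, root: Path) -> Path:
    """Map a dot-separated dataset name to a directory under root.

    Hypothetical helper for illustration; Opteryx Core performs its own
    resolution internally.
    """
    parts = name.split(".")
    # Reject empty segments such as "data..planets" or a trailing dot.
    if not all(parts):
        raise ValueError(f"invalid dataset name: {name!r}")
    return root.joinpath(*parts)


# Resolve relative to the current working directory.
path = resolve_dataset("data.planets", Path("."))
print(path.as_posix())
```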

What It Is For

  • Powering the execution layer used by opteryx.app
  • Running analytical SQL against local Parquet-backed datasets
  • Embedding a query engine inside Python applications, scripts, notebooks, and services
  • Working on engine internals such as planning, execution, and Parquet performance

Local Development

The supported local build path is the repository Makefile:

make dev-install
make compile
make q

Useful targets:

Target        Purpose
make compile  Clean in-place build of Cython, C++, and Rust extensions
make c        Incremental extension build
make q        Fast SQL shape smoke test
make test     Full pytest suite after compiling
make dt       Draken native unit tests
make check    Ruff and import-order checks without modifying files

Do not use pip install . as the primary development build path; make compile matches the layout expected by this repository.

Repository Layout

Path          Purpose
opteryx/      Python package, planner, operators, connectors, expression evaluation, and Cython modules
draken/       Native columnar vector substrate used by the execution engine
rugo/         Internal Parquet and JSONL reader used by scans and metadata paths
src/          Rust extension code, currently including the SQL dialect integration
tests/        Unit, integration, fuzzing, sqllogictest, and benchmark harnesses
testdata/     Local datasets and benchmark fixtures
dev/          Development, release, vendoring, and analysis scripts
scratch/      Experimental prototypes and one-off investigations
third_party/  Vendored native dependencies

Best With Opteryx Catalog

Opteryx Core works best when paired with the opteryx_catalog library. That is the intended model for named datasets, catalog-backed tables, and the general experience used in opteryx.app.

Typical setup:

import os

import opteryx

from opteryx import set_default_connector
from opteryx.connectors import OpteryxConnector
from opteryx_catalog import OpteryxCatalog

set_default_connector(
    OpteryxConnector,
    catalog=OpteryxCatalog,
    firestore_project=os.environ["GCP_PROJECT_ID"],
    firestore_database=os.environ["FIRESTORE_DATABASE"],
    gcs_bucket=os.environ["GCS_BUCKET"],
)
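Because the setup above reads each setting with `os.environ[...]`, a missing variable raises `KeyError` for the first absent name only. A small pre-flight check can report all of them at once. This is a sketch, not part of Opteryx Core: `missing_settings` is a hypothetical helper, and the variable names are simply the three used in the snippet above.

```python
import os
from collections.abc import Mapping

# The environment variables read by the catalog setup shown above.
REQUIRED_VARS = ("GCP_PROJECT_ID", "FIRESTORE_DATABASE", "GCS_BUCKET")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
missing = missing_settings({"GCP_PROJECT_ID": "demo-project"})
print(missing)
```

Calling `missing_settings()` with no argument checks the real process environment, which makes it easy to fail early with one clear message before configuring the connector.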

Once configured, you can query catalog-backed datasets using dot-separated names such as public.space.planets or opteryx.ops.billing.

For local data, Opteryx Core is typically used through registered workspaces such as testdata, scratch, or data. Queries refer to datasets by dot-separated names relative to the workspace root, for example testdata.planets, testdata.satellites, or scratch.signals.

Where It Fits

Opteryx Core is best thought of as an embedded analytical engine rather than a full end-user platform. If you want a hosted experience, multi-tenant service features, and the broader product workflow, use opteryx.app. If you want the core engine in your own environment, this package gives you that engine directly. If you want the intended table-resolution model, pair it with opteryx_catalog.

Contributing

If you use Opteryx Core yourself, we want to hear from you.

  • Use it on your own datasets
  • Raise bugs when queries, schemas, or performance do not behave as expected
  • Open pull requests for fixes, tests, docs, or performance improvements
  • Share repro cases, failing queries, and edge-case Parquet files

This project is being actively built, and outside usage helps make it better.

Docs: https://docs.opteryx.app/
Source: https://github.com/mabel-dev/opteryx-core
License: Apache-2.0


