Opteryx Query Engine

Opteryx-Core

Opteryx-Core is the SQL execution engine behind opteryx.app. It is a fork of Opteryx with a smaller, more opinionated API and configuration surface, shaped around the workloads we run in the hosted service.

This library is designed for fast, read-heavy analytical queries over Parquet-backed data. It handles SQL parsing, planning, predicate pushdown, projection pruning, and execution so you can query datasets from Python without standing up a separate warehouse.

This project is openly opinionated toward the needs of opteryx.app. Even so, it is useful as a standalone library, especially if you want to query local Parquet-backed datasets via registered workspaces, embed SQL into a Python service or notebook, or experiment with the engine directly.

Install

pip install opteryx-core

Import it as:

import opteryx

Quick Start: Query Local Files

If your current working directory contains local Parquet data, the simplest way to use Opteryx-Core is to register a local workspace and query it with dot-separated names.

import opteryx
from opteryx.connectors import DiskConnector

opteryx.register_workspace("data", DiskConnector)

session = opteryx.session()
result = session.execute_to_arrow(
    "SELECT id, name FROM data.planets WHERE id < 5"
)

print(result)

In this model, dataset names are resolved relative to the current working directory. For example, data.planets resolves to ./data/planets, and Opteryx-Core will read the Parquet files it finds there.
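The resolution rule can be sketched in plain Python. This is an illustration of the naming convention only, not the engine's actual implementation: each dot-separated segment becomes a directory component under the working directory.

```python
import os

def resolve_dataset_path(dataset_name: str, root: str = ".") -> str:
    # Illustrative sketch: split the dot-separated dataset name into
    # path segments and join them under the workspace root, so that
    # "data.planets" maps to "./data/planets".
    return os.path.join(root, *dataset_name.split("."))

print(resolve_dataset_path("data.planets"))
```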

What It Is For

  • Powering the execution layer used by opteryx.app
  • Running analytical SQL against local Parquet-backed datasets
  • Embedding a query engine inside Python applications, scripts, notebooks, and services
  • Working on engine internals such as planning, execution, and Parquet performance

Best With Opteryx Catalog

Opteryx-Core works best when paired with the opteryx_catalog library. That is the intended model for named datasets, catalog-backed tables, and the general experience used in opteryx.app.

Typical setup:

import os

import opteryx
from opteryx import set_default_connector
from opteryx.connectors import OpteryxConnector
from opteryx_catalog import OpteryxCatalog

set_default_connector(
    OpteryxConnector,
    catalog=OpteryxCatalog,
    firestore_project=os.environ["GCP_PROJECT_ID"],
    firestore_database=os.environ["FIRESTORE_DATABASE"],
    gcs_bucket=os.environ["GCS_BUCKET"],
)
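The snippet above reads three environment variables. A sketch of setting them before running your service (the values here are placeholders, not real project identifiers):

```shell
export GCP_PROJECT_ID="my-gcp-project"      # placeholder project ID
export FIRESTORE_DATABASE="(default)"       # placeholder database name
export GCS_BUCKET="my-opteryx-bucket"       # placeholder bucket name
```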

Once configured, you can query catalog-backed datasets using dot-separated names such as public.space.planets or opteryx.ops.billing.
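These qualified names follow the same dot-separated convention. As an illustration of the convention only (not the catalog's actual API), the leading segments form the namespace path and the final segment names the table:

```python
def split_qualified_name(qualified_name: str) -> tuple[tuple[str, ...], str]:
    # Illustrative sketch: everything before the final dot is treated
    # as the namespace path, the last segment as the table name.
    *namespace, table = qualified_name.split(".")
    return tuple(namespace), table

print(split_qualified_name("public.space.planets"))
```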

For local data, Opteryx-Core is typically used through registered workspaces such as testdata, scratch, or data. Queries refer to datasets by dot-separated names relative to the workspace root, for example testdata.planets, testdata.satellites, or scratch.signals.

Where It Fits

Opteryx-Core is best thought of as an embedded analytical engine rather than a full end-user platform. If you want a hosted experience, multi-tenant service features, and the broader product workflow, use opteryx.app. If you want the core engine in your own environment, this package gives you that engine directly. If you want the intended table-resolution model, pair it with opteryx_catalog.

Contributing

If you use Opteryx-Core yourself, we want to hear from you.

  • Use it on your own datasets
  • Raise bugs when queries, schemas, or performance do not behave as expected
  • Open pull requests for fixes, tests, docs, or performance improvements
  • Share repro cases, failing queries, and edge-case Parquet files

This project is being actively built, and outside usage helps make it better.

Docs: https://docs.opteryx.app/ • Source: https://github.com/mabel-dev/opteryx-core • License: Apache-2.0
