
Distributed SQL Engine



A query engine for your data, no database required

Documentation | Examples | Contributing | Blog


NOTE: Opteryx is an alpha product. Alpha means different things to different people; to us, being alpha means:

  • Some features you may expect from a Query Engine may not be available
  • Some features may have undetected bugs in them
  • Some previously working features may break

Opteryx has no server component. It runs only when you need it, making it ideal for deployments to platforms like Kubernetes, GCP Cloud Run, AWS Fargate and Knative.


How Can I Contribute?

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

If you have a suggestion for an improvement or have found a bug, raise a ticket or start a discussion.

Want to help build Opteryx? See the contribution guidance.

What Opteryx is

How is it different to SQLite or DuckDB?

Opteryx is solving a different problem in the same space as these solutions. It avoids loading the dataset into memory unless there is no other option; as a result, it can query petabytes of data on a single, modest-sized node.

This also means that queries are not as fast as they are with solutions like SQLite or DuckDB.
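The trade-off above can be illustrated with a small sketch. This is not Opteryx's implementation or API; it is a standard-library illustration of the general idea of streaming rows through a query instead of loading the dataset first. The function names `scan` and `count_where` and the sample data are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator


def scan(sources: Iterable[io.TextIOBase]) -> Iterator[dict]:
    """Yield one row at a time from each CSV source.

    Only the current row is held in memory, never a whole file.
    """
    for source in sources:
        yield from csv.DictReader(source)


def count_where(rows: Iterator[dict], column: str, value: str) -> int:
    """Streaming equivalent of SELECT COUNT(*) ... WHERE column = value."""
    return sum(1 for row in rows if row[column] == value)


# Two in-memory "files" stand in for data files on disk or in object storage.
part_1 = io.StringIO("name,kind\nMercury,rocky\nJupiter,gas\n")
part_2 = io.StringIO("name,kind\nMars,rocky\nSaturn,gas\nEarth,rocky\n")

rocky_count = count_where(scan([part_1, part_2]), "kind", "rocky")
print(rocky_count)
```

Memory use in this sketch stays flat however many files are scanned, but every query re-reads its inputs, which is one reason a streaming design tends to be slower than an engine that keeps data resident in memory.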

How is it different to MySQL or BigQuery?

Opteryx is an ad hoc database: if it can read the files, it can query their contents. This means it can leverage data files used by other systems.

Opteryx is read-only, so you can't update or delete data. It also doesn't have or enforce indexes on your data.

How is it different to Trino?

Opteryx is designed to run in a serverless environment where there is no persistent state. There is no server or coordinator for Opteryx; the Engine is only running when it is serving queries.

When you are not running queries, your cost to run Opteryx is nil (+). This is particularly useful if you have a small team accessing data.

This also means the Query Engine can scale quickly to respond to demand. Running Opteryx in an environment like Cloud Run on GCP, you can scale from 0 to 1000 concurrent queries within seconds, and back to 0 almost as quickly.

(+) Depending on the specifics of your hosting arrangement.

Security

See the project security policy for information about reporting vulnerabilities.

License


The foundational technologies in Opteryx are:

  • Apache Arrow memory model and compute kernels for efficient processing of data
  • Parts of PyArrow_Ops by Tom Scheffers have been integrated into the codebase to support handling Arrow structured data
  • sqloxide is used to parse SQL queries to syntax trees
  • cython
  • numpy
  • orjson
  • cityhash is used for non-cryptographic hashing
