Skip to main content

Python SQL Query Engine

Project description

Opteryx

Query your data, where it lives.

A unified SQL interface to unlock insights across your diverse data sources, from blobs stores to databases - effortless cross-platform data analytics.

Resource Location
Source Code https://github.com/mabel-dev/opteryx
Documentation https://opteryx.dev/
Download https://pypi.org/project/opteryx/

PyPI Latest Release Downloads codecov opteryx PyPI Latest Release

InstallExamplesGet Involved

What is Opteryx?

Opteryx champions the SQL-on-everything approach, streamlining cross-platform data analytics by federating SQL queries across diverse data sources, including database systems like Postgres and datalake file formats like Parquet. The goal is to enhance your data analytics process by offering a unified way to access data from across your organization.

Opteryx is a Python library that combines elements of in-process database engines like SQLite and DuckDB with federative features found in systems like Presto and Trino. The result is a versatile tool for querying data across multiple data sources in a seamless fashion.

Opteryx offers the following features:

  • SQL queries on data files generated by other processes, such as logs
  • A command-line tool for filtering, transforming, and combining files
  • Integration with familiar tools like pandas and Polars
  • Embeddable as a low-cost engine, enabling portability and allowing for hundreds of analysts to leverage ad hoc databases with ease
  • Unified and federated access to data on disk, in the cloud, and in on-premises databases, not only through the same interface but in the same query

How Does it Work?

Opteryx processes queries by first determining the appropriate query language to interact with different downstream data platforms. It translates your query into SQL, CQL, or another suitable format for document stores like MongoDB, based on the data source. This enables Opteryx to efficiently retrieve the necessary data from systems such as MySQL or MongoDB to respond to your query.

Opteryx

Why Use Opteryx?

Familiar Interface

Opteryx supports key parts of the Python DBAPI and SQL92 standard standards which many analysts and engineers will already know how to use.

Consistent Syntax

Opteryx creates a common SQL-layer over multiple data platforms, allowing backend systems to be upgraded, migrated or consolidated without changing any Opteryx code.

Where possible, errors and warnings returned by Opteryx help the user to understand how to fix their statement to reduce time-to-success for even novice SQL users.

Consumption-Based Billing Friendly

Opteryx is well-suited for deployments to environments which are pay-as-you-use, like Google Cloud Run. Great for situations where you have low-volume usage, or multiple environments, where the costs of many traditional database deployment can quickly add up.

Python Ecosystem

Opteryx is Open Source Python, it quickly and easily integrates into Python code, including Jupyter Notebooks, so you can start querying your data within a few minutes. Opteryx integrates with many of your favorite Python data tools, you can use Opteryx to run SQL against pandas and Polars DataFrames, and even execute a JOIN on an in-memory DataFrame and a remote SQL dataset.

Time Travel

Designed for data analytics in environments where decisions need to be replayable, Opteryx allows you to query data as at a point in time in the past to replay decision algorithms against facts as they were known in the past. You can even self-join tables historic data, great for finding deltas in datasets over time. (data must be structured to enable temporal queries)

Fast

Benchmarks on M2 Pro Mac running an ad hoc GROUP BY over a 6 million row parquet file via the CLI in ~1/4th of a second from a cold start (no caching and predefined schema). (different systems will have different performance characteristics)

Instant Elasticity

Designed to run in Knative and similar environments like Google Cloud Run, Opteryx can scale down to zero, and scale up to respond to thousands of concurrent queries within seconds.

Bring your own Data

Opteryx

Opteryx supports multiple query engines, dataframe APIs and storage formats. You can mix-and-match sources in a single query. Opteryx can even JOIN datasets stored in different formats and different platforms in the same query, such as Parquet and MySQL.

Opteryx allows you to query your data directly in the systems where they are stored, eliminating the need to duplicate data into a common store for analytics. This saves you the cost and effort of maintaining duplicates.

Opteryx can push parts of your query to the source query engine, allowing queries to run at the speed of the backend, rather than your local computer.

And if there's not a connector in the box for your data platform; feel free to submit a pull request to add one.

Install

Installing from PyPI is recommended.

pip install opteryx

To build Opteryx from source, refer to the contribution guides.

Opteryx installs with a small set of libraries it needs for core functionality, such as Numpy, PyArrow, and orjson. Some features require additional libraries to be installed, you are notified of these libraries as they are required.

Examples

Filter a Dataset on the Command Line

In this example, we are running Opteryx from the command line to filter one of the internal example datasets and display the results on the console.

python -m opteryx "SELECT * FROM \$astronauts WHERE 'Apollo 11' IN UNNEST(missions);"

Opteryx this example is complete and should run as-is

Execute a Simple Query in Python

In this example, we are showing the basic usage of the Python API by executing a simple query that makes no references to any datasets.

# Import the Opteryx SQL query engine library.
import opteryx

# Execute a SQL query to evaluate the expression 4 * 7.
# The result is stored in the 'result' variable.
result = opteryx.query("SELECT 4 * 7;")

# Display the first row(s) of the result to verify the query executed correctly.
result.head()
ID 4 * 7
1 28

this example is complete and should run as-is

Execute SQL on a pandas DataFrame

In this example, we are running a SQL statement on a pandas DataFrame and returning the result as a new pandas DataFrame.

# Required imports
import opteryx
import pandas

# Read data from the exoplanets.csv file hosted on Google Cloud Storage
# The resulting DataFrame is stored in the variable `pandas_df`.
pandas_df = pandas.read_csv("https://storage.googleapis.com/opteryx/exoplanets/exoplanets.csv")

# Register the pandas DataFrame with Opteryx under the alias "exoplanets"
# This makes the DataFrame available for SQL-like queries.
opteryx.register_df("exoplanets", pandas_df)

# Perform an SQL query to group the data by `koi_disposition` and count the number
# of occurrences of each distinct `koi_disposition`.
# The result is stored in `aggregated_df`.
aggregated_df = opteryx.query("SELECT koi_disposition, COUNT(*) FROM exoplanets GROUP BY koi_disposition;").pandas()

# Display the aggregated DataFrame to get a preview of the result.
aggregated_df.head()
  koi_disposition  COUNT(*)
0       CONFIRMED      2293
1  FALSE POSITIVE      5023
2       CANDIDATE      2248 

this example is complete and should run as-is

Query Data on Local Disk

In this example, we are querying and filtering a file directly. This example will not run as written because the file being queried does not exist.

# Import the Opteryx query engine.
import opteryx

# Execute a SQL query to select the first 5 rows from the 'space_missions.parquet' table.
# The result will be stored in the 'result' variable.
result = opteryx.query("SELECT * FROM 'space_missions.parquet' LIMIT 5;")

# Display the result.
# This is useful for quick inspection of the data.
result.head()
ID Company Location Price Launched_at Rocket Rocket_Status Mission Mission_Status
0 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-10-04 19:28:00 Sputnik 8K71PS Retired Sputnik-1 Success
1 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-11-03 02:30:00 Sputnik 8K71PS Retired Sputnik-2 Success
2 US Navy LC-18A, Cape Canaveral AFS, Fl null 1957-12-06 16:44:00 Vanguard Retired Vanguard TV3 Failure
3 AMBA LC-26A, Cape Canaveral AFS, Fl null 1958-02-01 03:48:00 Juno I Retired Explorer 1 Success
4 US Navy LC-18A, Cape Canaveral AFS, Fl null 1958-02-05 07:33:00 Vanguard Retired Vanguard TV3BU Failure

this example requires a data file, space_missions.parquet.

Query Data in SQLite

In this example, we are querying a SQLite database via Opteryx. This example will not run as written because the file being queried does not exist.

# Import the Opteryx query engine and the SqlConnector from its connectors module.
import opteryx
from opteryx.connectors import SqlConnector

# Register a new data store with the prefix "sql", specifying the SQL Connector to handle it.
# This allows queries with the 'sql' prefix to be routed to the appropriate SQL database.
opteryx.register_store(
   prefix="sql",  # Prefix for distinguishing this particular store
   connector=SqlConnector,  # Specify the connector to handle queries for this store
   remove_prefix=True,  # Remove the prefix from the table name when querying SQLite
   connection="sqlite:///database.db"  # SQLAlchemy connection string for the SQLite database
)

# Execute a SQL query to select specified columns from the 'planets' table in the SQL store,
# limiting the output to 5 rows. The result is stored in the 'result' variable.
result = opteryx.query("SELECT name, mass, diameter, density FROM sql.planets LIMIT 5;")

# Display the result.
# This is useful for quickly verifying that the query executed correctly.
result.head()
ID name mass diameter density
1 Mercury 0.33 4879 5427
2 Venus 4.87 12104 5243
3 Earth 5.97 12756 5514
4 Mars 0.642 6792 3933
5 Jupiter 1898.0 142984 1326

this example requires a data file, database.db.

Query Data on GCS

In this example, we are to querying a dataset on GCS in a public bucket called 'opteryx'.

# Import the Opteryx query engine and the GcpCloudStorageConnector from its connectors module.
import opteryx
from opteryx.connectors import GcpCloudStorageConnector

# Register a new data store named 'opteryx', specifying the GcpCloudStorageConnector to handle it.
# This allows queries for this particular store to be routed to the appropriate GCP Cloud Storage bucket.
opteryx.register_store(
    "opteryx",  # Name of the store to register
    GcpCloudStorageConnector  # Connector to handle queries for this store
)

# Execute a SQL query to select all columns from the 'space_missions' table located in the 'opteryx' store,
# and limit the output to 5 rows. The result is stored in the 'result' variable.
result = opteryx.query("SELECT * FROM opteryx.space_missions LIMIT 5;")

# Display the result.
# This is useful for quickly verifying that the query executed correctly.
result.head()
ID Company Location Price Launched_at Rocket Rocket_Status Mission Mission_Status
0 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-10-04 19:28:00 Sputnik 8K71PS Retired Sputnik-1 Success
1 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-11-03 02:30:00 Sputnik 8K71PS Retired Sputnik-2 Success
2 US Navy LC-18A, Cape Canaveral AFS, Fl null 1957-12-06 16:44:00 Vanguard Retired Vanguard TV3 Failure
3 AMBA LC-26A, Cape Canaveral AFS, Fl null 1958-02-01 03:48:00 Juno I Retired Explorer 1 Success
4 US Navy LC-18A, Cape Canaveral AFS, Fl null 1958-02-05 07:33:00 Vanguard Retired Vanguard TV3BU Failure

this example is complete and should run as-is


You can also try Opteryx right now using our interactive labs on Binder.

Binder

Community

Discord X Follow Medium

Get Involved

  • :star: Star this repo
  • Contribute — join us in building Opteryx, through writing code, or inspiring others to use it.
  • Let us know your ideas, how you are using Opteryx, or report a bug or feature request.
  • See the contributor documentation for Opteryx. It's easy to get started, and we're really friendly if you need any help!
  • If you're interested in contributing to the code now, check out GitHub issues. Feel free to ask questions or open a draft PR.

Security

Static Analysis Vulnerabilities Security Rating

See the project Security Policy for information about reporting vulnerabilities.

License

License FOSSA Status

Opteryx is licensed under Apache 2.0 except where specific modules note otherwise.

Status

Status

Opteryx is in beta. Beta means different things to different people, to us, being beta means:

  • Core functionality has good regression test coverage to help ensure stability
  • Some edge cases may have undetected bugs
  • Performance tuning is incomplete
  • Changes are focused on feature completion, bugs, performance, reducing debt, and security
  • Code structure and APIs are not stable and may change

Related Projects

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

opteryx-0.18.2b854.tar.gz (1.6 MB view details)

Uploaded Source

Built Distributions

opteryx-0.18.2b854-cp312-cp312-win_amd64.whl (3.2 MB view details)

Uploaded CPython 3.12 Windows x86-64

opteryx-0.18.2b854-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB view details)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

opteryx-0.18.2b854-cp312-cp312-macosx_10_15_universal2.whl (6.1 MB view details)

Uploaded CPython 3.12 macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.18.2b854-cp311-cp311-win_amd64.whl (3.2 MB view details)

Uploaded CPython 3.11 Windows x86-64

opteryx-0.18.2b854-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

opteryx-0.18.2b854-cp311-cp311-macosx_10_15_universal2.whl (6.1 MB view details)

Uploaded CPython 3.11 macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.18.2b854-cp310-cp310-win_amd64.whl (3.2 MB view details)

Uploaded CPython 3.10 Windows x86-64

opteryx-0.18.2b854-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.3 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

opteryx-0.18.2b854-cp310-cp310-macosx_10_15_universal2.whl (6.1 MB view details)

Uploaded CPython 3.10 macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.18.2b854-cp39-cp39-win_amd64.whl (3.2 MB view details)

Uploaded CPython 3.9 Windows x86-64

opteryx-0.18.2b854-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.3 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

opteryx-0.18.2b854-cp39-cp39-macosx_10_15_universal2.whl (6.1 MB view details)

Uploaded CPython 3.9 macOS 10.15+ universal2 (ARM64, x86-64)

File details

Details for the file opteryx-0.18.2b854.tar.gz.

File metadata

  • Download URL: opteryx-0.18.2b854.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for opteryx-0.18.2b854.tar.gz
Algorithm Hash digest
SHA256 b44acb9b76542b2516a99d84ea332fed3a2d44b9efd837c1ecc516c69bc357ec
MD5 a647fdca302dca287933f998da8ffb78
BLAKE2b-256 1021da150a56f62095fde1e2114d2336a7d7f2fb2666dc9fcdb3441b1eb8538a

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 48705b86bc1265dee529e1745c28cbbec29f0cd8e5db25ec2739c2cabb7a7be1
MD5 961edb3ddf9fdaf08742cd56b776961e
BLAKE2b-256 218637d1f49ef833185c9f8eb88dc1b1b9a04555b009612de9044bbfac217436

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3cd004fe2b8bee92b3f71a69ea5343378c818365116b9d2cac31935779fe351a
MD5 7ce71534656918fec844ae3b8afbea55
BLAKE2b-256 b916d94efa02c60b11f72b8f5752155c3fc786c523b79c787a62e7ec62bf754f

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp312-cp312-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp312-cp312-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 fbbda035f2a75357663de305d8bc27a4927205b751e87c78ebf3ad86b990739a
MD5 11fcd15fa361ee81c1731d2bfe8b7125
BLAKE2b-256 024e425a7da65294a20930f226adbccac92a69e2774ce1ad45f2b75ac6a5d113

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 be0e80212001c4dbb4ffe53f30a1715116383ababf01ac030b371a56c8c0ec0d
MD5 06a71b32faa14e38c1580e6a22c51405
BLAKE2b-256 6d78365a2d575e2beb39b4e60412cd58a59c0dab65d086db90192e75987f270a

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d5ffc96922b6089ede559939c4dae91de83b181e2a18674822625ec2c1139f03
MD5 396a50be9faa7b234631444b76068e5f
BLAKE2b-256 a5f79640f1c35cd47099c06c86872e3787b8a13d67d90d240b3f18c9fe5225bb

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp311-cp311-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp311-cp311-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 cd08254ee4c11002a4b06fc0bb912334e1d42f962b7bacda7d9421e2f6373cfb
MD5 70eaa9279cf7554d04569b792d9e4a32
BLAKE2b-256 27db18754120b9bd8e04086f5345b43e7f02cca36c46ff212709bb7afbc44bce

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 65020adcbd4cbda9c2c317aadae5d9e8cec6c9bafc960c07db90c2b1c0ec132d
MD5 c83ac912aac08d88ed4788b09148fff7
BLAKE2b-256 8a2aeda67a5fb57c0c171c08cbcd6952f43ed96ac2547bdbea26407ebcf4fc5e

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d6a28fbd330bff608e4f646fe1162f673c87162431232f87871a4673c8f7b5e8
MD5 f93f40dfb93e290f005e3bed3e903581
BLAKE2b-256 7ff5ab1579d9c5359ecd42e90aa515f8437fbba38cf5b4d6042a0774a633ea45

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp310-cp310-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp310-cp310-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 e9f301ffc56a0f4ff61ac94818fe1bb66da7b2c9fb1cd281746b5c59d38ea376
MD5 10343b9331546b99774527ef3abe887b
BLAKE2b-256 2ece3d9e10193f6b3b3d3fd26ef0595ed3c1bb8271ce620ac25231c0fe5d3009

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 6ccb5959009cbd01f82731096d2e3d4ba00561e7aea50e6c72f04b21c33ed324
MD5 37caa4ae7d7eadb5ccb235ce0e096864
BLAKE2b-256 17e334f44af264fd2984d8297773d2ceb9790b0689506fe8858d2c8ba22c2635

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 693bed9c41687de1e92bfcc397420d2c9a9197aaf7b3c7dab09bcab68d45a990
MD5 dc966908d6a3b283370a88114590956e
BLAKE2b-256 0f7ff5c7387a08edb23ab465db5815a434dbfb1e01956da2426dc5f0d82cc25d

See more details on using hashes here.

File details

Details for the file opteryx-0.18.2b854-cp39-cp39-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.18.2b854-cp39-cp39-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 10182b85d57ec91961a7d2107171d491e1e96cbc38c656b357fc26f9e731e282
MD5 dcbc34d2fa0234be86dbc83bb0848d01
BLAKE2b-256 5b2ed3623d04927db5b4c58dd117018d2833183a2b325cdc2d0d508c124d09ba

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page