Skip to main content

Python SQL Query Engine

Project description

Opteryx

Query your data, where it lives.

A unified SQL interface to unlock insights across your diverse data sources, from blobs stores to databases - effortless cross-platform data analytics.

Opteryx

Resource Location
Source Code https://github.com/mabel-dev/opteryx
Documentation https://opteryx.dev/
Download https://pypi.org/project/opteryx/

PyPI Latest Release Downloads codecov opteryx PyPI Latest Release

InstallExamplesGet Involved

What is Opteryx?

Opteryx is a Python library enabling SQL queries across diverse data sources like Apache Iceberg, Postgres, Parquet, and MongoDB. Opteryx champions the SQL-on-everything approach, streamlining cross-platform data analytics by federating SQL queries across diverse data sources. The goal is to enhance your data analytics process by offering a unified way to access data from across your organization. Opteryx combines the simplicity of SQLite and DuckDB with federated query capabilities found in Presto and Trino.

Opteryx offers the following features:

  • SQL queries on data files generated by other processes, such as logs
  • A command-line tool for filtering, transforming, and combining files
  • Integration with familiar tools like pandas and Polars
  • Embeddable as a low-cost engine, enabling portability and allowing for hundreds of analysts to leverage ad hoc databases with ease
  • Unified and federated access to data on disk, in the cloud, and in on-premises databases, not only through the same interface but in the same query

How Does it Work?

Opteryx processes queries by first determining the appropriate query language to interact with different downstream data platforms. It translates your query into SQL, CQL, or another suitable query language based on the data source. This enables Opteryx to efficiently retrieve the necessary data from systems such as MySQL or MongoDB to respond to your query.

Opteryx

Why Use Opteryx?

Familiar Interface

Opteryx supports key parts of the Python DBAPI and SQL92 standard standards which many analysts and engineers will already know how to use.

Consistent Syntax

Opteryx creates a common SQL-layer over multiple data platforms, allowing backend systems to be upgraded, migrated or consolidated without changing any Opteryx code.

Where possible, errors and warnings returned by Opteryx help the user to understand how to fix their statement to reduce time-to-success for even novice SQL users.

Consumption-Based Billing Friendly

Opteryx is well-suited for deployments to environments which are pay-as-you-use, like Google Cloud Run. Great for situations where you have low-volume usage, or multiple environments, where the costs of many traditional database deployment can quickly add up.

Python Ecosystem

Opteryx is Open Source Python, it quickly and easily integrates into Python code, including Jupyter Notebooks, so you can start querying your data within a few minutes. Opteryx integrates with many of your favorite Python data tools, you can use Opteryx to run SQL against pandas and Polars DataFrames, and even execute a JOIN on an in-memory DataFrame and a remote SQL dataset.

Time Travel

Designed for data analytics in environments where decisions need to be replayable, Opteryx allows you to query data as at a point in time in the past to replay decision algorithms against facts as they were known in the past. You can even self-join tables historic data, great for finding deltas in datasets over time. (data must be structured to enable temporal queries)

Fast

Benchmarks on M2 Pro Mac running an ad hoc GROUP BY over a 6 million row parquet file via the CLI in ~1/4th of a second from a cold start (no caching and predefined schema). (different systems will have different performance characteristics)

Instant Elasticity

Designed to run in Knative and similar environments like Google Cloud Run, Opteryx can scale down to zero, and scale up to respond to thousands of concurrent queries within seconds.

Bring your own Data

Opteryx

Opteryx supports multiple query engines, dataframe APIs and storage formats. You can mix-and-match sources in a single query. Opteryx can even JOIN datasets stored in different formats and different platforms in the same query, such as Parquet, Apache Iceberg and MySQL.

Opteryx allows you to query your data directly in the systems where they are stored, eliminating the need to duplicate data into a common store for analytics. This saves you the cost and effort of maintaining duplicates.

Opteryx can push parts of your query to the source query engine, allowing queries to run at the speed of the backend, rather than your local computer.

And if there's not a connector in the box for your data platform; feel free to submit a pull request to add one.

Install

Installing from PyPI is recommended.

pip install opteryx

To build Opteryx from source, refer to the contribution guides.

Opteryx installs with a small set of libraries it needs for core functionality, such as Numpy, PyArrow, and orjson. Some features require additional libraries to be installed, you are notified of these libraries as they are required.

Examples

Filter a Dataset on the Command Line

In this example, we are running Opteryx from the command line to filter one of the internal example datasets and display the results on the console.

python -m opteryx "SELECT * FROM \$astronauts WHERE 'Apollo 11' IN UNNEST(missions);"

Opteryx this example is complete and should run as-is

Execute a Simple Query in Python

In this example, we are showing the basic usage of the Python API by executing a simple query that makes no references to any datasets.

# Import the Opteryx SQL query engine library.
import opteryx

# Execute a SQL query to evaluate the expression 4 * 7.
# The result is stored in the 'result' variable.
result = opteryx.query("SELECT 4 * 7;")

# Display the first row(s) of the result to verify the query executed correctly.
result.head()
ID 4 * 7
1 28

this example is complete and should run as-is

Execute SQL on a pandas DataFrame

In this example, we are running a SQL statement on a pandas DataFrame and returning the result as a new pandas DataFrame.

# Required imports
import opteryx
import pandas

# Read data from the exoplanets.csv file hosted on Google Cloud Storage
# The resulting DataFrame is stored in the variable `pandas_df`.
pandas_df = pandas.read_csv("https://storage.googleapis.com/opteryx/exoplanets/exoplanets.csv")

# Register the pandas DataFrame with Opteryx under the alias "exoplanets"
# This makes the DataFrame available for SQL-like queries.
opteryx.register_df("exoplanets", pandas_df)

# Perform an SQL query to group the data by `koi_disposition` and count the number
# of occurrences of each distinct `koi_disposition`.
# The result is stored in `aggregated_df`.
aggregated_df = opteryx.query("SELECT koi_disposition, COUNT(*) FROM exoplanets GROUP BY koi_disposition;").pandas()

# Display the aggregated DataFrame to get a preview of the result.
aggregated_df.head()
  koi_disposition  COUNT(*)
0       CONFIRMED      2293
1  FALSE POSITIVE      5023
2       CANDIDATE      2248 

this example is complete and should run as-is

Query Data on Local Disk

In this example, we are querying and filtering a file directly. This example will not run as written because the file being queried does not exist.

# Import the Opteryx query engine.
import opteryx

# Execute a SQL query to select the first 5 rows from the 'space_missions.parquet' table.
# The result will be stored in the 'result' variable.
result = opteryx.query("SELECT * FROM 'space_missions.parquet' LIMIT 5;")

# Display the result.
# This is useful for quick inspection of the data.
result.head()
ID Company Location Price Launched_at Rocket Rocket_Status Mission Mission_Status
0 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-10-04 19:28:00 Sputnik 8K71PS Retired Sputnik-1 Success
1 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-11-03 02:30:00 Sputnik 8K71PS Retired Sputnik-2 Success
2 US Navy LC-18A, Cape Canaveral AFS, Fl null 1957-12-06 16:44:00 Vanguard Retired Vanguard TV3 Failure
3 AMBA LC-26A, Cape Canaveral AFS, Fl null 1958-02-01 03:48:00 Juno I Retired Explorer 1 Success
4 US Navy LC-18A, Cape Canaveral AFS, Fl null 1958-02-05 07:33:00 Vanguard Retired Vanguard TV3BU Failure

this example requires a data file, space_missions.parquet.

Query Data in SQLite

In this example, we are querying a SQLite database via Opteryx. This example will not run as written because the file being queried does not exist.

# Import the Opteryx query engine and the SqlConnector from its connectors module.
import opteryx
from opteryx.connectors import SqlConnector

# Register a new data store with the prefix "sql", specifying the SQL Connector to handle it.
# This allows queries with the 'sql' prefix to be routed to the appropriate SQL database.
opteryx.register_store(
   prefix="sql",  # Prefix for distinguishing this particular store
   connector=SqlConnector,  # Specify the connector to handle queries for this store
   remove_prefix=True,  # Remove the prefix from the table name when querying SQLite
   connection="sqlite:///database.db"  # SQLAlchemy connection string for the SQLite database
)

# Execute a SQL query to select specified columns from the 'planets' table in the SQL store,
# limiting the output to 5 rows. The result is stored in the 'result' variable.
result = opteryx.query("SELECT name, mass, diameter, density FROM sql.planets LIMIT 5;")

# Display the result.
# This is useful for quickly verifying that the query executed correctly.
result.head()
ID name mass diameter density
1 Mercury 0.33 4879 5427
2 Venus 4.87 12104 5243
3 Earth 5.97 12756 5514
4 Mars 0.642 6792 3933
5 Jupiter 1898.0 142984 1326

this example requires a data file, database.db.

Query Data on GCS

In this example, we are to querying a dataset on GCS in a public bucket called 'opteryx'.

# Import the Opteryx query engine and the GcpCloudStorageConnector from its connectors module.
import opteryx
from opteryx.connectors import GcpCloudStorageConnector

# Register a new data store named 'opteryx', specifying the GcpCloudStorageConnector to handle it.
# This allows queries for this particular store to be routed to the appropriate GCP Cloud Storage bucket.
opteryx.register_store(
    "opteryx",  # Name of the store to register
    GcpCloudStorageConnector  # Connector to handle queries for this store
)

# Execute a SQL query to select all columns from the 'space_missions' table located in the 'opteryx' store,
# and limit the output to 5 rows. The result is stored in the 'result' variable.
result = opteryx.query("SELECT * FROM opteryx.space_missions LIMIT 5;")

# Display the result.
# This is useful for quickly verifying that the query executed correctly.
result.head()
ID Company Location Price Launched_at Rocket Rocket_Status Mission Mission_Status
0 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-10-04 19:28:00 Sputnik 8K71PS Retired Sputnik-1 Success
1 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-11-03 02:30:00 Sputnik 8K71PS Retired Sputnik-2 Success
2 US Navy LC-18A, Cape Canaveral AFS, Fl null 1957-12-06 16:44:00 Vanguard Retired Vanguard TV3 Failure
3 AMBA LC-26A, Cape Canaveral AFS, Fl null 1958-02-01 03:48:00 Juno I Retired Explorer 1 Success
4 US Navy LC-18A, Cape Canaveral AFS, Fl null 1958-02-05 07:33:00 Vanguard Retired Vanguard TV3BU Failure

this example is complete and should run as-is


You can also try Opteryx right now using our interactive labs on Binder.

Binder

Community

Discord Medium

Get Involved

🌟 Star this repo to show your support and help others discover Opteryx.
💬 Join the discussion — share your ideas, tell us how you’re using Opteryx, or suggest features.
🛠️ Contribute to the code — join us in building Opteryx! It’s easy to get started, and we’re happy to guide you.
📚 Check out the contributor documentation. No matter your skill level, there are ways to contribute.
❤️ We welcome sponsorships of any size. Every contribution helps us make Opteryx even better!

We’re excited to have you join our journey. Let us know how we can help!

Security

We take security seriously. If you find any weaknesses please review our Security Policy let us know through our reporting process.

Static Analysis Vulnerabilities Security Rating

License

License

Opteryx is licensed under Apache 2.0 except where specific modules note otherwise.

Status

Status

Opteryx is in beta. Beta means different things to different people, to us, being beta means:

  • Core functionality has good regression test coverage to help ensure stability
  • Some edge cases may have undetected bugs
  • Performance tuning is incomplete
  • Changes are focused on feature completion, bugs, performance, reducing debt, and security
  • Code structure and APIs are not stable and may change

We’re actively adding features and improving performance.

Related Projects

  • orso DataFrame library
  • mabel Streaming data APIs

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

opteryx-0.23.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (19.2 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

opteryx-0.23.0-cp313-cp313-macosx_10_15_universal2.whl (8.9 MB view details)

Uploaded CPython 3.13macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.23.0-cp312-cp312-win_amd64.whl (4.4 MB view details)

Uploaded CPython 3.12Windows x86-64

opteryx-0.23.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (19.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

opteryx-0.23.0-cp312-cp312-macosx_10_15_universal2.whl (9.0 MB view details)

Uploaded CPython 3.12macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.23.0-cp311-cp311-win_amd64.whl (4.3 MB view details)

Uploaded CPython 3.11Windows x86-64

opteryx-0.23.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (19.6 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

opteryx-0.23.0-cp311-cp311-macosx_10_15_universal2.whl (9.0 MB view details)

Uploaded CPython 3.11macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.23.0-cp310-cp310-win_amd64.whl (4.3 MB view details)

Uploaded CPython 3.10Windows x86-64

opteryx-0.23.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (18.8 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

opteryx-0.23.0-cp310-cp310-macosx_10_15_universal2.whl (9.0 MB view details)

Uploaded CPython 3.10macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.23.0-cp39-cp39-macosx_10_15_universal2.whl (9.0 MB view details)

Uploaded CPython 3.9macOS 10.15+ universal2 (ARM64, x86-64)

File details

Details for the file opteryx-0.23.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 984d1e01400c14610ad7cc626157f45689128be8c0f6f826f1c5159d4cad2ba9
MD5 f469ba2a3646a2894e9bfa8577ba8de3
BLAKE2b-256 ba89f46949bb0d1889c98aedb7f15023dccd1196ebbc5b0e4cb51b870cee9238

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp313-cp313-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp313-cp313-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 07911770219b4f18d7c42afc0b7b8553a8018ed357f1ee1ee3574f41ce977cda
MD5 b264c059aa8fd9d7695336ca59da2e53
BLAKE2b-256 57470eeba91e9b57f5940d3b3f55b4404b307a4feaf5b44507ea34e35b509f2f

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: opteryx-0.23.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 4.4 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for opteryx-0.23.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 ceb8ba8ba12105aff05c7a7099127f2468511134507cce6077c36f51e5a56033
MD5 0f7cdd5c91293593adef0edffd61f803
BLAKE2b-256 905dfb25c49d7f344479fa166cf3334ba2b0849d1698585b163b53485756d070

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 acc930f8d82a52e62b7cc9504e63559bcfb480368d798f5b9d9b59a9057e6a92
MD5 ee880a4185e302fd85bc8ad83e31ccd6
BLAKE2b-256 f1344520c66fc44a76405caedd4bb53d626db3e29fe328ae6be50e2f14d99a05

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp312-cp312-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp312-cp312-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 5ac6cfaf0b414c8807188e02077f20f91524d7d9fa60ecf2d71b8ac9930a5fb5
MD5 e962b32a58dd05bda49c4cc73c222853
BLAKE2b-256 fe45b729769a446f182a43fa0aacb8bb470925a0f77b2def3700483d64cd102d

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: opteryx-0.23.0-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 4.3 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for opteryx-0.23.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 619a7c2f42dfe64d4e3312c9da27afda5845e041afde9f83696da39a8b80ee8d
MD5 476dd401fa68e4c7127e3d133da407e1
BLAKE2b-256 4f0080df2040da74612b4933c2e4fb81f84fe0d4c72dfff40bec180694489cb5

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 a8b1a60aaf873669e2d9d1b980137f3a2134df8bacac8ae8247b7327a1586884
MD5 aa63f964740791ad412e3bceafcddfee
BLAKE2b-256 45c431ec6797a92abfc7cfede016bfb05b196b80371fa81eb2a51b58c29621b1

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp311-cp311-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp311-cp311-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 9bcf1538a16c89f3a2b19187f92a010677914e0445b71ba94cafa7a05770c7d9
MD5 db194cca47aaef59bedecaac491f6ba6
BLAKE2b-256 0a661a0af4171a24d3f4ccd1d3e0434ce8ce956d2a867a3adec1ccfc904a7717

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: opteryx-0.23.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 4.3 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for opteryx-0.23.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 1815748ac82fa3f48ea06b34541ef295581bc3a5df9e03859c0c5c2315c5ef6f
MD5 a7f14e22b8cf76428c53b84f67064810
BLAKE2b-256 5653fcfb1d60bdd615816ae4cf1e9dc402c003001ec3676c1460c013e713b9ec

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 fe2d3b3e3610ef8e2da70106cfd53004105f1138164f449f16bf2888ff1eb1e3
MD5 76f5ca1492db7b2d77b08c1bfceb6797
BLAKE2b-256 b01ffd7066143b4f8ad494dd25dc74575d3a1bdbfd74b89a27a8edc444cf83ff

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp310-cp310-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp310-cp310-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 e1b8b6390c5c919091839a15ab0d0a5b557776b38316a1c9721acb844cfd3a18
MD5 d2bad0997ce32b7e9dd67976a42d26fc
BLAKE2b-256 83e3e56538a5101836a5b80ce9e0cd8ee57cdc57abf5a95b795502c1e2c37352

See more details on using hashes here.

File details

Details for the file opteryx-0.23.0-cp39-cp39-macosx_10_15_universal2.whl.

File metadata

File hashes

Hashes for opteryx-0.23.0-cp39-cp39-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 9b1621073bcbeaa0cebbd0a45f5dc4f20c0fe2babe336519667272e26bd97993
MD5 0962b5cdd2fc47ef19e6b0ec6adbcd68
BLAKE2b-256 d8d3391306da2e87449c66d8640797007817b4cfc2c8fbaad2e39ba7ea981cd4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page