Python SQL Query Engine

Opteryx

Query your data, where it lives.

A unified SQL interface to unlock insights across your diverse data sources, from blob stores to databases - effortless cross-platform data analytics.

Resource       Location
Source Code    https://github.com/mabel-dev/opteryx
Documentation  https://opteryx.dev/
Download       https://pypi.org/project/opteryx/

What is Opteryx?

Opteryx champions the SQL-on-everything approach, streamlining cross-platform data analytics by federating SQL queries across diverse data sources, including database systems like Postgres and datalake file formats like Parquet. The goal is to enhance your data analytics process by offering a unified way to access data from across your organization.

Opteryx is a Python library that combines elements of in-process database engines like SQLite and DuckDB with federative features found in systems like Presto and Trino. The result is a versatile tool for querying data across multiple data sources in a seamless fashion.

Opteryx offers the following features:

  • SQL queries on data files generated by other processes, such as logs
  • A command-line tool for filtering, transforming, and combining files
  • Integration with familiar tools like pandas and Polars
  • Embeddable as a low-cost engine, enabling portability and allowing for hundreds of analysts to leverage ad hoc databases with ease
  • Unified and federated access to data on disk, in the cloud, and in on-premises databases, not only through the same interface but in the same query

How Does it Work?

Opteryx processes queries by first determining the appropriate query language to interact with different downstream data platforms. It translates your query into SQL, CQL, or another suitable format for document stores like MongoDB, based on the data source. This enables Opteryx to efficiently retrieve the necessary data from systems such as MySQL or MongoDB to respond to your query.
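As a rough illustration of the translation step described above, the sketch below renders one logical predicate into two backend-specific forms: a parameterised SQL statement and a MongoDB-style filter document. `Filter`, `to_sql`, and `to_mongo` are hypothetical names invented for this example; they are not part of Opteryx, and the real engine is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """A single logical predicate: <column> <op> <value> (hypothetical type)."""
    column: str
    op: str  # one of "=", ">", "<"
    value: object


def to_sql(table: str, flt: Filter) -> tuple:
    """Render the predicate as a parameterised SQL statement plus its parameters."""
    return f"SELECT * FROM {table} WHERE {flt.column} {flt.op} ?", (flt.value,)


# Mapping from SQL comparison operators to MongoDB query operators.
_MONGO_OPS = {"=": "$eq", ">": "$gt", "<": "$lt"}


def to_mongo(flt: Filter) -> dict:
    """Render the same predicate as a MongoDB-style filter document."""
    return {flt.column: {_MONGO_OPS[flt.op]: flt.value}}


# One logical predicate, two backend-specific renderings.
flt = Filter("mass", ">", 5.0)
sql, params = to_sql("planets", flt)
mongo_filter = to_mongo(flt)
print(sql, params)
print(mongo_filter)
```

The point of the sketch is only that the same logical request can be expressed in whichever language the downstream platform understands.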

Why Use Opteryx?

Familiar Interface

Opteryx supports key parts of the Python DBAPI and SQL92 standards, which many analysts and engineers will already know how to use.
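For readers who have not met the Python DBAPI (PEP 249), the connect, cursor, execute, fetch shape it defines looks roughly like the sketch below. This uses `sqlite3` from the standard library purely as a stand-in driver; which parts of the DBAPI Opteryx implements is not spelled out here, so treat this as an illustration of the pattern rather than of Opteryx's own objects.

```python
import sqlite3  # stand-in PEP 249 driver from the standard library

# Open a connection, obtain a cursor, execute a statement, fetch a row.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 4 * 7")
row = cur.fetchone()
conn.close()

print(row)
```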

Consistent Syntax

Opteryx creates a common SQL-layer over multiple data platforms, allowing backend systems to be upgraded, migrated or consolidated without changing any Opteryx code.

Where possible, errors and warnings returned by Opteryx help the user to understand how to fix their statement to reduce time-to-success for even novice SQL users.

Consumption-Based Billing Friendly

Opteryx is well-suited for deployments to environments which are pay-as-you-use, like Google Cloud Run. It is a good fit for situations where you have low-volume usage, or multiple environments, where the costs of many traditional database deployments can quickly add up.

Python Ecosystem

Opteryx is open source Python. It quickly and easily integrates into Python code, including Jupyter Notebooks, so you can start querying your data within a few minutes. Opteryx integrates with many of your favorite Python data tools: you can use Opteryx to run SQL against pandas and Polars DataFrames, and even execute a JOIN on an in-memory DataFrame and a remote SQL dataset.

Time Travel

Designed for data analytics in environments where decisions need to be replayable, Opteryx allows you to query data as it was at a point in time in the past, so you can replay decision algorithms against facts as they were known at that time. You can even self-join tables with their historic data, which is great for finding deltas in datasets over time. (Data must be structured to enable temporal queries.)
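To make the idea concrete, here is a minimal conceptual sketch of "as at" queries and snapshot deltas. It uses `sqlite3` from the standard library and a made-up `prices` table with a `snapshot_date` column; the table, the data, and the `as_at` helper are all assumptions for illustration, and none of this is Opteryx's temporal syntax or storage layout.

```python
import sqlite3

# Hypothetical dataset: one full snapshot of prices per date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (snapshot_date TEXT, item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("2024-01-01", "widget", 10.0),
        ("2024-01-01", "gadget", 20.0),
        ("2024-02-01", "widget", 12.0),
        ("2024-02-01", "gadget", 20.0),
    ],
)


def as_at(date: str) -> dict:
    """Return item -> price from the latest snapshot taken on or before `date`."""
    rows = conn.execute(
        """
        SELECT item, price FROM prices
        WHERE snapshot_date = (
            SELECT MAX(snapshot_date) FROM prices WHERE snapshot_date <= ?
        )
        """,
        (date,),
    ).fetchall()
    return dict(rows)


# Self-join the table against an earlier snapshot to find what changed.
deltas = conn.execute(
    """
    SELECT later.item, earlier.price, later.price
    FROM prices AS later
    JOIN prices AS earlier ON earlier.item = later.item
    WHERE earlier.snapshot_date = '2024-01-01'
      AND later.snapshot_date = '2024-02-01'
      AND earlier.price <> later.price
    """
).fetchall()

print(as_at("2024-01-15"))
print(deltas)
```

The sketch works only because every row records which snapshot it belongs to, which is the kind of structure the note about temporal queries refers to.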

Fast

Benchmarks on an M2 Pro Mac ran an ad hoc GROUP BY over a 6 million row Parquet file via the CLI in about a quarter of a second from a cold start (no caching and predefined schema). Different systems will have different performance characteristics.

Instant Elasticity

Designed to run in Knative and similar environments like Google Cloud Run, Opteryx can scale down to zero, and scale up to respond to thousands of concurrent queries within seconds.

Bring your own Data

Opteryx supports multiple query engines, dataframe APIs and storage formats. You can mix-and-match sources in a single query. Opteryx can even JOIN datasets stored in different formats and different platforms in the same query, such as Parquet and MySQL.

Opteryx allows you to query your data directly in the systems where they are stored, eliminating the need to duplicate data into a common store for analytics. This saves you the cost and effort of maintaining duplicates.

Opteryx can push parts of your query to the source query engine, allowing queries to run at the speed of the backend, rather than your local computer.
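The difference pushdown makes can be sketched with any SQL backend. The example below uses `sqlite3` from the standard library as a stand-in source and a small `planets` table; it is a conceptual illustration of filtering at the source versus filtering locally, not a description of how Opteryx plans queries.

```python
import sqlite3

# Stand-in backend holding a small planets table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planets (name TEXT, mass REAL)")
conn.executemany(
    "INSERT INTO planets VALUES (?, ?)",
    [
        ("Mercury", 0.33),
        ("Venus", 4.87),
        ("Earth", 5.97),
        ("Mars", 0.642),
        ("Jupiter", 1898.0),
    ],
)

# Without pushdown: every row is transferred, then filtered in the local process.
all_rows = conn.execute("SELECT name, mass FROM planets").fetchall()
filtered_locally = sorted(row for row in all_rows if row[1] > 5.0)

# With pushdown: the backend applies the filter, so only matching rows are transferred.
filtered_at_source = sorted(
    conn.execute("SELECT name, mass FROM planets WHERE mass > ?", (5.0,)).fetchall()
)

print(len(all_rows), "rows transferred without pushdown")
print(len(filtered_at_source), "rows transferred with pushdown")
```

Both approaches produce the same answer; the pushed-down version simply moves less data and lets the source do the work.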

And if there's not a connector in the box for your data platform, feel free to submit a pull request to add one.

Install

Installing from PyPI is recommended.

pip install opteryx

To build Opteryx from source, refer to the contribution guides.

Opteryx installs with a small set of libraries it needs for core functionality, such as NumPy, PyArrow, and orjson. Some features require additional libraries to be installed; you are notified of these libraries as they are required.
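One common way to implement this kind of "tell me when I need it" behaviour is a lazy import that raises a helpful error. The sketch below shows that general pattern using only the standard library; `require` is a hypothetical helper written for this example, and it is not taken from Opteryx's code.

```python
import importlib


def require(module_name: str, feature: str):
    """Import an optional dependency on demand, or explain how to install it.

    Hypothetical helper illustrating the lazy-import pattern.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{feature}' needs the optional package '{module_name}'; "
            f"install it with: pip install {module_name}"
        ) from err


# A module that is present imports normally.
json_module = require("json", "JSON parsing")
print(json_module.dumps({"ok": True}))
```

With this pattern the core install stays small, and a missing library is reported only when a feature that depends on it is actually used.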

Examples

Filter a Dataset on the Command Line

In this example, we are running Opteryx from the command line to filter one of the internal example datasets and display the results on the console.

python -m opteryx "SELECT * FROM \$astronauts WHERE 'Apollo 11' IN UNNEST(missions);"

this example is complete and should run as-is

Execute a Simple Query in Python

In this example, we are showing the basic usage of the Python API by executing a simple query that makes no references to any datasets.

# Import the Opteryx SQL query engine library.
import opteryx

# Execute a SQL query to evaluate the expression 4 * 7.
# The result is stored in the 'result' variable.
result = opteryx.query("SELECT 4 * 7;")

# Display the first row(s) of the result to verify the query executed correctly.
result.head()
ID 4 * 7
1 28

this example is complete and should run as-is

Execute SQL on a pandas DataFrame

In this example, we are running a SQL statement on a pandas DataFrame and returning the result as a new pandas DataFrame.

# Required imports
import opteryx
import pandas

# Read data from the exoplanets.csv file hosted on Google Cloud Storage
# The resulting DataFrame is stored in the variable `pandas_df`.
pandas_df = pandas.read_csv("https://storage.googleapis.com/opteryx/exoplanets/exoplanets.csv")

# Register the pandas DataFrame with Opteryx under the alias "exoplanets"
# This makes the DataFrame available for SQL-like queries.
opteryx.register_df("exoplanets", pandas_df)

# Perform an SQL query to group the data by `koi_disposition` and count the number
# of occurrences of each distinct `koi_disposition`.
# The result is stored in `aggregated_df`.
aggregated_df = opteryx.query("SELECT koi_disposition, COUNT(*) FROM exoplanets GROUP BY koi_disposition;").pandas()

# Display the aggregated DataFrame to get a preview of the result.
aggregated_df.head()
  koi_disposition  COUNT(*)
0       CONFIRMED      2293
1  FALSE POSITIVE      5023
2       CANDIDATE      2248 

this example is complete and should run as-is

Query Data on Local Disk

In this example, we are querying and filtering a file directly. This example will not run as written because the file being queried does not exist.

# Import the Opteryx query engine.
import opteryx

# Execute a SQL query to select the first 5 rows from the 'space_missions.parquet' table.
# The result will be stored in the 'result' variable.
result = opteryx.query("SELECT * FROM 'space_missions.parquet' LIMIT 5;")

# Display the result.
# This is useful for quick inspection of the data.
result.head()
ID Company Location Price Launched_at Rocket Rocket_Status Mission Mission_Status
0 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-10-04 19:28:00 Sputnik 8K71PS Retired Sputnik-1 Success
1 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-11-03 02:30:00 Sputnik 8K71PS Retired Sputnik-2 Success
2 US Navy LC-18A, Cape Canaveral AFS, Fl null 1957-12-06 16:44:00 Vanguard Retired Vanguard TV3 Failure
3 AMBA LC-26A, Cape Canaveral AFS, Fl null 1958-02-01 03:48:00 Juno I Retired Explorer 1 Success
4 US Navy LC-18A, Cape Canaveral AFS, Fl null 1958-02-05 07:33:00 Vanguard Retired Vanguard TV3BU Failure

this example requires a data file, space_missions.parquet.

Query Data in SQLite

In this example, we are querying a SQLite database via Opteryx. This example will not run as written because the file being queried does not exist.

# Import the Opteryx query engine and the SqlConnector from its connectors module.
import opteryx
from opteryx.connectors import SqlConnector

# Register a new data store with the prefix "sql", specifying the SQL Connector to handle it.
# This allows queries with the 'sql' prefix to be routed to the appropriate SQL database.
opteryx.register_store(
    prefix="sql",  # Prefix for distinguishing this particular store
    connector=SqlConnector,  # Specify the connector to handle queries for this store
    remove_prefix=True,  # Remove the prefix from the table name when querying SQLite
    connection="sqlite:///database.db",  # SQLAlchemy connection string for the SQLite database
)

# Execute a SQL query to select specified columns from the 'planets' table in the SQL store,
# limiting the output to 5 rows. The result is stored in the 'result' variable.
result = opteryx.query("SELECT name, mass, diameter, density FROM sql.planets LIMIT 5;")

# Display the result.
# This is useful for quickly verifying that the query executed correctly.
result.head()
ID name mass diameter density
1 Mercury 0.33 4879 5427
2 Venus 4.87 12104 5243
3 Earth 5.97 12756 5514
4 Mars 0.642 6792 3933
5 Jupiter 1898.0 142984 1326

this example requires a data file, database.db.

Query Data on GCS

In this example, we are querying a dataset on GCS in a public bucket called 'opteryx'.

# Import the Opteryx query engine and the GcpCloudStorageConnector from its connectors module.
import opteryx
from opteryx.connectors import GcpCloudStorageConnector

# Register a new data store named 'opteryx', specifying the GcpCloudStorageConnector to handle it.
# This allows queries for this particular store to be routed to the appropriate GCP Cloud Storage bucket.
opteryx.register_store(
    "opteryx",  # Name of the store to register
    GcpCloudStorageConnector  # Connector to handle queries for this store
)

# Execute a SQL query to select all columns from the 'space_missions' table located in the 'opteryx' store,
# and limit the output to 5 rows. The result is stored in the 'result' variable.
result = opteryx.query("SELECT * FROM opteryx.space_missions LIMIT 5;")

# Display the result.
# This is useful for quickly verifying that the query executed correctly.
result.head()
ID Company Location Price Launched_at Rocket Rocket_Status Mission Mission_Status
0 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-10-04 19:28:00 Sputnik 8K71PS Retired Sputnik-1 Success
1 RVSN USSR Site 1/5, Baikonur Cosmodrome, null 1957-11-03 02:30:00 Sputnik 8K71PS Retired Sputnik-2 Success
2 US Navy LC-18A, Cape Canaveral AFS, Fl null 1957-12-06 16:44:00 Vanguard Retired Vanguard TV3 Failure
3 AMBA LC-26A, Cape Canaveral AFS, Fl null 1958-02-01 03:48:00 Juno I Retired Explorer 1 Success
4 US Navy LC-18A, Cape Canaveral AFS, Fl null 1958-02-05 07:33:00 Vanguard Retired Vanguard TV3BU Failure

this example is complete and should run as-is


You can also try Opteryx right now using our interactive labs on Binder.

Community

Discord X Follow Medium

Get Involved

  • Star this repo
  • Contribute — join us in building Opteryx, through writing code, or inspiring others to use it.
  • Let us know your ideas, how you are using Opteryx, or report a bug or feature request.
  • See the contributor documentation for Opteryx. It's easy to get started, and we're really friendly if you need any help!
  • If you're interested in contributing to the code now, check out GitHub issues. Feel free to ask questions or open a draft PR.

Security

See the project Security Policy for information about reporting vulnerabilities.

License

Opteryx is licensed under Apache 2.0 except where specific modules note otherwise.

Status

Opteryx is in beta. Beta means different things to different people; to us, being beta means:

  • Core functionality has good regression test coverage to help ensure stability
  • Some edge cases may have undetected bugs
  • Performance tuning is incomplete
  • Changes are focused on feature completion, bugs, performance, reducing debt, and security
  • Code structure and APIs are not stable and may change

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

opteryx-0.18.1.tar.gz (1.6 MB)

Uploaded Source

Built Distributions

opteryx-0.18.1-cp312-cp312-win_amd64.whl (3.2 MB)

Uploaded CPython 3.12 Windows x86-64

opteryx-0.18.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

opteryx-0.18.1-cp312-cp312-macosx_10_15_universal2.whl (6.1 MB)

Uploaded CPython 3.12 macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.18.1-cp311-cp311-win_amd64.whl (3.2 MB)

Uploaded CPython 3.11 Windows x86-64

opteryx-0.18.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

opteryx-0.18.1-cp311-cp311-macosx_10_15_universal2.whl (6.1 MB)

Uploaded CPython 3.11 macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.18.1-cp310-cp310-win_amd64.whl (3.2 MB)

Uploaded CPython 3.10 Windows x86-64

opteryx-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.3 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

opteryx-0.18.1-cp310-cp310-macosx_10_15_universal2.whl (6.1 MB)

Uploaded CPython 3.10 macOS 10.15+ universal2 (ARM64, x86-64)

opteryx-0.18.1-cp39-cp39-win_amd64.whl (3.2 MB)

Uploaded CPython 3.9 Windows x86-64

opteryx-0.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.3 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

opteryx-0.18.1-cp39-cp39-macosx_10_15_universal2.whl (6.1 MB)

Uploaded CPython 3.9 macOS 10.15+ universal2 (ARM64, x86-64)

File details

Details for the file opteryx-0.18.1.tar.gz.

File metadata

  • Download URL: opteryx-0.18.1.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for opteryx-0.18.1.tar.gz
Algorithm Hash digest
SHA256 bfafec3ce2dce44b2c5a1342bd31eafd3238123fe57a81689d8862357363c52f
MD5 cd7440f2167d837b6cc9b11a814ac208
BLAKE2b-256 19c41ce1f83e0c2144bdb7f56b7b2b007e83c5d161e2aac19d42ca8f13bcec10

See more details on using hashes here.

File details

Details for the file opteryx-0.18.1-cp312-cp312-win_amd64.whl.

File hashes

Hashes for opteryx-0.18.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 901a13a15ddcca66ba02575119400e5d2432d1da745ab13728e4c772f3917d70
MD5 c3442ed587c09ec99c533704e1a5c893
BLAKE2b-256 00691e0508549737171554882d717d89fb7bdf8db790d03b6b5b2f7ee3a0281b

File details

Details for the file opteryx-0.18.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for opteryx-0.18.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3d0650dfbf8f56a1427f6413057933caf71162d58d938616cdaf4e89d3e16a1e
MD5 130a22a8992e380aed648c48de2bec2e
BLAKE2b-256 1d2f5fb1ddeaa7359bd9b5c97860ca541dcc6b34b9f2675c53cd2661e090659b

File details

Details for the file opteryx-0.18.1-cp312-cp312-macosx_10_15_universal2.whl.

File hashes

Hashes for opteryx-0.18.1-cp312-cp312-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 06059a8443fd61eafd57939b2e4e6166e0a16f24fce588559728601a9b6dc172
MD5 d71dc5525f50808b6bd64839a68db38e
BLAKE2b-256 1266d9648ed36d0164a2a96e885dc8c65246f043f750e0e5bcf99c4eb480a201

File details

Details for the file opteryx-0.18.1-cp311-cp311-win_amd64.whl.

File hashes

Hashes for opteryx-0.18.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 4f454031b697530372b12a7ec26e792c5e4403c5200cbc58d57f71b924f58c44
MD5 af1c06374c7472b5feb59c4214ca50a9
BLAKE2b-256 6f1541247d9401b2b3619543d5b036bcec2c28bf5dbf80206a8759495f7a35de

File details

Details for the file opteryx-0.18.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for opteryx-0.18.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7f9535a085ba0c8406f838e673c5b8910162c8744a9b0d0b48e111e4d6b9e58e
MD5 a04f0868f242ea1f7f1939ba829432e1
BLAKE2b-256 0804ffc37ad1cd8bcaa616732860d761095b4bd6e0f48248c83f15058ae0318b

File details

Details for the file opteryx-0.18.1-cp311-cp311-macosx_10_15_universal2.whl.

File hashes

Hashes for opteryx-0.18.1-cp311-cp311-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 2d5389fff51f929cd749be84ebfaead45311c46d8373594cb64e82f7b492439e
MD5 28222bfedbbae897e358e9ec11be05d0
BLAKE2b-256 be319249ab84d00c51afdc6cd120d0a4d50109816a25c500454db324e63a62e9

File details

Details for the file opteryx-0.18.1-cp310-cp310-win_amd64.whl.

File hashes

Hashes for opteryx-0.18.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 cb61b231da22a1de05a55f766fa4fded4aae3927e56cbfc74af6db2764ec789f
MD5 0d57565cdad2c3c82eaa8ee75dfad526
BLAKE2b-256 54a67a32e3e32f00aa6dc7552cc1f8cf4e44f8bc9e08133ec8aa24e451b8f0be

File details

Details for the file opteryx-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for opteryx-0.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 50728b2549381f7ced66f6a16ea9b53466db933acad4b570b573b99724896b6e
MD5 098b2529e35de4985fdcbc482ccfc863
BLAKE2b-256 0770e646f19460d6be7d999a5a6ea80ce2f82cc4346149aa8f0781af9a00cd6c

File details

Details for the file opteryx-0.18.1-cp310-cp310-macosx_10_15_universal2.whl.

File hashes

Hashes for opteryx-0.18.1-cp310-cp310-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 bfc84636541d8b18e440b6e5216728575ebef73a95516da698dfb37484155b2a
MD5 e7ab951dc00fb00ce095c96646943e02
BLAKE2b-256 f9d3d58a5c8650a967544eadb724b730fea74be444346b2af51995f026719171

File details

Details for the file opteryx-0.18.1-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: opteryx-0.18.1-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for opteryx-0.18.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 c5d17fc1225d6d6a681c73a3c987db3d089d331aaace52a1578159323b191f71
MD5 6ae84db7ae0d6a7b5c02b1d2e54f8158
BLAKE2b-256 9656fb8df2b47076616969c756742f14d1859562fd3d9f05c1009a65d9949ac3

File details

Details for the file opteryx-0.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for opteryx-0.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 53b150c0bfb062bbe172baa7127c47c60a4308cc70d857b6fc752fe3d08cf128
MD5 8fbc194fa4d873a2ea82b4938084c9b6
BLAKE2b-256 3a880df665c20a232023fcb4f63ed61d43ee6535f33e532b3708473467ad082e

File details

Details for the file opteryx-0.18.1-cp39-cp39-macosx_10_15_universal2.whl.

File hashes

Hashes for opteryx-0.18.1-cp39-cp39-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 190406dedcc39fec735412f7274b57e19d20cd24c84795f8f3a58ff3087418c7
MD5 d62202b492ae43126c9faa5758b257e7
BLAKE2b-256 4d081289d72080754ef3b9719735308ad98757f47ca804c81c005f0042165a0a
