Python SQL Query Engine
Project description
Query your data, where it lives.
Opteryx is a SQL Engine designed for embedded and cloud-native environments, and with command-line skills.
Documentation | Examples | Contributing
What is Opteryx?
Opteryx is a Python library for data wrangling and analytics. It creates a common interface to interact with various data platforms, enabling users to transform and explore data anywhere.
- Using SQL to query data files written by another process - such as logs.
- As a command line tool - Run SQL directly on files - bring the power and flexibility of SQL to filter, transform and combine files, or as a command line viewer and converter for Parquet, ORC or Avro format files.
- As an embeddable engine - a low-cost option to allow hundreds of analysts to each have part-time databases.
- Adding SQL to existing familar tools like pandas and Polars, Executing SQL against and returning result sets as Polars and pandas DataFrames
Why Use Opteryx?
Familiar Interface
Opteryx supports key parts of the Python ADODB and SQL92 standard standards which many analysts and engineers will already know how to use.
Consistent Syntax
Opteryx creates a common SQL-layer over multiple data platforms, allowing backend systems to be upgraded, migrated or consolidated without changing any Opteryx code.
Bring your own Data
Opteryx supports multiple query engines, dataframe APIs and storage formats. You can mix-and-match sources in a single query, so one dataset can be in Parquet, and another MySQL, and Opteryx is able to JOIN across them.
Opteryx queries your data in the systems you store them in saving you from the cost and effort of maintaining duplicates your data into a common store for analytics.
Opteryx is able to push parts of your query to the source query engine, meaning queries run at the speed of the backend, not your local computer.
Consumption-Based Billing Friendly
Opteryx is well-suited for deployments to environments which are pay-as-you-use, like Google Cloud Run. Great for situations where you low-volume usage, or many environments, where the costs of many traditional database deployment can quickly add up.
Python Native
Opteryx is Open Source Python, it quickly and easily integrates into Python code, including Jupyter Notebooks, so you can start querying your data within a few minutes. You can even use Opteryx to run SQL against pandas DataFrames, and even execute a join with an in-memory DataFrame with a remote dataset.
Time Travel
Designed for data analytics in environments where decisions need to be replayable, Opteryx allows you to query data as at a point in time in the past to replay decision algorithms against facts as they were known in the past. (data must be structured to enable temporal queries)
Fast
Benchmarks on M1 Pro Mac running an ad hoc GROUP BY
over 1Gb of data via the CLI in 1/5th of a second. (different systems will have different performance characteristics)
Rows | Columns | File Size | Query Time |
---|---|---|---|
561225 | 81 | 1Gb | 0.22sec |
1064539 | 81 | 2Gb | 0.27sec |
Instant Elasticity
Designed to run in Knative and similar environments like Google Cloud Run, Opteryx can scale down to zero, and scale up to respond to thousands of concurrent queries within seconds.
Examples
Install from PyPI
Filter a Dataset on the Command Line
Execute a Simple Query in Python
Execute SQL on a pandas DataFrame
Query Data on Local Disk
Query Data on GCS
Query Data in SQLite
Further Examples
Install from PyPI
pip install opteryx
Filter a Dataset on the Command Line
In this example, we are running Opteryx from the command line to filter one of the internal example datasets and display the results on the console.
python -m opteryx "SELECT * FROM \$astronauts WHERE 'Apollo 11' IN UNNEST(missions);"
this example is complete and should run as-is
Execute a Simple Query in Python
In this example, we are showing the basic usage of the Python API by executing a simple query that makes no references to any datasets.
import opteryx
result = opteryx.query("SELECT 4 * 7;")
result.head()
| 4.0*7.0
===#=========
0 | 28.0
this example is complete and should run as-is
Execute SQL on a pandas DataFrame
In this example, we are running a SQL statement on a pandas DataFrame and returning the result as a new pandas DataFrame.
import opteryx
import pandas
pandas_df = pandas.read_csv("https://storage.googleapis.com/opteryx/exoplanets/exoplanets.csv")
opteryx.register_df("exoplanets", pandas_df)
aggregated_df = opteryx.query("SELECT koi_disposition, COUNT(*) FROM exoplanets GROUP BY koi_disposition;").to_df()
aggregated_df.head()
koi_disposition COUNT(*)
0 CONFIRMED 2293
1 FALSE POSITIVE 5023
2 CANDIDATE 2248
this example is complete and should run as-is
Query Data on Local Disk
In this example, we are querying and filtering a file directly. This example will not run as written because the file being queried does not exist.
import opteryx
result = opteryx.query("SELECT * FROM 'space_missions.parquet' LIMIT 5;")
result.head()
| Company | Location | Price | Launched_at | Rocket | Rocket_Status | Mission | Mission_Status
=====#===========#================================#=======#=====================#================#===============#================#================
0 | RVSN USSR | Site 1/5, Baikonur Cosmodrome, | None | 1957-10-04 19:28:00 | Sputnik 8K71PS | Retired | Sputnik-1 | Success
1 | RVSN USSR | Site 1/5, Baikonur Cosmodrome, | None | 1957-11-03 02:30:00 | Sputnik 8K71PS | Retired | Sputnik-2 | Success
2 | US Navy | LC-18A, Cape Canaveral AFS, Fl | None | 1957-12-06 16:44:00 | Vanguard | Retired | Vanguard TV3 | Failure
3 | AMBA | LC-26A, Cape Canaveral AFS, Fl | None | 1958-02-01 03:48:00 | Juno I | Retired | Explorer 1 | Success
4 | US Navy | LC-18A, Cape Canaveral AFS, Fl | None | 1958-02-05 07:33:00 | Vanguard | Retired | Vanguard TV3BU | Failure
this example requires a data file, space_missions.parquet.
Query Data in SQLite
In this example, we are querying a SQLite database via Opteryx. This example will not run as written because the file being queried does not exist.
import opteryx
from opteryx.connectors import SqlConnector
# Register the store, so we know queries for the 'sql' store should be handled by
# the SQL Connector
opteryx.register_store(
prefix="sql",
connector=SqlConnector,
remove_prefix=True, # the prefix isn't part of the SQLite table name
connection="sqlite:///database.db" # SQLAlchemy connection string
)
result = opteryx.query("SELECT name, mass, diameter, density FROM sql.planets LIMIT 5;")
result.head()
│ name │ mass │ diameter │ density
═══#═════════#════════#══════════#════════
0 │ Mercury │ 0.33 │ 4879.0 │ 5427.0
1 │ Venus │ 4.87 │ 12104.0 │ 5243.0
2 │ Earth │ 5.97 │ 12756.0 │ 5514.0
3 │ Mars │ 0.642 │ 6792.0 │ 3933.0
4 │ Jupiter │ 1898.0 │ 142984.0 │ 1326.0
this example requires a data file, database.db.
Query Data on GCS
In this example, we are to querying a dataset on GCS in a public bucket called 'opteryx'.
import opteryx
from opteryx.connectors import GcpCloudStorageConnector
# Register the store, so we know queries for this store should be handled by
# the GCS connector
opteryx.register_store("opteryx", GcpCloudStorageConnector)
result = opteryx.query("SELECT * FROM opteryx.space_missions LIMIT 5;")
result.head()
| Company | Location | Price | Launched_at | Rocket | Rocket_Status | Mission | Mission_Status
=====#===========#================================#=======#=====================#================#===============#================#================
0 | RVSN USSR | Site 1/5, Baikonur Cosmodrome, | None | 1957-10-04 19:28:00 | Sputnik 8K71PS | Retired | Sputnik-1 | Success
1 | RVSN USSR | Site 1/5, Baikonur Cosmodrome, | None | 1957-11-03 02:30:00 | Sputnik 8K71PS | Retired | Sputnik-2 | Success
2 | US Navy | LC-18A, Cape Canaveral AFS, Fl | None | 1957-12-06 16:44:00 | Vanguard | Retired | Vanguard TV3 | Failure
3 | AMBA | LC-26A, Cape Canaveral AFS, Fl | None | 1958-02-01 03:48:00 | Juno I | Retired | Explorer 1 | Success
4 | US Navy | LC-18A, Cape Canaveral AFS, Fl | None | 1958-02-05 07:33:00 | Vanguard | Retired | Vanguard TV3BU | Failure
this example is complete and should run as-is
Further Examples
For prepared example usage, see the below Notebooks on Google Colab.
Community
How do I get Support?
For support ask our Gitter Community.
How Can I Contribute?
We are looking for volunteers to help build and direct Opteryx. If you are interested please use the Issues to let use know.
All contributions, bug reports, documentation improvements, enhancements, and ideas are welcome.
Want to help build Opteryx? See the Contribution and Set Up Guides.
Security
See the project Security Policy for information about reporting vulnerabilities.
License
Opteryx is licensed under Apache 2.0 unless otherwise noted.
Status
Opteryx is in beta. Beta means different things to different people, to us, being beta means:
- Core functionality has good regression test coverage to help ensure stability
- Some edge cases may have undetected bugs
- Performance tuning is incomplete
- Changes are focused on feature completion, bugs, performance, reducing debt, and security
- Code structure and APIs are not stable and may change
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for opteryx-0.9.3-cp311-cp311-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | bffa96dcde16f6e348866b8875a208f36a2a059e84765886da9f2f038acb092e |
|
MD5 | a919aa5ea3d7a7a36d149edc8b703dcb |
|
BLAKE2b-256 | 500db4f1fe30f374b8312e6b31afb7b08656457d3e5094df023d336270531f94 |
Hashes for opteryx-0.9.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ff393b62ec4e97539e9a85ce2f0821bc914c3cd59fc85a76fc8c654f6658487b |
|
MD5 | bcb82837d28717f977cc9cebcd129f0e |
|
BLAKE2b-256 | 31984c22c711312784a65c201db976cfca1192c251680d30df60a873544799d4 |
Hashes for opteryx-0.9.3-cp311-cp311-macosx_10_15_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 86e1d9638b0ac8900a238cd2c2827d6f2c2383b175c2bf31e15238c038cf284a |
|
MD5 | 465f5004dcef38e13038a49c6aa604ed |
|
BLAKE2b-256 | 1c35362705b5b9226d82759b542bee149f08a1410dd8043da4272466300c54da |
Hashes for opteryx-0.9.3-cp310-cp310-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2c1d1de69182abff02e6221d09d3c6745680ea1d93876594b65942b2eb881d96 |
|
MD5 | f2d02bf27abad9f9f6eb83a78187a131 |
|
BLAKE2b-256 | ca88c10c1e9fbf1e9ef9344163ef7d540525c2f64d15b6c52d34ef195a646dd8 |
Hashes for opteryx-0.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5851f127c090bce178b576dc9e631dd7771bd24a81ce8681e6dfb1057b5c2f22 |
|
MD5 | e7cf9fd681481417d4f30350ca9e3082 |
|
BLAKE2b-256 | ec3c189a461bad211f3c0b5171bb80d092fd1a395ce195d402289c426f704ff6 |
Hashes for opteryx-0.9.3-cp310-cp310-macosx_10_15_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 23205e987afb00a33dff4eed2843e310bb821768cbb8cd01842622e4f644b072 |
|
MD5 | 0cc9e16965779b71fceb67141f4181d8 |
|
BLAKE2b-256 | a7cc91fcb523ec5c7f0ba574bb4ec6a3e0479cf99159aea2f4198c2be755046c |
Hashes for opteryx-0.9.3-cp39-cp39-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 12fff5e1155d23900065487a352bb0f54e7618d52ef077c3779fc77d5b93eaea |
|
MD5 | 35afcc6c0e8fe4fb21f633b53e32de44 |
|
BLAKE2b-256 | 8707566db601a8ed7f91dfbd6c526ddffdd437091d504263be5732f9c0dca9ca |
Hashes for opteryx-0.9.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6a74fef408138ecce5a4994220b332c07af7390277ae636bcb5ae661ce6ce8e8 |
|
MD5 | 847275bd4f0c300bf05e0ad0083eb2e4 |
|
BLAKE2b-256 | 2de6d08b07c8056c4b8b82065a4eadc5138405a2a2763d9ae5bd4ee97b801976 |
Hashes for opteryx-0.9.3-cp39-cp39-macosx_10_15_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6190b4762f0db9ce092661ccf3bf355698a7efda0cb3e9786620a64cb27f1967 |
|
MD5 | 35be2feed04b9dbde1a12f7e345b8a65 |
|
BLAKE2b-256 | f7b836afaed6e23a0a06b8a6ca512d5e1df0066b5799022868cb8f6f238cb50d |
Hashes for opteryx-0.9.3-cp38-cp38-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1d282bf5fd226e50b52a522e532048d8b68ef383230c4825cfde94935abca72f |
|
MD5 | 818f2e2a27e6f91fc204dd91bd88e783 |
|
BLAKE2b-256 | 725b660d251eb4c45afddccc7917e0986c0e28c00b4cb24f24ceccb5ffae46ac |
Hashes for opteryx-0.9.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 41d6b358145dd8098e826669b1c96bfe6715335fc3a40b353d1cfe226307e1eb |
|
MD5 | 7f70326a3d74230614db4f35104e0726 |
|
BLAKE2b-256 | 5bc36f54d4582a49a64b3e4a2474548e89b3853f750744d5bd8f63555985b02e |
Hashes for opteryx-0.9.3-cp38-cp38-macosx_10_15_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1734423ac3fb4cbd1a67a10b77248c314c264823ef8510729d8a847494c9a8cc |
|
MD5 | dc466af3ddd903f0fc237128784e5a6f |
|
BLAKE2b-256 | 935841a2b806e47fc8d1d83d662f277406cc7265e29774afa9abcb838be7a665 |