OSS Implementation of a DataJunction Query Service
Project description
DataJunction Query Service
This repository (DJQS) is an open source implementation of a DataJunction query service. It allows you to create catalogs and engines that represent sqlalchemy connections. Configuring a DJ server to use a DJQS server allows DJ to query any of the database technologies supported by sqlalchemy.
Quickstart
To get started, clone this repo and start up the docker compose environment.
git clone https://github.com/DataJunction/djqs
cd djqs
docker compose up
Creating Catalogs
Catalogs can be created using the POST /catalogs/
endpoint.
curl -X 'POST' \
'http://localhost:8001/catalogs/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "djdb"
}'
Creating Engines
Engines can be created using the POST /engines/
endpoint.
curl -X 'POST' \
'http://localhost:8001/engines/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "sqlalchemy-postgresql",
"version": "15.2",
"uri": "postgresql://dj:dj@postgres-roads:5432/djdb"
}'
Engines can be attached to existing catalogs using the POST /catalogs/{name}/engines/
endpoint.
curl -X 'POST' \
'http://localhost:8001/catalogs/djdb/engines/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[
{
"name": "sqlalchemy-postgresql",
"version": "15.2"
}
]'
Executing Queries
Queries can be submitted to DJQS for a specified catalog and engine.
curl -X 'POST' \
'http://localhost:8001/queries/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"catalog_name": "djdb",
"engine_name": "sqlalchemy-postgresql",
"engine_version": "15.2",
"submitted_query": "SELECT * from roads.repair_orders",
"async_": false
}'
Async queries can be submitted as well.
curl -X 'POST' \
'http://localhost:8001/queries/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"catalog_name": "djdb",
"engine_name": "sqlalchemy-postgresql",
"engine_version": "15.2",
"submitted_query": "SELECT * from roads.repair_orders",
"async_": true
}'
response
{
"catalog_name": "djdb",
"engine_name": "sqlalchemy-postgresql",
"engine_version": "15.2",
"id": "<QUERY ID HERE>",
"submitted_query": "SELECT * from roads.repair_orders",
"executed_query": null,
"scheduled": null,
"started": null,
"finished": null,
"state": "ACCEPTED",
"progress": 0,
"results": [],
"next": null,
"previous": null,
"errors": []
}
The query id provided in the response can then be used to check the status of the running query and get the results once it’s completed.
curl -X 'GET' \
'http://localhost:8001/queries/<QUERY ID HERE>/' \
-H 'accept: application/json'
response
{
"catalog_name": "djdb",
"engine_name": "sqlalchemy-postgresql",
"engine_version": "15.2",
"id": "$QUERY_ID",
"submitted_query": "SELECT * from roads.repair_orders",
"executed_query": "SELECT * from roads.repair_orders",
"scheduled": "2023-02-28T07:27:55.367162",
"started": "2023-02-28T07:27:55.367387",
"finished": "2023-02-28T07:27:55.502412",
"state": "FINISHED",
"progress": 1,
"results": [
{
"sql": "SELECT * from roads.repair_orders",
"columns": [...],
"rows": [...],
"row_count": 25
}
],
"next": null,
"previous": null,
"errors": []
}
Reflection
If running a [reflection service](https://github.com/DataJunction/djrs), that service can leverage the
POST /table/{table}/columns/
endpoint of DJQS to get column names and types for a given table.
curl -X 'GET' \
'http://localhost:8001/table/djdb.roads.repair_orders/columns/?engine=sqlalchemy-postgresql&engine_version=15.2' \
-H 'accept: application/json'
response
{
"name": "djdb.roads.repair_orders",
"columns": [
{
"name": "repair_order_id",
"type": "INT"
},
{
"name": "municipality_id",
"type": "STR"
},
{
"name": "hard_hat_id",
"type": "INT"
},
{
"name": "order_date",
"type": "DATE"
},
{
"name": "required_date",
"type": "DATE"
},
{
"name": "dispatched_date",
"type": "DATE"
},
{
"name": "dispatcher_id",
"type": "INT"
}
]
}
DuckDB
DJQS includes an example of using DuckDB as an engine and it comes preloaded with the roads example database.
Create a djduckdb
catalog and a duckdb
engine.
curl -X 'POST' \
'http://localhost:8001/catalogs/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "djduckdb"
}'
curl -X 'POST' \
'http://localhost:8001/engines/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "duckdb",
"version": "0.7.1",
"uri": "duckdb://local[*]"
}'
curl -X 'POST' \
'http://localhost:8001/catalogs/djduckdb/engines/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[
{
"name": "duckdb",
"version": "0.7.1"
}
]'
Now you can submit DuckDB SQL queries.
curl -X 'POST' \
'http://localhost:8001/queries/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"catalog_name": "djduckdb",
"engine_name": "duckdb",
"engine_version": "0.7.1",
"submitted_query": "SELECT * FROM roads.us_states LIMIT 10",
"async_": false
}'
Spark
DJQS includes an example of using Spark as an engine. To try it, start up the docker compose environment and then load the example roads database into Spark.
docker exec -it djqs /bin/bash -c "python /code/docker/spark_load_roads.py"
Next, create a djspark
catalog and a spark
engine.
curl -X 'POST' \
'http://localhost:8001/catalogs/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "djspark"
}'
curl -X 'POST' \
'http://localhost:8001/engines/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "spark",
"version": "3.3.2",
"uri": "spark://local[*]"
}'
curl -X 'POST' \
'http://localhost:8001/catalogs/djspark/engines/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[
{
"name": "spark",
"version": "3.3.2"
}
]'
Now you can submit Spark SQL queries.
curl -X 'POST' \
'http://localhost:8001/queries/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"catalog_name": "djspark",
"engine_name": "spark",
"engine_version": "3.3.2",
"submitted_query": "SELECT * FROM roads.us_states LIMIT 10",
"async_": false
}'
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file datajunction_query-0.0.1a67.tar.gz
.
File metadata
- Download URL: datajunction_query-0.0.1a67.tar.gz
- Upload date:
- Size: 140.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | beb7b8d74c45bd053726d35c6b5a2f0fe398c97a52457fdefc60a0af5ba43ff5 |
|
MD5 | 1e4831c408a7e1aaa2d20b8914bc611d |
|
BLAKE2b-256 | 970957cb750219037603bc2857eaa43a48988f6d5294f751f16b1a4931948c52 |
File details
Details for the file datajunction_query-0.0.1a67-py3-none-any.whl
.
File metadata
- Download URL: datajunction_query-0.0.1a67-py3-none-any.whl
- Upload date:
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | de776e85061ebeb818cf0eaec528de25648f8601f292f31354c8e110185edc80 |
|
MD5 | a010fbd74d00152e28ae55c10fc6b2d2 |
|
BLAKE2b-256 | 460a83ef07f6526745458bb4ed0913b1cdf31c7bcc268391c591c01feeb0da9b |