Skip to main content

OSS Implementation of a DataJunction Query Service

Project description

DataJunction Query Service

This repository (DJQS) is an open source implementation of a DataJunction query service. It allows you to create catalogs and engines that represent sqlalchemy connections. Configuring a DJ server to use a DJQS server allows DJ to query any of the database technologies supported by sqlalchemy.

Quickstart

To get started, clone this repo and start up the docker compose environment.

git clone https://github.com/DataJunction/djqs
cd djqs
docker compose up

Creating Catalogs

Catalogs can be created using the POST /catalogs/ endpoint.

curl -X 'POST' \
  'http://localhost:8001/catalogs/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "djdb"
}'

Creating Engines

Engines can be created using the POST /engines/ endpoint.

curl -X 'POST' \
  'http://localhost:8001/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "sqlalchemy-postgresql",
  "version": "15.2",
  "uri": "postgresql://dj:dj@postgres-roads:5432/djdb"
}'

Engines can be attached to existing catalogs using the POST /catalogs/{name}/engines/ endpoint.

curl -X 'POST' \
  'http://localhost:8001/catalogs/djdb/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "name": "sqlalchemy-postgresql",
    "version": "15.2"
  }
]'

Executing Queries

Queries can be submitted to DJQS for a specified catalog and engine.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "submitted_query": "SELECT * from roads.repair_orders",
  "async_": false
}'

Async queries can be submitted as well.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "submitted_query": "SELECT * from roads.repair_orders",
  "async_": true
}'

response

{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "id": "<QUERY ID HERE>",
  "submitted_query": "SELECT * from roads.repair_orders",
  "executed_query": null,
  "scheduled": null,
  "started": null,
  "finished": null,
  "state": "ACCEPTED",
  "progress": 0,
  "results": [],
  "next": null,
  "previous": null,
  "errors": []
}

The query id provided in the response can then be used to check the status of the running query and get the results once it’s completed.

curl -X 'GET' \
  'http://localhost:8001/queries/<QUERY ID HERE>/' \
  -H 'accept: application/json'

response

{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "id": "$QUERY_ID",
  "submitted_query": "SELECT * from roads.repair_orders",
  "executed_query": "SELECT * from roads.repair_orders",
  "scheduled": "2023-02-28T07:27:55.367162",
  "started": "2023-02-28T07:27:55.367387",
  "finished": "2023-02-28T07:27:55.502412",
  "state": "FINISHED",
  "progress": 1,
  "results": [
    {
      "sql": "SELECT * from roads.repair_orders",
      "columns": [...],
      "rows": [...],
      "row_count": 25
    }
  ],
  "next": null,
  "previous": null,
  "errors": []
}

Reflection

If running a [reflection service](https://github.com/DataJunction/djrs), that service can leverage the POST /table/{table}/columns/ endpoint of DJQS to get column names and types for a given table.

curl -X 'GET' \
  'http://localhost:8001/table/djdb.roads.repair_orders/columns/?engine=sqlalchemy-postgresql&engine_version=15.2' \
  -H 'accept: application/json'

response

{
  "name": "djdb.roads.repair_orders",
  "columns": [
    {
      "name": "repair_order_id",
      "type": "INT"
    },
    {
      "name": "municipality_id",
      "type": "STR"
    },
    {
      "name": "hard_hat_id",
      "type": "INT"
    },
    {
      "name": "order_date",
      "type": "DATE"
    },
    {
      "name": "required_date",
      "type": "DATE"
    },
    {
      "name": "dispatched_date",
      "type": "DATE"
    },
    {
      "name": "dispatcher_id",
      "type": "INT"
    }
  ]
}

DuckDB

DJQS includes an example of using DuckDB as an engine and it comes preloaded with the roads example database.

Create a djduckdb catalog and a duckdb engine.

curl -X 'POST' \
  'http://localhost:8001/catalogs/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "djduckdb"
}'
curl -X 'POST' \
  'http://localhost:8001/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "duckdb",
  "version": "0.7.1",
  "uri": "duckdb://local[*]"
}'
curl -X 'POST' \
  'http://localhost:8001/catalogs/djduckdb/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "name": "duckdb",
    "version": "0.7.1"
  }
]'

Now you can submit DuckDB SQL queries.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djduckdb",
  "engine_name": "duckdb",
  "engine_version": "0.7.1",
  "submitted_query": "SELECT * FROM roads.us_states LIMIT 10",
  "async_": false
}'

Spark

DJQS includes an example of using Spark as an engine. To try it, start up the docker compose environment and then load the example roads database into Spark.

docker exec -it djqs /bin/bash -c "python /code/docker/spark_load_roads.py"

Next, create a djspark catalog and a spark engine.

curl -X 'POST' \
  'http://localhost:8001/catalogs/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "djspark"
}'
curl -X 'POST' \
  'http://localhost:8001/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "spark",
  "version": "3.3.2",
  "uri": "spark://local[*]"
}'
curl -X 'POST' \
  'http://localhost:8001/catalogs/djspark/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "name": "spark",
    "version": "3.3.2"
  }
]'

Now you can submit Spark SQL queries.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djspark",
  "engine_name": "spark",
  "engine_version": "3.3.2",
  "submitted_query": "SELECT * FROM roads.us_states LIMIT 10",
  "async_": false
}'

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datajunction_query-0.0.18.tar.gz (147.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

datajunction_query-0.0.18-py3-none-any.whl (23.3 kB view details)

Uploaded Python 3

File details

Details for the file datajunction_query-0.0.18.tar.gz.

File metadata

  • Download URL: datajunction_query-0.0.18.tar.gz
  • Upload date:
  • Size: 147.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.1 cpython/3.10.19 HTTPX/0.28.1

File hashes

Hashes for datajunction_query-0.0.18.tar.gz
Algorithm Hash digest
SHA256 3aae333b721ec5321590df7bda2e79bc6a3ff6340471c1331d2e60d1d0510d59
MD5 b4ba7a2441292fa9bacc36382b67cc63
BLAKE2b-256 b1631d9e13b5db96d6df87a5a10363e8793ce33c0eddd65f0a96ba8bee6f2dc3

See more details on using hashes here.

File details

Details for the file datajunction_query-0.0.18-py3-none-any.whl.

File metadata

File hashes

Hashes for datajunction_query-0.0.18-py3-none-any.whl
Algorithm Hash digest
SHA256 f3de05864f81bba6ba2763fcf5dbd1e836569ae9356448decf0fb3203a55ea7c
MD5 aa9179ae5f3df27af1a0832981f20f07
BLAKE2b-256 5cefd9b199a3990c82fdf754ce7c0d1c904310c68882bb12eba99c5a7bbfb0d4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page