Skip to main content

OSS Implementation of a DataJunction Query Service

Project description

DataJunction Query Service

This repository (DJQS) is an open source implementation of a DataJunction query service. It allows you to create catalogs and engines that represent sqlalchemy connections. Configuring a DJ server to use a DJQS server allows DJ to query any of the database technologies supported by sqlalchemy.

Quickstart

To get started, clone this repo and start up the docker compose environment.

git clone https://github.com/DataJunction/djqs
cd djqs
docker compose up

Creating Catalogs

Catalogs can be created using the POST /catalogs/ endpoint.

curl -X 'POST' \
  'http://localhost:8001/catalogs/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "djdb"
}'

Creating Engines

Engines can be created using the POST /engines/ endpoint.

curl -X 'POST' \
  'http://localhost:8001/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "sqlalchemy-postgresql",
  "version": "15.2",
  "uri": "postgresql://dj:dj@postgres-roads:5432/djdb"
}'

Engines can be attached to existing catalogs using the POST /catalogs/{name}/engines/ endpoint.

curl -X 'POST' \
  'http://localhost:8001/catalogs/djdb/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "name": "sqlalchemy-postgresql",
    "version": "15.2"
  }
]'

Executing Queries

Queries can be submitted to DJQS for a specified catalog and engine.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "submitted_query": "SELECT * from roads.repair_orders",
  "async_": false
}'

Async queries can be submitted as well.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "submitted_query": "SELECT * from roads.repair_orders",
  "async_": true
}'

response

{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "id": "<QUERY ID HERE>",
  "submitted_query": "SELECT * from roads.repair_orders",
  "executed_query": null,
  "scheduled": null,
  "started": null,
  "finished": null,
  "state": "ACCEPTED",
  "progress": 0,
  "results": [],
  "next": null,
  "previous": null,
  "errors": []
}

The query id provided in the response can then be used to check the status of the running query and get the results once it’s completed.

curl -X 'GET' \
  'http://localhost:8001/queries/<QUERY ID HERE>/' \
  -H 'accept: application/json'

response

{
  "catalog_name": "djdb",
  "engine_name": "sqlalchemy-postgresql",
  "engine_version": "15.2",
  "id": "$QUERY_ID",
  "submitted_query": "SELECT * from roads.repair_orders",
  "executed_query": "SELECT * from roads.repair_orders",
  "scheduled": "2023-02-28T07:27:55.367162",
  "started": "2023-02-28T07:27:55.367387",
  "finished": "2023-02-28T07:27:55.502412",
  "state": "FINISHED",
  "progress": 1,
  "results": [
    {
      "sql": "SELECT * from roads.repair_orders",
      "columns": [...],
      "rows": [...],
      "row_count": 25
    }
  ],
  "next": null,
  "previous": null,
  "errors": []
}

Reflection

If running a [reflection service](https://github.com/DataJunction/djrs), that service can leverage the POST /table/{table}/columns/ endpoint of DJQS to get column names and types for a given table.

curl -X 'GET' \
  'http://localhost:8001/table/djdb.roads.repair_orders/columns/?engine=sqlalchemy-postgresql&engine_version=15.2' \
  -H 'accept: application/json'

response

{
  "name": "djdb.roads.repair_orders",
  "columns": [
    {
      "name": "repair_order_id",
      "type": "INT"
    },
    {
      "name": "municipality_id",
      "type": "STR"
    },
    {
      "name": "hard_hat_id",
      "type": "INT"
    },
    {
      "name": "order_date",
      "type": "DATE"
    },
    {
      "name": "required_date",
      "type": "DATE"
    },
    {
      "name": "dispatched_date",
      "type": "DATE"
    },
    {
      "name": "dispatcher_id",
      "type": "INT"
    }
  ]
}

DuckDB

DJQS includes an example of using DuckDB as an engine and it comes preloaded with the roads example database.

Create a djduckdb catalog and a duckdb engine.

curl -X 'POST' \
  'http://localhost:8001/catalogs/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "djduckdb"
}'
curl -X 'POST' \
  'http://localhost:8001/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "duckdb",
  "version": "0.7.1",
  "uri": "duckdb://local[*]"
}'
curl -X 'POST' \
  'http://localhost:8001/catalogs/djduckdb/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "name": "duckdb",
    "version": "0.7.1"
  }
]'

Now you can submit DuckDB SQL queries.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djduckdb",
  "engine_name": "duckdb",
  "engine_version": "0.7.1",
  "submitted_query": "SELECT * FROM roads.us_states LIMIT 10",
  "async_": false
}'

Spark

DJQS includes an example of using Spark as an engine. To try it, start up the docker compose environment and then load the example roads database into Spark.

docker exec -it djqs /bin/bash -c "python /code/docker/spark_load_roads.py"

Next, create a djspark catalog and a spark engine.

curl -X 'POST' \
  'http://localhost:8001/catalogs/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "djspark"
}'
curl -X 'POST' \
  'http://localhost:8001/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "spark",
  "version": "3.3.2",
  "uri": "spark://local[*]"
}'
curl -X 'POST' \
  'http://localhost:8001/catalogs/djspark/engines/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "name": "spark",
    "version": "3.3.2"
  }
]'

Now you can submit Spark SQL queries.

curl -X 'POST' \
  'http://localhost:8001/queries/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "catalog_name": "djspark",
  "engine_name": "spark",
  "engine_version": "3.3.2",
  "submitted_query": "SELECT * FROM roads.us_states LIMIT 10",
  "async_": false
}'

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datajunction_query-0.0.1a66.tar.gz (140.5 kB view details)

Uploaded Source

Built Distribution

datajunction_query-0.0.1a66-py3-none-any.whl (23.1 kB view details)

Uploaded Python 3

File details

Details for the file datajunction_query-0.0.1a66.tar.gz.

File metadata

File hashes

Hashes for datajunction_query-0.0.1a66.tar.gz
Algorithm Hash digest
SHA256 5d73b7719028838d4ac772e9902258a6203871fa01d1f193037a188745256293
MD5 b5bb53ca3764de3d569ce6a60ee174e6
BLAKE2b-256 4c6305c8d3148c1fc158623853cc6610ebbc24dce598458309f75126c72bacd8

See more details on using hashes here.

File details

Details for the file datajunction_query-0.0.1a66-py3-none-any.whl.

File metadata

File hashes

Hashes for datajunction_query-0.0.1a66-py3-none-any.whl
Algorithm Hash digest
SHA256 6e591b4edb91b70f120afbc3caa6c2b52ca842aac9646174d358e4a98df79087
MD5 cc642381a95b6b2b6469119aeca750ac
BLAKE2b-256 df8f2bf2a756bb54d9eb1653a5f0e4f7690e90ee159dcabd8991c831760683ab

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page