A Jupyter Kernel for DuckDB with Unity Catalog

Dunky

A Jupyter Kernel for DuckDB with Unity Catalog.

Description

Dunky is a Jupyter kernel that allows you to run DuckDB queries with Unity Catalog integration directly from your Jupyter notebooks.

I created this extension because existing solutions such as jupysql require you to use magics, load the uc_catalog and delta extensions, and manage secrets yourself, and they don't work well with DuckDB's uc_catalog extension.

Features

  • Run DuckDB queries in Jupyter notebooks
  • Unity Catalog integration
  • No need to use magics
  • Nice output formatting
  • No need to load uc_catalog or delta, or to manage secrets
  • Create a Unity Catalog Delta table with CREATE EXTERNAL TABLE [table_name] LOCATION [location] OPTIONS [options]

Installation

To install Dunky, run the following command:

pip install dunky

Usage

Select the "Dunky" kernel from the kernel selection menu in JupyterLab or Jupyter Notebook.

The only required step is to attach a catalog from Unity Catalog using the ATTACH DATABASE command.

ATTACH DATABASE 'unity' AS unity (TYPE UC_CATALOG);

You don't need to set up a connection or manage credentials, as Dunky handles all of that for you.

Configure Unity Catalog

You can set the following environment variables to configure Unity Catalog. Make sure they are available before the kernel is started:

  • UC_ENDPOINT: The endpoint of the Unity Catalog server.
  • UC_TOKEN: The token to authenticate with the Unity Catalog server.
  • UC_AWS_REGION: The AWS region to use for the Unity Catalog server.

These settings default to localhost:8080/api/2.1/unity-catalog, not-used, and eu-west-1 respectively.
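
As a rough illustration of how these variables and their documented defaults fit together, the sketch below resolves the effective settings from an environment mapping. The names UC_DEFAULTS and resolve_uc_settings are hypothetical helpers invented for this example. They are not part of Dunky, and Dunky's own lookup logic may differ.

```python
import os

# Default values as documented above. The dict and the function below are
# hypothetical helpers for illustration only, not part of Dunky.
UC_DEFAULTS = {
    "UC_ENDPOINT": "localhost:8080/api/2.1/unity-catalog",
    "UC_TOKEN": "not-used",
    "UC_AWS_REGION": "eu-west-1",
}


def resolve_uc_settings(env=None):
    """Return the effective settings: the value from env if set, else the default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in UC_DEFAULTS.items()}
```

Calling resolve_uc_settings() in the environment that launches Jupyter is one way to check which values a kernel started from that environment would inherit.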

If you want to update these settings after the kernel has started, you can use the ENV command, e.g.:

ENV UC_ENDPOINT=http://localhost:8080/api/2.1/unity-catalog
    UC_TOKEN=your-token
    UC_AWS_REGION=eu-west-1

For these changes to take effect, you will need to reload the secret.

RELOAD SECRET;

If the database is already attached, you can detach and reattach it to apply the changes.

S3 Integration

Dunky supports AWS S3 integration with Unity Catalog.

  • Prerequisite:
    • Make sure Unity Catalog has S3 bucket authentication configured
  • Writing to S3: in the CREATE EXTERNAL TABLE statement, set LOCATION to your s3:// location

Writing to S3 runs via delta-rs. You can provide additional storage options for delta-rs with the OPTIONS clause, e.g., OPTIONS (storage_account='your-storage-account', storage_key='your-storage-key', storage_container='your-storage-container'). When writing to S3, storage credentials are obtained from the Unity Catalog server using the provided token.
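
If the storage options live in a Python dict, a small helper can render them in the OPTIONS (key='value', ...) form shown above. This is only a sketch: format_options is a hypothetical name, not a Dunky function, and doubling single quotes inside values is an assumption based on common SQL string escaping rather than documented Dunky behavior.

```python
def format_options(options):
    """Render a dict as an OPTIONS (key='value', ...) clause."""
    parts = []
    for key, value in options.items():
        # Assumption: single quotes in values are escaped by doubling them.
        escaped = str(value).replace("'", "''")
        parts.append(f"{key}='{escaped}'")
    return "OPTIONS (" + ", ".join(parts) + ")"


clause = format_options({
    "storage_account": "your-storage-account",
    "storage_key": "your-storage-key",
    "storage_container": "your-storage-container",
})
print(clause)
```

The resulting string can be pasted at the end of a CREATE EXTERNAL TABLE statement in a Dunky cell.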

P.S. Dunky might also work with GCP and Azure, but I have not tested this. It depends on whether Unity Catalog and DuckDB's uc_catalog extension support them. I've seen some people confirm that Unity Catalog and DuckDB can work with Azure and GCP.

Example docker

In the docker folder, you can find an example of how to run JupyterLab with Dunky and Unity Catalog in Docker containers. To run the example, execute:

cd docker
docker compose up --build -d

The token/password is dunky.

If it is not already selected, you can find the Dunky kernel in the kernel list.

Remarks

  • This kernel is still in development and may have some bugs.
  • This extension works well together with the junity extension.

Issues?

If you encounter any issues, please open an issue on the GitHub repository.
