Skip to main content

DuckDB Cell (%%dql) and Line (%dql) Magics for Jupyter and VSCode

Project description

magic_duckdb

DuckDB Cell (%%dql) and Line (%dql) Magics for Jupyter and VSCode

Motivation

magic_duckdb was created to:

  • Provide simple cell/line magics with minimal code and zero dependencies
  • Match performance of DuckDB python API
  • Be a simple starting point to add other useful features with minimal complexity
  • Bundle useful features, like using OpenAI to improve SQL, sql formatting (beautifying) and explain analyze graphics

Why not the %sql magics (jupysql or ipython-sql)?

The goal of this project is to expose the native features of duckdb, with minimal dependencies, such as exporting to arrow tables or using DuckDB relation objects.

Simplicity

The goal of this project was to provide minimal line & cell magics for DuckDB in Jupyter notebooks with minimal dependencies and as simply as possible.

The core code is concentrated in two places:

  • magic.py: Barebones cell and line magic that parses arguments, and executes statements
  • duckdb_mode.py: execute() calls the appropriate method based on the user selected output type.

Features that require additional dependencies, such as Jinja2 for the --jinja2 feature, are imported dynamically.

Quick Start

%pip install magic_duckdb --upgrade --quiet
%load_ext magic_duckdb
%dql select * from range(100)

Usage

Connection:
-cn <connection_string>: Create a new connection to a DuckDB. If passed without a query, this changes the global connection, otherwise used just for the current query.
    %dql -cn myfile.db
-co <connection_object>: Use an existing DuckDB Connection. If passed without a query, this changes the global connection, otherwise used just for the current query.
    con = duckdb.connect("somefile.db")
    %dql -co con
-d: Use the duckdb.default_connection. If passed without a query, this changes the global connection, otherwise used just for the current query.
-g | --getcon: Get the current connection
    con = %dql --getcon
--close: Close current connection

Modes:
-t <type> [default: df]: Selects the type for this request. If passed without a query, this changes the global default, otherwise used just for the current query.
-e <explain_mode>: Display the explain plan, or explain analyze, or AST

Options:
-l | --listtypes: Returns a list of available output types, for -t

-j | --jinja2: Process the SQL as a jinja template, using the user_ns environment.
    `%dql -j select * from {{var1}}`
-p | --params: Pass the specified parameter(s) as a SQL parameters
    `%dql -p obj1 select
-o <var>: Stores the resulting output in a variable named <var>
-tp <name> <value>: Pass a kwarg to the type function. Intended to be used to pass parameters to .show(). Must be passed on each call, not saved.
    `%dql -t show -tp max_rows 10 <query>`

Extras:
--tables: Returns tables used by the query
-f: Format the string using npx sql-formatter
Other:
-ai / -aichat: Route request to OpenAI
    %dql -ai Fix selct star frm mytable

See notebooks for usage examples.

Enabling Frame Scanning

To reference objects that are in the Jupyter notebook local scope, enable python_scan_all_frames. This is a DuckDB feature that searches the locals of the frame stack to find dataframes and other objects. %dql set python_scan_all_frames = True;

Modifying the MAGIC NAME from DQL to SQL

To use %sql and %%sql instead of the default dql, do the following (before loading the extension):

import magic_duckdb
magic_duckdb.MAGIC_NAME = "sql"

%load_ext magic_duckdb

Using "sql" as the name may help the LSP automatically choose SQL syntax highlighting.

Usage Details

  • %dql -t [df | arrow | pl | relation | show | df_markdown] <query>: Equivalent to - connection.sql(query).<type>(). df_markdown requires tabulate package.
  • %dql -e [explain | explain_analyze_tree | explain_analyze_json | explain_analyze_draw | analyze | ast_json | ast_tree | ast_draw] <query>:
    • explain is equivalent to connection.sql(query).explain()
    • explain_analyze_* options enable profiling (PRAGMA enable_profiling), and use connection.sql(query).explain(type='analyze')
    • explain_analyze_draw requires graphviz to be installed on your system, and "dot" or "dot.exe" must be in your PATH or added via magic.explain_analyze_graphviz.dot_path = 'path_to_dot'. The graphviz python module must also be installed: pip install graphviz
    • The AST options use json_serialize_sql to describe the SQL. ast_json displays the raw json result, ast_tree displays an indented tree, and ast_draw uses graphviz to draw a graphical version of the tree
  • %dql --format <query> uses sql-formatter. This is a javascript library, so it needs to be installed separately, although it's executed via npx so should be fine as long as you have npx / node in your path.
  • %dql --tables <query> returns the list of tables used by the query, equivalent to: duckdb.get_table_names("SELECT * FROM xyz")
  • %dql [-ai | -aichat] fix <query> passes the current schema to OpenAI and askes OpenAI to "fix" the query. An OpenAI developer key is required.
  • %dql -r [readfile] loads the query from the file path specified. Variable expansion is not supported at this time, file path must be a string.
            # to set openai key
            from magic_duckdb.extras import sql_ai
            sql_ai.OPENAI_KEY = openai_key

Autocompletion: Work in Progress

There are two different Autocompletion implementations, one for MatcherAPIv2 and the other (pre-ipython 8.6.0) for MatcherAPIv1. The MatcherAPIv2 version is tried first, and if it fails, the MatcherAPIv1 version is loaded. MatcherAPIv1 will not match the entire results of a cell: it's limited to a line by line match.

  • Phrase completion: %dql create <tab> will show common phrases, such as CREATE OR REPLACE
  • Pragma completion: %dql PRAGMA <tab> will list available pragma's.
  • Table completion: %dql select * from <tab> will list available tables and Pandas DataFrames. This is triggered by a list of keywords (ie: from) that are expected to be followed. See magic_duckdb.autocomplete.common for the keywords.
  • Column completion: %dql select tablename.<tab> from tablename will list available tables and Pandas DataFrames for tablename.

Notes:

Autocompletion can be disabled with:

from magic_duckdb import magic
magic.ENABLE_AUTOCOMPLETE=False
%load_ext magic_duckdb

Capturing output

Line Magics can be captured with assignment or -o:

# These are equivalent:
%dql -o varname <query>
varname = %dql <query>

Cell magics can only be captured with -o (var = %%dql doesn't work)

%%dql -o varname
<query>

To silence a cell, you can stack %%capture:

%%capture
%%iql -o bqldf
<query>

Versions

Python: 3.9.16 (main, Mar 8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)] DuckDB: library_version source_id 0 0.8.1-dev51 e84cc1acb8 Pandas : 2.0.1 jupysql 0.7.5

Test Setup

See benchmarking.ipynb for the test code.

Results

name 1 1000 1000000 description
0 test_magicddb_pandas 3.47 4.11 291.0 %dql -t df query
1 test_duckdb_execute_df 3.80 3.39 281.0 con.execute(query).df()
2 test_duckdb_execute_arrow 3.91 4.04 128.0 con.execute(query).arrow()
3 test_magicddb_arrow 4.10 5.38 127.0 %dql -t arrow query
4 test_duckdb_sql_df 6.93 7.94 318.0 con.sql(query).df()
5 test_jupysql 321.00 256.00 547.0 %config SqlMagic.autopandas = True
%sql query

Copyright © 2025 Iqmo Corporation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

magic_duckdb-0.2.4.tar.gz (19.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

magic_duckdb-0.2.4-py3-none-any.whl (21.8 kB view details)

Uploaded Python 3

File details

Details for the file magic_duckdb-0.2.4.tar.gz.

File metadata

  • Download URL: magic_duckdb-0.2.4.tar.gz
  • Upload date:
  • Size: 19.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for magic_duckdb-0.2.4.tar.gz
Algorithm Hash digest
SHA256 09f92ef13de4f6cb3a18d0d320f34ab86ecd6dc29e19f97538fccd4d862d9136
MD5 d24c5ad56ebc76d9eff6cee0aa0883fe
BLAKE2b-256 eb5800ef73bbdb6042e17270cf61afdcfcb27581fe1f5ae58d0f101cfc473f0e

See more details on using hashes here.

Provenance

The following attestation bundles were made for magic_duckdb-0.2.4.tar.gz:

Publisher: python-package.yml on iqmo-org/magic_duckdb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file magic_duckdb-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: magic_duckdb-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 21.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for magic_duckdb-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 208af1b5d94c51da55646b061a18c0fca4caf52795d2b0630f1dbc36c5b6afb9
MD5 2b3835e11f4560099f17ef587ff796fb
BLAKE2b-256 2519dee2844ab4a373eb364d869f51fde0742e692039b54ff54147110bbea338

See more details on using hashes here.

Provenance

The following attestation bundles were made for magic_duckdb-0.2.4-py3-none-any.whl:

Publisher: python-package.yml on iqmo-org/magic_duckdb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page