Testing
Experimental
This is experimental and unstable.
Pyodide + DuckDB
This is a proof of concept for executing duckdb_wasm from a Pyodide kernel. This unlocks a few paths for using DuckDB, such as PyScript and JupyterLite.
** The project should probably be called Pyoduckwasm or something like that... it started with JupyterLite as the end goal.
Demonstration:
- import micropip; await micropip.install('pandas'); await micropip.install('jupylite-duckdb'); import jupylite_duckdb as jd; conn = await jd.connect(); r1 = await jd.query("pragma version", conn); r2 = await jd.query("create or replace table xyz as select * from 'https://raw.githubusercontent.com/Teradata/kylo/master/samples/sample-data/parquet/userdata2.parquet'", conn); r3 = await jd.query("select gender, count(*) as c from xyz group by gender", conn); print(r1); print(r2); print(r3);
- JupyterLite: Open a JupyterLite site and use the examples from notebooks.
- JupyterLite Code Console REPL
Note: reloading seems somewhat unreliable with Pyodide. CTRL-F5 works more reliably.
Limitations:
- API: only equivalents of duckdb.connect() and duckdb.query() are available.
- DataFrames are not (yet) registered in the DuckDB database.
- Data is copied from the duckdb_wasm Arrow result to a Python list[dict], and then to a DataFrame. PyArrow is not available (yet) in Pyodide.
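The copy path in the last limitation can be pictured with a small sketch. It is not the package's actual code: `columns_to_records` is a hypothetical helper, the column-oriented input dict stands in for an Arrow-style result, and the values are made up. It only shows why the conversion costs a full row-by-row copy before pandas ever sees the data.

```python
from typing import Any


def columns_to_records(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn column-oriented data into a list of row dicts (one copy per cell)."""
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    # Every value is touched once here; this is the copy that a
    # zero-copy Arrow exchange would avoid.
    return [{name: columns[name][i] for name in names} for i in range(n_rows)]


# Made-up stand-in for a columnar query result.
columnar = {"gender": ["Female", "Male"], "c": [3, 2]}
records = columns_to_records(columnar)
print(records)
```

With pandas installed, `pandas.DataFrame(records)` would then build the DataFrame from the row dicts, which is a second copy of the same data.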
Observations:
- It takes about a minute to run the JupyterLite examples. Most of this time is prior to any DuckDB stuff. Some of this time could be shaved off with a custom pyodide build, but PyScript is much faster.
- JupyterLite was unreliable with page reloads; I ended up having to clear the cache a lot.
- Not thrilled with PyScript removing top-level await... will probably just auto-wrap it (like IPython's %autoawait).
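One way the auto-wrapping idea could work is sketched below. This is not what the package or PyScript does; `run_cell` is a hypothetical helper. It relies on the standard-library compile flag `ast.PyCF_ALLOW_TOP_LEVEL_AWAIT` (Python 3.8+), which lets source containing a bare `await` compile to a code object that evaluates to a coroutine.

```python
import ast
import asyncio
import inspect


async def run_cell(source: str, namespace: dict) -> None:
    """Run source that may contain top-level await inside the given namespace."""
    code = compile(source, "<cell>", "exec", flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT)
    result = eval(code, namespace)
    # Only source that actually uses await produces a coroutine.
    if inspect.iscoroutine(result):
        await result


cell = "import asyncio\nawait asyncio.sleep(0)\nx = 41 + 1"
ns: dict = {}
# In plain CPython we need to start an event loop ourselves. Inside Pyodide
# or Jupyter a loop is already running, so one would await run_cell(...) instead.
asyncio.run(run_cell(cell, ns))
print(ns["x"])
```

Variables assigned in the cell land in `namespace`, so later cells can see them, which is the behavior a plain `async def` text wrapper would lose.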
Demonstration
Code Console REPL Example
jupyterlite_duckdb_wasm
Python wrapper to run DuckDB_WASM within JupyterLite with a Pyodide kernel. See notebooks for an example of running this within JupyterLite.
Cell Magic %%dql
Following the example of magic_duckdb, there's an initial proof of concept of a DuckDB cell magic for JupyterLite. See Magic Example.
Pyodide Console
# Run in the Pyodide console, which allows top-level await.
import micropip
await micropip.install('pandas')
await micropip.install('jupylite-duckdb')
import jupylite_duckdb as jd
conn = await jd.connect()
# Check the DuckDB version, load a remote Parquet file, then aggregate it.
r1 = await jd.query("pragma version", conn)
r2 = await jd.query("create or replace table xyz as select * from 'https://raw.githubusercontent.com/Teradata/kylo/master/samples/sample-data/parquet/userdata2.parquet'", conn)
r3 = await jd.query("select gender, count(*) as c from xyz group by gender", conn)
print(r1)
print(r2)
print(r3)
Various Issues, Todos and Ideas
- Move examples into our hosted jupyterlite
- Implement a proof of concept version of dataframe registration
- Evaluate startup time reduction. Probably will never do this, given PyScript.
- Handling errors: detect and display errors in Jupyter. Too much stuff is buried in the browser console, such as CORS errors.
- Invalidate the pip browser cache (as/if needed); annoying for development purposes.
- Think through the async/await/transform_cell approach and whether there's a better solution.
- Zero copy data exchange (js/duckdb arrow -> python/dataframe and python/df -> js/duckdb): Blocked by Pyarrow support
- If you're adding local .py files, use importlib.invalidate_caches(). Even then, it was flaky to import.
- Careful with caching... %pip install will pull from browser cache. I had to clear frequently within dev tools
- Local storage is annoyingly persistent. To clear it, see https://superuser.com/questions/519628/clear-html5-local-storage-on-a-specific-page
- %autoawait, which is enabled by default, is part of why this works in notebooks. The %%dql cell magic patches transform_cell to push an await into the cell transformation: https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
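The note about local .py files can be sketched as follows. `import_fresh` and the module name `local_helpers_demo` are hypothetical, and this runs on plain CPython with a temporary directory; it does not reproduce the flakiness seen in the browser. It only shows where `importlib.invalidate_caches()` fits when a module file is created after the interpreter has started.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_fresh(module_name: str, source: str, directory: str):
    """Write a module file, refresh the import system's caches, then import it."""
    path = Path(directory) / f"{module_name}.py"
    path.write_text(source)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Path finders cache directory listings; a file created after startup
    # may be invisible to import until the caches are invalidated.
    importlib.invalidate_caches()
    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)


workdir = tempfile.mkdtemp()
helpers = import_fresh("local_helpers_demo", "ANSWER = 42\n", workdir)
print(helpers.ANSWER)
```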