
Arrow viewer for Jupyter


Arbalister


A JupyterLab extension for viewing tabular data files. Double-click to open Parquet, Avro, ORC, SQLite, and other Arrow-compatible formats directly in JupyterLab without writing code.

A Parquet file opened with Arbalister

Features

Existing:

  • 🗂️ Supported formats: Parquet, CSV, Avro, ORC, SQLite, Arrow IPC
  • Lazy loading: Streams chunks of data on demand and handles files larger than memory
  • ⏱️ Prefetching: Loads the next chunk ahead of time for smooth scrolling
  • ⚙️ Reading options: Interactive toolbar for CSV delimiters, SQLite table selection, etc.
  • 🔌 Extensible: Server extension provides Arrow IPC streams for building custom viewers

Planned (contributions welcome):

  • ☁️ S3 and data lakes: Support for Apache Iceberg, Delta Lake, and other cloud-native table formats over object storage
  • 🌐 Database viewer: Non-file (URL) database viewer
  • 💻 WASM/JupyterLite support: Run Arbalister in the browser without a Python backend
  • 📈 Alternative clients: Custom, non-default viewers for time-series and geospatial data
  • 🔎 Filters: Search and filter data with ease

Architecture

Data is divided into chunks across both rows and columns. The client requests only the chunks needed for the current viewport. The server reads the relevant portion of the file using DataFusion and returns it in the Arrow IPC stream format. Background prefetching keeps scrolling smooth.
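The chunking scheme can be illustrated with a small sketch. This is not Arbalister's actual code: the `Chunk` class, the function names, and the chunk sizes (256 rows by 16 columns) are hypothetical, chosen only to show how a viewport might map onto a grid of chunks and how the next chunks down could be picked for prefetching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One rectangular block of the table, identified by its grid position."""

    row_chunk: int
    col_chunk: int


def chunks_for_viewport(first_row, last_row, first_col, last_col,
                        chunk_rows=256, chunk_cols=16):
    """Return the chunks overlapping an inclusive range of visible cells.

    The chunk sizes are assumptions for illustration only.
    """
    row_range = range(first_row // chunk_rows, last_row // chunk_rows + 1)
    col_range = range(first_col // chunk_cols, last_col // chunk_cols + 1)
    return [Chunk(r, c) for r in row_range for c in col_range]


def chunks_to_prefetch(visible, total_rows, chunk_rows=256):
    """Return the chunks one step below the visible ones, if any rows remain."""
    last_row_chunk = (total_rows - 1) // chunk_rows
    next_row_chunk = max(c.row_chunk for c in visible) + 1
    if next_row_chunk > last_row_chunk:
        return []  # Already at the bottom of the table.
    cols = sorted({c.col_chunk for c in visible})
    return [Chunk(next_row_chunk, c) for c in cols]


# A viewport showing rows 200-300 and columns 0-10 spans two row chunks.
visible = chunks_for_viewport(200, 300, 0, 10)
upcoming = chunks_to_prefetch(visible, total_rows=1000)
```

Here `visible` holds the two chunks covering the viewport and `upcoming` holds the single chunk directly below them. A real client would request each chunk from the server and decode the returned Arrow IPC stream.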

Arbalister client-server architecture

This extension is composed of two packages, both called arbalister:

  • A Python server extension available on PyPI
  • A TypeScript client extension available on npm

Requirements

  • JupyterLab >= 4.5.0

Install

To install the extension, execute:

pip install arbalister

Uninstall

To remove the extension, execute:

pip uninstall arbalister

Troubleshoot

If you can see the frontend extension but it is not working, check that the server extension is enabled:

jupyter server extension list

If the server extension is installed and enabled but you cannot see the frontend extension, check that the frontend extension is installed:

jupyter labextension list
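Both checks can also be scripted. The following is a minimal sketch, assuming the `jupyter` command is on the PATH; the helper names are hypothetical. It only confirms that the name arbalister appears in each listing, so the full output should still be read to see whether the extension is reported as enabled.

```python
import subprocess


def mentions_extension(listing: str, name: str = "arbalister") -> bool:
    """Return True if any line of the listing mentions the extension name."""
    return any(name in line for line in listing.splitlines())


def extension_listed(command: list[str], name: str = "arbalister") -> bool:
    """Run a Jupyter listing command and look for the extension in its output."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The command itself is missing, so nothing can be listed.
        return False
    # Jupyter may write the listing to stderr, so both streams are searched.
    return mentions_extension(result.stdout + result.stderr, name)


def report() -> None:
    """Print whether each listing mentions the extension."""
    server = extension_listed(["jupyter", "server", "extension", "list"])
    frontend = extension_listed(["jupyter", "labextension", "list"])
    print("server extension listed:", server)
    print("frontend extension listed:", frontend)
```

Calling `report()` from the environment where JupyterLab is installed prints one line per check.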

Contributing

Development install

We use Pixi, a Conda-compatible environment manager, for development. This single tool provides most dependencies, including NodeJS and Python themselves. Head to the Pixi site for installation instructions.

Run pixi task list for details on all available tasks.

Only the JavaScript packages need to be installed. This is managed by the jlpm command, JupyterLab's pinned version of yarn, which is installed with JupyterLab through the Pixi file. You may use yarn or npm in lieu of jlpm below.

To install the client-side extension for development, use:

pixi run install-dev

To run the extension in development mode in JupyterLab, use the following Pixi task. It launches JupyterLab after making sure the extension is installed. Any additional command-line arguments are forwarded to JupyterLab.

pixi run serve-dev

To rebuild the extension automatically whenever its source changes, watch the source directory in one terminal while JupyterLab runs in another:

pixi run jlpm watch

With the watch command running, every saved change will immediately be built locally and available in your running JupyterLab. Refresh JupyterLab to load the change in your browser (you may need to wait several seconds for the extension to be rebuilt).

By default, the pixi run jlpm build command generates the source maps for this extension to make it easier to debug using the browser dev tools. To also generate source maps for the JupyterLab core extensions, you can run the following command:

pixi run jupyter lab build --minimize=False

Testing the extension

Server tests

This extension uses pytest for Python code testing. Run the tests with Pixi; any additional command-line arguments are forwarded to pytest:

pixi run test-pytest

Frontend tests

This extension uses Jest for JavaScript code testing. Run the tests with Pixi; any additional command-line arguments are forwarded to Jest:

pixi run test-jest

Integration tests

This extension uses Playwright for the integration tests (also known as user-level tests). More precisely, the JupyterLab helper Galata is used to test the extension in JupyterLab.

More information is provided in the ui-tests README.

Running code formatters and checks

To run all code formatters, use Pixi:

pixi run fmt

Similarly for the code checks:

pixi run check

Packaging the extension

See RELEASE
