Parquet viewer for your terminal.

ParqInspector

ParqInspector is a Parquet and Delta table viewer for your terminal, built with Textual.

ParqInspector can open local or remote Parquet files and Delta tables and lets you view their contents as a table.

https://github.com/jkausti/parq-inspector/assets/19781820/7ef7657a-0598-4d3e-bab8-3faa8032ff70

👉 Installation

ParqInspector can be installed with pip (or pipx).

$ pip install parq-inspector

👉 Usage

Start ParqInspector by running inspector from your terminal.

Local Files

You can also open a local file directly at startup with the options --filepath and --row_limit, or their short versions -f and -rl.

$ inspector --filepath ./data/my_data.parquet --row_limit 500

If no row limit is provided, the default value of 200 is used. Be careful: setting the row limit to a very high value might make the app take a long time to start, or prevent it from starting at all, depending on the size of your data.
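
The short options can be combined with a quick existence check so that a mistyped path fails fast instead of starting the app. This is a minimal sketch assuming a POSIX shell; the path and the FILE variable are hypothetical, and the row limit of 50 is just a cautious starting value for a first look at a large file.

```shell
# Hypothetical path; replace it with your own file.
FILE="./data/my_data.parquet"

# Short options: -f is --filepath, -rl is --row_limit.
if [ -f "$FILE" ]; then
    inspector -f "$FILE" -rl 50
else
    echo "File not found: $FILE" >&2
fi
```

If the app starts quickly with a small limit, raise the value step by step.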

Remote Files

ParqInspector currently supports reading remote files from Azure Data Lake Storage Gen2, Amazon S3 and Google Cloud Storage. If your storage service does not allow anonymous access, you need to set environment variables so that ParqInspector can authenticate to the service. The following environment variables are supported:

Azure:
AZURE_STORAGE_ACCOUNT_NAME
AZURE_STORAGE_SAS_KEY
AZURE_STORAGE_ACCOUNT_KEY
AZURE_STORAGE_CLIENT_ID
AZURE_STORAGE_CLIENT_SECRET
AZURE_STORAGE_TENANT_ID

AWS:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
AWS_DEFAULT_REGION

GCP:
GOOGLE_SERVICE_ACCOUNT
GOOGLE_SERVICE_ACCOUNT_KEY

Depending on your method of authentication, not all of the environment variables need to be set.
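
As an illustration, the variables can be exported in the shell session that will run the app. This is a sketch assuming key-based authentication to Amazon S3 and a POSIX shell; which variables your setup actually needs depends on your authentication method, and all values below are placeholders, not real credentials.

```shell
# Placeholder values (assumptions); substitute your own credentials and region.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_REGION="eu-north-1"
```

Run inspector from the same session afterwards so that it inherits the variables.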

Remote files can only be opened through the Settings pane in the UI. Pick the correct cloud provider and, in the Path field, enter the URL of your file instead of a local path. ParqInspector uses polars under the hood to read Parquet files and Delta tables from remote storage, so the supported protocols and URL variants are determined by what polars supports. See more here.

👉 Roadmap

[✓] - Reading local single Parquet files
[✓] - Reading remote single Parquet files
[ ] - Reading Parquet datasets
[✓] - Reading Delta tables


If you encounter any issues or bugs, or feel that a valuable feature is missing, please create an issue in this repo!
