Parquet viewer for your terminal.

ParqInspector

ParqInspector is a Parquet and Delta table viewer for your terminal, built with Textual.

ParqInspector can open local or remote Parquet files and Delta tables and lets you view their contents as a table.

https://github.com/jkausti/parq-inspector/assets/19781820/7ef7657a-0598-4d3e-bab8-3faa8032ff70

👉 Installation

ParqInspector can be installed with pip (or pipx).

$ pip install parq-inspector

👉 Usage

Start ParqInspector by running inspector from your terminal.

Local Files

To open a local file directly at startup, use the options --filepath and --row_limit, or their short versions -f and -rl.

$ inspector --filepath ./data/my_data.parquet --row_limit 500

If no row limit is provided, it defaults to 200. Be careful: setting the row limit to a very high value can make the app take a long time to start, or prevent it from starting at all, depending on the size of your data.

Remote Files

ParqInspector currently supports reading remote files from Azure Data Lake Storage Gen2, Amazon S3 and Google Cloud Storage. If your storage service does not allow anonymous access, you need to set environment variables so that ParqInspector can authenticate to the service. The following environment variables are supported:

Azure:
AZURE_STORAGE_ACCOUNT_NAME
AZURE_STORAGE_SAS_KEY
AZURE_STORAGE_ACCOUNT_KEY
AZURE_STORAGE_CLIENT_ID
AZURE_STORAGE_CLIENT_SECRET
AZURE_STORAGE_TENANT_ID

AWS:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
AWS_DEFAULT_REGION

GCP:
GOOGLE_SERVICE_ACCOUNT
GOOGLE_SERVICE_ACCOUNT_KEY

Depending on your method of authentication, not all of the environment variables need to be set.
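
As a rough sketch of the setup above, assuming key-based authentication against Amazon S3, the variables can be exported in the shell before starting the app. The values shown are placeholders, not real credentials, and the region is only an example. Which variables you actually need depends on your authentication method.

```shell
# Placeholder values (assumptions for illustration); substitute your own.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_REGION="eu-west-1"
```

Run inspector from the same shell session so that it inherits these variables. The Azure and GCP variables listed above can be exported in the same way.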

Remote files can only be opened through the Settings pane in the UI. Pick the correct cloud provider and, in the Path field, enter the URL to your file instead of a local path. ParqInspector uses polars under the hood to read Parquet files and Delta tables from remote storage, so the supported protocols and URL variants are determined by what polars supports. See more here.

👉 Roadmap

[✓] - Reading local single Parquet files
[✓] - Reading remote single Parquet files
[] - Reading Parquet datasets
[✓] - Reading Delta tables


If you encounter any issues or bugs, or feel that a valuable feature is missing, please create an issue in this repo!
