A powerful command-line tool for viewing Parquet files

Parquet Viewer

A command-line tool for viewing, analyzing, and manipulating Parquet files.

Features

  • 📊 View Parquet files in various table formats
  • 📤 Export to different formats (CSV, Excel, JSON, HTML)
  • 📈 Display dataset statistics and summaries
  • 🔍 Filter and sort data
  • 📉 Analyze correlations and missing values
  • 🎲 Sample data randomly
  • 💾 Memory-efficient handling of large files
  • 🎨 Multiple display format options

Installation

pip install parquet-viewer

Usage

Basic Commands

View Parquet File

# Basic viewing
pqview view data.parquet

# Customize display
pqview view data.parquet --max-rows 20 --format github
pqview view data.parquet -n 50 -f pretty --no-stats

Export to Other Formats

# Export to CSV
pqview export data.parquet output.csv

# Export to other formats
pqview export data.parquet output.xlsx --format excel
pqview export data.parquet output.json --format json
pqview export data.parquet output.html --format html
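
For readers who want to reproduce an export in a script, the following is a minimal sketch of equivalent pandas calls. It assumes the tool behaves like pandas' own writers, which this README does not confirm; the sample DataFrame and the temporary output directory are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for the contents of data.parquet.
# A real file would be loaded with pd.read_parquet("data.parquet"),
# which needs a Parquet engine such as pyarrow.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

out_dir = Path(tempfile.mkdtemp())

# One writer per export format listed above.
df.to_csv(out_dir / "output.csv", index=False)
df.to_json(out_dir / "output.json", orient="records")
df.to_html(out_dir / "output.html", index=False)
# Excel export would be df.to_excel(out_dir / "output.xlsx"),
# which requires an extra package such as openpyxl.

written = sorted(p.name for p in out_dir.iterdir())
print(written)
```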

Analysis Commands

Summary Statistics

# Show summary statistics for numerical columns
pqview stats data.parquet

Value Counts

# Show value counts for a specific column
pqview counts data.parquet column_name

Missing Values Analysis

# Show statistics about missing values
pqview missing data.parquet
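
As a rough illustration of what a missing-values report contains, the sketch below computes per-column counts and percentages with pandas. The exact columns printed by pqview are not documented here, so the layout and the sample data are assumptions.

```python
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame(
    {
        "age": [25, None, 40, None],
        "city": ["Pune", "Delhi", None, "Goa"],
    }
)

# isna() marks missing cells; sum() counts them, mean() gives the share.
missing = pd.DataFrame(
    {
        "missing": df.isna().sum(),
        "percent": df.isna().mean() * 100,
    }
)
print(missing)
```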

Correlation Analysis

# Show correlation matrix
pqview correlations data.parquet

# Use different correlation methods
pqview correlations data.parquet --method spearman
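
The choice of method matters when a relationship is monotonic but not linear. The sketch below shows the difference using pandas' `DataFrame.corr`; it assumes the `--method` option maps onto the pandas method names, which is not stated in this README, and the data is made up.

```python
import pandas as pd

# y grows with x, but quadratically rather than linearly.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 4, 9, 16, 25]})

# Pearson measures linear association.
pearson = df.corr(method="pearson")

# Spearman works on ranks, so any strictly increasing relationship scores 1.
spearman = df.corr(method="spearman")

print(pearson.loc["x", "y"], spearman.loc["x", "y"])
```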

Data Manipulation Commands

Filter Data

# Filter data using pandas query syntax
pqview filter data.parquet "age > 25 and department == 'IT'"
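
Because the expression uses pandas query syntax, it can be tried out in pandas before running the command. The sketch below applies the same expression with `DataFrame.query`; the sample rows are hypothetical, and the example makes no claim about how pqview applies the filter internally.

```python
import pandas as pd

# Hypothetical rows standing in for data.parquet.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "age": [31, 24, 45, 28],
        "department": ["IT", "IT", "HR", "IT"],
    }
)

# Same expression as in the pqview example above.
filtered = df.query("age > 25 and department == 'IT'")
print(filtered)
```

String literals inside the expression need their own quotes, which is why the shell command wraps the whole expression in double quotes and the value in single quotes.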

Sort Data

# Sort by single column
pqview sort data.parquet "salary"

# Sort by multiple columns
pqview sort data.parquet "department,salary" --descending
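
A multi-column sort orders by the first column and breaks ties with the next. The sketch below shows this with pandas' `sort_values`. It assumes that `--descending` applies to every listed column, which this README does not spell out; the data is made up.

```python
import pandas as pd

# Hypothetical rows standing in for data.parquet.
df = pd.DataFrame(
    {
        "department": ["IT", "HR", "IT", "HR"],
        "salary": [50, 70, 90, 60],
    }
)

# Split the comma-separated column list as given on the command line.
columns = "department,salary".split(",")

# Assumption: descending order is applied to all sort columns.
result = df.sort_values(by=columns, ascending=False)
print(result)
```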

Sample Data

# Sample specific number of rows
pqview sample data.parquet --rows 100

# Sample by fraction
pqview sample data.parquet --fraction 0.1 --seed 42
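
A fixed seed makes a random sample repeatable. The sketch below illustrates both sampling modes with pandas' `DataFrame.sample`; treating `--rows`, `--fraction`, and `--seed` as equivalents of `n`, `frac`, and `random_state` is an assumption, and the 1000-row frame is made up.

```python
import pandas as pd

# Hypothetical 1000-row dataset.
df = pd.DataFrame({"id": range(1000)})

# Fixed number of rows, different on each run because no seed is given.
by_rows = df.sample(n=100)

# Fraction of rows with a seed: the same rows are drawn every time.
by_fraction = df.sample(frac=0.1, random_state=42)
again = df.sample(frac=0.1, random_state=42)

print(len(by_rows), len(by_fraction), by_fraction.equals(again))
```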

Display Formats

The tool supports various display formats for tables:

Format    Description
grid      ASCII grid table
pipe      Markdown-compatible table
orgtbl    Org-mode table
github    GitHub-flavored Markdown table
pretty    Pretty printed table
html      HTML table
latex     LaTeX table

Export Formats

Supported export formats:

  • CSV
  • Excel
  • JSON
  • HTML

File Size Limits

By default, the tool enforces a 5 MB file size limit to prevent memory issues. The limit can be adjusted in the configuration.
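
A size guard of this kind can be sketched in a few lines. The following is only an illustration of the idea, not the tool's code: the constant name, the function name, and the reading of the limit as 5 × 1024 × 1024 bytes are all assumptions.

```python
import os

# Assumed interpretation of the default limit; the real value and its
# configuration mechanism are not described in this README.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024


def check_file_size(path, limit=MAX_FILE_SIZE_BYTES):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, which exceeds the limit of {limit} bytes"
        )
    return size
```

Checking the size before reading avoids loading a large file into memory only to reject it afterwards.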

Error Handling

The tool provides clear error messages for common issues:

  • File not found
  • Invalid file format
  • Memory limitations
  • Invalid query syntax
  • Data type conversion errors

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Author

Ashutosh Bele

Changelog

v0.1.0

  • Initial release
  • Basic viewing and export functionality
  • Statistical analysis features
  • Data manipulation capabilities

Download files

Source Distribution

parquet_viewer-0.1.3.tar.gz (7.6 kB)

Built Distribution

parquet_viewer-0.1.3-py3-none-any.whl (7.5 kB)
