Parquet Viewer

A powerful command-line tool for viewing, analyzing, and manipulating Parquet files with ease.

Features

  • 📊 View Parquet files in various table formats
  • 📤 Export to different formats (CSV, Excel, JSON, HTML)
  • 📈 Display dataset statistics and summaries
  • 🔍 Filter and sort data
  • 📉 Analyze correlations and missing values
  • 🎲 Sample data randomly
  • 💾 Memory-efficient handling of large files
  • 🎨 Multiple display format options

Installation

pip install parquet-viewer

Usage

Basic Commands

View Parquet File

# Basic viewing
pqview view data.parquet

# Customize display
pqview view data.parquet --max-rows 20 --format github
pqview view data.parquet -n 50 -f pretty --no-stats

Export to Other Formats

# Export to CSV
pqview export data.parquet output.csv

# Export to other formats
pqview export data.parquet output.xlsx --format excel
pqview export data.parquet output.json --format json
pqview export data.parquet output.html --format html

Analysis Commands

Summary Statistics

# Show summary statistics for numerical columns
pqview stats data.parquet

Value Counts

# Show value counts for a specific column
pqview counts data.parquet column_name

Missing Values Analysis

# Show statistics about missing values
pqview missing data.parquet
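
As a rough illustration of what a missing-value report contains, the sketch below counts missing entries per column in a small in-memory table. The sample rows and the `missing_summary` helper are hypothetical and are not part of pqview; the tool's actual output layout may differ.

```python
def missing_summary(rows, columns):
    """Return {column: (missing_count, missing_percent)} for a list of dict rows."""
    total = len(rows)
    summary = {}
    for col in columns:
        # A value counts as missing when the key is absent or set to None.
        missing = sum(1 for row in rows if row.get(col) is None)
        percent = 100.0 * missing / total if total else 0.0
        summary[col] = (missing, percent)
    return summary


# Hypothetical sample data standing in for the contents of data.parquet.
rows = [
    {"name": "Ann", "age": 31, "department": "IT"},
    {"name": "Bob", "age": None, "department": "HR"},
    {"name": "Cid", "age": 45, "department": None},
    {"name": "Dee", "age": None, "department": "IT"},
]

report = missing_summary(rows, ["name", "age", "department"])
for col, (count, pct) in report.items():
    print(f"{col}: {count} missing ({pct:.1f}%)")
```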

Correlation Analysis

# Show correlation matrix
pqview correlations data.parquet

# Use different correlation methods
pqview correlations data.parquet --method spearman
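
To show why the choice of method matters, here is a minimal standard-library sketch comparing Pearson and Spearman correlation on the same data. Pearson measures linear association, while Spearman is Pearson applied to the ranks of the values, so it reports a perfect score for any strictly increasing relationship. The helper functions and sample values are illustrative assumptions, not pqview's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# A strictly increasing but non-linear relationship (y = x cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print("pearson: ", round(pearson(x, y), 3))
print("spearman:", round(spearman(x, y), 3))
```

On data like this, Spearman reaches 1 while Pearson stays below it, which is the usual reason to pick a rank-based method for monotonic but non-linear columns.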

Data Manipulation Commands

Filter Data

# Filter data using pandas query syntax
pqview filter data.parquet "age > 25 and department == 'IT'"
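
Since the filter expression uses pandas query syntax, it can be tried out directly against a DataFrame before running it on a file. The sketch below applies the same expression with `DataFrame.query`; the sample data is hypothetical, and it is only an assumption that pqview passes the string to pandas unchanged.

```python
import pandas as pd

# Hypothetical rows standing in for the contents of data.parquet.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "age": [31, 24, 45, 29],
        "department": ["IT", "IT", "HR", "IT"],
    }
)

# Same expression as in the command above: both conditions must hold.
filtered = df.query("age > 25 and department == 'IT'")
print(filtered)
```

In pandas query syntax, string literals go in quotes inside the expression, and column names containing spaces can be wrapped in backticks.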

Sort Data

# Sort by single column
pqview sort data.parquet "salary"

# Sort by multiple columns
pqview sort data.parquet "department,salary" --descending

Sample Data

# Sample specific number of rows
pqview sample data.parquet --rows 100

# Sample by fraction
pqview sample data.parquet --fraction 0.1 --seed 42
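
The point of a seed is reproducibility: the same input, fraction, and seed should yield the same sample on every run. The standard-library sketch below illustrates that idea. The `sample_rows` helper and its rounding of the fractional row count are assumptions made for the example and may not match how pqview selects rows.

```python
import random


def sample_rows(rows, n=None, fraction=None, seed=None):
    """Pick rows without replacement, by count or by fraction of the input."""
    if (n is None) == (fraction is None):
        raise ValueError("pass exactly one of n or fraction")
    if fraction is not None:
        # Assumption: the fractional row count is rounded to the nearest integer.
        n = round(len(rows) * fraction)
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(rows, n)


rows = list(range(1000))  # stand-in for 1000 rows of data

first = sample_rows(rows, fraction=0.1, seed=42)
second = sample_rows(rows, fraction=0.1, seed=42)

print(len(first))        # number of sampled rows
print(first == second)   # whether the two seeded runs agree
```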

Display Formats

The tool supports various display formats for tables:

Format    Description
grid      ASCII grid table
pipe      Markdown-compatible table
orgtbl    Org-mode table
github    GitHub-flavored Markdown table
pretty    Pretty printed table
html      HTML table
latex     LaTeX table

Export Formats

Supported export formats:

  • CSV
  • Excel
  • JSON
  • HTML

File Size Limits

By default, the tool enforces a 5 MB file size limit to prevent memory issues. This limit can be adjusted in the configuration.
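
A size guard of this kind can be sketched in a few lines. The constant name, the `check_file_size` helper, and the reading of "5MB" as 5 × 1024 × 1024 bytes are all assumptions for illustration; the tool's actual check and configuration mechanism are not documented here.

```python
import os
import tempfile

# Assumption: the limit is 5 MiB; the tool may count 5,000,000 bytes instead.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024


def check_file_size(path, limit=MAX_FILE_SIZE_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, above the {limit}-byte limit")
    return size


# Demonstrate with a small temporary file instead of a real Parquet file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    small_path = tmp.name

print(check_file_size(small_path))
os.remove(small_path)
```

Checking the size before loading means an oversized file is rejected without reading its contents into memory.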

Error Handling

The tool provides clear error messages for common issues:

  • File not found
  • Invalid file format
  • Memory limitations
  • Invalid query syntax
  • Data type conversion errors

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Author

Ashutosh Bele

Changelog

v0.1.0

  • Initial release
  • Basic viewing and export functionality
  • Statistical analysis features
  • Data manipulation capabilities
