
A powerful command-line tool for viewing Parquet files


Parquet Viewer

A powerful command-line tool for viewing, analyzing, and manipulating Parquet files with ease.

Features

  • 📊 View Parquet files in various table formats
  • 📤 Export to different formats (CSV, Excel, JSON, HTML)
  • 📈 Display dataset statistics and summaries
  • 🔍 Filter and sort data
  • 📉 Analyze correlations and missing values
  • 🎲 Sample data randomly
  • 💾 Memory-efficient handling of large files
  • 🎨 Multiple display format options

Installation

pip install parquet-viewer

Usage

Basic Commands

View Parquet File

# Basic viewing
pqview view data.parquet

# Customize display
pqview view data.parquet --max-rows 20 --format github
pqview view data.parquet -n 50 -f pretty --no-stats

Export to Other Formats

# Export to CSV
pqview export data.parquet output.csv

# Export to other formats
pqview export data.parquet output.xlsx --format excel
pqview export data.parquet output.json --format json
pqview export data.parquet output.html --format html
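Here is a minimal standard-library sketch of what the CSV and JSON exports contain. It assumes the Parquet data has already been loaded into a list of row dictionaries. `export_csv` and `export_json` are hypothetical helpers, not part of pqview. The records-style JSON layout is also an assumption and may differ from the tool's output. Excel and HTML export need third-party libraries, so they are left out.

```python
import csv
import io
import json

# Hypothetical stand-in for a Parquet table already loaded into memory
rows = [
    {"name": "Ana", "age": 31, "department": "IT"},
    {"name": "Ben", "age": 24, "department": "HR"},
]

def export_csv(rows, stream):
    """Write rows as CSV, using the first row's keys as the header (assumes at least one row)."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

def export_json(rows, stream):
    """Write rows as a JSON array of objects (records layout, an assumption)."""
    json.dump(rows, stream, indent=2)

buffer = io.StringIO()
export_csv(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""`, as the `csv` module documentation recommends.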

Analysis Commands

Summary Statistics

# Show summary statistics for numerical columns
pqview stats data.parquet
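The page does not list which statistics `pqview stats` reports. The sketch below computes a typical set with the `statistics` module: count, mean, sample standard deviation, minimum, and maximum. It covers numeric columns only and skips nulls. The in-memory `rows` data and the `summary_stats` helper are hypothetical.

```python
import statistics

# Hypothetical rows; None marks a null value
rows = [
    {"name": "Ana", "age": 31, "salary": 72000.0},
    {"name": "Ben", "age": 24, "salary": 58000.0},
    {"name": "Cy", "age": 29, "salary": None},
]

def is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def summary_stats(rows):
    """Count/mean/std/min/max for columns whose non-null values are all numeric."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows if r[col] is not None]
        if not values or not all(is_number(v) for v in values):
            continue  # skip text columns and all-null columns
        stats[col] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            # sample standard deviation needs at least two values
            "std": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return stats

for col, s in summary_stats(rows).items():
    print(col, s)
```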

Value Counts

# Show value counts for a specific column
pqview counts data.parquet column_name
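Value counts amount to a frequency table for one column. Below is a minimal sketch using `collections.Counter`, assuming rows are held as dictionaries. The data and the `value_counts` helper are hypothetical. Nulls are dropped here, as pandas' `value_counts` does by default; pqview may or may not do the same.

```python
from collections import Counter

# Hypothetical rows; None marks a null value
rows = [
    {"department": "IT"},
    {"department": "HR"},
    {"department": "IT"},
    {"department": None},
]

def value_counts(rows, column):
    """Return (value, count) pairs, most frequent first, ignoring nulls."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return counts.most_common()

print(value_counts(rows, "department"))
```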

Missing Values Analysis

# Show statistics about missing values
pqview missing data.parquet
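A missing-value report usually gives, for each column, how many entries are null and what share of the rows that represents. This sketch treats both `None` and float NaN as missing, which is an assumption about how nulls appear after loading. The `missing_report` helper and the sample rows are hypothetical.

```python
import math

# Hypothetical rows mixing None and NaN as null markers
rows = [
    {"name": "Ana", "age": 31, "email": None},
    {"name": "Ben", "age": None, "email": None},
    {"name": "Cy", "age": 29, "email": "cy@example.com"},
    {"name": "Di", "age": 40, "email": float("nan")},
]

def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def missing_report(rows):
    """Map each column name to (missing_count, missing_percent)."""
    total = len(rows)
    report = {}
    for col in rows[0]:  # assumes every row has the same columns
        missing = sum(1 for r in rows if is_missing(r[col]))
        report[col] = (missing, 100.0 * missing / total)
    return report

for col, (count, pct) in missing_report(rows).items():
    print(f"{col}: {count} missing ({pct:.1f}%)")
```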

Correlation Analysis

# Show correlation matrix
pqview correlations data.parquet

# Use different correlation methods
pqview correlations data.parquet --method spearman
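Pearson correlation measures linear association. Spearman correlation is the Pearson correlation of the ranks, so it captures any monotonic relationship. The standard-library sketch below makes the difference concrete and gives tied values their average rank. The helper names are hypothetical, and this is not the code pqview runs.

```python
import math

def pearson(xs, ys):
    """Covariance divided by the product of standard deviations (undefined for a constant column)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # monotonic but not linear
print(pearson(xs, ys), spearman(xs, ys))
```

For y = x² on 1..5, the Pearson value is about 0.98. The Spearman value is 1 (up to floating-point rounding) because the relationship is monotonic but not linear.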

Data Manipulation Commands

Filter Data

# Filter data using pandas query syntax
pqview filter data.parquet "age > 25 and department == 'IT'"
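The filter expression is evaluated row by row, and a row is kept when the whole boolean expression is true for it. The plain-Python sketch below writes the example query's condition as an explicit predicate. `filter_rows` and the sample rows are hypothetical, and the sketch does not parse query strings the way pandas does.

```python
# Hypothetical rows
rows = [
    {"name": "Ana", "age": 31, "department": "IT"},
    {"name": "Ben", "age": 24, "department": "IT"},
    {"name": "Cy", "age": 45, "department": "HR"},
]

def filter_rows(rows, predicate):
    """Keep rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]

# Plain-Python equivalent of the query "age > 25 and department == 'IT'"
it_seniors = filter_rows(rows, lambda r: r["age"] > 25 and r["department"] == "IT")
print([r["name"] for r in it_seniors])
```

The comment above suggests pqview hands the string to pandas `DataFrame.query`. If so, column names containing spaces must be quoted with backticks inside the expression.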

Sort Data

# Sort by a single column
pqview sort data.parquet "salary"

# Sort by multiple columns
pqview sort data.parquet "department,salary" --descending
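Sorting by several columns is lexicographic: rows are ordered by the first column, and later columns only break ties. The sketch below assumes `--descending` reverses the order for every listed column, which the page does not state. `sort_rows` and the sample data are hypothetical.

```python
# Hypothetical rows
rows = [
    {"name": "Ana", "department": "IT", "salary": 72000},
    {"name": "Ben", "department": "HR", "salary": 58000},
    {"name": "Cy", "department": "IT", "salary": 81000},
    {"name": "Di", "department": "HR", "salary": 61000},
]

def sort_rows(rows, columns, descending=False):
    """Sort by a comma-separated column list; later columns break ties in earlier ones."""
    keys = [c.strip() for c in columns.split(",")]
    # Tuples compare element by element, which gives the lexicographic order
    return sorted(rows, key=lambda r: tuple(r[k] for k in keys), reverse=descending)

print([r["name"] for r in sort_rows(rows, "department,salary", descending=True)])
```

Python cannot compare `None` with numbers or strings. A real implementation would therefore need a rule for where nulls go.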

Sample Data

# Sample a specific number of rows
pqview sample data.parquet --rows 100

# Sample a fraction of the rows with a fixed random seed
pqview sample data.parquet --fraction 0.1 --seed 42
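Both sampling modes draw rows without replacement, and a fraction is converted to a row count first. In this sketch the count is `round(len(rows) * fraction)`, and a dedicated `random.Random(seed)` instance makes draws repeatable. Both choices are assumptions, and `sample_rows` is hypothetical. The same seed will not select the same rows that pqview selects.

```python
import random

def sample_rows(rows, n=None, fraction=None, seed=None):
    """Draw rows without replacement: either a fixed count n or a fraction of all rows."""
    if (n is None) == (fraction is None):
        raise ValueError("pass exactly one of n or fraction")
    if fraction is not None:
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        n = round(len(rows) * fraction)
    rng = random.Random(seed)  # dedicated generator, so the seed gives repeatable draws
    return rng.sample(rows, n)  # raises ValueError if n exceeds len(rows)

print(len(sample_rows(list(range(1000)), fraction=0.1, seed=42)))
```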

Display Formats

The tool supports various display formats for tables:

Format   Description
grid     ASCII grid table
pipe     Markdown-compatible table
orgtbl   Org-mode table
github   GitHub-flavored Markdown table
pretty   Pretty-printed table
html     HTML table
latex    LaTeX table
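These format names match `tablefmt` names in the `tabulate` Python library, so pqview may delegate rendering to it, though the page does not say so. The small standard-library renderer below gives a rough idea of what the `github` format looks like. `github_table` is hypothetical and left-aligns every column, which may differ from the tool's alignment.

```python
def github_table(headers, rows):
    """Render rows as a GitHub-flavored Markdown table with padded, left-aligned columns."""
    cells = [[str(h) for h in headers]] + [[str(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]

    def line(values):
        return "| " + " | ".join(v.ljust(w) for v, w in zip(values, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([line(cells[0]), separator] + [line(r) for r in cells[1:]])

print(github_table(["name", "age"], [["Ana", 31], ["Ben", 24]]))
```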

Export Formats

Supported export formats:

  • CSV
  • Excel
  • JSON
  • HTML

File Size Limits

By default, the tool enforces a 5 MB file size limit to avoid running out of memory. The limit can be adjusted in the configuration.
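A size guard of this kind can be as simple as comparing the file size with a threshold before loading. The page writes "5MB" without saying whether that means 5,000,000 bytes or 5 MiB (5,242,880 bytes), so the constant below is an assumption. The `check_file_size` name and its `max_bytes` parameter are hypothetical stand-ins for the unspecified configuration setting.

```python
import os

# Assumed interpretation of "5MB": 5 MiB. The README does not specify.
DEFAULT_MAX_BYTES = 5 * 1024 * 1024

def check_file_size(path, max_bytes=DEFAULT_MAX_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)  # raises FileNotFoundError for a missing file
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, above the limit of {max_bytes} bytes")
    return size
```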

Error Handling

The tool provides clear error messages for common issues:

  • File not found
  • Invalid file format
  • Memory limitations
  • Invalid query syntax
  • Data type conversion errors

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Author

[Your Name]

Changelog

v0.1.0

  • Initial release
  • Basic viewing and export functionality
  • Statistical analysis features
  • Data manipulation capabilities

Download files

Source Distribution

parquet_viewer-0.1.1.tar.gz (3.4 kB)

Built Distribution

parquet_viewer-0.1.1-py3-none-any.whl (3.6 kB)

File details

Details for the file parquet_viewer-0.1.1.tar.gz.

File metadata

  • Download URL: parquet_viewer-0.1.1.tar.gz
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for parquet_viewer-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        0ac004c20c3336ddd1e0f03b85ba79ab4fb61c92c299ed70fb329f81a598ac8a
MD5           38bb08b85ee720699d9b9842fcd121d8
BLAKE2b-256   a740efcf67ce8f0dbe9b9ba0e32fce24ad527b1c094ef741d0615c2314b10618

File details

Details for the file parquet_viewer-0.1.1-py3-none-any.whl.

File hashes

Hashes for parquet_viewer-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        5582d46ff7caa9c7663e60c99d1fd455fa8993a9bb83bd72e1bc4b5ae856b2a3
MD5           45f2109176bc8d70badb2e051df856f8
BLAKE2b-256   f9c9517d0e056bef1ee84d6981dfb0c00e601b12c326f755988478c314bf749a
