Parquet Viewer

A command-line tool for viewing, analyzing, and manipulating Parquet files.

Features

  • 📊 View Parquet files in various table formats
  • 📤 Export to different formats (CSV, Excel, JSON, HTML)
  • 📈 Display dataset statistics and summaries
  • 🔍 Filter and sort data
  • 📉 Analyze correlations and missing values
  • 🎲 Sample data randomly
  • 💾 Memory-efficient handling of large files
  • 🎨 Multiple display format options

Installation

pip install parquet-viewer

Usage

Basic Commands

View Parquet File

# Basic viewing
pqview view data.parquet

# Customize display
pqview view data.parquet --max-rows 20 --format github
pqview view data.parquet -n 50 -f pretty --no-stats

Export to Other Formats

# Export to CSV
pqview export data.parquet output.csv

# Export to other formats
pqview export data.parquet output.xlsx --format excel
pqview export data.parquet output.json --format json
pqview export data.parquet output.html --format html

Analysis Commands

Summary Statistics

# Show summary statistics for numerical columns
pqview stats data.parquet

Value Counts

# Show value counts for a specific column
pqview counts data.parquet column_name
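
As a rough illustration of what a value count computes, here is a minimal Python sketch that uses only the standard library. It is not the tool's implementation: the sample rows and the `value_counts` helper are hypothetical, and skipping missing values is an assumption about how `pqview counts` behaves.

```python
from collections import Counter

# Hypothetical rows standing in for a Parquet column; not real data.
rows = [
    {"department": "IT"},
    {"department": "HR"},
    {"department": "IT"},
    {"department": None},
]


def value_counts(rows, column):
    """Count each non-missing value in a column, most frequent first."""
    # Assumption: missing values (None) are left out of the counts.
    values = [row[column] for row in rows if row[column] is not None]
    return Counter(values).most_common()


counts = value_counts(rows, "department")
print(counts)
```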

Missing Values Analysis

# Show statistics about missing values
pqview missing data.parquet
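
The kind of per-column summary such a command might report can be sketched in plain Python. The rows, the `missing_summary` helper, and the choice to report both a count and a percentage are assumptions made for illustration; the actual output of `pqview missing` may differ.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 31, "salary": 5200.0},
    {"age": None, "salary": 4100.0},
    {"age": 45, "salary": None},
    {"age": None, "salary": 3900.0},
]


def missing_summary(rows):
    """Return {column: (missing_count, missing_percent)} for every column."""
    total = len(rows)
    summary = {}
    for column in rows[0]:
        missing = sum(1 for row in rows if row[column] is None)
        summary[column] = (missing, 100.0 * missing / total)
    return summary


summary = missing_summary(rows)
for column, (count, percent) in summary.items():
    print(f"{column}: {count} missing ({percent:.1f}%)")
```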

Correlation Analysis

# Show correlation matrix
pqview correlations data.parquet

# Use different correlation methods
pqview correlations data.parquet --method spearman
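
To show why the choice of method matters, the sketch below computes Pearson and Spearman coefficients by hand using only the standard library. The helper names and the sample data are hypothetical, and this is not how the tool itself computes correlations. Spearman is Pearson applied to ranks, so it reports a perfect score for any strictly increasing relationship, while Pearson only does so for a linear one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print("pearson :", round(pearson(x, y), 4))
print("spearman:", round(spearman(x, y), 4))
```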

Data Manipulation Commands

Filter Data

# Filter data using pandas query syntax
pqview filter data.parquet "age > 25 and department == 'IT'"
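
In pandas query syntax, column names appear as bare identifiers and conditions are combined with `and` / `or`. The expression above keeps only rows where both conditions hold. A plain-Python equivalent, using made-up rows purely for illustration, might look like this:

```python
# Hypothetical rows; the column names match the query in the example above.
rows = [
    {"name": "Ana", "age": 31, "department": "IT"},
    {"name": "Ben", "age": 24, "department": "IT"},
    {"name": "Cy", "age": 40, "department": "HR"},
]

# Same condition as the query string "age > 25 and department == 'IT'".
matching = [
    row for row in rows
    if row["age"] > 25 and row["department"] == "IT"
]

for row in matching:
    print(row["name"], row["age"], row["department"])
```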

Sort Data

# Sort by single column
pqview sort data.parquet "salary"

# Sort by multiple columns
pqview sort data.parquet "department,salary" --descending
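
A multi-column sort compares rows by the first column and breaks ties with the next. The sketch below shows the idea in plain Python. The `sort_rows` helper and the rows are hypothetical, and it assumes that `--descending` applies to every listed column, which this README does not state.

```python
# Hypothetical rows for illustration only.
rows = [
    {"department": "IT", "salary": 5000},
    {"department": "HR", "salary": 4000},
    {"department": "IT", "salary": 6000},
]


def sort_rows(rows, columns, descending=False):
    """Sort rows by a comma-separated list of column names."""
    keys = [name.strip() for name in columns.split(",")]
    # Assumption: the descending flag reverses the order for all columns.
    return sorted(
        rows,
        key=lambda row: tuple(row[key] for key in keys),
        reverse=descending,
    )


ordered = sort_rows(rows, "department,salary", descending=True)
for row in ordered:
    print(row["department"], row["salary"])
```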

Sample Data

# Sample specific number of rows
pqview sample data.parquet --rows 100

# Sample by fraction
pqview sample data.parquet --fraction 0.1 --seed 42
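
A fixed seed makes a random sample reproducible: the same seed over the same data selects the same rows. The following standard-library sketch illustrates that idea. The `sample_rows` helper is hypothetical, and rounding `fraction * row_count` to get the sample size is an assumption, not documented behavior of `pqview sample`.

```python
import random


def sample_rows(rows, n=None, fraction=None, seed=None):
    """Return a random sample of rows, chosen without replacement."""
    if (n is None) == (fraction is None):
        raise ValueError("pass exactly one of n or fraction")
    if fraction is not None:
        # Assumption: the fraction is converted to a row count by rounding.
        n = round(len(rows) * fraction)
    rng = random.Random(seed)  # a private generator keeps the seed local
    return rng.sample(rows, n)


rows = list(range(100))  # stand-in for 100 rows of data

first = sample_rows(rows, fraction=0.1, seed=42)
second = sample_rows(rows, fraction=0.1, seed=42)

print(len(first), first == second)
```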

Display Formats

The tool supports various display formats for tables:

Format    Description
grid      ASCII grid table
pipe      Markdown-compatible table
orgtbl    Org-mode table
github    GitHub-flavored Markdown table
pretty    Pretty printed table
html      HTML table
latex     LaTeX table

Export Formats

Supported export formats:

  • CSV
  • Excel
  • JSON
  • HTML

File Size Limits

By default, the tool enforces a 5 MB file size limit to prevent memory issues. The limit can be adjusted in the configuration.
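
A size guard of this kind can be sketched with the standard library by checking the file size before loading anything. The `check_file_size` helper, the constant name, and the reading of 5 MB as 5 * 1024 * 1024 bytes are all assumptions made for illustration; the tool's actual check and configuration mechanism are not described here.

```python
import os
import tempfile

# Assumption: 1 MB is taken as 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 5 * 1024 * 1024


def check_file_size(path, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit_bytes:
        raise ValueError(
            f"{path} is {size} bytes, above the {limit_bytes}-byte limit"
        )
    return size


# Demonstrate with a small temporary file standing in for a Parquet file.
with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as handle:
    handle.write(b"\0" * 2048)
    demo_path = handle.name

try:
    small_ok = check_file_size(demo_path) == 2048
    try:
        check_file_size(demo_path, limit_bytes=1024)
        too_big_rejected = False
    except ValueError:
        too_big_rejected = True
finally:
    os.remove(demo_path)

print(small_ok, too_big_rejected)
```

Checking the size on disk is cheap because it avoids reading the file, but it is only a proxy: compressed Parquet data can occupy considerably more memory once loaded.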

Error Handling

The tool provides clear error messages for common issues:

  • File not found
  • Invalid file format
  • Memory limitations
  • Invalid query syntax
  • Data type conversion errors

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Author

[Your Name]

Changelog

v0.1.0

  • Initial release
  • Basic viewing and export functionality
  • Statistical analysis features
  • Data manipulation capabilities
