A command-line tool for querying and manipulating Parquet datasets

These details have not been verified by PyPI

Project links

Project description

ParQL 🦆

A powerful command-line tool for querying and manipulating Parquet datasets directly from the terminal.

ParQL brings pandas-like operations and SQL capabilities to the command line, powered by DuckDB. Query, analyze, visualize, and transform Parquet data instantly without writing scripts or loading into memory. Perfect for data exploration, ETL pipelines, and data quality checks.

🚀 Key Features

25+ Commands - Complete data analysis toolkit from the CLI
Interactive Shell - REPL mode for exploratory data analysis
Built-in Visualizations - ASCII charts and plots in your terminal
Advanced Analytics - Correlations, profiling, percentiles, outliers
String Processing - Text manipulation and pattern matching
Cloud Storage - Native S3, GCS, Azure, and HTTP support
Smart Caching - Automatic query result caching for performance
Data Quality - Validation, assertions, and schema comparison
Multiple Formats - Output to CSV, JSON, Parquet, Markdown

🚀 Quick Start

Installation

# Install from PyPI (when published)
pip install parql

# Or install from source
git clone https://github.com/abdulrafey38/parql.git
cd parql
pip install -e .

Basic Usage

# Preview data
parql head data/sales.parquet -n 10

# Data analysis
parql profile data/sales.parquet
parql corr data/sales.parquet -c "quantity,price,revenue"

# Filtering and aggregation  
parql select data/sales.parquet -w "revenue > 1000" -c "country,revenue"
parql agg data/sales.parquet -g "country" -a "sum(revenue):total,count():orders"

# Visualizations
parql plot data/sales.parquet -c revenue --chart-type hist --bins 20

# Interactive exploration
parql shell
parql> \l data/sales.parquet sales
parql> SELECT country, SUM(revenue) FROM sales GROUP BY country;

# Export results
parql write data/sales.parquet output.csv --format csv -w "country='US'"

Complete Documentation

📖 View Live Documentation - Beautiful, interactive documentation with examples

📖 Commands Reference - Complete command reference with examples

📖 DOCUMENTATION.md - Markdown documentation for offline reference

📊 Command Categories

🔍 Data Exploration

head, tail, schema, sample - Quick data inspection
profile - Comprehensive data quality reports
corr - Correlation analysis between columns
percentiles - Detailed percentile statistics

📈 Analytics & Aggregation

agg - Group by and aggregate operations
window - Window functions (ranking, moving averages)
pivot - Pivot tables and data reshaping
sql - Custom SQL queries with full DuckDB support

🔧 Data Processing

select - Filter rows and select columns
join - Multi-table joins with various strategies
str - String manipulation and text processing
pattern - Advanced pattern matching with regex

📊 Visualization & Quality

plot - ASCII charts (histograms, bar charts, scatter plots)
assert - Data validation and quality checks
outliers - Statistical outlier detection
nulls - Missing value analysis

🖥️ System & Productivity

shell - Interactive REPL mode for exploration
config - Profile and settings management
cache - Query result caching for performance
write - Export to multiple formats

💡 Quick Examples

Data Exploration

# Get a quick overview
parql head data/sales.parquet -n 5
parql schema data/sales.parquet
parql profile data/sales.parquet

# Statistical analysis
parql corr data/sales.parquet -c "quantity,price,revenue"
parql percentiles data/sales.parquet -c "revenue"

Data Analysis

# Aggregations and grouping
parql agg data/sales.parquet -g "country" -a "sum(revenue):total,count():orders"

# Window functions
parql window data/sales.parquet --partition "user_id" --order "timestamp" --expr "row_number() as rank"

# SQL queries
parql sql "SELECT country, SUM(revenue) FROM t GROUP BY country ORDER BY 2 DESC" -p t=data/sales.parquet

Visualizations

# Charts in your terminal
parql plot data/sales.parquet -c revenue --chart-type hist --bins 20
parql plot data/sales.parquet -c country --chart-type bar

Interactive Mode

parql shell
parql> \l data/sales.parquet sales
parql> \l data/users.parquet users  
parql> SELECT u.country, AVG(s.revenue) FROM users u JOIN sales s ON u.user_id = s.user_id GROUP BY u.country;

🌐 Remote Data Sources

ParQL works with data anywhere:

# AWS S3
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret  
parql head s3://bucket/path/data.parquet

# Google Cloud Storage
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
parql agg gs://bucket/data/*.parquet -g region -a "count():total"

# Public GCS Datasets
parql head gs://anonymous@voltrondata-labs-datasets/diamonds/cut=Good/part-0.parquet
parql agg gs://anonymous@voltrondata-labs-datasets/diamonds/cut=Good/part-0.parquet -g color -a "avg(price):avg_price"

# Azure Blob Storage
export AZURE_STORAGE_ACCOUNT=your_account
export AZURE_STORAGE_KEY=your_key

# Azure Data Lake Storage (Gen2)
parql head abfs://container@account.dfs.core.windows.net/path/data.parquet

# Azure Blob Storage (Hadoop-style)
parql head wasbs://container@account.blob.core.windows.net/path/data.parquet

# Public Azure files via HTTPS
parql head https://account.blob.core.windows.net/container/path/data.parquet

# HDFS (Hadoop Distributed File System)
export HDFS_NAMENODE=localhost
export HDFS_PORT=9000
parql head hdfs://localhost/tmp/save/part-r-00000-6a3ccfae-5eb9-4a88-8ce8-b11b2644d5de.gz.parquet

# HTTP/HTTPS
parql head https://example.com/data.parquet

# Multiple files and glob patterns
parql head "data/2024/*.parquet" -n 10
parql agg "data/sales/year=*/month=*/*.parquet" -g year,month

🎯 Why ParQL?

Before ParQL

# Traditional approach - slow, memory intensive
import pandas as pd
df = pd.read_parquet("large_file.parquet")  # Load entire file
result = df[df['revenue'] > 1000].groupby('country')['revenue'].sum()
print(result)

With ParQL

# Fast, memory efficient, one command
parql agg data.parquet -g country -a "sum(revenue):total" -w "revenue > 1000"

📈 Performance

Columnar Processing - Only reads necessary columns
Parallel Execution - Multi-threaded operations
Memory Efficient - Streams large datasets
Cloud Optimized - Predicate pushdown for remote data

🛠️ Development

# Clone and setup
git clone https://github.com/abdulrafey38/parql.git
cd parql
python -m venv .env
source .env/bin/activate
pip install -e .

# Run tests
pytest tests/

# Check all features
parql --help

📝 License

MIT License - see LICENSE file for details.

🙏 Built With

DuckDB - High-performance analytical database
Rich - Beautiful terminal output
Click - Command-line interface framework

⭐ If ParQL helps you, please star this repo! ⭐

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.0.4

Aug 29, 2025

1.0.3

Aug 27, 2025

1.0.2

Aug 27, 2025

1.0.1

Aug 21, 2025

1.0.0

Aug 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

parql-1.0.4.tar.gz (39.5 kB view details)

Uploaded Aug 29, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

parql-1.0.4-py3-none-any.whl (37.9 kB view details)

Uploaded Aug 29, 2025 Python 3

File details

Details for the file parql-1.0.4.tar.gz.

File metadata

Download URL: parql-1.0.4.tar.gz
Upload date: Aug 29, 2025
Size: 39.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for parql-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`4aa923510ed1987e78323929247dcd2306674d1956e3d478cad9b680901fb3e1`
MD5	`907b9b57610ec99610dee7def83864f4`
BLAKE2b-256	`aabaa9cbd258b6ba19e5a3f37beb3abebca80774902debab784cbd14908708b4`

See more details on using hashes here.

File details

Details for the file parql-1.0.4-py3-none-any.whl.

File metadata

Download URL: parql-1.0.4-py3-none-any.whl
Upload date: Aug 29, 2025
Size: 37.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for parql-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e1953b30214d1506a8defb28ae654e59eb49fbaec3b0100c03f58ba06f9c319d`
MD5	`2eed34a12ac4a2c0a336062522bac005`
BLAKE2b-256	`a62c13402e135be90257f6d1e1c38243cf773bae2a9e955e3aa7298c8845885e`

See more details on using hashes here.

parql 1.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ParQL 🦆

🚀 Key Features

🚀 Quick Start

Installation

Basic Usage

Complete Documentation

📊 Command Categories

🔍 Data Exploration

📈 Analytics & Aggregation

🔧 Data Processing

📊 Visualization & Quality

🖥️ System & Productivity

💡 Quick Examples

Data Exploration

Data Analysis

Visualizations

Interactive Mode

🌐 Remote Data Sources

🎯 Why ParQL?

Before ParQL

With ParQL

📈 Performance

🛠️ Development

📝 License

🙏 Built With

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes