A data analysis cli tool using polars lazyframes
Project description
pldatacli
A simple command-line tool for quick CSV data analysis using Polars, with lazy execution for efficiency.
Tech Stack
- Polars – fast DataFrame engine with lazy execution for efficient data processing
- Typer – modern CLI framework for building command-line interfaces
- Rich – beautiful terminal rendering for clean table output
PyPi Repository
Check the Repository on PyPI - https://pypi.org/project/pldatacli/
Installation
- Option 1: Directly with pip
pip install pldatacli
- Option 2: with uv package manager (Requires uv to be installed)
uv tool install pldatacli
- Option 3: with homebrew tap (Requires homebrew installed)
brew tap utkarshg1/pldatacli
brew install pldatacli
Commands
| Command | Description |
|---|---|
query |
Filter, aggregate, sort, and explore a single file |
schema |
Inspect columns, dtypes, and null counts |
run |
Execute a multi-step pipeline defined in a YAML file |
sql |
Run arbitrary SQL queries against one or more files |
pivot |
Create pivot tables with flexible aggregations |
init |
Generate a boilerplate YAML pipeline file |
Usage
query — Exploratory Analysis
pldatacli query FILE [OPTIONS]
Example file:
SampleSuperstore.csv
Filter rows
Single filter:
pldatacli query SampleSuperstore.csv \
--filter "State:Texas"
Multiple filters:
pldatacli query SampleSuperstore.csv \
--filter "State:Texas" \
--filter "Category:Furniture"
Numeric filter:
pldatacli query SampleSuperstore.csv \
--filter "Sales > 500"
Truncate date column
Truncate a date column into a period-based group. Format: col:period
Supported periods: year, quarter, month, week, day
This creates a new derived column named <col>_<period> that can be used in --groupby.
By month:
pldatacli query SampleSuperstore.csv \
--truncate "Order Date:month"
By quarter:
pldatacli query SampleSuperstore.csv \
--truncate "Order Date:quarter"
By year:
pldatacli query SampleSuperstore.csv \
--truncate "Order Date:year"
Group by columns
Single column:
pldatacli query SampleSuperstore.csv \
--groupby Region
Multiple columns:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--groupby Category
Group by truncated date column:
pldatacli query SampleSuperstore.csv \
--truncate "Order Date:month" \
--groupby "Order Date_month"
Aggregations
Single aggregation:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:sum"
Multiple aggregations on one column:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:sum,mean"
Multiple columns with aggregations:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--groupby Category \
--agg "Sales:sum,mean" \
--agg "Profit:sum"
Sorting
Single sort:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:sum" \
--sort "Profit_sum:desc"
Multiple sorts:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:sum" \
--sort "Region:asc" \
--sort "Profit_sum:desc"
Rounding results
Round float columns to 2 digits:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:mean" \
--round 2
Custom rounding:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:mean" \
--round 4
Limiting rows
Head:
pldatacli query SampleSuperstore.csv \
--head 5
Tail:
pldatacli query SampleSuperstore.csv \
--tail 10
Save output to file
Save results as CSV:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:sum" \
--output result.csv
Save results as Parquet:
pldatacli query SampleSuperstore.csv \
--groupby Region \
--agg "Profit:sum" \
--output result.parquet
⚡ Tip: Results are always printed to the terminal and saved to file simultaneously.
Full query example
pldatacli query SampleSuperstore.csv \
--filter "Region=West" \
--truncate "Order Date:month" \
--groupby "Order Date_month" \
--groupby Category \
--agg "Profit:sum,mean" \
--agg "Sales:sum" \
--sort "Profit_sum:desc" \
--head 10 \
--round 2 \
--output monthly_west.csv
schema — Inspect File Structure
Get columns, dtypes, and null counts without processing the full dataset:
pldatacli schema SampleSuperstore.csv
Example output:
LazyFrame Schema
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Column ┃ Dtype ┃ Nulls ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ Ship Mode │ String │ 0 │
│ Segment │ String │ 0 │
│ Country │ String │ 0 │
│ City │ String │ 0 │
│ State │ String │ 0 │
│ Postal Code │ Int64 │ 0 │
│ Region │ String │ 0 │
│ Category │ String │ 0 │
│ Sub-Category │ String │ 0 │
│ Sales │ Float64 │ 0 │
│ Quantity │ Int64 │ 0 │
│ Discount │ Float64 │ 0 │
│ Profit │ Float64 │ 0 │
└──────────────┴─────────┴───────┘
Rows: 9994, Columns: 13
⚡ Tip: Use
schemabefore running queries to quickly inspect columns, types, and missing values.
run — YAML Pipelines
Execute a reusable, multi-step analysis defined in a YAML file:
pldatacli run query.yaml
Option 1 — Aggregation pipeline (query.yaml):
file: Superstore.csv
filter:
- "Region=West"
truncate: "Order Date:month"
groupby:
- "Order Date_month"
- Category
agg:
- "Profit:sum,mean"
- "Sales:sum"
sort:
- "Profit_sum:desc"
head: 10
round: 2
output: monthly_west.csv
Option 2 — Pivot pipeline (pivot.yaml):
file: Superstore.csv
truncate: "Order Date:month"
pivot:
column: "Region"
index: "Order Date_month"
values: "Profit"
aggregate: "sum"
round: 2
output: pivot_result.csv
indexaccepts a single string or a list of strings for multi-index pivots.
⚡ Tip: Store your YAML query files in version control alongside your data pipelines for reproducible analysis.
sql — Ad-hoc SQL Queries
Run arbitrary Polars SQL queries against one or more files.
pldatacli sql FILE [FILES...] [OPTIONS]
The first file is always registered as the table data. Additional files are registered using their filename stem (lowercased, sanitized).
Single file query
pldatacli sql Superstore.csv \
-q "SELECT Category, SUM(Profit) AS total_profit FROM data GROUP BY Category ORDER BY total_profit DESC"
Multi-file JOIN
pldatacli sql orders.csv customers.csv \
-q "SELECT * FROM data o JOIN customers c ON o.customer_id = c.id LIMIT 100"
Load SQL from a file
pldatacli sql sales.parquet \
--sql-file complex_query.sql
⚡ Tip: Use
--sql-fileto keep complex queries in.sqlfiles for readability and reuse. You cannot use--sqland--sql-filetogether.
Limit output rows
pldatacli sql Superstore.csv \
-q "SELECT * FROM data" \
--head 20
pldatacli sql Superstore.csv \
-q "SELECT * FROM data" \
--tail 10
Save SQL results to file
pldatacli sql Superstore.csv \
-q "SELECT Region, SUM(Sales) AS total_sales FROM data GROUP BY Region" \
--output result.parquet
Full SQL example
pldatacli sql orders.csv customers.csv \
--sql-file analysis.sql \
--head 50 \
--output joined_results.csv
pivot — Pivot Tables
Create spreadsheet-style pivot tables with flexible row groupings, column expansion, and aggregations.
pldatacli pivot FILE [OPTIONS]
--on defines the column whose unique values become new columns, --index sets the row grouping, and --values is the column to aggregate.
Basic pivot
pldatacli pivot SampleSuperstore.csv \
--index Region \
--on Category \
--values Sales \
--aggregate sum
Multi-index pivot
Group rows by multiple columns:
pldatacli pivot SampleSuperstore.csv \
--index Region \
--index "Ship Mode" \
--on Category \
--values Sales \
--aggregate sum
Supported aggregations
| Aggregation | Description |
|---|---|
sum |
Total (default) |
mean |
Average |
min |
Minimum value |
max |
Maximum value |
first |
First occurrence |
last |
Last occurrence |
median |
Median value |
count |
Count of non-null values |
len |
Row count |
pldatacli pivot SampleSuperstore.csv \
--index Region \
--on "Ship Mode" \
--values OrderID \
--aggregate count
Pivot with date truncation
Combine --truncate with pivot to analyse trends over time:
pldatacli pivot SampleSuperstore.csv \
--truncate "Order Date:month" \
--index "Order Date_month" \
--on Category \
--values Sales \
--aggregate sum
Rounding pivot results
pldatacli pivot SampleSuperstore.csv \
--index Region \
--on Category \
--values Profit \
--aggregate mean \
--round 2
Save pivot output to file
pldatacli pivot SampleSuperstore.csv \
--index Region \
--on Category \
--values Sales \
--aggregate sum \
--output pivot.csv
Save as Parquet:
pldatacli pivot sales.parquet \
--index Region \
--on "Ship Mode" \
--values OrderID \
--aggregate count \
--output pivot.parquet
Full pivot example
pldatacli pivot SampleSuperstore.csv \
--truncate "Order Date:quarter" \
--index "Order Date_quarter" \
--index Region \
--on Category \
--values Revenue \
--aggregate mean \
--round 1 \
--output quarterly_pivot.csv
init — Generate Boilerplate YAML
Generate a ready-to-edit YAML pipeline file with commented examples:
pldatacli init TEMPLATE [OPTIONS]
Accepts query or pivot as the template type. Will not overwrite an existing file — use --output to choose a different path.
Generate a query pipeline
pldatacli init query
Creates query.yaml:
file: data.csv
filter:
- "Column=Value"
- "Sales > 100"
truncate: "Order Date:month" # periods: year, quarter, month, week, day
groupby:
- "Order Date_month"
- Category
agg:
- "Profit:sum,mean"
- "Sales:sum"
sort:
- "Profit_sum:desc"
head: 10
round: 2
output: result.csv
Generate a pivot pipeline
pldatacli init pivot
Creates pivot.yaml:
file: data.csv
truncate: "Order Date:month" # periods: year, quarter, month, week, day — optional
pivot:
column: "Category" # unique values of this column become new columns
index: # row grouping — single string or list
- "Order Date_month"
- Region
values: "Sales" # column to aggregate
aggregate: "sum" # sum, mean, min, max, first, last, median, count, len
round: 2
output: pivot_result.csv
Custom output path
pldatacli init query --output my_analysis.yaml
pldatacli init pivot --output my_pivot.yaml
⚡ Tip: Run
pldatacli init queryorpldatacli init pivotat the start of a new analysis to get a fully commented template, then edit it and run withpldatacli run.
Version
Check the installed version:
pldatacli --version
or
pldatacli -v
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pldatacli-0.1.9.tar.gz.
File metadata
- Download URL: pldatacli-0.1.9.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
33c18a751c4159b3ecc37a5adef7ee5318bf86bedd51c09c60700ca646689a78
|
|
| MD5 |
1e6241c17081dc4f9d7df2d79a21eeb2
|
|
| BLAKE2b-256 |
f4562fc70d35cc5eb4d37b55a6d932992b680b89ecbe971cb0fcb2101b88ef24
|
File details
Details for the file pldatacli-0.1.9-py3-none-any.whl.
File metadata
- Download URL: pldatacli-0.1.9-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f6d248207254849029c4308be29a0853f7c33095f48c5da19ce77b811d0a6046
|
|
| MD5 |
82e5184314b87409bb2db03bc886a0fd
|
|
| BLAKE2b-256 |
9824a7790bd20a9f2480359c197021037211697ef90158f77dbc48964df07a9f
|