A data analysis cli tool using polars lazyframes

Project description

pldatacli

A simple command-line tool for quick CSV data analysis using Polars, with lazy execution for efficiency.

Tech Stack

Polars – fast DataFrame engine with lazy execution for efficient data processing
Typer – modern CLI framework for building command-line interfaces
Rich – beautiful terminal rendering for clean table output

PyPi Repository

Check the Repository on PyPI - https://pypi.org/project/pldatacli/

Installation

Option 1: Directly with pip

pip install pldatacli

Option 2: with uv package manager (Requires uv to be installed)

uv tool install pldatacli

Option 3: with homebrew tap (Requires homebrew installed)

brew tap utkarshg1/pldatacli
brew install pldatacli

Commands

Command	Description
`query`	Filter, aggregate, sort, and explore a single file
`schema`	Inspect columns, dtypes, and null counts
`run`	Execute a multi-step pipeline defined in a YAML file
`sql`	Run arbitrary SQL queries against one or more files
`pivot`	Create pivot tables with flexible aggregations
`init`	Generate a boilerplate YAML pipeline file

Usage

`query` — Exploratory Analysis

pldatacli query FILE [OPTIONS]

Example file:

SampleSuperstore.csv

Filter rows

Single filter:

pldatacli query SampleSuperstore.csv \
  --filter "State:Texas"

Multiple filters:

pldatacli query SampleSuperstore.csv \
  --filter "State:Texas" \
  --filter "Category:Furniture"

Numeric filter:

pldatacli query SampleSuperstore.csv \
  --filter "Sales > 500"

Truncate date column

Truncate a date column into a period-based group. Format: col:period

Supported periods: year, quarter, month, week, day

This creates a new derived column named <col>_<period> that can be used in --groupby.

By month:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:month"

By quarter:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:quarter"

By year:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:year"

Group by columns

Single column:

pldatacli query SampleSuperstore.csv \
  --groupby Region

Multiple columns:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category

Group by truncated date column:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:month" \
  --groupby "Order Date_month"

Aggregations

Single aggregation:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum"

Multiple aggregations on one column:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum,mean"

Multiple columns with aggregations:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category \
  --agg "Sales:sum,mean" \
  --agg "Profit:sum"

Sorting

Single sort:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --sort "Profit_sum:desc"

Multiple sorts:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --sort "Region:asc" \
  --sort "Profit_sum:desc"

Rounding results

Round float columns to 2 digits:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:mean" \
  --round 2

Custom rounding:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:mean" \
  --round 4

Limiting rows

Head:

pldatacli query SampleSuperstore.csv \
  --head 5

Tail:

pldatacli query SampleSuperstore.csv \
  --tail 10

Save output to file

Save results as CSV:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --output result.csv

Save results as Parquet:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --output result.parquet

⚡ Tip: Results are always printed to the terminal and saved to file simultaneously.

Full query example

pldatacli query SampleSuperstore.csv \
  --filter "Region=West" \
  --truncate "Order Date:month" \
  --groupby "Order Date_month" \
  --groupby Category \
  --agg "Profit:sum,mean" \
  --agg "Sales:sum" \
  --sort "Profit_sum:desc" \
  --head 10 \
  --round 2 \
  --output monthly_west.csv

`schema` — Inspect File Structure

Get columns, dtypes, and null counts without processing the full dataset:

pldatacli schema SampleSuperstore.csv

Example output:

LazyFrame Schema
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Column       ┃ Dtype   ┃ Nulls ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ Ship Mode    │ String  │     0 │
│ Segment      │ String  │     0 │
│ Country      │ String  │     0 │
│ City         │ String  │     0 │
│ State        │ String  │     0 │
│ Postal Code  │ Int64   │     0 │
│ Region       │ String  │     0 │
│ Category     │ String  │     0 │
│ Sub-Category │ String  │     0 │
│ Sales        │ Float64 │     0 │
│ Quantity     │ Int64   │     0 │
│ Discount     │ Float64 │     0 │
│ Profit       │ Float64 │     0 │
└──────────────┴─────────┴───────┘
Rows: 9994, Columns: 13

⚡ Tip: Use schema before running queries to quickly inspect columns, types, and missing values.

`run` — YAML Pipelines

Execute a reusable, multi-step analysis defined in a YAML file:

pldatacli run query.yaml

Option 1 — Aggregation pipeline (query.yaml):

file: Superstore.csv
filter:
  - "Region=West"
truncate: "Order Date:month"
groupby:
  - "Order Date_month"
  - Category
agg:
  - "Profit:sum,mean"
  - "Sales:sum"
sort:
  - "Profit_sum:desc"
head: 10
round: 2
output: monthly_west.csv

Option 2 — Pivot pipeline (pivot.yaml):

file: Superstore.csv
truncate: "Order Date:month"
pivot:
  column: "Region"
  index: "Order Date_month"
  values: "Profit"
  aggregate: "sum"
round: 2
output: pivot_result.csv

index accepts a single string or a list of strings for multi-index pivots.

⚡ Tip: Store your YAML query files in version control alongside your data pipelines for reproducible analysis.

`sql` — Ad-hoc SQL Queries

Run arbitrary Polars SQL queries against one or more files.

pldatacli sql FILE [FILES...] [OPTIONS]

The first file is always registered as the table data. Additional files are registered using their filename stem (lowercased, sanitized).

Single file query

pldatacli sql Superstore.csv \
  -q "SELECT Category, SUM(Profit) AS total_profit FROM data GROUP BY Category ORDER BY total_profit DESC"

Multi-file JOIN

pldatacli sql orders.csv customers.csv \
  -q "SELECT * FROM data o JOIN customers c ON o.customer_id = c.id LIMIT 100"

Load SQL from a file

pldatacli sql sales.parquet \
  --sql-file complex_query.sql

⚡ Tip: Use --sql-file to keep complex queries in .sql files for readability and reuse. You cannot use --sql and --sql-file together.

Limit output rows

pldatacli sql Superstore.csv \
  -q "SELECT * FROM data" \
  --head 20

pldatacli sql Superstore.csv \
  -q "SELECT * FROM data" \
  --tail 10

Save SQL results to file

pldatacli sql Superstore.csv \
  -q "SELECT Region, SUM(Sales) AS total_sales FROM data GROUP BY Region" \
  --output result.parquet

Full SQL example

pldatacli sql orders.csv customers.csv \
  --sql-file analysis.sql \
  --head 50 \
  --output joined_results.csv

`pivot` — Pivot Tables

Create spreadsheet-style pivot tables with flexible row groupings, column expansion, and aggregations.

pldatacli pivot FILE [OPTIONS]

--on defines the column whose unique values become new columns, --index sets the row grouping, and --values is the column to aggregate.

Basic pivot

pldatacli pivot SampleSuperstore.csv \
  --index Region \
  --on Category \
  --values Sales \
  --aggregate sum

Multi-index pivot

Group rows by multiple columns:

pldatacli pivot SampleSuperstore.csv \
  --index Region \
  --index "Ship Mode" \
  --on Category \
  --values Sales \
  --aggregate sum

Supported aggregations

Aggregation	Description
`sum`	Total (default)
`mean`	Average
`min`	Minimum value
`max`	Maximum value
`first`	First occurrence
`last`	Last occurrence
`median`	Median value
`count`	Count of non-null values
`len`	Row count

pldatacli pivot SampleSuperstore.csv \
  --index Region \
  --on "Ship Mode" \
  --values OrderID \
  --aggregate count

Pivot with date truncation

Combine --truncate with pivot to analyse trends over time:

pldatacli pivot SampleSuperstore.csv \
  --truncate "Order Date:month" \
  --index "Order Date_month" \
  --on Category \
  --values Sales \
  --aggregate sum

Rounding pivot results

pldatacli pivot SampleSuperstore.csv \
  --index Region \
  --on Category \
  --values Profit \
  --aggregate mean \
  --round 2

Save pivot output to file

pldatacli pivot SampleSuperstore.csv \
  --index Region \
  --on Category \
  --values Sales \
  --aggregate sum \
  --output pivot.csv

Save as Parquet:

pldatacli pivot sales.parquet \
  --index Region \
  --on "Ship Mode" \
  --values OrderID \
  --aggregate count \
  --output pivot.parquet

Full pivot example

pldatacli pivot SampleSuperstore.csv \
  --truncate "Order Date:quarter" \
  --index "Order Date_quarter" \
  --index Region \
  --on Category \
  --values Revenue \
  --aggregate mean \
  --round 1 \
  --output quarterly_pivot.csv

`init` — Generate Boilerplate YAML

Generate a ready-to-edit YAML pipeline file with commented examples:

pldatacli init TEMPLATE [OPTIONS]

Accepts query or pivot as the template type. Will not overwrite an existing file — use --output to choose a different path.

Generate a query pipeline

pldatacli init query

Creates query.yaml:

file: data.csv
filter:
  - "Column=Value"
  - "Sales > 100"
truncate: "Order Date:month"   # periods: year, quarter, month, week, day
groupby:
  - "Order Date_month"
  - Category
agg:
  - "Profit:sum,mean"
  - "Sales:sum"
sort:
  - "Profit_sum:desc"
head: 10
round: 2
output: result.csv

Generate a pivot pipeline

pldatacli init pivot

Creates pivot.yaml:

file: data.csv
truncate: "Order Date:month"   # periods: year, quarter, month, week, day — optional
pivot:
  column: "Category"           # unique values of this column become new columns
  index:                       # row grouping — single string or list
    - "Order Date_month"
    - Region
  values: "Sales"              # column to aggregate
  aggregate: "sum"             # sum, mean, min, max, first, last, median, count, len
round: 2
output: pivot_result.csv

Custom output path

pldatacli init query --output my_analysis.yaml
pldatacli init pivot --output my_pivot.yaml

⚡ Tip: Run pldatacli init query or pldatacli init pivot at the start of a new analysis to get a fully commented template, then edit it and run with pldatacli run.

Version

Check the installed version:

pldatacli --version

pldatacli -v

Project details

Release history Release notifications | RSS feed

0.3.0

Mar 21, 2026

0.2.0

Mar 16, 2026

0.1.12

Mar 15, 2026

0.1.11

Mar 15, 2026

0.1.10 yanked

Mar 15, 2026

Reason this release was yanked:

Incorrect readme pipx installation

This version

0.1.9

Mar 15, 2026

0.1.8

Mar 14, 2026

0.1.7 yanked

Mar 14, 2026

Reason this release was yanked:

Incorrect Readme file

0.1.6

Mar 14, 2026

0.1.5

Mar 14, 2026

0.1.3

Mar 14, 2026

0.1.2

Mar 14, 2026

0.1.0

Mar 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pldatacli-0.1.9.tar.gz (10.8 kB view details)

Uploaded Mar 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pldatacli-0.1.9-py3-none-any.whl (17.7 kB view details)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file pldatacli-0.1.9.tar.gz.

File metadata

Download URL: pldatacli-0.1.9.tar.gz
Upload date: Mar 15, 2026
Size: 10.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pldatacli-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`33c18a751c4159b3ecc37a5adef7ee5318bf86bedd51c09c60700ca646689a78`
MD5	`1e6241c17081dc4f9d7df2d79a21eeb2`
BLAKE2b-256	`f4562fc70d35cc5eb4d37b55a6d932992b680b89ecbe971cb0fcb2101b88ef24`

See more details on using hashes here.

File details

Details for the file pldatacli-0.1.9-py3-none-any.whl.

File metadata

Download URL: pldatacli-0.1.9-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 17.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pldatacli-0.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f6d248207254849029c4308be29a0853f7c33095f48c5da19ce77b811d0a6046`
MD5	`82e5184314b87409bb2db03bc886a0fd`
BLAKE2b-256	`9824a7790bd20a9f2480359c197021037211697ef90158f77dbc48964df07a9f`

See more details on using hashes here.

pldatacli 0.1.9

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

pldatacli

Tech Stack

PyPi Repository

Installation

Commands

Usage

query — Exploratory Analysis

Filter rows

Truncate date column

Group by columns

Aggregations

Sorting

Rounding results

Limiting rows

Save output to file

Full query example

schema — Inspect File Structure

run — YAML Pipelines

sql — Ad-hoc SQL Queries

Single file query

Multi-file JOIN

Load SQL from a file

Limit output rows

Save SQL results to file

Full SQL example

pivot — Pivot Tables

Basic pivot

Multi-index pivot

Supported aggregations

Pivot with date truncation

Rounding pivot results

Save pivot output to file

Full pivot example

init — Generate Boilerplate YAML

Generate a query pipeline

Generate a pivot pipeline

Custom output path

Version

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`query` — Exploratory Analysis

`schema` — Inspect File Structure

`run` — YAML Pipelines

`sql` — Ad-hoc SQL Queries

`pivot` — Pivot Tables

`init` — Generate Boilerplate YAML