A data analysis CLI tool using Polars LazyFrames

Reason this release was yanked:

Incorrect Readme file

Project description

pldatacli

A simple command-line tool for quick CSV data analysis using Polars, with lazy execution for efficiency.


Tech Stack

  • Polars – fast DataFrame engine with lazy execution for efficient data processing
  • Typer – modern CLI framework for building command-line interfaces
  • Rich – beautiful terminal rendering for clean table output

PyPI Repository

View the project on PyPI: https://pypi.org/project/pldatacli/


Installation

  • Option 1: Directly with pip
pip install pldatacli
  • Option 2: With the uv package manager (requires uv to be installed)
uv tool install pldatacli

Commands

Command  Description
query    Filter, aggregate, sort, and explore a single file
schema   Inspect columns, dtypes, and null counts
run      Execute a multi-step pipeline defined in a YAML file
sql      Run arbitrary SQL queries against one or more files

Usage

query — Exploratory Analysis

pldatacli query FILE [OPTIONS]

Example file:

SampleSuperstore.csv

Filter rows

Single filter:

pldatacli query SampleSuperstore.csv \
  --filter "State:Texas"

Multiple filters:

pldatacli query SampleSuperstore.csv \
  --filter "State:Texas" \
  --filter "Category:Furniture"

Numeric filter:

pldatacli query SampleSuperstore.csv \
  --filter "Sales > 500"
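
The examples above use two filter shapes: "col:value" for an exact match and "col > value" for a numeric comparison. The following is a minimal sketch of how such strings could be told apart, not the tool's actual parser. The function name parse_filter, the operator set beyond ">" and the conversion of the right-hand side to a float are all assumptions made for illustration.

```python
import re

# Hypothetical operator set: the README only demonstrates ":" and ">".
_COMPARISON = re.compile(r"^\s*(.+?)\s*(>=|<=|!=|==|>|<)\s*(.+?)\s*$")


def parse_filter(spec):
    """Split a filter string into (column, operator, value)."""
    match = _COMPARISON.match(spec)
    if match:
        column, op, raw = match.groups()
        # Assumption: comparison filters always carry a numeric value.
        return column, op, float(raw)
    # Otherwise treat the string as an exact match written "col:value".
    column, sep, value = spec.partition(":")
    if not sep or not column:
        raise ValueError(f"expected 'col:value' or 'col > value', got {spec!r}")
    return column, "==", value


print(parse_filter("State:Texas"))
print(parse_filter("Sales > 500"))
```

Repeating --filter, as in the "Multiple filters" example, would then produce one such tuple per flag.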

Truncate date column

Truncate a date column into a period-based group. Format: col:period

Supported periods: year, quarter, month, week, day

This creates a new derived column named <col>_<period> that can be used in --groupby.

By month:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:month"

By quarter:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:quarter"

By year:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:year"
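
The naming rule for the derived column (<col>_<period>) can be sketched in a few lines of Python. This is an illustration of the rule as described above, not code taken from pldatacli. The function name derived_column and the choice to split on the last colon are assumptions.

```python
SUPPORTED_PERIODS = {"year", "quarter", "month", "week", "day"}


def derived_column(spec):
    """Return the derived column name for a 'col:period' truncate spec."""
    # Assumption: split on the last colon, so the column name itself may
    # contain spaces or earlier colons.
    column, sep, period = spec.rpartition(":")
    if not sep or not column:
        raise ValueError(f"expected 'col:period', got {spec!r}")
    if period not in SUPPORTED_PERIODS:
        raise ValueError(f"unsupported period {period!r}")
    return f"{column}_{period}"


print(derived_column("Order Date:month"))  # Order Date_month
```

The returned name is the value to pass to --groupby when grouping by the truncated date.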

Group by columns

Single column:

pldatacli query SampleSuperstore.csv \
  --groupby Region

Multiple columns:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category

Group by truncated date column:

pldatacli query SampleSuperstore.csv \
  --truncate "Order Date:month" \
  --groupby "Order Date_month"

Aggregations

Single aggregation:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum"

Multiple aggregations on one column:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum,mean"

Multiple columns with aggregations:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --groupby Category \
  --agg "Sales:sum,mean" \
  --agg "Profit:sum"
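
Each --agg value has the shape "col:func1,func2". The sorting examples in this README refer to the result as Profit_sum, which suggests that output columns are named <col>_<func>. The sketch below expands a spec under that inferred naming rule. The function name agg_output_columns is hypothetical, and the rule is an inference from the examples rather than documented behavior.

```python
def agg_output_columns(spec):
    """Expand 'col:func1,func2' into the assumed output column names."""
    # Split on the last colon so column names containing spaces stay intact.
    column, sep, funcs = spec.rpartition(":")
    if not sep or not column or not funcs:
        raise ValueError(f"expected 'col:func[,func...]', got {spec!r}")
    # Assumed naming rule: <col>_<func>, inferred from "Profit_sum".
    return [f"{column}_{func.strip()}" for func in funcs.split(",")]


print(agg_output_columns("Profit:sum,mean"))  # ['Profit_sum', 'Profit_mean']
```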

Sorting

Single sort:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --sort "Profit_sum:desc"

Multiple sorts:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --sort "Region:asc" \
  --sort "Profit_sum:desc"
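
To show what several --sort flags mean together, here is a small pure-Python sketch of multi-key sorting with mixed directions. It assumes the first --sort is the primary key and later ones break ties; the README does not state this explicitly. The helper names and the sample rows are invented for illustration and are not real Superstore figures.

```python
def parse_sort(spec):
    """Split 'col:asc' or 'col:desc' into (column, descending)."""
    column, sep, direction = spec.rpartition(":")
    if not sep or direction not in ("asc", "desc"):
        raise ValueError(f"expected 'col:asc' or 'col:desc', got {spec!r}")
    return column, direction == "desc"


def apply_sorts(rows, specs):
    """Sort a list of dict rows by several keys, first spec = primary key."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # primary key last gives the same order as one multi-key sort.
    for column, descending in reversed([parse_sort(s) for s in specs]):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Invented sample rows, for illustration only.
rows = [
    {"Region": "West", "Profit_sum": 10.0},
    {"Region": "East", "Profit_sum": 5.0},
    {"Region": "East", "Profit_sum": 7.0},
]
for row in apply_sorts(rows, ["Region:asc", "Profit_sum:desc"]):
    print(row)
```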

Rounding results

Round float columns to 2 decimal places:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:mean" \
  --round 2

Custom rounding:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:mean" \
  --round 4

Limiting rows

Head:

pldatacli query SampleSuperstore.csv \
  --head 5

Tail:

pldatacli query SampleSuperstore.csv \
  --tail 10

Save output to file

Save results as CSV:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --output result.csv

Save results as Parquet:

pldatacli query SampleSuperstore.csv \
  --groupby Region \
  --agg "Profit:sum" \
  --output result.parquet

⚡ Tip: With --output, results are still printed to the terminal as well as saved to the file.


Full query example

pldatacli query SampleSuperstore.csv \
  --filter "Region:West" \
  --truncate "Order Date:month" \
  --groupby "Order Date_month" \
  --groupby Category \
  --agg "Profit:sum,mean" \
  --agg "Sales:sum" \
  --sort "Profit_sum:desc" \
  --head 10 \
  --round 2 \
  --output monthly_west.csv

schema — Inspect File Structure

Get columns, dtypes, and null counts without processing the full dataset:

pldatacli schema SampleSuperstore.csv

Example output:

LazyFrame Schema
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Column       ┃ Dtype   ┃ Nulls ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ Ship Mode    │ String  │     0 │
│ Segment      │ String  │     0 │
│ Country      │ String  │     0 │
│ City         │ String  │     0 │
│ State        │ String  │     0 │
│ Postal Code  │ Int64   │     0 │
│ Region       │ String  │     0 │
│ Category     │ String  │     0 │
│ Sub-Category │ String  │     0 │
│ Sales        │ Float64 │     0 │
│ Quantity     │ Int64   │     0 │
│ Discount     │ Float64 │     0 │
│ Profit       │ Float64 │     0 │
└──────────────┴─────────┴───────┘
Rows: 9994, Columns: 13

⚡ Tip: Use schema before running queries to quickly inspect columns, types, and missing values.


run — YAML Pipelines

Execute a reusable, multi-step analysis defined in a YAML file:

pldatacli run query.yaml

Example query.yaml:

file: Superstore.csv
filter:
  - "Region:West"
truncate: "Order Date:month"
groupby:
  - "Order Date_month"
  - Category
agg:
  - "Profit:sum,mean"
  - "Sales:sum"
sort:
  - "Profit_sum:desc"
head: 10
round: 2
output: monthly_west.csv
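
The keys in query.yaml mirror the options of the query command. Assuming each key maps one-to-one to the flag of the same name (an assumption, the README does not spell out the mapping), the pipeline above corresponds to a single query invocation. The sketch below builds that command line from a Python dict that mirrors the YAML, so no YAML parser is needed. The function name to_query_argv is hypothetical.

```python
import shlex

# Mirror of the query.yaml shown above, written as a dict for illustration.
pipeline = {
    "file": "Superstore.csv",
    "filter": ["Region:West"],
    "truncate": "Order Date:month",
    "groupby": ["Order Date_month", "Category"],
    "agg": ["Profit:sum,mean", "Sales:sum"],
    "sort": ["Profit_sum:desc"],
    "head": 10,
    "round": 2,
    "output": "monthly_west.csv",
}


def to_query_argv(spec):
    """Build the assumed equivalent 'pldatacli query' argument list."""
    argv = ["pldatacli", "query", spec["file"]]
    for key, value in spec.items():
        if key == "file":
            continue
        # List values become repeated flags, scalars a single flag.
        values = value if isinstance(value, list) else [value]
        for item in values:
            argv += [f"--{key}", str(item)]
    return argv


# shlex.join quotes arguments that contain spaces, such as "Order Date:month".
print(shlex.join(to_query_argv(pipeline)))
```

The printed command has the same options as the "Full query example" in the query section, applied to Superstore.csv.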

⚡ Tip: Store your YAML query files in version control alongside your data pipelines for reproducible analysis.


sql — Ad-hoc SQL Queries

Run arbitrary Polars SQL queries against one or more files.

pldatacli sql FILE [FILES...] [OPTIONS]

The first file is always registered as a table named data. Each additional file is registered under its filename stem (lowercased and sanitized).
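
The table naming rule can be sketched as follows. The README does not say what "sanitized" means in detail, so replacing every run of non-word characters with an underscore is an assumption, and the function name table_names is hypothetical.

```python
import re
from pathlib import Path


def table_names(files):
    """Map SQL table names to input files, following the rule described above."""
    names = {}
    for index, file in enumerate(files):
        if index == 0:
            # The first file is always exposed as "data".
            name = "data"
        else:
            # Assumed sanitization: lowercase the stem, then replace runs of
            # non-word characters with "_".
            name = re.sub(r"\W+", "_", Path(file).stem.lower())
        names[name] = file
    return names


print(table_names(["orders.csv", "customers.csv"]))
# {'data': 'orders.csv', 'customers': 'customers.csv'}
```

Under this rule, orders.csv is queried as data and customers.csv as customers, which is how a JOIN across the two files would refer to them.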


Single file query

pldatacli sql Superstore.csv \
  -q "SELECT Category, SUM(Profit) AS total_profit FROM data GROUP BY Category ORDER BY total_profit DESC"

Multi-file JOIN

pldatacli sql orders.csv customers.csv \
  -q "SELECT * FROM data o JOIN customers c ON o.customer_id = c.id LIMIT 100"

Load SQL from a file

pldatacli sql sales.parquet \
  --sql-file complex_query.sql

⚡ Tip: Use --sql-file to keep complex queries in .sql files for readability and reuse. You cannot use --sql and --sql-file together.


Limit output rows

Head:

pldatacli sql Superstore.csv \
  -q "SELECT * FROM data" \
  --head 20

Tail:

pldatacli sql Superstore.csv \
  -q "SELECT * FROM data" \
  --tail 10

Save SQL results to file

pldatacli sql Superstore.csv \
  -q "SELECT Region, SUM(Sales) AS total_sales FROM data GROUP BY Region" \
  --output result.parquet

Full SQL example

pldatacli sql orders.csv customers.csv \
  --sql-file analysis.sql \
  --head 50 \
  --output joined_results.csv

Version

Check the installed version:

pldatacli --version

or

pldatacli -v

Download files

Source Distribution

pldatacli-0.1.7.tar.gz (7.9 kB)

Built Distribution

pldatacli-0.1.7-py3-none-any.whl (13.8 kB)

File details

Details for the file pldatacli-0.1.7.tar.gz.

File metadata

  • Download URL: pldatacli-0.1.7.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pldatacli-0.1.7.tar.gz

Algorithm    Hash digest
SHA256       3d4332bb93e4f20d5d5db3f5162336ad5e92590053525741e646b821a46d7a4e
MD5          17754e6186b97136811f8048cf65e824
BLAKE2b-256  179719c1c3d9b0db8d3f7ab059b331df2f3d3cec6801a01bdad04b1476c54df3

File details

Details for the file pldatacli-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: pldatacli-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pldatacli-0.1.7-py3-none-any.whl

Algorithm    Hash digest
SHA256       e1419031ff7d572f4b0db586c712c60b163b7b4d4cbe03eb424dce28a106cd16
MD5          9a4efcb14c5fb51acf6661159f240f53
BLAKE2b-256  0f3f78b31e6ea47ce9c4993172afc8f6183666328f23c9179245de8dfe4d3178
