# pplyz

LLM-powered CSV data analyzer with structured output generation using LiteLLM (pplyz is short for LLM Analyser).

Minimal CSV→LLM→CSV transformer powered by LiteLLM and uv.
## Requirements

- uv
  - macOS/Linux: `brew install uv` or `curl -LsSf https://astral.sh/uv/install.sh | sh`
  - Windows: `scoop install uv`
- At least one LiteLLM-compatible API key (OpenAI, Gemini, Anthropic, Groq, etc.)

`uvx` downloads the right Python runtime automatically, so no global Python is needed once uv is installed.
## Quick run (uvx)

```sh
uvx pplyz \
  data/sample.csv \
  --input question,answer \
  --output 'score:int,notes:str'
```

- `--preview` dry-runs a handful of rows (set `[pplyz].preview_rows` to change how many rows are shown).
- `--model provider/name` overrides the LiteLLM model (e.g., `groq/llama-3.1-8b-instant`).
- Prompts are entered interactively at runtime (history is stored under `~/.config/pplyz/`). For non-interactive runs, provide the prompt when the CLI asks for it.

pplyz overwrites the input CSV; copy it first if you need to keep the original file.

Run `uvx pplyz --help` for every flag.
## Common options

| Flag | Description | Required |
|---|---|---|
| `INPUT` (positional) | Input CSV path. | Yes |
| `-i, --input title,abstract` | Comma-separated source columns passed to the LLM. | Yes (unless `[pplyz].default_input` is set) |
| `-o, --output 'score:int,notes:str'` | Output column schema. Types: `bool`, `int`, `float`, `str` (missing `:type` defaults to `str`). | Yes (unless `[pplyz].default_output` is set) |
| `-p, --preview` | Process a few rows and show would-be output without writing (row count configured via `[pplyz].preview_rows`). | No |
| `-m, --model provider/name` | LiteLLM model (default `gemini/gemini-2.5-flash-lite`). | No |
| `-f, --force` | Disable resume mode; always recompute rows and overwrite existing output. | No |
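The `--output` schema is a comma-separated list of `name:type` pairs. A minimal sketch of how such a string can be parsed, following the format described above (this is an illustration, not pplyz's internal code):

```python
# Illustrative parser for an output schema string like 'score:int,notes:str'.
# Mirrors the documented format; not pplyz's actual implementation.
TYPES = {"bool": bool, "int": int, "float": float, "str": str}

def parse_schema(spec: str) -> dict[str, type]:
    columns = {}
    for field in spec.split(","):
        name, _, typename = field.strip().partition(":")
        # A missing ':type' defaults to str, as the options table states.
        columns[name] = TYPES[typename or "str"]
    return columns

print(parse_schema("score:int,notes"))
# {'score': <class 'int'>, 'notes': <class 'str'>}
```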
## Configuration

- Create the user config once:

  ```sh
  mkdir -p ~/.config/pplyz
  $EDITOR ~/.config/pplyz/config.toml
  ```

- Add only the providers you actually use:

  ```toml
  [env]
  OPENAI_API_KEY = "sk-..."
  GROQ_API_KEY = "gsk-..."

  [pplyz]
  default_model = "gpt-4o-mini"
  default_input = "title,abstract"
  default_output = "relevant:bool,summary:str"
  ```

- At runtime pplyz loads settings in this order: environment variables → config file. The default path is `~/.config/pplyz/config.toml` (or `%APPDATA%\pplyz\config.toml` on Windows; if `XDG_CONFIG_HOME` is set, it uses that). To keep configs elsewhere, set `PPLYZ_CONFIG_DIR=/path/to/dir` and place `config.toml` there.

Tip: `pplyz data/papers.csv --input title,abstract --output 'summary:str'` uses the positional `data/papers.csv` as the CSV input.
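For per-project settings, `PPLYZ_CONFIG_DIR` can point pplyz at a local directory. A sketch, where the `.pplyz` directory name is just an example:

```shell
# Keep a project-local pplyz config (directory name is arbitrary).
export PPLYZ_CONFIG_DIR="$PWD/.pplyz"
mkdir -p "$PPLYZ_CONFIG_DIR"
cat > "$PPLYZ_CONFIG_DIR/config.toml" <<'EOF'
[pplyz]
preview_rows = 5
EOF
```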
## Settings reference

### `[pplyz]` table

| Key | Description | Default |
|---|---|---|
| `default_model` | Fallback LiteLLM model when `--model` is omitted. | `gemini/gemini-2.5-flash-lite` |
| `default_input` | Comma-separated columns used when `-i/--input` is omitted. | unset |
| `default_output` | Output schema used when `-o/--output` is omitted. | unset |
| `preview_rows` | Number of rows used when `--preview` is set (can also be overridden via `PPLYZ_PREVIEW_ROWS`). | `3` |
## Provider API keys

Set these inside the `[env]` table of your `config.toml`:
| Provider | Keys (checked in order) |
|---|---|
| Gemini | GEMINI_API_KEY |
| OpenAI | OPENAI_API_KEY |
| Anthropic / Claude | ANTHROPIC_API_KEY |
| Groq | GROQ_API_KEY |
| Mistral | MISTRAL_API_KEY |
| Cohere | COHERE_API_KEY |
| Replicate | REPLICATE_API_KEY |
| Hugging Face | HUGGINGFACE_API_KEY |
| Together AI | TOGETHERAI_API_KEY, TOGETHER_AI_TOKEN |
| Perplexity | PERPLEXITY_API_KEY |
| DeepSeek | DEEPSEEK_API_KEY |
| xAI | XAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_API_KEY, AZURE_API_KEY |
| AWS (Bedrock/SageMaker) | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY |
| Vertex AI | GOOGLE_APPLICATION_CREDENTIALS |
## Supported models
For the latest list of supported models, see the LiteLLM provider docs: https://docs.litellm.ai/docs/providers
## Examples

Sentiment pass with a preview first (`preview_rows` set to 5 in your config):

```toml
[pplyz]
preview_rows = 5
```

```sh
uvx pplyz \
  data/reviews.csv \
  --input review_text \
  --output 'sentiment:str,confidence:float' \
  --preview
```

Boolean classifier that writes back into the same CSV:

```sh
uvx pplyz \
  data/articles.csv \
  --input title,abstract \
  --output 'is_relevant:bool,summary:str'
```

Model override with Anthropic:

```sh
uvx pplyz \
  data/papers.csv \
  --input title,abstract \
  --output 'findings:str' \
  --model claude-3-5-sonnet-20241022
```
## Tips

- Boolean output columns keep binary classifiers deterministic (`true`/`false`).
- Some models do not support JSON mode; pplyz only sends `response_format` to models that advertise support. Explicitly state "return valid JSON only" in your prompt to keep outputs consistent.
- Keep prompts short and explicit about the JSON schema you expect to avoid parsing errors.
- Use `--preview` before long or expensive CSV batches to validate prompts and model choice.
- Resume mode is on by default; rows with existing output columns are skipped. Use `--force` to recompute everything.
- Dynamic (schema-less) mode is not supported; always provide `--output` (or set `[pplyz].default_output`).
- CSV encoding is UTF-8 only; convert input files beforehand if they use another encoding.
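For the last tip, a minimal standard-library sketch of re-encoding a CSV to UTF-8; the latin-1 source encoding and file names are examples, adjust them for your data:

```python
# Re-encode a CSV to UTF-8 before feeding it to pplyz.
# The latin-1 source encoding here is an example; adjust for your data.
from pathlib import Path

def to_utf8(src: str, dst: str, src_encoding: str = "latin-1") -> None:
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Demo on a throwaway file containing non-ASCII characters.
Path("demo_latin1.csv").write_bytes("name,city\nJosé,Málaga\n".encode("latin-1"))
to_utf8("demo_latin1.csv", "demo_utf8.csv")
print(Path("demo_utf8.csv").read_text(encoding="utf-8"))
```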