Ask questions about your CSV, Excel or Parquet data in natural language.
DataTalk CLI
Ask questions of your CSV, Excel, and Parquet files in plain English, right from your terminal.
Stop writing SQL or memorizing command flags for quick data checks. Just ask your question naturally and get instant answers while keeping your data completely local and private.
Why DataTalk?
Working with data files in the terminal usually means choosing between:
- GUI tools (Excel) - slow for large files, breaks your workflow
- CLI tools (qsv, csvkit) - powerful but require memorizing many commands
- SQL tools (DuckDB) - need formal query syntax for simple questions
DataTalk combines the strengths of all three: natural language questions, local processing, and terminal speed.
Features
- 100% Local Processing - Your data never leaves your machine; only the schema is sent to the LLM
- Natural Language - Ask questions in plain English, no SQL required
- Multiple Formats - Supports CSV, Excel (.xlsx, .xls), and Parquet files
- Transparent - Use --show-sql to see generated queries
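To make the "only schema is sent" idea concrete, here is a minimal sketch of what a schema-only prompt could look like. It is not DataTalk's actual code: the helper names (`infer_type`, `describe_schema`, `build_prompt`), the type labels, and the prompt wording are all assumptions made for illustration, and it uses only the Python standard library rather than DuckDB.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse column type from sample string values (illustrative only)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    for caster, name in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "VARCHAR"


def describe_schema(csv_text, sample_rows=100):
    """Return (column name, type) pairs; no cell values are kept."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows)) if rows else [()] * len(header)
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


def build_prompt(schema, question):
    """Compose hypothetical model input: column names, types, and the question."""
    lines = [f"- {name}: {dtype}" for name, dtype in schema]
    return "Table columns:\n" + "\n".join(lines) + f"\nQuestion: {question}"


# Hypothetical sample data, used only to demonstrate the sketch.
sample = "product,revenue,units\nWidget,19.99,3\nGadget,5.50,10\n"
schema = describe_schema(sample)
prompt = build_prompt(schema, "What are the top 5 products by revenue?")
print(prompt)
```

The point of the sketch is that the prompt is built from column names and inferred types alone, so row values such as `Widget` never appear in the text that would be sent to a model.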
Installation
pip install datatalk-cli
Requirements: Python 3.9+ and an OpenAI or Azure OpenAI API key
Quick Start
# Set your API key
export OPENAI_API_KEY="your-key-here"
# Query a file
dtalk sales_data.csv "What are the top 5 products by revenue?"
Configuration
OpenAI
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o" # Optional
Azure OpenAI
export AZURE_DEPLOYMENT_TARGET_URL="https://your-resource.openai.azure.com/..."
export AZURE_OPENAI_API_KEY="your-api-key"
Or create a .env file, or run dtalk and follow the interactive setup.
Commands:
dtalk --config-info # Show configuration
dtalk --reset-config # Clear configuration
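Before running dtalk from a script, it can help to check which provider the environment is set up for. The sketch below is a hypothetical pre-flight check, not part of DataTalk. It only inspects the environment variable names listed above, and the precedence it applies (Azure first when both are configured) is an assumption, not documented behavior.

```python
import os


def detect_provider(env):
    """Return "azure", "openai", or None for the given environment mapping.

    Assumption for this sketch: Azure needs both its URL and its key,
    and is preferred when both providers are configured.
    """
    if env.get("AZURE_DEPLOYMENT_TARGET_URL") and env.get("AZURE_OPENAI_API_KEY"):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None


provider = detect_provider(os.environ)
if provider is None:
    print("No API credentials found in the environment.")
else:
    print(f"Environment looks configured for: {provider}")
```

Because this check reads only environment variables, it will not see settings stored in a .env file or created by the interactive setup; use `dtalk --config-info` to see the configuration the tool itself reports.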
Usage
Interactive mode - ask multiple questions:
dtalk sales_data.csv
Direct query - single question and exit:
dtalk sales_data.csv "What were total sales in Q4?"
Examples
# Basic queries
dtalk data.csv "How many rows?"
dtalk data.csv "Show first 10 rows"
dtalk data.csv "What is the average order value?"
# Filtering & sorting
dtalk data.csv "Show customers from Canada"
dtalk data.csv "Top 10 products by revenue"
# Aggregations
dtalk data.csv "Total revenue by category"
dtalk data.csv "Monthly revenue trend for 2024"
# Excel files work the same way
dtalk report.xlsx "What is the average salary?"
dtalk budget.xls "Show expenses by department"
# Parquet files work the same way
dtalk data.parquet "Count distinct users"
Advanced Options
dtalk data.csv "query" --show-sql # Show generated SQL
dtalk data.csv "query" --show-tokens # Show API token usage
dtalk data.csv --hide-schema # Hide dataset schema
dtalk data.csv "query" --hide-data # Hide query results
Scripting
# Use in scripts
REPORT=$(dtalk sales.csv "yesterday's revenue" --hide-data)
echo "$REPORT" | mail -s "Report" team@company.com
# Process multiple files
for file in data_*.csv; do
dtalk "$file" "row count"
done
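The same pattern can be driven from Python when a shell loop becomes awkward. This is a minimal sketch using the standard `subprocess` module. The helper names (`build_command`, `ask`, `ask_all`) are hypothetical, and `check=True` assumes dtalk exits with a non-zero status on failure, which the text above does not confirm.

```python
import subprocess
from pathlib import Path


def build_command(path, question, *flags):
    """Build the argument list for one direct (non-interactive) dtalk query."""
    return ["dtalk", str(path), question, *flags]


def ask(path, question, *flags):
    """Run dtalk once and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits non-zero
    (assumed here to signal an error).
    """
    result = subprocess.run(
        build_command(path, question, *flags),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ask_all(pattern, question, directory="."):
    """Ask the same question of every file matching a glob pattern."""
    files = sorted(Path(directory).glob(pattern))
    return {f.name: ask(f, question) for f in files}
```

For example, `ask_all("data_*.csv", "row count")` mirrors the shell loop above and returns a dictionary mapping each file name to dtalk's output. Passing arguments as a list avoids shell quoting problems with questions that contain apostrophes.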
Development
git clone https://github.com/vtsaplin/datatalk.git
cd datatalk
uv run dtalk sample_data/sales_data.csv
# Run tests
uv sync --extra test
uv run pytest
# Build package
python -m build
FAQ
Q: Is my data sent to OpenAI/Azure?
A: No. Only schema (column names and types) is sent. Your data stays local.
Q: How much does it cost?
A: Roughly 200-500 tokens per query (about $0.001-0.005 with GPT-4). Use --show-tokens to monitor usage.
Q: What file formats are supported?
A: CSV, Excel (.xlsx, .xls), and Parquet files.
Q: How large can the files I query be?
A: DuckDB handles multi-gigabyte files. Parquet is faster for large datasets.
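For the cost question above, the arithmetic is simply tokens times price. The sketch below is a rough estimator under stated assumptions: the function names are hypothetical, and the per-1,000-token prices are placeholders chosen to reproduce the range quoted in the FAQ, not current provider pricing. Check your provider's price list and the counts reported by --show-tokens.

```python
def estimate_cost(tokens, usd_per_1k_tokens):
    """Estimated cost in USD of one query that uses the given number of tokens."""
    return tokens / 1000 * usd_per_1k_tokens


def estimate_monthly_cost(queries_per_day, tokens_per_query, usd_per_1k_tokens, days=30):
    """Estimated cost in USD of a steady daily query volume over a month."""
    per_query = estimate_cost(tokens_per_query, usd_per_1k_tokens)
    return queries_per_day * days * per_query


# Placeholder prices chosen to match the FAQ's range; not real pricing.
low = estimate_cost(200, usd_per_1k_tokens=0.005)
high = estimate_cost(500, usd_per_1k_tokens=0.01)
print(f"Per query: ${low:.3f} to ${high:.3f}")
```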
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests (uv run pytest)
- Submit a PR
License
MIT License - see LICENSE file.
Built with DuckDB, OpenAI API, and Rich.