Ask questions about your CSV, Excel or Parquet data in natural language.

Maintainers

vtsaplin

DataTalk CLI

Chat with your data in plain English. Right from your terminal.

A natural language interface for your CSV, Excel (.xlsx), and Parquet files. Fast, local, and private.

Skip SQL and complex syntax. Just ask “What are the top 5 products?”
Get instant answers from your local data.

Privacy First: Your data never leaves your machine.
Formats: CSV, Excel (.xlsx), Parquet
Performance: Local analytics engine for instant results.

⭐ If you find this useful, please star the repo. It helps a lot!

Why DataTalk?

The Problem: You have a CSV file and a simple question. What do you do?

Open Excel? Slow for large files, and you have to leave the terminal
Use command-line tools (awk, csvkit)? Need to remember complex flags and syntax
Write SQL? Overkill for "show me the top 5 products"

The Solution: Just ask your question naturally.

dtalk sales.csv
> What are the top 5 products by revenue?
> Show me sales by region for Q4
> Which customers made orders over $1000?

Features

Natural Language - Ask questions in plain English, no SQL required
Interactive Mode - Ask multiple questions with ↑↓ history
100% Local Processing - Your data never leaves your machine; only the schema is sent to the LLM
100% Offline Option - Use local Ollama models for complete offline operation, no internet required
Fast - DuckDB processes gigabytes locally in seconds
100+ LLM Models - Powered by LiteLLM - OpenAI, Anthropic, Google, Ollama (local), and more
Multiple File Formats - Supports CSV, Excel (.xlsx, .xls), and Parquet files
Scriptable - JSON and CSV output formats for automation and pipelines
Simple Configuration - Just set LLM_MODEL and API key environment variables
Transparent - SQL queries shown by default, use --no-sql to hide

Installation

pip install datatalk-cli

Requirements: Python 3.9+ and either an API key for cloud models (OpenAI, Anthropic, etc.) OR local Ollama for offline use

Quick Start

# Option 1: Use cloud models (OpenAI, Anthropic, Google, etc.)
export LLM_MODEL="gpt-4o"
export OPENAI_API_KEY="your-key-here"

# Option 2: Use local Ollama (fully offline and private, no API key needed)
export LLM_MODEL="ollama/llama3.1"

# Start interactive mode - ask multiple questions
dtalk sales_data.csv

# You'll get a prompt where you can ask questions naturally:
# > What are the top 5 products by revenue?
# > Show me monthly sales trends
# > Which customers made purchases over $1000?

# Or use single query mode for quick answers
dtalk sales_data.csv -p "What are the top 5 products by revenue?"

Configuration

DataTalk uses LiteLLM to support 100+ models from various providers through a unified interface.

Required Environment Variables

Set two environment variables (local Ollama models need only the first):

# 1. Choose your model
export LLM_MODEL="gpt-4o"

# 2. Set the API key for your provider
export OPENAI_API_KEY="your-key"
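
A wrapper script can check these variables before it calls dtalk. The following is a minimal sketch and not part of DataTalk: the function name check_llm_model is hypothetical, and the only rule it relies on is the one stated in this README, namely that models prefixed with ollama/ run locally without an API key.

```shell
# Hypothetical pre-flight check (not shipped with DataTalk).
# Prints which kind of model is configured; returns 1 if LLM_MODEL is unset.
check_llm_model() {
  if [ -z "${LLM_MODEL:-}" ]; then
    echo "LLM_MODEL is not set" >&2
    return 1
  fi
  case "$LLM_MODEL" in
    ollama/*) echo "local model: $LLM_MODEL (no API key expected)" ;;
    *)        echo "cloud model: $LLM_MODEL (provider API key expected)" ;;
  esac
}
```

A script could then guard its first query with `check_llm_model || exit 1`.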

Supported Models

OpenAI:

export LLM_MODEL="gpt-4o"  # or gpt-4o-mini, gpt-3.5-turbo
export OPENAI_API_KEY="sk-..."

Anthropic Claude:

export LLM_MODEL="claude-3-5-sonnet-20241022"
export ANTHROPIC_API_KEY="sk-ant-..."

Google Gemini:

export LLM_MODEL="gemini-1.5-flash"  # or gemini-1.5-pro
export GEMINI_API_KEY="..."

Ollama (100% Offline - fully private, no internet required!):

# Install Ollama from https://ollama.ai
# Start Ollama: ollama serve
# Pull a model: ollama pull llama3.1

export LLM_MODEL="ollama/llama3.1"  # or ollama/mistral, ollama/codellama
# No API key needed! Works completely offline - your data and queries never leave your machine.

Azure OpenAI:

export LLM_MODEL="azure/gpt-4o"  # Use your deployment name
export AZURE_API_KEY="..."
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-01"

Note: Replace gpt-4o with your actual Azure deployment name

And 100+ more models! See LiteLLM Providers for the complete list including Cohere, Replicate, Hugging Face, AWS Bedrock, and more.

Optional Configuration

MODEL_TEMPERATURE - Control LLM response randomness (default: 0.1)

export MODEL_TEMPERATURE="0.5"  # Range: 0.0-2.0. Lower = more deterministic, Higher = more creative
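
When a script sets MODEL_TEMPERATURE dynamically, it can help to validate the value first. The sketch below assumes the documented range of 0.0-2.0; valid_temperature is a hypothetical helper, and it delegates to awk because POSIX shell arithmetic is integer-only.

```shell
# Hypothetical helper: succeeds (exit 0) when $1 is a plain decimal
# number, such as 0, 0.5 or 2.0, that lies within 0.0-2.0.
valid_temperature() {
  awk -v t="$1" 'BEGIN {
    ok = (t ~ /^[0-9]+(\.[0-9]+)?$/ && t >= 0 && t <= 2)
    exit !ok
  }'
}

# Only export the value when it passes the check.
if valid_temperature "0.5"; then
  export MODEL_TEMPERATURE="0.5"
fi
```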

Using .env file

Create a .env file in your project directory:

LLM_MODEL=gpt-4o
OPENAI_API_KEY=your-key
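
If a wrapper script needs the same values in its own environment, the file can also be sourced by the shell. This is a sketch under one assumption: the .env file holds only simple KEY=value lines like the ones above, which are also valid shell assignments. The scratch directory stands in for a real project directory.

```shell
# Work in a scratch directory so no real .env file is touched.
workdir=$(mktemp -d)
printf 'LLM_MODEL=gpt-4o\nOPENAI_API_KEY=your-key\n' > "$workdir/.env"

# set -a marks every variable assigned from here on for export;
# set +a switches that behaviour off again.
set -a
. "$workdir/.env"
set +a

echo "Model: $LLM_MODEL"
rm -rf "$workdir"
```

Values that contain spaces or shell metacharacters would need quoting inside the file for this approach to work.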

Usage

Interactive mode - ask multiple questions:

dtalk sales_data.csv

Direct query - single question and exit:

dtalk sales_data.csv -p "What were total sales in Q4?"
# or using long form:
dtalk sales_data.csv --prompt "What were total sales in Q4?"

Examples

# Basic queries
dtalk data.csv -p "How many rows?"
dtalk data.csv -p "Show first 10 rows"
dtalk data.csv -p "What is the average order value?"

# Filtering & sorting
dtalk data.csv -p "Show customers from Canada"
dtalk data.csv -p "Top 10 products by revenue"

# Aggregations
dtalk data.csv -p "Total revenue by category"
dtalk data.csv -p "Monthly revenue trend for 2024"

# Excel files work the same way
dtalk report.xlsx -p "What is the average salary?"
dtalk budget.xls -p "Show expenses by department"

# Parquet files work the same way
dtalk data.parquet -p "Count distinct users"

Options

Query Modes

# Interactive mode (default) - ask multiple questions
dtalk data.csv

# Non-interactive mode - single query and exit
dtalk data.csv -p "What are the top 5 products?"
dtalk data.csv --prompt "What are the top 5 products?"

Output Formats (with `-p` only)

DataTalk supports multiple output formats for different use cases:

# Human-readable table (default)
dtalk data.csv -p "Top 5 products"

# JSON format - for scripting and automation
dtalk data.csv -p "Top 5 products" --json
# Output: {"sql": "SELECT ...", "data": [...], "error": null}

# CSV format - for export and further processing
dtalk data.csv -p "Top 5 products" --csv
# Output: product_name,revenue
#         Apple,1000
#         Orange,500
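
The CSV output can be consumed row by row in plain shell. The sketch below uses the sample output shown above as a stand-in for a real dtalk call, so the column names product_name and revenue are taken from that example rather than guaranteed. The comma split is deliberately naive and would break on quoted fields that contain commas.

```shell
# Sample output copied from the --csv example; a real script would use
# the output of: dtalk data.csv -p "Top 5 products" --csv
csv_output='product_name,revenue
Apple,1000
Orange,500'

total=0
# tail -n +2 drops the header row; the here-document keeps the loop in
# the current shell so that $total survives after the loop ends.
while IFS=, read -r product revenue; do
  echo "$product earned $revenue"
  total=$((total + revenue))
done <<EOF
$(printf '%s\n' "$csv_output" | tail -n +2)
EOF

echo "Total: $total"
```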

Debug & Display Options

# SQL queries are shown by default
dtalk data.csv -p "query"

# Hide generated SQL
dtalk data.csv -p "query" --no-sql

# Show only SQL without executing (for debugging/validation)
dtalk data.csv -p "query" --sql-only

# Hide column details table when loading data
dtalk data.csv --no-schema

# Combine options
dtalk data.csv -p "query" --no-sql --no-schema    # Hide both SQL and schema

Scripting

DataTalk supports structured output formats for integration with scripts and pipelines:

# JSON output for scripting
# Note: the key read from .data[0] must match the column name in the generated SQL
REVENUE=$(dtalk sales.csv -p "total revenue" --json | jq -r '.data[0].total_revenue')
echo "Total Revenue: $REVENUE"

# CSV output for further processing
dtalk sales.csv -p "sales by region" --csv | \
  awk -F',' '{sum+=$2} END {print "Grand Total:", sum}'

# Process multiple files
for file in data_*.csv; do
  COUNT=$(dtalk "$file" -p "row count" --json | jq -r '.data[0].count')
  echo "$file: $COUNT rows"
done

# Generate SQL for external tools
SQL=$(dtalk sales.csv -p "top 10 products" --sql-only)
echo "$SQL" | duckdb production.db

# Export filtered data
dtalk sales.csv -p "sales from Q4 2024" --csv > q4_sales.csv

# Combine with other tools
dtalk sales.csv -p "top products" --json | \
  jq '.data[] | select(.revenue > 1000)'

Contributing

See CONTRIBUTING.md for development setup, making releases, and contribution guidelines.

Exit Codes

DataTalk returns standard exit codes for use in scripts and automation:

Exit Code	Meaning	Example
`0`	Success	Query completed successfully
`1`	Runtime error	Missing API key, query failed, file not found
`2`	Invalid arguments	`--json` without `-p`, invalid option combination

Example usage in scripts:

if dtalk sales.csv -p "total revenue" --json > result.json; then
    echo "Success!"
else
    echo "Failed with exit code $?"
fi
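
To tell the three documented exit codes apart instead of treating every failure alike, a script can branch on the code. This is a minimal sketch: describe_exit and run_and_report are hypothetical helper names, and the meanings are copied from the table above.

```shell
# Map a DataTalk exit code to the meaning given in the table above.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "runtime error" ;;
    2) echo "invalid arguments" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Run any command, report the outcome on stderr, and pass the
# command's own exit code back to the caller.
run_and_report() {
  status=0
  "$@" || status=$?
  describe_exit "$status" >&2
  return "$status"
}
```

A call such as `run_and_report dtalk sales.csv -p "total revenue" --json > result.json` leaves stdout untouched, so the JSON still lands in the file.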

FAQ

Q: Can I use this completely offline?
A: Yes! Use local Ollama models and DataTalk works 100% offline with no internet connection required. Your data and queries never leave your machine.

Q: Is my data sent to the LLM provider?
A: With cloud models, only schema (column names and types) is sent - your actual data stays local. With local Ollama models, nothing leaves your machine at all.

Q: What file formats are supported?
A: CSV, Excel (.xlsx, .xls), and Parquet files.

Q: How large can the files I query be?
A: DuckDB handles multi-gigabyte files. Parquet is faster for large datasets.

License

MIT License - see LICENSE file.

Built with DuckDB, LiteLLM, and Rich.

Release history

Version	Release date
0.1.27	Nov 30, 2025 (this version)
0.1.26	Nov 26, 2025
0.1.25	Nov 25, 2025
0.1.24	Nov 24, 2025
0.1.23	Nov 22, 2025
0.1.22	Nov 22, 2025
0.1.21	Nov 22, 2025
0.1.20	Nov 21, 2025
0.1.19	Nov 21, 2025
0.1.18	Nov 21, 2025
0.1.17	Nov 21, 2025

Download files

Source Distribution

datatalk_cli-0.1.27.tar.gz (18.7 kB)

Uploaded Nov 30, 2025 Source

Built Distribution

datatalk_cli-0.1.27-py3-none-any.whl (14.4 kB)

Uploaded Nov 30, 2025 Python 3

File details

Details for the file datatalk_cli-0.1.27.tar.gz.

File metadata

Download URL: datatalk_cli-0.1.27.tar.gz
Upload date: Nov 30, 2025
Size: 18.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datatalk_cli-0.1.27.tar.gz
Algorithm	Hash digest
SHA256	`d7ddd54fbe025ed913fb40ec526e9a7fcc114c0056a142c10f0db7fd02745e77`
MD5	`a389a3084b4e202dc90e4de36373123b`
BLAKE2b-256	`272f0fea176905d99f13b50b015a2283c43debfbb3d4eb9f651dcadbd704692b`

Provenance

The following attestation bundles were made for datatalk_cli-0.1.27.tar.gz:

Publisher: publish.yml on vtsaplin/datatalk-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datatalk_cli-0.1.27.tar.gz
- Subject digest: d7ddd54fbe025ed913fb40ec526e9a7fcc114c0056a142c10f0db7fd02745e77
- Sigstore transparency entry: 731781997
- Sigstore integration time: Nov 30, 2025
Source repository:
- Permalink: vtsaplin/datatalk-cli@bec9e34f506b60bb1858e42dd9fcd1ae4d87dbbd
- Branch / Tag: refs/tags/v0.1.27
- Owner: https://github.com/vtsaplin
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bec9e34f506b60bb1858e42dd9fcd1ae4d87dbbd
- Trigger Event: push

File details

Details for the file datatalk_cli-0.1.27-py3-none-any.whl.

File metadata

Download URL: datatalk_cli-0.1.27-py3-none-any.whl
Upload date: Nov 30, 2025
Size: 14.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datatalk_cli-0.1.27-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3c0e21b8ae3cc72a6cbb40b99bf52478bebe4c093fbcc3348f93366e025c16f0`
MD5	`04e8b70810530c6625ef1bbc46983e1a`
BLAKE2b-256	`453d507ef29cc3c1da432029c09f9af5da466f8cf030fd75efc91afc7f9ab7d5`

Provenance

The following attestation bundles were made for datatalk_cli-0.1.27-py3-none-any.whl:

Publisher: publish.yml on vtsaplin/datatalk-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datatalk_cli-0.1.27-py3-none-any.whl
- Subject digest: 3c0e21b8ae3cc72a6cbb40b99bf52478bebe4c093fbcc3348f93366e025c16f0
- Sigstore transparency entry: 731781999
- Sigstore integration time: Nov 30, 2025
Source repository:
- Permalink: vtsaplin/datatalk-cli@bec9e34f506b60bb1858e42dd9fcd1ae4d87dbbd
- Branch / Tag: refs/tags/v0.1.27
- Owner: https://github.com/vtsaplin
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bec9e34f506b60bb1858e42dd9fcd1ae4d87dbbd
- Trigger Event: push
