AI-powered data exploration with natural language
datasight
Status: early and evolving. This project is in active development and the code is changing rapidly — APIs, CLI flags, and behavior may shift between commits. Feedback and bug reports from users are very welcome; please open an issue on GitHub.
datasight connects an AI agent to your database and provides a web UI where you can ask questions in natural language. The agent writes SQL, runs queries, and generates interactive Plotly visualizations.
Supports DuckDB, PostgreSQL, SQLite, and Flight SQL databases. It also queries local CSV, Parquet, and Excel (.xlsx) files directly, with no database setup required. Supports Anthropic Claude (default), OpenAI, GitHub Models (open source), and Ollama (local) as LLM backends.
Quick start
uv tool install datasight
# Create a new project
mkdir my-project && cd my-project
datasight init
# Edit .env with your API key and database path
# Edit schema_description.md to describe your data
# Edit queries.yaml with example questions
# Run the web UI
datasight run
Open http://localhost:8084 and start asking questions.
Explore CSV, Parquet, or Excel files with no setup
# Launch the web UI with no project, then paste a file or directory path
# into "Explore Files" to create views automatically
datasight run
# Or inspect a file from the command line (schema, row count, column stats)
datasight inspect generation.parquet
# Or build a persistent project from CSV/Parquet inputs
datasight generate generation.csv plants.csv --db-path grid.duckdb
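datasight inspect reports a file's schema, row count, and column statistics. To illustrate what those numbers are, here is a minimal sketch, not datasight's code: it runs against an in-memory SQLite table from Python's standard library (an assumption so the example needs no DuckDB install), and inspect_table is a hypothetical name.

```python
import sqlite3


def inspect_table(conn: sqlite3.Connection, table: str) -> dict:
    """Return column types, row count, and per-column null/distinct counts."""
    # Quote identifiers so names containing spaces or SQL keywords still work.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({quoted})")]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    stats = {}
    for name, col_type in columns:
        col = '"' + name.replace('"', '""') + '"'
        # COUNT(col) and COUNT(DISTINCT col) both skip NULLs.
        nulls, distinct = conn.execute(
            f"SELECT COUNT(*) - COUNT({col}), COUNT(DISTINCT {col}) FROM {quoted}"
        ).fetchone()
        stats[name] = {"type": col_type, "nulls": nulls, "distinct": distinct}
    return {"table": table, "rows": row_count, "columns": stats}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE generation (plant_id INTEGER, year INTEGER, mwh REAL)")
    conn.executemany(
        "INSERT INTO generation VALUES (?, ?, ?)",
        [(1, 2022, 10.5), (1, 2023, None), (2, 2023, 7.0)],
    )
    print(inspect_table(conn, "generation"))
```

Null counts come from COUNT(*) minus COUNT(column), because COUNT(column) ignores NULL values.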
Or work from the command line without starting a server:
datasight ask "What are the top 10 records?"
datasight ask "Show trends by year" --chart-format html -o chart.html
datasight profile
datasight quality --format markdown -o quality.md
datasight ask --file questions.txt --output-dir batch-output
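The --chart-format html option writes a chart you can open in a browser. The sketch below shows one common way to produce such a file from any Plotly figure: embed the figure JSON in a page that loads plotly.js and calls Plotly.newPlot. It is an illustration under stated assumptions, not datasight's exporter; figure_to_html and the pinned CDN version are hypothetical choices.

```python
import html
import json

# Assumption: a pinned plotly.js build from the official CDN; substitute the version you use.
PLOTLY_JS_URL = "https://cdn.plot.ly/plotly-2.35.2.min.js"


def figure_to_html(figure: dict, title: str = "chart") -> str:
    """Embed a Plotly figure ({"data": [...], "layout": {...}}) in a standalone HTML page."""
    # Escape "</" so a string inside the figure cannot terminate the <script> element early.
    payload = json.dumps(figure).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{html.escape(title)}</title>
<script src="{PLOTLY_JS_URL}"></script>
</head>
<body>
<div id="chart"></div>
<script>
const fig = {payload};
Plotly.newPlot("chart", fig.data, fig.layout || {{}});
</script>
</body>
</html>
"""


if __name__ == "__main__":
    fig = {
        "data": [{"type": "bar", "x": ["2022", "2023"], "y": [15.0, 7.0]}],
        "layout": {"title": {"text": "MWh by year"}},
    }
    print(figure_to_html(fig, "MWh by year"))
```

Escaping </ matters because text taken from query results can end up in trace names or axis labels.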
Features
- Natural language queries — ask questions in English, get SQL + results
- Interactive charts — Plotly visualizations with chart-type switching
- Multiple databases — DuckDB, PostgreSQL, SQLite, and Flight SQL
- Query files directly — point at a local CSV, Parquet, or Excel file (or directory) and start asking questions; datasight creates DuckDB views (or one table per Excel sheet) on the fly
- Headless CLI — datasight ask runs queries without a web server
- Deterministic CLI workflows — profile, quality, dimension, trend, and recipe commands that do not require an LLM
- Schema browser — sidebar with tables, columns, and example queries
- Schema auto-discovery — tables, columns, and types detected automatically
- Domain context — describe your data in Markdown for better AI understanding
- Example queries — seed the AI with question/SQL pairs
- Reusable prompt recipes — project-specific analysis prompts derived from the schema
- Multi-chart dashboard — pin results, filter cards, and configure layouts
- Session export — export conversations as shareable HTML pages
- Keyboard shortcuts — press ? to see all shortcuts and / to focus the input
- Streaming responses — real-time SSE streaming from the LLM
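Streaming responses arrive as Server-Sent Events. As a sketch of the wire format only (the event names token and done, and both function names, are assumptions rather than datasight's protocol), each message is a few field lines followed by a blank line:

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events message for a text/event-stream response."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Every payload line needs its own "data:" prefix (json.dumps emits a single line).
    for chunk in json.dumps(data).splitlines():
        lines.append(f"data: {chunk}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap LLM text deltas as 'token' events, then send a final 'done' event."""
    for token in tokens:
        yield sse_event({"text": token}, event="token")
    yield sse_event({}, event="done")
```

In FastAPI, a generator like stream_tokens can be returned through fastapi.responses.StreamingResponse with media_type="text/event-stream".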
Architecture
datasight pairs a FastAPI backend with a Svelte 5 + TypeScript + Tailwind CSS
frontend built with Vite. It supports multiple LLM backends — Anthropic
(default), OpenAI, GitHub Models, and Ollama — selectable via LLM_PROVIDER in .env.
datasight run / datasight ask / datasight profile / datasight quality
→ LLM provider (Anthropic / OpenAI / GitHub Models / Ollama)
→ DuckDB / PostgreSQL / SQLite / Flight SQL (or CSV/Parquet via DuckDB views)
→ Plotly chart generator
→ Web UI (SSE streaming) or CLI output
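The flow in the diagram can be sketched end to end in a few lines. This is a simplified illustration, not datasight's agent: the LLM is any callable that maps a question and schema to SQL (fake_llm is a hypothetical stand-in), an in-memory SQLite database stands in for the supported backends, and the chart is a plain Plotly figure dictionary.

```python
import sqlite3
from typing import Callable

# Hypothetical signature: (question, schema DDL) -> SQL string.
LLM = Callable[[str, str], str]


def answer(question: str, conn: sqlite3.Connection, llm: LLM) -> dict:
    """Question -> SQL from the LLM -> rows from the database -> Plotly figure dict."""
    # Hand the model the table DDL so it can write SQL against real column names.
    schema = "\n".join(
        row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    sql = llm(question, schema)
    # Crude guard: only run statements that start like a query.
    if not sql.lstrip().lower().startswith(("select", "with")):
        raise ValueError(f"refusing non-SELECT statement: {sql!r}")
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    # Plot the first column as x and the second as y in a bar chart.
    figure = {
        "data": [{"type": "bar", "x": [r[0] for r in rows], "y": [r[1] for r in rows]}],
        "layout": {
            "xaxis": {"title": {"text": columns[0]}},
            "yaxis": {"title": {"text": columns[1]}},
        },
    }
    return {"sql": sql, "columns": columns, "rows": rows, "figure": figure}


def fake_llm(question: str, schema: str) -> str:
    """Stand-in for a real provider call; ignores its inputs and returns fixed SQL."""
    return "SELECT year, SUM(mwh) AS total_mwh FROM generation GROUP BY year ORDER BY year"
```

In this sketch, keeping the model behind a plain callable is what lets one pipeline sit in front of several providers: switching providers only changes which function is passed in.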
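Documentation
# Install dev dependencies (including the docs toolchain) and activate the environment
uv sync --extra dev
. .venv/bin/activate
# Preview the docs locally
zensical serve
# Build the static documentation site
zensical build
# Regenerate the CLI reference page
python scripts/generate_cli_reference.py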
Development
Tests
# Build frontend assets for FastAPI serving after a clean checkout
bash scripts/build-frontend.sh
# Python test suite
pytest
# CI-safe Python test suite, excluding tests that need local Ollama
pytest -m "not integration"
# Frontend unit tests (Vitest)
cd frontend && npm test
# Frontend E2E tests (Playwright; needs a datasight server started with datasight run)
cd frontend && npm run test:e2e
# Rebuild frontend for FastAPI serving after frontend changes
bash scripts/build-frontend.sh
Generated web assets under src/datasight/web/static/ and
src/datasight/web/templates/index.html are not checked in. Run
bash scripts/build-frontend.sh before using datasight run from a clean
checkout when you want FastAPI to serve the production UI.
Ollama-backed CLI tests are marked integration because they require a running
local Ollama server with the qwen2.5:7b model available. CI runs pytest -m "not integration"; run pytest -m integration locally when you want to exercise the
live LLM path.
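One way to detect whether those prerequisites are present is to ask the Ollama server which models it has. This hypothetical helper (not part of datasight's test suite) calls Ollama's GET /api/tags endpoint on its default port, 11434, and returns False when the server is unreachable.

```python
import json
import urllib.request


def model_listed(payload: object, model: str) -> bool:
    """True if an /api/tags response body lists `model` by name."""
    if not isinstance(payload, dict):
        return False
    models = payload.get("models") or []
    return any(isinstance(m, dict) and m.get("name") == model for m in models)


def ollama_has_model(model: str, host: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """True if an Ollama server at `host` answers and has `model` pulled locally."""
    try:
        # GET /api/tags returns {"models": [{"name": "...", ...}, ...]}.
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False
    return model_listed(payload, model)
```

A test module could then use pytest.mark.skipif(not ollama_has_model("qwen2.5:7b"), reason="Ollama not available") so the integration tests skip rather than fail on machines without the model.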
Software Record
datasight is developed under NLR Software Record SWR-26-045.
Download files
Source distribution: datasight-0.6.0.tar.gz
Built distribution: datasight-0.6.0-py3-none-any.whl
File details
Details for the file datasight-0.6.0.tar.gz.
File metadata
- Download URL: datasight-0.6.0.tar.gz
- Upload date:
- Size: 3.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94e03ee487f533df061f92c1a7ad587b923d1a5ae7d710245f2ede2341cb9a49 |
| MD5 | d993f495153827dcaff65800ad1af954 |
| BLAKE2b-256 | 3097e6636a9f59d2f97610b8e1abd5761eb85dbca065ead7425223007be2daee |
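To check a downloaded sdist against the SHA256 digest above before installing it, a short standard-library sketch is enough. The helper names sha256_of and verify are hypothetical; the expected value is copied from the table.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published digest (case and whitespace tolerant)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# SHA256 published for datasight-0.6.0.tar.gz.
EXPECTED_SDIST_SHA256 = "94e03ee487f533df061f92c1a7ad587b923d1a5ae7d710245f2ede2341cb9a49"
```

pip can enforce the same check at install time in hash-checking mode: pin the requirement with --hash=sha256:<digest> entries and run pip install --require-hashes -r requirements.txt.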
Provenance
The following attestation bundles were made for datasight-0.6.0.tar.gz:
Publisher: release.yml on dsgrid/datasight
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datasight-0.6.0.tar.gz
- Subject digest: 94e03ee487f533df061f92c1a7ad587b923d1a5ae7d710245f2ede2341cb9a49
- Sigstore transparency entry: 1372553105
- Sigstore integration time:
- Permalink: dsgrid/datasight@456944b61a50aa5aaafa010c88a45871b5b24f9b
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/dsgrid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@456944b61a50aa5aaafa010c88a45871b5b24f9b
- Trigger Event: push
File details
Details for the file datasight-0.6.0-py3-none-any.whl.
File metadata
- Download URL: datasight-0.6.0-py3-none-any.whl
- Upload date:
- Size: 1.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94d357628cbb746b89f81d687615bee27b5da45ba3cda1d9e466c307cfbce641 |
| MD5 | be6851fd11aa2197942613f08f74b45b |
| BLAKE2b-256 | 9cb5495dc36fb54e7df3a56b9cbb35a9ca5040aa4b6f90c88770a8025c0554ef |
Provenance
The following attestation bundles were made for datasight-0.6.0-py3-none-any.whl:
Publisher: release.yml on dsgrid/datasight
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datasight-0.6.0-py3-none-any.whl
- Subject digest: 94d357628cbb746b89f81d687615bee27b5da45ba3cda1d9e466c307cfbce641
- Sigstore transparency entry: 1372553246
- Sigstore integration time:
- Permalink: dsgrid/datasight@456944b61a50aa5aaafa010c88a45871b5b24f9b
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/dsgrid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@456944b61a50aa5aaafa010c88a45871b5b24f9b
- Trigger Event: push