A lightweight SQL query engine for data exploration with lazy evaluation and intelligent optimizations

SQLStream

A lightweight, pure-Python SQL query engine for CSV and Parquet files with lazy evaluation and intelligent optimizations.

📖 Full Documentation | 🚀 Quick Start | 💬 Discussions

Quick Example

# Query a CSV file
$ sqlstream query "SELECT * FROM 'data.csv' WHERE age > 25"

# Query S3 files
$ sqlstream query "SELECT * FROM 's3://my-bucket/data.parquet' WHERE date > '2024-01-01'"

# Join multiple files
$ sqlstream query "SELECT c.name, o.total FROM 'customers.csv' c JOIN 'orders.csv' o ON c.id = o.customer_id"

# Interactive shell with full TUI
$ sqlstream shell data.csv

Features

🚀 Pure Python - No database installation required
📊 Multiple Formats and Sources - CSV and Parquet files from local paths, HTTP URLs, or S3 buckets
⚡ Fast - Optional pandas backend runs 10-100x faster than the pure-Python backend
🔗 JOIN Support - INNER, LEFT, RIGHT joins
📈 Aggregations - GROUP BY with COUNT, SUM, AVG, MIN, MAX
🔢 Type System - Automatic schema inference with type checking
☁️ S3 Support - Query files directly from Amazon S3
🎨 Beautiful Output - Rich tables, JSON, CSV formatting
🖥️ Interactive Shell - Full-featured TUI with multiple tabs, state persistence, file browser, query plan visualization, multi-format export
🔍 Smart Optimizations - Column pruning, predicate pushdown, lazy evaluation
📦 Lightweight - Minimal dependencies, works everywhere
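
To make the aggregation feature above concrete, here is a minimal sketch of how a GROUP BY with COUNT, SUM, AVG, MIN, and MAX can be computed in one pass with a hash table. This illustrates the general technique only and is not SQLStream's implementation; the function name `group_aggregate` and the sample `orders` rows are hypothetical.

```python
from collections.abc import Iterable


def group_aggregate(rows: Iterable[dict], key: str, value: str) -> dict:
    """Single-pass hash aggregation: one accumulator per distinct key."""
    groups: dict = {}
    for row in rows:
        v = row[value]
        acc = groups.get(row[key])
        if acc is None:
            # First row for this key initializes every accumulator.
            groups[row[key]] = {"count": 1, "sum": v, "min": v, "max": v}
        else:
            acc["count"] += 1
            acc["sum"] += v
            acc["min"] = min(acc["min"], v)
            acc["max"] = max(acc["max"], v)
    # AVG is derived from SUM and COUNT once all rows have been seen.
    return {
        k: {**acc, "avg": acc["sum"] / acc["count"]}
        for k, acc in groups.items()
    }


# Hypothetical sample data standing in for rows read from a file.
orders = [
    {"customer": "a", "total": 10.0},
    {"customer": "b", "total": 5.0},
    {"customer": "a", "total": 30.0},
]

# Roughly: SELECT customer, COUNT(*), SUM(total), ... GROUP BY customer
summary = group_aggregate(orders, key="customer", value="total")
for customer, stats in summary.items():
    print(customer, stats)
```

Memory use grows with the number of distinct keys rather than the number of rows, which is why this approach suits streaming input.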

Installation

Basic (CSV only):

pip install sqlstream

All features (recommended):

pip install "sqlstream[all]"

See Installation Guide for more options.

Quick Start

CLI Usage

# Simple query
$ sqlstream query data.csv "SELECT name, salary FROM data WHERE salary > 80000"

# With pandas backend for performance
$ sqlstream query data.csv "SELECT * FROM data" --backend pandas

# JSON output
$ sqlstream query data.csv "SELECT * FROM data" --format json

# Interactive shell with TUI
$ sqlstream shell data.csv
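
When calling the CLI from a script, it can help to build the argument list programmatically instead of concatenating strings. The sketch below assembles only the arguments shown above (`query`, `--backend`, `--format`); the helper name `build_query_command` is hypothetical, and nothing here runs the command.

```python
import shlex


def build_query_command(path, sql, output_format=None, backend=None):
    """Return an argv list for `sqlstream query`, using only flags shown above."""
    cmd = ["sqlstream", "query", path, sql]
    if backend is not None:
        cmd += ["--backend", backend]
    if output_format is not None:
        cmd += ["--format", output_format]
    return cmd


cmd = build_query_command("data.csv", "SELECT * FROM data", output_format="json")

# shlex.join quotes the SQL string so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, capture_output=True, text=True, check=True)` avoids shell quoting problems with SQL that contains quotes or `*`. The exact shape of the JSON output is not documented here, so inspect it before parsing.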

Interactive Shell

$ sqlstream shell

Features:

Multiple Query Tabs (Ctrl+T/Ctrl+W): Work with multiple queries simultaneously
State Persistence: Automatically saves and restores your tabs and queries between sessions
Tabbed Sidebar (F2): Toggle between Schema browser and File explorer
File Browser (Ctrl+O): Browse and select files to query with tree structure
Query History (Ctrl+Up/Down): Navigate through previous queries (multiline supported)
Execution Plan (F4): View detailed query execution steps
Smart Export (Ctrl+X): Save results as CSV, JSON, or Parquet with custom filenames
Live Filtering (Ctrl+F): Search across all columns
Pagination: Handle large result sets (100 rows per page)
Column Sorting: Click headers to sort ascending/descending
Syntax Highlighting: Dracula theme for SQL queries
Exit & Save (Ctrl+Q or Ctrl+D): Quit with automatic state saving

Python API

from sqlstream import query

# Execute query (lazy evaluation)
results = query("data.csv").sql("SELECT * FROM data WHERE age > 25")

# Iterate over results
for row in results:
    print(row)

# Or convert to list
results_list = query("data.csv").sql("SELECT * FROM data").to_list()

Documentation

Full documentation: https://subhayu99.github.io/sqlstream

Key sections:

Quick Start Guide - Get started in 5 minutes
SQL Reference - Supported SQL syntax
CLI Reference - Command-line interface
Python API - Programmatic usage
Examples - Real-world examples
Architecture - How it works

Development Status

Current Phase: 9 (Enhanced Interactive Shell - Complete!)

✅ Phase 0-2: Core query engine with Volcano model
✅ Phase 3: Parquet support
✅ Phase 4: Aggregations & GROUP BY
✅ Phase 5: JOIN operations (INNER, LEFT, RIGHT)
✅ Phase 5.5: Pandas backend (10-100x speedup)
✅ Phase 6: HTTP data sources
✅ Phase 7: CLI with beautiful output
✅ Phase 7.5: Interactive mode with Textual
✅ Phase 7.6: Inline file path support
✅ Phase 8: Type system & schema inference
✅ Phase 9: Enhanced interactive shell (multiple tabs, state persistence, file browser, query plan)
🚧 Phase 10: Error handling & user feedback
🚧 Phase 11: Testing & documentation

Test Coverage: 377 tests, 53% coverage

Performance

SQLStream offers two execution backends:

Backend	Speed	Use Case
Python	Baseline	Learning, small files (<100K rows)
Pandas	10-100x faster	Production, large files (>100K rows)

Benchmark (1M rows):

Python backend: 52s
Pandas backend: 0.8s ⚡ 65x faster

Architecture

SQLStream uses the Volcano iterator model for query execution:

SQL Query → Parser → AST → Planner → Optimizer → Executor → Results
                                          ↓
                            (Column Pruning, Predicate Pushdown,
                             Lazy Evaluation)

Key concepts:

Lazy Evaluation: Rows are processed on-demand
Column Pruning: Only read columns that are used
Predicate Pushdown: Apply filters early to reduce data scanned
Two Backends: Pure Python (learning) and Pandas (performance)

See Architecture Guide for details.
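
The key concepts above can be sketched with plain Python generators. This is an illustration of the Volcano iterator idea under simplifying assumptions, not SQLStream's actual operators: the names `scan`, `select`, and `project`, the in-memory `CSV_TEXT`, and the `stats` counter are all hypothetical.

```python
import csv
import io
from collections.abc import Callable, Iterable, Iterator
from itertools import islice

Row = dict[str, str]

CSV_TEXT = """name,age,city
Ada,36,London
Bob,19,Paris
Cy,41,Berlin
Di,52,Madrid
"""

stats = {"scanned": 0}  # how many rows were actually pulled from the source


def scan(text: str, columns: list[str]) -> Iterator[Row]:
    """Leaf operator. Column pruning: keep only the columns the query uses."""
    for record in csv.DictReader(io.StringIO(text)):
        stats["scanned"] += 1
        yield {col: record[col] for col in columns}


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter operator. Placed directly above the scan (predicate pushdown)."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: list[str]) -> Iterator[Row]:
    """Projection operator: emit only the selected output columns."""
    for row in rows:
        yield {col: row[col] for col in columns}


# Roughly: SELECT name FROM data WHERE age > 25 LIMIT 2
plan = islice(
    project(
        select(scan(CSV_TEXT, ["name", "age"]), lambda r: int(r["age"]) > 25),
        ["name"],
    ),
    2,
)

# Lazy evaluation: building the plan reads nothing.
scanned_before = stats["scanned"]

# Each row is pulled through the whole pipeline only when requested.
result = list(plan)
print(result, stats)
```

Because every operator pulls rows from the one below it on demand, the LIMIT stops the scan as soon as two matching rows are found, and the last row of the input is never read.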

Contributing

Contributions are welcome! See Contributing Guide for details.

Development setup:

# Clone repository
git clone https://github.com/subhayu99/sqlstream.git
cd sqlstream

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
ruff format .
ruff check .

License

MIT License - see LICENSE for details.

Built with ❤️ by the SQLStream Team

Release history

0.6.3 - Dec 9, 2025
0.6.2 - Dec 4, 2025
0.6.0 - Dec 4, 2025
0.5.5 - Dec 3, 2025
0.5.2 - Dec 2, 2025
0.5.0 - Dec 1, 2025
0.4.0 - Dec 1, 2025 (this version)
0.3.0 - Nov 30, 2025
0.2.5 - Nov 30, 2025
0.2.1 - Nov 30, 2025
0.2.0 - Nov 30, 2025
0.1.0 - Nov 29, 2025

Download files

Source Distribution

sqlstream-0.4.0.tar.gz (518.0 kB)

Uploaded Dec 1, 2025 Source

Built Distribution

sqlstream-0.4.0-py3-none-any.whl (98.3 kB)

Uploaded Dec 1, 2025 Python 3

File details

Details for the file sqlstream-0.4.0.tar.gz.

File metadata

Download URL: sqlstream-0.4.0.tar.gz
Upload date: Dec 1, 2025
Size: 518.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for sqlstream-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`757022f954654a78f5eed7ef719445b4f6a9bf81160171f0413cdea16585631c`
MD5	`f74983c0014c9881b07169f39a4208d2`
BLAKE2b-256	`b783a0afc74275248b0ed4b7b85fa3658f934be4688b523ae5b53261bd73aa77`

File details

Details for the file sqlstream-0.4.0-py3-none-any.whl.

File metadata

Download URL: sqlstream-0.4.0-py3-none-any.whl
Upload date: Dec 1, 2025
Size: 98.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for sqlstream-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`854809b969367e9b3ac7aa52780071b67a267fd0cac742c2fb4fbf473c6f78c3`
MD5	`4679fb31755d1fb019f8f994b5ff96ec`
BLAKE2b-256	`46fb337357b8fdfa14a7b5164968ff50cdd4ca6e722def8e4d881f8d63923e8b`
