A lightweight SQL query engine for data exploration with lazy evaluation and intelligent optimizations
Project description
SQLStream
A lightweight, pure-Python SQL query engine for CSV and Parquet files with lazy evaluation and intelligent optimizations.
📖 Full Documentation | 🚀 Quick Start | 💬 Discussions
Quick Example
# Query a CSV file
$ sqlstream query "SELECT * FROM 'data.csv' WHERE age > 25"
# Query S3 files
$ sqlstream query "SELECT * FROM 's3://my-bucket/data.parquet' WHERE date > '2024-01-01'"
# Join multiple files
$ sqlstream query "SELECT c.name, o.total FROM 'customers.csv' c JOIN 'orders.csv' o ON c.id = o.customer_id"
# Interactive shell with full TUI
$ sqlstream shell data.csv
Features
- 🚀 Pure Python - No database installation required
- 📊 Multiple Formats - CSV, Parquet files, HTTP URLs, S3 buckets
- ⚡ 10-100x Faster - Optional pandas backend for performance
- 🔗 JOIN Support - INNER, LEFT, RIGHT joins
- 📈 Aggregations - GROUP BY with COUNT, SUM, AVG, MIN, MAX
- 🔢 Type System - Automatic schema inference with type checking
- ☁️ S3 Support - Query files directly from Amazon S3
- 🎨 Beautiful Output - Rich tables, JSON, CSV formatting
- 🖥️ Interactive Shell - Full-featured TUI with modal dialogs, file browser, query plan visualization, multi-format export
- 🔍 Smart Optimizations - Column pruning, predicate pushdown, lazy evaluation
- 📦 Lightweight - Minimal dependencies, works everywhere
Installation
Basic (CSV only):
pip install sqlstream
All features (recommended):
pip install "sqlstream[all]"
See Installation Guide for more options.
Quick Start
CLI Usage
# Simple query
$ sqlstream query data.csv "SELECT name, salary FROM data WHERE salary > 80000"
# With pandas backend for performance
$ sqlstream query data.csv "SELECT * FROM data" --backend pandas
# JSON output
$ sqlstream query data.csv "SELECT * FROM data" --format json
# Interactive shell with TUI
$ sqlstream shell data.csv
Interactive Shell
$ sqlstream shell
Features:
- Modal Dialogs: Professional UI for filtering, export, file selection
- File Browser (
Ctrl+O): Browse and select files to query - Query History (
Ctrl+Up/Down): Navigate through previous queries (multiline supported) - Execution Plan (
F4): View detailed query execution steps - Smart Export (
Ctrl+X): Save results as CSV, JSON, or Parquet with custom filenames - Live Filtering (
Ctrl+F): Search across all columns - Schema Browser (
F2): View file schemas with column types - Pagination: Handle large result sets (100 rows per page)
- Column Sorting: Click headers to sort ascending/descending
- Syntax Highlighting: Monokai theme for SQL queries
Python API
from sqlstream import query
# Execute query (lazy evaluation)
results = query("data.csv").sql("SELECT * FROM data WHERE age > 25")
# Iterate over results
for row in results:
print(row)
# Or convert to list
results_list = query("data.csv").sql("SELECT * FROM data").to_list()
Documentation
Full documentation: https://subhayu99.github.io/sqlstream
Key sections:
- Quick Start Guide - Get started in 5 minutes
- SQL Reference - Supported SQL syntax
- CLI Reference - Command-line interface
- Python API - Programmatic usage
- Examples - Real-world examples
- Architecture - How it works
Development Status
Current Phase: 8 (Type System & Schema Inference)
- ✅ Phase 0-2: Core query engine with Volcano model
- ✅ Phase 3: Parquet support
- ✅ Phase 4: Aggregations & GROUP BY
- ✅ Phase 5: JOIN operations (INNER, LEFT, RIGHT)
- ✅ Phase 5.5: Pandas backend (10-100x speedup)
- ✅ Phase 6: HTTP data sources
- ✅ Phase 7: CLI with beautiful output
- ✅ Phase 7.5: Interactive mode with Textual
- ✅ Phase 7.6: Inline file path support
- ✅ Phase 8: Type system & schema inference
- 🚧 Phase 9: Error handling & user feedback
- 🚧 Phase 10: Testing & documentation
Test Coverage: 377 tests, 53% coverage
Performance
SQLStream offers two execution backends:
| Backend | Speed | Use Case |
|---|---|---|
| Python | Baseline | Learning, small files (<100K rows) |
| Pandas | 10-100x faster | Production, large files (>100K rows) |
Benchmark (1M rows):
- Python backend: 52s
- Pandas backend: 0.8s ⚡ 65x faster
Architecture
SQLStream uses the Volcano iterator model for query execution:
SQL Query → Parser → AST → Planner → Optimizer → Executor → Results
↓
(Column Pruning, Predicate Pushdown,
Lazy Evaluation)
Key concepts:
- Lazy Evaluation: Rows are processed on-demand
- Column Pruning: Only read columns that are used
- Predicate Pushdown: Apply filters early to reduce data scanned
- Two Backends: Pure Python (learning) and Pandas (performance)
See Architecture Guide for details.
Contributing
Contributions are welcome! See Contributing Guide for details.
Development setup:
# Clone repository
git clone https://github.com/subhayu99/sqlstream.git
cd sqlstream
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
ruff format .
ruff check .
License
MIT License - see LICENSE for details.
Built with ❤️ by the SQLStream Team
📖 Documentation • 🐛 Issues • 💬 Discussions
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlstream-0.2.0.tar.gz.
File metadata
- Download URL: sqlstream-0.2.0.tar.gz
- Upload date:
- Size: 367.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b7378b5639f54e693bbe171b35a22bffc974ae889e7048331fb1b69039e32c5e
|
|
| MD5 |
3992f3a812f1b66fb88a7a30b796c8bd
|
|
| BLAKE2b-256 |
643a1eb8212928379ec5a25f68eb71fcb9eeb3ded26dbbb6cca55fcf1006a7b5
|
File details
Details for the file sqlstream-0.2.0-py3-none-any.whl.
File metadata
- Download URL: sqlstream-0.2.0-py3-none-any.whl
- Upload date:
- Size: 62.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
eb35e8dc4fe72b4fa3113bbd6b44d73dcf64a1ee20093f7212aba84b5d8c918c
|
|
| MD5 |
ce53f0d97f84b3acbd6a8c7e130730d7
|
|
| BLAKE2b-256 |
5dc2ff5ecf3f43d15ac530222aa4784c8374dbe69553a0f4550e62b4949e7e8d
|