Virtual warehouse -- SQL over cloud Parquet via DuckDB
DataSpoc Lens
SQL over cloud Parquet. Query your data lake from the terminal.
Why Lens?
Data teams store Parquet in S3, GCS, or Azure but still spin up heavy warehouses just to run SQL. DataSpoc Lens mounts cloud buckets as DuckDB views and gives you an interactive shell, notebooks, AI-powered queries, and local caching -- all from a single CLI. No servers, no infrastructure, no data copying.
Installation
pip install dataspoc-lens
Cloud and feature extras:
pip install dataspoc-lens[s3] # AWS S3
pip install dataspoc-lens[gcs] # Google Cloud Storage
pip install dataspoc-lens[azure] # Azure Blob Storage
pip install dataspoc-lens[jupyter] # JupyterLab integration
pip install dataspoc-lens[ai] # AI natural language queries
pip install dataspoc-lens[all] # Everything
Quick Start
1. Initialize and register a bucket
dataspoc-lens init
dataspoc-lens add-bucket s3://my-data-lake
Lens discovers tables automatically -- first from DataSpoc Pipe's .dataspoc/manifest.json, then by scanning for *.parquet files.
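The two-step discovery order can be pictured with a stdlib-only sketch (not Lens source; the manifest shape beyond a top-level tables key is an assumption):

```python
import json
from pathlib import Path


def discover_tables(root: Path) -> list[str]:
    """Prefer Pipe's manifest; fall back to scanning for Parquet files."""
    manifest = root / ".dataspoc" / "manifest.json"
    if manifest.exists():
        # Assumed shape: {"tables": {"orders": {...}, "users": {...}}}
        return sorted(json.loads(manifest.read_text())["tables"])
    # No manifest: each *.parquet file becomes a table named after its stem.
    return sorted(p.stem for p in root.rglob("*.parquet"))
```

With a manifest present the scan never runs, so tables Pipe knows about win even if stray Parquet files exist alongside them.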
2. Explore the catalog
dataspoc-lens catalog
dataspoc-lens catalog --detail orders
3. Query with SQL
dataspoc-lens query "SELECT * FROM orders LIMIT 10"
dataspoc-lens query "SELECT status, COUNT(*) FROM orders GROUP BY status"
4. Launch the interactive shell
dataspoc-lens shell
lens> SELECT customer_id, SUM(total) FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
lens> .tables
lens> .schema orders
lens> .export csv /tmp/orders.csv
lens> .quit
5. Configure AI and ask questions
Before using ask, configure an LLM provider:
Option A -- Local AI (free, no API key):
dataspoc-lens setup-ai
Option B -- Cloud provider:
# Anthropic (default)
export DATASPOC_LLM_API_KEY=sk-ant-...
# OpenAI
export DATASPOC_LLM_PROVIDER=openai
export DATASPOC_LLM_API_KEY=sk-...
Then ask questions in natural language:
dataspoc-lens ask "how many orders were placed yesterday?"
dataspoc-lens ask "top 10 customers by revenue this month"
dataspoc-lens ask --debug "average order value by month"
Lens sends your table schemas and sample data to the LLM, receives SQL, executes it, and prints the results. Use --debug to see the full prompt sent to the LLM.
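That schema-plus-samples-in, SQL-out pipeline can be sketched roughly as follows; the prompt wording and shapes here are placeholders, not Lens's actual prompt:

```python
def build_ask_prompt(
    question: str,
    schemas: dict[str, list[str]],
    samples: dict[str, list[tuple]],
) -> str:
    """Assemble the context an LLM needs to turn a question into SQL.

    Illustrative only -- the real prompt format is internal to Lens
    (--debug prints the actual one).
    """
    lines = ["You write DuckDB SQL. Tables:"]
    for table, cols in schemas.items():
        lines.append(f"  {table}({', '.join(cols)})")
        for row in samples.get(table, [])[:3]:  # a few sample rows per table
            lines.append(f"    sample: {row}")
    lines.append(f"Question: {question}")
    lines.append("Answer with a single SQL statement.")
    return "\n".join(lines)
```

The returned SQL then runs through the same DuckDB engine as a plain query command, which is why ask supports the same --export flag.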
6. Export results
Add --export to any query or ask command. Format is detected from the file extension:
dataspoc-lens query "SELECT * FROM orders" --export orders.csv
dataspoc-lens query "SELECT * FROM users" --export users.parquet
dataspoc-lens ask "monthly revenue" --export revenue.json
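Extension-based detection amounts to a small lookup table; a sketch covering the three formats shown above (the mapping itself is an assumption):

```python
from pathlib import Path

# Formats the examples above use, keyed by file extension.
EXPORT_FORMATS = {".csv": "csv", ".parquet": "parquet", ".json": "json"}


def detect_export_format(path: str) -> str:
    """Map an --export path to an output format by its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return EXPORT_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported export extension: {suffix!r}")
```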
Features
Interactive Shell
SQL REPL with syntax highlighting, autocomplete, and history. Dot commands: .tables, .schema <table>, .buckets, .cache <table>, .export <format> <path>, .help, .quit.
Notebook
Launch JupyterLab or Marimo with all tables pre-mounted:
pip install dataspoc-lens[jupyter]
dataspoc-lens notebook
pip install dataspoc-lens[marimo]
dataspoc-lens notebook --marimo
SQL Transforms
Transforms are numbered .sql files in ~/.dataspoc-lens/transforms/; they run in order:
dataspoc-lens transform list
dataspoc-lens transform run
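"Numbered files that run in order" suggests plain lexical ordering of filenames; a stdlib sketch of that assumption:

```python
from pathlib import Path


def ordered_transforms(transform_dir: Path) -> list[Path]:
    """Return the .sql files in the order they would run.

    Assumes zero-padded numeric prefixes such as 01_clean.sql and
    02_join.sql, so that lexical sort equals run order.
    """
    return sorted(transform_dir.glob("*.sql"))
```

Zero-padding matters with this scheme: without it, 10_report.sql would sort before 2_join.sql.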
Cache
Copy tables locally for offline work and reduced egress costs:
dataspoc-lens cache orders # Cache a table
dataspoc-lens cache --list # Check status (fresh/stale)
dataspoc-lens cache orders --refresh # Re-download
dataspoc-lens cache --clear # Clear all
Freshness: compares your cache timestamp against the manifest's last_extraction.
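That freshness check reduces to comparing two timestamps; a minimal sketch (ISO-8601 strings for last_extraction are an assumption):

```python
from datetime import datetime, timezone


def cache_status(cached_at: datetime, last_extraction: str) -> str:
    """'fresh' if the local cache is newer than the manifest's last extraction."""
    extracted = datetime.fromisoformat(last_extraction)
    return "fresh" if cached_at >= extracted else "stale"
```

A stale result means Pipe has extracted data since the table was cached, so a cache orders --refresh would pick up new rows.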
Commands
dataspoc-lens init # Initialize configuration
dataspoc-lens add-bucket <uri> # Register a bucket
dataspoc-lens catalog # List all tables
dataspoc-lens catalog --detail <table> # Show table schema
dataspoc-lens query "<sql>" # Execute SQL query
dataspoc-lens query "<sql>" --export f.csv # Execute and export
dataspoc-lens shell # Interactive SQL shell
dataspoc-lens ask "<question>" # Natural language query
dataspoc-lens ask "<question>" --debug # Show LLM prompt
dataspoc-lens setup-ai # Install local AI (Ollama)
dataspoc-lens notebook # Launch JupyterLab
dataspoc-lens notebook --marimo # Launch Marimo
dataspoc-lens transform list # List transform files
dataspoc-lens transform run # Run all transforms
dataspoc-lens cache <table> # Cache a table locally
dataspoc-lens cache --list # List cached tables
dataspoc-lens cache --clear # Clear cache
dataspoc-lens ml activate [key] # Activate DataSpoc ML
dataspoc-lens ml train --target col --from tbl # Train a model
dataspoc-lens ml predict --model m --from tbl # Generate predictions
dataspoc-lens ml models # List trained models
dataspoc-lens --version # Show version
Part of the DataSpoc Platform
| Product | Role |
|---|---|
| DataSpoc Pipe | Ingestion: Singer taps to Parquet in cloud buckets |
| DataSpoc Lens (this) | Virtual warehouse: SQL + Jupyter + AI over your data lake |
| DataSpoc ML | AutoML: train and deploy models from your lake |
Pipe writes. Lens reads. ML learns.
Community
- GitHub Issues -- Report bugs or request features
- Contributing -- PRs welcome. Run pytest tests/ -v before submitting.
License
Apache-2.0 -- free to use, modify, and distribute.
Download files
File details
Details for the file dataspoc_lens-0.1.1.tar.gz.
File metadata
- Download URL: dataspoc_lens-0.1.1.tar.gz
- Upload date:
- Size: 64.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d4003ebfde51268c867b8d3b876cb2f37cd767d5df53d7b32e4e5202b97ecb0 |
| MD5 | 2615b1f1abc7b1d5d79f2e6d8bf8f74b |
| BLAKE2b-256 | 38fdc51a4122ba72d4aa250e0843904dae61242d9661176eebcd1646592f7f43 |
File details
Details for the file dataspoc_lens-0.1.1-py3-none-any.whl.
File metadata
- Download URL: dataspoc_lens-0.1.1-py3-none-any.whl
- Upload date:
- Size: 32.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a30a1d687e7880a9754ed28b1e1b8b7c67e01bf95750e79f20c30cbf9f4f259b |
| MD5 | 1b71c1de3b4479839f59351c6aca9283 |
| BLAKE2b-256 | 19a09df58497301e6c2f6a57b2a1770493a584171a110c7473789f0bffdef4ea |