Virtual warehouse — SQL over cloud Parquet via DuckDB
DataSpoc Lens
SQL over cloud Parquet. Query your data lake from the terminal.
Why Lens?
Data teams store Parquet in S3, GCS, or Azure but still spin up heavy warehouses just to run SQL. DataSpoc Lens mounts cloud buckets as DuckDB views and gives you an interactive shell, notebooks, AI-powered queries, and local caching -- all from a single CLI. No servers, no infrastructure, no data copying.
Highlights
- Zero infrastructure -- DuckDB runs in-process, no server or daemon
- Multi-cloud -- S3, GCS, and Azure Blob Storage via fsspec + httpfs
- Interactive shell -- SQL REPL with syntax highlighting and autocomplete
- AI Ask -- natural language to SQL using Ollama, Anthropic, or OpenAI
- Local cache -- download once, query offline
- Transforms -- numbered .sql files for repeatable data pipelines
- Notebook ready -- launch JupyterLab or Marimo with tables pre-mounted
- Export anywhere -- CSV, JSON, or Parquet with a single flag
Installation
pip install dataspoc-lens[s3]
Other install options
# Google Cloud Storage
pip install dataspoc-lens[gcs]
# Azure Blob Storage
pip install dataspoc-lens[azure]
# JupyterLab integration
pip install dataspoc-lens[jupyter]
# AI natural language queries
pip install dataspoc-lens[ai]
# Everything
pip install dataspoc-lens[all]
Quick Start
# Initialize configuration
dataspoc-lens init
# Register a cloud bucket
dataspoc-lens add-bucket s3://my-data-lake
# Run a SQL query
dataspoc-lens query "SELECT * FROM orders LIMIT 10"
# Launch the interactive shell
dataspoc-lens shell
# Ask a question in plain English
dataspoc-lens ask "how many orders were placed yesterday?"
Features Overview
Shell
An interactive SQL REPL powered by prompt_toolkit with syntax highlighting, autocomplete for table and column names, and dot commands (.tables, .schema, .export, .help).
$ dataspoc-lens shell
lens> SELECT customer_id, count(*) FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5;
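A hypothetical sketch of how a shell like this might separate dot commands from SQL. The dispatcher and handlers below are invented for illustration (only the command names come from the list above), not the actual Lens internals:

```python
# Lines starting with "." are commands; everything else is SQL.
def dispatch(line, handlers, run_sql):
    line = line.strip()
    if line.startswith("."):
        name, _, arg = line[1:].partition(" ")
        handler = handlers.get(name)
        if handler is None:
            return f"unknown command: .{name}"
        return handler(arg)
    return run_sql(line)

# Stand-in handlers; a real shell would query DuckDB's catalog here.
handlers = {
    "tables": lambda arg: ["orders", "users"],
    "help": lambda arg: "commands: .tables .schema .export .help",
}

print(dispatch(".tables", handlers, run_sql=lambda sql: f"ran: {sql}"))
```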
Notebook
Launch JupyterLab or Marimo with all discovered tables pre-mounted as DuckDB views -- ready to query from the first cell.
dataspoc-lens notebook
AI Ask
Turn plain English into SQL. Lens sends your schema and a data sample to the LLM, gets back SQL, and executes it.
# Default: Ollama (local, no API key needed)
dataspoc-lens ask "top 10 customers by revenue this month"
# Or use a cloud provider
export DATASPOC_LLM_PROVIDER=anthropic
export DATASPOC_LLM_API_KEY=sk-...
dataspoc-lens ask "which products have declining sales?"
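The schema-plus-sample flow described above could be assembled roughly like the sketch below. The prompt format is invented for illustration; it only shows the shape of the exchange (schema and sample rows in, SQL out):

```python
def build_prompt(question, schema, sample_rows):
    # schema: {table_name: [column, ...]}; sample_rows: a few example tuples.
    lines = ["You translate questions into DuckDB SQL.", "", "Schema:"]
    for table, cols in schema.items():
        lines.append(f"  {table}({', '.join(cols)})")
    lines.append("Sample rows:")
    for row in sample_rows:
        lines.append(f"  {row}")
    lines.append(f"Question: {question}")
    lines.append("Answer with SQL only.")
    return "\n".join(lines)

prompt = build_prompt(
    "how many orders were placed yesterday?",
    {"orders": ["order_id", "customer_id", "placed_at"]},
    [(1, 42, "2024-05-01")],
)
print(prompt)
```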
Cache
Download tables to ~/.dataspoc-lens/cache/ for offline work. Cache freshness is tracked automatically and views switch between remote and local transparently.
# Inside the shell
lens> .cache orders
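A hypothetical sketch of the freshness-tracking idea: prefer the cached file when it is newer than a TTL, otherwise fall back to the remote path. The function names, TTL policy, and bucket path are invented stand-ins, not the actual Lens logic:

```python
import os
import time

def is_fresh(cache_path, ttl_seconds=3600):
    # A cached copy counts as fresh if it exists and is younger than the TTL.
    if not os.path.exists(cache_path):
        return False
    age = time.time() - os.path.getmtime(cache_path)
    return age < ttl_seconds

def resolve_source(table, cache_dir="~/.dataspoc-lens/cache"):
    # Switch a view's backing source between local cache and remote storage.
    path = os.path.join(os.path.expanduser(cache_dir), f"{table}.parquet")
    return path if is_fresh(path) else f"s3://my-data-lake/{table}/*.parquet"
```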
Transforms
Numbered .sql files in ~/.dataspoc-lens/transforms/ that run in order, writing results to the /curated/ prefix.
transforms/
001_clean_users.sql
002_aggregate_orders.sql
003_build_summary.sql
dataspoc-lens transform run
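The numbered-file convention amounts to running the .sql files in lexicographic order. A self-contained sketch, using sqlite3 as a stand-in for DuckDB so it needs only the standard library (the file names mirror the listing above):

```python
import pathlib
import sqlite3
import tempfile

# Create a throwaway transforms directory with out-of-order writes,
# to show that execution order comes from the numeric prefix.
tdir = pathlib.Path(tempfile.mkdtemp())
(tdir / "002_aggregate_orders.sql").write_text("CREATE TABLE agg AS SELECT 2 AS n;")
(tdir / "001_clean_users.sql").write_text("CREATE TABLE users AS SELECT 1 AS n;")

con = sqlite3.connect(":memory:")
ran = []
for sql_file in sorted(tdir.glob("*.sql")):  # 001 runs before 002
    con.executescript(sql_file.read_text())
    ran.append(sql_file.name)

print(ran)
```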
Export
Add --export to any query or ask command to save results as CSV, JSON, or Parquet.
dataspoc-lens query "SELECT * FROM orders" --export csv -o orders.csv
dataspoc-lens ask "monthly revenue" --export parquet -o revenue.parquet
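For the CSV path, the export step reduces to writing a header row followed by the result rows. A sketch with the stdlib csv module; the column names and rows are invented stand-ins for a real DuckDB result set:

```python
import csv
import io

def export_csv(columns, rows):
    # Header first, then each result row, as --export csv would emit.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()

out = export_csv(["order_id", "total"], [(1, 9.5), (2, 3.0)])
print(out)
```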
Access Control
DataSpoc delegates all access control to your cloud provider's IAM. The recommended pattern is one bucket per permission level:
| Bucket | Audience |
|---|---|
| s3://company-public | All employees |
| s3://company-finance | Finance team |
| s3://company-executive | C-level only |
Each user's cloud credentials determine which buckets they can see. If a user lacks IAM permission, add-bucket fails with "Access Denied" and no data is exposed. Lens needs only read access.
Part of the DataSpoc Platform
| Layer | Role |
|---|---|
| DataSpoc Pipe | Ingest data from APIs into cloud storage as Parquet |
| DataSpoc Lens | Query cloud Parquet with SQL, shell, notebooks, and AI |
| DataSpoc ML | Machine learning on your data lake (commercial) |
Pipe writes. Lens reads. ML learns.
Community
- Discord -- Join the conversation for questions and discussion
- GitHub Issues -- Report bugs or request features
- Contributing -- PRs welcome. Run pytest tests/ -v before submitting.
License