Virtual warehouse — SQL over cloud Parquet via DuckDB

DataSpoc Lens

SQL over cloud Parquet. Query your data lake from the terminal.

Docs  |  Tutorial  |  Discord

Why Lens?

Data teams store Parquet in S3, GCS, or Azure but still spin up heavy warehouses just to run SQL. DataSpoc Lens mounts cloud buckets as DuckDB views and gives you an interactive shell, notebooks, AI-powered queries, and local caching -- all from a single CLI. No servers, no infrastructure, no data copying.

Highlights

  • Zero infrastructure -- DuckDB runs in-process, no server or daemon
  • Multi-cloud -- S3, GCS, and Azure Blob Storage via fsspec + httpfs
  • Interactive shell -- SQL REPL with syntax highlighting and autocomplete
  • AI Ask -- natural language to SQL using Ollama, Anthropic, or OpenAI
  • Local cache -- download once, query offline
  • Transforms -- numbered .sql files for repeatable data pipelines
  • Notebook ready -- launch JupyterLab or Marimo with tables pre-mounted
  • Export anywhere -- CSV, JSON, or Parquet with a single flag

Installation

pip install dataspoc-lens[s3]

Other install options:
# Google Cloud Storage
pip install dataspoc-lens[gcs]

# Azure Blob Storage
pip install dataspoc-lens[azure]

# JupyterLab integration
pip install dataspoc-lens[jupyter]

# AI natural language queries
pip install dataspoc-lens[ai]

# Everything
pip install dataspoc-lens[all]

Quick Start

# Initialize configuration
dataspoc-lens init

# Register a cloud bucket
dataspoc-lens add-bucket s3://my-data-lake

# Run a SQL query
dataspoc-lens query "SELECT * FROM orders LIMIT 10"

# Launch the interactive shell
dataspoc-lens shell

# Ask a question in plain English
dataspoc-lens ask "how many orders were placed yesterday?"

Features Overview

Shell

An interactive SQL REPL powered by prompt_toolkit with syntax highlighting, autocomplete for table and column names, and dot commands (.tables, .schema, .export, .help).

$ dataspoc-lens shell
lens> SELECT customer_id, count(*) FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5;
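The split between dot commands and SQL can be sketched as a small dispatch loop. This is an illustrative stand-in, not Lens's actual internals (the real shell is built on prompt_toolkit); the function and handler names here are hypothetical.

```python
# Minimal sketch of a REPL dispatcher: lines starting with "." are
# routed to dot-command handlers, everything else is treated as SQL.
def dispatch(line, commands, run_sql):
    line = line.strip()
    if line.startswith("."):
        name, *args = line[1:].split()
        handler = commands.get(name)
        if handler is None:
            return f"unknown command: .{name}"
        return handler(*args)
    return run_sql(line)

# Hypothetical handlers; a real shell would query DuckDB's catalog.
commands = {
    "tables": lambda: "orders, customers",
    "help": lambda: ".tables .schema .export .help",
}
print(dispatch(".tables", commands, lambda q: f"ran: {q}"))   # orders, customers
print(dispatch("SELECT 1", commands, lambda q: f"ran: {q}"))  # ran: SELECT 1
```

The same table drives autocomplete: the completer only needs the keys of `commands` plus the catalog's table and column names.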

Notebook

Launch JupyterLab or Marimo with all discovered tables pre-mounted as DuckDB views -- ready to query from the first cell.

dataspoc-lens notebook

AI Ask

Turn plain English into SQL. Lens sends your schema and a data sample to the LLM, gets back SQL, and executes it.

# Default: Ollama (local, no API key needed)
dataspoc-lens ask "top 10 customers by revenue this month"

# Or use a cloud provider
export DATASPOC_LLM_PROVIDER=anthropic
export DATASPOC_LLM_API_KEY=sk-...
dataspoc-lens ask "which products have declining sales?"
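Conceptually, the ask flow assembles a prompt from the schema and a few sample rows before calling the provider. A rough stdlib-only sketch, with an invented prompt format (Lens's real prompt and internals are not shown here):

```python
# Illustrative prompt assembly for natural-language-to-SQL.
# The wording and layout are assumptions, not Lens's actual prompt.
def build_prompt(question, schema, sample_rows):
    lines = ["You translate questions into DuckDB SQL.", "Schema:"]
    for table, cols in schema.items():
        lines.append(f"  {table}({', '.join(cols)})")
    lines.append("Sample rows: " + repr(sample_rows[:3]))
    lines.append(f"Question: {question}")
    lines.append("Answer with SQL only.")
    return "\n".join(lines)

prompt = build_prompt(
    "how many orders were placed yesterday?",
    {"orders": ["order_id", "customer_id", "placed_at"]},
    [(1, 42, "2024-05-01")],
)
print(prompt)
```

Sending the schema and a small sample (rather than the data itself) keeps the request cheap and limits what leaves your machine when a cloud provider is configured.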

Cache

Download tables to ~/.dataspoc-lens/cache/ for offline work. Cache freshness is tracked automatically and views switch between remote and local transparently.

# Inside the shell
lens> .cache orders
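One common way to track cache freshness is by file modification time; this sketch shows the idea under that assumption (Lens's actual freshness logic is not documented here, and `is_fresh` is a hypothetical name):

```python
import os
import tempfile
import time

# A cached file counts as fresh if it is younger than max_age_s.
def is_fresh(path, max_age_s):
    if not os.path.exists(path):
        return False
    return (time.time() - os.path.getmtime(path)) < max_age_s

with tempfile.NamedTemporaryFile(delete=False) as f:
    cached = f.name

print(is_fresh(cached, 3600))  # just written -> True
os.utime(cached, (0, 0))       # backdate mtime to the epoch
print(is_fresh(cached, 3600))  # far too old -> False
os.unlink(cached)
```

A check like this lets a view transparently fall back to the remote Parquet when the local copy goes stale.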

Transforms

Numbered .sql files in ~/.dataspoc-lens/transforms/ that run in order, writing results to the /curated/ prefix.

transforms/
  001_clean_users.sql
  002_aggregate_orders.sql
  003_build_summary.sql

dataspoc-lens transform run
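The ordering convention can be sketched with a numeric sort on the filename prefix. This is a stdlib illustration of the idea, not Lens's implementation (the `ordered_transforms` helper is hypothetical); note that parsing the prefix as an integer keeps `010_` after `002_` even without consistent zero-padding:

```python
import re
import tempfile
from pathlib import Path

# Discover numbered .sql files and return them in run order.
def ordered_transforms(folder):
    files = [p for p in Path(folder).glob("*.sql") if re.match(r"\d+_", p.name)]
    return sorted(files, key=lambda p: int(p.name.split("_")[0]))

with tempfile.TemporaryDirectory() as d:
    for name in ["002_aggregate_orders.sql", "010_build_summary.sql",
                 "001_clean_users.sql"]:
        (Path(d) / name).write_text("-- sql here")
    print([p.name for p in ordered_transforms(d)])
    # ['001_clean_users.sql', '002_aggregate_orders.sql', '010_build_summary.sql']
```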

Export

Add --export to any query or ask command to save results as CSV, JSON, or Parquet.

dataspoc-lens query "SELECT * FROM orders" --export csv -o orders.csv
dataspoc-lens ask "monthly revenue" --export parquet -o revenue.parquet
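A format flag like `--export` typically maps to per-format writers. A minimal stdlib sketch of that dispatch for CSV and JSON (the `export_rows` name is illustrative; Lens's Parquet path would go through DuckDB rather than the standard library):

```python
import csv
import io
import json

# Serialize query results (rows + column names) in the requested format.
def export_rows(rows, columns, fmt):
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(columns)
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        return json.dumps([dict(zip(columns, r)) for r in rows])
    raise ValueError(f"unsupported format: {fmt}")

print(export_rows([(1, "a")], ["id", "name"], "json"))
# [{"id": 1, "name": "a"}]
```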

Access Control

DataSpoc delegates all access control to your cloud provider's IAM. The recommended pattern is one bucket per permission level:

Bucket                   Audience
s3://company-public      All employees
s3://company-finance     Finance team
s3://company-executive   C-level only

Each user's cloud credentials determine which buckets they can see. If a user lacks IAM permission, add-bucket fails with "Access Denied" and no data is exposed. Lens needs only read access.

Part of the DataSpoc Platform

Layer           Role
DataSpoc Pipe   Ingest data from APIs into cloud storage as Parquet
DataSpoc Lens   Query cloud Parquet with SQL, shell, notebooks, and AI
DataSpoc ML     Machine learning on your data lake (commercial)

Pipe writes. Lens reads. ML learns.

License

Apache-2.0

