
DataSpoc Lens

Requires Python 3.10+. Apache-2.0 licensed.

SQL over cloud Parquet. Query your data lake from the terminal.

Why Lens?

Data teams store Parquet in S3, GCS, or Azure but still spin up heavy warehouses just to run SQL. DataSpoc Lens mounts cloud buckets as DuckDB views and gives you an interactive shell, notebooks, AI-powered queries, and local caching -- all from a single CLI. No servers, no infrastructure, no data copying.

Installation

pip install dataspoc-lens

Cloud and feature extras:

pip install dataspoc-lens[s3]       # AWS S3
pip install dataspoc-lens[gcs]      # Google Cloud Storage
pip install dataspoc-lens[azure]    # Azure Blob Storage
pip install dataspoc-lens[jupyter]  # JupyterLab integration
pip install dataspoc-lens[ai]       # AI natural language queries
pip install dataspoc-lens[all]      # Everything

Quick Start

1. Initialize and register a bucket

dataspoc-lens init
dataspoc-lens add-bucket s3://my-data-lake

Lens discovers tables automatically -- first from Pipe's .dataspoc/manifest.json, then by scanning for *.parquet files.
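
The manifest-then-scan fallback can be pictured as a small function. The manifest shape below is a guess for illustration (the real .dataspoc/manifest.json schema may differ):

```python
import json
import tempfile
from pathlib import Path

def discover_tables(root: Path) -> dict[str, str]:
    """Map table names to Parquet paths: manifest first, then a scan."""
    manifest = root / ".dataspoc" / "manifest.json"
    if manifest.exists():
        # Assumed manifest shape: {"tables": [{"name": ..., "path": ...}]}
        data = json.loads(manifest.read_text())
        return {t["name"]: t["path"] for t in data.get("tables", [])}
    # Fallback: every *.parquet file becomes a table named after its stem.
    return {p.stem: str(p) for p in root.rglob("*.parquet")}

# Demo on a throwaway directory
tmp = Path(tempfile.mkdtemp())
(tmp / "orders.parquet").touch()
print(discover_tables(tmp))  # {'orders': '/tmp/.../orders.parquet'}
```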

2. Explore the catalog

dataspoc-lens catalog
dataspoc-lens catalog --detail orders

3. Query with SQL

dataspoc-lens query "SELECT * FROM orders LIMIT 10"
dataspoc-lens query "SELECT status, COUNT(*) FROM orders GROUP BY status"

4. Launch the interactive shell

dataspoc-lens shell
lens> SELECT customer_id, SUM(total) FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
lens> .tables
lens> .schema orders
lens> .export csv /tmp/orders.csv
lens> .quit

5. Configure AI and ask questions

Before using ask, configure an LLM provider:

Option A -- Local AI (free, no API key):

dataspoc-lens setup-ai

Option B -- Cloud provider:

# Anthropic (default)
export DATASPOC_LLM_API_KEY=sk-ant-...

# OpenAI
export DATASPOC_LLM_PROVIDER=openai
export DATASPOC_LLM_API_KEY=sk-...

Then ask questions in natural language:

dataspoc-lens ask "how many orders were placed yesterday?"
dataspoc-lens ask "top 10 customers by revenue this month"
dataspoc-lens ask --debug "average order value by month"

Lens sends your table schemas and sample data to the LLM, receives SQL, executes it, and prints the results. Use --debug to see the full prompt sent to the LLM.
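
The schema-plus-samples context described above can be sketched as a plain prompt builder. This is only an illustration of the shape of such a prompt -- the actual prompt Lens sends (which --debug reveals) will differ:

```python
def build_prompt(schemas: dict[str, list[str]],
                 samples: dict[str, list[tuple]],
                 question: str) -> str:
    """Assemble the kind of context an LLM needs to write SQL (illustrative)."""
    parts = ["You write DuckDB SQL. Tables:"]
    for table, cols in schemas.items():
        parts.append(f"  {table}({', '.join(cols)})")
        for row in samples.get(table, [])[:3]:  # cap the sample rows
            parts.append(f"    sample: {row}")
    parts.append(f"Question: {question}")
    parts.append("Answer with a single SQL statement.")
    return "\n".join(parts)

print(build_prompt(
    {"orders": ["id", "status", "total"]},
    {"orders": [(1, "shipped", 19.99)]},
    "how many orders were placed yesterday?",
))
```

Note that sample rows leave your machine, so point the provider at non-sensitive data or use the local-AI option if that matters.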

6. Export results

Add --export to any query or ask command. Format is detected from the file extension:

dataspoc-lens query "SELECT * FROM orders" --export orders.csv
dataspoc-lens query "SELECT * FROM users" --export users.parquet
dataspoc-lens ask "monthly revenue" --export revenue.json
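
Extension-based detection likely reduces to a small mapping onto DuckDB's COPY formats; a sketch of that idea (the mapping and error behavior are assumptions, not Lens's code):

```python
from pathlib import Path

# Extension -> DuckDB COPY format, mirroring the detection described above.
FORMATS = {".csv": "csv", ".parquet": "parquet", ".json": "json"}

def export_clause(path: str) -> str:
    """Build the COPY statement skeleton for a given output path."""
    ext = Path(path).suffix.lower()
    if ext not in FORMATS:
        raise ValueError(f"unsupported export extension: {ext}")
    return f"COPY (<query>) TO '{path}' (FORMAT {FORMATS[ext]})"

print(export_clause("orders.csv"))
# COPY (<query>) TO 'orders.csv' (FORMAT csv)
```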

Features

Interactive Shell

SQL REPL with syntax highlighting, autocomplete, and history. Dot commands: .tables, .schema <table>, .buckets, .cache <table>, .export <format> <path>, .help, .quit.

Notebook

Launch JupyterLab or Marimo with all tables pre-mounted:

pip install dataspoc-lens[jupyter]
dataspoc-lens notebook

pip install dataspoc-lens[marimo]
dataspoc-lens notebook --marimo

SQL Transforms

Numbered .sql files in ~/.dataspoc-lens/transforms/ run in ascending filename order:

dataspoc-lens transform list
dataspoc-lens transform run

Cache

Copy tables locally for offline work and reduced egress costs:

dataspoc-lens cache orders              # Cache a table
dataspoc-lens cache --list              # Check status (fresh/stale)
dataspoc-lens cache orders --refresh    # Re-download
dataspoc-lens cache --clear             # Clear all

Freshness: compares your cache timestamp against the manifest's last_extraction.
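
That comparison boils down to one predicate: the cache is fresh iff it was taken at or after the last extraction. A sketch, assuming last_extraction is an ISO-8601 timestamp and the cache side is the file's mtime (both assumptions, not confirmed by the docs):

```python
from datetime import datetime

def is_fresh(cache_mtime: float, last_extraction_iso: str) -> bool:
    """Fresh iff the local cache copy postdates the manifest's last extraction."""
    extracted = datetime.fromisoformat(last_extraction_iso).timestamp()
    return cache_mtime >= extracted

# 1_700_000_000 is 2023-11-14T22:13:20+00:00, so a cache taken
# 100 seconds later is fresh:
print(is_fresh(1_700_000_100.0, "2023-11-14T22:13:20+00:00"))  # True
```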

Commands

dataspoc-lens init                          # Initialize configuration
dataspoc-lens add-bucket <uri>              # Register a bucket
dataspoc-lens catalog                       # List all tables
dataspoc-lens catalog --detail <table>      # Show table schema
dataspoc-lens query "<sql>"                 # Execute SQL query
dataspoc-lens query "<sql>" --export f.csv  # Execute and export
dataspoc-lens shell                         # Interactive SQL shell
dataspoc-lens ask "<question>"              # Natural language query
dataspoc-lens ask "<question>" --debug      # Show LLM prompt
dataspoc-lens setup-ai                      # Install local AI (Ollama)
dataspoc-lens notebook                      # Launch JupyterLab
dataspoc-lens notebook --marimo             # Launch Marimo
dataspoc-lens transform list                # List transform files
dataspoc-lens transform run                 # Run all transforms
dataspoc-lens cache <table>                 # Cache a table locally
dataspoc-lens cache --list                  # List cached tables
dataspoc-lens cache --clear                 # Clear cache
dataspoc-lens ml activate [key]             # Activate DataSpoc ML
dataspoc-lens ml train --target col --from tbl  # Train a model
dataspoc-lens ml predict --model m --from tbl   # Generate predictions
dataspoc-lens ml models                     # List trained models
dataspoc-lens --version                     # Show version

Part of the DataSpoc Platform

Product               | Role
DataSpoc Pipe         | Ingestion: Singer taps to Parquet in cloud buckets
DataSpoc Lens (this)  | Virtual warehouse: SQL + Jupyter + AI over your data lake
DataSpoc ML           | AutoML: train and deploy models from your lake

Pipe writes. Lens reads. ML learns.

License

Apache-2.0 -- free to use, modify, and distribute.
