
โ„๏ธ๐Ÿฆ† SnowDuck

CI/CD Pipeline | Python 3.11+ | License: MIT | Code style: ruff

Run Snowflake SQL locally, powered by DuckDB

SnowDuck is a lightweight, in-memory SQL engine that emulates Snowflake's behavior for development and testing. Write and test Snowflake SQL locally without cloud access or costs.

Recent Improvements:

  • ✅ Zero linting errors - Professional, clean codebase
  • ✅ Modern CI/CD - PyPI Trusted Publishers with package attestations
  • ✅ 128 tests passing in <2s - Fast, comprehensive test suite
  • ✅ Production-ready for development & testing

Why SnowDuck?

  • 🚀 Fast Development - Test SQL queries instantly without waiting for cloud connections
  • 💰 Zero Cloud Costs - Develop and test locally without Snowflake compute charges
  • 🧪 Easy Testing - Mock Snowflake databases for unit tests and CI/CD pipelines
  • ⚡ Lightning Fast - Powered by DuckDB's in-memory execution engine
  • 🔌 Drop-in Compatible - Uses Snowflake's connector interface - just patch and go

Features

Feature              Support
-------------------  -----------------------------------------------
DDL Operations       ✅ CREATE/DROP DATABASE, SCHEMA, TABLE
String Functions     ✅ UPPER, LOWER, CONCAT, LENGTH, SUBSTRING
Aggregate Functions  ✅ COUNT, SUM, AVG, MIN, MAX, GROUP BY
Window Functions     ✅ ROW_NUMBER, RANK, DENSE_RANK, PARTITION BY
Session Variables    ✅ SET/SELECT $variable syntax
Information Schema   ✅ Query metadata (databases, tables, columns)
Advanced SQL         ✅ CTEs, JOINs, subqueries, CASE statements

Note: SnowDuck is experimental. Use for development/testing only, not production workloads.

Quick Start

Installation

# Using uv (recommended - 10-100x faster)
uv pip install snowduck

# Or using pip
pip install snowduck

Usage

import snowflake.connector
from snowduck import start_patch_snowflake

# Patch the Snowflake connector to use DuckDB
start_patch_snowflake()

# Use Snowflake connector as normal - it's now backed by DuckDB!
with snowflake.connector.connect() as conn:
    cursor = conn.cursor()
    
    # Create a database
    cursor.execute("CREATE DATABASE my_database")
    cursor.execute("USE DATABASE my_database")
    
    # Create a table
    cursor.execute("""
        CREATE TABLE employees (
            id INTEGER,
            name VARCHAR,
            salary INTEGER
        )
    """)
    
    # Insert data
    cursor.execute("""
        INSERT INTO employees VALUES
        (1, 'Alice', 95000),
        (2, 'Bob', 75000),
        (3, 'Carol', 105000)
    """)
    
    # Query with Snowflake SQL
    cursor.execute("""
        SELECT 
            name,
            salary,
            RANK() OVER (ORDER BY salary DESC) as rank
        FROM employees
    """)
    
    for row in cursor.fetchall():
        print(f"{row[0]}: ${row[1]:,} (Rank: {row[2]})")

Output:

Carol: $105,000 (Rank: 1)
Alice: $95,000 (Rank: 2)
Bob: $75,000 (Rank: 3)

Data Persistence Options

SnowDuck supports two storage modes, plus a reset option:

In-Memory (Default): Fast, isolated, data lost on exit

start_patch_snowflake()  # or start_patch_snowflake(db_file=':memory:')

File-Based: Persistent, data survives restarts, perfect for testing

# Data persists to file and survives program restarts
start_patch_snowflake(db_file='test_data.duckdb')

with snowflake.connector.connect() as conn:
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE employees (...)")
    # Data saved to test_data.duckdb

# Later - data still exists!
with snowflake.connector.connect() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM employees")  # ✅ Works!

Fresh Start: reset=True clears existing data (great for notebooks!)

# Deletes existing database file for a clean slate
start_patch_snowflake(db_file='demo.duckdb', reset=True)

Use file-based storage for:

  • 🧪 Test fixtures that persist across test runs
  • 📊 Shared test data across multiple test files
  • 🔄 CI/CD pipelines with reproducible datasets
  • 🚀 Development with sample data you want to keep
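The reset option described above boils down to deleting the on-disk database file before connecting. A minimal stdlib sketch of that behavior (the reset_db_file helper is hypothetical; the real start_patch_snowflake also opens the DuckDB connection afterwards):

```python
import os
import tempfile

def reset_db_file(db_file: str, reset: bool = False) -> None:
    """Delete an existing on-disk database when a clean slate is requested."""
    if reset and db_file != ":memory:" and os.path.exists(db_file):
        os.remove(db_file)

# Demo with a throwaway file standing in for a .duckdb database
path = os.path.join(tempfile.mkdtemp(), "demo.duckdb")
open(path, "wb").close()          # pretend a previous run left data behind
reset_db_file(path, reset=True)   # clean slate
print(os.path.exists(path))       # → False
```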

Data Seeding Made Easy

SnowDuck includes seed_table() for effortless test data creation:

from snowduck import seed_table
import pandas as pd

with snowflake.connector.connect() as conn:
    # From dict of lists (easiest!)
    seed_table(conn, 'customers', {
        'id': [1, 2, 3],
        'name': ['Acme Corp', 'TechStart', 'DataCo'],
        'revenue': [1000000, 50000, 800000]
    })
    
    # From pandas DataFrame
    df = pd.read_csv('test_data.csv')
    seed_table(conn, 'orders', df)
    
    # From list of dicts
    seed_table(conn, 'products', [
        {'id': 1, 'name': 'Widget', 'price': 9.99},
        {'id': 2, 'name': 'Gadget', 'price': 19.99}
    ])

Why seed_table() is awesome:

  • ✅ One line to create and populate tables
  • ✅ Automatic data type inference
  • ✅ Handles NULL values, timestamps, special characters
  • ✅ Drops existing table by default (perfect for test fixtures)
  • ✅ Works with dicts, DataFrames, or list of dicts
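Automatic type inference from a dict of lists can be pictured roughly like this (an illustrative sketch, not SnowDuck's actual code; the real seed_table also handles DataFrames, NULL-only columns, and timestamps):

```python
def infer_create_table(name: str, data: dict[str, list]) -> str:
    """Build a CREATE TABLE statement by inspecting the first non-None
    value in each column (a simplified stand-in for real inference)."""
    type_map = {bool: "BOOLEAN", int: "INTEGER", float: "DOUBLE", str: "VARCHAR"}

    def col_type(values):
        first = next((v for v in values if v is not None), None)
        return type_map.get(type(first), "VARCHAR")

    cols = ", ".join(f"{col} {col_type(vals)}" for col, vals in data.items())
    return f"CREATE TABLE {name} ({cols})"

print(infer_create_table("customers", {
    "id": [1, 2, 3],
    "name": ["Acme Corp", "TechStart", "DataCo"],
    "revenue": [1000000.0, 50000.0, 800000.0],
}))
# → CREATE TABLE customers (id INTEGER, name VARCHAR, revenue DOUBLE)
```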

More Examples

Check out examples/notebook.ipynb for comprehensive examples including:

  • Database and schema management
  • Table operations and data manipulation
  • String and aggregate functions
  • Window functions and analytics
  • Session variables
  • Information schema queries
  • Complex CTEs and subqueries

Use Cases

1. Local Development

# Develop and test Snowflake SQL without cloud access
from snowduck import start_patch_snowflake
start_patch_snowflake()

# Your existing Snowflake code works unchanged!
import snowflake.connector
conn = snowflake.connector.connect()

2. Unit Testing

Recommended: Use the @mock_snowflake decorator for clean, isolated tests

import pytest
import snowflake.connector
from snowduck import mock_snowflake, seed_table

# Simple unit test - automatic setup and cleanup
@mock_snowflake
def test_query_execution():
    """Each test gets a fresh, isolated in-memory database."""
    conn = snowflake.connector.connect()
    cursor = conn.cursor()
    
    cursor.execute("CREATE TABLE users (id INT, name VARCHAR)")
    cursor.execute("INSERT INTO users VALUES (1, 'Alice')")
    cursor.execute("SELECT * FROM users")
    
    assert cursor.fetchone() == (1, 'Alice')
    # ✅ Automatic cleanup - no side effects!

# Test with seed data
@mock_snowflake
def test_with_seed_data():
    """Use seed_table() for easy test data creation."""
    conn = snowflake.connector.connect()
    
    # One line to create and populate table!
    seed_table(conn, 'customers', {
        'id': [1, 2, 3],
        'name': ['Acme Corp', 'TechStart', 'DataCo'],
        'tier': ['enterprise', 'startup', 'enterprise']
    })
    
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM customers WHERE tier = 'enterprise'")
    assert cursor.fetchone()[0] == 2

For shared fixtures across tests, use conftest.py:

# conftest.py
import pytest
from snowduck import patch_snowflake
import snowflake.connector

@pytest.fixture
def conn():
    """Per-test isolation with automatic cleanup."""
    with patch_snowflake():
        conn = snowflake.connector.connect()
        yield conn
        # ✅ Automatic cleanup on exit

def test_my_feature(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    assert cursor.fetchone()[0] == 1

Testing Patterns:

  • ✅ Simple tests: Use @mock_snowflake decorator
  • ✅ Shared setup: Use fixture with patch_snowflake() context manager
  • ✅ Isolated tests: In-memory (default) - fresh DB per test
  • ✅ Persistent data: File-based - start_patch_snowflake(db_file='test.duckdb')

3. CI/CD Integration

# .github/workflows/test.yml
- name: Test SQL transformations
  run: |
    pip install snowduck pytest
    pytest tests/  # Uses SnowDuck instead of real Snowflake

4. REST API Server

# Install with server extras
uv pip install snowduck[server]

# Start the server
just serve
# Or: uvicorn snowduck.server:app --reload

# Server runs at http://localhost:8000
# - Execute SQL queries via REST API
# - Get results in Arrow IPC format
# - Multi-session support with session management
# - Compatible with Snowflake REST API clients

Architecture

SnowDuck works by:

  1. Patching - Intercepts Snowflake connector calls via start_patch_snowflake()
  2. Translating - Converts Snowflake SQL dialect to DuckDB-compatible SQL
  3. Executing - Runs queries in DuckDB's fast in-memory engine
  4. Emulating - Mimics Snowflake's information schema and metadata

┌─────────────────────┐
│  Your Application   │
│  (Snowflake code)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  SnowDuck Patch     │  ← Intercepts connector calls
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  SQL Translator     │  ← Snowflake → DuckDB dialect
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   DuckDB Engine     │  ← Fast in-memory execution
└─────────────────────┘
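Step 1, patching, is ordinary Python monkey-patching: the connector's connect function is swapped for a factory that returns a locally backed connection. The mechanism can be sketched with stdlib tools against a stand-in module (all names below are illustrative, not SnowDuck internals):

```python
import types
from unittest import mock

def cloud_connect(**kwargs):
    # Stand-in for the real snowflake.connector.connect
    raise RuntimeError("would need real Snowflake credentials")

# Stand-in module exposing a connect() entry point, like snowflake.connector.
connector = types.SimpleNamespace(connect=cloud_connect)

class LocalConnection:
    """Tiny stand-in for a DuckDB-backed connection."""
    def cursor(self):
        return self

    def execute(self, sql):
        self.last_sql = sql
        return self

def local_connect(**kwargs):
    # Ignore Snowflake credentials entirely and return the local engine.
    return LocalConnection()

# The "patch" step: swap connect for the local factory while active.
with mock.patch.object(connector, "connect", local_connect):
    conn = connector.connect(account="ignored", user="ignored")
    conn.cursor().execute("SELECT 1")
    print(type(conn).__name__)  # → LocalConnection
```

Using a context manager (as mock.patch.object does here) is also why patch_snowflake() can guarantee the original connector is restored on exit.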

Development

Prerequisites: Python 3.11+, uv, just

# Clone and setup
git clone https://github.com/hupe1980/snowduck.git
cd snowduck
uv sync

# Run tests (128 tests, <2s)
just test

# Run all quality checks
just check

# See all commands
just --list

Testing Best Practices

When writing tests with SnowDuck:

  1. Use @mock_snowflake for unit tests - Clean, automatic cleanup

    from snowduck import mock_snowflake
    
    @mock_snowflake
    def test_my_feature():
        conn = snowflake.connector.connect()
        # Test code here - fresh DB, auto cleanup!
    
  2. Use seed_table() for test data - One-line table creation

    seed_table(conn, 'test_table', {'id': [1, 2], 'name': ['a', 'b']})
    
  3. Choose the right storage mode:

    • In-memory (default): Fast, isolated tests
    • File-based: Shared fixtures, persistent data
    # Isolated: Each test gets fresh DB
    @mock_snowflake  # Uses in-memory by default
    def test_isolated(): ...
    
    # Shared: Data persists across tests
    @pytest.fixture(scope="session")
    def shared_data():
        start_patch_snowflake(db_file='fixtures.duckdb')
    
  4. Avoid anti-patterns:

    • โŒ Don't mix start_patch_snowflake() with @mock_snowflake
    • โŒ Don't forget cleanup with manual start_patch_snowflake()
    • โœ… Use decorators or context managers for automatic cleanup

Project Status

  • ✅ 128 tests (100% passing, <2s execution)
  • ✅ Zero linting errors - Professional, clean codebase
  • ✅ 90%+ Snowflake compatibility - DDL, DML, functions, CTEs, window functions
  • ✅ Modern CI/CD - PyPI trusted publishers with package attestations
  • ✅ Production-ready for development & testing use cases
  • 🚧 Experimental - Not for production Snowflake replacement

What's Next:

  • Type safety improvements (mypy strict mode)
  • Performance monitoring and benchmarks
  • Enhanced error messages with helpful suggestions
  • API documentation with Sphinx

Roadmap

  • Additional Snowflake functions (JSON, ARRAY, etc.)
  • Stored procedure emulation
  • External table support
  • Enhanced security features
  • Performance benchmarks vs. Snowflake

Contributing

Contributions welcome! We'd love help with:

  • ๐Ÿ› Bug reports and fixes
  • โœจ New Snowflake function implementations
  • ๐Ÿ“š Documentation improvements
  • ๐Ÿงช Additional test coverage

Development Setup:

git clone https://github.com/hupe1980/snowduck.git
cd snowduck
uv sync
just test  # Run 128 tests in <2s

Code Quality:

  • Zero linting errors (ruff)
  • Comprehensive test coverage
  • Modern CI/CD with package attestations

License

MIT License - see LICENSE for details.


**Built with ❄️ and 🦆**

Release Files (snowduck 0.0.3)

Source distribution: snowduck-0.0.3.tar.gz (179.2 kB)

  • SHA256: 3621f642f5ed94a73955d3223be46178ef1ac7f0b1c6259f52dcf4ea5dcd082f
  • MD5: 050ab82f08e4e7153afb59a026a6effb
  • BLAKE2b-256: a015de532f4cece74a309766dbdda3661a3b05591a37b59123241d11ce522185
  • Uploaded via twine/6.1.0 (CPython/3.13.7) using PyPI Trusted Publishing
  • Provenance: attestation from release.yml on hupe1980/snowduck

Built distribution: snowduck-0.0.3-py3-none-any.whl (Python 3, 48.4 kB)

  • SHA256: 9520b46d68f998da9c5f29199c09a9d845db0c8ebf9a443cfaca8f62ac1ec4e2
  • MD5: 670ee391599612da0e16e5ecb2731cef
  • BLAKE2b-256: 8466308d3aad4c7cdf6d44641f01138a834e5579591140352c80398ffceb9c59
  • Uploaded via twine/6.1.0 (CPython/3.13.7) using PyPI Trusted Publishing
  • Provenance: attestation from release.yml on hupe1980/snowduck

Note: Attestation values reflect the state when the release was signed and may no longer be current.
