Skip to main content

SQL to Redis command translation utility

Project description

sql-redis

A proof-of-concept SQL-to-Redis translator that converts SQL SELECT statements into Redis FT.SEARCH and FT.AGGREGATE commands.

Status

This is an early POC demonstrating feasibility, not a production-ready library. The goal is to explore design decisions and validate the approach before committing to a full implementation.

Quick Example

from redis import Redis
from sql_redis import Translator
from sql_redis.schema import SchemaRegistry
from sql_redis.executor import Executor

client = Redis()
registry = SchemaRegistry(client)
registry.load_all()  # Loads index schemas from Redis

executor = Executor(client, registry)

# Simple query
result = executor.execute("""
    SELECT title, price 
    FROM products 
    WHERE category = 'electronics' AND price < 500
    ORDER BY price ASC
    LIMIT 10
""")

for row in result.rows:
    print(row["title"], row["price"])

# Vector search with params
result = executor.execute("""
    SELECT title, vector_distance(embedding, :vec) AS score
    FROM products
    LIMIT 5
""", params={"vec": vector_bytes})

Design Decisions

Why SQL instead of a pandas-like Python DSL?

We considered several interface options:

Approach Example Trade-offs
SQL SELECT * FROM products WHERE price > 100 Universal, well-understood, tooling exists
Pandas-like df[df.price > 100] Pythonic but limited to Python, no standard
Builder pattern query.select("*").where(price__gt=100) Type-safe but verbose, learning curve

We chose SQL because:

  1. Universality — SQL is the lingua franca of data. Developers, analysts, and tools all speak it.
  2. No new DSL to learn — Users already know SQL. A pandas-like API requires learning our specific dialect.
  3. Tooling compatibility — SQL strings can be generated by ORMs, query builders, or AI assistants.
  4. Clear mapping — SQL semantics map reasonably well to RediSearch operations (SELECT→LOAD, WHERE→filter, GROUP BY→GROUPBY).

The downside is losing Python's type checking and IDE support, but for a query interface, the universality trade-off is worth it.

Why sqlglot instead of writing a custom parser?

Options considered:

  • Custom parser (regex, hand-rolled recursive descent)
  • PLY/Lark (parser generators)
  • sqlglot (production SQL parser)
  • sqlparse (tokenizer, not a full parser)

We chose sqlglot because:

  1. Battle-tested — Used in production by companies like Tobiko (SQLMesh). Handles edge cases we'd miss.
  2. Full AST — Provides a complete abstract syntax tree, not just tokens. We can traverse and analyze queries properly.
  3. Dialect support — Handles SQL variations. Users can write MySQL-style or PostgreSQL-style queries.
  4. Active maintenance — Regular releases, responsive maintainers, good documentation.

The alternative was writing a custom parser, which would be error-prone and time-consuming for a POC. sqlglot lets us focus on the translation logic rather than parsing edge cases.

Why schema-aware translation?

Redis field types determine query syntax:

Field Type Redis Syntax Example
TEXT @field:term @title:laptop
NUMERIC @field:[min max] @price:[100 500]
TAG @field:{value} @category:{books}

Without schema knowledge, we can't translate category = 'books' correctly — it could be @category:books (TEXT search) or @category:{books} (TAG exact match).

Our approach: The SchemaRegistry fetches index schemas via FT.INFO at startup. The translator uses this to generate correct syntax per field type.

This adds a Redis round-trip at initialization but ensures correct query generation.

Architecture: Why this layered design?

SQL String
    ↓
┌─────────────────┐
│   SQLParser     │  Parse SQL → ParsedQuery dataclass
└────────┬────────┘
         ↓
┌─────────────────┐
│ SchemaRegistry  │  Load field types from Redis
└────────┬────────┘
         ↓
┌─────────────────┐
│    Analyzer     │  Classify conditions by field type
└────────┬────────┘
         ↓
┌─────────────────┐
│  QueryBuilder   │  Generate RediSearch syntax per type
└────────┬────────┘
         ↓
┌─────────────────┐
│   Translator    │  Orchestrate pipeline, build command
└────────┬────────┘
         ↓
┌─────────────────┐
│    Executor     │  Execute command, parse results
└────────┬────────┘
         ↓
QueryResult(rows, count)

Why separate components?

  1. Testability — Each layer has focused unit tests. 100% coverage is achievable because responsibilities are clear.
  2. Single responsibility — Parser doesn't know about Redis. QueryBuilder doesn't know about SQL. Changes are localized.
  3. Extensibility — Adding a new field type (e.g., GEO) means updating Analyzer and QueryBuilder, not rewriting everything.

Why not a single monolithic translator?

Early prototypes combined parsing and translation. This led to:

  • Tests that required Redis connections for simple SQL parsing tests
  • Difficulty testing edge cases in isolation
  • Tangled code that was hard to modify

The layered approach emerged from TDD — writing tests first revealed natural boundaries.

What's Implemented

  • Basic SELECT with field selection
  • WHERE with TEXT, NUMERIC, TAG field types
  • Comparison operators: =, !=, <, <=, >, >=, BETWEEN, IN
  • Boolean operators: AND, OR
  • Aggregations: COUNT, SUM, AVG, MIN, MAX
  • GROUP BY with multiple aggregations
  • ORDER BY with ASC/DESC
  • LIMIT and OFFSET pagination
  • Computed fields: price * 0.9 AS discounted
  • Vector KNN search: vector_distance(field, :param)
  • Hybrid search (filters + vector)
  • Full-text search: LIKE 'prefix%' (prefix), fulltext(field, 'terms') function

What's Not Implemented (Yet...)

  • JOINs (Redis doesn't support cross-index joins)
  • Subqueries
  • HAVING clause
  • DISTINCT
  • GEO field queries
  • Index creation from SQL (CREATE INDEX)

Development

# Install dependencies
uv sync --all-extras

# Run tests (requires Docker for testcontainers)
uv run pytest

# Run with coverage
uv run pytest --cov=sql_redis --cov-report=html

Testing Philosophy

This project uses strict TDD with 100% test coverage as a hard requirement. The approach:

  1. Write failing tests first — Define expected behavior before implementation
  2. One test at a time — Implement just enough to pass each test
  3. No untestable code — If we can't test it, we don't write it
  4. Integration tests mirror raw Redistest_sql_queries.py verifies SQL produces same results as equivalent FT.AGGREGATE commands in test_redis_queries.py

Coverage is enforced in CI. Pragmas (# pragma: no cover) are forbidden — if code can't be tested, it shouldn't exist.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sql_redis-0.1.1.tar.gz (103.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sql_redis-0.1.1-py3-none-any.whl (18.7 kB view details)

Uploaded Python 3

File details

Details for the file sql_redis-0.1.1.tar.gz.

File metadata

  • Download URL: sql_redis-0.1.1.tar.gz
  • Upload date:
  • Size: 103.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for sql_redis-0.1.1.tar.gz
Algorithm Hash digest
SHA256 1b763bd33e8963811a8c3d191506d5572f6584bfa5bbfa9c8af09a51f07baf02
MD5 effa5632551638d9742599b8a47d9e66
BLAKE2b-256 edef9ef69125be3b8a9906010f4bfd84d3b12fce86d9ecc9ed18443ff5fa9af6

See more details on using hashes here.

File details

Details for the file sql_redis-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: sql_redis-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 18.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for sql_redis-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8369e8c61990b0f9aa5ad1a9d4b03060f770af5a7b856b84e88e819efcacb1ed
MD5 c2db44452a495da80501796fb4d32cd9
BLAKE2b-256 04cfc9e13d253acb3c08dc9113dc3e75962ebb69584d6286b931f364dfb9225d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page