SQL to Redis command translation utility

sql-redis

A proof-of-concept SQL-to-Redis translator that converts SQL SELECT statements into Redis FT.SEARCH and FT.AGGREGATE commands.

Status

This is an early POC demonstrating feasibility, not a production-ready library. The goal is to explore design decisions and validate the approach before committing to a full implementation.

Quick Example

from redis import Redis
from sql_redis import Translator
from sql_redis.schema import SchemaRegistry
from sql_redis.executor import Executor

client = Redis()
registry = SchemaRegistry(client)
registry.load_all()  # Loads index schemas from Redis

executor = Executor(client, registry)

# Simple query
result = executor.execute("""
    SELECT title, price
    FROM products
    WHERE category = 'electronics' AND price < 500
    ORDER BY price ASC
    LIMIT 10
""")

for row in result.rows:
    print(row["title"], row["price"])

# Vector search with params
# (vector_bytes must already hold the query embedding serialized as bytes)
result = executor.execute("""
    SELECT title, vector_distance(embedding, :vec) AS score
    FROM products
    LIMIT 5
""", params={"vec": vector_bytes})

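The `vector_bytes` value in the example above is not defined there. One common way to produce such a value, assuming the index stores FLOAT32 vectors (an assumption; check your index definition), is to pack the embedding as little-endian 32-bit floats. The helper name `to_float32_bytes` is hypothetical and not part of sql-redis:

```python
import struct


def to_float32_bytes(values):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


# A toy 4-dimensional embedding; real embeddings come from a model.
vector_bytes = to_float32_bytes([0.1, 0.2, 0.3, 0.4])
print(len(vector_bytes))  # 4 floats * 4 bytes each
```

The vector length must match the dimension declared on the index's vector field.
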
Design Decisions

Why SQL instead of a pandas-like Python DSL?

We considered several interface options:

| Approach | Example | Trade-offs |
| --- | --- | --- |
| SQL | `SELECT * FROM products WHERE price > 100` | Universal, well-understood, tooling exists |
| Pandas-like | `df[df.price > 100]` | Pythonic but limited to Python, no standard |
| Builder pattern | `query.select("*").where(price__gt=100)` | Type-safe but verbose, learning curve |

We chose SQL because:

Universality — SQL is the lingua franca of data. Developers, analysts, and tools all speak it.
No new DSL to learn — Users already know SQL. A pandas-like API requires learning our specific dialect.
Tooling compatibility — SQL strings can be generated by ORMs, query builders, or AI assistants.
Clear mapping — SQL semantics map reasonably well to RediSearch operations (SELECT→LOAD, WHERE→filter, GROUP BY→GROUPBY).

The downside is losing Python's type checking and IDE support, but for a query interface, the universality trade-off is worth it.

Why sqlglot instead of writing a custom parser?

Options considered:

Custom parser (regex, hand-rolled recursive descent)
PLY/Lark (parser generators)
sqlglot (production SQL parser)
sqlparse (tokenizer, not a full parser)

We chose sqlglot because:

Battle-tested — Used in production by companies like Tobiko (SQLMesh). Handles edge cases we'd miss.
Full AST — Provides a complete abstract syntax tree, not just tokens. We can traverse and analyze queries properly.
Dialect support — Handles SQL variations. Users can write MySQL-style or PostgreSQL-style queries.
Active maintenance — Regular releases, responsive maintainers, good documentation.

The alternative was writing a custom parser, which would be error-prone and time-consuming for a POC. sqlglot lets us focus on the translation logic rather than parsing edge cases.

Why schema-aware translation?

Redis field types determine query syntax:

| Field Type | Redis Syntax | Example |
| --- | --- | --- |
| TEXT | `@field:term` | `@title:laptop` |
| NUMERIC | `@field:[min max]` | `@price:[100 500]` |
| TAG | `@field:{value}` | `@category:{books}` |

Without schema knowledge, we can't translate category = 'books' correctly — it could be @category:books (TEXT search) or @category:{books} (TAG exact match).

Our approach: The SchemaRegistry fetches index schemas via FT.INFO at startup. The translator uses this to generate correct syntax per field type.

This adds a Redis round-trip at initialization but ensures correct query generation.
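
The dependence on field type can be sketched as follows. This is a minimal illustration, not the library's actual QueryBuilder: the function name `equality_clause` and the plain-dict schema are assumptions, and the sketch does not escape special characters in values.

```python
def equality_clause(field, value, field_type):
    """Render `field = value` in RediSearch query syntax for one field type."""
    if field_type == "TAG":
        return f"@{field}:{{{value}}}"  # exact match on a tag
    if field_type == "TEXT":
        return f"@{field}:{value}"  # full-text term match
    if field_type == "NUMERIC":
        return f"@{field}:[{value} {value}]"  # closed range with equal bounds
    raise ValueError(f"unsupported field type: {field_type}")


# Hypothetical schema, shaped like what a registry might hold after reading FT.INFO.
schema = {"title": "TEXT", "price": "NUMERIC", "category": "TAG"}

print(equality_clause("category", "books", schema["category"]))  # @category:{books}
print(equality_clause("title", "laptop", schema["title"]))       # @title:laptop
print(equality_clause("price", 100, schema["price"]))            # @price:[100 100]
```

The same SQL predicate yields three different query fragments, which is why the translator needs the schema before it can emit anything.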

Architecture: Why this layered design?

SQL String
    ↓
┌─────────────────┐
│   SQLParser     │  Parse SQL → ParsedQuery dataclass
└────────┬────────┘
         ↓
┌─────────────────┐
│ SchemaRegistry  │  Load field types from Redis
└────────┬────────┘
         ↓
┌─────────────────┐
│    Analyzer     │  Classify conditions by field type
└────────┬────────┘
         ↓
┌─────────────────┐
│  QueryBuilder   │  Generate RediSearch syntax per type
└────────┬────────┘
         ↓
┌─────────────────┐
│   Translator    │  Orchestrate pipeline, build command
└────────┬────────┘
         ↓
┌─────────────────┐
│    Executor     │  Execute command, parse results
└────────┬────────┘
         ↓
QueryResult(rows, count)

Why separate components?

Testability — Each layer has focused unit tests. 100% coverage is achievable because responsibilities are clear.
Single responsibility — Parser doesn't know about Redis. QueryBuilder doesn't know about SQL. Changes are localized.
Extensibility — Adding a new field type (e.g., GEO) means updating Analyzer and QueryBuilder, not rewriting everything.

Why not a single monolithic translator?

Early prototypes combined parsing and translation. This led to:

Tests that required Redis connections for simple SQL parsing tests
Difficulty testing edge cases in isolation
Tangled code that was hard to modify

The layered approach emerged from TDD — writing tests first revealed natural boundaries.
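
As a rough sketch of what that separation buys, the final command-assembly step can be written as a pure function over already-parsed data. The `ParsedQuery` fields and the `build_search_command` name below are assumptions for illustration; the real dataclass and Translator may look different.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuery:
    """Hypothetical stand-in for the parser's output; field names are assumptions."""
    index: str
    fields: List[str]
    order_by: Optional[str] = None
    ascending: bool = True
    limit: Optional[int] = None
    offset: int = 0


def build_search_command(parsed, query_string):
    """Assemble FT.SEARCH arguments from an already-translated query string."""
    args = ["FT.SEARCH", parsed.index, query_string]
    if parsed.fields:
        args += ["RETURN", str(len(parsed.fields)), *parsed.fields]
    if parsed.order_by:
        args += ["SORTBY", parsed.order_by, "ASC" if parsed.ascending else "DESC"]
    if parsed.limit is not None:
        # RediSearch pagination is LIMIT <offset> <count>.
        args += ["LIMIT", str(parsed.offset), str(parsed.limit)]
    return args


parsed = ParsedQuery(index="products", fields=["title", "price"], order_by="price", limit=10)
command = build_search_command(parsed, "@category:{electronics} @price:[-inf (500]")
print(" ".join(command))
```

Because the function only builds a list of strings, it can be unit-tested without a Redis connection, which is the kind of isolation the layered design aims for.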

What's Implemented

Basic SELECT with field selection
WHERE with TEXT, NUMERIC, TAG field types
Comparison operators: =, !=, <, <=, >, >=, BETWEEN, IN
Boolean operators: AND, OR
Aggregations: COUNT, SUM, AVG, MIN, MAX
GROUP BY with multiple aggregations
ORDER BY with ASC/DESC
LIMIT and OFFSET pagination
Computed fields: price * 0.9 AS discounted
Vector KNN search: vector_distance(field, :param)
Hybrid search (filters + vector)
Full-text search: prefix matching with LIKE 'prefix%', and term search with the fulltext(field, 'terms') function
GEO field queries with full operator support (see below)
Date functions: YEAR(), MONTH(), DAY(), DATE_FORMAT(), etc. (see below)
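
To illustrate one item from the list, a `LIKE 'prefix%'` condition on a TEXT field corresponds naturally to a RediSearch prefix query (`@field:prefix*`). The sketch below is an assumption about the shape of that mapping, not sql-redis's code; the function name is hypothetical, and how the library treats other LIKE patterns is not stated here.

```python
def like_to_prefix_query(field, pattern):
    """Translate LIKE 'abc%' into a RediSearch prefix query such as @field:abc*."""
    # This sketch handles only a single trailing % and no _ wildcards.
    if not pattern.endswith("%") or "%" in pattern[:-1] or "_" in pattern:
        raise ValueError("only trailing-% prefix patterns are handled in this sketch")
    prefix = pattern[:-1]
    if not prefix:
        raise ValueError("empty prefix")
    return f"@{field}:{prefix}*"


print(like_to_prefix_query("title", "lap%"))  # @title:lap*
```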

What's Not Implemented (Yet...)

JOINs (Redis doesn't support cross-index joins)
Subqueries
HAVING clause
DISTINCT
Index creation from SQL (CREATE INDEX)

DATE/DATETIME Handling

Redis does not have a native DATE field type. Dates are stored as NUMERIC fields with Unix timestamps.

sql-redis automatically converts ISO 8601 date literals to Unix timestamps:

-- Date literal (automatically converted to timestamp 1704067200)
SELECT * FROM events WHERE created_at > '2024-01-01'

-- Datetime literal with time
SELECT * FROM events WHERE created_at > '2024-01-01T12:00:00'

-- Date range with BETWEEN
SELECT * FROM events WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'

-- Multiple date conditions
SELECT * FROM events WHERE created_at > '2024-01-01' AND created_at < '2024-12-31'

Supported date formats:

Date: '2024-01-01' (interpreted as midnight UTC)
Datetime: '2024-01-01T12:00:00' or '2024-01-01 12:00:00'
Datetime with timezone: '2024-01-01T12:00:00Z', '2024-01-01T12:00:00+00:00'

Note: All dates without timezone are interpreted as UTC. You can also use raw Unix timestamps if preferred:

SELECT * FROM events WHERE created_at > 1704067200
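
The literal-to-timestamp conversion described above can be sketched with the standard library. This is a minimal illustration under the stated rules (naive values are treated as UTC), not sql-redis's actual implementation; the function name `to_unix_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def to_unix_timestamp(literal):
    """Convert an ISO 8601 date or datetime string to a Unix timestamp in seconds."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if literal.endswith("Z"):
        literal = literal[:-1] + "+00:00"
    parsed = datetime.fromisoformat(literal)
    if parsed.tzinfo is None:
        # No timezone given: interpret as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(to_unix_timestamp("2024-01-01"))            # 1704067200
print(to_unix_timestamp("2024-01-01T12:00:00"))   # 1704110400
print(to_unix_timestamp("2024-01-01T12:00:00Z"))  # 1704110400
```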

Date Functions

Extract date parts using SQL functions that map to Redis APPLY expressions:

| SQL Function | Redis Function | Description |
| --- | --- | --- |
| `YEAR(field)` | `year(@field)` | Extract year (e.g., 2024) |
| `MONTH(field)` | `monthofyear(@field)` | Extract month (0-11) |
| `DAY(field)` | `dayofmonth(@field)` | Extract day of month (1-31) |
| `HOUR(field)` | `hour(@field)` | Round timestamp down to the start of the hour |
| `MINUTE(field)` | `minute(@field)` | Round timestamp down to the start of the minute |
| `DAYOFWEEK(field)` | `dayofweek(@field)` | Day of week (0=Sunday) |
| `DAYOFYEAR(field)` | `dayofyear(@field)` | Day of year (0-365) |
| `DATE_FORMAT(field, fmt)` | `timefmt(@field, fmt)` | Format timestamp |

Examples:

-- Extract year and month
SELECT name, YEAR(created_at) AS year, MONTH(created_at) AS month FROM events

-- Filter by year
SELECT name FROM events WHERE YEAR(created_at) = 2024

-- Group by date parts
SELECT YEAR(created_at) AS year, COUNT(*) FROM events GROUP BY year

-- Format dates
SELECT name, DATE_FORMAT(created_at, '%Y-%m-%d') AS date FROM events

Note: Redis's monthofyear() returns 0-11 (not 1-12), and dayofweek() returns 0 for Sunday.
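
The mapping in the table above, and the zero-based conventions in the note, can be sketched in a few lines. The `apply_clause` helper is hypothetical and only shows the shape of an `APPLY ... AS ...` step in FT.AGGREGATE; the two emulation functions reproduce the documented conventions in plain Python so the offsets are easy to see.

```python
from datetime import datetime, timezone

# SQL function name -> Redis aggregation function, as listed in the table above.
DATE_FUNCTIONS = {
    "YEAR": "year",
    "MONTH": "monthofyear",
    "DAY": "dayofmonth",
    "HOUR": "hour",
    "MINUTE": "minute",
    "DAYOFWEEK": "dayofweek",
    "DAYOFYEAR": "dayofyear",
}


def apply_clause(sql_function, field, alias):
    """Build the APPLY arguments for one date function call (hypothetical helper)."""
    redis_function = DATE_FUNCTIONS[sql_function.upper()]
    return ["APPLY", f"{redis_function}(@{field})", "AS", alias]


def month_of_year(timestamp):
    """Emulate the zero-based month convention (January is 0)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).month - 1


def day_of_week(timestamp):
    """Emulate the 0 = Sunday convention; Python's weekday() is 0 = Monday."""
    return (datetime.fromtimestamp(timestamp, tz=timezone.utc).weekday() + 1) % 7


print(apply_clause("MONTH", "created_at", "month"))
print(month_of_year(1704067200))  # 2024-01-01 is in January -> 0
print(day_of_week(1704067200))    # 2024-01-01 is a Monday -> 1
```

If you need a conventional 1-12 month for display, add 1 to the returned value on the client side.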

Limitations

NOT YEAR(field) = 2024 is not supported (raises ValueError)
DATE_FORMAT() is only supported in SELECT, not in WHERE (raises ValueError)
Date functions combined with OR are not supported (raises ValueError)

GEO Field Support

GEO fields are fully implemented with standard SQL-like syntax:

| Feature | Status |
| --- | --- |
| Coordinate order | ✅ `POINT(lon, lat)`, matching Redis's native format |
| Default unit | ✅ Meters (`m`), SQL standard |
| All operators | ✅ `<`, `<=`, `>`, `>=`, `BETWEEN` |
| Distance calculation | ✅ `geo_distance()` in SELECT clause |
| Combined filters | ✅ GEO + TEXT/TAG/NUMERIC |

Coordinate Order: `POINT(lon, lat)`

Use longitude first, matching Redis's native GEO format:

-- San Francisco coordinates: lon=-122.4194, lat=37.7749
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) < 5000

Units

| Unit | Code | Example |
| --- | --- | --- |
| Meters | `m` | `geo_distance(location, POINT(-122.4194, 37.7749)) < 5000` |
| Kilometers | `km` | `geo_distance(location, POINT(-122.4194, 37.7749), 'km') < 5` |
| Miles | `mi` | `geo_distance(location, POINT(-122.4194, 37.7749), 'mi') < 3` |
| Feet | `ft` | `geo_distance(location, POINT(-122.4194, 37.7749), 'ft') < 16400` |

Default is meters when no unit is specified.
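
For reference, the four units relate to meters as shown in this small sketch. The conversion factors are the standard definitions of the units; the `to_meters` helper is hypothetical and not part of sql-redis.

```python
# Meters per unit, using the international definitions of the mile and foot.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}


def to_meters(value, unit="m"):
    """Convert a distance in the given unit code to meters."""
    if unit not in METERS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit}")
    return value * METERS_PER_UNIT[unit]


for value, unit in [(5000, "m"), (5, "km"), (3, "mi"), (16400, "ft")]:
    print(value, unit, "=", round(to_meters(value, unit), 2), "m")
```

The example thresholds in the table are therefore of similar size but not identical: 5 km and 16400 ft are both close to 5000 m, while 3 mi is a little under.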

Operators

All comparison operators are supported:

-- Less than (uses optimized GEOFILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) < 5000

-- Less than or equal (uses optimized GEOFILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) <= 5000

-- Greater than (uses FT.AGGREGATE with FILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) > 100000

-- Greater than or equal (uses FT.AGGREGATE with FILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749)) >= 100000

-- Between (uses FT.AGGREGATE with FILTER)
SELECT name FROM stores WHERE geo_distance(location, POINT(-122.4194, 37.7749), 'km') BETWEEN 10 AND 100
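
The split noted in the comments above (radius search for `<` and `<=`, computed distance plus FILTER for the others) can be sketched as a small planning function. This is an illustration under assumptions: the function name and the `dist` alias are hypothetical, the radius is taken in meters, and the exact arguments sql-redis emits are not shown in this README. `GEOFILTER` is an FT.SEARCH option and `geodistance()` is a Redis aggregation function that returns meters.

```python
def plan_geo_filter(field, lon, lat, operator, radius_m):
    """Choose command arguments for geo_distance(field, POINT(lon, lat)) <op> radius_m."""
    if operator in ("<", "<="):
        # Inside-radius search maps onto FT.SEARCH ... GEOFILTER field lon lat radius unit.
        return ["GEOFILTER", field, str(lon), str(lat), str(radius_m), "m"]
    if operator in (">", ">="):
        # There is no outside-radius search option, so compute the distance
        # in an FT.AGGREGATE pipeline and filter on the computed value.
        return [
            "LOAD", "1", f"@{field}",
            "APPLY", f"geodistance(@{field}, {lon}, {lat})", "AS", "dist",
            "FILTER", f"@dist {operator} {radius_m}",
        ]
    raise ValueError(f"unsupported operator: {operator}")


print(plan_geo_filter("location", -122.4194, 37.7749, "<", 5000))
print(plan_geo_filter("location", -122.4194, 37.7749, ">", 100000))
```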

Distance Calculation in SELECT

Calculate distances for all results using geo_distance() in the SELECT clause:

-- Get distance to each store (returns meters)
SELECT name, geo_distance(location, POINT(-122.4194, 37.7749)) AS distance
FROM stores

-- With explicit unit
SELECT name, geo_distance(location, POINT(-122.4194, 37.7749), 'km') AS distance_km
FROM stores
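
To get a feel for the values such a distance column contains, the great-circle distance between two lon/lat points can be approximated with the haversine formula. This sketch is only a client-side sanity check: the Earth radius constant is an approximation, and Redis's own computation may differ slightly in the last digits.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# San Francisco to Los Angeles, longitude first to match POINT(lon, lat).
distance = haversine_m(-122.4194, 37.7749, -118.2437, 34.0522)
print(round(distance / 1000), "km")
```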

Combined Filters

Combine GEO filters with other field types:

-- GEO + TAG filter
SELECT name FROM stores
WHERE category = 'retail' AND geo_distance(location, POINT(-122.4194, 37.7749)) < 5000

-- GEO + NUMERIC filter
SELECT name FROM stores
WHERE rating >= 4.0 AND geo_distance(location, POINT(-122.4194, 37.7749), 'mi') < 10

-- GEO + TEXT filter
SELECT name FROM stores
WHERE name = 'Downtown' AND geo_distance(location, POINT(-122.4194, 37.7749)) < 10000

Development

# Install dependencies
uv sync --all-extras

# Run tests (requires Docker for testcontainers)
uv run pytest

# Run with coverage
uv run pytest --cov=sql_redis --cov-report=html

Testing Philosophy

This project uses strict TDD with 100% test coverage as a hard requirement. The approach:

Write failing tests first — Define expected behavior before implementation
One test at a time — Implement just enough to pass each test
No untestable code — If we can't test it, we don't write it
Integration tests mirror raw Redis — test_sql_queries.py verifies SQL produces same results as equivalent FT.AGGREGATE commands in test_redis_queries.py

Coverage is enforced in CI. Pragmas (# pragma: no cover) are forbidden — if code can't be tested, it shouldn't exist.

Release history

0.5.0

May 6, 2026

0.4.0

Apr 6, 2026

0.3.0 (this version)

Mar 16, 2026

0.2.0

Mar 2, 2026

0.1.2

Feb 6, 2026

0.1.1

Feb 3, 2026

0.1.0

Feb 3, 2026

Download files

Source Distribution

sql_redis-0.3.0.tar.gz (127.6 kB)

Uploaded Mar 16, 2026 Source

Built Distribution

sql_redis-0.3.0-py3-none-any.whl (29.8 kB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file sql_redis-0.3.0.tar.gz.

File metadata

Download URL: sql_redis-0.3.0.tar.gz
Upload date: Mar 16, 2026
Size: 127.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.13

File hashes

Hashes for sql_redis-0.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `54e12e690c8a751d1379039d6d24e5b7697ea2283b4693f99fc0221928ff90d9` |
| MD5 | `33fdc9783e11ac92d6339dc5c83dc062` |
| BLAKE2b-256 | `757cdc77d8fda301cfd9d1937472fbe6555ddce0322f1b4ca0eb18a5d9952b22` |

File details

Details for the file sql_redis-0.3.0-py3-none-any.whl.

File metadata

Download URL: sql_redis-0.3.0-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 29.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.13

File hashes

Hashes for sql_redis-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e0569a65d50a4ecd79a46eba0a414f625d1edbaeb2f5a2b039ff5aac697b12c6` |
| MD5 | `4fd5c7c8089f9e1662547a7ae9a05c07` |
| BLAKE2b-256 | `8b18fbbe5f134cbb6be1901c0bb497e0491fa91c8b3aa4cada5d5c300e575212` |
