A Python tool for compressing/decompressing SQL DDL fewer tokens for LLMs.
Project description
sqlcodec
sqlcodec is a Python utility designed to compress SQL DDL into a tokenized format optimized for Large Language Models (LLMs). It reduces the token count of schema definitions, allowing more context to be fit into the LLM's window while maintaining semantic clarity.
Key Features
- Regex Mapping: Replaces common SQL keywords with short tokens (e.g.,
CREATE TABLE->~cr ~t). - Dialect Support: Specific mappings for SQL Server and Postgres.
- Comment Wrapping: Preserves comments in a minified-safe format (
~cml ... ~endcml). - Whitespace Minification: Collapses unnecessary spaces and newlines while preserving statement separators.
- LLM Integration: Includes a helper to generate system prompts for LLMs to interpret the compressed SQL.
Benefits
- Token Efficiency: Can reduce DDL size by 40-60%.
- Context Preservation: Useful for RAG systems or LLM agents that need to "see" a large database schema.
sqlcodec Usage Instructions
1. Compressing and Decompressing
A. Compress a String Statement
from sqlcodec import compress
sql_string = "CREATE TABLE Users (ID INT PRIMARY KEY);"
compressed = compress(sql_string)
print(compressed)
B. Compress a SQL File
from sqlcodec import compress
with open("input.sql", "r") as f:
sql_data = f.read()
compressed = compress(sql_data)
with open("compressed.txt", "w") as f:
f.write(compressed)
C. Decompress a String Statement
from sqlcodec import decompress
compressed_str = "~cr ~t Users (ID INT ~pk);"
original = decompress(compressed_str, dialect="sqlserver")
print(original)
D. Decompress a Compressed File
from sqlcodec import decompress
with open("compressed.txt", "r") as f:
compressed_data = f.read()
# Tip: detect_dialect can help if you're unsure
original = decompress(compressed_data, dialect="sqlserver")
with open("reconstructed.sql", "w") as f:
f.write(original)
E. Get a System Prompt for an LLM
This generates the instructions the LLM needs to understand your compressed SQL.
from sqlcodec import get_system_prompt
# 1. For SQL Server
ss_prompt = get_system_prompt(dialect="sqlserver")
print(ss_prompt)
# 2. For Postgres
pg_prompt = get_system_prompt(dialect="postgres")
print(pg_prompt)
# 3. For Standard SQL (Generic)
std_prompt = get_system_prompt()
print(std_prompt)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlcodec-0.1.3.tar.gz.
File metadata
- Download URL: sqlcodec-0.1.3.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b3ec390a6964babb0f5b26e13814866ed993b54e13910b18878146585e5718ec
|
|
| MD5 |
8db834d9ca2e012db3f483d3206b8be4
|
|
| BLAKE2b-256 |
fcda05f70ccd47ad659b4881e30adbc211c91f28ce202fc5042c7b2cfee4173e
|
File details
Details for the file sqlcodec-0.1.3-py3-none-any.whl.
File metadata
- Download URL: sqlcodec-0.1.3-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
21857c2791a1f66848dbbc68e0e3fa011f8f60bf5ba71e615249ca5ffb5184a1
|
|
| MD5 |
1562555417d32dd2cdf47fef286f9b53
|
|
| BLAKE2b-256 |
541b2e1b7ac57dafd87b8416c8179a85122adda002ee7b7739552088a9c87600
|