Skip to main content

A Python tool for compressing/decompressing SQL DDL fewer tokens for LLMs.

Project description

sqlcodec

sqlcodec is a Python utility designed to compress SQL DDL into a tokenized format optimized for Large Language Models (LLMs). It reduces the token count of schema definitions, allowing more context to be fit into the LLM's window while maintaining semantic clarity.

Key Features

  • Regex Mapping: Replaces common SQL keywords with short tokens (e.g., CREATE TABLE -> ~cr ~t).
  • Dialect Support: Specific mappings for SQL Server and Postgres.
  • Comment Wrapping: Preserves comments in a minified-safe format (~cml ... ~endcml).
  • Whitespace Minification: Collapses unnecessary spaces and newlines while preserving statement separators.
  • LLM Integration: Includes a helper to generate system prompts for LLMs to interpret the compressed SQL.

Benefits

  • Token Efficiency: Can reduce DDL size by 40-60%.
  • Context Preservation: Useful for RAG systems or LLM agents that need to "see" a large database schema.

sqlcodec Usage Instructions

1. Compressing and Decompressing

A. Compress a String Statement

from sqlcodec import compress

sql_string = "CREATE TABLE Users (ID INT PRIMARY KEY);"
compressed = compress(sql_string)
print(compressed)

B. Compress a SQL File

from sqlcodec import compress

with open("input.sql", "r") as f:
    sql_data = f.read()

compressed = compress(sql_data)

with open("compressed.txt", "w") as f:
    f.write(compressed)

C. Decompress a String Statement

from sqlcodec import decompress

compressed_str = "~cr ~t Users (ID INT ~pk);"
original = decompress(compressed_str, dialect="sqlserver")
print(original)

D. Decompress a Compressed File

from sqlcodec import decompress

with open("compressed.txt", "r") as f:
    compressed_data = f.read()

# Tip: detect_dialect can help if you're unsure
original = decompress(compressed_data, dialect="sqlserver")

with open("reconstructed.sql", "w") as f:
    f.write(original)

E. Get a System Prompt for an LLM

This generates the instructions the LLM needs to understand your compressed SQL.

from sqlcodec import get_system_prompt

# 1. For SQL Server
ss_prompt = get_system_prompt(dialect="sqlserver")
print(ss_prompt)

# 2. For Postgres
pg_prompt = get_system_prompt(dialect="postgres")
print(pg_prompt)

# 3. For Standard SQL (Generic)
std_prompt = get_system_prompt()
print(std_prompt)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlcodec-0.1.3.tar.gz (7.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlcodec-0.1.3-py3-none-any.whl (7.2 kB view details)

Uploaded Python 3

File details

Details for the file sqlcodec-0.1.3.tar.gz.

File metadata

  • Download URL: sqlcodec-0.1.3.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for sqlcodec-0.1.3.tar.gz
Algorithm Hash digest
SHA256 b3ec390a6964babb0f5b26e13814866ed993b54e13910b18878146585e5718ec
MD5 8db834d9ca2e012db3f483d3206b8be4
BLAKE2b-256 fcda05f70ccd47ad659b4881e30adbc211c91f28ce202fc5042c7b2cfee4173e

See more details on using hashes here.

File details

Details for the file sqlcodec-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: sqlcodec-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for sqlcodec-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 21857c2791a1f66848dbbc68e0e3fa011f8f60bf5ba71e615249ca5ffb5184a1
MD5 1562555417d32dd2cdf47fef286f9b53
BLAKE2b-256 541b2e1b7ac57dafd87b8416c8179a85122adda002ee7b7739552088a9c87600

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page