Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlshield-0.0.22-cp312-cp312-win_amd64.whl (95.4 kB view details)

Uploaded CPython 3.12Windows x86-64

sqlshield-0.0.22-cp312-cp312-macosx_10_13_universal2.whl (203.2 kB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.22-cp311-cp311-win_amd64.whl (99.0 kB view details)

Uploaded CPython 3.11Windows x86-64

sqlshield-0.0.22-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (621.7 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

sqlshield-0.0.22-cp311-cp311-macosx_10_9_universal2.whl (203.9 kB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.22-cp310-cp310-win_amd64.whl (98.8 kB view details)

Uploaded CPython 3.10Windows x86-64

sqlshield-0.0.22-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (586.7 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

sqlshield-0.0.22-cp310-cp310-macosx_10_9_universal2.whl (204.9 kB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.22-cp39-cp39-win_amd64.whl (99.3 kB view details)

Uploaded CPython 3.9Windows x86-64

sqlshield-0.0.22-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.1 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

sqlshield-0.0.22-cp39-cp39-macosx_10_9_universal2.whl (206.5 kB view details)

Uploaded CPython 3.9macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.22-cp38-cp38-win_amd64.whl (100.5 kB view details)

Uploaded CPython 3.8Windows x86-64

sqlshield-0.0.22-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (624.3 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

sqlshield-0.0.22-cp38-cp38-macosx_11_0_universal2.whl (215.2 kB view details)

Uploaded CPython 3.8macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.22-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.22-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 95.4 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.22-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 780df3953a869481217d0d2b78e70ef0525b28596a24db5c8b336cb023f2e1c7
MD5 5db9e4f873df3c405328fea84aaac8c1
BLAKE2b-256 0c6eab2b66ac03629a628819939266af7441024a89c919bcda91447eb22798a0

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 b61663322866d2cccd103c2e089280d23f9ffaf991e4eadf371174f7c8465233
MD5 f6c7e9b1426339545c3456b734e4be77
BLAKE2b-256 70dd5cb9985cfb554a250431f5925a70238c74df3dfbeb93b7e5e19aa9039924

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.22-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 99.0 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.22-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 6d38bcf65a8505edcd7ef523ad57abcb45fb7d17e7ecf20db6a6da4a69756f8c
MD5 514a4051f0585c99e63b53ca1060111a
BLAKE2b-256 ae3e7c8388693c3121e2a5d887666be37d54aa8cd8abe912dd99b56b5ee58f9f

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c6452ee8d945084b2903932b5976fe7ba60699edd8d6240e1d576c7186a7daea
MD5 d53f9d90e1bb3c632cc94e8faa6f266a
BLAKE2b-256 78dde084af8767c67990a526ac324a6013a7d4aa26fe1e3918f18f941f1a3e5c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 9306bf8fa89cf0d27a91e3bd9fa934efe4d5a708b90e351ecfb93af142e892ea
MD5 0827ab4660850a3b7aa23316e71bf4b8
BLAKE2b-256 0153c24c7af19b6d19752a00741d82c679394155cd7543a6e494476a6b610bf5

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.22-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 98.8 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.22-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 1aa4bd38ceda118a6a0a93018aa4bc4b8b3bdba93f4c0fb8ffc2c155ec92eb6b
MD5 ff3cd4318a7d8fa4d3d460060090a8db
BLAKE2b-256 99020d6693f5b365bd8890199e0b4d21f4690959c698dde40f39d642554fe46a

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1dba7d3c18721a59bd83afed49695254dba33505f0d1a16d280cecaaf25446a5
MD5 7c037fb3af4f303a1cec1339f7bce76e
BLAKE2b-256 ed02825ef9bcded40ecdc505199559180dd91ace064421c45e4dbba4ba139c82

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 f54552ca681cf102a52afaa1093d75e6ccb9ba4322894248712e921ca8ebb40a
MD5 e25f4975db20044b0ae6a2181f3f7a04
BLAKE2b-256 9fc49ba39e5a30c1f1bd7c4722f661cdf1c12fb593a4d08c40e71ca583a1dc87

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.22-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 99.3 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.22-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 93c70432e59e80c8ea2161b6535b657152592691f6302df2d76cc346a1227c10
MD5 2ab1600f0469758582319e96dff34f4c
BLAKE2b-256 6530a62370fb7d61087a70336698c131061dd9958096105f5091da1ce47842fa

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1570737a690ad3341050d7b50c6b72049c339e5fbd1b1f9d69fb4ec7e5c5d034
MD5 d600661a5df90134b1574bbf34e82da6
BLAKE2b-256 4470bf7f0ee4b10eda48b4933a4b598f8108165a01b59ea9d22527dcbaea8815

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 d0f926fbb40b20d395ed319e37845653b2ada5a5a347ec2875c21ef1e4e333be
MD5 1957d4dfb55efdaa9913419d9871b8c1
BLAKE2b-256 f7d8b5f413ca9ef268e19bd19dfe74bbe9451190cbd0d8da5568385fdb522df1

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.22-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 100.5 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.22-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 62325dad5f3776ff0b135b2cda4ea8c19be3369768014fbcf5fde8e7c1b9006c
MD5 556f4278700fe9282e2e06fd11bffd76
BLAKE2b-256 e76eedc3f92efdfc51b507c17f2e4fcb9a48ddd674fbf0f09b88d489f871b51c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 422d70fcc3204911945b63851535c9550a73f28507445892d478e5fafbda2f7b
MD5 2bfef5e49d15e18d7b72cd6f0800467f
BLAKE2b-256 9353f6ab4dba0676157bcdc31426872ad0b77050960d2556901386591e267809

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.22-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.22-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 de52af0b719477cc2f54ae35908e96eb585bb925feaa2e9cb813374e002e3336
MD5 d91befe1fdef2fbf091c2b93c9b4f7c6
BLAKE2b-256 38439fae9d30c40451b635533e5c46c42393721478cc79e48953525f2fb7c0d0

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.22-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page