Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlshield-0.0.23-cp312-cp312-win_amd64.whl (95.8 kB view details)

Uploaded CPython 3.12Windows x86-64

sqlshield-0.0.23-cp312-cp312-macosx_10_13_universal2.whl (203.8 kB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.23-cp311-cp311-win_amd64.whl (99.4 kB view details)

Uploaded CPython 3.11Windows x86-64

sqlshield-0.0.23-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (624.6 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

sqlshield-0.0.23-cp311-cp311-macosx_10_9_universal2.whl (204.7 kB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.23-cp310-cp310-win_amd64.whl (99.1 kB view details)

Uploaded CPython 3.10Windows x86-64

sqlshield-0.0.23-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (590.3 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

sqlshield-0.0.23-cp310-cp310-macosx_10_9_universal2.whl (205.5 kB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.23-cp39-cp39-win_amd64.whl (99.6 kB view details)

Uploaded CPython 3.9Windows x86-64

sqlshield-0.0.23-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (591.5 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

sqlshield-0.0.23-cp39-cp39-macosx_10_9_universal2.whl (207.1 kB view details)

Uploaded CPython 3.9macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.23-cp38-cp38-win_amd64.whl (100.8 kB view details)

Uploaded CPython 3.8Windows x86-64

sqlshield-0.0.23-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (627.5 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

sqlshield-0.0.23-cp38-cp38-macosx_11_0_universal2.whl (215.9 kB view details)

Uploaded CPython 3.8macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.23-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.23-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 95.8 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.23-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 431d7026c62308171ad53e12f133b475bd0f77dcb1d6fa9ac3ef77d512b6c8eb
MD5 f68832a2cde3bdf6466f041abde83115
BLAKE2b-256 6483909b952bb1fb6412894d98fd0f298ee42441ef7948170b0978d633f66101

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 954dd703037698b54295eff04ed8abd8bd26f83d2b7b44d5155c7994ece6bfc4
MD5 aa610cb5eafc27300f8fdd852c7a2dab
BLAKE2b-256 46957cd341fe9b383c29fcaba1bdb25eac2b0f56f506eeb5cf868856d2e18450

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.23-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 99.4 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.23-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 738b42c51b2fea655ac7d181254fcba3846de116209a3a450817b489cbc126c2
MD5 612a693aea0a636604a5bfe3cceccc52
BLAKE2b-256 d3bd82b5cba7745c00245a02a06eee793554846181c5ae3a731376e71ced45d4

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4af09a7bc6425fbc7e77f4ed92e667ae1c476b5cd4517dd914a722812fa70a44
MD5 0b2f43e141915201e9f81da83445866e
BLAKE2b-256 2c6feb873e07642f77d1f90f43cfc4eb88f4de4d50674a2d5e10e5509a3cef6c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 fd36a1756c9f3fb71857c8284f0f04511ef4970a9def4d78848b9690358cf870
MD5 f934afeee9953d157c064edc64e7135a
BLAKE2b-256 927ae51c2453c2cac84712684ef4180e3e433ad2f9a0b14535c4599288a77855

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.23-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 99.1 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.23-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 2a25491e1c3367ecb8592b95ebc65c9b922f3c2629e8efd68711e40dc71c4231
MD5 b3f79706338ccd54526e847c2ffc41d2
BLAKE2b-256 075f5342449ab3228222c05b075ce5c21ff2c02bd9ec7349fb9bb18d8281c16c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3b2c0b63ca2c78962cd141aeb6532929869eae190dc3c59ab26335900ddbf952
MD5 8ee4b99fdb7bc977ffe87e57efc2fffb
BLAKE2b-256 72d302925d11f168cd789c5dab578de0f6038606497d3f69936a5093ae6ae448

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 ce18cbb557f15fba73e28644ec80ad78b94226baae19432465df8bdf14288a3d
MD5 dd819086bd5696d9d4776d9522623eab
BLAKE2b-256 2bbd7d1e000262e76f40dbe21f644199e3ed341012489f111552e1d003f2b8af

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.23-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 99.6 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.23-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 fe5afb4304196fb243e08cf3100bab31fd5c7f48dd0218296498a5cc6201be60
MD5 86ac89bdaa11dbd6794838ac849cf2e5
BLAKE2b-256 e85be5f20af665767372b70a61d74ad873b2a76ebd649b0939257ba24a4df0a5

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7220f3645ddcc76137fef310635d2fbca73a56cd97fa1e03f85362c4e2a2977d
MD5 b63c396c1337593f289e2529f280d196
BLAKE2b-256 4cacb95ca0d8fa9bbe8e6246734c12e14026c9ce069216299ac020f5b0d4cb3c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 345647618dc36ee51e366004c0abfa765e3d795c54c2534f1d662ae0c7bd4d9d
MD5 0a433b9400d620feba85fcade956df7a
BLAKE2b-256 388efeee2ec69b14e573838cb2af5c026bb35f69ab23084ee5c9c28c6d4e5cc5

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.23-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 100.8 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqlshield-0.0.23-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 5de970aebf386af050eefd8ce077591aa65c91bbd7823aa4ebe5ee8adef70e7c
MD5 e31952884eda0e60a906e664aa471006
BLAKE2b-256 e0ec35701a167f856a53eb3cfcea20351342a7b57e4bd6060a0ff3c067063506

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2b56807bcf23bb665adfd30f3df056077fa3a79efc66987c350e35e3f9aca921
MD5 fdb5799d37abc90a188d5b2924a6e9ad
BLAKE2b-256 bbc08704db1a49057ec56b41b988a5bd8cca8c765af60b6c3af948c0eb0bbde7

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.23-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.23-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 91a6e2229d718314bcb7758b3f13aeb88c4b0f2700c779fd2d470abb4c625151
MD5 c7c63aa30972c66f397cd195e58696b1
BLAKE2b-256 598cc4f5d97d4b0e69aaf4d954e9170446fb97b8eb725d3bb5c3e68e55c46b63

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.23-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page