Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlshield-0.0.18-cp312-cp312-win_amd64.whl (88.6 kB view details)

Uploaded CPython 3.12Windows x86-64

sqlshield-0.0.18-cp312-cp312-macosx_10_13_universal2.whl (183.7 kB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.18-cp311-cp311-win_amd64.whl (92.1 kB view details)

Uploaded CPython 3.11Windows x86-64

sqlshield-0.0.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (576.6 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

sqlshield-0.0.18-cp311-cp311-macosx_10_9_universal2.whl (185.6 kB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.18-cp310-cp310-win_amd64.whl (92.1 kB view details)

Uploaded CPython 3.10Windows x86-64

sqlshield-0.0.18-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (541.0 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

sqlshield-0.0.18-cp310-cp310-macosx_10_9_universal2.whl (182.2 kB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.18-cp39-cp39-win_amd64.whl (92.4 kB view details)

Uploaded CPython 3.9Windows x86-64

sqlshield-0.0.18-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (539.1 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

sqlshield-0.0.18-cp39-cp39-macosx_10_9_universal2.whl (183.0 kB view details)

Uploaded CPython 3.9macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.18-cp38-cp38-win_amd64.whl (93.6 kB view details)

Uploaded CPython 3.8Windows x86-64

sqlshield-0.0.18-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (563.1 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

sqlshield-0.0.18-cp38-cp38-macosx_11_0_universal2.whl (190.3 kB view details)

Uploaded CPython 3.8macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.18-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.18-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 88.6 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.18-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 85a3c1ab01ed780d09009eed61ac78896ea154d4b3efbbed8dbdf4cc9c219109
MD5 26dd5c2503dd344f0d65ae527ae3135f
BLAKE2b-256 b453c3010d5517b002eeea5db6d78057a74bf9f1317208edb0799c4d36750cfa

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 62c9087737218f7da7d5256b692caddd1d4b5e6b311daec15da8455b5ac31eeb
MD5 764640d39d9c5b81f46802f30e8ded63
BLAKE2b-256 a185ef8f5f4210a981dfa6db081a86774797428f5d755ecd84e76ae24d18ffa1

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.18-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 92.1 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.18-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 3db570aefe16e37465ab3f89a91dbe6037dc019e70098cd8a38238912a97df83
MD5 34df7983e8941f8dc118034bf1dd3de1
BLAKE2b-256 efe581c3e43d39ae16f55f439bf96fb7dcba36be0536f4089eb1fb5dc80b2919

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 df3f009fe1c60b1f8b24d38da51a67b1df45e2ed9f01ca99546f971f5f6ae0db
MD5 1d2be46ad53accd037120eda93d9e654
BLAKE2b-256 888e246c2fce596ebc19318fadefe68ed2395f0c602c95ee5936d60c06d1d0c4

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 b48a05aa55c69407287eb006ed14b2fd65843c16794ffe6f4293a21b4ca68a3a
MD5 d1836a2de26ab0ae69b4141100fbbc7b
BLAKE2b-256 50e42c0562a8b4b90d92e6eef5c38c816afef80f5617ea3d21e60029bffb23ab

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.18-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 92.1 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.18-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 4977b64980fcb5ba9b8a6ce60a26c8fe4624ce8eb41edcccf7de972a16bc9597
MD5 7d43b573e133423b6b239748251c5fb9
BLAKE2b-256 67b62e5740ca591db37a31048c8b8eddfcfe73579e3bb373df9861209a5f6b75

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 cb8233c09c6197a35b00fcb151a7af614dcf7a3ecbb656be4b530e7537dd4d0e
MD5 9c64579022552e239048a6bff1a23e06
BLAKE2b-256 426edc11a89de8771369ba56517814fb9037e87126ebcc21f53921ee55a65891

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 66ce24bcdfe7e5c29ac902143a23772d477dbe1084d5f7c8e20687265742a978
MD5 ae76509bdd5acd5fe154a7acc0f5c8f9
BLAKE2b-256 3d0f3a0270e563aeabe73fea441908f3c92ef4dd4bd34b8f7c8ae4adc35ec9ba

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.18-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 92.4 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.18-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 d844e6c982544ec41e867a927cbabe82cc3954dadef9bfeaf46accacac1ece58
MD5 5750e7f7d41638612b0b4bfb86040d85
BLAKE2b-256 6e49a95893f71160763ae54000b272ff3f4b1db30e26e29e318d6628fcd08779

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ecf51fe9f6152a93ed37b9d1d73d8a0a9884b29f3c5f5b170a77da0fb1e78b1d
MD5 0441ebab7ccc6aeec794e85df9dae0b8
BLAKE2b-256 ff3b7c3323254815c51509f0599f361fda3839fc480d595982096ad042a60023

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 5c1be7d10bfd88be5d378d8ef73cc4a6eba1032a0a2a1816b7aa2c8800054022
MD5 e0dc49ca2a2784ac9c30582acf035684
BLAKE2b-256 1127fed4899858e52de4329121e9e2dcc2dde30be4e3a75ce3f118588a5f2e08

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.18-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 93.6 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.18-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 c6192881fc48104f5e7b92728fb92c7465900ba6f8dded66b60c71453f92735a
MD5 a20c80edb1360dce948e1c6eeb3f429c
BLAKE2b-256 69b8a3be627b415d1fd341b7b32c3b2bef90b23a5c05b19b02483f9c131efe13

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 17447a734b973b00600b9138d25aa75f6868027ad9ac4b5cd6f5e899df7f0416
MD5 3fad874eb5e9533eb21ded467460fc4f
BLAKE2b-256 581b9c0108294af532ffed43fac29ca59953a374016c1a926d7900a846351fe5

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.18-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.18-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 99875231a440fccb0533dd39f16330b646a51321670049d1cc2132297a4038c1
MD5 f83705f1c789436ff546f5cfd0f47981
BLAKE2b-256 25e9981b15ec544809b2ade3ecfda77910dcf89a9d222aaec113a43d8d3bb2b0

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.18-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page