Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlshield-0.0.14-cp312-cp312-win_amd64.whl (89.3 kB view details)

Uploaded CPython 3.12Windows x86-64

sqlshield-0.0.14-cp312-cp312-macosx_10_13_universal2.whl (181.7 kB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.14-cp311-cp311-win_amd64.whl (89.7 kB view details)

Uploaded CPython 3.11Windows x86-64

sqlshield-0.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (569.5 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

sqlshield-0.0.14-cp311-cp311-macosx_10_9_universal2.whl (181.1 kB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.14-cp310-cp310-win_amd64.whl (89.1 kB view details)

Uploaded CPython 3.10Windows x86-64

sqlshield-0.0.14-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (505.0 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

sqlshield-0.0.14-cp310-cp310-macosx_10_9_universal2.whl (181.2 kB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.14-cp39-cp39-win_amd64.whl (89.6 kB view details)

Uploaded CPython 3.9Windows x86-64

sqlshield-0.0.14-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (504.9 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

sqlshield-0.0.14-cp39-cp39-macosx_10_9_universal2.whl (181.7 kB view details)

Uploaded CPython 3.9macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.14-cp38-cp38-win_amd64.whl (89.8 kB view details)

Uploaded CPython 3.8Windows x86-64

sqlshield-0.0.14-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (513.3 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

sqlshield-0.0.14-cp38-cp38-macosx_11_0_universal2.whl (182.8 kB view details)

Uploaded CPython 3.8macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.14-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.14-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 89.3 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.14-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 7aea22e1db73d4f02ed5f0de0916ba5fb8024493e43c128c6951fe40f4eed60a
MD5 54b209cc0309ce6fb3a776c95fa0adfa
BLAKE2b-256 13dbb6fee51318cb44a7592885d982b3c3ebe272e4066b0b238b94d0d0ca8c36

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 9593049ca1634d35030e375c1f25463a91c87cb8c6475817c526f75bbe1c165d
MD5 44510673b0fe687948f2696cc015d493
BLAKE2b-256 5226ae8f11a845c2ec4212bbfd128e39bf99b26240e87a0a074bcd6c0b1ab6d5

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.14-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 89.7 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.14-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 ea50aee191c29f24c1d01f9847c9675d9dbb07c719d37e0832df5a0157b44202
MD5 b7c4ae42147c017c32173712d6ffea0a
BLAKE2b-256 9b05576a3f07ac59269470abe6a882cea28ee78bead7f3c191d6c14ddbf118d7

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7305ca0eba9d29c154689ded01c04cbd285cbf3ec30279dc1b569d9a62526bd8
MD5 028cc068d5363fe938d1628ab5ed7cdb
BLAKE2b-256 9e8a9493e0337dd6294946d956e881c43d53c33cb113528ca09d6df512d0c288

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 d8f2c41313bde4c1884e5805daf43865bcc5351756a3e9e150d1dfb1242b1d40
MD5 0aa57ef34e7e18d44c6c17ada735e4a3
BLAKE2b-256 f56e42a83e737e7f842da7be25063de8bffea58fc88cf2c1dd72b110a5dd038a

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.14-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 89.1 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.14-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 859d9405d4a78d36fdce38f8f483cdaeb4cc4199d59eebff5df850573dfb707c
MD5 7c6c9856ec4a45ab4a03426162790b3b
BLAKE2b-256 60647a8b7ccc7e2b9666161bca5aea0335f1e503fb9431be67565a447f3faecd

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8bbe2beb171a96e3a07bf626e9e91679957cd400f6a2050e58462eb0817e402d
MD5 97846f1bfd6c9bcef2167d31a02d615f
BLAKE2b-256 62eb598fa83f98f91eee3c5ee118f011aabe6b2e7498202102e4440064abfcfb

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 d1e88ca28bffbc21734dac9acf3b5dd9c2b08fa218813b05f9fe7bd320d55c34
MD5 4f7da4c52de8c7886891adf9840f8bcf
BLAKE2b-256 96919d22c36ce25a50511a4cfea1942b737e6c775d24ac4fbabecad78a8e43e4

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.14-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 89.6 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.14-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 307c09de53acf37311b2b55979f06163363e1940411dd7eeaa00485eff82a431
MD5 59f335fe6f067ec656efc6579a5c35ce
BLAKE2b-256 1a18ea9cbe67c1c8a2da2ae9ac43ec5747623afc6c733106fe210b062ae31aaa

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 861fd06d4f004a678c04e57cd6205264f04554b4d95157b638ac99e0765e7283
MD5 c110ef548078bcaec0727e1b25f26de3
BLAKE2b-256 c7a0839c93c5ce748627d72ce9a262dedb6b1ac56b67372d624046ccd297bea7

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 0e38762f9777749a2eb1055032dea928eb056d5fcca7d84dc76d0c882b51f3aa
MD5 edcfc5ae7c1b81fead7a6fc87fbc7ec1
BLAKE2b-256 bece191f94a893883476af65a2087298d85ec7e4cc8318766b4a330b5444e362

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.14-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 89.8 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.14-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 0f9b7b8553ca17cb219acf1cc9cc4bdc9dc516c60bd6d058d2b83eba8944e5a2
MD5 2a95448d75a252824e5e006aa2a1fdf0
BLAKE2b-256 c4a9782157441c3a4117b39ad34674c406e33344c33940bbdf75d881e2665b9b

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9539e94e375f6d80d45e460e117417fd7128f8b48d0549cf9a2ff9f343770746
MD5 c9c15aab4e2e022fe3eaf560aca62033
BLAKE2b-256 a24382bdb48e02ba1a72bf3f919b82fb1a847309528d260d2bac229d43c61e41

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.14-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.14-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 f07d4168996e12cef6f22860ad980417d1423d788ed0f0a24bc4dee66618016c
MD5 12ee025f38c8c1b6b6fe855ab7b91906
BLAKE2b-256 e448dec6727f3cb35e88de7646a294fed4996bda5a639b878fed84e499cc1d0c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.14-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page