Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

pip install -e src coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

sqlshield-0.0.10-cp312-cp312-win_amd64.whl (84.2 kB view details)

Uploaded CPython 3.12 Windows x86-64

sqlshield-0.0.10-cp312-cp312-macosx_10_13_universal2.whl (169.4 kB view details)

Uploaded CPython 3.12 macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.10-cp311-cp311-win_amd64.whl (84.5 kB view details)

Uploaded CPython 3.11 Windows x86-64

sqlshield-0.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (522.7 kB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.10-cp311-cp311-macosx_10_9_universal2.whl (168.2 kB view details)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.10-cp310-cp310-win_amd64.whl (84.3 kB view details)

Uploaded CPython 3.10 Windows x86-64

sqlshield-0.0.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (464.0 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.10-cp310-cp310-macosx_10_9_universal2.whl (168.8 kB view details)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.10-cp39-cp39-win_amd64.whl (84.7 kB view details)

Uploaded CPython 3.9 Windows x86-64

sqlshield-0.0.10-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (464.0 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.10-cp39-cp39-macosx_10_9_universal2.whl (169.4 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.10-cp38-cp38-win_amd64.whl (85.0 kB view details)

Uploaded CPython 3.8 Windows x86-64

sqlshield-0.0.10-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (471.9 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.10-cp38-cp38-macosx_11_0_universal2.whl (170.5 kB view details)

Uploaded CPython 3.8 macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.10-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 5798d70aaa627790ebf9a5d260ddf310824f3260ccbe2d6fdefe827c6109b777
MD5 7a230e0957822bb0bb8ea1df5a0bfa53
BLAKE2b-256 78c9be2a99bafcfadda46d00668c94686dbf20b9fb16cdbdf4bd477a61b2e61a

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 84606745c62a85209936c1cf8c6cd9d1a612b2f06156c928c6c959c8bd9e9820
MD5 003428263b2b291869abd3a76737e270
BLAKE2b-256 b19c5923191f341e2a92aad5cf56b3ad57bbabf092c15807f8eb7abe83f36115

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 b83f44f69a912c5646ae335dee852db70809a9c900b8bfe3b8ebb227806f414d
MD5 88dab500353d73087539992466d4c94a
BLAKE2b-256 6a337944d4775df25e0b4aecb299903ae27d778d1c019d2f8f443b3cdd395177

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a7e9380be3f69866de6d32079ff9ba635d5babcfd43b9c176acce3c1fd21734d
MD5 42922255901db5be0b29741897747575
BLAKE2b-256 9f5208e7fe4dd80a8ccdaf6fa3b40331a83744a8f651f5655363d0d8cd6e8adb

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 1a82853751c8dd96d7321537035ee79b4845155642759d97944df26f65b70b0e
MD5 2d76aa05c83f8b3edddbbc4ab24b70e4
BLAKE2b-256 2cb9189547eb1b68860c69fed850343e881c11ea96716bf701cc4f4291fa3c39

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 840890fe2c138ea58ad5d569ea77a98840b7def32f1336c0887b77d9ecca5ca8
MD5 b4d9d7d528de5be6a5c324dcc8802b76
BLAKE2b-256 e5a10268749a8c3db4ceb773e86137f660c8ddc0f31e7aeb80cdc1efd151303f

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 83436d006e30b3abcb4baa95fd3ff6c3f61753325ab0cdacfed290ef3bd17959
MD5 0f27bb4790ed1c2e75ccf3b6076766ee
BLAKE2b-256 48d8b82f688c1812467b75412e92580df07eed0b60cb55288b8c49a2da0549ba

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 96a2abc8324670c8826eaa969250158b40de63ac06b44fdd10ae47488f97cdc4
MD5 6d3e629b9d7c4747688a6debc544ec51
BLAKE2b-256 8d749adab88b23e256c99ec75c0f7d222dac7e9449e67edfa6813d18ab3ddce3

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.10-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 84.7 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for sqlshield-0.0.10-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 21e691aae5f4c6772e26501023c600764cfb1e95a9c3d15cb33ffb7e3969ab7d
MD5 a8bc2d4be34d8e067a4862d1db94242c
BLAKE2b-256 6c206fc60a6f0b36c1c91f6619777849e36b39dc694c9083d9b3e938448a269b

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ca4f852719e4d6c5c0a35a52672ff16bd89aa951844f7789f3245beda6d05b28
MD5 de870c6a65e833b05cb54ef533ee5685
BLAKE2b-256 2db1d2f48a3803d44d10b5f2403a3587bf0cb32dc587863bf66c23de2ff294d8

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 6a156201991cd0110d0e7f7799d62532df218a1ceb447655b1e61ce8e1218e9c
MD5 cd4433acec7f231f584f82358863a796
BLAKE2b-256 367ae06f54af966455f2a6925bdbbfed258a966a5820fd2b24c5d82a808cc9d9

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.10-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 85.0 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for sqlshield-0.0.10-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 ab3d1fa08248c32ed30a56f031822e97777262ff1ac3cd409519a274c7605918
MD5 cb8d86f3830857dd2a819f6a74e666e2
BLAKE2b-256 02827a739a414cb227914035c97672f49c09120dbbc2b9957940fdc98a267313

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1821221fa394f508b3a33268fe0ad70da36d554922e105e7a7af8311ceb71e3a
MD5 f3f7dcfdd5409e57c9c9cfe0111c7cbf
BLAKE2b-256 259505ebc273cc85b0e2416f1e63e265dfb5c31604cb2f71896eababcc2fdec0

See more details on using hashes here.

File details

Details for the file sqlshield-0.0.10-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.10-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 39424febda806792c97f5a6fc4904348244702593954562ff21af325f92ea949
MD5 df2dccfcd80c0347dae1ec0f8f7e2380
BLAKE2b-256 a3f8d57999986e0020dacb4c402d69b8dee51514c80344c25fa344ca9b8c85e8

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page