Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

sqlshield-0.0.11-cp312-cp312-win_amd64.whl (84.7 kB view details)

Uploaded CPython 3.12 Windows x86-64

sqlshield-0.0.11-cp312-cp312-macosx_10_13_universal2.whl (170.3 kB view details)

Uploaded CPython 3.12 macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.11-cp311-cp311-win_amd64.whl (85.1 kB view details)

Uploaded CPython 3.11 Windows x86-64

sqlshield-0.0.11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (527.0 kB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.11-cp311-cp311-macosx_10_9_universal2.whl (169.3 kB view details)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.11-cp310-cp310-win_amd64.whl (84.7 kB view details)

Uploaded CPython 3.10 Windows x86-64

sqlshield-0.0.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (466.5 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.11-cp310-cp310-macosx_10_9_universal2.whl (169.8 kB view details)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.11-cp39-cp39-win_amd64.whl (85.1 kB view details)

Uploaded CPython 3.9 Windows x86-64

sqlshield-0.0.11-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (466.4 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.11-cp39-cp39-macosx_10_9_universal2.whl (170.4 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.11-cp38-cp38-win_amd64.whl (85.4 kB view details)

Uploaded CPython 3.8 Windows x86-64

sqlshield-0.0.11-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (474.7 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.11-cp38-cp38-macosx_11_0_universal2.whl (171.5 kB view details)

Uploaded CPython 3.8 macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.11-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 bb15c49c5c825bd4fccc14743772c09c4ea51eff221d84c432737e6f15df26f1
MD5 cd4e6829508a61513e62039bc07a5925
BLAKE2b-256 c06349f511bba6913368d3e7ab1eeb57e2b6d02f6fdf447fe0bdd9876c1ab677

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 468d6f156b660920e3daf033be623bbb23533e0d8022585f44158306a7b685e8
MD5 c6745c1fd16b19cfab98f0c4fa16b991
BLAKE2b-256 5032cb0b8ffcc0c0fac2afc1965c5651b7cce8c98eab481d7fd41366cb8c31e4

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 f1bbcb0d43162d58c883a462351cb2c6f70254ad43feffc6062410417de0726f
MD5 db48fc5ae8517af97affd919aa297211
BLAKE2b-256 bb61cb79e12b0882d6bcbdc86fed11663e6b21bdc611ae5a6ed5ef09cf01a1d2

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 53c3664396f2598fdd4fa8a2257ddab2e7685f0f798ec8cb538da002e8cb9d06
MD5 17ad6f0aa907a7b0fd4cc80205455f35
BLAKE2b-256 81e166788c851c56df9678af16dd65619763b5343b09660d50ce7810432c9a4a

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 47c558f2da5771cd4701b3e38d7ae6ef1c35f33bf0974e9f4b26742cacf7bf50
MD5 82206515935f2db4b1623cacb7f29196
BLAKE2b-256 2bdf070a5ff7b700390595a0b8c10cf795f1e91aa1b24ae3a6c5b96dbebd7cb0

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 9f56f3fcd3c73faa3824d270fc64c2745e454d8662f7a41dc3399bc1dd171f7d
MD5 4b0d4a63bf5f7fcc19c7eb5bdd197173
BLAKE2b-256 dd80f77a4a5169e82a962f231a2bd6f6c8d28db812442c2b3b5b7733eef6d07c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5612ec22cfe0f000d36c40bf803232e4474a0f84519b76b7387c4fba6f7269e2
MD5 be6db68749fe21166c11df850ec11571
BLAKE2b-256 69a7a1459cf5b346acc92b9d9e9aa05eccf8a18c60cda125f9d013d49da21bfe

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 143f72f4b9eef7a3281d4af17e726e46255f1b74c82f5d217ea37fa22491a41a
MD5 f1dbe203246bd0ad4f347ee7a8e76222
BLAKE2b-256 9db93f72deace18b1670f7dbda14d89968ed0056073d6006185289625ad75e91

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 1ea20fe14b1a49bf5f464b3a7ad07bd910d426926c57fd76b44cccd14aa819c5
MD5 d34bca6c3e04ab80effcac45fb736ea1
BLAKE2b-256 3912a5664b452b16149d7bb8b1507118d8cbb0017ed87248213bb7142b4e9920

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 931020393186388f39f6cc84aa43bb64668ea9dc310c87cc1fe4d9f8c684fdd4
MD5 f03c58248a1836d5ae4b715c7153088e
BLAKE2b-256 8bd3b84dce38d8491b42e3f123d4a96f3b6d9799433ae5bab7473c3113bc38b0

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 1d15b91d38f2a19d034eb8dcce3d398f075db870137af2e89d249ee5511efc3e
MD5 af8196b14265ae2f907c09c35f917418
BLAKE2b-256 71972332c8681fb571c38b3c5a95ca024662645bd1df8c5c1bc226cd3692c5c3

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 2d1af74f93ddc3263d1c4bab26650a627882c3c1766104d6d72415b8cdba4be9
MD5 c29c16095667d4600394e2fdd661f992
BLAKE2b-256 254ae5cf5baeadb827755c6e95866e387339b4928397d7a2ba38bbcb4546a366

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fb4eda62af838e3e6828ac5ead9f6560f71bb71c59b80fd1d8d58096bdd5683e
MD5 3da98c660fb27df3737873d6a3508072
BLAKE2b-256 be2c560a526f83ab6e45a23cb9bcaa0880db4e4a10ba3d35091b69bd57a7be25

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.11-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.11-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 eabe9881a2a55e783977bcd4657a6c75ee9b35cbef3301047400220b754ad10d
MD5 b646900732670e899b7e4826fc0158c9
BLAKE2b-256 dd98e2581280e2b6af7734d9ab487e8ac03b5e4b5be0c859695c081b60032981

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.11-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page