Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlshield-0.0.17-cp312-cp312-win_amd64.whl (82.8 kB view details)

Uploaded CPython 3.12Windows x86-64

sqlshield-0.0.17-cp312-cp312-macosx_10_13_universal2.whl (168.6 kB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.17-cp311-cp311-win_amd64.whl (85.5 kB view details)

Uploaded CPython 3.11Windows x86-64

sqlshield-0.0.17-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (526.2 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

sqlshield-0.0.17-cp311-cp311-macosx_10_9_universal2.whl (170.4 kB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.17-cp310-cp310-win_amd64.whl (85.4 kB view details)

Uploaded CPython 3.10Windows x86-64

sqlshield-0.0.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (492.2 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

sqlshield-0.0.17-cp310-cp310-macosx_10_9_universal2.whl (166.7 kB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.17-cp39-cp39-win_amd64.whl (85.5 kB view details)

Uploaded CPython 3.9Windows x86-64

sqlshield-0.0.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (490.7 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

sqlshield-0.0.17-cp39-cp39-macosx_10_9_universal2.whl (167.4 kB view details)

Uploaded CPython 3.9macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.17-cp38-cp38-win_amd64.whl (86.8 kB view details)

Uploaded CPython 3.8Windows x86-64

sqlshield-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.4 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

sqlshield-0.0.17-cp38-cp38-macosx_11_0_universal2.whl (173.9 kB view details)

Uploaded CPython 3.8macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.17-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.17-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 82.8 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.17-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 4ec7350a34e43138b8ee2b76c4a187fab297e6500f9a34b2586e5005a7da512d
MD5 3ad2a80931c34b649e08fbe796c65df9
BLAKE2b-256 3f3db9cd3d9fdafa1bec6a2cc3f91d6ee0842a00020049ab4bf34d30f5355a3e

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 f84bbfa2062abd80718d0dbd750b1647df8fb13d8ea14a962f5458f08e345f38
MD5 a14cd60ac407576a7288bfd7b500f750
BLAKE2b-256 9a916c2e7efc6bd840d0d845c7bf1e2b9aa52c5f5d1667303e6d3e9695aa98ea

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.17-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 85.5 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.17-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 7549d9fed973aab54c4998709d412bd4b9fe4629562249d215552ab5767370e6
MD5 7afb60f2bb1411af97e088491a535aa2
BLAKE2b-256 951b9884f0fc64a2b9358cf98b18901d5de0ac61d98653d52ffe9a49baef695c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ec00e21fb46ed79e5ca01e0c3a17aff14c40dd4d2c455bb438b0951b785c8805
MD5 2a46bca82beb299a605758bb9b2749ea
BLAKE2b-256 256dd77972c39fa3da08940068ab70fc3e413723a00652184d1ec90bcf7ae896

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 d9feab9eaab6a6e5cf0e57ff1be98bd7d1839a4d50ab8b1ab00a28f86f4a69f7
MD5 114a1d10ba704265ace67f5bc65b8e7d
BLAKE2b-256 645f4bf23c21bd3fefea82a719dc9d219209724aedd962f72757344cbed70635

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.17-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 85.4 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.17-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 25fee0e6ee59151e02474ccc5e15cde7d8b515a045165e1b88a870590dbf269a
MD5 615b56f94c54cbb84d5a589019acd05a
BLAKE2b-256 030ed26cc9627f1653ab8a11d73a3735f8ac972148e53280536246bf402b5a71

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ac9d87e66cf73c609d10c84867964b4edb7c479ca8e1d4a29919ad999b84f242
MD5 2e0deb17525032eace1a057befb29f23
BLAKE2b-256 a2234185c7faa9bdc3475dec2152ac30594c76710f740cc5cef2799588c98595

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 bc0136985b78f04d536e5dbf29aae8609f33e4e869ce4f8a6c464a787d1938c3
MD5 34c80ec32f196a4b198e0d04263df439
BLAKE2b-256 afc29635be789858fd9e2366b0c4290ef8faca58c29bf0ef8896445ff09cb0bb

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.17-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 85.5 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.17-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 abc76eb78f06cae9d6ff867ff42c5b87294594fa80406f0922b9c48fc78abb0e
MD5 50c372ee5297d4220bfa87a641620cff
BLAKE2b-256 4135fa18e78e9129355806ab7340ebfbd22e333f3896820d3206d0cc9b8db679

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f041107d79047c96c966cbaa4cf4f4e3e6385de1428c490df3d2a0c68eb71307
MD5 157256232c5e389ea5a64841d49c2a4c
BLAKE2b-256 7388eb04243f74af1bcc113442a76dfa14bb7c3a3c17b5d7ac6887e86bfbce30

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 a5006d4683e2cdfbe80548eefc1c56b9c9118925ad345841a2a4e4058d5e1767
MD5 3ade66cef1a4d337a90477f2db2ef938
BLAKE2b-256 be705f5d47f773bd416517752f8390357e855b95f511102d037f1aa223885437

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.17-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 86.8 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.17-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 3d30b0d7b1b51a1db6eb94319d25fb39c10c439870ec4b78fa1ce6f03c477579
MD5 825eb3ad51d586c9543c48e44f655c65
BLAKE2b-256 38b8b884924df99e624163a80dd740d28fa67e739e09332ef97e8ac10c61ff7d

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 cd1c48d19d2f70201c092cf92fb10bf5efbe79c0bed21103b70f1d48bd44e75e
MD5 130c0145b47940dc7036642516734538
BLAKE2b-256 0d0362a7fae1c7cf1a6ee29f690fcd0e425b49324ed270d84e5654438d5b9bf2

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.17-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.17-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 9bc65fa5adee78f3f99688af605cd2b44a0fb34987a6d700be73acb0cae1bd19
MD5 d71bbbeb37ce4504c7216fa166fe8c88
BLAKE2b-256 19c4559b4ba07c990a39f1dd4c383a51a85439eda99cba71510d6e0778e94b60

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.17-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page