Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sqlshield-0.0.16-cp312-cp312-win_amd64.whl (82.8 kB view details)

Uploaded CPython 3.12Windows x86-64

sqlshield-0.0.16-cp312-cp312-macosx_10_13_universal2.whl (168.6 kB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.16-cp311-cp311-win_amd64.whl (85.5 kB view details)

Uploaded CPython 3.11Windows x86-64

sqlshield-0.0.16-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (526.4 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

sqlshield-0.0.16-cp311-cp311-macosx_10_9_universal2.whl (170.4 kB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.16-cp310-cp310-win_amd64.whl (85.4 kB view details)

Uploaded CPython 3.10Windows x86-64

sqlshield-0.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (492.3 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

sqlshield-0.0.16-cp310-cp310-macosx_10_9_universal2.whl (166.7 kB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.16-cp39-cp39-win_amd64.whl (85.5 kB view details)

Uploaded CPython 3.9Windows x86-64

sqlshield-0.0.16-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (490.8 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

sqlshield-0.0.16-cp39-cp39-macosx_10_9_universal2.whl (167.4 kB view details)

Uploaded CPython 3.9macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.16-cp38-cp38-win_amd64.whl (86.8 kB view details)

Uploaded CPython 3.8Windows x86-64

sqlshield-0.0.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.5 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

sqlshield-0.0.16-cp38-cp38-macosx_11_0_universal2.whl (173.9 kB view details)

Uploaded CPython 3.8macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.16-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.16-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 82.8 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.16-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c4036c6ec84f20703135342cb84f4db55743ab4c36bd1e21e4505e834c0b41e2
MD5 8893ed94132f6ed794d0dd88911a09b0
BLAKE2b-256 1d8a61a3f8a74575b1f81160cafe80faa9b3dcea8e5d3f2abfa52935affe46a3

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 255f4dfc9664a9254e395125979c2d6d202368644c7118fe10285d3fccfba471
MD5 910c37773c39b460e5b1975f34bce892
BLAKE2b-256 359bb455d61cf771b34a05374c28d8b12941f8e7c09dea70acea3ec608f0ae16

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.16-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 85.5 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.16-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 548c2284e4b84a5a167f4481a5f0b56e6d3d2639cc8995e3c96c26501fd7f9a0
MD5 c2f6296f3dd5078e9ffd41618e57a7dd
BLAKE2b-256 a2264774ecc1c3cfc086be39ab538090356df7d16084def0bbf2dd04ac2f766c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 485ed8c80272b81499187f1fb64c834430b800fb959cf0ad9d87afdc1c637832
MD5 7e81c89edb6d71d38c926d00bb645499
BLAKE2b-256 157283372d3e67bf163ed32484383cca906f123dc8321bb20059b1e1e2990c52

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 ce10d60bea18746b7479641ee822a8f9b54292d5a3ffb9ba26f9a9a9afd27685
MD5 103f6ad0346a40b7d8eb9b64a8302774
BLAKE2b-256 e95042cd1a9a49aa203481c7092796cf5e0e557fc6eaee0c7755a958dda89d69

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.16-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 85.4 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.16-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 b2db4a7a4fef94c371ac84028b4b8fb08f166bd9a3e2b90e3fc0d33906d160ae
MD5 9af4f0ff4ac6800f229125c795ac5cae
BLAKE2b-256 1bc8409773f066069829871e46d59c0dc8d55d9e9fc7cf335dc14ff7eacfe60d

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 77af37d1694ce2da83daf8cdebc63624df9d3414165977247ceab059c74929a3
MD5 d220686a38a7a1315c91e6786bf12d78
BLAKE2b-256 d5803a28f02b330bbced636d6e36604bde68da21ebd2c669a7078eb107301147

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 edc27ebe0dd6fdee30d360967188131dd6f3553165e3c0dc690d6b17c232dfce
MD5 52e26cf660ac5830b1956156edb35568
BLAKE2b-256 1964e011165a39205e83918c2f3aab5dc0148d22466d7ea1875b435cb8e1273d

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.16-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 85.5 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.16-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 60fed5c077cc2496ae49e12dd764aa219d2226b0d2c5ce2f1f9324e5ea56243e
MD5 f9d4ddbb94ad2782befc56ba3554a8aa
BLAKE2b-256 cb6e7c3fb6b8bdcc20f8035cd6d13bc92dfc5b8601d086851c1460a24ccc0aae

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d192fa273ed0b8ab3e5e19d37c12466f78f7fd3109e187231724f4f3865e526e
MD5 c68f91b6d77f04d46ef834a43ea5d233
BLAKE2b-256 bfa55b033842059c6d424184e37d86f0f4458f2bf87ffc7544ea3c30a0039e93

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 aa91f9145f8b14642d6201fc362724c90f816758004938f10f2c24325c16ffb6
MD5 01075035cc308a38a985a7230e8ed06c
BLAKE2b-256 63a9430391674660e3c24a3bd6e698a13f5b361f0662cc50727b112fe2201583

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.16-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 86.8 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlshield-0.0.16-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 76dbedeb17b15049c583f426ef9232982bca84eaabb3502df50b969bce5ac012
MD5 d5d2647453892d02f32842352e7708ab
BLAKE2b-256 01a2abd7c642f7c648af750396c5139c86b10b75cbad93ebe7d93f5c5530183d

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 30bae87fa5917f1c9af2057e9f443a8224979aede40793fecb5fb7c18ca5df2a
MD5 e395e5a671805862041e921319733f05
BLAKE2b-256 c30f16e7d73d2bd6a7506c505dfab38fc8948459f47e514d06200442f1c9d15f

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlshield-0.0.16-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.16-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 e4f45cddb304ff754a8efc9f3bff14df24e445dd2ee7b29d3a7e6063cdc4b71f
MD5 49ce2d552dc234c1880ed7c9d28da7ee
BLAKE2b-256 bfcdf729dc736b4cc8a8500c72907cc8cb4a50b1012a46cc7ccd89a1e2d7cf48

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.16-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page