Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

sqlshield-0.0.13-cp312-cp312-win_amd64.whl (89.2 kB view details)

Uploaded CPython 3.12 Windows x86-64

sqlshield-0.0.13-cp312-cp312-macosx_10_13_universal2.whl (181.3 kB view details)

Uploaded CPython 3.12 macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.13-cp311-cp311-win_amd64.whl (89.5 kB view details)

Uploaded CPython 3.11 Windows x86-64

sqlshield-0.0.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (569.0 kB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.13-cp311-cp311-macosx_10_9_universal2.whl (180.5 kB view details)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.13-cp310-cp310-win_amd64.whl (89.0 kB view details)

Uploaded CPython 3.10 Windows x86-64

sqlshield-0.0.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (504.6 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.13-cp310-cp310-macosx_10_9_universal2.whl (180.9 kB view details)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.13-cp39-cp39-win_amd64.whl (89.4 kB view details)

Uploaded CPython 3.9 Windows x86-64

sqlshield-0.0.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (504.6 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.13-cp39-cp39-macosx_10_9_universal2.whl (181.5 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.13-cp38-cp38-win_amd64.whl (89.7 kB view details)

Uploaded CPython 3.8 Windows x86-64

sqlshield-0.0.13-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (512.7 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.13-cp38-cp38-macosx_11_0_universal2.whl (182.5 kB view details)

Uploaded CPython 3.8 macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.13-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 f0853c2178c5a9d51b16e2e4bde860e2d0f5916a081f06a439a2ee2226e58bed
MD5 f6d66989f16c9836300671a31ff54a79
BLAKE2b-256 1539ba3861a9bf414bc2b95192408891d67a7c51d7b0847b7ca64ee5ea55692e

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 4ed4890b81cf6734f40839384958de32a34c6ba454a99a97470057ddf2a2b08e
MD5 3a16e4234e79c6e0f34052f05fc5bf41
BLAKE2b-256 29fd015ad609a9f9bc2fc6f749de68c8b9d82373591293febf943e98de20522e

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 0b7cabdead81dc17d1e62daaf45d16050ba9b5537565b37e478c97f15d27a5ad
MD5 026646c9f1618d80b514b335816f5262
BLAKE2b-256 65bf009fd13d506f273f7cb45c328fcd262764e10820e2376f21931ed380a832

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c72c6ebe0104f5f77657ba28f7a9b9641086f1b8ba2a6393825fc4430c03453e
MD5 257a5456e1ef569a2f053fb84c1915cd
BLAKE2b-256 c8a26b51bb064b3aec5affd36259e9c6d121a33d49df12c5cc57fa4520658f48

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 9c0dce5d4119b24f75a390e1eb08aed5d8b65afe32f58d9a96c92b9e0c04fe61
MD5 a3a5771c7b5b487a99f0feb1b4c6844a
BLAKE2b-256 781bf6a60635482082b096fe00989e6966a05668210539709aad7ceb323a71e6

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 1999f0af9a99034e462a615fec461fdb2c976501073d61a662858063a2a65458
MD5 65938f481f18b9a31af7843d31dc2d14
BLAKE2b-256 f86ef958371fe59845503df29b78214c90ebe6e2c7775eba3901b60f63323899

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 859396cc6d4364a2729345221b9bf388e67b533d82af01b05c0668337a3f0ada
MD5 eafd13183bd441f0b50ae4f2dc420e05
BLAKE2b-256 5163fd98801381de218f9ad31174db5553060604c6f0404a406c039ac5db6c49

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 0155ffcbd3e378eebff1d8ef623f3546fe613022984f69645f4eb64a9130cf9e
MD5 8ac9c66108b5a638ae0fe61556e0fc14
BLAKE2b-256 bba63a7827b8c21a17fbcd9d9a22be503c549b95df256792fbdca307e1a17a75

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.13-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 89.4 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for sqlshield-0.0.13-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 8cdb9f87cf23c2f757b31dd0c5db0f31e383f5d0e90266ae69eb4dde82d95ed0
MD5 6b84447b1be8ff0bb3de6c6fbd53dd06
BLAKE2b-256 88369c4f54d63bdc2a2521d5760b690191656bc734aa329d17ff357f1f247450

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f0b233f914e3360ea33f27fbc405243c2719105646572a743e1630353c722a32
MD5 2936b7e890f49ab301623b9a87108289
BLAKE2b-256 d8d24b2317f6b00c033af602403aaf7fe16061599e8226196d53919b31b0939f

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 bd5cb85035d2e352f9a4169630467bc16353bcf278cdb852c5a814aa538cf5c1
MD5 4b41d972057c09cd9e8c96c4621665db
BLAKE2b-256 4708256388b01d7c6d0bc48436fc26f4a8e8c51f3096454f5f826ad68699ea23

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.13-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 89.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for sqlshield-0.0.13-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 c2f1d645750a23978c5ed571655d2dd8f812d6f4f2b28d98082ed80b2263bf74
MD5 afcc12eae68de50ba114c961098109e6
BLAKE2b-256 66b457b7a3d25f88e9e6f4e587a10c971bd2758859ee7e249fc0c7e96748762d

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d902fdf5786e050ed747aa702c306f3d079d0b8a6ada48d2b08b7cae0191ba0f
MD5 3d17f543eea07f7f03f491497fa97ff7
BLAKE2b-256 7ada810aaf16e8bbe3ce1cb7bfe41b3a85344c714c9cb996e7b48a68654c76b5

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.13-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.13-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 ce8b1e2aec95cca7123780b763e3a86eecb5b65ab2a42b87930f26325d5f1779
MD5 008a6d3f82a7b9b7a40409682d6b5a33
BLAKE2b-256 672d0a6718d8b39ddc1a19beee7a94f6097d0bba6044f9b8b9b0a5b52335a7c8

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.13-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page