Skip to main content

A Shield for your LLM generated SQL Queries. It provides an application level control for securing the database from SQL generated by LLM.

Project description

LLM SQL Shield

A shield for your database to neutralize the SQL queries generated by Large Language Models (LLM).

Background

Large Language Models (LLM) have proven to be extraordinarily effective in generating SQL queries, revolutionizing SQL generation using large language models.

However, there is a significant challenge: a threat to the database. The queries generated by LLMs can access data that they aren't supposed to. No matter how constrained the prompt is, it is always possible to jailbreak. Given the nature of LLMs, they can never be controlled deterministically.

Securing the database is difficult because:

  1. Databases are usually controlled and managed by a different team. Therefore, any changes to the databases would take time.
  2. Databases are usually accessed and modified by multiple different services. Therefore, making any change to the database is difficult.
  3. Database security is designed for a handful of roles and users. Achieving security for thousands of users is difficult.
  4. Row-based security can be achieved by predefined views, but if we have to have dynamic views or parameterized views, it is not possible in databases as of now.
  5. LLMs require the table names and column names to be self-explanatory. To achieve the same in a database, you will have to clone the database and make modifications to it, but then making the data sync in these tables becomes a difficult task.

LLM SQL Shield - Features

With SQL Shield, you can:

  1. Limit the tables that you expose to LLM.
  2. Rename the tables as per your wish by setting proper pub_name.
  3. Limit the columns from each table.
  4. Rename the columns to make them more meaningful by setting proper pub_name.
  5. Limit the rows that can be accessed by providing filters.
  6. The filters can have variables that you can fill in at the time of query generation and execution.

Getting started or How does it work?

Install

It is available as the pip package. You can simply install it like this:

pip install sqlshield

Prepare the Schema

Once installed, you can import the following:

# Models contain the models for Table, Column, and Database.
from sqlshield.models import *

# The main work of handling and generating SQL is done by this
# The entry point is Session object.
from sqlshield.shield import *

The basic idea is that you create a pseudo-schema of your database keeping track of the original table name and column names. You expose this pseudo-schema to LLM to generate a query and then translate this query to internal tables.

Further, in the pseudo-schema, you create the row filters or parameterized views that basically filter rows per user.

The first step is to create the schema. Either you can create a schema by creating instances of MDatabase, MTable, and MColumn or you can simply instantiate by loading existing tables from your database using SQLAlchemy. I would recommend the second approach because it is easier and more maintainable - think of the new tables being introduced.

Let's go ahead and load the database objects using SQLAlchemy.

pip install sqlalchemy
# Using SQLAlchemy 
import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

Now, using the inspector, you can create the SQL Shield's models: MTable, MColumn, and MDatabase automatically.

mDb = MDatabase.from_inspector(inspector)

Once you have the MDatabase instance ready, you can start modifying it to ensure that only minimal information is exposed to the LLMs.

The following code removes all the tables except for four tables. This also reduces the prompt size.

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

MDatabase.keep_only_tables is a handy utility method. To suit your business needs, you directly modify the MDatabase.tables object, which is a set().

If you want to modify each table, you can access the table directly from MDatabase.tables set() object. Since it is easier to create a dictionary and then access it, we have provided get_table_dict() as shown in the below code:

# It is a handy method to get the tables as a dictionary.
# Please use the MDatabase.tables
tables = mDb.get_table_dict()

Now, say we want to modify a table with the name Customer. We can change its name to Customers. Quite often the table names in your organization could be really messy and LLMs would not be able to generate good queries on this. You can give a good pub_name to each table. pub_name is what is shown to the LLM.

customer_table = tables['Customer']
# Change Name of table
customer_table.pub_name = 'Customers'

Filters are the most powerful feature of SQL Shield. Filters let you create subspaces for users. Using filters, you can limit the rows shown to the user. Say, there is a table that contains the data for all 500 teams in your organization. You want each team to see only their data. So you would create a filter now and at runtime, you would insert the team as part of parameters.

Here in this example, we want users to access only the rows of the Customers table that belong to their company.

# Add a filter
customer_table.filters = 'where company = {company}'

You can access the columns of an MTable using the columns which is a set(). It also provides a handy method drop_columns. I will be adding more handy methods based on the user's requests.

But you can simply access all columns from the set and modify it. Each column is MColumn, it has an important field pub_name. If you want to modify the name of a column that is visible to an LLM, just change pub_name. Don't change name.

# Drop some columns
customer_table.drop_columns(set(['Address']))

That's it. Your database is ready!

Generate Schema for LLM

With this MDatabase mDb prepared in previous steps, we can generate schema to augment the prompt as follows:

schema_generated = mDb.generate_schema()

We can send this schema along with the question in the prompt to any LLMs and once LLMs generate SQL, we can re-write the SQL. Say, an LLM has generated a query aSql.

We can now, generate the safe query using the aSQL as follows:

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)

That's it. gSQL would have a query that is safe. Please note that we created a dictionary of parameters and passed it to the constructor of Session because we had parameterized filters that would limit the rows to the company = Talus.

Now, you can execute the gSQL on your actual database peacefully.

Complete Example with Open AI

Here is an example:

import sqlalchemy
from sqlalchemy import inspect
from sqlalchemy import text

from sqlshield.models import *
from sqlshield.shield import *
import os

# TODO: Specify correct OpenAI key
os.environ["OPENAI_API_KEY"] = 'sk-XXXXXXXXX'

# TODO: You can download this SQLite3 DB file: https://github.com/terno-ai/llm-sql-shield/raw/main/tests/chinook.db
# And save it in your current directory
# Connect to DB
engine = sqlalchemy.create_engine('sqlite:///chinook.db')
inspector = inspect(engine)

# Load default DB
mDb = MDatabase.from_inspector(inspector)

mDb.keep_only_tables(set(['Customer', 'Employee', 'Invoice', 'InvoiceLine']))

tables = mDb.get_table_dict()
customer_table = tables['Customer']

# Change Name of table
customer_table.pub_name = 'Customers'

# Add a filter
customer_table.filters = 'where company = {company}'

# Drop some colums
customer_table.drop_columns(set(['Address']))

# Column renaming

question = "Show me all customers."

from openai import OpenAI
client = OpenAI()

schema_generated = mDb.generate_schema()
print('The following schema was generated: ', schema_generated)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an SQL Analyst. Your role is to generate the SQL given a question. Only generate SQL nothing else."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": "The tables schema is follows: " + schema_generated},
    ]
)

aSql = response.choices[0].message.content
print('SQL Generated by LLM: ', aSql)

d = {'company':'\'Telus\''}
sess = Session(mDb, d)
gSQL = sess.generateNativeSQL(aSql)
print("Native SQL: ", gSQL)

print(" ===================== ")
with engine.connect() as con:
    rs = con.execute(text(gSQL))
    for row in rs:
        print(row)

Testing

Comment out ext_modules and cmdclass from setup.py first

pip install -e .

Usage

import sqlshield
shield = sqlshield.SQLShield(...)

from sqlshield import SQLShield
shield = SQLShield(...)

Run inside tests directory

coverage run test.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

sqlshield-0.0.12-cp312-cp312-win_amd64.whl (88.8 kB view details)

Uploaded CPython 3.12 Windows x86-64

sqlshield-0.0.12-cp312-cp312-macosx_10_13_universal2.whl (180.5 kB view details)

Uploaded CPython 3.12 macOS 10.13+ universal2 (ARM64, x86-64)

sqlshield-0.0.12-cp311-cp311-win_amd64.whl (89.5 kB view details)

Uploaded CPython 3.11 Windows x86-64

sqlshield-0.0.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (567.7 kB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.12-cp311-cp311-macosx_10_9_universal2.whl (179.6 kB view details)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.12-cp310-cp310-win_amd64.whl (88.8 kB view details)

Uploaded CPython 3.10 Windows x86-64

sqlshield-0.0.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (504.0 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.12-cp310-cp310-macosx_10_9_universal2.whl (180.1 kB view details)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.12-cp39-cp39-win_amd64.whl (89.3 kB view details)

Uploaded CPython 3.9 Windows x86-64

sqlshield-0.0.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (503.8 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.12-cp39-cp39-macosx_10_9_universal2.whl (180.6 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

sqlshield-0.0.12-cp38-cp38-win_amd64.whl (89.6 kB view details)

Uploaded CPython 3.8 Windows x86-64

sqlshield-0.0.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.8 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

sqlshield-0.0.12-cp38-cp38-macosx_11_0_universal2.whl (182.1 kB view details)

Uploaded CPython 3.8 macOS 11.0+ universal2 (ARM64, x86-64)

File details

Details for the file sqlshield-0.0.12-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 d178d3d65303aae00ece173a96f19e13a1d2d1db9aff47f354845ec6d99fc9df
MD5 ba35fa77bd9bf77dd83135b86b1571f8
BLAKE2b-256 78ba56f89cc0b2c694ab33c188090b17b90d0c9c2887fa25a955b66707afb1d9

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp312-cp312-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 a4d15bd26ff5ebac3bdede964a5475f463ccc30c64d54f79254a053df848583f
MD5 37d3999ef3556db4fa0e021fa3fd0bbb
BLAKE2b-256 1de8b22f2347d79039a0f9f84340559ac45adabd83212e979b863f393215af14

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 9128cb4ce28ab79ccf4ff5d5641b2d053ad13a26cb779f1828a67e2f4c5cba1f
MD5 d219a331793f97c1edd98d7b579b5eb0
BLAKE2b-256 1fd1f822d218964c95829e3f3951fba80bbc649d51045614a68d390e530620b7

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp311-cp311-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 263fbca7c134b0b38bf844513b9a61f1d46f2df5473f85410c3cef5571acce3c
MD5 0b72eb1340f23f14d86555156a89b9f9
BLAKE2b-256 1b5d9f5b2615d88023e7277a6ba3044ff38ca341025f9a6a1f1f4b7221bbe119

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 407539115c56332826f1bcbe1af7cce9047af080c99d4fe37baf15cf11d5bd62
MD5 b1d6a6d449d885acc2d6499a68c929ae
BLAKE2b-256 a54f9842c496f5ae2f61524550afe1f8e047c8588a0177d1b53e7a4b6f01a814

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 fe9f2305f9626bcca2d9eb86ce01a8b38baa53e5332b384112f297fda334f36e
MD5 646629c3fb0c0db27d56b02f2731d70c
BLAKE2b-256 f499c746d4162ce10b2dcdd2bf18cf9f767aaaf6a031d5f73a15ece5c97b08ba

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp310-cp310-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 064ade85cfdef6d9455dae706ca01f47674d0045436c0489b6b26ce6af7e8f32
MD5 4c5f4c4f89c3bef4d3444881a437b764
BLAKE2b-256 87c2524336eb2eed544587c3e25111df4352600a05c081f4059a35b2b6ab08f0

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 4f5a34c2a04444c4798567cb94cc8da4e8b209e760c9225c109f1b286a3d4395
MD5 bf61dc3522c5959aa00dcd6aefada61b
BLAKE2b-256 71b35ef78ba925a8dcce7316c88cc30a83c01ca4b11b8c29f084ffd086a87216

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.12-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 89.3 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for sqlshield-0.0.12-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 eec019bb9f757d08babe8afe0d443cc019ca00239838c7b258cddc7fbfc991ba
MD5 a95df525e3098da9b1b56f744962f3d2
BLAKE2b-256 5198001f2192046e7ef647ca5d400fd3cbd40bb548ad0d5017949bf05095e0dd

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp39-cp39-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 88425a43e07588ba946f50e7e3d7b2e8dfe90c6f77494f17c34efba6127baa35
MD5 ca161b0af6e99751b6e6c873602d9c33
BLAKE2b-256 99d68a875e806b2c1c971f5496052fbd3926fac09252029c245ecdda47008f79

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 68b849c573fab7c95f42d37358f4aed4c0d44b4a13b04334250c2c64dafcae9e
MD5 078ebb7b971436ea27654659714896c2
BLAKE2b-256 46f79ecebddbcab57d677cf4de183f3e99c8fe35c3c04743379552652a56f94e

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp39-cp39-macosx_10_9_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: sqlshield-0.0.12-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 89.6 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for sqlshield-0.0.12-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 99d2e36bddf8e5c6053c942babc05e0b9251152c8c08b263a0edea55692be459
MD5 a729faebbb2e97a59139ad5abde2b98b
BLAKE2b-256 e6e75ff0736c0fbcb11dddb24c00c0207f11449bf59092e892fb251d72d619db

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp38-cp38-win_amd64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c85fa6fb089beb7305787df7b8ea80ed9acf28d828f3408fad7f9ff71e831bb2
MD5 4600a48a000a845cb9c3e83cd6a80f96
BLAKE2b-256 ab7b06a57f2a6ab8a342d209243814b52712f9aac967d3c6325e89239243004e

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

File details

Details for the file sqlshield-0.0.12-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for sqlshield-0.0.12-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 cfccf56a7b048b7826ef6b0cec89eddd816465a390942d8c47b1d2d2e0b142da
MD5 b7ce9afa541a05b4d2063dd7d631798d
BLAKE2b-256 1c7bb1dac55a2c9a3ce167e14c9dda8fe5c91e7a0f2b6a169044fa5919a1087a

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlshield-0.0.12-cp38-cp38-macosx_11_0_universal2.whl:

Publisher: build.yml on terno-ai/llm-sql-shield

Attestations:

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page