Construct trusted SQL queries from untrusted input
Project description
HeimdaLLM
Heimdall, the watchman of the gods, dwelt at its entrance, where he guarded Bifrost, the shimmering path connecting the realms.
HeimdaLLM safely bridges the gap between untrusted human input and trusted machine-readable output by augmenting LLMs with a robust validation framework. This enables you externalize LLM technology to your users, so that you can do things like execute trusted SQL queries from their untrusted input.
To accomplish this, HeimdaLLM introduces a new technology, the 🌈✨ Bifrost, composed of 4 parts: an LLM prompt envelope, an LLM integration, a grammar, and a constraint validator. These 4 components operate as a single unit—a Bifrost—which is capable of translating untrusted human input into trusted machine output.
✨ This allows you to perform magic ✨
Imagine giving your users natural language access to their data in your database, without having to worry about dangerous queries. This is an actual query on the Sakila Sample Database:
traverse("Show me the movies I rented the longest, and the number of days I had them for.")
✅ Ensuring SELECT statement...
✅ Resolving column and table aliases...
✅ Allowlisting selectable columns...
✅ Removing 4 forbidden columns...
✅ Ensuring correct row LIMIT exists...
✅ Lowering row LIMIT to 5...
✅ Checking JOINed tables and conditions...
✅ Checking required WHERE conditions...
✅ Ensuring query is constrained to requester's identity...
✅ Allowlisting SQL functions...
Title | Rental Date | Return Date | Rental Days |
---|---|---|---|
OUTLAW HANKY | 2005-08-19 05:48:12.000 | 2005-08-28 10:10:12.000 | 9.181944 |
BOULEVARD MOB | 2005-08-19 07:06:51.000 | 2005-08-28 10:35:51.000 | 9.145139 |
MINDS TRUMAN | 2005-08-02 17:42:49.000 | 2005-08-11 18:14:49.000 | 9.022222 |
AMERICAN CIRCUS | 2005-07-12 16:37:55.000 | 2005-07-21 16:04:55.000 | 8.977083 |
LADY STAGE | 2005-07-28 10:07:04.000 | 2005-08-06 08:16:04.000 | 8.922917 |
You can safely run this example here:
or view the read-only notebook
📋 Explanation
So, what is actually happening above?
- Unsafe free-form input is provided, presumably from some front end user interface.
- That unsafe input is wrapped in a prompt envelope, producing a prompt with additional context to help an LLM produce a correct query.
- The unsafe prompt is sent to an LLM of your choice, which then produces an unsafe SQL query.
- The LLM response is parsed by a strict grammar which defines only the SQL features that are allowed.
- If parsing succeeds, we know at the very least we're dealing with a valid SQL query albeit an untrusted one.
- Different features of the parsed query are extracted for validation.
- A soft validation pass is performed on the extracted features, and we potentially
modify the query to be compliant, for example, to add a
LIMIT
clause, or to remove disallowed columns. - A hard validation pass is performed with your custom constraints to ensure that the query is only accessing allowed tables, columns, and functions, while containing required conditions.
- If validation succeeds, the resulting SQL query can then be sent to the database.
- If validation fails, you'll see a helpful exception explaining exactly why.
🥽 Safety
I am in the process of organizing an independent security audit of HeimdaLLM. Until this audit is complete, I do not recommend using HeimdaLLM against any production system without a careful risk assessment. These audits are self-funded, so if you will get value from the confidence that they bring, consider sponsoring me or inquire about interest in a commercial license.
To understand some of the potential vulnerabilities, take a look at the attack surface to see the risks and the mitigations.
📚 Database support
- Sqlite
- MySQL
There is active development for the other top relational SQL databases. To help me prioritize, please vote on which database you would like to see supported:
📜 License
HeimdaLLM is dual-licensed for open-source or for commercial use.
🤝 Open-source license
The open-source license is AGPLv3, which permits free usage, modification, and distribution, and is appropriate for individual or open-source usage. For commercial usage, AGPLv3 has key obligations that your organization may want to avoid:
-
Source Code Disclosure: Any changes you make and use over a network must be made publicly available, potentially revealing your proprietary modifications.
-
Copyleft Clause: If HeimdaLLM is integrated into your application, the whole application may need to adhere to AGPLv3 terms, including code disclosure of your application.
-
Service Providers: If you use HeimdaLLM to provide services, your clients also need to abide by AGPLv3, complicating contracts.
📈 Commercial license
The commercial license eliminates the above restrictions, providing flexibility and protection for your business operations. This commercial license is recommended for commercial use. Please inquire about a commerical license here:
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for heimdallm-0.2.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7b11d06f6851479a2a0805dca7160d0b1cfc9c499e17940ed9f08ec71e28a2d0 |
|
MD5 | 862f16c18098f4c13b2f947fd44674f1 |
|
BLAKE2b-256 | 60b3523ccbba81b569003155d33a20b6ac34f6a1fd7736e85002910b07d138fa |