Query language to blend SQL logic and LLM reasoning across multi-modal data.


blendsql

SQL 🤝 LLMs

Check out our online documentation for a more comprehensive overview.

Results from the paper are available here


pip install blendsql

BlendSQL is a superset of SQLite for problem decomposition and hybrid question-answering with LLMs.

As a result, we can Blend together...

  • 🥤 ...operations over heterogeneous data sources (e.g. tables, text, images)
  • 🥤 ...the structured & interpretable reasoning of SQL with the generalizable reasoning of LLMs

It can be viewed as an inversion of the typical text-to-SQL paradigm, in which a user calls an LLM and the LLM calls a SQL program.

With BlendSQL, the user oversees all calls (LLM + SQL) within a unified query language.

[Figure: comparison of the text-to-SQL paradigm and BlendSQL]

For example, imagine we have the following table titled parks, containing info on national parks in the United States.

We can use BlendSQL to build a travel planning LLM chatbot to help us navigate the options below.

Name | Image | Location | Area | Recreation Visitors (2022) | Description
Death Valley | death_valley.jpeg | California, Nevada | 3,408,395.63 acres (13,793.3 km²) | 1,128,862 | Death Valley is the hottest, lowest, and driest place in the United States, with daytime temperatures that have exceeded 130 °F (54 °C).
Everglades | everglades.jpeg | Alaska | 7,523,897.45 acres (30,448.1 km²) | 9,457 | The country's northernmost park protects an expanse of pure wilderness in Alaska's Brooks Range and has no park facilities.
New River Gorge | new_river_gorge.jpeg | West Virginia | 7,021 acres (28.4 km²) | 1,593,523 | The New River Gorge is the deepest river gorge east of the Mississippi River.
Katmai | katmai.jpg | Alaska | 3,674,529.33 acres (14,870.3 km²) | 33,908 | This park on the Alaska Peninsula protects the Valley of Ten Thousand Smokes, an ash flow formed by the 1912 eruption of Novarupta.

BlendSQL allows us to ask the following questions by injecting "ingredients", which are callable functions denoted by double curly brackets ({{, }}).

Which parks don't have park facilities?

SELECT * FROM parks
    WHERE NOT {{
        LLMValidate(
            'Does this location have park facilities?',
            context=(SELECT "Name" AS "Park", "Description" FROM parks),
        )
    }}

What does the largest park in Alaska look like?

SELECT {{VQA('Describe this image.', 'parks::Image')}} FROM parks
    WHERE "Location" = 'Alaska'
    ORDER BY {{
        LLMMap(
            'Size in km2?',
            'parks::Area'
        )
    }} DESC LIMIT 1
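
To make the role of the mapping ingredient concrete, here is a minimal sketch that uses only Python's standard library. It is not BlendSQL's implementation: the `size_km2` function is a hypothetical, regex-based stand-in for the LLM call, registered as an ordinary SQLite user-defined function so that plain SQL can sort on the derived value.

```python
import re
import sqlite3
from typing import Optional

def size_km2(area: str) -> Optional[float]:
    """Hypothetical stand-in for LLMMap('Size in km2?', ...): extract the km2 figure."""
    match = re.search(r"\(([\d,.]+)\s*km2\)", area)
    return float(match.group(1).replace(",", "")) if match else None

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name size_km2 (one argument).
conn.create_function("size_km2", 1, size_km2)
conn.execute('CREATE TABLE parks ("Name" TEXT, "Location" TEXT, "Area" TEXT)')
conn.executemany(
    "INSERT INTO parks VALUES (?, ?, ?)",
    [
        ("Death Valley", "California, Nevada", "3,408,395.63 acres (13,793.3 km2)"),
        ("Everglades", "Alaska", "7,523,897.45 acres (30,448.1 km2)"),
        ("New River Gorge", "West Virginia", "7,021 acres (28.4 km2)"),
        ("Katmai", "Alaska", "3,674,529.33 acres (14,870.3 km2)"),
    ],
)

# Sort on the mapped value, largest first, and keep the top row.
largest = conn.execute(
    """
    SELECT "Name", size_km2("Area") FROM parks
    WHERE "Location" = 'Alaska'
    ORDER BY size_km2("Area") DESC LIMIT 1
    """
).fetchone()
print(largest)
```

The sort is descending because the question asks for the largest park; an ascending sort with `LIMIT 1` would return the smallest one.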

Which park protects an ash flow formed by a volcano eruption?

{{
    LLMQA(
      'Which park protects an ash flow formed by a volcano?',
      context=(SELECT "Name", "Description" FROM parks),
      options="parks::Name"
    ) 
}}
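
The `options="parks::Name"` argument reads as a `table::column` reference that limits the answer to values found in that column. The sketch below illustrates that idea under stated assumptions: `resolve_options` and `constrain` are hypothetical helpers, and the check runs after the fact on a finished answer, whereas constrained decoding restricts the model while it generates.

```python
import sqlite3
from typing import List

def resolve_options(conn: sqlite3.Connection, reference: str) -> List[str]:
    """Turn a 'table::column' reference into the distinct values of that column."""
    table, column = reference.split("::", 1)
    # Identifiers are quoted so names containing spaces still work.
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY "{column}"'
    ).fetchall()
    return [row[0] for row in rows]

def constrain(answer: str, options: List[str]) -> str:
    """Accept a raw answer only if it matches an option, ignoring case and outer spaces."""
    by_lower = {option.lower(): option for option in options}
    key = answer.strip().lower()
    if key not in by_lower:
        raise ValueError(f"{answer!r} is not one of {options}")
    return by_lower[key]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE parks ("Name" TEXT)')
conn.executemany(
    "INSERT INTO parks VALUES (?)",
    [("Death Valley",), ("Everglades",), ("New River Gorge",), ("Katmai",)],
)

options = resolve_options(conn, "parks::Name")
print(options)
print(constrain(" katmai ", options))
```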

For in-depth descriptions of the above queries, check out our documentation.
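
As a rough illustration of the `{{ ... }}` syntax, the sketch below finds ingredient calls in a query string with a regular expression. It is an assumption-laden toy, not BlendSQL's parser: `find_ingredients` is a hypothetical helper, and the pattern assumes that `}}` never appears inside an ingredient's arguments.

```python
import re
from typing import List, Tuple

# Matches {{ Name( ...args... ) }}; DOTALL lets the arguments span several lines.
INGREDIENT = re.compile(r"\{\{\s*(\w+)\s*\((.*?)\)\s*\}\}", re.DOTALL)

def find_ingredients(query: str) -> List[Tuple[str, str]]:
    """Return (ingredient name, whitespace-normalised argument text) pairs in query order."""
    return [
        (m.group(1), " ".join(m.group(2).split()))
        for m in INGREDIENT.finditer(query)
    ]

queries = [
    """SELECT {{VQA('Describe this image.', 'parks::Image')}} FROM parks
    WHERE "Location" = 'Alaska'""",
    """{{
    LLMQA(
      'Which park protects an ash flow formed by a volcano?',
      context=(SELECT "Name", "Description" FROM parks),
      options="parks::Name"
    )
}}
""",
]

found = [item for query in queries for item in find_ingredients(query)]
for name, args in found:
    print(name, "->", args)
```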

Features

  • Supports many DBMS 💾
    • Currently, SQLite and PostgreSQL are functional - more to come!
  • Easily extendable to multi-modal use cases 🖼️
  • Smart parsing optimizes what is passed to external functions 🧠
    • Traverses abstract syntax tree with sqlglot to minimize LLM function calls 🌳
  • Constrained decoding with outlines 🚀
  • LLM function caching, built on diskcache 🔑
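
Two of the features above, passing less data to external functions and caching their results, can be sketched with the standard library alone. This is an illustration of the general idea, not the library's code: `fake_llm_has_facilities` is a hypothetical keyword-based stand-in for an LLM call, and a plain dictionary stands in for a persistent cache such as diskcache.

```python
import sqlite3

calls = []   # records every simulated LLM invocation so they can be counted
cache = {}   # in-memory stand-in for a persistent cache

def fake_llm_has_facilities(description: str) -> bool:
    """Hypothetical stand-in for an LLM call; a keyword check keeps it deterministic."""
    calls.append(description)
    return "no park facilities" not in description

def cached_has_facilities(description: str) -> bool:
    """Call the expensive function only for inputs that have not been seen before."""
    if description not in cache:
        cache[description] = fake_llm_has_facilities(description)
    return cache[description]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (name TEXT, location TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO parks VALUES (?, ?, ?)",
    [
        ("Death Valley", "California, Nevada", "The hottest, lowest, and driest place in the United States."),
        ("Everglades", "Alaska", "Protects an expanse of pure wilderness and has no park facilities."),
        ("New River Gorge", "West Virginia", "The deepest river gorge east of the Mississippi River."),
        ("Katmai", "Alaska", "Protects the Valley of Ten Thousand Smokes."),
    ],
)

# Let SQL apply the cheap predicate first, so only the Alaska rows reach the function.
rows = conn.execute(
    "SELECT name, description FROM parks WHERE location = 'Alaska' ORDER BY name"
).fetchall()
without_facilities = [name for name, desc in rows if not cached_has_facilities(desc)]

# A second pass over the same rows is answered entirely from the cache.
again = [name for name, desc in rows if not cached_has_facilities(desc)]

print(without_facilities, len(calls))
```

Filtering in SQL first means the simulated LLM sees two descriptions instead of four, and the repeated pass adds no further calls.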

Quickstart

from blendsql import blend, LLMQA
from blendsql.db import SQLite
from blendsql.models import OpenaiLLM, TransformersLLM
from blendsql.utils import fetch_from_hub

blendsql = """
SELECT * FROM w
WHERE city = {{
    LLMQA(
        'Which city is located 120 miles west of Sydney?',
        (SELECT * FROM documents WHERE documents MATCH 'sydney OR 120'),
        options='w::city'
    )
}}
"""
# Make our smoothie - the executed BlendSQL script
smoothie = blend(
    query=blendsql,
    db=SQLite(fetch_from_hub("1884_New_Zealand_rugby_union_tour_of_New_South_Wales_1.db")),
    blender=OpenaiLLM("gpt-3.5-turbo"),
    # If you don't have OpenAI setup, you can use this small Transformers model below instead
    # blender=TransformersLLM("Qwen/Qwen1.5-0.5B"),
    ingredients={LLMQA},
    verbose=True
)
print(smoothie.df)
print(smoothie.meta.prompts)

Citation

@article{glenn2024blendsql,
      title={BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra},
      author={Parker Glenn and Parag Pravin Dakle and Liang Wang and Preethi Raghavan},
      year={2024},
      eprint={2402.17882},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgements

Special thanks to those below for inspiring this project. Definitely recommend checking out the linked work below, and citing when applicable!
