Dynamic Query-Driven Table Generation with LLMs

SwellDB

Query any data — from LLMs, databases, or the web — using just DataFrames or SQL

Overview

SwellDB is a new kind of data system that enables SQL-based analytical querying over dynamically generated tables. These tables are synthesized in real time from a combination of sources, including:

  • Large Language Models (LLMs)
  • Existing databases
  • File formats (e.g., CSV, Parquet)
  • Web search results

Traditional systems operate under a closed-world assumption: queries run only on preloaded data. SwellDB instead generates tables on demand, tailored to user-defined prompts and schemas.

This bridges structured SQL querying with the flexibility of unstructured data retrieval.
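
The contrast with a closed-world system can be sketched in a few lines. This is an illustration only, not SwellDB's implementation: `OpenWorldCatalog`, `fake_generator`, and the `generate` callback are hypothetical names, and the callback stands in for whatever an LLM, database, or web search would return.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class OpenWorldCatalog:
    """Toy catalog: returns a stored table, or generates it on first use."""

    def __init__(self, generate: Callable[[str], List[Row]]) -> None:
        self._tables: Dict[str, List[Row]] = {}
        self._generate = generate  # hypothetical stand-in for an LLM or web lookup

    def get(self, prompt: str) -> List[Row]:
        # A closed-world system would fail here if the table were missing;
        # this sketch generates it instead and caches the result.
        if prompt not in self._tables:
            self._tables[prompt] = self._generate(prompt)
        return self._tables[prompt]


def fake_generator(prompt: str) -> List[Row]:
    # Hard-coded row so the sketch runs without any external service.
    return [{"state_name": "Alabama", "region": "South"}]


catalog = OpenWorldCatalog(fake_generator)
rows = catalog.get("A table that contains all the US states")
```

The second call with the same prompt returns the cached rows, which is one simple way to avoid regenerating a table for every query.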

Figure: SwellDB Architecture

Key Features

  • 🔄 Dynamic Table Generation
    Automatically synthesizes tables on the fly from queries and schema prompts, with no need for preloaded data.

  • 🌐 Multi-Source Integration
    Combines data from:

    • Large Language Models (LLMs)
    • Structured sources (e.g., CSV, SQL databases)
    • Unstructured sources (e.g., web pages, text files)
    • Web search results
  • 🧠 LLM-Powered Reasoning
    Uses LLMs to:

    • Generate SQL queries over datasets
    • Extract, augment, and synthesize missing information
    • Transform unstructured text into structured tables
  • 🧩 Modular & Extensible
    Easy to plug in new data sources via a clean Data Source API (structured + unstructured).

  • 🧪 Fully SQL-Compatible
    Query generated tables with standard SQL — powered by Apache DataFusion.

  • 🌍 Open-World Query Execution
    Go beyond what’s stored — SwellDB fetches or generates the missing pieces on demand.

  • ⚡ Seamless Developer Experience
    Define tables declaratively using natural language and schema annotations. Then just write SQL.
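
As a rough illustration of the multi-source idea above, the sketch below reads rows from a CSV source and fills an empty column from a second source. It assumes nothing about SwellDB's Data Source API: `fill_missing` and `fake_lookup` are hypothetical helpers, and `fake_lookup` stands in for an LLM or web search call.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]


def fill_missing(rows: List[Row], column: str, lookup: Callable[[Row], str]) -> List[Row]:
    """Return copies of `rows` with empty `column` values filled by `lookup`."""
    filled = []
    for row in rows:
        new_row = dict(row)
        if not new_row.get(column):
            new_row[column] = lookup(new_row)
        filled.append(new_row)
    return filled


# Structured source: a small CSV in which one region is missing.
CSV_TEXT = "state_name,region\nAlabama,South\nAlaska,\n"
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def fake_lookup(row: Row) -> str:
    # Hypothetical stand-in for an LLM or web search lookup.
    return {"Alaska": "West"}.get(row["state_name"], "")


completed = fill_missing(rows, "region", fake_lookup)
```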

Use Cases — Examples

  • Populating relational databases from unstructured sources
    Generate the tables of a relational database from unstructured content, for example a SQLite database of DSA (data structures and algorithms) interview questions.

  • Ad-hoc querying across hybrid sources
    Seamlessly blend local CSVs, remote databases, LLM completions, and web results into a unified DataFrame. See example.

  • Building completely new tables on the fly
    Dynamically generate subject-specific datasets without predefining complex ETL pipelines. See example.
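
The first use case can be sketched with Python's built-in `sqlite3` module. The rows here are hard-coded for illustration; with SwellDB they could plausibly come from a materialized table, for instance via `table.to_pandas().to_dict("records")`, which is an assumption about usage rather than a documented workflow.

```python
import sqlite3

# Illustrative rows; in practice these would come from a generated table.
rows = [
    {"state_name": "Alabama", "region": "South"},
    {"state_name": "Alaska", "region": "West"},
    {"state_name": "Arkansas", "region": "South"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us_states (state_name TEXT, region TEXT)")
# Named placeholders let each dict map directly onto the table columns.
conn.executemany("INSERT INTO us_states VALUES (:state_name, :region)", rows)
conn.commit()

counts = conn.execute(
    "SELECT region, COUNT(*) FROM us_states GROUP BY region ORDER BY region"
).fetchall()
```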

🚀 Get Started

Install SwellDB

pip install swelldb

Obtain OpenAI API Key

To run the following example, you need an OpenAI API key. You can sign up for OpenAI here. Then set the API key as an environment variable:

export OPENAI_API_KEY=your_openai_api_key
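
It can help to fail early with a clear message when the variable is missing, rather than partway through table generation. The helper below is a small sketch; `require_api_key` is a hypothetical name and not part of SwellDB.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from `env`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the example")
    return key
```

Calling `require_api_key()` at the top of a script checks the real environment; passing a dict makes the check easy to test.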

Create a table

from swelldb import SwellDB, OpenAILLM

swelldb: SwellDB = SwellDB(OpenAILLM(model="gpt-4o"))

table_builder = swelldb.table_builder()
table_builder.set_content("A table that contains all the US states")
table_builder.set_schema("state_name str, region str")

tbl = table_builder.build()

# Explore the table generation plan
tbl.explain()

# Create the table
table = tbl.materialize()

print(table.to_pandas())

Output (first five rows)

    state_name     region
0      Alabama      South
1       Alaska       West
2      Arizona       West
3     Arkansas      South
4   California       West
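
The schema passed to `set_schema` is a comma-separated list of `name type` pairs. How SwellDB parses that string internally is not described here; the sketch below only shows one way to read the same format, using a hypothetical `parse_schema` helper and an assumed set of type names.

```python
from typing import List, Tuple

# Assumed type names; only "str" appears in the example above.
_TYPES = {"str": str, "int": int, "float": float, "bool": bool}


def parse_schema(spec: str) -> List[Tuple[str, type]]:
    """Parse 'name type, name type' into a list of (column name, Python type)."""
    columns = []
    for part in spec.split(","):
        name, type_name = part.split()
        if type_name not in _TYPES:
            raise ValueError(f"unknown type {type_name!r} for column {name!r}")
        columns.append((name, _TYPES[type_name]))
    return columns


columns = parse_schema("state_name str, region str")
```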

Querying with SQL using DataFusion

import datafusion
import pyarrow.dataset as ds

sc = datafusion.SessionContext()
# Wrap the materialized table as an Arrow dataset so DataFusion can query it
sc.register_dataset("us_states", ds.dataset(table))

# Get 5 states from the South region
print(sc.sql("SELECT * FROM us_states WHERE region = 'South' LIMIT 5"))

# Count the number of states per region
print(sc.sql("SELECT COUNT(*), region FROM us_states GROUP BY region"))

Output

DataFrame()
+------------+--------+
| state_name | region |
+------------+--------+
| Alabama    | South  |
| Arkansas   | South  |
| Delaware   | South  |
| Florida    | South  |
| Georgia    | South  |
+------------+--------+
DataFrame()
+----------+-----------+
| count(*) | region    |
+----------+-----------+
| 12       | Midwest   |
| 9        | Northeast |
| 16       | South     |
| 13       | West      |
+----------+-----------+
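
Because the table is generated by an LLM, rows can be missing or duplicated, so a quick sanity check on aggregates is worthwhile. The sketch below checks that the per-region counts from the output above add up to the 50 US states; `check_state_counts` is a hypothetical helper.

```python
from typing import Dict

# Counts copied from the query output above.
region_counts: Dict[str, int] = {
    "Midwest": 12,
    "Northeast": 9,
    "South": 16,
    "West": 13,
}


def check_state_counts(counts: Dict[str, int], expected_total: int = 50) -> int:
    """Return the total row count, raising if it differs from `expected_total`."""
    total = sum(counts.values())
    if total != expected_total:
        raise ValueError(f"expected {expected_total} states, got {total}")
    return total


total = check_state_counts(region_counts)
```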
