Dynamic Query-Driven Table Generation with LLMs

License: MIT

SwellDB

Query any data — from LLMs, databases, or the web — using just DataFrames or SQL

Overview

SwellDB is a data system that runs SQL-based analytical queries over dynamically generated tables. These tables are synthesized in real time from a combination of sources, including:

  • Large Language Models (LLMs)
  • Existing databases
  • File formats (e.g., CSV, Parquet)
  • Web search results

Traditional systems operate under a closed-world assumption: queries run only on pre-loaded data. SwellDB instead generates tables on demand, tailored to user-defined prompts and schemas.

This bridges structured SQL querying and the flexibility of unstructured data retrieval.

Figure: SwellDB Architecture

Key Features

  • 🔄 Dynamic Table Generation
    Automatically synthesizes tables on the fly from queries and schema prompts, with no need for preloaded data.

  • 🌐 Multi-Source Integration
    Combines data from:

    • Large Language Models (LLMs)
    • Structured sources (e.g., CSV, SQL databases)
    • Unstructured sources (e.g., web pages, text files)
    • Web search results
  • 🧠 LLM-Powered Reasoning
    Uses LLMs to:

    • Generate SQL queries over datasets
    • Extract, augment, and synthesize missing information
    • Transform unstructured text into structured tables
  • 🧩 Modular & Extensible
    Easy to plug in new data sources via a clean Data Source API (structured + unstructured).

  • 🧪 Fully SQL-Compatible
    Query generated tables with standard SQL — powered by Apache DataFusion.

  • 🌍 Open-World Query Execution
    Go beyond what’s stored — SwellDB fetches or generates the missing pieces on demand.

  • ⚡ Seamless Developer Experience
    Define tables declaratively using natural language and schema annotations. Then just write SQL.
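
The Data Source API itself is not documented in this README, so the following is only a minimal sketch of how pluggable sources might be combined into one table. `DataSource`, `StaticSource`, and `build_table` are hypothetical names invented for illustration; they are not part of SwellDB.

```python
from typing import Protocol


class DataSource(Protocol):
    """Hypothetical interface: anything that can return rows for a prompt."""

    def fetch(self, prompt: str, columns: list[str]) -> list[dict[str, str]]: ...


class StaticSource:
    """Stands in for a structured source such as a CSV file or a database."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self._rows = rows

    def fetch(self, prompt: str, columns: list[str]) -> list[dict[str, str]]:
        # Return only the requested columns that this source actually has.
        return [{c: r[c] for c in columns if c in r} for r in self._rows]


def build_table(prompt, columns, key, sources):
    """Merge rows from all sources on `key`; earlier sources win on conflicts."""
    merged: dict[str, dict[str, str]] = {}
    for source in sources:
        for row in source.fetch(prompt, columns):
            target = merged.setdefault(row[key], {})
            for col, value in row.items():
                # Keep a value that an earlier source already supplied.
                target.setdefault(col, value)
    return list(merged.values())


names = StaticSource([{"state_name": "Alabama"}, {"state_name": "Alaska"}])
regions = StaticSource([
    {"state_name": "Alabama", "region": "South"},
    {"state_name": "Alaska", "region": "West"},
])
table = build_table("US states", ["state_name", "region"], "state_name", [names, regions])
print(table)
```

In this sketch the second source only fills in columns the first one left empty, which is one simple way to model "fetch or generate the missing pieces"; a real system may resolve conflicts differently.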

Use Cases — Examples

  • Populating relational databases from unstructured sources
    For example, generate tables of DSA interview questions and load them into a SQLite database.

  • Ad-hoc querying across hybrid sources
    Seamlessly blend local CSVs, remote databases, LLM completions, and web results into a unified DataFrame. See example.

  • Building completely new tables on the fly
    Dynamically generate subject-specific datasets without predefining complex ETL pipelines. See example.
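
As a rough illustration of the first use case, the sketch below loads a handful of rows into SQLite with Python's standard `sqlite3` module. The rows are hard-coded stand-ins for a generated table, and the table name and column types are assumptions made for this example.

```python
import sqlite3

# Hard-coded sample rows standing in for a generated table.
rows = [
    {"state_name": "Alabama", "region": "South"},
    {"state_name": "Alaska", "region": "West"},
    {"state_name": "Arizona", "region": "West"},
    {"state_name": "Arkansas", "region": "South"},
]

conn = sqlite3.connect(":memory:")  # use a file path to persist the database
conn.execute("CREATE TABLE us_states (state_name TEXT PRIMARY KEY, region TEXT)")
# Named placeholders map each dict key to the matching column.
conn.executemany("INSERT INTO us_states VALUES (:state_name, :region)", rows)
conn.commit()

south = conn.execute(
    "SELECT state_name FROM us_states WHERE region = 'South' ORDER BY state_name"
).fetchall()
print(south)  # [('Alabama',), ('Arkansas',)]
```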

🚀 Get Started

Pre-requisites — Obtain OpenAI and Serper API Keys

To run the examples below, you need API keys for OpenAI and Serper, which you can obtain by signing up with each service. Then set the keys as environment variables:

export OPENAI_API_KEY=your_openai_api_key
export SERPER_API_KEY=your_serper_api_key
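
Before running the examples, it can help to confirm that both variables are visible to Python. This is a small optional check using only the standard library; `missing_api_keys` is a helper name made up for this sketch.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPER_API_KEY")


def missing_api_keys(env=os.environ):
    """Return the names of required keys that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All API keys are set.")
```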

Create a table

from swelldb import SwellDB

swelldb: SwellDB = SwellDB()

table_builder = swelldb.table_builder()
table_builder.set_content("A table that contains all the US states")
table_builder.set_schema("state_name str, region str")

tbl = table_builder.build()

# Explore the table generation plan
tbl.explain()

# Create the table
table = tbl.materialize()

print(table.to_pandas())

Output (first five rows shown)

    state_name     region
0      Alabama      South
1       Alaska       West
2      Arizona       West
3     Arkansas      South
4   California       West
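
The schema string passed to `set_schema` above follows a `name type, name type` pattern. The sketch below shows one way such a string could be split into column definitions, for instance to validate it before building a table. `parse_schema` is a hypothetical helper and is not how SwellDB necessarily parses schemas internally.

```python
def parse_schema(schema: str) -> list[tuple[str, str]]:
    """Split 'name type, name type' into (name, type) pairs."""
    columns = []
    for part in schema.split(","):
        fields = part.split()
        if len(fields) != 2:
            raise ValueError(f"expected 'name type', got {part!r}")
        columns.append((fields[0], fields[1]))
    return columns


columns = parse_schema("state_name str, region str")
print(columns)  # [('state_name', 'str'), ('region', 'str')]
```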

Querying with SQL using DataFusion

import datafusion
import pyarrow.dataset as ds  # pyarrow.dataset must be imported explicitly

sc = datafusion.SessionContext()
# Register the materialized table as an in-memory Arrow dataset
sc.register_dataset("us_states", ds.dataset(table))

# Get 5 states from the South region
print(sc.sql("SELECT * FROM us_states WHERE region = 'South' LIMIT 5"))

# Count the number of states per region
print(sc.sql("SELECT COUNT(*), region FROM us_states GROUP BY region"))

Output

DataFrame()
+------------+--------+
| state_name | region |
+------------+--------+
| Alabama    | South  |
| Arkansas   | South  |
| Delaware   | South  |
| Florida    | South  |
| Georgia    | South  |
+------------+--------+
DataFrame()
+----------+-----------+
| count(*) | region    |
+----------+-----------+
| 12       | Midwest   |
| 9        | Northeast |
| 16       | South     |
| 13       | West      |
+----------+-----------+
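
Because the table is generated rather than loaded from a fixed file, a quick sanity check on aggregate results can be worthwhile. The sketch below checks that the per-region counts shown above add up to 50 states; `check_total` is an illustrative helper, not part of SwellDB.

```python
def check_total(counts, expected_total=50):
    """Return (matches, total) for a mapping of group name to row count."""
    total = sum(counts.values())
    return total == expected_total, total


# Counts taken from the GROUP BY output above.
counts = {"Midwest": 12, "Northeast": 9, "South": 16, "West": 13}
ok, total = check_total(counts)
print(ok, total)  # True 50
```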
