Dynamic Query-Driven Table Generation with LLMs

License: MIT

SwellDB

Query any data — from LLMs, databases, or the web — using just DataFrames or SQL

Overview

SwellDB is a new kind of data system that enables SQL-based analytical querying over dynamically generated tables. These tables are synthesized in real time from a combination of sources, including:

  • Large Language Models (LLMs)
  • Existing databases
  • File formats (e.g., CSV, Parquet)
  • Web search results

Unlike traditional systems, which operate under a closed-world assumption (queries run only on pre-loaded data), SwellDB generates tables on demand, tailored to user-defined prompts and schemas.

This bridges structured SQL querying with the flexibility of unstructured data retrieval.

Figure: SwellDB Architecture

Key Features

  • 🔄 Dynamic Table Generation
    Automatically synthesizes tables on-the-fly from queries and schema prompts — no need for preloaded data.

  • 🌐 Multi-Source Integration
    Combines data from:

    • Large Language Models (LLMs)
    • Structured sources (e.g., CSV, SQL databases)
    • Unstructured sources (e.g., web pages, text files)
    • Web search results
  • 🧠 LLM-Powered Reasoning
    Uses LLMs to:

    • Generate SQL queries over datasets
    • Extract, augment, and synthesize missing information
    • Transform unstructured text into structured tables
  • 🧩 Modular & Extensible
    Easy to plug in new data sources via a clean Data Source API (structured + unstructured).

  • 🧪 Fully SQL-Compatible
    Query generated tables with standard SQL — powered by Apache DataFusion.

  • 🌍 Open-World Query Execution
    Go beyond what’s stored — SwellDB fetches or generates the missing pieces on demand.

  • ⚡ Seamless Developer Experience
    Define tables declaratively using natural language and schema annotations. Then just write SQL.

Use Cases — Examples

  • Populating relational databases from unstructured sources
    Generate tables for a relational database with DSA interview questions in SQLite.

  • Ad-hoc querying across hybrid sources
    Seamlessly blend local CSVs, remote databases, LLM completions, and web results into a unified DataFrame. See example.

  • Building completely new tables on-the-fly
    Dynamically generate subject-specific datasets without predefining complex ETL pipelines. See example.

🚀 Get Started

Install SwellDB

pip install swelldb

Obtain OpenAI API Key

To run the following example, you need an OpenAI API key. After signing up for OpenAI and creating a key, set it as an environment variable:

export OPENAI_API_KEY=your_openai_api_key
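
A missing key is easier to diagnose if the script checks for it before any table is built. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of SwellDB.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical guard: fail early with a readable message.
        raise RuntimeError(f"{name} is not set; export it before creating a SwellDB table")
    return value
```

Calling `require_api_key()` at the top of a script surfaces the problem before any LLM request is attempted.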

Create a table

from swelldb import SwellDB

swelldb: SwellDB = SwellDB()

table_builder = swelldb.table_builder()
table_builder.set_content("A table that contains all the US states")
table_builder.set_schema("state_name str, region str")

tbl = table_builder.build()

# Explore the table generation plan
tbl.explain()

# Create the table
table = tbl.materialize()

print(table.to_pandas())

Output

    state_name     region
0      Alabama      South
1       Alaska       West
2      Arizona       West
3     Arkansas      South
4   California       West
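
The schema passed to `set_schema` in the example above is a comma-separated list of `name type` pairs. When the columns are already held in a dict, the string can be assembled rather than typed by hand. This is a small sketch; `schema_string` is a hypothetical helper, not a SwellDB function, and only the `str` type name is confirmed by the example above.

```python
def schema_string(columns: dict[str, str]) -> str:
    """Join column names and type names into 'name type, name type' form."""
    return ", ".join(f"{name} {type_name}" for name, type_name in columns.items())


# Same schema as the example above, written as a dict.
columns = {"state_name": "str", "region": "str"}
print(schema_string(columns))  # state_name str, region str
```

The result can then be passed along as `table_builder.set_schema(schema_string(columns))`.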

Querying with SQL using DataFusion

import datafusion
import pyarrow.dataset as ds  # pyarrow.dataset must be imported explicitly

# Register the materialized table with DataFusion under the name "us_states"
sc = datafusion.SessionContext()
sc.register_dataset("us_states", ds.dataset(table))

# Get 5 states from the South region
print(sc.sql("SELECT * FROM us_states WHERE region = 'South' LIMIT 5"))

# Count the number of states per region
print(sc.sql("SELECT COUNT(*), region FROM us_states GROUP BY region"))

Output

DataFrame()
+------------+--------+
| state_name | region |
+------------+--------+
| Alabama    | South  |
| Arkansas   | South  |
| Delaware   | South  |
| Florida    | South  |
| Georgia    | South  |
+------------+--------+
DataFrame()
+----------+-----------+
| count(*) | region    |
+----------+-----------+
| 12       | Midwest   |
| 9        | Northeast |
| 16       | South     |
| 13       | West      |
+----------+-----------+
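
Populating a relational database is one of the use cases listed earlier. The sketch below shows one way a generated table could be persisted to SQLite with Python's standard `sqlite3` module. It makes several assumptions: the rows are hard-coded from the first sample output above instead of being fetched from SwellDB, the database is in-memory, and the function name `save_states` is hypothetical.

```python
import sqlite3

# Stand-in rows copied from the sample output above; in practice they would
# come from the materialized table (for example via table.to_pandas()).
rows = [
    ("Alabama", "South"),
    ("Alaska", "West"),
    ("Arizona", "West"),
    ("Arkansas", "South"),
    ("California", "West"),
]


def save_states(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Create the us_states table if needed and upsert the given rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS us_states ("
        "state_name TEXT PRIMARY KEY, region TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO us_states (state_name, region) VALUES (?, ?)",
        rows,
    )
    conn.commit()


# Assumption: an in-memory database; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
save_states(conn, rows)

query = "SELECT region, COUNT(*) FROM us_states GROUP BY region ORDER BY region"
for region, count in conn.execute(query):
    print(region, count)
# South 2
# West 3
```

Using `INSERT OR REPLACE` with a primary key on `state_name` keeps the load idempotent, so regenerating the table and saving it again does not duplicate rows.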
