Dynamic Query-Driven Table Generation with LLMs
SwellDB
Query any data — from LLMs, databases, or the web — using just DataFrames or SQL
Overview
SwellDB is a new kind of data system that enables SQL-based analytical querying over dynamically generated tables. These tables are synthesized in real time from a combination of sources, including:
- Large Language Models (LLMs)
- Existing databases
- File formats (e.g., CSV, Parquet)
- Web search results
Unlike traditional systems, which operate under a closed-world assumption (queries run only on pre-loaded data), SwellDB generates tables on demand, tailored to user-defined prompts and schemas.
This bridges structured SQL querying with the flexibility of unstructured data retrieval.
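The contrast between a closed-world lookup and on-demand generation can be sketched in a few lines. This is a conceptual illustration only, not SwellDB's implementation: `OpenWorldCatalog` and `fake_generator` are hypothetical names, and the generator is a hard-coded stub standing in for an LLM, database, or web lookup.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class OpenWorldCatalog:
    """Toy catalog: returns a stored table, or generates one on first request."""

    def __init__(self, generator: Callable[[str], List[Row]]) -> None:
        self._tables: Dict[str, List[Row]] = {}
        self._generator = generator

    def get(self, description: str) -> List[Row]:
        # A closed-world system would fail here when the table is absent;
        # this toy version generates and caches it instead.
        if description not in self._tables:
            self._tables[description] = self._generator(description)
        return self._tables[description]


def fake_generator(description: str) -> List[Row]:
    # Hypothetical stub: a real system would call an LLM, database, or search API.
    return [{"description": description, "source": "generated"}]


catalog = OpenWorldCatalog(fake_generator)
rows = catalog.get("US states")
```

A second call with the same description returns the cached rows rather than generating them again.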
Figure: SwellDB Architecture
Key Features
- 🔄 Dynamic Table Generation
  Automatically synthesizes tables on the fly from queries and schema prompts, with no need for preloaded data.
- 🌐 Multi-Source Integration
  Combines data from:
  - Large Language Models (LLMs)
  - Structured sources (e.g., CSV, SQL databases)
  - Unstructured sources (e.g., web pages, text files)
  - Web search results
- 🧠 LLM-Powered Reasoning
  Uses LLMs to:
  - Generate SQL queries over datasets
  - Extract, augment, and synthesize missing information
  - Transform unstructured text into structured tables
- 🧩 Modular & Extensible
  Easy to plug in new data sources via a clean Data Source API (structured + unstructured).
- 🧪 Fully SQL-Compatible
  Query generated tables with standard SQL, powered by Apache DataFusion.
- 🌍 Open-World Query Execution
  Go beyond what's stored: SwellDB fetches or generates the missing pieces on demand.
- ⚡ Seamless Developer Experience
  Define tables declaratively using natural language and schema annotations. Then just write SQL.
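The multi-source idea, structured rows augmented with values from a second source, can be illustrated with the standard library alone. In this sketch the CSV text, the `abbreviation` column, and `lookup_region` are assumptions made for illustration; `lookup_region` is a hard-coded stand-in for an LLM call and is not part of SwellDB.

```python
import csv
import io
from typing import Dict, List

# Structured source: a small in-memory CSV (illustrative data).
CSV_TEXT = "state_name,abbreviation\nAlabama,AL\nAlaska,AK\n"

# Stand-in for an LLM lookup; regions match the quickstart output in this README.
_REGIONS = {"Alabama": "South", "Alaska": "West"}


def lookup_region(state_name: str) -> str:
    """Hypothetical second source: returns a region, or 'unknown' if missing."""
    return _REGIONS.get(state_name, "unknown")


def augment(csv_text: str) -> List[Dict[str, str]]:
    """Read CSV rows and add a 'region' column from the second source."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["region"] = lookup_region(row["state_name"])
    return rows


augmented = augment(CSV_TEXT)
```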
Use Cases — Examples
- Populating relational databases from unstructured sources
  Generate tables for a relational database with DSA interview questions in SQLite.
- Ad-hoc querying across hybrid sources
  Seamlessly blend local CSVs, remote databases, LLM completions, and web results into a unified DataFrame. See example.
- Building completely new tables on-the-fly
  Dynamically generate subject-specific datasets without predefining complex ETL pipelines. See example.
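For the first use case, once rows are available, loading them into SQLite needs only the standard library. The sketch below assumes the rows already exist as Python tuples (here hard-coded to match the quickstart output in this README); the table name `us_states` and the in-memory database are illustrative choices, not something SwellDB prescribes.

```python
import sqlite3

# Illustrative rows; in practice these would come from a generated table.
rows = [
    ("Alabama", "South"),
    ("Alaska", "West"),
    ("Arizona", "West"),
    ("Arkansas", "South"),
    ("California", "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us_states (state_name TEXT, region TEXT)")
# Parameterized inserts avoid quoting problems in generated text.
conn.executemany("INSERT INTO us_states VALUES (?, ?)", rows)
conn.commit()

south = [
    name
    for (name,) in conn.execute(
        "SELECT state_name FROM us_states WHERE region = 'South' ORDER BY state_name"
    )
]
```

If the rows are held in a pandas DataFrame instead, `DataFrame.to_sql` offers a similar path.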
🚀 Get Started
Install SwellDB
pip install swelldb
Obtain OpenAI API Key
To run the following example, you need an OpenAI API key, which you can get by signing up with OpenAI. Then set the key as an environment variable:
export OPENAI_API_KEY=your_openai_api_key
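A script can fail early with a readable message when the key is missing. This is a small convenience sketch; `require_api_key` is a hypothetical helper and not part of SwellDB.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named environment variable or raise a readable error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_api_key()` at the top of a script surfaces a missing key before any table generation starts.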
Create a table
from swelldb import SwellDB, OpenAILLM
swelldb: SwellDB = SwellDB(OpenAILLM(model="gpt-4o"))
table_builder = swelldb.table_builder()
table_builder.set_content("A table that contains all the US states")
table_builder.set_schema("state_name str, region str")
tbl = table_builder.build()
# Explore the table generation plan
tbl.explain()
# Create the table
table = tbl.materialize()
print(table.to_pandas())
Output
state_name region
0 Alabama South
1 Alaska West
2 Arizona West
3 Arkansas South
4 California West
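The schema string passed to `set_schema` reads as a comma-separated list of `name type` pairs. How SwellDB parses it internally is not shown here; the sketch below is one plausible reading, and both `parse_schema` and the type mapping are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Assumed mapping from schema type names to Python types (illustrative only).
_TYPES: Dict[str, type] = {"str": str, "int": int, "float": float, "bool": bool}


def parse_schema(schema: str) -> List[Tuple[str, type]]:
    """Split 'name type, name type' into (column name, Python type) pairs."""
    columns = []
    for part in schema.split(","):
        name, type_name = part.split()
        if type_name not in _TYPES:
            raise ValueError(f"Unsupported type: {type_name}")
        columns.append((name, _TYPES[type_name]))
    return columns


columns = parse_schema("state_name str, region str")
```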
Querying with SQL using DataFusion
import datafusion
import pyarrow.dataset as ds
# Register the materialized table with a DataFusion session
sc = datafusion.SessionContext()
sc.register_dataset("us_states", ds.dataset(table))
# Get 5 states from the South region
print(sc.sql("SELECT * FROM us_states WHERE region = 'South' LIMIT 5"))
# Count the number of states per region
print(sc.sql("SELECT COUNT(*), region FROM us_states GROUP BY region"))
Output
DataFrame()
+------------+--------+
| state_name | region |
+------------+--------+
| Alabama | South |
| Arkansas | South |
| Delaware | South |
| Florida | South |
| Georgia | South |
+------------+--------+
DataFrame()
+----------+-----------+
| count(*) | region |
+----------+-----------+
| 12 | Midwest |
| 9 | Northeast |
| 16 | South |
| 13 | West |
+----------+-----------+
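Because the table is generated by an LLM, a quick check on aggregates can catch missing or duplicated rows. The counts below are copied from the output above; the expected total of 50 is the number of US states, and the check itself is a suggestion rather than a SwellDB feature.

```python
# Region counts copied from the GROUP BY output above.
region_counts = {"Midwest": 12, "Northeast": 9, "South": 16, "West": 13}

total = sum(region_counts.values())
if total != 50:
    raise ValueError(f"Expected 50 states, got {total}")
```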