A powerful Python utility designed to efficiently reconstruct nested SQLModel object graphs from a flat Pandas DataFrame
Project description
DanticSQL
DanticSQL is a powerful Python utility designed to efficiently reconstruct nested SQLModel object graphs from a flat Pandas DataFrame. It's the perfect tool for when you perform a complex SQL JOIN query and need to transform the resulting tabular data back into your structured, relational Pydantic/SQLModel objects.
It bridges the gap between database query results and a clean, object-oriented data representation in your application.
The Problem It Solves
When using an ORM like SQLAlchemy with SQLModel, you define neat relationships between your data models (e.g., a User has many Posts). However, when you fetch data with a JOIN to get all information in one query, you get a flat, denormalized table.
For example, a LEFT JOIN between users and posts might produce a result like this:
| user_id | name | post_id | title |
|---|---|---|---|
| 1 | Alice | 101 | Hello World |
| 1 | Alice | 102 | My Second Post |
| 2 | Bob | 103 | A Post by Bob |
The challenge is to efficiently parse this DataFrame back into two User objects and three Post objects, with user_alice.posts correctly containing a list of her two posts. Manually looping through this data is inefficient and error-prone. DanticSQL automates this entire process.
How It Works
The library operates in a two-stage process:
1. Parsing and Grouping (pydantic_all)
First, DanticSQL processes the input DataFrame for each of your SQLModel tables.
- Grouping: It groups the
DataFrameby each table's primary key. This consolidates all rows belonging to a single object instance (like the two rows foruser_id=1above). - Intelligent Aggregation:
- For regular data fields (
name), it takes the first value (as it should be the same across all grouped rows). - For fields that represent relationship keys (
post_id), it aggregates all unique values into aset. This correctly captures one-to-many and many-to-many relationships.
- For regular data fields (
- Pydantic Validation: It uses
pydantic.TypeAdapterfor highly efficient, bulk validation and creation of yourSQLModelinstances from the cleaned data.
At the end of this stage, you have distinct, de-duplicated instances of each model, but they are not yet linked together.
2. Connecting Relationships (connect_all)
This is where the magic happens. The connect_all method links the newly created objects.
- Lookup Maps: It first creates fast lookup dictionaries (hash maps) for all created instances, mapping each object's primary key to the object itself (e.g.,
{1: <User object with id=1>}). - Relationship Data: It then iterates over the intermediate relationship data generated in the first step.
- Linking: For each source object, it finds the corresponding foreign key values it needs to link. It uses these keys to instantly retrieve the related objects from the lookup maps and sets the relationship attribute (
user.postsorpost.user) accordingly.
This approach avoids slow, nested loops and is significantly more performant, especially for large datasets.
Key Features
- Efficient: Leverages
pandasfor high-performance grouping andpydanticfor fast, C-optimized data validation. - Relationship Aware: Automatically inspects
SQLModelrelationships (One-to-Many,Many-to-One,Many-to-Many) to guide the reconstruction process. - Decoupled: Separates the data-fetching logic (your SQL query) from the object-modeling logic.
- Robust: Handles single and composite primary keys.
Installation
This package is not published on PyPI yet. You can install it directly from the GitHub repository:
pip install https://github.com/saya-ashen/DanticSQL/releases/download/0.1.0/danticsql-0.1.0-py3-none-any.whl
Usage Example
Let's walk through a complete example with a User and Post relationship.
1. Define Your SQLModels
from typing import List, Optional
from danticsql import DanticSQL
from sqlmodel import Field, Relationship, SQLModel
import pandas as pd
class User(SQLModel, table=True):
# NOTE: The column name must be unique in all tables which you want to query.
# WRONG EXAMPLE: use id instead of user_id
user_id: Optional[int] = Field(default=None, primary_key=True)
name: str
posts: List["Post"] = Relationship(back_populates="user")
class Post(SQLModel, table=True):
# NOTE: The column name must be unique in all tables which you want to query.
post_id: Optional[int] = Field(default=None, primary_key=True)
title: str
user_id: Optional[int] = Field(default=None, foreign_key="user.user_id")
user: Optional[User] = Relationship(back_populates="posts")
2. Simulate a DataFrame
This DataFrame is what you might get from a JOIN query. Note that all column names are unique.
# Note: DanticSQL expects related table PKs to be present.
data = {
"user_id": [1, 1, 2, 3],
"name": ["Alice", "Alice", "Bob", "Charlie"],
"post_id": [101, 102, 103, None],
"title": ["Post 1 by Alice", "Post 2 by Alice", "Post 1 by Bob", None],
}
df = pd.DataFrame(data, dtype="object")
3. Use DanticSQL to Reconstruct Objects
# 1. Initialize DanticSQL with the models and all columns from the query
models = [User, Post]
queried_columns = list(df.columns)
dantic = DanticSQL(models=models, queried_columns=queried_columns)
# 2. Parse the DataFrame into individual SQLModel instances
dantic.pydantic_all(df)
# 3. Connect the instances based on their defined relationships
dantic.connect_all()
# 4. Access your fully formed, nested objects!
instances = dantic.instances
# --- Verification ---
users = instances.get("user", [])
for user in users:
print(f"User: {user.name} (ID: {user.user_id})")
if user.posts:
for post in user.posts:
# The back-reference is also populated!
print(f" - Post: '{post.title}' (ID: {post.post_id}), User: {post.user.name}")
else:
print(" - No posts found.")
Expected Output:
User: Alice (ID: 1)
- Post: 'Post 1 by Alice' (ID: 101), User: Alice
- Post: 'Post 2 by Alice' (ID: 102), User: Alice
User: Bob (ID: 2)
- Post: 'Post 1 by Bob' (ID: 103), User: Bob
User: Charlie (ID: 3)
- No posts found.
Important Considerations: Unique Field Naming
Core Requirement: All column names in the input DataFrame passed to DanticSQL must be unique.
When you JOIN multiple tables, it's common to have conflicting column names like id or created_at. If duplicate column names exist, DanticSQL cannot determine which model the column belongs to.
Recommended Solution: Use a Database View
For Text-to-SQL applications, using a database VIEW is the highly recommended approach to solve this. A view is a virtual table based on the result-set of an SQL statement. You can create a view that encapsulates all the necessary JOIN logic and, crucially, renames columns to ensure uniqueness.
For example, you could create a view to join user and post tables like this:
CREATE VIEW user_posts_view AS
SELECT
u.id AS user_id, -- Ensure the name is unique
u.name AS user_name,
p.id AS post_id, -- Ensure the name is unique
p.title AS post_title,
p.user_id AS user_id -- The foreign key column must be present
FROM
"user" u
LEFT JOIN
post p ON u.id = p.user_id;
Why is this effective in Text-to-SQL?
- Stability: It provides a stable and clearly structured data source for the LLM. Even if the underlying tables are complex, the view presents a simplified interface.
- Flexibility: The data requirements for a Text-to-SQL application can change frequently. Modifying a view's definition is far easier and safer than altering the physical database schema.
- Decoupling: It decouples the data model that the LLM queries from the database's physical storage, making your application more robust and easier to maintain.
By querying this view (SELECT * FROM user_posts_view;), the resulting DataFrame will have unique column names, ready to be processed directly by DanticSQL.
API Overview
class DanticSQL
The main class that orchestrates the process.
-
__init__(self, models: list[type[SQLModel]], queried_columns: list[str])models: A list of theSQLModelclasses you expect to parse from the data.queried_columns: A list of all column names from the inputDataFrame. This is used to ensure the primary keys required for parsing are present.
-
pydantic_all(self, records: pd.DataFrame)- Parses the
DataFrameand creates de-duplicated model instances. Populates an internal_instancesdictionary.
- Parses the
-
connect_all(self)- Connects the instances created by
pydantic_allby setting theirRelationshipattributes.
- Connects the instances created by
-
instances(property)- Returns the final result: a dictionary where keys are table names and values are lists of the fully reconstructed
SQLModelobjects.
- Returns the final result: a dictionary where keys are table names and values are lists of the fully reconstructed
Motivation: Bridging the Gap in Text-to-SQL
DanticSQL was born out of a practical challenge encountered while developing modern Text-to-SQL applications. Large Language Models (LLMs) are incredibly effective at translating natural language questions into complex SQL queries. However, a significant gap exists between the output of these models and the structured, object-oriented world of modern Python applications.
LLMs generate raw SQL strings. When you execute these queries—especially complex ones with JOINs—you get a flat, tabular result, typically as a Pandas DataFrame. While this data is correct, it doesn't integrate with the rich object models (like SQLModel or Pydantic models) that developers use to build robust applications. The benefits of using an ORM, such as automatic data validation, type hinting, and clearly defined relationships (user.posts), are lost at this stage.
I first developed the core logic of DanticSQL to solve this very problem in my own Text-to-SQL project. The goal was to allow the application to leverage the dynamic query generation of an LLM while still benefiting from the clean, maintainable, and type-safe data structures provided by SQLModel. After realizing this was a common hurdle, I decided to extract and refine the code into this standalone library.
DanticSQL is the bridge. It takes the raw, flat output from your LLM-generated query and intelligently reconstructs it into the nested SQLModel object graphs your application is designed to work with. This allows you to combine the power of natural language querying with the elegance and safety of an object-relational mapper.
Dependencies
pandassqlalchemysqlmodelpydantic
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file danticsql-0.1.0.tar.gz.
File metadata
- Download URL: danticsql-0.1.0.tar.gz
- Upload date:
- Size: 53.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fdfc603aadbe44933ccf164f2c11d27c747e259d812ca219cb69ae4927b03cb3
|
|
| MD5 |
8e41f62a9fb43476d630c3875c005555
|
|
| BLAKE2b-256 |
9e7a05021ee480332b6c303a26e5371eb2af5de6e68b8a169b30959ea00be688
|
Provenance
The following attestation bundles were made for danticsql-0.1.0.tar.gz:
Publisher:
publish-to-release.yml on saya-ashen/DanticSQL
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
danticsql-0.1.0.tar.gz -
Subject digest:
fdfc603aadbe44933ccf164f2c11d27c747e259d812ca219cb69ae4927b03cb3 - Sigstore transparency entry: 266423612
- Sigstore integration time:
-
Permalink:
saya-ashen/DanticSQL@c1c877e764ec79dd17e70801f82480579ac9271b -
Branch / Tag:
refs/tags/0.1.1 - Owner: https://github.com/saya-ashen
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-to-release.yml@c1c877e764ec79dd17e70801f82480579ac9271b -
Trigger Event:
release
-
Statement type:
File details
Details for the file danticsql-0.1.0-py3-none-any.whl.
File metadata
- Download URL: danticsql-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
09b5008189817d8fede7315506d72749102a3040737cfcc59f8b2e8bd73600ee
|
|
| MD5 |
bdf50278922f9d783b24b78ae9053738
|
|
| BLAKE2b-256 |
b062eaa24afd448664ea1ee8a508286e192790b83a58b354dd6478f95652ff63
|
Provenance
The following attestation bundles were made for danticsql-0.1.0-py3-none-any.whl:
Publisher:
publish-to-release.yml on saya-ashen/DanticSQL
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
danticsql-0.1.0-py3-none-any.whl -
Subject digest:
09b5008189817d8fede7315506d72749102a3040737cfcc59f8b2e8bd73600ee - Sigstore transparency entry: 266423618
- Sigstore integration time:
-
Permalink:
saya-ashen/DanticSQL@c1c877e764ec79dd17e70801f82480579ac9271b -
Branch / Tag:
refs/tags/0.1.1 - Owner: https://github.com/saya-ashen
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-to-release.yml@c1c877e764ec79dd17e70801f82480579ac9271b -
Trigger Event:
release
-
Statement type: