A powerful Python utility designed to efficiently reconstruct nested SQLModel object graphs from a flat Pandas DataFrame

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

saya-ashen

Project description

DanticSQL

DanticSQL is a powerful Python utility designed to efficiently reconstruct nested SQLModel object graphs from a flat Pandas DataFrame. It's the perfect tool for when you perform a complex SQL JOIN query and need to transform the resulting tabular data back into your structured, relational Pydantic/SQLModel objects.

It bridges the gap between database query results and a clean, object-oriented data representation in your application.

The Problem It Solves

When using an ORM like SQLAlchemy with SQLModel, you define neat relationships between your data models (e.g., a User has many Posts). However, when you fetch data with a JOIN to get all information in one query, you get a flat, denormalized table.

For example, a LEFT JOIN between users and posts might produce a result like this:

user_id	name	post_id	title
1	Alice	101	Hello World
1	Alice	102	My Second Post
2	Bob	103	A Post by Bob

The challenge is to efficiently parse this DataFrame back into two User objects and three Post objects, with user_alice.posts correctly containing a list of her two posts. Manually looping through this data is inefficient and error-prone. DanticSQL automates this entire process.

How It Works

The library operates in a two-stage process:

1. Parsing and Grouping (`pydantic_all`)

First, DanticSQL processes the input DataFrame for each of your SQLModel tables.

Grouping: It groups the DataFrame by each table's primary key. This consolidates all rows belonging to a single object instance (like the two rows for user_id=1 above).
Intelligent Aggregation:
- For regular data fields (name), it takes the first value (as it should be the same across all grouped rows).
- For fields that represent relationship keys (post_id), it aggregates all unique values into a set. This correctly captures one-to-many and many-to-many relationships.
Pydantic Validation: It uses pydantic.TypeAdapter for highly efficient, bulk validation and creation of your SQLModel instances from the cleaned data.

At the end of this stage, you have distinct, de-duplicated instances of each model, but they are not yet linked together.

2. Connecting Relationships (`connect_all`)

This is where the magic happens. The connect_all method links the newly created objects.

Lookup Maps: It first creates fast lookup dictionaries (hash maps) for all created instances, mapping each object's primary key to the object itself (e.g., {1: <User object with id=1>}).
Relationship Data: It then iterates over the intermediate relationship data generated in the first step.
Linking: For each source object, it finds the corresponding foreign key values it needs to link. It uses these keys to instantly retrieve the related objects from the lookup maps and sets the relationship attribute (user.posts or post.user) accordingly.

This approach avoids slow, nested loops and is significantly more performant, especially for large datasets.

Key Features

Efficient: Leverages pandas for high-performance grouping and pydantic for fast, C-optimized data validation.
Relationship Aware: Automatically inspects SQLModel relationships (One-to-Many, Many-to-One, Many-to-Many) to guide the reconstruction process.
Decoupled: Separates the data-fetching logic (your SQL query) from the object-modeling logic.
Robust: Handles single and composite primary keys.

Installation

This package is not published on PyPI yet. You can install it directly from the GitHub repository:

pip install https://github.com/saya-ashen/DanticSQL/releases/download/0.1.0/danticsql-0.1.0-py3-none-any.whl

Usage Example

Let's walk through a complete example with a User and Post relationship.

1. Define Your SQLModels

from typing import List, Optional
from danticsql import DanticSQL
from sqlmodel import Field, Relationship, SQLModel
import pandas as pd

class User(SQLModel, table=True):
    # NOTE: The column name must be unique in all tables which you want to query.
    # WRONG EXAMPLE: use id instead of user_id
    user_id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    posts: List["Post"] = Relationship(back_populates="user")

class Post(SQLModel, table=True):
    # NOTE: The column name must be unique in all tables which you want to query.
    post_id: Optional[int] = Field(default=None, primary_key=True)
    title: str
    user_id: Optional[int] = Field(default=None, foreign_key="user.user_id")
    user: Optional[User] = Relationship(back_populates="posts")

2. Simulate a DataFrame

This DataFrame is what you might get from a JOIN query. Note that all column names are unique.

# Note: DanticSQL expects related table PKs to be present.
data = {
    "user_id": [1, 1, 2, 3],
    "name": ["Alice", "Alice", "Bob", "Charlie"],
    "post_id": [101, 102, 103, None],
    "title": ["Post 1 by Alice", "Post 2 by Alice", "Post 1 by Bob", None],
}
df = pd.DataFrame(data, dtype="object")

3. Use DanticSQL to Reconstruct Objects

# 1. Initialize DanticSQL with the models and all columns from the query
models = [User, Post]
queried_columns = list(df.columns)
dantic = DanticSQL(models=models, queried_columns=queried_columns)

# 2. Parse the DataFrame into individual SQLModel instances
dantic.pydantic_all(df)

# 3. Connect the instances based on their defined relationships
dantic.connect_all()

# 4. Access your fully formed, nested objects!
instances = dantic.instances

# --- Verification ---
users = instances.get("user", [])
for user in users:
    print(f"User: {user.name} (ID: {user.user_id})")
    if user.posts:
        for post in user.posts:
            # The back-reference is also populated!
            print(f"  - Post: '{post.title}' (ID: {post.post_id}), User: {post.user.name}")
    else:
        print("  - No posts found.")

Expected Output:

User: Alice (ID: 1)
  - Post: 'Post 1 by Alice' (ID: 101), User: Alice
  - Post: 'Post 2 by Alice' (ID: 102), User: Alice
User: Bob (ID: 2)
  - Post: 'Post 1 by Bob' (ID: 103), User: Bob
User: Charlie (ID: 3)
  - No posts found.

Important Considerations: Unique Field Naming

Core Requirement: All column names in the input DataFrame passed to DanticSQL must be unique.

When you JOIN multiple tables, it's common to have conflicting column names like id or created_at. If duplicate column names exist, DanticSQL cannot determine which model the column belongs to.

API Overview

`class DanticSQL`

The main class that orchestrates the process.

__init__(self, models: list[type[SQLModel]], queried_columns: list[str])
- models: A list of the SQLModel classes you expect to parse from the data.
- queried_columns: A list of all column names from the input DataFrame. This is used to ensure the primary keys required for parsing are present.
pydantic_all(self, records: pd.DataFrame)
- Parses the DataFrame and creates de-duplicated model instances. Populates an internal _instances dictionary.
connect_all(self)
- Connects the instances created by pydantic_all by setting their Relationship attributes.
instances (property)
- Returns the final result: a dictionary where keys are table names and values are lists of the fully reconstructed SQLModel objects.

Motivation: Bridging the Gap in Text-to-SQL

DanticSQL was born out of a practical challenge encountered while developing modern Text-to-SQL applications. Large Language Models (LLMs) are incredibly effective at translating natural language questions into complex SQL queries. However, a significant gap exists between the output of these models and the structured, object-oriented world of modern Python applications.

LLMs generate raw SQL strings. When you execute these queries—especially complex ones with JOINs—you get a flat, tabular result, typically as a Pandas DataFrame. While this data is correct, it doesn't integrate with the rich object models (like SQLModel or Pydantic models) that developers use to build robust applications. The benefits of using an ORM, such as automatic data validation, type hinting, and clearly defined relationships (user.posts), are lost at this stage.

I first developed the core logic of DanticSQL to solve this very problem in my own Text-to-SQL project. The goal was to allow the application to leverage the dynamic query generation of an LLM while still benefiting from the clean, maintainable, and type-safe data structures provided by SQLModel. After realizing this was a common hurdle, I decided to extract and refine the code into this standalone library.

DanticSQL is the bridge. It takes the raw, flat output from your LLM-generated query and intelligently reconstructs it into the nested SQLModel object graphs your application is designed to work with. This allows you to combine the power of natural language querying with the elegance and safety of an object-relational mapper.

Dependencies

pandas
sqlalchemy
sqlmodel
pydantic

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

saya-ashen

Release history Release notifications | RSS feed

This version

0.1.0

Jul 8, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

danticsql-0.1.0.tar.gz (53.2 kB view details)

Uploaded Jul 8, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

danticsql-0.1.0-py3-none-any.whl (10.2 kB view details)

Uploaded Jul 8, 2025 Python 3

File details

Details for the file danticsql-0.1.0.tar.gz.

File metadata

Download URL: danticsql-0.1.0.tar.gz
Upload date: Jul 8, 2025
Size: 53.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for danticsql-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fdfc603aadbe44933ccf164f2c11d27c747e259d812ca219cb69ae4927b03cb3`
MD5	`8e41f62a9fb43476d630c3875c005555`
BLAKE2b-256	`9e7a05021ee480332b6c303a26e5371eb2af5de6e68b8a169b30959ea00be688`

See more details on using hashes here.

Provenance

The following attestation bundles were made for danticsql-0.1.0.tar.gz:

Publisher: publish-to-release.yml on saya-ashen/DanticSQL

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: danticsql-0.1.0.tar.gz
- Subject digest: fdfc603aadbe44933ccf164f2c11d27c747e259d812ca219cb69ae4927b03cb3
- Sigstore transparency entry: 266423612
- Sigstore integration time: Jul 8, 2025
Source repository:
- Permalink: saya-ashen/DanticSQL@c1c877e764ec79dd17e70801f82480579ac9271b
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/saya-ashen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-release.yml@c1c877e764ec79dd17e70801f82480579ac9271b
- Trigger Event: release

File details

Details for the file danticsql-0.1.0-py3-none-any.whl.

File metadata

Download URL: danticsql-0.1.0-py3-none-any.whl
Upload date: Jul 8, 2025
Size: 10.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for danticsql-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`09b5008189817d8fede7315506d72749102a3040737cfcc59f8b2e8bd73600ee`
MD5	`bdf50278922f9d783b24b78ae9053738`
BLAKE2b-256	`b062eaa24afd448664ea1ee8a508286e192790b83a58b354dd6478f95652ff63`

See more details on using hashes here.

Provenance

The following attestation bundles were made for danticsql-0.1.0-py3-none-any.whl:

Publisher: publish-to-release.yml on saya-ashen/DanticSQL

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: danticsql-0.1.0-py3-none-any.whl
- Subject digest: 09b5008189817d8fede7315506d72749102a3040737cfcc59f8b2e8bd73600ee
- Sigstore transparency entry: 266423618
- Sigstore integration time: Jul 8, 2025
Source repository:
- Permalink: saya-ashen/DanticSQL@c1c877e764ec79dd17e70801f82480579ac9271b
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/saya-ashen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-release.yml@c1c877e764ec79dd17e70801f82480579ac9271b
- Trigger Event: release

danticsql 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Project description

DanticSQL

The Problem It Solves

How It Works

1. Parsing and Grouping (pydantic_all)

2. Connecting Relationships (connect_all)

Key Features

Installation

Usage Example

1. Define Your SQLModels

2. Simulate a DataFrame

3. Use DanticSQL to Reconstruct Objects

Expected Output:

Important Considerations: Unique Field Naming

Recommended Solution: Use a Database View

API Overview

class DanticSQL

Motivation: Bridging the Gap in Text-to-SQL

Dependencies

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

1. Parsing and Grouping (`pydantic_all`)

2. Connecting Relationships (`connect_all`)

`class DanticSQL`