Library for converting pandas dataframes to pydantic models
pandas-to-pydantic
WARNING: Library is currently unstable and in beta.
This library provides functions for converting Pandas Dataframes to Pydantic Models. This allows you to easily transform data in a table-like format into a json-like format. Pydantic Model annotations are matched with Pandas Dataframe columns. Supports models nested in lists.
Installation
pip install pandas-to-pydantic
Example 1
This example shows how to convert data from a flat structure (a .csv file or pandas dataframe) to a hierarchical structure (a JSON file or pydantic models).
BookID | Title | AuthorName | Genre | PublishedYear |
---|---|---|---|---|
1 | Harry Potter and the Philosopher's Stone | J.K. Rowling | Fantasy | 1997 |
2 | Harry Potter and the Chamber of Secrets | J.K. Rowling | Fantasy | 1998 |
3 | 1984 | George Orwell | Dystopian Fiction | 1949 |
4 | Animal Farm | George Orwell | Political Satire | 1945 |
5 | Pride and Prejudice | Jane Austen | Romance | 1813 |
7 | Murder on the Orient Express | Agatha Christie | Mystery | 1934 |
9 | Adventures of Huckleberry Finn | Mark Twain | Adventure | 1884 |
10 | The Adventures of Tom Sawyer | Mark Twain | Adventure | 1876 |
11 | The Hobbit | J.R.R. Tolkien | Fantasy | 1937 |
12 | The Lord of the Rings | J.R.R. Tolkien | Fantasy | 1954 |
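To follow along without an existing file, the table above can be written to a CSV using only the standard library. This is a minimal sketch: the file name `books.csv`, the `FILE_PATH` value, and the `write_sample_csv` helper are assumptions made for illustration and are not part of this library.

```python
import csv
from pathlib import Path

# Hypothetical location for the sample data; adjust as needed.
FILE_PATH = Path("books.csv")

HEADER = ["BookID", "Title", "AuthorName", "Genre", "PublishedYear"]
ROWS = [
    (1, "Harry Potter and the Philosopher's Stone", "J.K. Rowling", "Fantasy", 1997),
    (2, "Harry Potter and the Chamber of Secrets", "J.K. Rowling", "Fantasy", 1998),
    (3, "1984", "George Orwell", "Dystopian Fiction", 1949),
    (4, "Animal Farm", "George Orwell", "Political Satire", 1945),
    (5, "Pride and Prejudice", "Jane Austen", "Romance", 1813),
    (7, "Murder on the Orient Express", "Agatha Christie", "Mystery", 1934),
    (9, "Adventures of Huckleberry Finn", "Mark Twain", "Adventure", 1884),
    (10, "The Adventures of Tom Sawyer", "Mark Twain", "Adventure", 1876),
    (11, "The Hobbit", "J.R.R. Tolkien", "Fantasy", 1937),
    (12, "The Lord of the Rings", "J.R.R. Tolkien", "Fantasy", 1954),
]


def write_sample_csv(path: Path) -> None:
    """Write the sample table to `path` with a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)


write_sample_csv(FILE_PATH)
```

The resulting file can then be loaded with `pd.read_csv(FILE_PATH)`.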
import pandas as pd
from pydantic import BaseModel
from pandas_to_pydantic import dataframe_to_pydantic

# Declare pydantic models
class Book(BaseModel):
    BookID: int
    Title: str
    AuthorName: str
    Genre: str
    PublishedYear: int

# Update this to your file path
book_data = pd.read_csv(FILE_PATH)

# Convert pandas dataframe to a pydantic root model
book_list_root = dataframe_to_pydantic(book_data, Book)
`dataframe_to_pydantic` returns a pydantic `RootModel`. Data can be accessed using its attributes and methods. See https://docs.pydantic.dev/latest/api/root_model/
For example:
# Access data as a list of pydantic models
book_list_root.root
Returns (output shortened):
[Book(BookID=1, Title="Harry Potter and the Philosopher's Stone", AuthorName='J.K. Rowling', Genre='Fantasy', PublishedYear=1997),
Book(BookID=2, Title='Harry Potter and the Chamber of Secrets', AuthorName='J.K. Rowling', Genre='Fantasy', PublishedYear=1998),
Book(BookID=3, Title='1984', AuthorName='George Orwell', Genre='Dystopian Fiction', PublishedYear=1949),
...]
For example:
# Access data as a list of dict
book_list_root.model_dump()
Returns (output shortened):
[{'BookID': 1,
'Title': "Harry Potter and the Philosopher's Stone",
'AuthorName': 'J.K. Rowling',
'Genre': 'Fantasy',
'PublishedYear': 1997},
{'BookID': 2,
'Title': 'Harry Potter and the Chamber of Secrets',
'AuthorName': 'J.K. Rowling',
'Genre': 'Fantasy',
'PublishedYear': 1998},
{'BookID': 3,
'Title': '1984',
'AuthorName': 'George Orwell',
'Genre': 'Dystopian Fiction',
'PublishedYear': 1949},
...]
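Because the return value is described as a pydantic `RootModel` wrapping a list of the input model, the usual pydantic v2 accessors should apply to it. The sketch below builds a comparable root model by hand, without this library, to show `.root`, `model_dump()`, and `model_dump_json()`. The `BookList` alias is a name introduced here for illustration, and the assumption is that the library's return value behaves the same way.

```python
from pydantic import BaseModel, RootModel


class Book(BaseModel):
    BookID: int
    Title: str
    AuthorName: str
    Genre: str
    PublishedYear: int


# Hypothetical alias: a root model wrapping a list of Book.
BookList = RootModel[list[Book]]

book_list_root = BookList.model_validate(
    [
        {
            "BookID": 3,
            "Title": "1984",
            "AuthorName": "George Orwell",
            "Genre": "Dystopian Fiction",
            "PublishedYear": 1949,
        },
        {
            "BookID": 4,
            "Title": "Animal Farm",
            "AuthorName": "George Orwell",
            "Genre": "Political Satire",
            "PublishedYear": 1945,
        },
    ]
)

first_book = book_list_root.root[0]         # a Book instance
as_dicts = book_list_root.model_dump()      # a list of dict
as_json = book_list_root.model_dump_json()  # a JSON string
```

`model_dump_json()` is convenient when the goal is to write the hierarchical result straight to a .json file.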
Example 2
In this example, Pydantic models are nested using the `list` type annotation. When there are multiple layers of nesting, unique id fields should be provided for each list field with a child model using `id_column_map`.
Here, the unique id column for the `Genre` model is `Genre`, and the unique id column for the `Author` model is `AuthorName`. Keys in `id_column_map` can be the model name or field name. Values in `id_column_map` are the unique column names.
For example:
class Book(BaseModel):
    BookID: int
    Title: str
    PublishedYear: int

class Author(BaseModel):
    AuthorName: str
    BookList: list[Book]

class Genre(BaseModel):
    Genre: str
    AuthorList: list[Author]

# book_data is the dataframe loaded in Example 1
dataframe_to_pydantic(
    data=book_data,
    model=Genre,
    id_column_map={"Genre": "Genre", "AuthorList": "AuthorName"},
).model_dump()
Returns (output shortened):
[{'Genre': 'Fantasy',
'AuthorList': [{'AuthorName': 'J.K. Rowling',
'BookList': [{'BookID': 1,
'Title': "Harry Potter and the Philosopher's Stone",
'PublishedYear': 1997},
{'BookID': 2,
'Title': 'Harry Potter and the Chamber of Secrets',
'PublishedYear': 1998}]},
{'AuthorName': 'J.R.R. Tolkien',
'BookList': [{'BookID': 11, 'Title': 'The Hobbit', 'PublishedYear': 1937},
{'BookID': 12,
'Title': 'The Lord of the Rings',
'PublishedYear': 1954}]}]},
{'Genre': 'Dystopian Fiction',
'AuthorList': [{'AuthorName': 'George Orwell',
'BookList': [{'BookID': 3, 'Title': '1984', 'PublishedYear': 1949}]}]},
...]
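The nesting in Example 2 can be pictured as repeated grouping: rows are grouped by the parent's unique id column, and each group is grouped again by the child's unique id column. The following plain-Python sketch illustrates that idea only. It is not the library's implementation, and `group_by` and `nest_rows` are hypothetical helpers.

```python
rows = [
    {"BookID": 1, "Title": "Harry Potter and the Philosopher's Stone",
     "AuthorName": "J.K. Rowling", "Genre": "Fantasy", "PublishedYear": 1997},
    {"BookID": 11, "Title": "The Hobbit",
     "AuthorName": "J.R.R. Tolkien", "Genre": "Fantasy", "PublishedYear": 1937},
    {"BookID": 3, "Title": "1984",
     "AuthorName": "George Orwell", "Genre": "Dystopian Fiction", "PublishedYear": 1949},
]


def group_by(rows, column):
    """Group rows by a column, keeping keys in first-seen order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return groups


def nest_rows(rows):
    """Hypothetical helper: build Genre -> AuthorList -> BookList."""
    result = []
    for genre, genre_rows in group_by(rows, "Genre").items():
        authors = []
        for author, author_rows in group_by(genre_rows, "AuthorName").items():
            books = [
                {"BookID": r["BookID"], "Title": r["Title"],
                 "PublishedYear": r["PublishedYear"]}
                for r in author_rows
            ]
            authors.append({"AuthorName": author, "BookList": books})
        result.append({"Genre": genre, "AuthorList": authors})
    return result


nested = nest_rows(rows)
```

In this picture, `id_column_map` plays the role of the column names passed to `group_by` at each level.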
dataframe_to_pydantic
Args
- data (`pandas.DataFrame`)
  - Dataframe with columns matching fields in the pydantic model
  - When the pydantic model includes nested models, it is assumed that the first column is unique. See Example 2
- model (`pydantic._internal._model_construction.ModelMetaclass`)
  - Accepts classes created with `pydantic.BaseModel`
  - Supports nested models in lists
  - Annotation names must match columns in the dataframe
- id_column_map (`dict[str, str]`)
  - Required when nesting Pydantic models
  - Each key corresponds with a field name or model name
  - Each value corresponds with the unique id column for the nested Pydantic model
  - For the parent level model, use the model name as key
Returns
- model_list (`pydantic.RootModel`)
  - Pydantic root model created as a list of the input model
  - https://docs.pydantic.dev/latest/api/root_model/
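Since annotation names must match the dataframe columns, it can help to compare the two before converting. The following is a minimal sketch, assuming pydantic v2 (which exposes `model_fields` on model classes). `missing_columns` is a hypothetical helper and not part of this library, and it only covers flat models: fields that hold nested models, such as `BookList`, are not columns and would need to be excluded.

```python
from pydantic import BaseModel


class Book(BaseModel):
    BookID: int
    Title: str
    PublishedYear: int


def missing_columns(model: type[BaseModel], columns) -> set[str]:
    """Return model field names that have no matching column."""
    return set(model.model_fields) - set(columns)


# With a real dataframe, pass `df.columns` instead of this list.
columns = ["BookID", "Title", "AuthorName", "Genre"]
print(missing_columns(Book, columns))
```

An empty set means every field in the model has a matching column.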
Advanced Example
This example uses a larger data set with additional nesting.
import pandas as pd
from pydantic import BaseModel
from pandas_to_pydantic import dataframe_to_pydantic

# Declare pydantic models
class LibraryDetail(BaseModel):
    LibraryName: str
    Location: str
    EstablishedYear: int
    BookCollectionSize: int

class Author(BaseModel):
    AuthorID: int
    AuthorName: str
    AuthorBirthdate: str

class Book(BaseModel):
    BookID: int
    Title: str
    Genre: str
    PublishedYear: int

class Library(BaseModel):
    LibraryID: int
    Detail: LibraryDetail
    AuthorList: list[Author]
    BookList: list[Book]

# Input data is a pandas dataframe
data = pd.read_csv(FILE_PATH)

# Convert pandas dataframe to a pydantic root model
library_list_root = dataframe_to_pydantic(
    data,
    Library,
    {
        "Library": "LibraryID",
        "BookList": "BookID",
        "AuthorList": "AuthorID",
    },
)

# Access data as a list of pydantic models
library_list_root.root

# Access data as a list of dict
library_list_root.model_dump()