Library for converting pandas dataframes to pydantic models

pandas-to-pydantic

WARNING: Library is currently unstable and in beta.

Library for converting pandas dataframes into pydantic models. This allows conversion between two popular Python formats: flat data (dataframes) and structured data (pydantic models). Pydantic model annotations are matched with pandas dataframe columns. Models nested in lists are supported.

Installation

pip install pandas-to-pydantic

Example 1

This example shows how to convert data from a flat structure (a .csv file or pandas dataframe) to a hierarchical structure (a JSON file or pydantic models).

Example Book Data

| BookID | Title | AuthorName | Genre | PublishedYear |
|---|---|---|---|---|
| 1 | Harry Potter and the Philosopher's Stone | J.K. Rowling | Fantasy | 1997 |
| 2 | Harry Potter and the Chamber of Secrets | J.K. Rowling | Fantasy | 1998 |
| 3 | 1984 | George Orwell | Dystopian Fiction | 1949 |
| 4 | Animal Farm | George Orwell | Political Satire | 1945 |
| 5 | Pride and Prejudice | Jane Austen | Romance | 1813 |
| 7 | Murder on the Orient Express | Agatha Christie | Mystery | 1934 |
| 9 | Adventures of Huckleberry Finn | Mark Twain | Adventure | 1884 |
| 10 | The Adventures of Tom Sawyer | Mark Twain | Adventure | 1876 |
| 11 | The Hobbit | J.R.R. Tolkien | Fantasy | 1937 |
| 12 | The Lord of the Rings | J.R.R. Tolkien | Fantasy | 1954 |

import pandas as pd
from pydantic import BaseModel
from pandas_to_pydantic import dataframe_to_pydantic

# Declare pydantic models
class Book(BaseModel):
    BookID: int
    Title: str
    AuthorName: str
    Genre: str
    PublishedYear: int

# FILE_PATH is a placeholder; set it to the path of your .csv file
book_data = pd.read_csv(FILE_PATH)

# Convert pandas dataframe to a pydantic root model
book_list_root = dataframe_to_pydantic(book_data, Book)

dataframe_to_pydantic returns a pydantic RootModel, whose data can be accessed through its attributes and methods. See the RootModel documentation: https://docs.pydantic.dev/latest/api/root_model/

For example:

# Access data as a list of pydantic models
book_list_root.root

Returns (output shortened):

[Book(BookID=1, Title="Harry Potter and the Philosopher's Stone", AuthorName='J.K. Rowling', Genre='Fantasy', PublishedYear=1997),
Book(BookID=2, Title='Harry Potter and the Chamber of Secrets', AuthorName='J.K. Rowling', Genre='Fantasy', PublishedYear=1998),
Book(BookID=3, Title='1984', AuthorName='George Orwell', Genre='Dystopian Fiction', PublishedYear=1949),
...]

For example:

# Access data as a list of dict
book_list_root.model_dump()

Returns (output shortened):

[{'BookID': 1,
  'Title': "Harry Potter and the Philosopher's Stone",
  'AuthorName': 'J.K. Rowling',
  'Genre': 'Fantasy',
  'PublishedYear': 1997},
 {'BookID': 2,
  'Title': 'Harry Potter and the Chamber of Secrets',
  'AuthorName': 'J.K. Rowling',
  'Genre': 'Fantasy',
  'PublishedYear': 1998},
 {'BookID': 3,
  'Title': '1984',
  'AuthorName': 'George Orwell',
  'Genre': 'Dystopian Fiction',
  'PublishedYear': 1949},
...]
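
The flat conversion above can be approximated by hand, which may help when tracking down a validation error or when no .csv file is available. The sketch below does not call pandas-to-pydantic and is not its implementation. It assumes pydantic v2, builds three rows of the example table in memory, and introduces a hypothetical alias, BookList, for a root model holding a list of Book.

```python
import json

import pandas as pd
from pydantic import BaseModel, RootModel


class Book(BaseModel):
    BookID: int
    Title: str
    AuthorName: str
    Genre: str
    PublishedYear: int


# Three rows from the example table, built in memory instead of read from a CSV
book_data = pd.DataFrame(
    {
        "BookID": [1, 2, 3],
        "Title": [
            "Harry Potter and the Philosopher's Stone",
            "Harry Potter and the Chamber of Secrets",
            "1984",
        ],
        "AuthorName": ["J.K. Rowling", "J.K. Rowling", "George Orwell"],
        "Genre": ["Fantasy", "Fantasy", "Dystopian Fiction"],
        "PublishedYear": [1997, 1998, 1949],
    }
)

# Hypothetical alias (not part of the library): a root model wrapping list[Book]
BookList = RootModel[list[Book]]

# Round-trip through JSON so every value is a plain Python type
records = json.loads(book_data.to_json(orient="records"))

# Each record is validated against Book; an invalid row raises ValidationError
manual_root = BookList.model_validate(records)

# Serialize the validated data to a JSON string, ready to write to a file
books_json = manual_root.model_dump_json(indent=2)
```

Here manual_root.root and manual_root.model_dump() behave like the calls shown above, and books_json holds the hierarchical JSON form mentioned at the start of this example.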

Example 2

Pydantic models can be nested using list annotations. Each parent model needs a field whose values identify it uniquely, so that rows can be grouped under it. In this example, those fields are AuthorName (for Author) and Genre (for Genre).

For example:

class Book(BaseModel):
    BookID: int
    Title: str
    PublishedYear: int

class Author(BaseModel):
    AuthorName: str
    BookList: list[Book]

class Genre(BaseModel):
    Genre: str
    AuthorList: list[Author]

dataframe_to_pydantic(book_data, Genre).model_dump()

Returns (output shortened):

[{'Genre': 'Fantasy',
  'AuthorList': [{'AuthorName': 'J.K. Rowling',
    'BookList': [{'BookID': 1,
      'Title': "Harry Potter and the Philosopher's Stone",
      'PublishedYear': 1997},
     {'BookID': 2,
      'Title': 'Harry Potter and the Chamber of Secrets',
      'PublishedYear': 1998}]},
   {'AuthorName': 'J.R.R. Tolkien',
    'BookList': [{'BookID': 11, 'Title': 'The Hobbit', 'PublishedYear': 1937},
     {'BookID': 12,
      'Title': 'The Lord of the Rings',
      'PublishedYear': 1954}]}]},
 {'Genre': 'Dystopian Fiction',
  'AuthorList': [{'AuthorName': 'George Orwell',
    'BookList': [{'BookID': 3, 'Title': '1984', 'PublishedYear': 1949}]}]},
...]
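
One way to picture the nesting is as a grouping of rows by each parent's unique field. The sketch below reproduces the shape of the output above with plain pandas groupby calls. It is a hand-written illustration under that assumption, not the library's actual algorithm, and nest_by_hand is a hypothetical helper name.

```python
import pandas as pd
from pydantic import BaseModel


class Book(BaseModel):
    BookID: int
    Title: str
    PublishedYear: int


class Author(BaseModel):
    AuthorName: str
    BookList: list[Book]


class Genre(BaseModel):
    Genre: str
    AuthorList: list[Author]


# Four rows from the example table
book_data = pd.DataFrame(
    [
        (1, "Harry Potter and the Philosopher's Stone", "J.K. Rowling", "Fantasy", 1997),
        (3, "1984", "George Orwell", "Dystopian Fiction", 1949),
        (11, "The Hobbit", "J.R.R. Tolkien", "Fantasy", 1937),
        (12, "The Lord of the Rings", "J.R.R. Tolkien", "Fantasy", 1954),
    ],
    columns=["BookID", "Title", "AuthorName", "Genre", "PublishedYear"],
)


def nest_by_hand(df: pd.DataFrame) -> list[Genre]:
    """Hypothetical helper: group rows by Genre, then by AuthorName."""
    genres = []
    # sort=False keeps groups in order of first appearance
    for genre_name, genre_rows in df.groupby("Genre", sort=False):
        authors = []
        for author_name, author_rows in genre_rows.groupby("AuthorName", sort=False):
            books = [
                Book(
                    BookID=int(row.BookID),
                    Title=row.Title,
                    PublishedYear=int(row.PublishedYear),
                )
                for row in author_rows.itertuples(index=False)
            ]
            authors.append(Author(AuthorName=author_name, BookList=books))
        genres.append(Genre(Genre=genre_name, AuthorList=authors))
    return genres


nested = nest_by_hand(book_data)
```

The grouping only works because every row with the same Genre value belongs to the same parent; two distinct genres sharing one name would be merged.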

dataframe_to_pydantic

Args

  • data (pandas.DataFrame)
    • Dataframe with columns matching fields in the pydantic model
    • When the pydantic model includes nested models, it is assumed that the first column is unique. See Example 2
  • model (pydantic._internal._model_construction.ModelMetaclass)
    • Accepts classes created with pydantic.BaseModel
    • Supports nested models in lists
    • Annotation names must match columns in the dataframe
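
Because annotation names must match column names exactly, it can help to check a dataframe before converting it. The following is a minimal sketch that assumes pydantic v2 (it relies on the model_fields attribute) and only handles flat models; for a nested model, list fields such as BookList are not columns and would be reported as missing. missing_columns is a hypothetical helper, not part of pandas-to-pydantic.

```python
import pandas as pd
from pydantic import BaseModel


class Book(BaseModel):
    BookID: int
    Title: str
    PublishedYear: int


def missing_columns(df: pd.DataFrame, model: type[BaseModel]) -> set[str]:
    """Hypothetical helper: field names of a flat model with no matching column."""
    return set(model.model_fields) - set(df.columns)


# Column names here deliberately differ from the model's annotation names
raw = pd.DataFrame({"book_id": [3], "title": ["1984"], "PublishedYear": [1949]})
before = missing_columns(raw, Book)

# Rename the columns so they match the annotation names exactly
renamed = raw.rename(columns={"book_id": "BookID", "title": "Title"})
after = missing_columns(renamed, Book)
```

In this sketch, before contains BookID and Title, and after is empty once the columns have been renamed.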

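Returns

  • pydantic.RootModel
    • Root model wrapping a list of instances of the input model
    • Access the list with .root, or convert it to a list of dicts with .model_dump()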

Advanced Example

This example uses a larger data set with additional nesting.

Example Library Data

import pandas as pd
from pydantic import BaseModel
from pandas_to_pydantic import dataframe_to_pydantic

# Declare pydantic models
class Book(BaseModel):
    BookID: int
    Title: str
    Genre: str
    PublishedYear: int
    AvailableCopies: int


class Author(BaseModel):
    AuthorID: int
    AuthorName: str
    AuthorBirthdate: str
    BookList: list[Book]


class Library(BaseModel):
    LibraryID: int
    LibraryName: str
    Location: str
    EstablishedYear: int
    BookCollectionSize: int
    AuthorList: list[Author]

# Input data is a pandas dataframe
data = pd.read_csv(FILE_PATH)

# Convert pandas dataframe to a pydantic root model
library_list_root = dataframe_to_pydantic(data, Library)

# Access data as a list of pydantic models
library_list_root.root

# Access data as a list of dict
library_list_root.model_dump()
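
Once the data is nested, ordinary attribute access is enough to walk the hierarchy. The sketch below totals AvailableCopies per library. It does not call pandas-to-pydantic: it builds the root model by hand from a single record so that it runs without the example file, and it assumes the returned object resembles RootModel[list[Library]]. The library name, location, years, sizes, IDs, and copy counts are placeholder values invented for illustration, and LibraryList is a hypothetical alias.

```python
from pydantic import BaseModel, RootModel


class Book(BaseModel):
    BookID: int
    Title: str
    Genre: str
    PublishedYear: int
    AvailableCopies: int


class Author(BaseModel):
    AuthorID: int
    AuthorName: str
    AuthorBirthdate: str
    BookList: list[Book]


class Library(BaseModel):
    LibraryID: int
    LibraryName: str
    Location: str
    EstablishedYear: int
    BookCollectionSize: int
    AuthorList: list[Author]


# Placeholder record, invented for illustration only
sample = [
    {
        "LibraryID": 1,
        "LibraryName": "Example Library",
        "Location": "Example City",
        "EstablishedYear": 1900,
        "BookCollectionSize": 2,
        "AuthorList": [
            {
                "AuthorID": 1,
                "AuthorName": "George Orwell",
                "AuthorBirthdate": "1903-06-25",
                "BookList": [
                    {"BookID": 3, "Title": "1984", "Genre": "Dystopian Fiction",
                     "PublishedYear": 1949, "AvailableCopies": 3},
                    {"BookID": 4, "Title": "Animal Farm", "Genre": "Political Satire",
                     "PublishedYear": 1945, "AvailableCopies": 2},
                ],
            }
        ],
    }
]

# Hypothetical alias (not part of the library): a root model wrapping list[Library]
LibraryList = RootModel[list[Library]]
library_list_root = LibraryList.model_validate(sample)

# Walk the hierarchy: libraries -> authors -> books
copies_per_library = {
    library.LibraryName: sum(
        book.AvailableCopies
        for author in library.AuthorList
        for book in author.BookList
    )
    for library in library_list_root.root
}
```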
