Convert pydantic model to aws glue schema for terraform

Project description

JSON Schema to AWS Glue schema converter

Installation

pip install pydantic-glue

What?

Converts pydantic models to JSON Schema and then to an AWS Glue schema, so in theory anything that can be converted to JSON Schema could also work.

Why?

When AWS Kinesis Firehose is configured to receive JSON records and write Parquet files to S3, you need to define an AWS Glue table so Firehose knows what schema to use when creating the Parquet files.

AWS Glue lets you define a schema using Avro or JSON Schema and then create a table from that schema, but as of May 2022 tables created that way can't be used with Kinesis Firehose.

https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform

This is also confirmed by AWS support.

What you can do instead is create a table and set the columns manually, but then you have two sources of truth to maintain.

This tool lets you define a table in pydantic and generate a JSON file with column types that Terraform can use to create a Glue table.

Example

Take the following pydantic class

from pydantic import BaseModel
from typing import List


class Bar(BaseModel):
    name: str
    age: int


class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str

Running pydantic-glue

pydantic-glue -f example.py -c Foo

you get this JSON in the terminal:

{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}

It can be used in Terraform like this:

locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}

Alternatively, you can run the CLI with the -o flag to set the output file location:

pydantic-glue -f example.py -c Foo -o example.json -l
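Before handing the generated file to Terraform, it can be handy to inspect it from Python. The sketch below is not part of pydantic-glue; `load_columns` is a hypothetical helper, and the JSON is embedded as a string (copied from the example output above) only to keep the snippet self-contained. In practice you would read the file passed to -o.

```python
import json

# Output copied from the example above. In practice, read this text from the
# file given to -o (for example example.json).
GENERATED = """
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
"""


def load_columns(text: str) -> list[tuple[str, str]]:
    """Return (column name, Glue type) pairs, ignoring the "//" comment key.

    Hypothetical helper for illustration; not provided by pydantic-glue.
    """
    document = json.loads(text)
    return list(document["columns"].items())


for name, glue_type in load_columns(GENERATED):
    print(f"{name}: {glue_type}")
```

Only the "columns" object is read, which mirrors what the Terraform `jsondecode(...).columns` expression above does.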

How does it work?

  • pydantic gets converted to JSON Schema
  • the JSON Schema types get mapped to Glue types recursively
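The two steps above can be illustrated with a minimal sketch of the second one, the recursive mapping. This is not the tool's actual source: the function names, the hand-written `FOO_SCHEMA` (shaped like the JSON Schema pydantic would produce for `Foo`), and the `number` and `boolean` entries in the type table are assumptions made for illustration. Only the `string` and `int` mappings are taken from the example output above.

```python
from typing import Any

# JSON Schema primitive type -> Glue column type.
# "string" and "integer" match the example output above; "number" and
# "boolean" are assumptions added for illustration.
PRIMITIVES = {
    "string": "string",
    "integer": "int",
    "number": "double",
    "boolean": "boolean",
}


def resolve_ref(ref: str, root: dict[str, Any]) -> dict[str, Any]:
    """Follow a local reference such as "#/$defs/Bar" inside the root schema."""
    target: Any = root
    for part in ref.removeprefix("#/").split("/"):
        target = target[part]
    return target


def to_glue_type(node: dict[str, Any], root: dict[str, Any]) -> str:
    """Recursively map one JSON Schema node to a Glue type string."""
    if "$ref" in node:
        return to_glue_type(resolve_ref(node["$ref"], root), root)
    kind = node.get("type")
    if kind == "array":
        return f"array<{to_glue_type(node['items'], root)}>"
    if kind == "object":
        fields = ",".join(
            f"{name}:{to_glue_type(sub, root)}"
            for name, sub in node.get("properties", {}).items()
        )
        return f"struct<{fields}>"
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    raise ValueError(f"unsupported JSON Schema node: {node!r}")


def to_glue_columns(schema: dict[str, Any]) -> dict[str, str]:
    """Map each top-level property of an object schema to a Glue column type."""
    return {
        name: to_glue_type(sub, schema)
        for name, sub in schema["properties"].items()
    }


# Hand-written schema for the Foo model from the example (an assumption about
# its shape, not output captured from pydantic).
FOO_SCHEMA = {
    "$defs": {
        "Bar": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        }
    },
    "type": "object",
    "properties": {
        "nums": {"type": "array", "items": {"type": "integer"}},
        "bars": {"type": "array", "items": {"$ref": "#/$defs/Bar"}},
        "other": {"type": "string"},
    },
}

print(to_glue_columns(FOO_SCHEMA))
```

Unsupported nodes raise an error instead of falling back to a default type, so a missing mapping shows up at generation time and not later as a malformed Glue table.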

Future work

  • Not all types are supported. I add types as I need them, but adding one is easy; feel free to open an issue or send a PR if you stumble upon an unsupported use case.
  • The tool could easily be extended to work with JSON Schema directly, so anything that can be converted to JSON Schema should also work.

Download files

Source Distribution

pydantic_glue-0.5.0.tar.gz (4.6 kB view details)

Uploaded Source

Built Distribution

pydantic_glue-0.5.0-py3-none-any.whl (5.3 kB view details)

Uploaded Python 3

File details

Details for the file pydantic_glue-0.5.0.tar.gz.

File metadata

  • Download URL: pydantic_glue-0.5.0.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1023-azure

File hashes

Hashes for pydantic_glue-0.5.0.tar.gz
Algorithm Hash digest
SHA256 8471f1c4f27fd5ea80856c69869e8e52720c3a3b5e4cc6ebcfcab42d90c385e2
MD5 7f4893967a6c69bc2e8e6c7a8faafb27
BLAKE2b-256 7c305c43465c54a50e4fd82864f119d7c995ce4829c9a49bf6a054cca09b3acb

File details

Details for the file pydantic_glue-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pydantic_glue-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1023-azure

File hashes

Hashes for pydantic_glue-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 118fc42faba72e3d6875b9032473928300de02f0daab63bfd7501b707149e90c
MD5 1401bb469e701a317b93fa231ee60da5
BLAKE2b-256 e42712ccec3610e35687082447c0ac6eee6e782f5e5af616ca9ed2b6b2295be1
