Convert pydantic models to AWS Glue schema for Terraform
JSON Schema to AWS Glue schema converter
Installation
pip install pydantic-glue
What?
Converts pydantic schemas to JSON Schema and then to AWS Glue schema, so in theory anything that can be converted to JSON Schema could also work.
Why?
When using AWS Kinesis Firehose in a configuration that receives JSONs and writes parquet files on S3, one needs to define an AWS Glue table so Firehose knows what schema to use when creating the parquet files.
AWS Glue lets you define a schema using Avro or JSON Schema and then create a table from that schema, but as of May 2022 AWS has a limitation: tables created that way can't be used with Kinesis Firehose.
https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform
This has also been confirmed by AWS support.
What one could do is create a table and set the columns manually, but this means you now have two sources of truth to maintain.
This tool allows you to define a table in pydantic and generate a JSON with column types that can be used with Terraform to create a Glue table.
Example
Take the following pydantic classes:
from pydantic import BaseModel
from typing import List

class Bar(BaseModel):
    name: str
    age: int

class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str
Running pydantic-glue
pydantic-glue -f example.py -c Foo
you get this JSON in the terminal:
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
which can be used in Terraform like this:
locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}
Alternatively, you can run the CLI with the -o flag to set the output file location:
pydantic-glue -f example.py -c Foo -o example.json -l
Override the type for the AWS Glue Schema
Wherever there is a type key in the input JSON Schema, an additional key glue_type may be defined to override the type that is used in the AWS Glue Schema. This is useful, for example, for a pydantic model that has a field of type int holding unix epoch time, while the column type you would like in Glue is timestamp.
Additional JSON Schema keys can be added to a pydantic model by using the Field function with the argument json_schema_extra, like so:
from pydantic import BaseModel, Field

class A(BaseModel):
    epoch_time: int = Field(
        ...,
        json_schema_extra={
            "glue_type": "timestamp",
        },
    )
The resulting JSON Schema will be:
{
  "properties": {
    "epoch_time": {
      "glue_type": "timestamp",
      "title": "Epoch Time",
      "type": "integer"
    }
  },
  "required": [
    "epoch_time"
  ],
  "title": "A",
  "type": "object"
}
And the result after processing with pydantic-glue:
{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "epoch_time": "timestamp"
  }
}
Recursing through object properties terminates when you supply a glue_type to use. If the type is complex, you must supply the full complex type yourself.
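To make the termination rule concrete, here is a minimal sketch of how an override could short-circuit the recursion. It is not the tool's actual implementation: the function glue_type_for, the PRIMITIVES table, and the example field names (window, first_seen, last_seen, count) are hypothetical and exist only for illustration.

```python
# Hypothetical primitive mapping for illustration; the real tool's table may differ.
PRIMITIVES = {"string": "string", "integer": "int", "boolean": "boolean"}


def glue_type_for(node):
    """Return a Glue type string for one JSON Schema node, honouring glue_type."""
    if "glue_type" in node:
        # An explicit override wins and stops the recursion: nothing below
        # this node is inspected, so complex types must be spelled out in full.
        return node["glue_type"]
    kind = node.get("type")
    if kind == "array":
        return f"array<{glue_type_for(node['items'])}>"
    if kind == "object":
        fields = ",".join(
            f"{name}:{glue_type_for(child)}"
            for name, child in node["properties"].items()
        )
        return f"struct<{fields}>"
    return PRIMITIVES[kind]


schema = {
    "type": "object",
    "properties": {
        # Simple override, as in the epoch_time example.
        "epoch_time": {"type": "integer", "glue_type": "timestamp"},
        # Override on a complex node: the nested properties are ignored,
        # so the full struct is written out by hand.
        "window": {
            "type": "object",
            "glue_type": "struct<first_seen:timestamp,last_seen:timestamp>",
            "properties": {
                "first_seen": {"type": "integer"},
                "last_seen": {"type": "integer"},
            },
        },
        # No override: falls through to the normal mapping.
        "count": {"type": "integer"},
    },
}

overridden = {
    name: glue_type_for(prop) for name, prop in schema["properties"].items()
}
print(overridden)
```

Note that without the override on window, the nested integers would map to int, which is why the full struct has to be supplied by hand.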
How does it work?
- pydantic gets converted to JSON Schema
- the JSON Schema types get mapped to Glue types recursively
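The recursive mapping step can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's source: the names to_glue_type, columns_for and PRIMITIVES are hypothetical, the primitive table is a guess (in particular the mapping of number), and the input is a hand-trimmed JSON Schema resembling what pydantic produces for the Foo model from the example.

```python
import json

# Hypothetical primitive mapping, for illustration only.
PRIMITIVES = {
    "string": "string",
    "integer": "int",
    "number": "float",  # assumption: the real tool may choose another type
    "boolean": "boolean",
}


def to_glue_type(node, defs):
    """Map one JSON Schema node to a Glue type string, recursing into children."""
    if "$ref" in node:
        # Resolve local references such as "#/$defs/Bar".
        name = node["$ref"].rsplit("/", 1)[-1]
        return to_glue_type(defs[name], defs)
    kind = node.get("type")
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "array":
        return f"array<{to_glue_type(node['items'], defs)}>"
    if kind == "object":
        fields = ",".join(
            f"{key}:{to_glue_type(value, defs)}"
            for key, value in node["properties"].items()
        )
        return f"struct<{fields}>"
    raise NotImplementedError(f"unsupported JSON Schema node: {node}")


def columns_for(schema):
    """Build the column-name to Glue-type mapping for a top-level object schema."""
    defs = schema.get("$defs", {})
    return {
        name: to_glue_type(prop, defs)
        for name, prop in schema["properties"].items()
    }


# Hand-trimmed schema for the Foo model (titles and "required" omitted).
foo_schema = {
    "$defs": {
        "Bar": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        }
    },
    "type": "object",
    "properties": {
        "nums": {"type": "array", "items": {"type": "integer"}},
        "bars": {"type": "array", "items": {"$ref": "#/$defs/Bar"}},
        "other": {"type": "string"},
    },
}

result = columns_for(foo_schema)
print(json.dumps(result, indent=2))
```

For this input the sketch yields the same three column types shown in the example output: array<int>, array<struct<name:string,age:int>> and string.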
Future work
- Not all types are supported. I add types as I need them, but adding types is very easy; feel free to open an issue or send a PR if you stumble upon an unsupported use case.
- The tool could easily be extended to work with JSON Schema directly, so anything that can be converted to JSON Schema should also work.