Skip to main content

A declarative, type-safe Python DSL for mapping complex nested JSON to relational database schemas

Project description

Quickstart: Declarative JSON-to-Relational Mapping in Python

etielle is a simple, powerful Python library for reshaping nested JSON data, typically from an API, into relational tables that fit your database schema. Think of etielle as a “JSON extractor” that you program with clear instructions: “Go here in the JSON, pull this data, and put it in that table.” The library’s name is a play on ETL (“Extract, Transform, Load”), which is the technical term for this set of operations.

Why Use etielle? (For Beginners)

JSON data from APIs is often deeply nested and requires complicated parsing. etielle helps by:

  • Traversing nested structures: Walk through arrays-within-dictionaries-within-arrays to any arbitrary depth.
  • Performing arbitrary transformations: Use built-in functions or define your own custom transforms.
  • Building relationships: Link records across your different output tables automatically.
  • Emitting to arbitrary formats: Emit data to Pydantic models, TypedDicts, or ORM objects directly.
  • Optionally loading data into a database: Load data into a database using SQLAlchemy or SQLModel.

Learning Path

  1. Quickstart: Quick and dirty introduction to etielle and how to use it.
  2. Introduction to ETL: The problem etielle is solving: JSON data ETL.
  3. Navigation: How to navigate through your JSON data with goto() and each().
  4. Transforms: Getting and altering values from the JSON data.
  5. Mapping Tables: Outputting data with Field and TempField.
  6. Relationships: Linking tables together with link_to().
  7. Database Loading: Persisting data with load() and run().

Installation

We recommend using uv for faster installs, but pip works too.

uv add etielle
# or
pip install etielle

Optional: Install with ORM adapters

If you plan to bind relationships and flush to your database via SQLAlchemy or SQLModel, install with the optional extra for your ORM:

uv add "etielle[sqlalchemy]"
# or
uv add "etielle[sqlmodel]"

Quick Start: Your First Mapping

Let’s start with a simple example. Suppose you have this JSON:

import json

data = {
  "users": [
    {"id": "u1", "name": "Alice", "posts": [{"id": "p1", "title": "Hello"}, {"id": "p2", "title": "World"}]},
    {"id": "u2", "name": "Bob", "posts": []}
  ]
}

We want two tables: “users” (id, name) and “posts” (id, user_id, title).

Here’s the code using the fluent API:

from etielle import etl, Field, TempField, get, get_from_parent

# Build and run the pipeline
result = (
    etl(data)
    # Extract users
    .goto("users").each()
    .map_to(table="users", fields=[
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    # Extract posts (nested under each user)
    .goto("posts").each()
    .map_to(table="posts", fields=[
        Field("id", get("id")),
        Field("user_id", get_from_parent("id")),  # Link to parent user
        Field("title", get("title")),
    ])
    .run()
)

# result.tables gives you dict access by table name
out = {table: list(rows.values()) for table, rows in result.tables.items()}
print(json.dumps(out, indent=2))
{
  "users": [
    {
      "id": "u1",
      "name": "Alice"
    },
    {
      "id": "u2",
      "name": "Bob"
    }
  ],
  "posts": [
    {
      "id": "p1",
      "user_id": "u1",
      "title": "Hello"
    },
    {
      "id": "p2",
      "user_id": "u1",
      "title": "World"
    }
  ]
}

Congrats! You’ve mapped your first JSON using the fluent E->T->L pattern.

Core Concepts: Breaking It Down

1. The etl() Entry Point

Everything starts with etl(data). This creates a pipeline builder that you chain methods onto:

from etielle import etl

result = (
    etl(data)           # Start with your JSON
    .goto("users")      # Navigate to a path
    .each()             # Iterate over items
    .map_to(...)        # Emit table rows
    .run()              # Execute and get results
)

2. Navigation: goto() and each()

Navigation tells etielle where to find your data:

  • goto(path): Navigate to a nested location (supports dot notation: "data.users")
  • each(): Iterate over items in a list or dict
# Navigate to data["response"]["users"] and iterate
.goto("response.users").each()

# Or use list syntax
.goto(["response", "users"]).each()

3. Fields and TempFields

When you call map_to(), you define what data to extract:

  • Field(name, transform): A column that appears in your output
  • TempField(name, transform): Used for joins/relationships, but NOT in output
.map_to(table="users", fields=[
    Field("id", get("id")),          # Output column (also available for joins)
    Field("name", get("name")),      # Output column
])

4. Transforms: Smart Data Extractors

Transforms are functions that pull values from the current context:

Transform Purpose
get("name") Get field from current node
get_from_parent("id") Get field from parent context
get_from_root("config.version") Get field from JSON root
literal(42) Constant value
key() Current dict key
index() Current list index
concat(a, b, c) Join strings
# Combine transforms
Field("full_id", concat(literal("user_"), get("id")))  # "user_u1"

5. Running the Pipeline

Call .run() to execute and get results:

result = pipeline.run()

# Access tables by name
users = result.tables["users"]    # Dict[tuple, dict]

# Or by model class (if using typed models)
users = result.tables[User]       # Dict[tuple, User]

# Check for errors
if result.errors:
    for table, errs in result.errors.items():
        print(f"{table}: {errs}")

Detailed Examples

Example 1: Merging Data from Multiple Paths

Merge user info from two parts of JSON using goto_root():

from etielle import etl, Field, TempField, get

data = {
    "users": [{"id": "u1", "name": "Alice"}],
    "profiles": [{"user_id": "u1", "email": "alice@example.com"}]
}

result = (
    etl(data)
    # First path: basic user data
    .goto("users").each()
    .map_to(table="users", join_on=["id"], fields=[
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    # Second path: profile data (same table, merged by id)
    .goto_root()
    .goto("profiles").each()
    .map_to(table="users", join_on=["id"], fields=[
        Field("email", get("email")),
        TempField("id", get("user_id"))  # Join key for merging
    ])
    .run()
)

user = list(result.tables["users"].values())[0]
print(user)  # Has id, name, AND email merged together
{'id': 'u1', 'name': 'Alice', 'email': 'alice@example.com'}

Example 2: Deep Nesting

Handle deeply nested structures with chained navigation:

# servers -> channels -> messages -> reactions (3 levels deep)
result = (
    etl(data)
    .goto("servers").each()
    .map_to(table="servers", fields=[...])

    .goto("channels").each()
    .map_to(table="channels", fields=[
        Field("server_id", get_from_parent("id", depth=1)),
        ...
    ])

    .goto("messages").each()
    .map_to(table="messages", fields=[
        Field("channel_id", get_from_parent("id", depth=1)),
        Field("server_id", get_from_parent("id", depth=2)),
        ...
    ])
    .run()
)

Example 3: Typed Output with Pydantic

Use model classes for validated, typed output:

from pydantic import BaseModel
from etielle import etl, Field, TempField, get

class User(BaseModel):
    id: str
    name: str
    email: str | None = None

data = {"users": [{"id": "u1", "name": "Alice"}]}

result = (
    etl(data)
    .goto("users").each()
    .map_to(table=User, fields=[  # Pass model class, not string
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    .run()
)

user = list(result.tables[User].values())[0]
print(f"Type: {type(user).__name__}, name: {user.name}")
Type: User, name: Alice

Transform Cheatsheet

Transform Purpose Example
get(path) From current node get("user.name")
get_from_parent(path, depth=1) From ancestor get_from_parent("id")
get_from_root(path) From JSON root get_from_root("version")
key() Current dict key When iterating {"a": 1, "b": 2}
index() Current list index 0, 1, 2, …
parent_key() Parent’s dict key Access parent iteration key
parent_index() Parent’s list index Access parent iteration index
node() Current node value The whole current object
literal(value) Constant literal(42)
concat(*parts) Join strings concat(get("first"), literal(" "), get("last"))
coalesce(*transforms) First non-None coalesce(get("nickname"), get("name"))
format_id(*parts, sep="_") Join with separator format_id(get("type"), get("id"))
len_of(inner) Length of list/dict/string len_of(get("tags"))

Common Mistakes

  • Empty results?
    • Check your goto() path matches the JSON structure exactly
    • Make sure you called .each() to iterate
  • Missing parent data?
    • Check the depth parameter in get_from_parent()
    • Ensure the parent context exists in your navigation chain
  • Duplicate or missing rows?
    • Verify TempField values are unique for each row
    • Check that join keys don’t contain None values

Next Steps

Glossary

  • Pipeline: The chain of operations from etl() to run()
  • Navigation: Methods like goto() and each() that position you in the JSON
  • Transform: A function that extracts values from the current context
  • Field: An output column in your table
  • TempField: A field used for joins/linking but not in final output
  • Join Key: Values that uniquely identify a row (derived from TempFields)

License

MIT

Need help? Open an issue on GitHub!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

etielle-3.3.0.tar.gz (98.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

etielle-3.3.0-py3-none-any.whl (35.8 kB view details)

Uploaded Python 3

File details

Details for the file etielle-3.3.0.tar.gz.

File metadata

  • Download URL: etielle-3.3.0.tar.gz
  • Upload date:
  • Size: 98.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etielle-3.3.0.tar.gz
Algorithm Hash digest
SHA256 904b6ea058059fae839188dddb0ca53b497344cfb1cf7fafd4d9245fd8c8e1b5
MD5 f4d8d6028c0c5efbbca682decd97f02b
BLAKE2b-256 7f5a13624aa446a7d66dca29829fe2cfbf7a7132333f18e06adc6597f56deff9

See more details on using hashes here.

Provenance

The following attestation bundles were made for etielle-3.3.0.tar.gz:

Publisher: release.yml on Promptly-Technologies-LLC/etielle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file etielle-3.3.0-py3-none-any.whl.

File metadata

  • Download URL: etielle-3.3.0-py3-none-any.whl
  • Upload date:
  • Size: 35.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etielle-3.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 30c0b2049dcacc530230f488eaaf05216fb68de1260a428ea4788f0bae93ff36
MD5 29c4f333069a26e92bb0078c0f5fdbec
BLAKE2b-256 ddef4f514f443f4a87c84d1e11a21d074587337b6cf45339a78a192cc54b88e5

See more details on using hashes here.

Provenance

The following attestation bundles were made for etielle-3.3.0-py3-none-any.whl:

Publisher: release.yml on Promptly-Technologies-LLC/etielle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page