Skip to main content

A declarative, type-safe Python DSL for mapping complex nested JSON to relational database schemas

Project description

Quickstart: Declarative JSON-to-Relational Mapping in Python

etielle is a simple, powerful Python library for reshaping nested JSON data, typically from an API, into relational tables that fit your database schema. Think of etielle as a “JSON extractor” that you program with clear instructions: “Go here in the JSON, pull this data, and put it in that table.” The library’s name is a play on ETL (“Extract, Transform, Load”), which is the technical term for this set of operations.

Why Use etielle? (For Beginners)

JSON data from APIs is often deeply nested and requires complicated parsing. etielle helps by:

  • Traversing nested structures: Walk through arrays-within-dictionaries-within-arrays to any arbitrary depth.
  • Performing arbitrary transformations: Use built-in functions or define your own custom transforms.
  • Building relationships: Link records across your different output tables automatically.
  • Emitting to arbitrary formats: Emit data to Pydantic models, TypedDicts, or ORM objects directly.
  • Optionally loading data into a database: Load data into a database using SQLAlchemy or SQLModel.

Learning Path

  1. Quickstart: Quick and dirty introduction to etielle and how to use it.
  2. Introduction to ETL: The problem etielle is solving: JSON data ETL.
  3. Navigation: How to navigate through your JSON data with goto() and each().
  4. Transforms: Getting and altering values from the JSON data.
  5. Mapping Tables: Outputting data with Field and TempField.
  6. Relationships: Linking tables together with link_to().
  7. Database Loading: Persisting data with load() and run().

Installation

We recommend using uv for faster installs, but pip works too.

uv add etielle
# or
pip install etielle

Optional: Install with ORM adapters

If you plan to bind relationships and flush to your database via SQLAlchemy or SQLModel, install with the optional extra for your ORM:

uv add "etielle[sqlalchemy]"
# or
uv add "etielle[sqlmodel]"

Quick Start: Your First Mapping

Let’s start with a simple example. Suppose you have this JSON:

import json

data = {
  "users": [
    {"id": "u1", "name": "Alice", "posts": [{"id": "p1", "title": "Hello"}, {"id": "p2", "title": "World"}]},
    {"id": "u2", "name": "Bob", "posts": []}
  ]
}

We want two tables: “users” (id, name) and “posts” (id, user_id, title).

Here’s the code using the fluent API:

from etielle import etl, Field, TempField, get, get_from_parent

# Build and run the pipeline
result = (
    etl(data)
    # Extract users
    .goto("users").each()
    .map_to(table="users", fields=[
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    # Extract posts (nested under each user)
    .goto("posts").each()
    .map_to(table="posts", fields=[
        Field("id", get("id")),
        Field("user_id", get_from_parent("id")),  # Link to parent user
        Field("title", get("title")),
    ])
    .run()
)

# result.tables gives you dict access by table name
out = {table: list(rows.values()) for table, rows in result.tables.items()}
print(json.dumps(out, indent=2))
{
  "users": [
    {
      "id": "u1",
      "name": "Alice"
    },
    {
      "id": "u2",
      "name": "Bob"
    }
  ],
  "posts": [
    {
      "id": "p1",
      "user_id": "u1",
      "title": "Hello"
    },
    {
      "id": "p2",
      "user_id": "u1",
      "title": "World"
    }
  ]
}

Congrats! You’ve mapped your first JSON using the fluent E->T->L pattern.

Core Concepts: Breaking It Down

1. The etl() Entry Point

Everything starts with etl(data). This creates a pipeline builder that you chain methods onto:

from etielle import etl

result = (
    etl(data)           # Start with your JSON
    .goto("users")      # Navigate to a path
    .each()             # Iterate over items
    .map_to(...)        # Emit table rows
    .run()              # Execute and get results
)

2. Navigation: goto() and each()

Navigation tells etielle where to find your data:

  • goto(path): Navigate to a nested location (supports dot notation: "data.users")
  • each(): Iterate over items in a list or dict
# Navigate to data["response"]["users"] and iterate
.goto("response.users").each()

# Or use list syntax
.goto(["response", "users"]).each()

3. Fields and TempFields

When you call map_to(), you define what data to extract:

  • Field(name, transform): A column that appears in your output
  • TempField(name, transform): Used for joins/relationships, but NOT in output
.map_to(table="users", fields=[
    Field("id", get("id")),          # Output column (also available for joins)
    Field("name", get("name")),      # Output column
])

4. Transforms: Smart Data Extractors

Transforms are functions that pull values from the current context:

Transform Purpose
get("name") Get field from current node
get_from_parent("id") Get field from parent context
get_from_root("config.version") Get field from JSON root
literal(42) Constant value
key() Current dict key
index() Current list index
concat(a, b, c) Join strings
# Combine transforms
Field("full_id", concat(literal("user_"), get("id")))  # "user_u1"

5. Running the Pipeline

Call .run() to execute and get results:

result = pipeline.run()

# Access tables by name
users = result.tables["users"]    # Dict[tuple, dict]

# Or by model class (if using typed models)
users = result.tables[User]       # Dict[tuple, User]

# Check for errors
if result.errors:
    for table, errs in result.errors.items():
        print(f"{table}: {errs}")

Detailed Examples

Example 1: Merging Data from Multiple Paths

Merge user info from two parts of JSON using goto_root():

from etielle import etl, Field, TempField, get

data = {
    "users": [{"id": "u1", "name": "Alice"}],
    "profiles": [{"user_id": "u1", "email": "alice@example.com"}]
}

result = (
    etl(data)
    # First path: basic user data
    .goto("users").each()
    .map_to(table="users", join_on=["id"], fields=[
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    # Second path: profile data (same table, merged by id)
    .goto_root()
    .goto("profiles").each()
    .map_to(table="users", join_on=["id"], fields=[
        Field("email", get("email")),
        TempField("id", get("user_id"))  # Join key for merging
    ])
    .run()
)

user = list(result.tables["users"].values())[0]
print(user)  # Has id, name, AND email merged together
{'id': 'u1', 'name': 'Alice', 'email': 'alice@example.com'}

Example 2: Deep Nesting

Handle deeply nested structures with chained navigation:

# servers -> channels -> messages -> reactions (3 levels deep)
result = (
    etl(data)
    .goto("servers").each()
    .map_to(table="servers", fields=[...])

    .goto("channels").each()
    .map_to(table="channels", fields=[
        Field("server_id", get_from_parent("id", depth=1)),
        ...
    ])

    .goto("messages").each()
    .map_to(table="messages", fields=[
        Field("channel_id", get_from_parent("id", depth=1)),
        Field("server_id", get_from_parent("id", depth=2)),
        ...
    ])
    .run()
)

Example 3: Typed Output with Pydantic

Use model classes for validated, typed output:

from pydantic import BaseModel
from etielle import etl, Field, TempField, get

class User(BaseModel):
    id: str
    name: str
    email: str | None = None

data = {"users": [{"id": "u1", "name": "Alice"}]}

result = (
    etl(data)
    .goto("users").each()
    .map_to(table=User, fields=[  # Pass model class, not string
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    .run()
)

user = list(result.tables[User].values())[0]
print(f"Type: {type(user).__name__}, name: {user.name}")
Type: User, name: Alice

Transform Cheatsheet

Transform Purpose Example
get(path) From current node get("user.name")
get_from_parent(path, depth=1) From ancestor get_from_parent("id")
get_from_root(path) From JSON root get_from_root("version")
key() Current dict key When iterating {"a": 1, "b": 2}
index() Current list index 0, 1, 2, …
parent_key() Parent’s dict key Access parent iteration key
parent_index() Parent’s list index Access parent iteration index
node() Current node value The whole current object
literal(value) Constant literal(42)
concat(*parts) Join strings concat(get("first"), literal(" "), get("last"))
coalesce(*transforms) First non-None coalesce(get("nickname"), get("name"))
format_id(*parts, sep="_") Join with separator format_id(get("type"), get("id"))
len_of(inner) Length of list/dict/string len_of(get("tags"))

Common Mistakes

  • Empty results?
    • Check your goto() path matches the JSON structure exactly
    • Make sure you called .each() to iterate
  • Missing parent data?
    • Check the depth parameter in get_from_parent()
    • Ensure the parent context exists in your navigation chain
  • Duplicate or missing rows?
    • Verify TempField values are unique for each row
    • Check that join keys don’t contain None values

Next Steps

Glossary

  • Pipeline: The chain of operations from etl() to run()
  • Navigation: Methods like goto() and each() that position you in the JSON
  • Transform: A function that extracts values from the current context
  • Field: An output column in your table
  • TempField: A field used for joins/linking but not in final output
  • Join Key: Values that uniquely identify a row (derived from TempFields)

License

MIT

Need help? Open an issue on GitHub!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

etielle-2.6.0.tar.gz (80.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

etielle-2.6.0-py3-none-any.whl (33.0 kB view details)

Uploaded Python 3

File details

Details for the file etielle-2.6.0.tar.gz.

File metadata

  • Download URL: etielle-2.6.0.tar.gz
  • Upload date:
  • Size: 80.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etielle-2.6.0.tar.gz
Algorithm Hash digest
SHA256 1f11294a614eeb973f8a29eee15d6a783e0807cac8dee9a277bef137f082c6f2
MD5 d437bc5beecbcc32a551c9d74d34c736
BLAKE2b-256 e0c95df4ed9760c7f26b84761db00dce58cac4b8048a8acd6098a6bb866749d6

See more details on using hashes here.

Provenance

The following attestation bundles were made for etielle-2.6.0.tar.gz:

Publisher: release.yml on Promptly-Technologies-LLC/etielle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file etielle-2.6.0-py3-none-any.whl.

File metadata

  • Download URL: etielle-2.6.0-py3-none-any.whl
  • Upload date:
  • Size: 33.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etielle-2.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 26ba4f324b35545887423068bd0cbb41d6682132818ac66011ba9a462086f2ae
MD5 cabebe26573711ec5afd46a45cb5e175
BLAKE2b-256 f5f81944b2c32cbf22d04d7b232acc83ea4e7249e0010f3b1d74e4dc8cf58516

See more details on using hashes here.

Provenance

The following attestation bundles were made for etielle-2.6.0-py3-none-any.whl:

Publisher: release.yml on Promptly-Technologies-LLC/etielle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page