Skip to main content

A declarative, type-safe Python DSL for mapping complex nested JSON to relational database schemas

Project description

Quickstart: Declarative JSON-to-Relational Mapping in Python

etielle is a simple, powerful Python library for reshaping nested JSON data, typically from an API, into relational tables that fit your database schema. Think of etielle as a “JSON extractor” that you program with clear instructions: “Go here in the JSON, pull this data, and put it in that table.” The library’s name is a play on ETL (“Extract, Transform, Load”), which is the technical term for this set of operations.

Why Use etielle? (For Beginners)

JSON data from APIs is often deeply nested and requires complicated parsing. etielle helps by:

  • Traversing nested structures: Walk through arrays-within-dictionaries-within-arrays to any arbitrary depth.
  • Performing arbitrary transformations: Use built-in functions or define your own custom transforms.
  • Building relationships: Link records across your different output tables automatically.
  • Emitting to arbitrary formats: Emit data to Pydantic models, TypedDicts, or ORM objects directly.
  • Optionally loading data into a database: Load data into a database using SQLAlchemy or SQLModel.

Learning Path

  1. Quickstart: Quick and dirty introduction to etielle and how to use it.
  2. Introduction to ETL: The problem etielle is solving: JSON data ETL.
  3. Navigation: How to navigate through your JSON data with goto() and each().
  4. Transforms: Getting and altering values from the JSON data.
  5. Mapping Tables: Outputting data with Field and TempField.
  6. Relationships: Linking tables together with link_to().
  7. Database Loading: Persisting data with load() and run().

Installation

We recommend using uv for faster installs, but pip works too.

uv add etielle
# or
pip install etielle

Optional: Install with ORM adapters

If you plan to bind relationships and flush to your database via SQLAlchemy or SQLModel, install with the optional extra for your ORM:

uv add "etielle[sqlalchemy]"
# or
uv add "etielle[sqlmodel]"

Quick Start: Your First Mapping

Let’s start with a simple example. Suppose you have this JSON:

import json

data = {
  "users": [
    {"id": "u1", "name": "Alice", "posts": [{"id": "p1", "title": "Hello"}, {"id": "p2", "title": "World"}]},
    {"id": "u2", "name": "Bob", "posts": []}
  ]
}

We want two tables: “users” (id, name) and “posts” (id, user_id, title).

Here’s the code using the fluent API:

from etielle import etl, Field, TempField, get, get_from_parent

# Build and run the pipeline
result = (
    etl(data)
    # Extract users
    .goto("users").each()
    .map_to(table="users", fields=[
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    # Extract posts (nested under each user)
    .goto("posts").each()
    .map_to(table="posts", fields=[
        Field("id", get("id")),
        Field("user_id", get_from_parent("id")),  # Link to parent user
        Field("title", get("title")),
    ])
    .run()
)

# result.tables gives you dict access by table name
out = {table: list(rows.values()) for table, rows in result.tables.items()}
print(json.dumps(out, indent=2))
{
  "users": [
    {
      "id": "u1",
      "name": "Alice"
    },
    {
      "id": "u2",
      "name": "Bob"
    }
  ],
  "posts": [
    {
      "id": "p1",
      "user_id": "u1",
      "title": "Hello"
    },
    {
      "id": "p2",
      "user_id": "u1",
      "title": "World"
    }
  ]
}

Congrats! You’ve mapped your first JSON using the fluent E->T->L pattern.

Core Concepts: Breaking It Down

1. The etl() Entry Point

Everything starts with etl(data). This creates a pipeline builder that you chain methods onto:

from etielle import etl

result = (
    etl(data)           # Start with your JSON
    .goto("users")      # Navigate to a path
    .each()             # Iterate over items
    .map_to(...)        # Emit table rows
    .run()              # Execute and get results
)

2. Navigation: goto() and each()

Navigation tells etielle where to find your data:

  • goto(path): Navigate to a nested location (supports dot notation: "data.users")
  • each(): Iterate over items in a list or dict
# Navigate to data["response"]["users"] and iterate
.goto("response.users").each()

# Or use list syntax
.goto(["response", "users"]).each()

3. Fields and TempFields

When you call map_to(), you define what data to extract:

  • Field(name, transform): A column that appears in your output
  • TempField(name, transform): Used for joins/relationships, but NOT in output
.map_to(table="users", fields=[
    Field("id", get("id")),          # Output column (also available for joins)
    Field("name", get("name")),      # Output column
])

4. Transforms: Smart Data Extractors

Transforms are functions that pull values from the current context:

Transform Purpose
get("name") Get field from current node
get_from_parent("id") Get field from parent context
get_from_root("config.version") Get field from JSON root
literal(42) Constant value
key() Current dict key
index() Current list index
concat(a, b, c) Join strings
# Combine transforms
Field("full_id", concat(literal("user_"), get("id")))  # "user_u1"

5. Running the Pipeline

Call .run() to execute and get results:

result = pipeline.run()

# Access tables by name
users = result.tables["users"]    # Dict[tuple, dict]

# Or by model class (if using typed models)
users = result.tables[User]       # Dict[tuple, User]

# Check for errors
if result.errors:
    for table, errs in result.errors.items():
        print(f"{table}: {errs}")

Detailed Examples

Example 1: Merging Data from Multiple Paths

Merge user info from two parts of JSON using goto_root():

from etielle import etl, Field, TempField, get

data = {
    "users": [{"id": "u1", "name": "Alice"}],
    "profiles": [{"user_id": "u1", "email": "alice@example.com"}]
}

result = (
    etl(data)
    # First path: basic user data
    .goto("users").each()
    .map_to(table="users", join_on=["id"], fields=[
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    # Second path: profile data (same table, merged by id)
    .goto_root()
    .goto("profiles").each()
    .map_to(table="users", join_on=["id"], fields=[
        Field("email", get("email")),
        TempField("id", get("user_id"))  # Join key for merging
    ])
    .run()
)

user = list(result.tables["users"].values())[0]
print(user)  # Has id, name, AND email merged together
{'id': 'u1', 'name': 'Alice', 'email': 'alice@example.com'}

Example 2: Deep Nesting

Handle deeply nested structures with chained navigation:

# servers -> channels -> messages -> reactions (3 levels deep)
result = (
    etl(data)
    .goto("servers").each()
    .map_to(table="servers", fields=[...])

    .goto("channels").each()
    .map_to(table="channels", fields=[
        Field("server_id", get_from_parent("id", depth=1)),
        ...
    ])

    .goto("messages").each()
    .map_to(table="messages", fields=[
        Field("channel_id", get_from_parent("id", depth=1)),
        Field("server_id", get_from_parent("id", depth=2)),
        ...
    ])
    .run()
)

Example 3: Typed Output with Pydantic

Use model classes for validated, typed output:

from pydantic import BaseModel
from etielle import etl, Field, TempField, get

class User(BaseModel):
    id: str
    name: str
    email: str | None = None

data = {"users": [{"id": "u1", "name": "Alice"}]}

result = (
    etl(data)
    .goto("users").each()
    .map_to(table=User, fields=[  # Pass model class, not string
        Field("id", get("id")),
        Field("name", get("name")),
    ])
    .run()
)

user = list(result.tables[User].values())[0]
print(f"Type: {type(user).__name__}, name: {user.name}")
Type: User, name: Alice

Transform Cheatsheet

Transform Purpose Example
get(path) From current node get("user.name")
get_from_parent(path, depth=1) From ancestor get_from_parent("id")
get_from_root(path) From JSON root get_from_root("version")
key() Current dict key When iterating {"a": 1, "b": 2}
index() Current list index 0, 1, 2, …
parent_key() Parent’s dict key Access parent iteration key
parent_index() Parent’s list index Access parent iteration index
node() Current node value The whole current object
literal(value) Constant literal(42)
concat(*parts) Join strings concat(get("first"), literal(" "), get("last"))
coalesce(*transforms) First non-None coalesce(get("nickname"), get("name"))
format_id(*parts, sep="_") Join with separator format_id(get("type"), get("id"))
len_of(inner) Length of list/dict/string len_of(get("tags"))

Common Mistakes

  • Empty results?
    • Check your goto() path matches the JSON structure exactly
    • Make sure you called .each() to iterate
  • Missing parent data?
    • Check the depth parameter in get_from_parent()
    • Ensure the parent context exists in your navigation chain
  • Duplicate or missing rows?
    • Verify TempField values are unique for each row
    • Check that join keys don’t contain None values

Next Steps

Glossary

  • Pipeline: The chain of operations from etl() to run()
  • Navigation: Methods like goto() and each() that position you in the JSON
  • Transform: A function that extracts values from the current context
  • Field: An output column in your table
  • TempField: A field used for joins/linking but not in final output
  • Join Key: Values that uniquely identify a row (derived from TempFields)

License

MIT

Need help? Open an issue on GitHub!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

etielle-3.6.0.tar.gz (119.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

etielle-3.6.0-py3-none-any.whl (40.4 kB view details)

Uploaded Python 3

File details

Details for the file etielle-3.6.0.tar.gz.

File metadata

  • Download URL: etielle-3.6.0.tar.gz
  • Upload date:
  • Size: 119.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etielle-3.6.0.tar.gz
Algorithm Hash digest
SHA256 0c36047eb7879ca3600243cbbaec09752ccac21e647dc4970559aa6902140329
MD5 08c4529d382640aa1683e66fdcbea578
BLAKE2b-256 8d6302fdee12faa3c983ba348bc65250761686b3419495a3bd810472b314ead1

See more details on using hashes here.

Provenance

The following attestation bundles were made for etielle-3.6.0.tar.gz:

Publisher: release.yml on Promptly-Technologies-LLC/etielle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file etielle-3.6.0-py3-none-any.whl.

File metadata

  • Download URL: etielle-3.6.0-py3-none-any.whl
  • Upload date:
  • Size: 40.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etielle-3.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7436295367d2ea2a5efea40c012321f762ca55f242676a158a4a7540dac49991
MD5 60e2a943191730fd78c3f29244f5ef13
BLAKE2b-256 557231bc26b33de122ef7fcb2fbbe0ff7e88ec24fd84ee6152d722163e1f87ce

See more details on using hashes here.

Provenance

The following attestation bundles were made for etielle-3.6.0-py3-none-any.whl:

Publisher: release.yml on Promptly-Technologies-LLC/etielle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page