A declarative, type-safe Python DSL for mapping complex nested JSON to relational database schemas
Project description
Quickstart: Declarative JSON-to-Relational Mapping in Python with
etielle
etielle is a simple, powerful Python library for reshaping nested
JSON data, typically from an API,
into relational tables that fit your database schema. Think of etielle
as a “JSON extractor” that you program with clear instructions: “Go here
in the JSON, pull this data, and put it in that table.” The library’s
name is a play on ETL (“Extract, Transform,
Load”), which is the technical term for
this set of operations.
- Repository: Promptly-Technologies-LLC/etielle
- PyPI:
etielle - Python: ≥ 3.13
Why Use etielle? (For Beginners)
JSON data from APIs (Application Program Interfaces — web services that
typically return JSON) is often deeply nested and requires complicated
parsing. etielle helps by:
- Traversing nested structures: Walk through arrays-within-dictionaries-within-arrays to any arbitrary depth.
- Performing arbitrary transformations: Use the provided functions to perform common operations (like getting the key or index of the current item or its parent), or define your own custom ones.
- Building relationships: Link records across your different output tables and emit ORM relationships or foreign keys.
- Emitting to arbitrary formats: Emit data to Pydantic models, TypedDicts, or ORM objects directly instead of plain dicts, with validation and type safety.
- Optionally loading data into a database: Load data into a database using SQLAlchemy or SQLModel with performant one-shot flushing.
Learning Path
- Quickstart: Quick and dirty introduction to
etielleand how to use it. - Introduction to ETL: The problem
etielleis solving: JSON data ETL (Extract, Transform, and Load). - Traversals: How to tell
etiellehow to traverse your JSON data. - Transforms: Getting and altering values from the JSON data and mapping them in a type-safe way to your output tables.
- Emissions: Outputting data to dictionaries, TypedDicts, Pydantic models, or ORM objects, with merge logic to construct single rows from different parts of the input JSON data.
- Database upserts: Optionally, creating relationships in memory and flushing data into a database with performant one-shot flushing.
Installation
We recommend using uv for faster installs, but pip works too.
uv add etielle
# or
pip install etielle
Optional: Install with ORM adapters
If you plan to bind relationships and flush to your database via SQLAlchemy or SQLModel, install with the optional extra for your ORM:
uv add "etielle[sqlalchemy]"
# or
uv add "etielle[sqlmodel]"
Quick Start: Your First Mapping
Let’s start with a simple example. Suppose you have this JSON:
import json
data = {
"users": [
{"id": "u1", "name": "Alice", "posts": [{"id": "p1", "title": "Hello"}, {"id": "p2", "title": "World"}]},
{"id": "u2", "name": "Bob", "posts": []}
]
}
We want two tables: “users” (id, name) and “posts” (id, user_id, title).
Here’s the code:
from etielle.core import MappingSpec, TraversalSpec, TableEmit, Field # Core building blocks
from etielle.transforms import get, get_from_parent # Functions to pull data from JSON
from etielle.executor import run_mapping # The engine that runs everything
# A TraversalSpec tells etielle how to walk through your JSON. Think of it as
# giving directions: "Start at the 'users' key, then loop through each item in that array."
# Traverse users array
users_traversal = TraversalSpec(
path=["users"], # Path to the array
mode="auto", # Iterate automatically based on container
emits=[
# The join_keys identify each unique row—like a primary key in a database.
# Rows with matching keys will be merged together.
TableEmit(
table="users",
join_keys=[get("id")], # Unique key for the row
fields=[
Field("id", get("id")),
Field("name", get("name"))
]
)
]
)
# This second traversal is nested: first we navigate to each user,
# then for each user we go into their posts array using inner_path.
posts_traversal = TraversalSpec(
path=["users"],
mode="auto",
inner_path=["posts"], # Nested path inside each user
inner_mode="auto",
emits=[
TableEmit(
table="posts",
join_keys=[get("id")],
fields=[
Field("id", get("id")),
Field("user_id", get_from_parent("id")), # Link to parent user
Field("title", get("title"))
]
)
]
)
spec = MappingSpec(traversals=[users_traversal, posts_traversal])
result = run_mapping(data, spec)
# result is a dict: {"users": MappingResult, "posts": MappingResult}
# Each MappingResult has .instances (a dict keyed by join_keys)
# Let's convert to simple lists for display:
out = {table: list(mr.instances.values()) for table, mr in result.items()}
print(json.dumps(out, indent=2))
{
"users": [
{
"id": "u1",
"name": "Alice"
},
{
"id": "u2",
"name": "Bob"
}
],
"posts": [
{
"id": "p1",
"user_id": "u1",
"title": "Hello"
},
{
"id": "p2",
"user_id": "u1",
"title": "World"
}
]
}
Congrats! You’ve mapped your first JSON.
Core Concepts: Breaking It Down
Let’s explain the building blocks like you’re learning for the first time.
1. Context: Your “Location” in the JSON
Imagine traversing a JSON tree—Context is your GPS:
root: The entire JSON.node: The current spot (e.g., a user object).path: Directions to get here (e.g., (“users”, 0)).parent: The previous spot (for looking “up”).key/index: If in a dict/list, the current key or index.slots: A notepad for temporary notes.
Contexts are created automatically as you traverse and are immutable (unchangeable) for safety.
2. Transforms: Smart Data Extractors
Transforms are like mini-functions that pull values from Context. They’re “lazy”—they don’t run until needed, and they adapt to the current Context.
Examples:
get("name"): Get “name” from current node →"Alice"when node is{"name": "Alice"}get_from_parent("id"): Get “id” from parent context →"u1"when processing a post under user u1index(): Current list position →0for first item,1for second, etc.concat(literal("user_"), get("id")): Combine strings →"user_u1"
Full list in the Cheatsheet below.
3. TraversalSpec: How to Walk the JSON
This says: “Start here, then go deeper if needed, and do this for each item.”
path: Starting path (list of strings, e.g., [“users”]).mode: Iteration mode for the outer container: “auto” (default), “items”, or “single”.inner_path: Optional deeper path (e.g., [“posts”] for nesting).inner_mode: Iteration mode for the inner container: “auto” (default), “items”, or “single”.emits: What tables to create from each item.
You can have multiple Traversals in one MappingSpec—they run independently.
Here’s a visual representation of how traversals work:
JSON structure:
root
└── users [] ← path=["users"]
├── [0] {"id": "u1", ...}
│ └── posts [] ← inner_path=["posts"]
│ ├── [0] {"id": "p1", "title": "Hello"}
│ └── [1] {"id": "p2", "title": "World"}
└── [1] {"id": "u2", ...}
4. TableEmit and Fields: Building Your Tables
table: Name of the table.fields: List of Field(name, transform) – columns and how to compute them.join_keys: List of transforms for unique row IDs (like primary keys). Same keys across traversals merge rows.
5. Executor: Running It All
run_mapping(json_data, spec) executes everything and returns a dict of
tables.
Detailed Examples
Example 1: Composite Keys for Merging Data
Merge user info from two parts of JSON:
spec = MappingSpec(traversals=[
TraversalSpec( # Basic user data
path=["users"],
mode="auto",
emits=[TableEmit(
table="users",
join_keys=[get("id")],
fields=[Field("id", get("id")), Field("name", get("name"))]
)]
),
TraversalSpec( # Add email from another section
path=["profiles"],
mode="auto",
emits=[TableEmit(
table="users", # Same table!
join_keys=[get("user_id")], # Matches previous keys
fields=[Field("email", get("email"))]
)]
)
])
Rows with matching keys merge: e.g., add “email” to existing user row.
Example 2: Deep Nesting (Arbitrary Depth)
No limit to depth—use longer inner_path. The depth parameter
controls how many levels up to look:
get_from_parent("id")ordepth=1→ immediate parentget_from_parent("id", depth=2)→ grandparentget_from_parent("id", depth=3)→ great-grandparent
spec = MappingSpec(traversals=[
TraversalSpec(
path=["servers"],
mode="auto",
inner_path=["channels", "messages", "reactions"], # 3 levels deep!
inner_mode="auto",
emits=[TableEmit(
table="reactions",
join_keys=[get_from_parent("id", depth=3), get_from_parent("id", depth=2), get_from_parent("id"), get("id")],
fields=[
Field("server_id", get_from_parent("id", depth=3)),
Field("channel_id", get_from_parent("id", depth=2)),
Field("message_id", get_from_parent("id")),
Field("reaction", get("emoji"))
]
)]
)
])
Transform Cheatsheet
get(path): From current node (dot notation or list, e.g., “user.name” or [“user”, 0]).get_from_parent(path, depth=1): From ancestor.get_from_root(path): From top-level JSON.key(): Current dict key.index(): Current list index.literal(value): Constant value.concat(*parts): Join strings.format_id(*parts, sep="_"): Join non-empty parts with separator.coalesce(*transforms): First non-None value.len_of(inner): Length of a list/dict/string.
Pro Tip: Transforms are lazy—they run in the “context” of where they’re used, making them super flexible.
Transforms compose naturally:
user_key = concat(literal("user_"), get("id")) # "user_123"
full_name = concat(get("first"), literal(" "), get("last")) # "Alice Smith"
Common Mistakes
- Empty results?
- Check your
pathmatches the JSON structure exactly - Verify the data type at that path matches expectations
- Check your
- Missing parent data?
- Check the
depthparameter inget_from_parent() - Ensure the parent context exists in your traversal
- Check the
- Duplicate or missing rows?
- Verify
join_keysare unique for each row - Check that join_keys don’t contain
Nonevalues (these rows are skipped)
- Verify
Advanced Topics
- Lazy Evaluation: Transforms don’t compute until executed, adapting to the current spot in JSON.
- Custom Transforms: Define your own functions that take Context and return values. See Transforms.
- Row Merging Rules: Last write wins for duplicate fields; missing keys skip rows.
- Type-safe field selectors: IDE autocomplete and compile-time typo detection. See Developing with etielle.
- Typed emissions: Build Pydantic/TypedDict/ORM instances directly instead of dicts. See Emissions.
- Merge policies: Sum/append/min/max instead of overwrite when multiple traversals update the same field. See Emissions.
- Error reporting: Per-key diagnostics in results. See Developing with etielle.
- Relationships without extra round trips: Bind in-memory, flush once. See Relationships and Database upserts.
- Performance: Efficient for large JSON; traversals are independent.
Roadmap Ideas
- Database integrations (e.g., SQLAlchemy).
- More examples and benchmarks.
- Visual mapping tools.
Glossary
- Context: Your current position while traversing the JSON tree
- Transform: A function that extracts values from a Context
- Traversal: Instructions for walking through part of the JSON
- Emit: Creating a table row from the current context
- Join keys: Values that uniquely identify a row (like primary keys)
- Depth: How many parent levels to traverse upward
License
MIT
Need help? Open an issue on GitHub!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file etielle-2.3.2.tar.gz.
File metadata
- Download URL: etielle-2.3.2.tar.gz
- Upload date:
- Size: 48.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d96b958405a2103e47eb6e0a9abb03a56505f24d57ce84d9e8df09e0e3b1f7ba
|
|
| MD5 |
6c731cdc33792c3d0f76285de7c59d89
|
|
| BLAKE2b-256 |
025df8f38b2be40278cf1f816fb8233a618a07acfb69cd5bd92fadd2c36a860d
|
Provenance
The following attestation bundles were made for etielle-2.3.2.tar.gz:
Publisher:
release.yml on Promptly-Technologies-LLC/etielle
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
etielle-2.3.2.tar.gz -
Subject digest:
d96b958405a2103e47eb6e0a9abb03a56505f24d57ce84d9e8df09e0e3b1f7ba - Sigstore transparency entry: 731355654
- Sigstore integration time:
-
Permalink:
Promptly-Technologies-LLC/etielle@d2fd107b80907a13a0f22f891d6aa8fc285c7a35 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/Promptly-Technologies-LLC
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d2fd107b80907a13a0f22f891d6aa8fc285c7a35 -
Trigger Event:
workflow_run
-
Statement type:
File details
Details for the file etielle-2.3.2-py3-none-any.whl.
File metadata
- Download URL: etielle-2.3.2-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
003440bc347fde834b1da84e43fb8354832e710af1ceb6d92f79ed6c522626f4
|
|
| MD5 |
f64d434d140be0a2e582a30eaa1c87eb
|
|
| BLAKE2b-256 |
aff1ce81bc620887fd0de54dbcc0ec1c26686694a4cc90b8c5bd2f6908ba19d0
|
Provenance
The following attestation bundles were made for etielle-2.3.2-py3-none-any.whl:
Publisher:
release.yml on Promptly-Technologies-LLC/etielle
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
etielle-2.3.2-py3-none-any.whl -
Subject digest:
003440bc347fde834b1da84e43fb8354832e710af1ceb6d92f79ed6c522626f4 - Sigstore transparency entry: 731355656
- Sigstore integration time:
-
Permalink:
Promptly-Technologies-LLC/etielle@d2fd107b80907a13a0f22f891d6aa8fc285c7a35 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/Promptly-Technologies-LLC
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d2fd107b80907a13a0f22f891d6aa8fc285c7a35 -
Trigger Event:
workflow_run
-
Statement type: