Yggdrasil (Python)
Type-friendly utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. The package bundles enhanced dataclasses, casting utilities, and lightweight wrappers around Databricks and HTTP clients so Python/data engineers can focus on schemas instead of plumbing.
When to use this package
Use Yggdrasil when you need to:
- Convert payloads across dataframe engines without rewriting type logic for each backend.
- Define dataclasses that auto-coerce inputs, expose defaults, and surface Arrow schemas.
- Run Databricks SQL jobs or manage clusters with minimal boilerplate.
- Add resilient retries, concurrency helpers, and dependency guards to data pipelines.
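The auto-coercion idea behind these points can be sketched with the standard library alone. This is an illustration of the behavior, not Yggdrasil's implementation; `coerce_init` is a hypothetical helper:

```python
from dataclasses import dataclass, fields

def coerce_init(cls, *args, **kwargs):
    """Coerce positional/keyword values to each field's annotated type."""
    flds = fields(cls)
    named = dict(zip([f.name for f in flds], args))
    named.update(kwargs)
    types = {f.name: f.type for f in flds}
    converted = {name: types[name](value) for name, value in named.items()}
    return cls(**converted)  # fields not supplied fall back to their defaults

@dataclass
class User:
    id: int
    email: str
    active: bool = True

user = coerce_init(User, "123", email="alice@example.com")
print(user.id, user.active)  # 123 True
```

Yggdrasil's dataclasses wrap this kind of type-directed coercion behind `__safe_init__` and `from_dict`, as shown in the quickstart.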
Prerequisites
- Python 3.10+
- uv for virtualenv and dependency management.
Optional extras:
- `polars`, `pandas`, `pyarrow`, and `pyspark` for engine-specific conversions.
- `databricks-sdk` for workspace, SQL, jobs, and compute helpers.
- `msal` for Azure AD authentication when using `MSALSession`.
Installation
From the python/ directory:
```bash
uv venv .venv
source .venv/bin/activate
uv pip install -e .[dev]
```
Extras are grouped by engine:
- `.[polars]`, `.[pandas]`, `.[spark]`, `.[databricks]` – install only the integrations you need.
- `.[dev]` – adds testing, linting, and typing tools (`pytest`, `ruff`, `black`, `mypy`).
Quickstart
Define an Arrow-aware dataclass, coerce inputs, and cast across containers:
```python
from yggdrasil import yggdataclass
from yggdrasil.types.cast import convert
from yggdrasil.types import arrow_field_from_hint

@yggdataclass
class User:
    id: int
    email: str
    active: bool = True

# __safe_init__ coerces inputs to the annotated types before construction
user = User.__safe_init__("123", email="alice@example.com")
assert user.id == 123 and user.active is True

# from_dict applies the same coercion to an untyped payload
payload = {"id": "45", "email": "bob@example.com", "active": "false"}
clean = User.from_dict(payload)
print(clean.to_dict())

# Derive the Arrow schema straight from the dataclass hints
field = arrow_field_from_hint(User, name="user")
print(field)  # user: struct<id: int64, email: string, active: bool>

# convert casts containers element-wise according to the type hint
numbers = convert(["1", "2", "3"], list[int])
print(numbers)
```
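For intuition, a call like `convert(["1", "2", "3"], list[int])` behaves roughly like the following stdlib-only sketch of generic-hint dispatch (an illustration, not the package's actual casting registry):

```python
from typing import get_args, get_origin

def convert_sketch(value, hint):
    """Cast value to the hinted type, recursing into list element types."""
    origin = get_origin(hint)
    if origin is list:
        (elem_type,) = get_args(hint)
        return [convert_sketch(v, elem_type) for v in value]
    return hint(value)  # plain hint: call it as a constructor

print(convert_sketch(["1", "2", "3"], list[int]))  # [1, 2, 3]
```

The real registry also handles Arrow, Polars, pandas, and Spark containers, and lets you plug in your own rules via `register_converter`.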
Databricks example
Install the databricks extra and run SQL with typed results:
```python
from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine

ws = Workspace(host="https://<workspace-url>", token="<token>")
engine = SQLEngine(workspace=ws)

stmt = engine.execute("SELECT 1 AS value")
result = stmt.wait(engine)
tbl = result.arrow_table()
print(tbl.to_pandas())
```
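The `execute`/`wait` pair above is a submit-then-poll flow: the statement runs asynchronously and `wait` blocks until it finishes. Conceptually it reduces to something like this stub (the names here are illustrative, not the Databricks SDK API):

```python
import time

class FakeStatement:
    """Stand-in for a submitted statement that finishes after a few polls."""
    def __init__(self, polls_needed: int):
        self._polls_left = polls_needed

    def status(self) -> str:
        self._polls_left -= 1
        return "SUCCEEDED" if self._polls_left <= 0 else "RUNNING"

def wait_until_done(stmt, interval: float = 0.01, timeout: float = 1.0) -> str:
    """Poll the statement until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if stmt.status() == "SUCCEEDED":
            return "SUCCEEDED"
        time.sleep(interval)
    raise TimeoutError("statement did not finish in time")

print(wait_until_done(FakeStatement(polls_needed=3)))  # SUCCEEDED
```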
Parallel processing and retries
```python
from yggdrasil.pyutils import parallelize, retry

@parallelize(max_workers=4)
def square(x):
    return x * x

@retry(tries=5, delay=0.2, backoff=2)
def sometimes_fails(value: int) -> int:
    ...

print(list(square(range(5))))
```
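The `retry` semantics (a fixed number of tries, an initial delay, multiplicative backoff between attempts) can be sketched in plain Python — a minimal illustration, not the package's implementation:

```python
import functools
import time

def retry_sketch(tries: int = 5, delay: float = 0.2, backoff: float = 2.0):
    """Retry the wrapped function up to `tries` times, growing the wait each attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise          # out of attempts: re-raise the last error
                    time.sleep(wait)
                    wait *= backoff    # exponential backoff between attempts
        return wrapper
    return decorator

calls = {"n": 0}

@retry_sketch(tries=3, delay=0.01)
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky())  # ok (after two failed attempts)
```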
Project layout
- `yggdrasil/dataclasses` – `yggdataclass` decorator plus Arrow schema helpers.
- `yggdrasil/types` – casting registry (`convert`, `register_converter`), Arrow inference, and default generators.
- `yggdrasil/libs` – optional bridges to Polars, pandas, Spark, and Databricks SDK types.
- `yggdrasil/databricks` – workspace, SQL, jobs, and compute helpers built on the Databricks SDK.
- `yggdrasil/requests` – retry-capable HTTP sessions and Azure MSAL auth helpers.
- `yggdrasil/pyutils` – concurrency and retry decorators.
- `yggdrasil/ser` – serialization helpers and dependency inspection utilities.
- `tests/` – pytest-based coverage for conversions, dataclasses, requests, and platform helpers.
Testing
From python/:
```bash
pytest
```
Optional checks when developing:
```bash
ruff check
black .
mypy
```
Troubleshooting and common pitfalls
- Missing optional dependency: install the matching extra (e.g., `uv pip install -e .[polars]`) or wrap calls with `require_polars`/`require_pyspark` from `yggdrasil.libs`.
- Schema mismatches: use `arrow_field_from_hint` and `CastOptions` to enforce the expected Arrow metadata when casting.
- Databricks auth: provide `host` and `token` to `Workspace`. For Azure, ensure environment variables align with your workspace deployment.
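The `require_*` guards follow a common pattern: check importability up front and raise a clear error naming the extra to install. A generic stdlib sketch of that pattern (the `require` helper and its message are illustrative, not the library's own):

```python
import importlib.util

def require(module: str, extra: str) -> None:
    """Raise a helpful error if an optional dependency is missing."""
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"{module!r} is not installed; run `uv pip install -e .[{extra}]`"
        )

require("json", "stdlib-demo")   # present in the stdlib: passes silently
try:
    require("polars", "polars")  # raises only if polars is absent
except ImportError as exc:
    print(exc)
```

Guarding at call time like this turns a cryptic `ModuleNotFoundError` deep in a pipeline into an actionable message at the point of use.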
Contributing
- Fork and branch.
- Install with `uv pip install -e .[dev]`.
- Run tests and linters.
- Submit a PR describing the change and any new examples added to the docs.