Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks
Yggdrasil (Python)
Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks. The package bundles dataclass helpers, casting utilities, and light wrappers around Databricks and HTTP clients so Python/data engineers can focus on schemas instead of plumbing.
Features
- @yggdataclass decorator that adds safe init/from/to helpers and Arrow schema awareness.
- Rich conversion registry to cast between Python types, Arrow, Polars, Pandas, and Spark objects.
- Arrow type inference from Python type hints and sensible default values for common dtypes.
- Parallelization and retry utilities for robust data pipelines.
- Databricks helpers for SQL execution, workspace file management, jobs, and compute interactions.
- HTTP sessions with built-in retries plus optional Azure MSAL authentication.
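To make the "conversion registry" idea concrete, here is a minimal standard-library sketch of how such a registry can work. It is not yggdrasil's implementation: the names `register`, `convert_value`, and `_CONVERTERS` are hypothetical and chosen only for illustration, and the real package also handles Arrow, Polars, Pandas, and Spark types.

```python
from typing import Any, Callable

# Hypothetical registry: maps (source type, target type) to a converter function.
_CONVERTERS: dict[tuple[type, type], Callable[[Any], Any]] = {}


def register(source: type, target: type):
    """Decorator that stores a converter under its (source, target) pair."""
    def decorator(func):
        _CONVERTERS[(source, target)] = func
        return func
    return decorator


def convert_value(value: Any, target: type) -> Any:
    """Look up a converter by the value's exact type and the target type."""
    if isinstance(value, target):
        return value  # already an instance of the target type
    try:
        converter = _CONVERTERS[(type(value), target)]
    except KeyError:
        raise TypeError(
            f"no converter from {type(value).__name__} to {target.__name__}"
        ) from None
    return converter(value)


@register(str, int)
def _str_to_int(value: str) -> int:
    return int(value.strip())


@register(str, bool)
def _str_to_bool(value: str) -> bool:
    # bool("false") is True in Python, so the text is parsed explicitly.
    text = value.strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    raise ValueError(f"cannot interpret {value!r} as bool")
```

Keying the lookup on the type pair keeps each converter small and lets new pairs be added without touching the dispatch function.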
Installation
Requirements: Python 3.10+ and uv.
# from the python/ directory
uv venv .venv
source .venv/bin/activate
uv pip install -e .[dev]
The editable install makes it easy to iterate locally. The [dev] extra adds pytest, black, ruff, and mypy for development.
Quickstart
Import the package and use the provided helpers to define dataclasses and perform typed conversions.
from yggdrasil import yggdataclass, convert
from yggdrasil.types import arrow_field_from_hint
@yggdataclass
class User:
    id: int
    email: str
    active: bool = True
# Safe construction with type conversion and defaults
user = User.__safe_init__("123", email="alice@example.com")
assert user.id == 123
# Convert incoming payloads to typed instances
payload = {"id": "45", "email": "bob@example.com", "active": "false"}
clean = User.from_dict(payload)
# Arrow schema from type hints
field = User.__arrow_field__(name="user")
print(field) # user: struct<id: int64, email: string, active: bool>
# Cast between types
from yggdrasil.types.cast import convert
converted = convert(["1", "2", "3"], list[int])
# Parallelize a function over an iterable
from yggdrasil.pyutils import parallelize
@parallelize(max_workers=4)
def square(x):
    return x * x
results = list(square(range(5))) # [0, 1, 4, 9, 16]
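For readers curious what "safe" construction from an untyped payload involves, the following is a rough sketch using only the standard library. It is an assumption-laden illustration, not how yggdataclass works internally: `from_dict_sketch` and `_coerce` are hypothetical helpers, and only int, str, and bool fields are handled.

```python
from dataclasses import dataclass, fields
from typing import Any, get_type_hints


def _coerce(value: Any, target: type) -> Any:
    """Coerce a value to a simple builtin type (hypothetical helper)."""
    if isinstance(value, target):
        return value
    if target is bool and isinstance(value, str):
        # bool("false") would be True, so parse the text explicitly.
        text = value.strip().lower()
        if text in {"true", "1"}:
            return True
        if text in {"false", "0"}:
            return False
        raise ValueError(f"cannot interpret {value!r} as bool")
    return target(value)


@dataclass
class User:
    id: int
    email: str
    active: bool = True


def from_dict_sketch(cls, payload: dict[str, Any]):
    """Build a dataclass instance, coercing each supplied field to its hint."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name in payload:  # missing keys fall back to dataclass defaults
            kwargs[f.name] = _coerce(payload[f.name], hints[f.name])
    return cls(**kwargs)


user = from_dict_sketch(User, {"id": "45", "email": "bob@example.com", "active": "false"})
```

The explicit boolean parsing is the part most often missed in hand-written loaders, since calling `bool` on any non-empty string returns True.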
Databricks example
from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine
ws = Workspace(host="https://<workspace-url>", token="<token>")
engine = SQLEngine(workspace=ws)
stmt = engine.execute("SELECT 1 AS value")
result = stmt.wait(engine)
tbl = result.arrow_table()
print(tbl.to_pandas())
Configuration
- MSALAuth and MSALSession pull Azure credentials from environment variables such as AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID, and AZURE_SCOPES.
- Databricks helpers accept host/token or workspace configuration arguments; see yggdrasil.databricks.workspaces.Workspace for details.
- Casting utilities accept CastOptions for defaults and Arrow metadata when converting.
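Before constructing an MSAL-backed session, it can help to confirm that the environment is populated. The sketch below checks the four variable names listed above using only the standard library. Treating all four as required is an assumption (the text only says "such as"), and `missing_azure_vars` is a hypothetical helper, not part of yggdrasil.

```python
import os

# Names taken from the list above; whether each one is mandatory is an assumption.
EXPECTED_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_TENANT_ID",
    "AZURE_SCOPES",
)


def missing_azure_vars(environ=None) -> list[str]:
    """Return the expected names that are unset or empty in the mapping."""
    env = os.environ if environ is None else environ
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print("Missing Azure settings:", ", ".join(missing))
```

Accepting an optional mapping makes the check easy to exercise in tests without touching the real process environment.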
Project structure
- yggdrasil/dataclasses – yggdataclass decorator with safe init/from/to helpers and Arrow schema support.
- yggdrasil/types – Conversion registry (convert, register_converter), Arrow type inference, and default value helpers.
- yggdrasil/libs – Optional bridges to Polars, Pandas, Spark, and Databricks SDK types.
- yggdrasil/databricks – Workspace, SQL, jobs, and compute helpers built on the Databricks SDK.
- yggdrasil/requests – Retry-capable HTTP sessions and Azure MSAL auth helpers.
- yggdrasil/pyutils – Utility decorators for parallelism and retries.
- yggdrasil/ser – Serialization helpers and dependency inspection utilities.
- tests/ – Pytest-based tests for the above modules.
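The retry decorators in yggdrasil/pyutils are not documented here, so as a rough illustration of the general pattern, this is a standard-library sketch of a retry decorator with exponential backoff. The name `retry_sketch` and its parameters are hypothetical and do not reflect the package's actual signature.

```python
import functools
import time


def retry_sketch(attempts: int = 3, delay: float = 0.0, exceptions=(Exception,)):
    """Retry the wrapped function up to `attempts` times (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff: delay, 2*delay, 4*delay, ...
                    time.sleep(delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

Restricting `exceptions` to transient error types (for example, connection errors) avoids retrying failures that can never succeed.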
Testing
Run the test suite from the python/ directory:
pytest
Contributing
- Fork the repo and create a feature branch.
- Install with uv pip install -e .[dev] to pull in linting/type-checking tools.
- Run pytest (and optionally ruff, black, mypy) before opening a PR.
- Submit a PR describing your changes.
File details
Details for the file ygg-0.1.6.tar.gz.
File metadata
- Download URL: ygg-0.1.6.tar.gz
- Upload date:
- Size: 84.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3ef8d18d12fc20a9e46e6cde25317b44352744bc8ad751df936acba0b8939ef |
| MD5 | f1c1d2acc2916b33493211e999a2088e |
| BLAKE2b-256 | 38d7410a939fc28cc17b55843134d78573f98e1a70664661ff79d859b415bdbb |
File details
Details for the file ygg-0.1.6-py3-none-any.whl.
File metadata
- Download URL: ygg-0.1.6-py3-none-any.whl
- Upload date:
- Size: 99.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7b85a2e36a6bab1424282eff40b5cc1f534b287c58af5454c956e50eea06462 |
| MD5 | b4075e5afa9b8dbae7f91d9156001f70 |
| BLAKE2b-256 | 07ef9cbd24999056e77c72bf2c780ae62313e2bbb3a8675f42d75815d5c17c15 |