Yggdrasil (Python)

Type-friendly utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. The package bundles enhanced dataclasses, casting utilities, and lightweight wrappers around Databricks and HTTP clients so Python/data engineers can focus on schemas instead of plumbing.

When to use this package

Use Yggdrasil when you need to:

  • Convert payloads across dataframe engines without rewriting type logic for each backend.
  • Define dataclasses that auto-coerce inputs, expose defaults, and surface Arrow schemas.
  • Run Databricks SQL jobs or manage clusters with minimal boilerplate.
  • Add resilient retries, concurrency helpers, and dependency guards to data pipelines.

Prerequisites

  • Python 3.10+
  • uv for virtualenv and dependency management.

Optional extras:

  • polars, pandas, pyarrow, and pyspark for engine-specific conversions.
  • databricks-sdk for workspace, SQL, jobs, and compute helpers.
  • msal for Azure AD authentication when using MSALSession.

Installation

From the python/ directory:

uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

Extras are grouped by engine:

  • .[polars], .[pandas], .[spark], .[databricks] – install only the integrations you need.
  • .[dev] – adds testing, linting, and typing tools (pytest, ruff, black, mypy).

Quickstart

Define an Arrow-aware dataclass, coerce inputs, and cast across containers:

from yggdrasil import yggdataclass
from yggdrasil.types.cast import convert
from yggdrasil.types import arrow_field_from_hint

@yggdataclass
class User:
    id: int
    email: str
    active: bool = True

user = User.__safe_init__("123", email="alice@example.com")
assert user.id == 123 and user.active is True

payload = {"id": "45", "email": "bob@example.com", "active": "false"}
clean = User.from_dict(payload)
print(clean.to_dict())

field = arrow_field_from_hint(User, name="user")
print(field)  # user: struct<id: int64, email: string, active: bool>

numbers = convert(["1", "2", "3"], list[int])
print(numbers)
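
To make the coercion step above more concrete, here is a minimal standard-library sketch of hint-driven casting. It is not Yggdrasil's implementation: `coerce_fields`, `coerce_bool`, and `PlainUser` are hypothetical names introduced for illustration, and the sketch only handles simple scalar hints such as int, str, and bool.

```python
from dataclasses import dataclass, fields
from typing import Any, get_type_hints


def coerce_bool(value: Any) -> bool:
    """Parse common string spellings; plain bool("false") would be True."""
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes"}:
            return True
        if text in {"false", "0", "no", ""}:
            return False
        raise ValueError(f"cannot interpret {value!r} as bool")
    return bool(value)


def coerce_fields(cls, payload: dict):
    """Build `cls` from `payload`, casting each value to its annotated type."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in payload:
            continue  # leave the field out so the dataclass default applies
        target = hints[f.name]
        caster = coerce_bool if target is bool else target
        kwargs[f.name] = caster(payload[f.name])
    return cls(**kwargs)


@dataclass
class PlainUser:  # hypothetical stand-in for the User dataclass above
    id: int
    email: str
    active: bool = True


clean = coerce_fields(
    PlainUser, {"id": "45", "email": "bob@example.com", "active": "false"}
)
print(clean)  # PlainUser(id=45, email='bob@example.com', active=False)
```

The special case for bool is the main design point: casting the string "false" with the built-in bool would silently produce True, which is the kind of backend-specific type logic the package aims to centralize.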

Databricks example

Install the databricks extra and run SQL with typed results:

from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine

ws = Workspace(host="https://<workspace-url>", token="<token>")
engine = SQLEngine(workspace=ws)

stmt = engine.execute("SELECT 1 AS value")
result = stmt.wait(engine)
tbl = result.arrow_table()
print(tbl.to_pandas())
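
Rather than hard-coding the host and token, one option is to read them from the environment. The sketch below assumes the DATABRICKS_HOST and DATABRICKS_TOKEN variable names used by the Databricks SDK; `databricks_credentials` is a hypothetical helper, not part of Yggdrasil.

```python
import os
from collections.abc import Mapping
from typing import Optional


def databricks_credentials(env: Optional[Mapping] = None) -> dict:
    """Return host/token keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    required = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail before any network call so the error names the real cause.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {"host": env["DATABRICKS_HOST"], "token": env["DATABRICKS_TOKEN"]}


# Usage with the constructor shown above (requires the databricks extra):
# from yggdrasil.databricks.workspaces import Workspace
# ws = Workspace(**databricks_credentials())
```

Passing a mapping explicitly keeps the helper easy to unit test without touching the real environment.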

Parallel processing and retries

from yggdrasil.pyutils import parallelize, retry

@parallelize(max_workers=4)
def square(x):
    return x * x

@retry(tries=5, delay=0.2, backoff=2)
def sometimes_fails(value: int) -> int:
    ...

print(list(square(range(5))))
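
For readers unfamiliar with the tries/delay/backoff convention, the following standard-library sketch shows one common interpretation: wait `delay` seconds after the first failure, then multiply the wait by `backoff` after each further failure, and re-raise once `tries` attempts are used up. This is an assumption about the semantics, not the source of yggdrasil.pyutils.retry; `retry_sketch` and its `sleep` parameter are hypothetical.

```python
import functools
import time


def retry_sketch(tries=3, delay=0.1, backoff=2.0, sleep=time.sleep):
    """Call the wrapped function up to `tries` times (assumes tries >= 1)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff  # exponential backoff between attempts
        return wrapper
    return decorator


calls = []


@retry_sketch(tries=5, delay=0.01, backoff=2)
def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(flaky(), len(calls))  # ok 3
```

With delay=0.2 and backoff=2, as in the example above, this interpretation would wait 0.2 s, 0.4 s, 0.8 s, and 1.6 s between the five attempts.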

Project layout

  • yggdrasil/dataclasses – yggdataclass decorator plus Arrow schema helpers.
  • yggdrasil/types – casting registry (convert, register_converter), Arrow inference, and default generators.
  • yggdrasil/libs – optional bridges to Polars, pandas, Spark, and Databricks SDK types.
  • yggdrasil/databricks – workspace, SQL, jobs, and compute helpers built on the Databricks SDK.
  • yggdrasil/requests – retry-capable HTTP sessions and Azure MSAL auth helpers.
  • yggdrasil/pyutils – concurrency and retry decorators.
  • yggdrasil/ser – serialization helpers and dependency inspection utilities.
  • tests/ – pytest-based coverage for conversions, dataclasses, requests, and platform helpers.

Testing

From python/:

pytest

Optional checks when developing:

ruff check
black .
mypy

Troubleshooting and common pitfalls

  • Missing optional dependency: Install the matching extra (e.g., uv pip install -e .[polars]) or wrap calls with require_polars/require_pyspark from yggdrasil.libs.
  • Schema mismatches: Use arrow_field_from_hint and CastOptions to enforce expected Arrow metadata when casting.
  • Databricks auth: Provide host and token to Workspace. For Azure, ensure environment variables align with your workspace deployment.
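
The dependency-guard idea from the first bullet can be sketched with the standard library alone. The decorator below is a hypothetical illustration of the pattern, not the implementation of require_polars or require_pyspark: it checks for a top-level module with importlib.util.find_spec and raises an ImportError that carries an install hint.

```python
import functools
import importlib.util


def require_module(module_name: str, extra: str):
    """Decorator factory: fail with an install hint if `module_name` is absent."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be located.
            if importlib.util.find_spec(module_name) is None:
                raise ImportError(
                    f"{func.__name__} needs '{module_name}'; "
                    f"install it with: uv pip install -e '.[{extra}]'"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require_module("json", extra="stdlib")  # always present, so the call succeeds
def dumps_sorted(obj) -> str:
    import json
    return json.dumps(obj, sort_keys=True)


print(dumps_sorted({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
```

Checking at call time rather than import time means a module that offers several optional integrations can still be imported when only some of them are installed.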

Contributing

  1. Fork and branch.
  2. Install with uv pip install -e .[dev].
  3. Run tests and linters.
  4. Submit a PR describing the change and any new examples added to the docs.
