
Type-friendly utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks


Yggdrasil (Python)

Schema-aware utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. Define types once — cast everywhere.

Install

pip install ygg                           # core (Arrow, requests, pyutils)
pip install "ygg[polars]"                # + Polars
pip install "ygg[pandas]"                # + pandas
pip install "ygg[spark]"                 # + PySpark
pip install "ygg[databricks]"            # + Databricks SDK

From source (dev):

cd python/
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Quickstart

Infer Arrow schema from a dataclass

from dataclasses import dataclass

import pyarrow as pa
from yggdrasil.dataclasses import dataclass_to_arrow_field

@dataclass
class Order:
    id: int
    amount: float
    country: str | None = None

field = dataclass_to_arrow_field(Order)
print(field.type)           # struct<id: int64, amount: double, country: string>
# pyarrow struct types have no to_schema(); build the schema from the child fields.
schema = pa.schema(list(field.type))

Cast any table to an Arrow schema

import pyarrow as pa
from yggdrasil.arrow.cast import cast_arrow_tabular
from yggdrasil.data.cast import CastOptions

target = pa.schema([
    pa.field("id", pa.int64()),
    pa.field("amount", pa.float64()),
])
raw = pa.table({"id": ["1", "2"], "amount": ["10.5", "20.0"]})
out = cast_arrow_tabular(raw, CastOptions(target_field=target))

Retry + parallel

from yggdrasil.pyutils import retry, parallelize

@retry(tries=5, delay=0.5, backoff=2.0)
def fetch(url: str) -> bytes: ...

@parallelize(max_workers=8)
def process(item: str) -> dict:
    return {"result": item.upper()}

results = list(process(["a", "b", "c"]))
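
The parameters above suggest the usual retry-with-exponential-backoff and thread-pool patterns. The following standard-library sketch shows those patterns under that assumption; `retry_sketch` and `parallel_map_sketch` are hypothetical stand-ins, not the yggdrasil implementations, and the exact semantics of `tries`, `delay`, and `backoff` in the library may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def retry_sketch(tries=5, delay=0.5, backoff=2.0, sleep=time.sleep):
    """Retry a function, multiplying the wait by `backoff` after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


def parallel_map_sketch(func, items, max_workers=8):
    """Apply func to each item on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


waits = []          # records the requested delays instead of sleeping
calls = {"n": 0}    # counts how many times flaky() actually ran


@retry_sketch(tries=5, delay=0.5, backoff=2.0, sleep=waits.append)
def flaky():
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("transient failure")
    return "ok"


flaky_result = flaky()
upper = parallel_map_sketch(str.upper, ["a", "b", "c"])
```

Passing `sleep=waits.append` keeps the example fast and makes the backoff schedule visible: three failures produce waits of 0.5, 1.0, and 2.0 seconds before the fourth call succeeds.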

Databricks — SQL with typed results

from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine

ws = Workspace(host="https://<workspace>", token="<pat>").connect()
engine = SQLEngine(catalog_name="main", schema_name="analytics", workspace=ws)

result = engine.execute("SELECT id, amount FROM transactions LIMIT 100")
df = result.to_pandas()
arrow_table = result.to_arrow_table()

Module map

Module                  Key exports
yggdrasil.arrow         arrow_field_from_hint
yggdrasil.arrow.cast    cast_arrow_tabular, cast_arrow_array
yggdrasil.data.cast     CastOptions, convert, register_converter
yggdrasil.dataclasses   dataclass_to_arrow_field
yggdrasil.pandas.cast   cast_pandas_dataframe
yggdrasil.polars.cast   cast_polars_dataframe, cast_polars_lazyframe
yggdrasil.spark.cast    cast_spark_dataframe
yggdrasil.pyutils       retry, parallelize
yggdrasil.concurrent    JobPoolExecutor, Job
yggdrasil.requests      YGGSession
yggdrasil.io            BytesIO, Codec, MediaType
yggdrasil.deltalake     DeltaTable
yggdrasil.databricks    Workspace, SQLEngine, Cluster, NotebookConfig

Docs

Module reference →

Test

cd python/
pytest
ruff check .
mypy


This version

0.4.8
