Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks

Yggdrasil (Python)

Schema-aware utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. Define types once — cast everywhere.

Install

pip install ygg                           # core (Arrow, requests, pyutils)
pip install "ygg[polars]"                # + Polars
pip install "ygg[pandas]"                # + pandas
pip install "ygg[spark]"                 # + PySpark
pip install "ygg[databricks]"            # + Databricks SDK

From source (dev):

cd python/
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Quickstart

Infer Arrow schema from a dataclass

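from dataclasses import dataclass

import pyarrow as pa
from yggdrasil.dataclasses import dataclass_to_arrow_field

@dataclass
class Order:
    id: int
    amount: float
    country: str | None = None   # optional hint, so the field is nullable

# The dataclass becomes one struct-typed Arrow field.
field = dataclass_to_arrow_field(Order)
print(field.type)           # struct<id: int64, amount: double, country: string>
# Build a schema from the struct's child fields.
schema = pa.schema(list(field.type))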

Cast any table to an Arrow schema

import pyarrow as pa
from yggdrasil.arrow.cast import cast_arrow_tabular
from yggdrasil.data.cast import CastOptions

target = pa.schema([
    pa.field("id", pa.int64()),
    pa.field("amount", pa.float64()),
])
raw = pa.table({"id": ["1", "2"], "amount": ["10.5", "20.0"]})
out = cast_arrow_tabular(raw, CastOptions(target_field=target))
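
Conceptually, a schema-driven cast walks the target fields and converts each matching column. The sketch below shows that idea with plain Python lists instead of Arrow arrays. `cast_columns` and the `Target` alias are hypothetical, and dropping columns that are absent from the target is a choice made for this sketch, not a statement about how `cast_arrow_tabular` behaves.

```python
from typing import Any, Callable

# Hypothetical target "schema": column name -> converter applied to each value.
Target = dict[str, Callable[[Any], Any]]

def cast_columns(columns: dict[str, list], target: Target) -> dict[str, list]:
    """Cast column-oriented data to the target, one column at a time."""
    out = {}
    for name, convert in target.items():
        if name not in columns:
            raise KeyError(f"missing column: {name!r}")
        # None passes through as a null; every other value is converted.
        out[name] = [None if v is None else convert(v) for v in columns[name]]
    return out

raw = {"id": ["1", "2"], "amount": ["10.5", "20.0"], "note": ["x", "y"]}
out = cast_columns(raw, {"id": int, "amount": float})
print(out)  # {'id': [1, 2], 'amount': [10.5, 20.0]}
```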

Retry + parallel

from yggdrasil.pyutils import retry, parallelize

@retry(tries=5, delay=0.5, backoff=2.0)
def fetch(url: str) -> bytes: ...

@parallelize(max_workers=8)
def process(item: str) -> dict:
    return {"result": item.upper()}

results = list(process(["a", "b", "c"]))
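
For readers who want to see what the two decorators roughly stand for, here is a standard-library sketch of retry with exponential backoff and of an order-preserving parallel map. `retry_sketch` and `parallel_map` are hypothetical stand-ins, not the `yggdrasil.pyutils` implementations, and the parameter meanings (`tries`, `delay`, `backoff`) are assumed from their names.

```python
import functools
import time
from concurrent.futures import ThreadPoolExecutor

def retry_sketch(tries: int = 3, delay: float = 0.0, backoff: float = 1.0):
    """Retry on any Exception, multiplying the wait by `backoff` each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator

def parallel_map(func, items, max_workers: int = 8) -> list:
    """Apply func to every item on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

calls = {"n": 0}

@retry_sketch(tries=5, delay=0.01, backoff=2.0)
def flaky() -> str:
    """Fail twice with a transient error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(flaky(), calls["n"])                        # ok 3
print(parallel_map(str.upper, ["a", "b", "c"]))   # ['A', 'B', 'C']
```

Threads suit I/O-bound work such as HTTP fetches; CPU-bound work would usually call for a process pool instead.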

Databricks — SQL with typed results

from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine

ws = Workspace(host="https://<workspace>", token="<pat>").connect()
engine = SQLEngine(catalog_name="main", schema_name="analytics", workspace=ws)

result = engine.execute("SELECT id, amount FROM transactions LIMIT 100")
df = result.to_pandas()
arrow_table = result.to_arrow_table()

Module map

Module                  Key exports
yggdrasil.arrow         arrow_field_from_hint
yggdrasil.arrow.cast    cast_arrow_tabular, cast_arrow_array
yggdrasil.data.cast     CastOptions, convert, register_converter
yggdrasil.dataclasses   dataclass_to_arrow_field
yggdrasil.pandas.cast   cast_pandas_dataframe
yggdrasil.polars.cast   cast_polars_dataframe, cast_polars_lazyframe
yggdrasil.spark.cast    cast_spark_dataframe
yggdrasil.pyutils       retry, parallelize
yggdrasil.concurrent    JobPoolExecutor, Job
yggdrasil.requests      YGGSession
yggdrasil.io            BytesIO, Codec, MediaType
yggdrasil.deltalake     DeltaTable
yggdrasil.databricks    Workspace, SQLEngine, Cluster, NotebookConfig

Docs

Module reference →

Test

cd python/
pytest
ruff check .
mypy

This version

0.4.5
