Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks

Yggdrasil (Python)

Schema-aware utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. Define types once — cast everywhere.

Install

pip install ygg                           # core (Arrow, requests, pyutils)
pip install "ygg[polars]"                # + Polars
pip install "ygg[pandas]"                # + pandas
pip install "ygg[spark]"                 # + PySpark
pip install "ygg[databricks]"            # + Databricks SDK

From source (dev):

cd python/
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Quickstart

Infer Arrow schema from a dataclass

from dataclasses import dataclass

import pyarrow as pa
from yggdrasil.dataclasses import dataclass_to_arrow_field

@dataclass
class Order:
    id: int
    amount: float
    country: str | None = None

field = dataclass_to_arrow_field(Order)
print(field.type)           # struct<id: int64, amount: double, country: string>
schema = pa.schema(list(field.type))  # a StructType iterates over its child fields

Cast any table to an Arrow schema

import pyarrow as pa
from yggdrasil.arrow.cast import cast_arrow_tabular
from yggdrasil.data.cast import CastOptions

target = pa.schema([
    pa.field("id", pa.int64()),
    pa.field("amount", pa.float64()),
])
raw = pa.table({"id": ["1", "2"], "amount": ["10.5", "20.0"]})
out = cast_arrow_tabular(raw, CastOptions(target_field=target))

Retry + parallel

from yggdrasil.pyutils import retry, parallelize

@retry(tries=5, delay=0.5, backoff=2.0)
def fetch(url: str) -> bytes: ...

@parallelize(max_workers=8)
def process(item: str) -> dict:
    return {"result": item.upper()}

results = list(process(["a", "b", "c"]))
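
The behaviour these decorators suggest can be approximated with the standard library. The sketch below is an assumption about the general technique (retry with exponential backoff, then a thread pool), not the yggdrasil implementation; `simple_retry` and `flaky` are hypothetical names used only here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def simple_retry(tries: int, delay: float, backoff: float):
    """Retry on any exception, multiplying the wait by `backoff` each time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


calls = {"n": 0}


@simple_retry(tries=5, delay=0.01, backoff=2.0)
def flaky() -> str:
    """Fail twice with a transient error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


def process(item: str) -> dict:
    return {"result": item.upper()}


print(flaky(), calls["n"])  # ok 3

with ThreadPoolExecutor(max_workers=8) as pool:
    # Executor.map returns results in input order, regardless of completion order.
    results = list(pool.map(process, ["a", "b", "c"]))
print(results)  # [{'result': 'A'}, {'result': 'B'}, {'result': 'C'}]
```

Threads suit I/O-bound work such as HTTP calls; CPU-bound functions would usually call for a process pool instead.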

Databricks — SQL with typed results

from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine

ws = Workspace(host="https://<workspace>", token="<pat>").connect()
engine = SQLEngine(catalog_name="main", schema_name="analytics", workspace=ws)

result = engine.execute("SELECT id, amount FROM transactions LIMIT 100")
df = result.to_pandas()
arrow_table = result.to_arrow_table()

Module map

Module                 Key exports
yggdrasil.arrow        arrow_field_from_hint
yggdrasil.arrow.cast   cast_arrow_tabular, cast_arrow_array
yggdrasil.data.cast    CastOptions, convert, register_converter
yggdrasil.dataclasses  dataclass_to_arrow_field
yggdrasil.pandas.cast  cast_pandas_dataframe
yggdrasil.polars.cast  cast_polars_dataframe, cast_polars_lazyframe
yggdrasil.spark.cast   cast_spark_dataframe
yggdrasil.pyutils      retry, parallelize
yggdrasil.concurrent   JobPoolExecutor, Job
yggdrasil.requests     YGGSession
yggdrasil.io           BytesIO, Codec, MediaType
yggdrasil.deltalake    DeltaTable
yggdrasil.databricks   Workspace, SQLEngine, Cluster, NotebookConfig

Docs

Module reference →

Test

cd python/
pytest
ruff check .
mypy

This version

0.4.2

Download files

Source Distribution

ygg-0.4.2.tar.gz (323.8 kB view details)

Uploaded Source

Built Distribution

ygg-0.4.2-py3-none-any.whl (371.1 kB view details)

Uploaded Python 3

File details

Details for the file ygg-0.4.2.tar.gz.

File metadata

  • Download URL: ygg-0.4.2.tar.gz
  • Upload date:
  • Size: 323.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7

File hashes

Hashes for ygg-0.4.2.tar.gz
Algorithm    Hash digest
SHA256       45723ebbe81cccf5004f31d9870c63e213939b7d9d4947ebced6ef649819fbab
MD5          9d07e34acaf8e9729ccad1e77fa3a646
BLAKE2b-256  5a181889d1b95eb4fea3fd0a34f20f7151f0832a517fd3aa253869d3ae0a8003

File details

Details for the file ygg-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: ygg-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 371.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7

File hashes

Hashes for ygg-0.4.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       746e42a255f3ac44136e9a7b6ba0c6a20eff46f2906cec12e725e7d17bcba903
MD5          eeb511ac1aca40a0aec7e824dfd2b9b0
BLAKE2b-256  a482314fe464050155805684285109a253de7e7c1645d78972094a3291b6ce19
