Skip to main content

Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks

Project description

Yggdrasil (Python)

Schema-aware utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. Define types once — cast everywhere.

Install

pip install ygg                           # core (Arrow, requests, pyutils)
pip install "ygg[polars]"                # + Polars
pip install "ygg[pandas]"                # + pandas
pip install "ygg[spark]"                 # + PySpark
pip install "ygg[databricks]"            # + Databricks SDK

From source (dev):

cd python/
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Quickstart

Infer Arrow schema from a dataclass

from dataclasses import dataclass
from yggdrasil.dataclasses import dataclass_to_arrow_field

@dataclass
class Order:
    id: int
    amount: float
    country: str | None = None

field = dataclass_to_arrow_field(Order)
print(field.type)           # struct<id: int64, amount: double, country: string>
schema = field.type.to_schema()

Cast any table to an Arrow schema

import pyarrow as pa
from yggdrasil.arrow.cast import cast_arrow_tabular
from yggdrasil.data.cast import CastOptions

target = pa.schema([
    pa.field("id", pa.int64()),
    pa.field("amount", pa.float64()),
])
raw = pa.table({"id": ["1", "2"], "amount": ["10.5", "20.0"]})
out = cast_arrow_tabular(raw, CastOptions(target_field=target))

Retry + parallel

from yggdrasil.pyutils import retry, parallelize

@retry(tries=5, delay=0.5, backoff=2.0)
def fetch(url: str) -> bytes: ...

@parallelize(max_workers=8)
def process(item: str) -> dict:
    return {"result": item.upper()}

results = list(process(["a", "b", "c"]))

Databricks — SQL with typed results

from yggdrasil.databricks.workspaces import Workspace
from yggdrasil.databricks.sql import SQLEngine

ws = Workspace(host="https://<workspace>", token="<pat>").connect()
engine = SQLEngine(catalog_name="main", schema_name="analytics", workspace=ws)

result = engine.execute("SELECT id, amount FROM transactions LIMIT 100")
df = result.to_pandas()
arrow_table = result.to_arrow_table()

Module map

Module Key exports
yggdrasil.arrow arrow_field_from_hint
yggdrasil.arrow.cast cast_arrow_tabular, cast_arrow_array
yggdrasil.data.cast CastOptions, convert, register_converter
yggdrasil.dataclasses dataclass_to_arrow_field
yggdrasil.pandas.cast cast_pandas_dataframe
yggdrasil.polars.cast cast_polars_dataframe, cast_polars_lazyframe
yggdrasil.spark.cast cast_spark_dataframe
yggdrasil.pyutils retry, parallelize
yggdrasil.concurrent JobPoolExecutor, Job
yggdrasil.requests YGGSession
yggdrasil.io BytesIO, Codec, MediaType
yggdrasil.deltalake DeltaTable
yggdrasil.databricks Workspace, SQLEngine, Cluster, NotebookConfig

Docs

Module reference →

Test

cd python/
pytest
ruff check .
mypy

Project details


Release history Release notifications | RSS feed

This version

0.4.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ygg-0.4.0.tar.gz (318.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ygg-0.4.0-py3-none-any.whl (365.3 kB view details)

Uploaded Python 3

File details

Details for the file ygg-0.4.0.tar.gz.

File metadata

  • Download URL: ygg-0.4.0.tar.gz
  • Upload date:
  • Size: 318.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for ygg-0.4.0.tar.gz
Algorithm Hash digest
SHA256 911f7be47e37ba7cd348bd9c454e9cfa9a9bda81c921e33658aac14734e5049e
MD5 cc50c4d07f2d6385158f4cc67331a2a1
BLAKE2b-256 98e8d5b78eaed7d656d381ff1200dc9026c6dda03e43df0e5176291d1a366628

See more details on using hashes here.

File details

Details for the file ygg-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: ygg-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 365.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for ygg-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 29f5501b2fcb5f146c8d9f541d0c081107029105bd66007f8ddcfab36be18209
MD5 bf018815632a122d8d9200b94f738708
BLAKE2b-256 3da24bafa704a59fadf4fa44127d0fb6892f99d62a893945278a35958f90d470

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page