Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks

ygg — Yggdrasil for Python

Schema-aware data interchange for Python teams that move data between Python types, Arrow, Polars, pandas, Spark, and Databricks. One conversion registry, one schema contract, optional dependencies.

pip install ygg

Why pick this up

  • Stop hand-writing brittle casting code between app models, dataframes, and warehouse schemas.
  • Treat Arrow schema as the contract surface so every tool agrees on field names, nullability, and metadata.
  • Use one conversion registry instead of separate utilities per engine.
  • Install only what you need beyond the core. Most integrations are optional extras.
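The single-registry idea can be sketched in a few lines of plain Python. This is an illustrative toy only: the names and structure below are invented for the sketch and are not ygg's implementation (see the real API in the examples further down).

```python
from typing import Any, Callable

# Toy registry keyed by (source type, target type). Illustrative only;
# not ygg's internals, just the shape of the pattern.
_CONVERTERS: dict[tuple[type, type], Callable[[Any], Any]] = {}

def register(src: type, dst: type):
    """Decorator that records a converter for one (src, dst) pair."""
    def wrap(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
        _CONVERTERS[(src, dst)] = fn
        return fn
    return wrap

@register(str, int)
def _str_to_int(value: str) -> int:
    return int(value)

@register(str, bool)
def _str_to_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "y"}

def convert(value: Any, target: type) -> Any:
    """Look up and apply the converter registered for this value's type."""
    return _CONVERTERS[(type(value), target)](value)

convert("42", int)     # 42
convert("yes", bool)   # True
```

One registry means adding an engine or a type is one registration, not a new utility module per pairing.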

Install with the right extras

pip install ygg                   # core: pyarrow + polars + yggrs
pip install "ygg[data]"           # pandas + numpy + sqlglot
pip install "ygg[bigdata]"        # pyspark + delta-spark
pip install "ygg[delta]"          # deltalake
pip install "ygg[databricks]"     # databricks-sdk
pip install "ygg[api]"            # fastapi + uvicorn + pydantic
pip install "ygg[http]"           # urllib3 + xxhash
pip install "ygg[pickle]"         # cloudpickle + dill + zstandard + xxhash + blake3
pip install "ygg[mongo]"          # mongoengine
pip install "ygg[postgres]"       # psycopg + adbc-driver-postgresql
pip install "ygg[kafka]"          # confluent-kafka
pip install "ygg[dev]"            # everything for local development

Editable dev install:

cd python
uv venv .venv && source .venv/bin/activate
uv pip install -e .[dev]

Progressive examples

1. Cast scalars

from yggdrasil.data.cast.registry import convert

convert("42", int)              # 42
convert("3.14", float)          # 3.14
convert("yes", bool)            # True
convert("2024-06-01", "date")   # datetime.date(2024, 6, 1)

2. Dict → typed dataclass

from dataclasses import dataclass
from yggdrasil.data.cast.registry import convert

@dataclass
class User:
    id: int
    email: str
    active: bool = True

convert({"id": "7", "email": "ada@example.com", "active": "false"}, User)
# User(id=7, email='ada@example.com', active=False)

3. Register a custom converter

from decimal import Decimal
from yggdrasil.data.cast.registry import convert, register_converter

@register_converter(str, Decimal)
def _str_to_decimal(value: str, options=None) -> Decimal:
    return Decimal(value.replace(",", "."))

convert("19,95", Decimal)   # Decimal('19.95')

4. Infer Arrow fields from Python type hints

import yggdrasil.arrow as pa
from yggdrasil.arrow import arrow_field_from_hint

pa.schema([
    arrow_field_from_hint(int,                 name="id"),
    arrow_field_from_hint(list[str],           name="tags"),
    arrow_field_from_hint(dict[str, float],    name="metrics"),
])

5. Cast an Arrow table to a target schema

import yggdrasil.arrow as pa
from yggdrasil.arrow.cast import cast_arrow_tabular
from yggdrasil.data.cast.options import CastOptions

raw = pa.table({"id": ["1", "2"], "score": ["9.1", "8.7"]})
target = pa.schema([
    pa.field("id",    pa.int64(),   nullable=False),
    pa.field("score", pa.float64(), nullable=False),
])

out = cast_arrow_tabular(raw, CastOptions(target_field=target, strict_match_names=True))
print(out.schema)

6. Convert across engines (Polars / pandas / Spark)

Always import optional engines through their lib.py guard:

from yggdrasil.polars.lib import polars
from yggdrasil.pandas.lib import pandas

Polars cast:

import yggdrasil.arrow as pa
from yggdrasil.data.cast.options import CastOptions
from yggdrasil.polars.cast import cast_polars_dataframe
from yggdrasil.polars.lib import polars

df = polars.DataFrame({"id": ["1", "2"], "value": ["4.2", "5.1"]})
target = pa.schema([pa.field("id", pa.int64()), pa.field("value", pa.float64())])
out = cast_polars_dataframe(df, CastOptions(target_field=target))

Arrow ↔ Polars round-trip:

from yggdrasil.polars.cast import (
    arrow_table_to_polars_dataframe,
    polars_dataframe_to_arrow_table,
)

pl_df = arrow_table_to_polars_dataframe(arrow_table)
roundtrip = polars_dataframe_to_arrow_table(pl_df)

7. Dataclass → Arrow struct field

from dataclasses import dataclass
from yggdrasil.dataclasses import dataclass_to_arrow_field

@dataclass
class Position:
    symbol: str
    quantity: float

field = dataclass_to_arrow_field(Position)
print(field)

8. HTTP: simple to advanced

from yggdrasil.io.http_ import HTTPSession

http = HTTPSession()
print(http.get("https://httpbin.org/get").json())
print(http.post("https://httpbin.org/post", json={"name": "alice"}).status)

Prepared request + send:

req = http.prepare_request("POST", "https://httpbin.org/post",
                           json={"event": "order_created", "id": 123})
resp = http.send(req)
print(resp.status, resp.json()["json"])

Parallel batch dispatch:

from yggdrasil.io import SendManyConfig

reqs = [http.prepare_request("GET", "https://httpbin.org/get", params={"page": i})
        for i in range(1, 11)]
responses = list(http.send_many(reqs, send_config=SendManyConfig(max_workers=5)))
print([r.status for r in responses])

Tabular response → engine of your choice:

resp = http.get("https://api.example.com/v1/orders?format=arrow")
table  = resp.to_arrow_table()
pdf    = resp.to_pandas()
plf    = resp.to_polars()

9. Buffers and URLs

from yggdrasil.io import BytesIO, URL

with BytesIO() as buf:           # spill-to-disk byte buffer with media detection
    buf.write(b"hello")
    buf.seek(0)
    print(buf.media_type, buf.compression)

u = URL.from_str("https://example.com/a/b?q=1")
print(u.host, u.path)
print(u.with_query_items({"q": 2, "lang": "en"}).to_string())

10. Databricks SQL: read/write across formats

from yggdrasil.databricks import DatabricksClient

c = DatabricksClient(host="https://<workspace>", token="<token>")

c.sql.execute("""
CREATE TABLE IF NOT EXISTS main.default.demo (id BIGINT, name STRING) USING DELTA
""")
c.sql.insert_into("main.default.demo",
                  [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}])

stmt = c.sql.execute("SELECT * FROM main.default.demo ORDER BY id")
print(stmt.to_arrow_table())
print(stmt.to_pandas())
print(stmt.to_polars())

DatabricksClient also covers:

  • Unity Catalog: c.catalogs["main"]["default"]["orders"]
  • Compute: c.compute.clusters.all_purpose_cluster(...)
  • DBFS/Volumes: c.dbfs_path("/Volumes/...").write_text(...)
  • Secrets: c.secrets["scope/key"] = "..."
  • IAM and Genie

See docs/guides/databricks.md.

11. Typed Databricks job widgets

from dataclasses import dataclass
from yggdrasil.databricks.jobs import NotebookConfig

@dataclass
class IngestConfig(NotebookConfig):
    catalog: str = "main"
    schema: str = "ingest"
    table: str = "events"
    dry_run: bool = False

cfg = IngestConfig.from_environment()   # in a job run
# cfg = IngestConfig.init_widgets()     # in a local notebook

12. Retries, parallelism, jobs

from yggdrasil.pyutils import retry, parallelize
from yggdrasil.concurrent import Job, JobPoolExecutor

@retry(tries=3, delay=0.2, backoff=2)
def flaky(x: int) -> int:
    return x

@parallelize(max_workers=4)
def square(x: int) -> int:
    return x * x

list(square(range(6)))   # [0, 1, 4, 9, 16, 25]

# Bounded streaming jobs
jobs = [Job.make(lambda x=x: x * x) for x in range(20)]
with JobPoolExecutor(max_workers=4, max_in_flight=8) as pool:
    for result in pool.as_completed(jobs):
        print(result.value)

13. Reuse CastOptions.check

from yggdrasil.data.cast.options import CastOptions

def normalize_options(options=None, *, target_field=None) -> CastOptions:
    return CastOptions.check(options, target_field=target_field, strict_match_names=True)

Modules at a glance

Module Purpose
yggdrasil.data Cast registry, CastOptions, DataType, Field/Schema, DataTable, normalized enums
yggdrasil.arrow Arrow type inference, casting helpers (cast_arrow_tabular, cast_arrow_record_batch_reader)
yggdrasil.dataclasses dataclass_to_arrow_field, WaitingConfig, Expiring, ExpiringDict
yggdrasil.polars / yggdrasil.pandas / yggdrasil.spark Engine bridges (cast.py, lib.py, tests.py TestCase bases)
yggdrasil.io BytesIO, URL, SendConfig/SendManyConfig, codecs, media types
yggdrasil.io.http_ HTTPSession (preferred), PreparedRequest, Response
yggdrasil.requests Legacy retry-only YGGSession + MSAL variant
yggdrasil.databricks DatabricksClient + sql/compute/workspaces/fs/iam/secrets/jobs/account/ai.genie
yggdrasil.fastapi FastAPI service powering the Power Query connector
yggdrasil.pyutils / yggdrasil.concurrent retry, parallelize, Job, JobPoolExecutor
yggdrasil.pickle / blake3 / xxhash Optional serialization + hashing
yggdrasil.mongo / mongoengine Mongo helpers
yggdrasil.fxrates FX-rate helpers
yggdrasil.rs Bridge to native yggrs kernels (with pure-Python fallback)

For per-module pages, see docs/modules/ and the navigable docs site.


Testing

Tests that touch a dataframe or Arrow object subclass the matching engine TestCase from yggdrasil.<engine>.tests:

from yggdrasil.arrow.tests import ArrowTestCase

class TestX(ArrowTestCase):
    def test_table(self):
        t = self.table({"id": [1, 2]})
        self.assertSchemaEqual(t.schema, self.pa.schema([self.pa.field("id", self.pa.int64())]))

This handles optional-dependency skipping, per-test tmp dirs, Arrow interop, and frame/schema assertions.

pytest                                                   # full suite
pytest tests/test_yggdrasil/test_data/                   # one area
pytest tests/test_yggdrasil/test_data/test_registry.py   # one file
ruff check
black .

pytest-asyncio runs in strict mode, so async tests must carry the explicit asyncio marker or they are not executed. Tests tagged with the integration marker are skipped unless DATABRICKS_HOST is set.
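A minimal example of the explicit marker (this assumes pytest-asyncio's standard pytest.mark.asyncio marker; the test body and name are invented for illustration):

```python
import asyncio
import pytest

# In strict mode, an "async def" test without this marker is not run.
@pytest.mark.asyncio
async def test_convert_roundtrip():
    await asyncio.sleep(0)  # stand-in for real async work
    assert 1 + 1 == 2
```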


Documentation locally

cd python
mkdocs serve     # http://127.0.0.1:8000
mkdocs build     # static site → python/site/

The published site is deployed by .github/workflows/docs.yml on every push to main that touches python/docs/**, python/src/**, mkdocs.yml, or the workflow itself.


License

Apache-2.0.
