Shared local IO execution layer for DeltaCAT read/write clients.

deltacat-io-core

deltacat-io-core is the shared local execution layer for DeltaCAT reads and writes.

It is used by both:

  • deltacat-client for direct thin-plan execution
  • deltacat for shared local execution and compatibility wrappers

Naming

  • distribution/package name: deltacat-io-core
  • Python import module: deltacat_io_core

The distribution name uses dashes for consistency with deltacat-client. The import module uses underscores because a name used in a Python import statement must be a valid identifier, and identifiers cannot contain a hyphen (-).
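
As a small illustration of the naming split, the sketch below normalizes a distribution name using the rule described in PEP 503 and derives the underscore form used for imports. Both helper names are hypothetical and are not part of deltacat-io-core. The second helper assumes the import module simply mirrors the distribution name, which holds for this package but not for every package on PyPI.

```python
import re


def normalize_distribution_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase (PEP 503 rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def guess_import_name(distribution: str) -> str:
    """Hypothetical helper: swap dashes for underscores to get a valid identifier."""
    return normalize_distribution_name(distribution).replace("-", "_")


# The dashed name is not importable as written; the underscore form is.
print("deltacat-io-core".isidentifier())
print(guess_import_name("deltacat-io-core").isidentifier())
```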

Scope

deltacat-io-core owns the code that should behave the same regardless of whether the caller is using the thin client or the thick DeltaCAT package.

Today that includes:

  • direct execution of thin Plan objects
  • MOR execution for thin and thick paths
  • local file materialization and manifest building
  • schema alignment and table conversion helpers
  • sort-aware file ordering and manifest handling
  • shared compaction/MOR helper layers and model types
  • format-specific local readers/writers

Non-Goals

deltacat-io-core does not own:

  • server routes or REST/MCP request handling
  • authoritative catalog/storage mutations
  • native Ray job orchestration surfaces
  • public end-user API shape for deltacat or deltacat-client

It is a shared implementation layer, not the top-level user product.

Architecture

The current read architecture is:

  1. The server resolves a thin Plan.
  2. client.catalog.read(plan=...) executes that plan directly through deltacat-io-core.
  3. dc.read_table(plan=...) for thin plans also executes through the same shared path.

There is no longer a runtime bridge back into thick DeltaCAT for thin plan execution. The plan contract is expected to carry the metadata required for direct execution.
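
The routing described above can be pictured with a toy model. Everything in this sketch is a hypothetical stand-in: ThinPlan, shared_execute_read, ThinClientCatalog, and thick_read_table only mimic the shape of the real entry points and do not reflect their actual signatures. The point is that both callers reach one function and that the plan alone carries what the executor needs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThinPlan:
    """Hypothetical stand-in for a server-resolved thin Plan."""
    files: List[str]
    output_type: str = "pyarrow"


def shared_execute_read(plan: ThinPlan) -> Dict[str, object]:
    """Stand-in for the shared executor: a single code path for every caller."""
    # A real executor would read files; here we only echo what the plan carries.
    return {"files": list(plan.files), "output_type": plan.output_type}


class ThinClientCatalog:
    """Mimics the thin-client entry point, client.catalog.read(plan=...)."""

    def read(self, plan: ThinPlan) -> Dict[str, object]:
        return shared_execute_read(plan)


def thick_read_table(plan: ThinPlan) -> Dict[str, object]:
    """Mimics the thick entry point, dc.read_table(plan=...), for thin plans."""
    return shared_execute_read(plan)


plan = ThinPlan(files=["a.parquet", "b.parquet"])
same = ThinClientCatalog().read(plan=plan) == thick_read_table(plan=plan)
```

Because neither wrapper adds logic of its own, any behavior change made in the shared function shows up identically for both callers, which is the property the architecture aims for.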

The current write architecture is:

  1. The client stages local files or materializes local data through shared helpers.
  2. The authoritative commit still happens through DeltaCAT server/native boundaries.
  3. Shared write-preparation and manifest logic lives in deltacat-io-core.
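
A minimal sketch of that division of labor follows. The manifest layout (path, size_bytes, sha256) and all function names are assumptions made for illustration; the real manifest model in deltacat-io-core may look quite different. The commit step is passed in as a callable to reflect that the authoritative commit happens outside the shared layer.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def stage_local_file(directory: Path, name: str, payload: bytes) -> Path:
    """Step 1: materialize data as a local file."""
    path = directory / name
    path.write_bytes(payload)
    return path


def build_manifest(paths: List[Path]) -> Dict[str, list]:
    """Step 3: describe the staged files (hypothetical manifest layout)."""
    entries = []
    for path in paths:
        data = path.read_bytes()
        entries.append({
            "path": str(path),
            "size_bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return {"entries": entries}


def prepare_and_commit(directory: Path, payloads: Dict[str, bytes],
                       commit: Callable[[dict], str]) -> str:
    """Stage and describe locally, then hand off to the external commit boundary."""
    paths = [stage_local_file(directory, name, data) for name, data in payloads.items()]
    return commit(build_manifest(paths))  # Step 2 lives behind this callable.


def run_demo() -> dict:
    committed = []

    def fake_commit(manifest: dict) -> str:
        # Stand-in for the server/native commit; it only records the manifest.
        committed.append(manifest)
        return "ok"

    with tempfile.TemporaryDirectory() as tmp:
        status = prepare_and_commit(
            Path(tmp), {"part-0.bin": b"abc", "part-1.bin": b"de"}, fake_commit
        )
    return {"status": status, "manifest": committed[0]}
```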

Installation

Base install:

uv pip install deltacat-io-core

Optional extras:

  • deltacat-io-core[io] for local file readers/writers (pyarrow, fastavro)
  • deltacat-io-core[pandas] for Pandas conversions
  • deltacat-io-core[polars] for Polars conversions and lazy scan helpers
  • deltacat-io-core[daft] for Daft conversions and lazy scan helpers
  • deltacat-io-core[lance] for Lance dataset support
  • deltacat-io-core[all] for the full local IO stack
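
To check which extras are usable in the current environment, a sketch like the one below can help. The mapping from extra name to importable module is partly an assumption: pyarrow and fastavro come from the list above, while the module names pandas, polars, daft, and lance are the usual import names for those libraries and may not match exactly what each extra installs. The function name is hypothetical.

```python
from importlib.util import find_spec
from typing import Dict, Mapping, Tuple

# Assumed mapping of extra -> top-level modules that extra is expected to provide.
EXTRA_MODULES: Dict[str, Tuple[str, ...]] = {
    "io": ("pyarrow", "fastavro"),
    "pandas": ("pandas",),
    "polars": ("polars",),
    "daft": ("daft",),
    "lance": ("lance",),
}


def available_extras(
    extra_modules: Mapping[str, Tuple[str, ...]] = EXTRA_MODULES,
) -> Dict[str, bool]:
    """Report, per extra, whether every expected module can be located."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in extra_modules.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra}: {'available' if ok else 'missing'}")
```

If an extra is reported missing, installing it would look like uv pip install "deltacat-io-core[polars]". Quoting the argument keeps shells such as zsh from interpreting the square brackets.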

Read Capabilities

The shared read executor currently handles:

  • schema-table reads
  • schemaless manifest-table reads
  • MOR reads
  • direct pyarrow, pandas, polars, numpy, daft, and ray_dataset outputs where supported
  • lazy pyarrow_parquet
  • lazy lance

It also validates each plan directly and rejects unsupported combinations, for example:

  • schemaless + pyarrow_parquet
  • schemaless + lance
  • mixed-content lazy plans for format-specific readers
  • unknown content types in the shared path
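
The checks listed above can be sketched as a small validator. This is an illustration only: the exception type, the function names, and the set of known content types are hypothetical, and the real executor may check more conditions or report them differently.

```python
from typing import Iterable

# Illustrative values; the actual set of known content types is not stated here.
KNOWN_CONTENT_TYPES = frozenset({"parquet", "lance", "avro"})
# Lazy, format-specific outputs named in the capability list.
LAZY_FORMAT_OUTPUTS = frozenset({"pyarrow_parquet", "lance"})


class UnsupportedReadError(ValueError):
    """Hypothetical error raised for unsupported read combinations."""


def validate_read(output_type: str, schemaless: bool,
                  content_types: Iterable[str]) -> None:
    kinds = set(content_types)
    unknown = kinds - KNOWN_CONTENT_TYPES
    if unknown:
        raise UnsupportedReadError(f"unknown content types: {sorted(unknown)}")
    if output_type in LAZY_FORMAT_OUTPUTS:
        if schemaless:
            raise UnsupportedReadError(f"schemaless + {output_type} is not supported")
        if len(kinds) > 1:
            raise UnsupportedReadError(
                f"mixed-content plan {sorted(kinds)} cannot use lazy {output_type}"
            )


def is_supported(output_type: str, schemaless: bool,
                 content_types: Iterable[str]) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        validate_read(output_type, schemaless, content_types)
    except UnsupportedReadError:
        return False
    return True
```

Failing before any file is opened keeps the error message tied to the plan itself rather than to whichever reader happened to hit the problem first.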

Polars / Daft Capability Matrix

The shared executor applies the same capability decision in thin execute_read_plan(...) execution and in thick reads that delegate into that shared path.

Engine | Content | v1 behavior
Polars | Parquet | Lazy scan via pl.scan_parquet(...) when the existing local preconditions hold
Polars | Lance | Explicit eager fallback; no reader-level Lance row-filter pushdown
Polars | PackDS | Same as Lance; PackDS plans stay on the explicit eager Lance fallback
Daft | Parquet | Lazy scan via shared build_daft_lazy_scan(...) when the group is local/shared-eligible
Daft | Lance | Lazy only for a single dataset on the shared local path; multi-dataset falls back eagerly
Daft | PackDS | Same as Lance under PackDS v5: a single pruned episode dataset can use native lazy Lance scanning; multi-episode plans fall back eagerly

Notes:

  • Mixed-schema lazy eligibility on the shared path requires per-file schema_id lookups and top-level schema information with resolvable field types. That schema information can come from either schema_serialized or a typed top-level schema summary.
  • On the shared Daft path, non-identity Parquet content encodings (for example .parquet.gz) stay on the eager PyArrow path.
  • When the process is pinned to DAFT_RUNNER=ray, the shared local Daft lazy path declines and falls back to the eager shared path instead of spawning a Ray-backed local lazy scan.
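
The matrix and notes can be read as a single decision function. The sketch below is a simplified model under stated assumptions: it treats the "local preconditions" and "shared-eligible" conditions as already satisfied, it compares DAFT_RUNNER to the exact string ray, and it uses hypothetical names (CAPABILITY_RULES, decide_scan_mode, the content_encoding parameter). It is not the executor's actual logic.

```python
import os
from typing import Mapping, Optional

# (engine, content) -> baseline rule, transcribed from the matrix.
CAPABILITY_RULES = {
    ("polars", "parquet"): "lazy",
    ("polars", "lance"): "eager",
    ("polars", "packds"): "eager",
    ("daft", "parquet"): "lazy",
    ("daft", "lance"): "lazy_if_single_dataset",
    ("daft", "packds"): "lazy_if_single_dataset",
}


def decide_scan_mode(engine: str, content: str, *, dataset_count: int = 1,
                     content_encoding: str = "identity",
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'lazy' or 'eager', assuming all local preconditions already hold."""
    env = os.environ if env is None else env
    rule = CAPABILITY_RULES[(engine, content)]
    if rule == "eager":
        return "eager"
    if engine == "daft":
        # A process pinned to the Ray runner declines the local lazy path.
        if env.get("DAFT_RUNNER") == "ray":
            return "eager"
        # Non-identity Parquet encodings (for example gzip) stay on eager PyArrow.
        if content == "parquet" and content_encoding != "identity":
            return "eager"
    if rule == "lazy_if_single_dataset" and dataset_count != 1:
        return "eager"
    return "lazy"
```

Passing env explicitly, as in decide_scan_mode("daft", "parquet", env={}), keeps the decision independent of the calling process and makes each row of the matrix easy to exercise in isolation.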

Write Capabilities

The shared write layer currently covers:

  • write input normalization
  • local data materialization
  • manifest construction for existing files and datasets
  • schema/read compatibility helpers
  • standard catalog write orchestration slices

Authoritative catalog mutation, commit, retention, and compaction boundaries remain on the native/server side.

Relationship To Other Packages

Use deltacat-client when you want the public thin client.

Use deltacat when you want the thick/native package.

Use deltacat-io-core directly only if you are intentionally building against the shared execution layer itself.
