Comprehensive Python data utilities for serialization, inputs, logging, and workflows

Extended Data

Comprehensive Python data utilities for serialization, configuration inputs, structured logging, file processing, and workflow composition.

The public API lives under one extended_data namespace with three deliberate tiers:

  • Tier 1: pure functions for codecs, string transforms, redaction, matching, type coercion, mapping, sequence, and state utilities.
  • Tier 2: ExtendedData, ExtendedString, ExtendedDict, ExtendedList, ExtendedTuple, and ExtendedSet containers that expose Tier 1 operations as methods. ExtendedData is the common root and polymorphic constructor for the shape-specific containers.
  • Tier 3: data processors that compose the first two tiers for files, inputs, logging, export/import boundaries, and workflows.

External API clients and provider-backed Python sync live in the separate vendor-fabric distribution. Agent workflow orchestration lives in the separate agentic-fabric distribution.

Documentation: extended-data.dev

Install

pip install extended-data

Development and documentation extras are available for contributors:

pip install "extended-data[dev]"
pip install "extended-data[docs]"

Usage

from extended_data import (
    DataFile,
    DataWorkflow,
    ExtendedData,
    ExtendedDict,
    InputProvider,
    Logging,
    decode_file,
)
from extended_data.primitives import decode_json, encode_yaml, number_to_words, redact_sensitive_text

# Logger and inputs, with console, file, and environment sources switched off.
logger = Logging(logger_name="example", enable_console=False, enable_file=False)
inputs = InputProvider(inputs={"SERVICE_NAME": "api"}, from_environment=False)

# Decode JSON with a primitive, then merge through the containers.
data = decode_json('{"service": {"name": "api"}}')
payload = ExtendedDict(data).deep_merge({"replicas": 3})
wrapped = ExtendedData(payload).merge({"owner": "platform"})

# Decode inline content by suffix, then run a workflow transform.
decoded_file = decode_file('{"service": {"name": "worker"}}', suffix="json")
artifact = DataFile.decode("service:\n  name: api\n", suffix="yaml")
workflow = DataWorkflow.from_value(wrapped).transform("unhump").result()

logger.logged_statement("prepared workflow", json_data=workflow.as_builtin(), log_level="info")

assert inputs.inputs["SERVICE_NAME"] == "api"
assert wrapped.as_builtin()["owner"] == "platform"
assert decoded_file["service"]["name"].upper_first() == "Worker"
assert artifact.metadata["encoding"] == "yaml"
assert number_to_words(42) == "forty-two"
assert redact_sensitive_text("Authorization: Bearer raw_token") == "Authorization: [REDACTED]"
assert "replicas: 3" in encode_yaml(workflow.as_builtin())

The installed CLI exposes the Tier 3 data boundary:

extended-data decode '{"service": {"name": "api"}}' --suffix json
extended-data decode --file config.yaml --output json
extended-data inspect --file config.yaml
extended-data merge config/base.yaml config/dev.yaml --output yaml
extended-data transform --file payload.json --step reconstruct --step unhump
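
The merge command layers one configuration file over another. As a rough illustration of what a recursive merge of two mappings can look like, here is a minimal standard-library sketch. The name deep_merge_sketch is hypothetical and this is not the package's implementation; the actual merge semantics (for example, how lists are combined) may differ.

```python
from copy import deepcopy
from typing import Any, Mapping


def deep_merge_sketch(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base (hypothetical helper)."""
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            # Scalars, lists, and mismatched types: the override wins.
            merged[key] = deepcopy(value)
    return merged


base = {"service": {"name": "api", "replicas": 1}, "owner": "platform"}
dev = {"service": {"replicas": 3}}
merged = deep_merge_sketch(base, dev)
# merged == {"service": {"name": "api", "replicas": 3}, "owner": "platform"}
```

The sketch copies its inputs, so base and dev are left unchanged after the merge.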

Package Shape

extended_data/
  containers/   Tier 2 ExtendedData root plus String/Dict/List/Tuple/Set containers
  inputs/       InputProvider and decorator-based input injection
  io/           Tier 3 file, import, export, and base64 processors
  logging/      structured lifecycle logging
  primitives/   Tier 1 pure functions and codecs
  workflows/    Tier 3 higher-order workflow composition

Tier 1 primitive names are explicit in this major version and live under extended_data.primitives, not the package root. Use bytes_to_string() for bytes-like coercion and string_to_bool(), string_to_int(), string_to_float(), string_to_path(), string_to_date(), string_to_datetime(), and string_to_time() for scalar string conversion. Use redact_sensitive_text() and redact_sensitive_data() for diagnostic and JSON-like payload redaction. Pass values=[...] when the caller knows that specific context values, such as resource IDs, emails, paths, or URLs, must be withheld in addition to common secret fields.
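
To make the values=[...] idea concrete, the following standard-library sketch redacts one common secret pattern and then any caller-supplied context values. Everything here is an assumption for illustration: redact_text_sketch, the PLACEHOLDER constant, and the single Authorization-header rule are hypothetical and much narrower than whatever redact_sensitive_text() actually covers.

```python
import re
from typing import Iterable

# Assumed marker, chosen to match the "[REDACTED]" output in the usage example.
PLACEHOLDER = "[REDACTED]"

# Assumption: "Authorization: <scheme> <token>" stands in for "common secret fields".
_AUTH_HEADER = re.compile(r"(Authorization:\s*)\S+(?:\s+\S+)?", re.IGNORECASE)


def redact_text_sketch(text: str, values: Iterable[str] = ()) -> str:
    """Redact an Authorization header plus caller-supplied values (hypothetical helper)."""
    redacted = _AUTH_HEADER.sub(lambda match: match.group(1) + PLACEHOLDER, text)
    # Replace longer values first so a short value cannot split a longer one.
    for value in sorted(set(values), key=len, reverse=True):
        if value:
            redacted = redacted.replace(value, PLACEHOLDER)
    return redacted


message = "sync failed for res-123 owned by ops@example.com"
safe = redact_text_sketch(message, values=["res-123", "ops@example.com"])
# safe == "sync failed for [REDACTED] owned by [REDACTED]"
```

The caller-supplied list matters because identifiers such as resource IDs or email addresses carry no recognizable secret shape, so a pattern-based pass alone would leave them in place.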

Tier 2 containers inherit from standard Python collection primitives and expose chainable data operations. ExtendedData is the polymorphic constructor for any incoming value: ExtendedData({"service": "api"}) is an ExtendedDict, ExtendedData(["api"]) is an ExtendedList, and ExtendedData("api") is an ExtendedString, and each of them also satisfies isinstance(value, ExtendedData). For example, ExtendedString.decode_json() promotes JSON into extended containers, ExtendedDict.reconstruct_special_types() turns string scalars into booleans/numbers/dates where safe, and ExtendedList.first_non_empty() returns the first meaningful value without lowering the surrounding data boundary.
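
One way a root class can act as a polymorphic constructor while its subclasses still inherit from built-in collections is to dispatch inside __new__. The sketch below shows that pattern with hypothetical DataSketch, DictSketch, ListSketch, and StrSketch classes. It is an illustration of the general technique under those assumed names, not a description of how ExtendedData is implemented.

```python
from typing import Any


class DataSketch:
    """Hypothetical root type that dispatches to a shape-specific subclass."""

    def __new__(cls, value: Any = None):
        if cls is DataSketch:
            # Called on the root: pick the container from the value's shape.
            for builtin_type, container in _DISPATCH:
                if isinstance(value, builtin_type):
                    return container(value)
            raise TypeError(f"unsupported value type: {type(value).__name__}")
        # Called on a subclass: defer to the built-in base (dict, list, or str).
        return super().__new__(cls, value)


class DictSketch(DataSketch, dict):
    pass


class ListSketch(DataSketch, list):
    pass


class StrSketch(DataSketch, str):
    pass


# Checked in order; each entry maps a built-in shape to its container.
_DISPATCH = ((str, StrSketch), (dict, DictSketch), (list, ListSketch))

mapping = DataSketch({"service": "api"})
items = DataSketch(["api"])
text = DataSketch("api")

assert type(mapping) is DictSketch and mapping == {"service": "api"}
assert type(items) is ListSketch and items == ["api"]
assert type(text) is StrSketch and text == "api"
assert all(isinstance(value, DataSketch) for value in (mapping, items, text))
```

Because each container is also a real dict, list, or str, code that expects the built-in type keeps working while the shared root gives a single isinstance check.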

Tier 3 processors keep structured data moving through explicit boundaries. DataFile reads, decodes, tracks metadata, and exports structured files. DataWorkflow layers reads, merges, transforms, writes, syncs, and provenance into a single result object. InputProvider loads direct inputs and environment data, and Logging provides structured lifecycle logging with stored-message snapshots returned as extended containers.
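
The idea of layering merges and transforms into a single result that also records provenance can be sketched with a small immutable-style pipeline. WorkflowSketch, its methods, and the lower_keys step are hypothetical names used only for illustration; DataWorkflow's real interface and provenance format are not shown here and may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WorkflowSketch:
    """Hypothetical pipeline that carries a value and the steps applied to it."""

    value: Any
    provenance: list[str] = field(default_factory=list)

    def merge(self, other: dict) -> "WorkflowSketch":
        # Shallow merge; returns a new object so earlier stages stay untouched.
        return WorkflowSketch({**self.value, **other}, [*self.provenance, "merge"])

    def transform(self, name: str, func: Callable[[Any], Any]) -> "WorkflowSketch":
        # Apply a named step and record it in the provenance trail.
        return WorkflowSketch(func(self.value), [*self.provenance, f"transform:{name}"])


def lower_keys(mapping: dict) -> dict:
    """Example step: lowercase every top-level key."""
    return {key.lower(): item for key, item in mapping.items()}


start = WorkflowSketch({"Service": "api"})
result = start.merge({"Replicas": 3}).transform("lower_keys", lower_keys)
# result.value == {"service": "api", "replicas": 3}
# result.provenance == ["merge", "transform:lower_keys"]
```

Returning a new object from every step keeps each intermediate stage inspectable, which is one plausible reason to model a workflow as a chain of results rather than in-place mutation.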

The old extended_data_types, directed_inputs_class, and lifecyclelogging package names are not shimmed. The removed extended_data.connectors and extended_data.secrets namespaces are also not preserved. Clean-break import failures are intentional so stale migrations are visible.
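
Since the old names fail at import time rather than being shimmed, it can help to find stale imports before running the code. The sketch below scans Python source with the standard-library ast module for the removed names listed above. The helper find_stale_imports and the REMOVED_PREFIXES tuple are hypothetical and not part of the package.

```python
import ast

# Names taken from the paragraph above; the tuple itself is an illustrative assumption.
REMOVED_PREFIXES = (
    "extended_data_types",
    "directed_inputs_class",
    "lifecyclelogging",
    "extended_data.connectors",
    "extended_data.secrets",
)


def _is_removed(name: str) -> bool:
    return any(name == prefix or name.startswith(prefix + ".") for prefix in REMOVED_PREFIXES)


def find_stale_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) pairs for imports of removed names."""
    stale = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Also check "from pkg import sub" as "pkg.sub".
            candidates = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        match = next((name for name in candidates if _is_removed(name)), None)
        if match is not None:
            stale.append((node.lineno, match))
    return sorted(stale)


example_source = "\n".join(
    [
        "import lifecyclelogging",
        "from extended_data import ExtendedDict",
        "from extended_data.secrets import load",
        "from extended_data import connectors",
    ]
)
stale_imports = find_stale_imports(example_source)
# stale_imports == [(1, "lifecyclelogging"), (3, "extended_data.secrets"), (4, "extended_data.connectors")]
```

The scan only parses the source and never imports it, so it is safe to run against code that would currently fail to load.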

Local Development

uv sync --all-extras --dev
tox -e lint
tox -e typecheck
tox -e py311,py312,py313,py314
tox -e examples
tox -e docs
tox -e build

Download files

Source Distribution

extended_data-8.4.0.tar.gz (166.8 kB)

Uploaded Source

Built Distribution

extended_data-8.4.0-py3-none-any.whl (97.9 kB)

Uploaded Python 3

File details

Details for the file extended_data-8.4.0.tar.gz.

File metadata

  • Download URL: extended_data-8.4.0.tar.gz
  • Upload date:
  • Size: 166.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for extended_data-8.4.0.tar.gz
Algorithm    Hash digest
SHA256       4bba50c5e736d1a3a54abafa0b77cc4e883e652b8cd43f0d256979a1d6bc59da
MD5          c1f8357b87abeb85e942903b84a26cbb
BLAKE2b-256  f9d32469c65841b5764a6eaadb6ec0fd457ccbdd9a9d4f5f9dad296c442c3922

File details

Details for the file extended_data-8.4.0-py3-none-any.whl.

File metadata

  • Download URL: extended_data-8.4.0-py3-none-any.whl
  • Upload date:
  • Size: 97.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for extended_data-8.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       c2115e68e4d722588261441a834888fa12a0bb415da2b3b1948e570f8cffc4ca
MD5          60c7d4afafcd753658dc71d9ef59e85f
BLAKE2b-256  31ef79cb15244d78504851ce8ff7eca5d5e26dd7d2b04a7f328933e42e18a4cb
