# dr-ds
Small, typed data science helpers for serialization, atomic file writes, and dataframe-oriented record normalization.
## Install

```bash
uv add dr-ds
```

dr-ds currently targets Python 3.12+.
## Included Helpers

`dr_ds.atomic_io` provides:

- `dump_json_atomic`
- `atomic_write_jsonl`
- `atomic_write_parquet_records`

`dr_ds.serialization` provides:

- `serialize_timestamp`
- `utc_now_iso`
- `to_jsonable`
- `convert_large_ints`
- `parse_jsonish`

`dr_ds.parquet` provides:

- `records_to_parquet_frame`
- `parquet_frame_to_records`
These helpers are aimed at a common pattern in data workflows:
- start with `list[dict[str, Any]]` records or similarly loose Python data
- normalize nested containers and plain Python objects into JSON-safe values
- persist those records atomically, or adapt them for dataframe/parquet workflows
- recover structured JSON-like columns on read without rebuilding ad hoc parsing logic
## Design Goals
- Prefer small, reusable utilities with stable behavior over framework-heavy abstractions.
- Be explicit about lossy or opinionated conversions.
- Keep serialization helpers deterministic so downstream tests and diffs stay readable.
- Make the common data-science path easy: Python dicts in, JSON/parquet-safe data out.
## Serialization Contracts
`to_jsonable` is the main normalization helper for nested Python values.

- `datetime` values become UTC ISO 8601 strings.
- Mapping keys are stringified.
- Tuples become lists.
- Sets become deterministically ordered lists.
- Plain Python objects are serialized from their public, non-callable attributes.
- Recursive references are replaced with the literal string `"<recursion>"`.
- Values that cannot be meaningfully introspected fall back to `str(value)`.
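
A small sketch of those contracts in action. The input values are illustrative, and the commented output follows the rules above rather than verified package output (the exact ISO string format and set ordering are assumptions):

```python
from datetime import datetime, timezone

from dr_ds.serialization import to_jsonable

normalized = to_jsonable(
    {
        "when": datetime(2024, 1, 1, tzinfo=timezone.utc),  # -> UTC ISO 8601 string
        "coords": (1, 2),                                   # tuple -> list
        "labels": {"b", "a"},                               # set -> ordered list
        7: "stringified key",                               # mapping key -> "7"
    }
)
# Expected shape per the contracts above, e.g.:
# {"when": "2024-01-01T00:00:00+00:00", "coords": [1, 2],
#  "labels": ["a", "b"], "7": "stringified key"}
```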
`convert_large_ints` is intentionally narrower:

- it recursively converts integers whose absolute value exceeds `DEFAULT_MAX_INT` into floats
- it preserves tuple and set container types
- it is mainly intended to keep dataframe/parquet pipelines practical when very large integers appear in nested payloads
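
For example, a minimal sketch (2**80 is used because it exceeds any plausible threshold; the exact value of `DEFAULT_MAX_INT` is package-defined and not documented here):

```python
from dr_ds.serialization import convert_large_ints

softened = convert_large_ints({"ids": (1, 2**80), "small": [3]})
# The oversized integer becomes a float while tuple/list types survive:
# {"ids": (1, 1.2089258196146292e+24), "small": [3]}
```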
`parse_jsonish` only parses strings that are valid JSON. Invalid JSON, blank
strings, and non-string values are returned unchanged.
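
A quick sketch of that contract, which makes the helper safe to apply blindly over mixed values:

```python
from dr_ds.serialization import parse_jsonish

parse_jsonish('{"loss": 0.42}')  # -> {"loss": 0.42}
parse_jsonish("[1, 2, 3]")       # -> [1, 2, 3]
parse_jsonish("not json")        # -> "not json" (invalid JSON is unchanged)
parse_jsonish("")                # -> ""         (blank strings are unchanged)
parse_jsonish(42)                # -> 42         (non-strings are unchanged)
```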
## Atomic IO Example
```python
from pathlib import Path

from dr_ds.atomic_io import atomic_write_jsonl, dump_json_atomic
from dr_ds.serialization import to_jsonable

payload = to_jsonable(
    {
        "metrics": {"loss": 0.42},
        "tags": {"baseline", "v1"},
        "owner": {"name": "baseline-bot", "id": 7},
    }
)

dump_json_atomic(Path("summary.json"), payload)
atomic_write_jsonl(
    Path("runs.jsonl"),
    [
        {"run_id": "run-1", "summary": payload},
        {"run_id": "run-2", "summary": {"loss": 0.39}},
    ],
)
```
All atomic writers use a sibling temporary file plus `os.replace`, then fsync
the parent directory so the rename is durably recorded.
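
For reference, a minimal sketch of that pattern using only the standard library. This is illustrative, not the package's actual implementation, and `_atomic_write_text` is a hypothetical name; in practice you would call `dump_json_atomic` and friends instead:

```python
import os
import tempfile
from pathlib import Path


def _atomic_write_text(path: Path, text: str) -> None:
    """Hypothetical helper showing the sibling-temp-file + os.replace pattern."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # flush file contents before the rename
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        try:
            os.unlink(tmp_name)  # clean up the temp file on failure
        except FileNotFoundError:
            pass
        raise
    # fsync the parent directory so the rename itself is durably recorded
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```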
## Parquet Example
```python
from dr_ds.parquet import parquet_frame_to_records, records_to_parquet_frame

records = [
    {
        "run_id": "abc123",
        "metrics": {"loss": 0.42, "token_count": 2**35},
    }
]

frame = records_to_parquet_frame(records, json_columns={"metrics"})
restored = parquet_frame_to_records(frame, json_columns={"metrics"})
```
`records_to_parquet_frame` prepares records for dataframe/parquet workflows.
It does not write parquet files directly.

Behavior to rely on:

- columns listed in `json_columns` are normalized through `to_jsonable`
- large integers nested inside those JSON columns are converted with `convert_large_ints`
- top-level large integers in non-JSON columns are also softened to floats
- `parquet_frame_to_records` restores JSON columns with `parse_jsonish` and converts dataframe null-like values in those columns back to `None`
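
Since the frame is not written for you, persisting it is left to your dataframe stack. A minimal sketch, assuming the returned frame behaves like a pandas DataFrame and that a parquet engine such as pyarrow is installed (both are assumptions; the description above names neither):

```python
import pandas as pd

from dr_ds.parquet import parquet_frame_to_records, records_to_parquet_frame

records = [{"run_id": "abc123", "metrics": {"loss": 0.42}}]

# Assumption: the returned frame is pandas-compatible.
frame = records_to_parquet_frame(records, json_columns={"metrics"})
frame.to_parquet("runs.parquet", index=False)  # needs pyarrow or fastparquet

# Round-trip: read the file back and restore the JSON column.
restored = parquet_frame_to_records(
    pd.read_parquet("runs.parquet"), json_columns={"metrics"}
)
# Per the behavior above: restored[0]["metrics"] == {"loss": 0.42}
```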
## Coercion Helpers

`coerce_int`, `coerce_number`, and `coerce_float` are intentionally forgiving.

- invalid inputs return `None` instead of raising
- booleans are rejected even though Python considers them integers
- `coerce_number` preserves integral numeric values as `int`
- `coerce_float` is the lossy "give me a float if possible" variant
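
A quick sketch of those rules. Note that the import path is an assumption; the section above does not say which module exports these helpers:

```python
# Assumed import path; the description does not name the module.
from dr_ds.serialization import coerce_float, coerce_int, coerce_number

coerce_int("42")     # -> 42
coerce_int("oops")   # -> None (invalid inputs return None instead of raising)
coerce_int(True)     # -> None (booleans are rejected)
coerce_number(7.0)   # -> 7    (integral numeric values come back as int)
coerce_float("3.5")  # -> 3.5  (the lossy "give me a float" variant)
```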
## License
MIT
## Development

Run the standard checks before committing:

```bash
uv run ruff format .
uv run ruff check .
uv run ty check
uv run pytest
```