Generic utility functions for text formatting, string operations, and type conversions.

dsr-utils

Utility functions and helpers for common data science tasks, including datetime parsing, formatting, tables, and plotting helpers.

Version 1.4.0: This release introduces a comprehensive hashing module for deterministic object and file-level integrity verification. It provides the "Source of Truth" foundation for audit-safe machine learning pipelines.

Features

  • Datetime utilities: Parse and enrich timestamps with vectorized pandas integration.
  • Formatting utilities: Numeric, currency, percentage, and datetime formatters.
  • Table helpers: High-precision layout engine with pagination support.
  • Matplotlib helpers: Headless-friendly bounding box and renderer utilities.
  • String utilities: Recursive case conversion (snake, pascal, camel, etc.).
  • Type utilities: Robust standardization of scalars and collections into flat lists.
  • Hashing utilities: Deterministic fingerprints for pandas DataFrames, NumPy arrays, and large files, using memory-efficient SHA-256 and joblib hashing.
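
As a rough illustration of what "recursive case conversion" can mean, the sketch below converts every string key in a nested dict/list structure to snake_case using only the standard library. The names to_snake and snake_keys are hypothetical and are not part of dsr-utils; the package's own string utilities may behave differently.

```python
import re


def to_snake(name: str) -> str:
    """Convert a camelCase / PascalCase / kebab-case name to snake_case."""
    # Split before a capitalized word: "HTTPResponse" -> "HTTP_Response"
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital: "tripCount" -> "trip_Count"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.replace("-", "_").replace(" ", "_").lower()


def snake_keys(obj):
    """Recursively convert string dict keys to snake_case (hypothetical helper)."""
    if isinstance(obj, dict):
        return {
            (to_snake(k) if isinstance(k, str) else k): snake_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [snake_keys(v) for v in obj]
    # Scalars and other types are returned unchanged
    return obj


record = {"tripCount": 1, "zoneInfo": [{"PickupZone": "A"}]}
print(snake_keys(record))
```

Values are left untouched here; only keys are renamed, which is a design choice of this sketch rather than a statement about the library.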

Installation

pip install dsr-utils

Usage

General Usage

import pandas as pd
from dsr_utils.datetime import parse_datetime
from dsr_utils.formatting import FloatFormat
from dsr_utils.tables import Table, TableColumn, TableColumnStyle, render_table

# Datetime parsing (pandas 2.0+ mixed-format support)
ts = pd.Timestamp("2025-10-01 12:34:56")
# parse_datetime is imported above; its call is omitted from this example.

# Formatting utilities
fmt = FloatFormat(precision=2)
print(fmt.format_value(1234.567))

# Table helpers (v1.3.0 constructor requirements)
df = pd.DataFrame({"Metric": ["Trips"], "Value": ["1,200"]})
style = TableColumnStyle()
table = Table(
    data=df,
    max_table_height=0.5,
    mid_x=0.5,
    top_y=0.8,
    fontsize=11,
    columns={
        "Metric": TableColumn(detail_style=style, header_style=style),
        "Value": TableColumn(detail_style=style, header_style=style)
    }
)
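
The source does not show what FloatFormat prints, so the following is only a stdlib sketch of the kind of numeric, currency, and percentage formatting the formatting utilities describe. The function names format_float, format_currency, and format_percent are hypothetical and are not the dsr-utils API.

```python
def format_float(value: float, precision: int = 2) -> str:
    """Fixed-precision float with thousands separators."""
    return f"{value:,.{precision}f}"


def format_currency(value: float, symbol: str = "$", precision: int = 2) -> str:
    """Currency string with the sign placed before the symbol."""
    sign = "-" if value < 0 else ""
    return f"{sign}{symbol}{abs(value):,.{precision}f}"


def format_percent(fraction: float, precision: int = 1) -> str:
    """Format a fraction (0.1234) as a percentage (12.3%)."""
    return f"{fraction:.{precision}%}"


print(format_float(1234.567))    # 1,234.57
print(format_currency(-1200))    # -$1,200.00
print(format_percent(0.1234))    # 12.3%
```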

Data Integrity & Hashing

import pandas as pd
from dsr_utils.hashing import calculate_object_hash, calculate_file_hash
from pathlib import Path

# Generate a deterministic hash for a DataFrame
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_hash = calculate_object_hash(df)
print(f"DataFrame Fingerprint: {df_hash}")

# Hash a raw data file without loading it entirely into memory.
# Useful for large CSVs on memory-constrained machines such as a Mac mini.
file_path = Path("data/raw/adult.csv")
file_hash = calculate_file_hash(file_path)
print(f"File Fingerprint: {file_hash}")
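
For readers curious how a file can be hashed without reading it fully into memory, here is a minimal sketch of chunked SHA-256 hashing using only hashlib. It illustrates the general technique and is not the implementation of calculate_file_hash; the function name sha256_of_file and the 1 MiB chunk size are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Peak memory stays near chunk_size regardless of file size, and the digest is identical to hashing the whole file in one call.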

Requirements

  • Python >= 3.10
  • numpy >= 2.0.0
  • pandas >= 2.0.0
  • joblib >= 1.4.0
  • matplotlib (required for matplotlib helpers)

License

MIT License. See the LICENSE file for details.

Download files

Source Distribution

dsr_utils-1.4.0.tar.gz (51.8 kB)

Uploaded Source

Built Distribution

dsr_utils-1.4.0-py3-none-any.whl (43.0 kB)

Uploaded Python 3

File details

Details for the file dsr_utils-1.4.0.tar.gz.

File metadata

  • Download URL: dsr_utils-1.4.0.tar.gz
  • Upload date:
  • Size: 51.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dsr_utils-1.4.0.tar.gz

Algorithm    Hash digest
SHA256       4bb3bbef618e4223e7cadd3bcb8b7751330a66545d45ee124958df9721368bd6
MD5          c47ab06abb4791b29ef294af16ef7420
BLAKE2b-256  f060faa000cf0fecba541954e0690253b758e262746360da07cfe3a03759980b

Provenance

The following attestation bundles were made for dsr_utils-1.4.0.tar.gz:

Publisher: python-publish.yml on scottroberts140/dsr-utils

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dsr_utils-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: dsr_utils-1.4.0-py3-none-any.whl
  • Upload date:
  • Size: 43.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dsr_utils-1.4.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       633974577261f0c5045f56a246509e509b7a490c683aa08919fac93928a2ba79
MD5          2821c25841c49da1e6adffa2f16c8192
BLAKE2b-256  56ffe9a94999e595af8f7b4ad9bd327e60d586a2b6d6f4d360b0e3c182661b0e

Provenance

The following attestation bundles were made for dsr_utils-1.4.0-py3-none-any.whl:

Publisher: python-publish.yml on scottroberts140/dsr-utils
