Generic utility functions for text formatting, string operations, and type conversions.
Project description
dsr-utils
Utility functions and helpers for common data science tasks, including datetime parsing, formatting, tables, and plotting helpers.
Version 1.6.0: Enhanced the reflection module with a manual bypass mode (valid_params) to support strict parameter filtering for functions utilizing **kwargs passthrough.
Features
- Datetime utilities: Parse and enrich timestamps with vectorized pandas integration.
- Formatting utilities: Numeric, currency, percentage, and datetime formatters.
- Table helpers: High-precision layout engine with pagination support.
- Matplotlib helpers: Headless-friendly bounding box and renderer utilities.
- String utilities: Recursive case conversion (snake, pascal, camel, etc.).
- Type utilities: Robust standardization of scalars and collections into flat lists.
- Hashing utilities: Generate deterministic fingerprints for pandas DataFrames, NumPy arrays, and large files using memory-efficient SHA-256 and joblib hashing.
- Reflection utilities: Programmatically inspect function signatures and safely execute callables by filtering incompatible keyword arguments.
Installation
pip install dsr-utils
Usage
General Usage
import pandas as pd
from dsr_utils.datetime import parse_datetime
from dsr_utils.formatting import FloatFormat
from dsr_utils.tables import Table, TableColumn, TableColumnStyle, render_table
# Datetime parsing with Pandas 2.0+ mixed-format support
ts = pd.Timestamp("2025-10-01 12:34:56")
# (Usage of parse_datetime utility here)
# Formatting utilities
fmt = FloatFormat(precision=2)
print(fmt.format_value(1234.567))
# Table helpers (v1.3.0 constructor requirements)
df = pd.DataFrame({"Metric": ["Trips"], "Value": ["1,200"]})
style = TableColumnStyle()
table = Table(
    data=df,
    max_table_height=0.5,
    mid_x=0.5,
    top_y=0.8,
    fontsize=11,
    columns={
        "Metric": TableColumn(detail_style=style, header_style=style),
        "Value": TableColumn(detail_style=style, header_style=style),
    },
)
Data Integrity & Hashing
import pandas as pd
from dsr_utils.hashing import calculate_object_hash, calculate_file_hash
from pathlib import Path
# Generate a deterministic hash for a DataFrame
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_hash = calculate_object_hash(df)
print(f"DataFrame Fingerprint: {df_hash}")
# Calculate hash for a raw data file without loading it entirely into memory
# Ideal for large CSVs on memory-constrained systems such as a Mac mini
file_path = Path("data/raw/adult.csv")
file_hash = calculate_file_hash(file_path)
print(f"File Fingerprint: {file_hash}")
# The same helper also supports cloud-backed URIs and paths handled by cloudpathlib
cloud_file_hash = calculate_file_hash("s3://my-bucket/data/raw/adult.csv")
print(f"Cloud File Fingerprint: {cloud_file_hash}")
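The streaming idea behind calculate_file_hash can be illustrated with the standard library alone. The sketch below is an independent approximation using hashlib, not dsr-utils' actual implementation: the file is read in fixed-size chunks, so memory use stays constant regardless of file size.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks; only one chunk is ever held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Chunked hashing produces exactly the same digest as hashing the whole file at once, which is what makes it safe to swap in for large inputs.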
Dynamic Function Execution
from dsr_utils.reflection import safe_call
def process_data(data, mode="fast", verbose=False):
    return f"Processing {data} in {mode} mode"
# A dictionary containing both valid and invalid parameters
raw_config = {
    "mode": "thorough",
    "verbose": True,
    "unsupported_param": "ignore_me",
}
# safe_call filters the config and returns the result + rejected keys
result, rejected = safe_call(process_data, raw_config, data="MyDataset")
print(result) # Output: Processing MyDataset in thorough mode
print(rejected.keys()) # Output: dict_keys(['unsupported_param'])
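Conceptually, the filtering step can be approximated with the standard library's inspect module. This is a simplified sketch of the idea only, not dsr-utils' actual code (it has no conflict handling and no valid_params support):

```python
import inspect

def filter_kwargs(func, config):
    """Split config into (accepted, rejected) by comparing its keys
    against the parameter names in func's signature."""
    valid = set(inspect.signature(func).parameters)
    accepted = {k: v for k, v in config.items() if k in valid}
    rejected = {k: v for k, v in config.items() if k not in valid}
    return accepted, rejected

def process_data(data, mode="fast", verbose=False):
    return f"Processing {data} in {mode} mode"

accepted, rejected = filter_kwargs(
    process_data, {"mode": "thorough", "unsupported_param": "ignore_me"}
)
print(accepted)  # {'mode': 'thorough'}
print(rejected)  # {'unsupported_param': 'ignore_me'}
```

The accepted dictionary can then be splatted into the call safely, which is the core guarantee safe_call provides.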
Advanced Reflection: Manual Filtering
For functions that use **kwargs in their signature (like json.load or pd.read_parquet), standard reflection cannot identify invalid parameters. In these cases, you can provide an explicit set of valid_params to bypass reflection and enforce strict filtering.
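The limitation is easy to verify with inspect: a VAR_KEYWORD parameter accepts any keyword name, so signature inspection alone can never reject anything. A standalone illustration (passthrough is a hypothetical stand-in, not a dsr-utils or pandas API):

```python
import inspect

def passthrough(path, **kwargs):
    """Stand-in for functions like pd.read_parquet that forward **kwargs."""
    return path, kwargs

params = inspect.signature(passthrough).parameters
# True if the signature contains a **kwargs-style parameter
accepts_anything = any(
    p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
)
print(accepts_anything)  # True: 'fake_param' would sail straight through
```

Because every keyword is nominally valid, an explicit allow-list (valid_params) is the only way to enforce strict filtering for such functions.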
import pandas as pd
from dsr_utils.reflection import safe_call
# Example 1: pd.read_parquet has **kwargs, so we provide a strict set
PARQUET_READ_PARAMS = {"path", "engine", "columns", "storage_options"}
raw_config = {
    "columns": ["id", "value"],
    "fake_param": "invalid",
}
# safe_call uses valid_params as the ground truth instead of inspection
result, rejected = safe_call(
    pd.read_parquet,
    raw_config,
    valid_params=PARQUET_READ_PARAMS,
    path="data.parquet",
)
print(rejected) # Output: {'fake_param': 'invalid'}
# Example 2:
# 'mode' is in valid_params, but the fixed keyword argument mode="safe"
# takes priority. The config value "thorough" is moved to the rejected
# dictionary rather than silently dropped.
raw_config = {
    "mode": "thorough",
    "verbose": True,
}
result, rejected = safe_call(
    process_data,
    raw_config,
    valid_params={"mode", "verbose"},
    data="MyDataset",
    mode="safe",
)
Note on Conflict Resolution: If a parameter in your config dictionary conflicts with a value passed via **fixed_kwargs, the value in fixed_kwargs takes precedence, and the original value is moved to the rejected dictionary for visibility.
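The precedence rule described above can be sketched in plain Python. This is a conceptual model of the documented behaviour, not the library's implementation:

```python
def resolve(config, fixed_kwargs):
    """fixed_kwargs always win; conflicting config values are not
    silently dropped but surfaced in the rejected dictionary."""
    rejected = {k: v for k, v in config.items() if k in fixed_kwargs}
    merged = {k: v for k, v in config.items() if k not in fixed_kwargs}
    merged.update(fixed_kwargs)
    return merged, rejected

merged, rejected = resolve(
    {"mode": "thorough", "verbose": True},  # config dictionary
    {"mode": "safe"},                       # fixed kwargs
)
print(merged)    # {'verbose': True, 'mode': 'safe'}
print(rejected)  # {'mode': 'thorough'}
```

Surfacing the displaced value in rejected makes silent overrides visible, which matters when configs come from user-supplied files.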
Requirements
- Python >= 3.10
- numpy >= 2.0.0
- pandas >= 2.0.0
- joblib >= 1.4.0
- matplotlib (required for matplotlib helpers)
License
MIT License - see LICENSE file for details
File details
Details for the file dsr_utils-1.7.0.tar.gz.
File metadata
- Download URL: dsr_utils-1.7.0.tar.gz
- Upload date:
- Size: 56.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fc073a89e8005299f90e7f495d286612671f6f6600fbb100a24f60dd95a580f |
| MD5 | 0a068609003ec4666259077f2dff9bf1 |
| BLAKE2b-256 | 1f2cd088aaa983995a9253c19dd3a0ba9f561faf47cf7a46bd3ef8afcf5be31b |
Provenance
The following attestation bundles were made for dsr_utils-1.7.0.tar.gz:
Publisher: python-publish.yml on scottroberts140/dsr-utils
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dsr_utils-1.7.0.tar.gz
- Subject digest: 1fc073a89e8005299f90e7f495d286612671f6f6600fbb100a24f60dd95a580f
- Sigstore transparency entry: 1343443183
- Permalink: scottroberts140/dsr-utils@4c0ff0f1de7af72b93e62adbf50d0415ff7955d0
- Branch / Tag: refs/tags/v1.7.0
- Owner: https://github.com/scottroberts140
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4c0ff0f1de7af72b93e62adbf50d0415ff7955d0
- Trigger Event: release
File details
Details for the file dsr_utils-1.7.0-py3-none-any.whl.
File metadata
- Download URL: dsr_utils-1.7.0-py3-none-any.whl
- Upload date:
- Size: 45.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1cc68052246e5d33194d9eae538dbc9eca975aa922deb7651ebf2360e0d1cc99 |
| MD5 | 79e828a80a76d229a8803f4c0f4c56f8 |
| BLAKE2b-256 | c05eceff55c56f161c1113021466fcef6efbfd84c77c6a37a8135ecea0f39726 |
Provenance
The following attestation bundles were made for dsr_utils-1.7.0-py3-none-any.whl:
Publisher: python-publish.yml on scottroberts140/dsr-utils
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dsr_utils-1.7.0-py3-none-any.whl
- Subject digest: 1cc68052246e5d33194d9eae538dbc9eca975aa922deb7651ebf2360e0d1cc99
- Sigstore transparency entry: 1343443204
- Permalink: scottroberts140/dsr-utils@4c0ff0f1de7af72b93e62adbf50d0415ff7955d0
- Branch / Tag: refs/tags/v1.7.0
- Owner: https://github.com/scottroberts140
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4c0ff0f1de7af72b93e62adbf50d0415ff7955d0
- Trigger Event: release