tha-utils-helper

A Tabular Helper utility library for the tha-* ecosystem. Includes general-purpose dict/list/type helpers, string normalization and slugification, numeric string parsing, and date format conversion — all with row-level error handling for CSV pipeline use.

Install

pip install tha-utils-helper

Quick start

from tha_utils_helper import DictUtils, ListUtils, TypeUtils, ThaStr, ThaNum, ThaDT

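# Sample rows for the row-level calls below (illustrative data)
rows = [{"studentUniqueId": "1001", "Budget": "$1,234.56"}]

# Structural helpers: work on single values or lists of row dicts
DictUtils.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])         # {"a": 1, "c": 3}
DictUtils.rename_keys_rows(rows, {"studentUniqueId": "id"})   # rename across all rows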

# String normalization
ThaStr.format_str("  HELLO WORLD  ", case="lower")            # "hello world"
ThaStr.slugify("Hello World!")                                 # "hello-world"

# Numeric parsing
ThaNum.format_num("$1,234.56")                                 # 1234.56
ThaNum.format_num("(£500)", cast="int")                        # -500

# Date formatting
ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")                 # "2024-04-15"

# Row-level processing with on_error and skip_statuses
formatter = ThaNum()
rows = formatter.format_num_rows(rows, column="Budget", cast="float", round_to=2)

API

DictUtils

Static methods for single dicts and lists of row dicts.

DictUtils.pick(d, keys)               # new dict with only the specified keys
DictUtils.omit(d, keys)               # new dict with the specified keys removed
DictUtils.safe_get(d, *keys)          # traverse nested dicts safely — returns None on miss
DictUtils.rename_keys(d, mapping)     # rename keys; unmapped keys are preserved

DictUtils.pick_rows(rows, keys)       # pick() applied to every row
DictUtils.omit_rows(rows, keys)       # omit() applied to every row
DictUtils.rename_keys_rows(rows, mapping)  # rename_keys() applied to every row

DictUtils.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])
# {"a": 1, "c": 3}

DictUtils.safe_get({"student": {"id": 42}}, "student", "id")
# 42

DictUtils.rename_keys_rows(rows, {"studentUniqueId": "student_id"})
# [{"student_id": ..., ...}, ...]
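To make the documented behaviour concrete, here is a minimal pure-Python sketch of these helpers. It is not the library's source: the standalone function names are stand-ins, and skipping missing keys in pick is an assumption, since the README does not say how absent keys are handled.

```python
from typing import Any


def pick(d: dict, keys: list) -> dict:
    # Keep only the requested keys; keys absent from d are skipped (assumption).
    return {k: d[k] for k in keys if k in d}


def omit(d: dict, keys: list) -> dict:
    # Copy d without the listed keys.
    drop = set(keys)
    return {k: v for k, v in d.items() if k not in drop}


def safe_get(d: Any, *keys: Any) -> Any:
    # Walk nested dicts one key at a time; return None as soon as a step misses.
    current = d
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def rename_keys(d: dict, mapping: dict) -> dict:
    # Keys found in mapping are renamed; all other keys are preserved as-is.
    return {mapping.get(k, k): v for k, v in d.items()}


def rename_keys_rows(rows: list, mapping: dict) -> list:
    # Row variant: apply rename_keys to every row and return new dicts.
    return [rename_keys(row, mapping) for row in rows]
```

Each row variant is just the single-dict helper mapped over the list, which is why the two families share the same arguments.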

ListUtils

Static methods for lists.

ListUtils.chunk(lst, size)   # split into consecutive chunks of size
ListUtils.flatten(lst)       # flatten one level of nesting

ListUtils.chunk([1, 2, 3, 4, 5], 2)    # [[1, 2], [3, 4], [5]]
ListUtils.flatten([[1, 2], [3, 4]])     # [1, 2, 3, 4]

chunk also works on lists of row dicts directly: ListUtils.chunk(rows, 100).
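The two list helpers can be approximated in a few lines of plain Python. This is a sketch under assumptions, not the library's implementation: rejecting a size below 1 and passing non-list items through flatten unchanged are both guesses about edge cases the README does not cover.

```python
def chunk(lst: list, size: int) -> list:
    # Split lst into consecutive slices of at most `size` items.
    if size < 1:
        raise ValueError("size must be at least 1")  # assumed edge-case handling
    return [lst[i:i + size] for i in range(0, len(lst), size)]


def flatten(lst: list) -> list:
    # Remove exactly one level of nesting; deeper lists stay nested.
    out = []
    for item in lst:
        if isinstance(item, list):
            out.extend(item)
        else:
            out.append(item)  # assumption: non-list items are kept as-is
    return out


# Batching row dicts, e.g. before sending them to an API in groups
rows = [{"id": n} for n in range(5)]
batches = chunk(rows, 2)
```

Because chunk only slices, it is indifferent to what the list contains, which is why it works on row dicts directly.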


TypeUtils

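Static methods for coercing values. For single values, normalize_bool raises ValueError on unrecognized input, while safe_int and safe_float return None. All row methods write None on failure (consistent with safe_int / safe_float).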

TypeUtils.normalize_bool(val)                                   # bool or raises ValueError
TypeUtils.safe_int(val)                                         # int | None
TypeUtils.safe_float(val)                                       # float | None

TypeUtils.normalize_bool_rows(rows, column, *, out_column=None) # None on failure
TypeUtils.safe_int_rows(rows, column, *, out_column=None)
TypeUtils.safe_float_rows(rows, column, *, out_column=None)

normalize_bool recognizes:

Truthy: True, 1, "true", "yes", "1", "t", "y"
Falsy:  False, 0, "false", "no", "0", "f", "n"

String matching is case-insensitive and strips whitespace.

TypeUtils.normalize_bool("Yes")     # True
TypeUtils.safe_int("3.14")          # None  (not an integer string)
TypeUtils.safe_float("abc")         # None

TypeUtils.safe_int_rows(rows, "count", out_column="count_int")
# adds "count_int" column; original "count" column preserved
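The difference between the raising single-value method and the None-returning row method can be sketched as follows. The function names mirror the documented ones but are standalone stand-ins, and the exact internals are assumptions; only the recognized values and the None-on-failure rule come from the text above.

```python
TRUTHY = {"true", "yes", "1", "t", "y"}
FALSY = {"false", "no", "0", "f", "n"}


def normalize_bool(val):
    # Real booleans pass through; check bool before int since bool subclasses int.
    if isinstance(val, bool):
        return val
    if isinstance(val, int) and val in (0, 1):
        return bool(val)
    if isinstance(val, str):
        text = val.strip().lower()  # case-insensitive, whitespace-stripped
        if text in TRUTHY:
            return True
        if text in FALSY:
            return False
    raise ValueError(f"cannot interpret {val!r} as a boolean")


def normalize_bool_rows(rows, column, *, out_column=None):
    # Write to out_column when given, otherwise overwrite the source column.
    target = out_column or column
    result = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row[target] = normalize_bool(row.get(column))
        except ValueError:
            new_row[target] = None  # row methods return None on failure
        result.append(new_row)
    return result
```

Swallowing the error at row level keeps one bad cell from aborting a whole CSV run, at the cost of having to check for None downstream.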

ThaStr

String normalization and slugification. format_str and slugify are static methods callable without instantiation. Row methods require an instance and store results in self.rows.

ThaStr.format_str(
    value: str,
    *,
    strip: bool = True,
    case: str | None = None,     # "upper" | "lower" | "title" | None
    replace: dict[str, str] | None = None,
    regex: bool = False,
) -> str

ThaStr.slugify(
    value: str,
    *,
    sep: str = "-",
    prefix: str = "",
    suffix: str = "",
) -> str

runner = ThaStr()

runner.format_str_rows(
    rows,
    column,
    *,
    strip=True,
    case=None,
    replace=None,
    regex=False,
    out_column=None,
    on_error="error",            # "error" | "skip" | "blank"
    skip_statuses=None,          # default: ["error", "warning"]
) -> list[dict]

runner.slugify_rows(
    rows,
    columns,                     # str or list[str] — multiple columns are joined with sep
    out_column,
    *,
    sep="-",
    prefix="",
    suffix="",
    on_error="error",
    skip_statuses=None,
) -> list[dict]

ThaStr.format_str("  HELLO WORLD  ", case="lower")    # "hello world"
ThaStr.slugify("Hello World!")                          # "hello-world"
ThaStr.slugify("café résumé", sep="_")                  # "cafe_resume"

runner = ThaStr()
runner.format_str_rows(rows, "Name", case="lower", out_column="Name Lower")
runner.slugify_rows(rows, ["First", "Last"], out_column="id")

Raises StrError on invalid case or on_error. Unicode is converted to ASCII via NFKD normalization.
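For readers curious how NFKD-based ASCII conversion produces slugs like the ones above, here is a minimal sketch using only the standard library. It is an illustration, not the library's code: the regular expression and the plain concatenation of prefix and suffix are assumptions.

```python
import re
import unicodedata


def slugify(value: str, *, sep: str = "-", prefix: str = "", suffix: str = "") -> str:
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding to ASCII with errors="ignore" then drops the marks.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep runs of letters and digits; punctuation and spaces act as separators.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    # Assumption: prefix and suffix are concatenated as given.
    return prefix + sep.join(words) + suffix


print(slugify("Hello World!"))            # hello-world
print(slugify("café résumé", sep="_"))    # cafe_resume
```

Characters with no ASCII decomposition are dropped by this approach rather than transliterated, which is the main limitation compared with a dedicated transliteration library.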


ThaNum

Numeric string parsing. format_num is a static method callable without instantiation. format_num_rows requires an instance and stores results in self.rows.

ThaNum.format_num(
    value: str | int | float,
    *,
    strip_currency: bool = True,   # removes $€£¥₹₩₽₺₫฿₱₴
    strip_commas: bool = True,
    round_to: int | None = None,
    cast: str = "float",           # "float" | "int"
) -> float | int

runner = ThaNum()

runner.format_num_rows(
    rows,
    column,
    *,
    strip_currency=True,
    strip_commas=True,
    round_to=None,
    cast="float",
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

ThaNum.format_num("$1,234.56")          # 1234.56
ThaNum.format_num("(£500)", cast="int") # -500
ThaNum.format_num("€9.99", round_to=1)  # 10.0

Parenthetical negatives such as (100) are converted to negative numbers automatically. Raises NumError on unparseable input, bool input, or invalid cast.
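The parsing steps described above (currency stripping, comma stripping, parenthetical negatives, rounding, casting) can be sketched in plain Python. The name parse_num is hypothetical, the order of the steps is an assumption, and this sketch raises ValueError where the library raises NumError.

```python
CURRENCY_SYMBOLS = "$€£¥₹₩₽₺₫฿₱₴"


def parse_num(value, *, strip_currency=True, strip_commas=True,
              round_to=None, cast="float"):
    # bool is rejected explicitly because it would otherwise parse as 0 or 1.
    if isinstance(value, bool):
        raise ValueError("bool input is not accepted")
    if cast not in ("float", "int"):
        raise ValueError(f"invalid cast: {cast!r}")

    text = str(value).strip()
    # Accounting-style negatives: "(100)" means -100.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1].strip()
    if strip_currency:
        text = text.translate(str.maketrans("", "", CURRENCY_SYMBOLS))
    if strip_commas:
        text = text.replace(",", "")

    number = float(text)  # raises ValueError on unparseable input
    if negative:
        number = -number
    if round_to is not None:
        number = round(number, round_to)
    # Assumption: casting to int truncates after any rounding.
    return int(number) if cast == "int" else number
```

Treating commas purely as grouping separators means a value such as "1.234,56" is not handled; locale-aware parsing is out of scope for this sketch.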


ThaDT

Date format auto-detection and conversion. format_date and now are static methods. format_date_rows requires an instance and stores results in self.rows.

ThaDT.now(fmt="%Y_%m_%d_%H_%M_%S") -> str

ThaDT.format_date(value: str, to_fmt: str) -> str

runner = ThaDT()

runner.format_date_rows(
    rows,
    column,
    to_fmt,
    *,
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

Auto-detects: ISO 8601 (with/without time, with/without ms/Z), compact ISO (20240415), year-month (2024-04), US MM/DD/YYYY, US MM/DD/YY, MM/DD, long and short month names (April 15, 2024 / Apr 15, 2024).

ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")   # "2024-04-15"
ThaDT.format_date("04/15/2024", "%m/%d/%y")      # "04/15/24"
ThaDT.now()                                       # e.g. "2024_04_15_13_30_00"

Raises DateError on unrecognized formats or invalid on_error.
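One common way to auto-detect a date format is to try a list of candidate strptime patterns in order. The sketch below takes that approach for most of the formats listed above; whether the library works this way is not stated, so treat the function name convert_date, the candidate list, and its ordering as assumptions. The year-less MM/DD form is left out, and ValueError stands in for DateError.

```python
from datetime import datetime

# Candidate input formats, tried in order. Four-digit-year patterns come
# before two-digit ones so "04/15/2024" is never read with %y.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%fZ",   # ISO 8601 with milliseconds and Z
    "%Y-%m-%dT%H:%M:%SZ",      # ISO 8601 with Z
    "%Y-%m-%dT%H:%M:%S",       # ISO 8601 with time
    "%Y-%m-%d",                # ISO 8601 date
    "%Y%m%d",                  # compact ISO
    "%Y-%m",                   # year-month
    "%m/%d/%Y",                # US MM/DD/YYYY
    "%m/%d/%y",                # US MM/DD/YY
    "%B %d, %Y",               # long month name (locale-dependent)
    "%b %d, %Y",               # short month name (locale-dependent)
]


def convert_date(value: str, to_fmt: str) -> str:
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(to_fmt)
        except ValueError:
            continue  # this pattern did not match; try the next one
    raise ValueError(f"unrecognized date format: {value!r}")
```

Order matters with this technique: the first pattern that parses wins, so ambiguous inputs resolve according to the list rather than any inspection of the data.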


on_error (all row methods)

Value      Behaviour
"error"    row status="error", message=..., output column set to ""
"skip"     Row returned unchanged
"blank"    Output column set to "", row status untouched

skip_statuses

Rows whose "row status" value is in this list are passed through unchanged. Default: ["error", "warning"]. Pass [] to process all rows regardless of status.
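The interaction between on_error and skip_statuses is easier to follow in code. The sketch below is a hypothetical generic row processor, not the library's implementation: process_rows and its func argument are invented for illustration, and the "message" key name is inferred from the on_error description above.

```python
def process_rows(rows, column, func, *, out_column=None,
                 on_error="error", skip_statuses=None):
    if on_error not in ("error", "skip", "blank"):
        raise ValueError(f"invalid on_error: {on_error!r}")
    if skip_statuses is None:
        skip_statuses = ["error", "warning"]  # documented default
    target = out_column or column

    result = []
    for row in rows:
        # Rows already flagged with a listed status pass through untouched.
        if row.get("row status") in skip_statuses:
            result.append(row)
            continue
        new_row = dict(row)
        try:
            new_row[target] = func(row[column])
        except Exception as exc:
            if on_error == "skip":
                new_row = dict(row)            # row returned unchanged
            elif on_error == "blank":
                new_row[target] = ""           # status left untouched
            else:
                new_row["row status"] = "error"
                new_row["message"] = str(exc)  # key name is an assumption
                new_row[target] = ""
        result.append(new_row)
    return result


rows = [
    {"Budget": "12.5"},
    {"Budget": "abc"},
    {"Budget": "7", "row status": "warning"},
]
processed = process_rows(rows, "Budget", float)
```

With the defaults, the first row is converted, the second is marked as an error with a blank output, and the third is skipped because of its existing warning status. Passing skip_statuses=[] would convert the third row as well.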


Error classes

Class        Raised by
UtilsError   Base class: catch all tha-utils-helper errors
StrError     ThaStr methods
NumError     ThaNum methods
DateError    ThaDT methods

from tha_utils_helper import StrError, NumError, DateError, UtilsError

Composing with tha-csv-runner

from tha_csv_runner import ThaCSV
from tha_utils_helper import ThaNum, ThaStr, ThaDT

csv = ThaCSV()
csv.read("Load", "input.csv", ["Org BK", "Budget", "Start Date", "Name"])

rows = ThaNum().format_num_rows(csv.rows, column="Budget", cast="float", round_to=2)
rows = ThaDT().format_date_rows(rows, column="Start Date", to_fmt="%Y-%m-%d")
rows = ThaStr().format_str_rows(rows, column="Name", case="lower")

csv.write("Write", "output.csv", rows=rows)

Alternatives

This library is intentionally limited in scope — it exists as a zero-dependency utility layer for the tha-* ecosystem. If you need something more comprehensive, these are the go-to options:

General utilities:

  • toolz — covers most of what's here and much more: chunking, flattening, pick, omit, nested get, and functional composition
  • funcy — functional helpers including pick, omit, chunks, and silent type coercions

String normalization / slugification:

  • python-slugify — full-featured slugification with transliteration support and configurable stop words
  • Unidecode — broad unicode-to-ASCII transliteration

Numeric parsing:

  • babel — locale-aware number parsing that handles locale-specific decimal and grouping separators
  • price-parser — extracts prices and currency from arbitrary text

Date parsing:

  • python-dateutil — flexible date parsing including fuzzy matching; no row-level error handling
  • pendulum — timezone-aware datetime with parsing and formatting

Choose this library when you want all of the above in a single zero-dependency install with consistent row-level error capture that slots into the tha-* pipeline.

License

MIT
