Skip to main content

A Tabular Helper utility library with general-purpose helpers, string normalization, numeric parsing, and date formatting for the tha-* ecosystem.

Project description

tha-utils-helper

CI

A Tabular Helper utility library for the tha-* ecosystem. Includes general-purpose dict/list/type helpers, string normalization and slugification, numeric string parsing, and date format conversion — all with row-level error handling for CSV pipeline use.

Install

pip install tha-utils-helper

Quick start

from tha_utils_helper import ThaDict, ThaList, ThaType, ThaStr, ThaNum, ThaDT

# Structural helpers — work on single values or lists of row dicts
ThaDict.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])         # {"a": 1, "c": 3}
ThaDict.rename_keys_rows(rows, {"studentUniqueId": "id"})   # rename across all rows

# String normalization
ThaStr.format_str("  HELLO WORLD  ", case="lower")            # "hello world"
ThaStr.slugify("Hello World!")                                 # "hello-world"

# Numeric parsing
ThaNum.format_num("$1,234.56")                                 # 1234.56
ThaNum.format_num("(£500)", cast="int")                        # -500

# Date formatting
ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")                 # "2024-04-15"

# Row-level processing with on_error and skip_statuses
formatter = ThaNum()
rows = formatter.format_num_rows(rows, column="Budget", cast="float", round_to=2)

API

ThaDict

Static methods for single dicts and lists of row dicts.

ThaDict.pick(d, keys)               # new dict with only the specified keys
ThaDict.omit(d, keys)               # new dict with the specified keys removed
ThaDict.safe_get(d, *keys)          # traverse nested dicts safely — returns None on miss
ThaDict.rename_keys(d, mapping)     # rename keys; unmapped keys are preserved

ThaDict.pick_rows(rows, keys)       # pick() applied to every row
ThaDict.omit_rows(rows, keys)       # omit() applied to every row
ThaDict.rename_keys_rows(rows, mapping)  # rename_keys() applied to every row
ThaDict.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])
# {"a": 1, "c": 3}

ThaDict.safe_get({"student": {"id": 42}}, "student", "id")
# 42

ThaDict.rename_keys_rows(rows, {"studentUniqueId": "student_id"})
# [{"student_id": ..., ...}, ...]

ThaList

Static methods for lists.

ThaList.chunk(lst, size)   # split into consecutive chunks of size
ThaList.flatten(lst)       # flatten one level of nesting
ThaList.chunk([1, 2, 3, 4, 5], 2)    # [[1, 2], [3, 4], [5]]
ThaList.flatten([[1, 2], [3, 4]])     # [1, 2, 3, 4]

chunk also works on lists of row dicts directly: ThaList.chunk(rows, 100).


ThaType

Static methods for coercing values. Row methods return None on failure (consistent with safe_int / safe_float).

ThaType.normalize_bool(val)                                   # bool or raises ValueError
ThaType.safe_int(val)                                         # int | None
ThaType.safe_float(val)                                       # float | None

ThaType.normalize_bool_rows(rows, column, *, out_column=None) # None on failure
ThaType.safe_int_rows(rows, column, *, out_column=None)
ThaType.safe_float_rows(rows, column, *, out_column=None)

normalize_bool recognizes:

Truthy Falsy
True, 1, "true", "yes", "1", "t", "y" False, 0, "false", "no", "0", "f", "n"

String matching is case-insensitive and strips whitespace.

ThaType.normalize_bool("Yes")     # True
ThaType.safe_int("3.14")          # None  (not an integer string)
ThaType.safe_float("abc")         # None

ThaType.safe_int_rows(rows, "count", out_column="count_int")
# adds "count_int" column; original "count" column preserved

ThaStr

String normalization and slugification. format_str and slugify are static methods callable without instantiation. Row methods require an instance and store results in self.rows.

ThaStr.format_str(
    value: str,
    *,
    strip: bool = True,
    case: str | None = None,     # "upper" | "lower" | "title" | None
    replace: dict[str, str] | None = None,
    regex: bool = False,
) -> str
ThaStr.slugify(
    value: str,
    *,
    sep: str = "-",
    prefix: str = "",
    suffix: str = "",
) -> str
runner = ThaStr()

runner.format_str_rows(
    rows,
    column,
    *,
    strip=True,
    case=None,
    replace=None,
    regex=False,
    out_column=None,
    on_error="error",            # "error" | "skip" | "blank"
    skip_statuses=None,          # default: ["error", "warning"]
) -> list[dict]

runner.slugify_rows(
    rows,
    columns,                     # str or list[str] — multiple columns are joined with sep
    out_column,
    *,
    sep="-",
    prefix="",
    suffix="",
    on_error="error",
    skip_statuses=None,
) -> list[dict]
ThaStr.format_str("  HELLO WORLD  ", case="lower")    # "hello world"
ThaStr.slugify("Hello World!")                          # "hello-world"
ThaStr.slugify("café résumé", sep="_")                  # "cafe_resume"

runner = ThaStr()
runner.format_str_rows(rows, "Name", case="lower", out_column="Name Slug")
runner.slugify_rows(rows, ["First", "Last"], out_column="id")

Raises StrError on invalid case or on_error. Unicode is converted to ASCII via NFKD normalization.


ThaNum

Numeric string parsing. format_num is a static method callable without instantiation. format_num_rows requires an instance and stores results in self.rows.

ThaNum.format_num(
    value: str | int | float,
    *,
    strip_currency: bool = True,   # removes $€£¥₹₩₽₺₫฿₱₴
    strip_commas: bool = True,
    round_to: int | None = None,
    cast: str = "float",           # "float" | "int"
) -> float | int
runner = ThaNum()

runner.format_num_rows(
    rows,
    column,
    *,
    strip_currency=True,
    strip_commas=True,
    round_to=None,
    cast="float",
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]
ThaNum.format_num("$1,234.56")          # 1234.56
ThaNum.format_num("(£500)", cast="int") # -500
ThaNum.format_num("€9.99", round_to=1)  # 10.0

Parenthetical negatives ((100)) are converted automatically. Raises NumError on unparseable input, bool input, or invalid cast.


ThaDT

Date format auto-detection and conversion. format_date and now are static methods. format_date_rows requires an instance and stores results in self.rows.

ThaDT.now(fmt="%Y_%m_%d_%H_%M_%S") -> str

ThaDT.format_date(value: str, to_fmt: str) -> str

runner = ThaDT()

runner.format_date_rows(
    rows,
    column,
    to_fmt,
    *,
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

Auto-detects: ISO 8601 (with/without time, with/without ms/Z), compact ISO (20240415), year-month (2024-04), US MM/DD/YYYY, US MM/DD/YY, MM/DD, long and short month names (April 15, 2024 / Apr 15, 2024).

ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")   # "2024-04-15"
ThaDT.format_date("04/15/2024", "%m/%d/%y")      # "04/15/24"
ThaDT.now()                                       # "2024_04_15_13_30_00"

Raises DateError on unrecognized formats or invalid on_error.


on_error (all row methods)

Value Behaviour
"error" row status="error", message=..., output column set to ""
"skip" Row returned unchanged
"blank" Output column set to "", row status untouched

skip_statuses

Rows whose "row status" value is in this list are passed through unchanged. Default: ["error", "warning"]. Pass [] to process all rows regardless of status.


Error classes

Class Raised by
UtilsError Base class — catch all tha-utils-helper errors
StrError ThaStr methods
NumError ThaNum methods
DateError ThaDT methods
from tha_utils_helper import StrError, NumError, DateError, UtilsError

Composing with tha-csv-runner

from tha_csv_runner import ThaCSV
from tha_utils_helper import ThaNum, ThaStr, ThaDT

csv = ThaCSV()
csv.read("Load", "input.csv", ["Org BK", "Budget", "Start Date", "Name"])

rows = ThaNum().format_num_rows(csv.rows, column="Budget", cast="float", round_to=2)
rows = ThaDT().format_date_rows(rows, column="Start Date", to_fmt="%Y-%m-%d")
rows = ThaStr().format_str_rows(rows, column="Name", case="lower")

csv.write("Write", "output.csv", rows=rows)

Alternatives

This library is intentionally limited in scope — it exists as a zero-dependency utility layer for the tha-* ecosystem. If you need something more comprehensive, these are the go-to options:

General utilities:

  • toolz — covers most of what's here and much more: chunking, flattening, pick, omit, nested get, and functional composition
  • funcy — functional helpers including pick, omit, chunks, and silent type coercions

String normalization / slugification:

  • python-slugify — full-featured slugification with transliteration support and configurable stop words
  • Unidecode — broad unicode-to-ASCII transliteration

Numeric parsing:

  • babel — locale-aware number parsing that handles locale-specific decimal and grouping separators
  • price-parser — extracts prices and currency from arbitrary text

Date parsing:

  • python-dateutil — flexible date parsing including fuzzy matching; no row-level error handling
  • pendulum — timezone-aware datetime with parsing and formatting

Choose this library when you want all of the above in a single zero-dependency install with consistent row-level error capture that slots into the tha-* pipeline.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tha_utils_helper-0.2.4.tar.gz (39.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tha_utils_helper-0.2.4-py3-none-any.whl (11.3 kB view details)

Uploaded Python 3

File details

Details for the file tha_utils_helper-0.2.4.tar.gz.

File metadata

  • Download URL: tha_utils_helper-0.2.4.tar.gz
  • Upload date:
  • Size: 39.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tha_utils_helper-0.2.4.tar.gz
Algorithm Hash digest
SHA256 46978a4faf1d6bf4bd318509eb90cdf986a4f248bb8b2037415a45bee1e3a15a
MD5 c15665e8c4d195404eee36c356e5a1d3
BLAKE2b-256 1cd31f85877ffa0f43844766f61f21d47d12d96aadbc52902e589a0fdf02cd7c

See more details on using hashes here.

Provenance

The following attestation bundles were made for tha_utils_helper-0.2.4.tar.gz:

Publisher: publish.yml on tha-guy-nate/tha-utils-helper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tha_utils_helper-0.2.4-py3-none-any.whl.

File metadata

File hashes

Hashes for tha_utils_helper-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 fe457cfa53501b85f3ffdbdb613300554a495d63e2d1f6b661edfb5082d09793
MD5 8fcc672a45a4488e387b909de39db4b9
BLAKE2b-256 425f1595e0d9cc78c4c3e77ff85db593bf14a3a9d1a41e2726bfd7b49e2b79c3

See more details on using hashes here.

Provenance

The following attestation bundles were made for tha_utils_helper-0.2.4-py3-none-any.whl:

Publisher: publish.yml on tha-guy-nate/tha-utils-helper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page