A Tabular Helper utility library with general-purpose helpers, string normalization, numeric parsing, and date formatting for the tha-* ecosystem.

tha-utils-helper

A Tabular Helper utility library for the tha-* ecosystem. Includes general-purpose dict/list/type helpers, string normalization and slugification, numeric string parsing, and date format conversion — all with row-level error handling for CSV pipeline use.

Install

pip install tha-utils-helper

Quick start

from tha_utils_helper import ThaDict, ThaList, ThaType, ThaStr, ThaNum, ThaDT

# Sample input: a list of row dicts, e.g. rows read from a CSV
rows = [
    {"studentUniqueId": "S001", "Budget": "$1,234.56"},
    {"studentUniqueId": "S002", "Budget": "(£500)"},
]

# Structural helpers: work on single values or lists of row dicts
ThaDict.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])               # {"a": 1, "c": 3}
rows = ThaDict.rename_keys_rows(rows, {"studentUniqueId": "id"})   # rename across all rows

# String normalization
ThaStr.format_str("  HELLO WORLD  ", case="lower")            # "hello world"
ThaStr.slugify("Hello World!")                                 # "hello-world"

# Numeric parsing
ThaNum.format_num("$1,234.56")                                 # 1234.56
ThaNum.format_num("(£500)", cast="int")                        # -500

# Date formatting
ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")                 # "2024-04-15"

# Row-level processing; failures are handled according to on_error and skip_statuses
formatter = ThaNum()
rows = formatter.format_num_rows(rows, column="Budget", cast="float", round_to=2)

API

ThaDict

Static methods for single dicts and lists of row dicts.

ThaDict.pick(d, keys)               # new dict with only the specified keys
ThaDict.omit(d, keys)               # new dict with the specified keys removed
ThaDict.safe_get(d, *keys)          # traverse nested dicts safely — returns None on miss
ThaDict.rename_keys(d, mapping)     # rename keys; unmapped keys are preserved

ThaDict.pick_rows(rows, keys)       # pick() applied to every row
ThaDict.omit_rows(rows, keys)       # omit() applied to every row
ThaDict.rename_keys_rows(rows, mapping)  # rename_keys() applied to every row

ThaDict.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])
# {"a": 1, "c": 3}

ThaDict.safe_get({"student": {"id": 42}}, "student", "id")
# 42

ThaDict.rename_keys_rows(rows, {"studentUniqueId": "student_id"})
# [{"student_id": ..., ...}, ...]
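To make the semantics above concrete, here is a minimal plain-Python sketch of equivalent logic for pick, omit, safe_get, and rename_keys. It is not the library's source: the standalone function names are hypothetical, and it assumes that pick skips keys absent from the input and that safe_get returns None as soon as it reaches a missing key or a non-dict value.

```python
from __future__ import annotations

from typing import Any, Iterable


def pick(d: dict, keys: Iterable[str]) -> dict:
    # Keep only the requested keys; keys missing from d are skipped (assumption)
    return {k: d[k] for k in keys if k in d}


def omit(d: dict, keys: Iterable[str]) -> dict:
    # Drop the requested keys; everything else is copied into a new dict
    drop = set(keys)
    return {k: v for k, v in d.items() if k not in drop}


def safe_get(d: Any, *keys: str) -> Any:
    # Walk nested dicts one key at a time; any miss or non-dict yields None
    current = d
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def rename_keys(d: dict, mapping: dict[str, str]) -> dict:
    # Rename mapped keys; unmapped keys keep their original names
    return {mapping.get(k, k): v for k, v in d.items()}


def rename_keys_rows(rows: list[dict], mapping: dict[str, str]) -> list[dict]:
    # The *_rows variants apply the single-dict helper to each row
    return [rename_keys(row, mapping) for row in rows]
```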

ThaList

Static methods for lists.

ThaList.chunk(lst, size)   # split into consecutive chunks of up to size items (the last may be shorter)
ThaList.flatten(lst)       # flatten one level of nesting

ThaList.chunk([1, 2, 3, 4, 5], 2)    # [[1, 2], [3, 4], [5]]
ThaList.flatten([[1, 2], [3, 4]])     # [1, 2, 3, 4]

chunk also works on lists of row dicts directly: ThaList.chunk(rows, 100).
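A short sketch of what chunk and flatten do, using list slicing and a single loop. The function names are hypothetical stand-ins, and two details are assumptions not stated above: a non-positive size raises ValueError, and flatten leaves non-list items in place instead of rejecting them.

```python
from __future__ import annotations

from typing import Any, TypeVar

T = TypeVar("T")


def chunk(lst: list[T], size: int) -> list[list[T]]:
    # Slice the list in steps of `size`; the final chunk holds any remainder
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [lst[i:i + size] for i in range(0, len(lst), size)]


def flatten(lst: list[Any]) -> list[Any]:
    # Expand one level only: nested lists are spliced in, other items kept as-is
    flat: list[Any] = []
    for item in lst:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat
```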


ThaType

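Static methods for coercing values. safe_int and safe_float return None on failure; for consistency, the row methods (including normalize_bool_rows) also write None to the output column on failure instead of raising.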

ThaType.normalize_bool(val)                                   # bool or raises ValueError
ThaType.safe_int(val)                                         # int | None
ThaType.safe_float(val)                                       # float | None

ThaType.normalize_bool_rows(rows, column, *, out_column=None) # None on failure
ThaType.safe_int_rows(rows, column, *, out_column=None)
ThaType.safe_float_rows(rows, column, *, out_column=None)

normalize_bool recognizes:

Truthy                                   Falsy
True, 1, "true", "yes", "1", "t", "y"    False, 0, "false", "no", "0", "f", "n"

String matching is case-insensitive and strips whitespace.

ThaType.normalize_bool("Yes")     # True
ThaType.safe_int("3.14")          # None  (not an integer string)
ThaType.safe_float("abc")         # None

ThaType.safe_int_rows(rows, "count", out_column="count_int")
# adds "count_int" column; original "count" column preserved
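A minimal sketch of these coercion rules, assuming the truthy/falsy table above is exhaustive and that safe_int accepts only integer strings. The names mirror the API, but this is an illustration rather than the library's code; in particular, whether the real safe_int accepts float values such as 3.0 is not documented here, and this sketch returns None for them.

```python
from __future__ import annotations

from typing import Any

_TRUTHY = {"true", "yes", "1", "t", "y"}
_FALSY = {"false", "no", "0", "f", "n"}


def normalize_bool(val: Any) -> bool:
    # Check bool before int: bool is a subclass of int in Python
    if isinstance(val, bool):
        return val
    if isinstance(val, int) and val in (0, 1):
        return val == 1
    if isinstance(val, str):
        s = val.strip().lower()  # case-insensitive, whitespace-tolerant
        if s in _TRUTHY:
            return True
        if s in _FALSY:
            return False
    raise ValueError(f"cannot interpret {val!r} as a boolean")


def safe_int(val: Any) -> int | None:
    # int() rejects strings like "3.14", so they map to None instead of truncating
    try:
        return int(str(val).strip())
    except (TypeError, ValueError):
        return None


def safe_float(val: Any) -> float | None:
    try:
        return float(str(val).strip())
    except (TypeError, ValueError):
        return None


def safe_int_rows(rows: list[dict], column: str, *, out_column: str | None = None) -> list[dict]:
    # Write to out_column when given, otherwise overwrite the source column
    target = out_column or column
    return [{**row, target: safe_int(row.get(column))} for row in rows]
```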

ThaStr

String normalization and slugification. format_str and slugify are static methods callable without instantiation. Row methods require an instance and store results in self.rows.

ThaStr.format_str(
    value: str,
    *,
    strip: bool = True,
    case: str | None = None,     # "upper" | "lower" | "title" | None
    replace: dict[str, str] | None = None,
    regex: bool = False,
) -> str
ThaStr.slugify(
    value: str,
    *,
    sep: str = "-",
    prefix: str = "",
    suffix: str = "",
) -> str

runner = ThaStr()

runner.format_str_rows(
    rows,
    column,
    *,
    strip=True,
    case=None,
    replace=None,
    regex=False,
    out_column=None,
    on_error="error",            # "error" | "skip" | "blank"
    skip_statuses=None,          # default: ["error", "warning"]
) -> list[dict]

runner.slugify_rows(
    rows,
    columns,                     # str or list[str] — multiple columns are joined with sep
    out_column,
    *,
    sep="-",
    prefix="",
    suffix="",
    on_error="error",
    skip_statuses=None,
) -> list[dict]

ThaStr.format_str("  HELLO WORLD  ", case="lower")    # "hello world"
ThaStr.slugify("Hello World!")                          # "hello-world"
ThaStr.slugify("café résumé", sep="_")                  # "cafe_resume"

runner = ThaStr()
runner.format_str_rows(rows, "Name", case="lower", out_column="Name Lower")
runner.slugify_rows(rows, ["First", "Last"], out_column="id")

Raises StrError on an invalid case or on_error value. Non-ASCII text is converted to ASCII via NFKD normalization, so "é" becomes "e".
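NFKD-based slugification works by decomposing accented characters into a base letter plus combining marks, then discarding whatever cannot be encoded as ASCII. A minimal sketch of that approach follows, assuming (consistent with the examples above) that slugs are lowercased and that runs of non-alphanumeric characters collapse into a single sep. slugify_sketch is a hypothetical name, not the library's function.

```python
import re
import unicodedata


def slugify_sketch(value: str, *, sep: str = "-", prefix: str = "", suffix: str = "") -> str:
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301)
    decomposed = unicodedata.normalize("NFKD", value)
    # Encoding to ASCII with errors="ignore" drops the combining marks
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of non-alphanumerics into one separator, then trim the ends
    slug = re.sub(r"[^a-z0-9]+", lambda _m: sep, ascii_text.lower()).strip(sep)
    return f"{prefix}{slug}{suffix}"
```

In this sketch, characters with no NFKD decomposition, such as "ß" or "ø", are dropped rather than transliterated; that gap is what python-slugify and Unidecode, listed under Alternatives, are built to close.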


ThaNum

Numeric string parsing. format_num is a static method callable without instantiation. format_num_rows requires an instance and stores results in self.rows.

ThaNum.format_num(
    value: str | int | float,
    *,
    strip_currency: bool = True,   # removes $€£¥₹₩₽₺₫฿₱₴
    strip_commas: bool = True,
    round_to: int | None = None,
    cast: str = "float",           # "float" | "int"
) -> float | int

runner = ThaNum()

runner.format_num_rows(
    rows,
    column,
    *,
    strip_currency=True,
    strip_commas=True,
    round_to=None,
    cast="float",
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

ThaNum.format_num("$1,234.56")          # 1234.56
ThaNum.format_num("(£500)", cast="int") # -500
ThaNum.format_num("€9.99", round_to=1)  # 10.0

Accounting-style negatives in parentheses, such as "(100)", are converted to negative numbers automatically. Raises NumError on unparseable input, bool input, or an invalid cast.
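A minimal sketch of a parsing pipeline with the behavior described above: strip currency symbols and commas, treat an enclosing pair of parentheses as a minus sign, then round and cast. Rejecting bool comes first because bool is a subclass of int in Python. format_num_sketch and the local NumError class are hypothetical stand-ins, and truncation for cast="int" is an assumption, since the rounding rule for integer casts is not documented here.

```python
from __future__ import annotations

CURRENCY_SYMBOLS = "$€£¥₹₩₽₺₫฿₱₴"
_CURRENCY_TABLE = str.maketrans("", "", CURRENCY_SYMBOLS)


class NumError(ValueError):
    """Stand-in for tha_utils_helper.NumError (sketch only)."""


def format_num_sketch(
    value: str | int | float,
    *,
    strip_currency: bool = True,
    strip_commas: bool = True,
    round_to: int | None = None,
    cast: str = "float",
) -> float | int:
    # bool would otherwise be accepted as 0/1, so reject it explicitly
    if isinstance(value, bool):
        raise NumError("bool is not a numeric value")
    if cast not in ("float", "int"):
        raise NumError(f"invalid cast: {cast!r}")

    text = str(value).strip()
    if strip_currency:
        text = text.translate(_CURRENCY_TABLE)
    if strip_commas:
        text = text.replace(",", "")
    text = text.strip()

    # Accounting-style negative: "(500)" means -500
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1].strip()

    try:
        number = float(text)
    except ValueError:
        raise NumError(f"cannot parse {value!r} as a number") from None

    if negative:
        number = -number
    if round_to is not None:
        number = round(number, round_to)
    # Assumption: cast="int" truncates toward zero, like Python's int()
    return int(number) if cast == "int" else number
```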


ThaDT

Date format auto-detection and conversion. format_date and now are static methods. format_date_rows requires an instance and stores results in self.rows.

ThaDT.now(fmt="%Y_%m_%d_%H_%M_%S") -> str

ThaDT.format_date(value: str, to_fmt: str) -> str

runner = ThaDT()

runner.format_date_rows(
    rows,
    column,
    to_fmt,
    *,
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

Auto-detects: ISO 8601 (with/without time, with/without ms/Z), compact ISO (20240415), year-month (2024-04), US MM/DD/YYYY, US MM/DD/YY, MM/DD, long and short month names (April 15, 2024 / Apr 15, 2024).

ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")   # "2024-04-15"
ThaDT.format_date("04/15/2024", "%m/%d/%y")      # "04/15/24"
ThaDT.now()                                       # current time, e.g. "2024_04_15_13_30_00"

Raises DateError on unrecognized formats or invalid on_error.
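Auto-detection of this kind is commonly done by trying a fixed list of strptime patterns in order and keeping the first that parses. The sketch below takes that approach for a subset of the formats listed above; it is an assumption about technique, not the library's implementation, and format_date_sketch and the local DateError class are hypothetical. Year-less MM/DD input is left out because the year it should default to is not documented, and the month-name patterns (%B, %b) assume an English locale.

```python
from datetime import datetime

# Candidate input formats, tried in order; the first that parses wins
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%fZ",  # ISO 8601 with milliseconds and Z
    "%Y-%m-%dT%H:%M:%SZ",     # ISO 8601 with Z
    "%Y-%m-%dT%H:%M:%S.%f",   # ISO 8601 with milliseconds
    "%Y-%m-%dT%H:%M:%S",      # ISO 8601 with time
    "%Y-%m-%d",               # ISO 8601 date only
    "%Y%m%d",                 # compact ISO, e.g. 20240415
    "%Y-%m",                  # year-month, e.g. 2024-04
    "%m/%d/%Y",               # US MM/DD/YYYY
    "%m/%d/%y",               # US MM/DD/YY
    "%B %d, %Y",              # April 15, 2024
    "%b %d, %Y",              # Apr 15, 2024
]


class DateError(ValueError):
    """Stand-in for tha_utils_helper.DateError (sketch only)."""


def format_date_sketch(value: str, to_fmt: str) -> str:
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this pattern does not match; try the next one
        return parsed.strftime(to_fmt)
    raise DateError(f"unrecognized date format: {value!r}")
```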


on_error (all row methods)

Value     Behaviour
"error"   row status="error", message=..., output column set to ""
"skip"    Row returned unchanged
"blank"   Output column set to "", row status untouched

skip_statuses

Rows whose "row status" value is in this list are passed through unchanged. Default: ["error", "warning"]. Pass [] to process all rows regardless of status.
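A minimal sketch of how on_error and skip_statuses could interact inside a row method, following the table above. apply_to_rows is a hypothetical helper that wraps any single-value function; the "message" key name, the use of str(exc) as its text, and copying rows rather than mutating them are all assumptions.

```python
from __future__ import annotations

from typing import Any, Callable

DEFAULT_SKIP_STATUSES = ["error", "warning"]


def apply_to_rows(
    rows: list[dict],
    column: str,
    fn: Callable[[Any], Any],
    *,
    out_column: str | None = None,
    on_error: str = "error",
    skip_statuses: list[str] | None = None,
) -> list[dict]:
    if on_error not in ("error", "skip", "blank"):
        raise ValueError(f"invalid on_error: {on_error!r}")
    # None means "use the default"; an explicit [] means "process every row"
    skip = DEFAULT_SKIP_STATUSES if skip_statuses is None else skip_statuses
    target = out_column or column
    result = []
    for row in rows:
        # Rows already flagged by an earlier step pass straight through
        if row.get("row status") in skip:
            result.append(row)
            continue
        new_row = dict(row)  # copy so the caller's rows are not mutated
        try:
            new_row[target] = fn(row[column])
        except Exception as exc:
            if on_error == "skip":
                new_row = row  # returned unchanged
            elif on_error == "blank":
                new_row[target] = ""  # row status untouched
            else:
                new_row[target] = ""
                new_row["row status"] = "error"
                new_row["message"] = str(exc)  # "message" key name is an assumption
        result.append(new_row)
    return result
```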


Error classes

Class        Raised by
UtilsError   Base class: catch this to handle all tha-utils-helper errors
StrError     ThaStr methods
NumError     ThaNum methods
DateError    ThaDT methods

from tha_utils_helper import StrError, NumError, DateError, UtilsError
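The table implies a hierarchy rooted at UtilsError, so a single except clause can handle failures from any helper. The stand-in classes below only mirror that shape to show the catch pattern; they are not imported from the library, whether the real classes also inherit from a built-in such as ValueError is not stated, and parse_budget is a hypothetical helper.

```python
class UtilsError(Exception):
    """Base class: one except clause catches every subclass below."""


class StrError(UtilsError):
    pass


class NumError(UtilsError):
    pass


class DateError(UtilsError):
    pass


def parse_budget(text: str) -> float:
    # Hypothetical helper that fails with a subclass-specific error
    try:
        return float(text)
    except ValueError:
        raise NumError(f"cannot parse {text!r}") from None


caught = None
try:
    parse_budget("n/a")
except UtilsError as exc:  # would also catch StrError and DateError
    caught = type(exc).__name__
```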

Composing with tha-csv-runner

from tha_csv_runner import ThaCSV
from tha_utils_helper import ThaNum, ThaStr, ThaDT

csv = ThaCSV()
csv.read("Load", "input.csv", ["Org BK", "Budget", "Start Date", "Name"])

rows = ThaNum().format_num_rows(csv.rows, column="Budget", cast="float", round_to=2)
rows = ThaDT().format_date_rows(rows, column="Start Date", to_fmt="%Y-%m-%d")
rows = ThaStr().format_str_rows(rows, column="Name", case="lower")

csv.write("Write", "output.csv", rows=rows)

Alternatives

This library is intentionally limited in scope — it exists as a zero-dependency utility layer for the tha-* ecosystem. If you need something more comprehensive, these are the go-to options:

General utilities:

  • toolz — covers most of what's here and much more: chunking, flattening, pick, omit, nested get, and functional composition
  • funcy — functional helpers including pick, omit, chunks, and silent type coercions

String normalization / slugification:

  • python-slugify — full-featured slugification with transliteration support and configurable stop words
  • Unidecode — broad unicode-to-ASCII transliteration

Numeric parsing:

  • babel — locale-aware number parsing that handles locale-specific decimal and grouping separators
  • price-parser — extracts prices and currency from arbitrary text

Date parsing:

  • python-dateutil — flexible date parsing including fuzzy matching; no row-level error handling
  • pendulum — timezone-aware datetime with parsing and formatting

Choose this library when the subset of features documented above is enough and you want it in a single zero-dependency install, with consistent row-level error capture that slots into the tha-* pipeline.

License

MIT


Download files

Source Distribution

tha_utils_helper-0.2.2.tar.gz (38.6 kB)

Uploaded Source

Built Distribution

tha_utils_helper-0.2.2-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file tha_utils_helper-0.2.2.tar.gz.

File metadata

  • Download URL: tha_utils_helper-0.2.2.tar.gz
  • Upload date:
  • Size: 38.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tha_utils_helper-0.2.2.tar.gz
Algorithm Hash digest
SHA256 91054bb360645971c6ea8d4cbd3a385aed1a9ea0bbf8fa816af8c0240a8b5c42
MD5 798f877a756b06e9954f7d9212ee1e58
BLAKE2b-256 5c566dad1324a33f185fef9df2484266de242969a0f9f7cdfcb448266791a3b7

Provenance

The following attestation bundles were made for tha_utils_helper-0.2.2.tar.gz:

Publisher: publish.yml on tha-guy-nate/tha-utils-helper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tha_utils_helper-0.2.2-py3-none-any.whl.

File metadata

File hashes

Hashes for tha_utils_helper-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 875734c6b30bcea6a1c6a8a6ab9557ae5d72d79813e3f8a6530d5d04fa5d9142
MD5 314d78df6acbfbc953afdb1e5567dfc0
BLAKE2b-256 8574adc8e615eff4afde34dda989294ef82f1de735f16b5693e1f761643024dc

Provenance

The following attestation bundles were made for tha_utils_helper-0.2.2-py3-none-any.whl:

Publisher: publish.yml on tha-guy-nate/tha-utils-helper
