
A Tabular Helper utility library with general-purpose helpers, string normalization, numeric parsing, and date formatting for the tha-* ecosystem.


tha-utils-helper


A Tabular Helper utility library for the tha-* ecosystem. Includes general-purpose dict/list/type helpers, string normalization and slugification, numeric string parsing, and date format conversion — all with row-level error handling for CSV pipeline use.

Install

pip install tha-utils-helper

Quick start

from tha_utils_helper import ThaDict, ThaList, ThaType, ThaStr, ThaNum, ThaDT

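# Example input: a list of row dicts, as read from a CSV (illustrative data)
rows = [{"studentUniqueId": "1001", "Budget": "$1,234.56"}]

# Structural helpers: work on single values or lists of row dicts
ThaDict.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])         # {"a": 1, "c": 3}
ThaDict.rename_keys_rows(rows, {"studentUniqueId": "id"})   # rename across all rows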

# String normalization
ThaStr.format_str("  HELLO WORLD  ", case="lower")            # "hello world"
ThaStr.slugify("Hello World!")                                 # "hello-world"

# Numeric parsing
ThaNum.format_num("$1,234.56")                                 # 1234.56
ThaNum.format_num("(£500)", cast="int")                        # -500

# Date formatting
ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")                 # "2024-04-15"

# Row-level processing (on_error and skip_statuses are left at their defaults here)
formatter = ThaNum()
rows = formatter.format_num_rows(rows, column="Budget", cast="float", round_to=2)

API

ThaDict

Static methods for single dicts and lists of row dicts.

ThaDict.pick(d, keys)               # new dict with only the specified keys
ThaDict.omit(d, keys)               # new dict with the specified keys removed
ThaDict.safe_get(d, *keys)          # traverse nested dicts safely — returns None on miss
ThaDict.rename_keys(d, mapping)     # rename keys; unmapped keys are preserved

ThaDict.pick_rows(rows, keys)       # pick() applied to every row
ThaDict.omit_rows(rows, keys)       # omit() applied to every row
ThaDict.rename_keys_rows(rows, mapping)  # rename_keys() applied to every row

ThaDict.pick({"a": 1, "b": 2, "c": 3}, ["a", "c"])
# {"a": 1, "c": 3}

ThaDict.safe_get({"student": {"id": 42}}, "student", "id")
# 42

ThaDict.rename_keys_rows(rows, {"studentUniqueId": "student_id"})
# [{"student_id": ..., ...}, ...]
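To make the semantics above concrete, here is a minimal plain-Python sketch of the four single-dict helpers. It illustrates the documented behaviour and is not the library's source: the function names only mirror the method names, and skipping missing keys in pick is an assumption.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def pick(d: Mapping, keys: Iterable) -> dict:
    """New dict with only the requested keys (missing keys skipped: an assumption)."""
    return {k: d[k] for k in keys if k in d}


def omit(d: Mapping, keys: Iterable) -> dict:
    """New dict with the requested keys removed."""
    drop = set(keys)
    return {k: v for k, v in d.items() if k not in drop}


def safe_get(d: Any, *keys: Any) -> Any:
    """Walk nested mappings key by key; return None as soon as a step is missing."""
    current = d
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


def rename_keys(d: Mapping, mapping: Mapping) -> dict:
    """Rename keys found in mapping; keys not in mapping are kept as they are."""
    return {mapping.get(k, k): v for k, v in d.items()}


# The *_rows variants amount to applying the single-dict helper to every row.
rows = [{"studentUniqueId": 1, "name": "Ada"}, {"studentUniqueId": 2, "name": "Lin"}]
renamed = [rename_keys(row, {"studentUniqueId": "student_id"}) for row in rows]
```

None of these helpers mutate their input; each returns a new dict, which matches the "new dict" wording in the method list.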

ThaList

Static methods for lists.

ThaList.chunk(lst, size)   # split into consecutive chunks of size
ThaList.flatten(lst)       # flatten one level of nesting

ThaList.chunk([1, 2, 3, 4, 5], 2)    # [[1, 2], [3, 4], [5]]
ThaList.flatten([[1, 2], [3, 4]])     # [1, 2, 3, 4]

chunk also works on lists of row dicts directly: ThaList.chunk(rows, 100).
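A rough sketch of what chunking and one-level flattening amount to, written in plain Python rather than taken from the library. The size check and the treatment of non-list items in flatten are assumptions.

```python
def chunk(lst: list, size: int) -> list:
    """Split lst into consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")  # assumed validation
    return [lst[i:i + size] for i in range(0, len(lst), size)]


def flatten(lst: list) -> list:
    """Flatten exactly one level; non-list items are kept as-is (an assumption)."""
    flat = []
    for item in lst:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


# Batching row dicts, e.g. before sending them to an API in groups of 100.
rows = [{"id": n} for n in range(250)]
batches = chunk(rows, 100)
batch_sizes = [len(batch) for batch in batches]
```

The last batch is simply shorter when the row count is not a multiple of the chunk size.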


ThaType

Static methods for coercing values. normalize_bool raises ValueError on unrecognized input, while safe_int and safe_float return None. The row methods, including normalize_bool_rows, write None on failure.

ThaType.normalize_bool(val)                                   # bool or raises ValueError
ThaType.safe_int(val)                                         # int | None
ThaType.safe_float(val)                                       # float | None

ThaType.normalize_bool_rows(rows, column, *, out_column=None) # None on failure
ThaType.safe_int_rows(rows, column, *, out_column=None)
ThaType.safe_float_rows(rows, column, *, out_column=None)

normalize_bool recognizes:

Truthy: True, 1, "true", "yes", "1", "t", "y"
Falsy:  False, 0, "false", "no", "0", "f", "n"

String matching is case-insensitive and strips whitespace.

ThaType.normalize_bool("Yes")     # True
ThaType.safe_int("3.14")          # None  (not an integer string)
ThaType.safe_float("abc")         # None

ThaType.safe_int_rows(rows, "count", out_column="count_int")
# adds "count_int" column; original "count" column preserved
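The coercion rules can be sketched in a few lines of standalone Python. This is an illustration only, not the library's implementation: the token sets are copied from the list above, and treating out_column=None as "overwrite the source column" is an assumption.

```python
TRUTHY = {"true", "yes", "1", "t", "y"}
FALSY = {"false", "no", "0", "f", "n"}


def normalize_bool(val):
    """Map a recognized truthy/falsy value to bool; raise ValueError otherwise."""
    if isinstance(val, bool):
        return val
    if isinstance(val, int) and val in (0, 1):
        return bool(val)
    if isinstance(val, str):
        token = val.strip().lower()  # case-insensitive, whitespace-tolerant
        if token in TRUTHY:
            return True
        if token in FALSY:
            return False
    raise ValueError(f"cannot interpret {val!r} as a boolean")


def safe_int(val):
    """Return int(val) for integer-like input, otherwise None."""
    try:
        return int(str(val).strip())
    except ValueError:
        return None


def safe_int_rows(rows, column, *, out_column=None):
    """Apply safe_int to one column of every row, returning new row dicts."""
    target = out_column or column  # assumption: None overwrites the source column
    return [{**row, target: safe_int(row.get(column))} for row in rows]


rows = [{"count": "7"}, {"count": "3.14"}]
converted = safe_int_rows(rows, "count", out_column="count_int")
```

Returning None instead of raising keeps a single bad cell from stopping a whole batch, which is the trade-off the row methods make.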

ThaStr

String normalization and slugification. format_str and slugify are static methods callable without instantiation. Row methods require an instance and store results in self.rows.

ThaStr.format_str(
    value: str,
    *,
    strip: bool = True,
    case: str | None = None,     # "upper" | "lower" | "title" | None
    replace: dict[str, str] | None = None,
    regex: bool = False,
) -> str

ThaStr.slugify(
    value: str,
    *,
    sep: str = "-",
    prefix: str = "",
    suffix: str = "",
) -> str

runner = ThaStr()

runner.format_str_rows(
    rows,
    column,
    *,
    strip=True,
    case=None,
    replace=None,
    regex=False,
    out_column=None,
    on_error="error",            # "error" | "skip" | "blank"
    skip_statuses=None,          # default: ["error", "warning"]
) -> list[dict]

runner.slugify_rows(
    rows,
    columns,                     # str or list[str] — multiple columns are joined with sep
    out_column,
    *,
    sep="-",
    prefix="",
    suffix="",
    on_error="error",
    skip_statuses=None,
) -> list[dict]

ThaStr.format_str("  HELLO WORLD  ", case="lower")    # "hello world"
ThaStr.slugify("Hello World!")                          # "hello-world"
ThaStr.slugify("café résumé", sep="_")                  # "cafe_resume"

runner = ThaStr()
runner.format_str_rows(rows, "Name", case="lower", out_column="Name Slug")
runner.slugify_rows(rows, ["First", "Last"], out_column="id")

Raises StrError when case or on_error is invalid. Unicode is converted to ASCII via NFKD normalization, so "café" becomes "cafe".
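For readers unfamiliar with NFKD-based ASCII conversion, the following standalone sketch shows one common way to build a slug with the standard library. It is not the library's code: the regular expression, the lower-casing step, and the way prefix and suffix are attached are all assumptions made for illustration.

```python
import re
import unicodedata


def to_ascii(value: str) -> str:
    """Decompose accented characters (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def slugify(value: str, *, sep: str = "-", prefix: str = "", suffix: str = "") -> str:
    """Lower-case, replace runs of non-alphanumerics with sep, trim stray separators."""
    ascii_text = to_ascii(value).lower()
    slug = re.sub(r"[^a-z0-9]+", lambda _match: sep, ascii_text).strip(sep)
    return f"{prefix}{slug}{suffix}"  # assumed: prefix/suffix are concatenated directly


def slugify_columns(row: dict, columns: list, *, sep: str = "-") -> str:
    """Join several column values with sep before slugifying (mirrors slugify_rows)."""
    return slugify(sep.join(str(row[c]) for c in columns), sep=sep)
```

NFKD splits "é" into "e" plus a combining accent, and the ASCII encode step with "ignore" discards the accent. Characters with no ASCII decomposition are dropped entirely, which is where a transliteration library such as Unidecode does better.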


ThaNum

Numeric string parsing. format_num is a static method callable without instantiation. format_num_rows requires an instance and stores results in self.rows.

ThaNum.format_num(
    value: str | int | float,
    *,
    strip_currency: bool = True,   # removes $€£¥₹₩₽₺₫฿₱₴
    strip_commas: bool = True,
    round_to: int | None = None,
    cast: str = "float",           # "float" | "int"
) -> float | int

runner = ThaNum()

runner.format_num_rows(
    rows,
    column,
    *,
    strip_currency=True,
    strip_commas=True,
    round_to=None,
    cast="float",
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

ThaNum.format_num("$1,234.56")          # 1234.56
ThaNum.format_num("(£500)", cast="int") # -500
ThaNum.format_num("€9.99", round_to=1)  # 10.0

Accounting-style negatives written in parentheses, such as (100), are parsed as negative numbers. Raises NumError on unparseable input, bool input, or invalid cast.
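The parsing steps described here can be approximated with the standard library alone. The sketch below is a hypothetical stand-in named parse_number, not ThaNum.format_num itself: it raises ValueError rather than NumError, and truncating toward zero for cast="int" is an assumption.

```python
CURRENCY_SYMBOLS = "$€£¥₹₩₽₺₫฿₱₴"  # same symbol list as strip_currency above


def parse_number(value, *, round_to=None, cast="float"):
    """Parse a currency-style numeric string into a float or int."""
    if isinstance(value, bool):
        raise ValueError("bool is not accepted as a number")
    if cast not in ("float", "int"):
        raise ValueError(f"invalid cast: {cast!r}")

    text = str(value).strip()
    # "(500)" is accounting notation for -500.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]

    # Drop currency symbols and thousands separators before parsing.
    text = text.translate({ord(ch): None for ch in CURRENCY_SYMBOLS})
    text = text.replace(",", "").strip()

    number = float(text)  # raises ValueError on unparseable input
    if negative:
        number = -number
    if round_to is not None:
        number = round(number, round_to)
    return int(number) if cast == "int" else number  # assumed: int() truncation
```

Note that blindly stripping commas assumes "," is a grouping separator. Input such as "1.234,56" would be misread, which is the case locale-aware parsers like babel handle.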


ThaDT

Date format auto-detection and conversion. format_date and now are static methods. format_date_rows requires an instance and stores results in self.rows.

ThaDT.now(fmt="%Y_%m_%d_%H_%M_%S") -> str

ThaDT.format_date(value: str, to_fmt: str) -> str

runner = ThaDT()

runner.format_date_rows(
    rows,
    column,
    to_fmt,
    *,
    out_column=None,
    on_error="error",
    skip_statuses=None,
) -> list[dict]

Auto-detects:

  • ISO 8601 (with/without time, with/without ms/Z)
  • compact ISO (20240415)
  • year-month (2024-04)
  • US MM/DD/YYYY, US MM/DD/YY, MM/DD
  • long and short month names (April 15, 2024 / Apr 15, 2024)

ThaDT.format_date("Apr 15, 2024", "%Y-%m-%d")   # "2024-04-15"
ThaDT.format_date("04/15/2024", "%m/%d/%y")      # "04/15/24"
ThaDT.now()                                       # e.g. "2024_04_15_13_30_00"

Raises DateError on unrecognized formats or invalid on_error.
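One simple way to auto-detect an input format is to try a list of candidate strptime patterns in order and use the first that parses. The sketch below shows that technique with the standard library. Whether the library works this way is not stated here, so treat the candidate list, its ordering, and the convert_date name as assumptions. The year-less MM/DD form is left out.

```python
from datetime import datetime

# Candidate input formats, tried in order (an illustrative subset).
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%fZ",  # ISO 8601 with fractional seconds and Z
    "%Y-%m-%dT%H:%M:%SZ",     # ISO 8601 with Z
    "%Y-%m-%dT%H:%M:%S",      # ISO 8601 with time
    "%Y-%m-%d",               # ISO date
    "%Y%m%d",                 # compact ISO
    "%Y-%m",                  # year-month
    "%m/%d/%Y",               # US, four-digit year
    "%m/%d/%y",               # US, two-digit year
    "%B %d, %Y",              # long month name (locale-dependent)
    "%b %d, %Y",              # short month name (locale-dependent)
]


def convert_date(value: str, to_fmt: str) -> str:
    """Parse value with the first matching candidate format, then re-format it."""
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.strftime(to_fmt)
    raise ValueError(f"unrecognized date format: {value!r}")
```

Order matters with this approach: the four-digit-year pattern is listed before the two-digit one so that each input matches exactly one candidate.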


on_error (all row methods)

Value     Behaviour
"error"   row status="error", message=..., output column set to ""
"skip"    Row returned unchanged
"blank"   Output column set to "", row status untouched

skip_statuses

Rows whose "row status" value is in this list are passed through unchanged. Default: ["error", "warning"]. Pass [] to process all rows regardless of status.
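The interaction between on_error and skip_statuses is easier to follow as code. This is a hypothetical generic wrapper named apply_to_rows, written only to illustrate the rules above; the "message" key name and the out_column=None fallback are assumptions, and the library's own row methods may differ in detail.

```python
def apply_to_rows(rows, column, func, *, out_column=None,
                  on_error="error", skip_statuses=None):
    """Apply func to one column of each row, capturing failures per row."""
    if on_error not in ("error", "skip", "blank"):
        raise ValueError(f"invalid on_error: {on_error!r}")
    if skip_statuses is None:
        skip_statuses = ["error", "warning"]  # documented default
    target = out_column or column  # assumption: None overwrites the source column

    result = []
    for row in rows:
        # Rows already flagged upstream pass through untouched.
        if row.get("row status") in skip_statuses:
            result.append(row)
            continue
        new_row = dict(row)
        try:
            new_row[target] = func(row[column])
        except Exception as exc:
            if on_error == "skip":
                new_row = dict(row)  # row returned unchanged
            elif on_error == "blank":
                new_row[target] = ""  # blank output, status untouched
            else:
                new_row[target] = ""
                new_row["row status"] = "error"
                new_row["message"] = str(exc)  # assumed key name
        result.append(new_row)
    return result


rows = [
    {"Budget": "12.5"},
    {"Budget": "n/a"},
    {"Budget": "7", "row status": "warning"},
]
out = apply_to_rows(rows, "Budget", float)
```

With the defaults, the first row is converted, the second is flagged as an error with a blank output, and the third is passed through because its status is already "warning".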


Error classes

Class        Raised by
UtilsError   Base class; catch it to handle all tha-utils-helper errors
StrError     ThaStr methods
NumError     ThaNum methods
DateError    ThaDT methods

from tha_utils_helper import StrError, NumError, DateError, UtilsError
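Because the specific errors share a base class, callers can handle one kind specially and fall back to the base for the rest. The sketch below uses stand-in classes with the same names so that it runs without the package installed. Only the hierarchy is taken from the table above; the describe helper is hypothetical.

```python
class UtilsError(Exception):
    """Stand-in for the library's base error class."""


class StrError(UtilsError):
    """Stand-in for errors raised by ThaStr methods."""


class NumError(UtilsError):
    """Stand-in for errors raised by ThaNum methods."""


class DateError(UtilsError):
    """Stand-in for errors raised by ThaDT methods."""


def describe(exc_type):
    """Raise the given error type and show which handler catches it."""
    try:
        raise exc_type("example failure")
    except NumError as exc:
        # Specific handlers must come before the base-class handler.
        return f"numeric problem: {exc}"
    except UtilsError as exc:
        return f"other helper problem: {exc}"
```

In real code the try block would wrap calls such as ThaNum.format_num, and the imports shown above would replace the stand-in classes.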

Composing with tha-csv-runner

from tha_csv_runner import ThaCSV
from tha_utils_helper import ThaNum, ThaStr, ThaDT

csv = ThaCSV()
csv.read("Load", "input.csv", ["Org BK", "Budget", "Start Date", "Name"])

rows = ThaNum().format_num_rows(csv.rows, column="Budget", cast="float", round_to=2)
rows = ThaDT().format_date_rows(rows, column="Start Date", to_fmt="%Y-%m-%d")
rows = ThaStr().format_str_rows(rows, column="Name", case="lower")

csv.write("Write", "output.csv", rows=rows)

Alternatives

This library is intentionally limited in scope — it exists as a zero-dependency utility layer for the tha-* ecosystem. If you need something more comprehensive, these are the go-to options:

General utilities:

  • toolz — covers most of what's here and much more: chunking, flattening, pick, omit, nested get, and functional composition
  • funcy — functional helpers including pick, omit, chunks, and silent type coercions

String normalization / slugification:

  • python-slugify — full-featured slugification with transliteration support and configurable stop words
  • Unidecode — broad unicode-to-ASCII transliteration

Numeric parsing:

  • babel — locale-aware number parsing that handles locale-specific decimal and grouping separators
  • price-parser — extracts prices and currency from arbitrary text

Date parsing:

  • python-dateutil — flexible date parsing including fuzzy matching; no row-level error handling
  • pendulum — timezone-aware datetime with parsing and formatting

Choose this library when you want all of the above in a single zero-dependency install with consistent row-level error capture that slots into the tha-* pipeline.

License

MIT

