utz

("yoots"): utilities augmenting the Python standard library; processes, Pytest, Pandas, Plotly, …

Install

pip install utz
  • utz has one dependency, stdlb (wild-card standard library imports).
  • "Extras" provide optional deps (e.g. Pandas, Plotly, …).

Import: from utz import *

Jupyter

I often import utz.* in Jupyter notebooks:

from utz import *

This imports most standard library modules/functions (via stdlb), as well as the utz members below.

Python REPL

You can also import utz.* during Python REPL startup:

cat >~/.pythonrc <<EOF
try:
    from utz import *
    err("Imported utz")
except ImportError:
    err("Couldn't find utz")
EOF
export PYTHONSTARTUP=~/.pythonrc
# Configure for Python REPL in new Bash shells:
echo 'export PYTHONSTARTUP=~/.pythonrc' >> ~/.bashrc

Modules

Here are a few utz modules, in rough descending order of how often I use them:

utz.proc: subprocess wrappers; shell out commands, parse output

from utz.proc import *

# Run a command
run('git', 'commit', '-m', 'message')  # Commit staged changes

# Passing a single string implies `shell=True` (for all functions listed here)
# Return `list[str]` of stdout lines
lines('git log -n5 --format=%h')  # Last 5 commit SHAs

# Verify exactly one line of stdout, return it
line('git log -1 --format=%h')  # Current HEAD commit SHA

# Return stdout as a single string
output('git log -1 --format=%B')  # Current HEAD commit message

# Check whether a command succeeds, suppress output
check('git diff --exit-code --quiet')  # `True` iff there are no uncommitted changes
# Nested arrays are flattened (for all commands above):
check(['git', 'diff', ['--exit-code', '--quiet']])

err("This will be output to stderr")

# Execute a "pipeline" of commands
pipeline(['seq 10', 'head -n5'])  # '1\n2\n3\n4\n5\n'

See also: test_proc.py.
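
For intuition, here is a minimal standard-library sketch of the two conventions described above: a single string is run through the shell, and nested argument lists are flattened. The names `flatten` and `lines_sketch` are hypothetical, and this is not utz's actual implementation.

```python
import subprocess
import sys


def flatten(args):
    """Recursively flatten nested lists/tuples of arguments into a flat list of strings."""
    flat = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flat.extend(flatten(arg))
        else:
            flat.append(str(arg))
    return flat


def lines_sketch(*args):
    """Run a command and return its stdout as a list of lines (hypothetical helper)."""
    # A single string is run through the shell; otherwise the arguments are flattened.
    if len(args) == 1 and isinstance(args[0], str):
        cmd, shell = args[0], True
    else:
        cmd, shell = flatten(args), False
    result = subprocess.run(cmd, shell=shell, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


# Use the current Python interpreter as a portable stand-in for `seq`.
py_lines = lines_sketch(sys.executable, ["-c", "print(1); print(2)"])
```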

utz.proc.aio: async subprocess wrappers

Async versions of most utz.proc helpers are also available:

from utz.proc.aio import *
import asyncio
from asyncio import gather

async def test():
  _1, _2, _3, nums = await gather(*[
      run('sleep', '1'),
      run('sleep', '2'),
      run('sleep', '3'),
      lines('seq', '10'),
  ])
  return nums

asyncio.run(test())
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

utz.collections: collection/list helpers

from utz import *

# Verify a collection has one element, return it:
singleton(["aaa"])               # ✅ "aaa"
singleton({'a': 1})              # ✅ ('a', 1); works on `dict`s
singleton([("aaa",), ("aaa",)])  # ✅ ("aaa",); dedupes by default (elems must be hashable)
singleton(["aaa", "bbb"])        # ❌ `raise utz.collections.Expected1FoundN("2 elems found: bbb,aaa")`

# `solo`, `one`, and `e1` are aliases for `singleton`:
solo(["aaa"])  # "aaa"
one(["aaa"])   # "aaa"
e1(["aaa"])    # "aaa"

# Filter by a predicate
one([2, 3, 4], lambda n: n % 2)  # 3
one([{'a': 1}, {'b': 2}], lambda o: 'a' in o)  # {'a': 1}

See also: test_collections.py.
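
The behavior above can be approximated in a few lines of plain Python. This is only a sketch under my own assumptions (the name `singleton_sketch`, the `pred` and `dedupe` parameters, and raising `ValueError` are all hypothetical); the real helper raises `Expected1FoundN` and may differ in other details.

```python
def singleton_sketch(elems, pred=None, dedupe=True):
    """Return the only element of `elems`, optionally filtering by `pred` first."""
    if isinstance(elems, dict):
        elems = elems.items()  # dicts are treated as (key, value) pairs
    elems = [e for e in elems if pred is None or pred(e)]
    if dedupe and len(elems) > 1:
        # `dict.fromkeys` preserves order and drops duplicates (elements must be hashable).
        elems = list(dict.fromkeys(elems))
    if len(elems) != 1:
        raise ValueError(f"Expected 1 element, found {len(elems)}")
    return elems[0]


singleton_sketch([2, 3, 4], lambda n: n % 2)  # 3
```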

utz.env: os.environ wrapper + contextmanager

from utz import env, os

# Temporarily set env vars
with env(FOO='bar'):
    assert os.environ['FOO'] == 'bar'

assert 'FOO' not in os.environ

The env() contextmanager also supports configurable on_conflict and on_exit kwargs, for handling env vars that were patched, then changed while the context was active.

See also: test_env.py.
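
A rough sketch of the basic set-then-restore behavior, using only the standard library. `env_sketch` is a hypothetical name, and this version ignores the on_conflict and on_exit kwargs entirely.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_sketch(**updates):
    """Temporarily set environment variables, restoring previous values on exit."""
    missing = object()  # sentinel for "variable was not set before"
    previous = {key: os.environ.get(key, missing) for key in updates}
    os.environ.update(updates)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is missing:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```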

utz.fn: decorator/function utilities

utz.decos: compose decorators

from utz import decos
from click import option

common_opts = decos(
    option('-n', type=int),
    option('-v', is_flag=True),
)

@common_opts
def subcmd1(n: int, v: bool):
    ...

@common_opts
def subcmd2(n: int, v: bool):
    ...
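
Decorator composition like this can be sketched with `functools.reduce`. `decos_sketch` and `tag` are hypothetical names, and I am assuming the composed decorators behave like stacked `@` lines (first one outermost); utz's own ordering may differ.

```python
from functools import reduce


def decos_sketch(*decorators):
    """Compose decorators into one, equivalent to stacking them with `@` in order."""
    def composed(fn):
        # Apply right-to-left, so the first decorator ends up outermost.
        return reduce(lambda wrapped, deco: deco(wrapped), reversed(decorators), fn)
    return composed


def tag(label):
    """Toy decorator factory that wraps a function's return value in `label(...)`."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            return f"{label}({fn(*args, **kwargs)})"
        return wrapper
    return deco


both = decos_sketch(tag("outer"), tag("inner"))


@both
def greet():
    return "hi"


print(greet())  # outer(inner(hi))
```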

utz.call: only pass expected kwargs to functions

from utz import call, wraps
def fn1(a, b):
    ...

@wraps(fn1)
def fn2(a, c, **kwargs):
    ...
kwargs = dict(a=11, b='22', c=33, d=44)
call(fn1, **kwargs)  # passes {a, b}, not {c, d}
call(fn2, **kwargs)  # passes {a, b, c}, not {d}

See also: test_fn.py.
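
One way to get this kind of kwarg filtering is `inspect.signature`, following the `__wrapped__` chain that `functools.wraps` sets up. The sketch below is my own approximation (`accepted_names` and `call_sketch` are hypothetical names), not the library's code.

```python
import inspect
from functools import wraps


def accepted_names(fn):
    """Collect named parameters of `fn` and of any functions it wraps."""
    names = set()
    while fn is not None:
        params = inspect.signature(fn, follow_wrapped=False).parameters
        names |= {
            name for name, param in params.items()
            if param.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
        }
        fn = getattr(fn, "__wrapped__", None)  # set by `functools.wraps`
    return names


def call_sketch(fn, **kwargs):
    """Call `fn` with only the kwargs it (or a function it wraps) declares."""
    names = accepted_names(fn)
    return fn(**{k: v for k, v in kwargs.items() if k in names})


def fn1(a, b):
    return a, b


@wraps(fn1)
def fn2(a, c, **kwargs):
    return a, c, kwargs


kwargs = dict(a=11, b='22', c=33, d=44)
print(call_sketch(fn1, **kwargs))  # (11, '22')
print(call_sketch(fn2, **kwargs))  # (11, 33, {'b': '22'})
```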

utz.jsn: JsonEncoder for datetimes, dataclasses

from utz import dataclass, Encoder, fromtimestamp, json  # Convenience imports from standard library
epoch = fromtimestamp(0)
print(json.dumps({ 'epoch': epoch }, cls=Encoder))
# {"epoch": "1969-12-31 19:00:00"}  (`fromtimestamp` returns local time; US Eastern here)
print(json.dumps({ 'epoch': epoch }, cls=Encoder("%Y-%m-%d"), indent=2))
# {
#   "epoch": "1969-12-31"
# }

@dataclass
class A:
    n: int

print(json.dumps(A(111), cls=Encoder))
# {"n": 111}

See test_jsn.py for more examples.
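
The general technique is a `json.JSONEncoder` subclass that overrides `default`. Here is a minimal sketch, assuming a fixed datetime format; `SketchEncoder` and `Point` are hypothetical names, and unlike `Encoder` above, this version does not accept a format argument.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from datetime import datetime


class SketchEncoder(json.JSONEncoder):
    """JSON encoder that also handles datetimes and dataclass instances."""
    fmt = "%Y-%m-%d %H:%M:%S"

    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime(self.fmt)
        if is_dataclass(o) and not isinstance(o, type):
            return asdict(o)  # dataclass instance -> plain dict
        return super().default(o)  # raises TypeError for unsupported types


@dataclass
class Point:
    x: int
    y: int


encoded = json.dumps(
    {"when": datetime(2024, 1, 2, 3, 4, 5), "point": Point(1, 2)},
    cls=SketchEncoder,
)
print(encoded)  # {"when": "2024-01-02 03:04:05", "point": {"x": 1, "y": 2}}
```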

utz.context: {async,}contextmanager helpers

  • ctxs: compose contextmanagers
  • actxs: compose asynccontextmanagers
  • with_exit_hook: wrap a contextmanager's __exit__ method in another contextmanager
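
Composing contextmanagers can be sketched with `contextlib.ExitStack`. `ctxs_sketch` and `tracked` are hypothetical names used only for illustration; the real `ctxs` may have a different signature.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def ctxs_sketch(*managers):
    """Enter several context managers in order; exit them in reverse order."""
    with ExitStack() as stack:
        yield tuple(stack.enter_context(manager) for manager in managers)


events = []


@contextmanager
def tracked(name):
    """Toy context manager that records when it is entered and exited."""
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        events.append(f"exit {name}")


with ctxs_sketch(tracked("a"), tracked("b")) as (a, b):
    events.append(f"body {a}{b}")

print(events)  # ['enter a', 'enter b', 'body ab', 'exit b', 'exit a']
```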

utz.cli: click helpers

utz.cli provides wrappers around click.option for parsing common option formats:

  • @count: "count" options, including optional value mappings (e.g. -v → "info", -vv → "debug")
  • @multi: parse comma-delimited values (or other delimiter), with optional value-parse callback (e.g. -a1,2 -a3 → (1,2,3))
  • @num: parse numeric values, including human-readable SI/IEC suffixes (e.g. 10k → 10_000)
  • @obj: parse dictionaries from multi-value options (e.g. -eFOO=BAR -eBAZ=QUX → dict(FOO="BAR", BAZ="QUX"))
  • @incs/@excs: construct an Includes or Excludes object for regex-filtering of string arguments
  • @inc_exc: combination of @incs and @excs; constructs an Includes or Excludes for regex-filtering of strings, from two (mutually-exclusive) options
  • @opt, @arg, @flag: wrappers for click.{option,argument}, option(is_flag=True)
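
To illustrate the kind of parsing @num performs, here is a small standalone sketch of SI/IEC suffix handling. `parse_num_sketch` and its suffix tables are assumptions of mine (including treating suffixes case-insensitively); the actual option accepts whatever utz implements.

```python
import re

# Hypothetical suffix tables: SI suffixes are powers of 1000, IEC suffixes powers of 1024.
SI = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
IEC = {"ki": 2**10, "mi": 2**20, "gi": 2**30, "ti": 2**40}


def parse_num_sketch(text):
    """Parse strings like "10k" or "1Gi" into integers."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]*)", text.strip())
    if not match:
        raise ValueError(f"Unrecognized number: {text!r}")
    number, suffix = match.group(1), match.group(2).lower()
    if not suffix:
        scale = 1
    elif suffix in IEC:
        scale = IEC[suffix]
    elif suffix in SI:
        scale = SI[suffix]
    else:
        raise ValueError(f"Unrecognized suffix: {suffix!r}")
    return int(float(number) * scale)


print(parse_num_sketch("10k"))  # 10000
print(parse_num_sketch("1Gi"))  # 1073741824
```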

Examples:

# cli.py
from utz.cli import cmd, count, incs, multi, num, obj
from utz import Includes, Literal

@cmd  # Alias for `click.command`
@multi('-a', '--arr', parse=int, help="Comma-separated integers")
@obj('-e', '--env', help='Env vars, in the form `k=v`')
@incs('-i', '--include', 'includes', help="Only print `env` keys that match one of these regexs")
@num('-m', '--max-memory', help='Max memory size (e.g. "100m")')
@count('-v', '--verbosity', values=['warn', 'info', 'debug'], help='0x: "warn", 1x: "info", 2x: "debug"')
def main(
    arr: tuple[int, ...],
    env: dict[str, str],
    includes: Includes,
    max_memory: int,
    verbosity: Literal['warn', 'info', 'debug'],
):
    filtered_env = { k: v for k, v in env.items() if includes(k) }
    print(f"{arr} {filtered_env} {max_memory} {verbosity}")

if __name__ == '__main__':
    main()

Saving the above as cli.py and running will yield:

python cli.py -a1,2 -a3 -eAAA=111 -eBBB=222 -eccc=333 -i[A-Z] -m10k
# (1, 2, 3) {'AAA': '111', 'BBB': '222'} 10000 warn
python cli.py -m 1Gi -v
# () {} 1073741824 info

from utz.cli import arg, cmd, inc_exc, multi
from utz.rgx import Patterns

@cmd
@inc_exc(
    multi('-i', '--include', help="Print arguments iff they match at least one of these regexs; comma-delimited, and can be passed multiple times"),
    multi('-x', '--exclude', help="Print arguments iff they don't match any of these regexs; comma-delimited, and can be passed multiple times"),
)
@arg('vals', nargs=-1)
def main(patterns: Patterns, vals: tuple[str, ...]):
    print(' '.join([ val for val in vals if patterns(val) ]))

if __name__ == '__main__':
    main()

Saving the above as cli.py and running will yield:

python cli.py -i a.,b aa bc cb c a AA B
# aa bc cb
python cli.py -x a.,b aa bc cb c a AA B
# c a AA B

See test_cli for more examples.
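
The include/exclude filtering in the last example boils down to "does any regex match?". A standalone sketch, assuming patterns are applied with `re.search` (`make_filter` is a hypothetical name, not part of utz):

```python
import re


def make_filter(patterns, exclude=False):
    """Build a predicate that keeps values matching (or not matching) any pattern."""
    regexes = [re.compile(pattern) for pattern in patterns]

    def keep(value):
        matched = any(regex.search(value) for regex in regexes)
        return not matched if exclude else matched

    return keep


vals = ["aa", "bc", "cb", "c", "a", "AA", "B"]
included = [v for v in vals if make_filter(["a.", "b"])(v)]
excluded = [v for v in vals if make_filter(["a.", "b"], exclude=True)(v)]
print(included)  # ['aa', 'bc', 'cb']
print(excluded)  # ['c', 'a', 'AA', 'B']
```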

utz.mem: memray wrapper

Use memray to profile memory allocations, extract stats, flamegraph HTML, and peak memory use:

from utz.mem import Tracker
from utz import iec
with (tracker := Tracker()):
    nums = list(sorted(range(1_000_000, 0, -1)))

peak_mem = tracker.peak_mem
print(f'Peak memory use: {peak_mem:,} ({iec(peak_mem)})')
# Peak memory use: 48,530,432 (46.3 MiB)

utz.time: Time timer, now/today helpers

Time: minimal timer class

from utz import Time, sleep

time = Time()
time("step 1")
sleep(1)
time("step 2")  # Ends "step 1" timer
sleep(1)
time()  # Ends "step 2" timer
print(f'Step 1 took {time["step 1"]:.1f}s, step 2 took {time["step 2"]:.1f}s.')
# Step 1 took 1.0s, step 2 took 1.0s.

# contextmanager timers can overlap/contain others
with time("run"):    # ≈2s
    time("sleep-1")  # ≈1s
    sleep(1)
    time("sleep-2")  # ≈1s
    sleep(1)

print(f'Run took {time["run"]:.1f}s')
# Run took 2.0s
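
The sequential "start a step, which ends the previous one" pattern can be sketched with `time.perf_counter`. `TimeSketch` is a hypothetical stand-in that covers only the call-style usage, not the contextmanager form.

```python
from time import perf_counter, sleep


class TimeSketch:
    """Minimal step timer: calling it ends the current step and optionally starts a new one."""

    def __init__(self):
        self.durations = {}
        self._current = None  # (name, start time) of the running step, if any

    def __call__(self, name=None):
        now = perf_counter()
        if self._current is not None:
            previous, start = self._current
            self.durations[previous] = now - start
        self._current = (name, now) if name is not None else None

    def __getitem__(self, name):
        return self.durations[name]


timer = TimeSketch()
timer("step 1")
sleep(0.01)
timer("step 2")  # Ends "step 1"
sleep(0.01)
timer()          # Ends "step 2"
print(f'step 1: {timer["step 1"]:.3f}s, step 2: {timer["step 2"]:.3f}s')
```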

now, today

now and today are wrappers around datetime.datetime.now that expose convenient functions:

from utz import now, today
now()     # 2024-10-11T15:43:54Z
today()   # 2024-10-11
now().s   # 1728661583
now().ms  # 1728661585952

Use in conjunction with utz.bases codecs for easy timestamp-nonces:

from utz import b62, now
b62(now().s)   # A18Q1l
b62(now().ms)  # dZ3fYdS
b62(now().us)  # G31Cn073v

Sample values for various units and codecs:

unit  b62        b64        b90
s     A2kw7P     +aYIh1     :Kn>H
ds    R7FCrj     D8oM9b     "tn_BH
cs    CCp7kK0    /UpIuxG    =Fc#jK
ms    dj4u83i    MFSOKhy    #8;HF8g
us    G6cozJjWb  385u0dp8B  D>$y/9Hr

(generated by time-slug-grid.py)
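
For reference, base-N encoding of an integer is just repeated `divmod`. The sketch below assumes a `0-9A-Za-z` alphabet, which is my own choice; utz.bases may order its alphabet differently, so outputs will not necessarily match the table above.

```python
import string

# Assumed alphabet (hypothetical): digits, then uppercase, then lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def encode_base(n, alphabet=ALPHABET):
    """Encode a non-negative integer using the given alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))  # most-significant digit first


def decode_base(text, alphabet=ALPHABET):
    """Inverse of `encode_base`."""
    base = len(alphabet)
    n = 0
    for char in text:
        n = n * base + alphabet.index(char)
    return n


print(encode_base(62))  # 10
```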

utz.size: humanize.naturalsize wrapper

iec wraps humanize.naturalsize, printing IEC-formatted sizes by default, to 3 sigfigs:

from utz import iec
iec(2**30 + 2**29 + 2**28 + 2**27)
# '1.88 GiB'
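
A standalone approximation of IEC formatting to 3 significant figures, without humanize. `iec_sketch` is a hypothetical name, and edge cases (e.g. values just under a unit boundary) may format differently from `iec`.

```python
import math


def iec_sketch(n, sigfigs=3):
    """Format a byte count with IEC units (powers of 1024) to `sigfigs` significant figures."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n)
    index = 0
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{int(n)} B"
    # Decimal places needed so that the total number of significant figures is `sigfigs`.
    decimals = max(0, sigfigs - 1 - math.floor(math.log10(value)))
    return f"{value:.{decimals}f} {units[index]}"


print(iec_sketch(2**30 + 2**29 + 2**28 + 2**27))  # 1.88 GiB
print(iec_sketch(48_530_432))                     # 46.3 MiB
```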

utz.hash_file: hash file contents

from utz import hash_file
hash_file("path/to/file")  # sha256 by default
hash_file("path/to/file", 'md5')
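
Under the hood, hashing a file is typically done in chunks with `hashlib`, so large files do not need to fit in memory. A minimal sketch (`hash_file_sketch` and its `chunk_size` parameter are hypothetical, not utz's signature):

```python
import hashlib
import os
import tempfile


def hash_file_sketch(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file's contents, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as f:
        f.write(b"hello\n")
    demo_sha256 = hash_file_sketch(demo_path)
    demo_md5 = hash_file_sketch(demo_path, "md5")
```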

utz.ym: YM (year/month) class

The YM class represents a year/month, e.g. 202401 for January 2024.

from utz import YM
ym = YM(202501)  # Jan '25
assert ym + 1 == YM(202502)  # Add one month
assert YM(202502) - YM(202406) == 8  # subtract months
YM(202401).until(YM(202501))  # 202401, 202402, ..., 202412

# `YM` constructor accepts several representations:
assert all(ym == YM(202401) for ym in [
    YM(202401),
    YM('202401'),
    YM('2024-01'),
    YM(2024, 1),
    YM(y=2024, m=1),
    YM(dict(year=2024, month=1)),
    YM(YM(202401)),
])
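
The month arithmetic is easy to reason about if each year/month is mapped to a single "months since year 0" index. The helpers below are a hypothetical sketch of that idea on plain `YYYYMM` integers, not the `YM` class itself.

```python
def to_index(ym):
    """Convert a YYYYMM integer to a count of months since year 0."""
    year, month = divmod(ym, 100)
    return year * 12 + (month - 1)


def from_index(index):
    """Convert a month count back to a YYYYMM integer."""
    year, month0 = divmod(index, 12)
    return year * 100 + month0 + 1


def add_months(ym, n):
    return from_index(to_index(ym) + n)


def months_between(end, start):
    return to_index(end) - to_index(start)


def month_range(start, end):
    """All months from `start` (inclusive) to `end` (exclusive)."""
    return [from_index(i) for i in range(to_index(start), to_index(end))]


print(add_months(202412, 1))           # 202501
print(months_between(202502, 202406))  # 8
```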

utz.cd: "change directory" contextmanagers

from utz import cd, cd_tmpdir

with cd('..'):
    # Inside parent dir
    ...
# Back in original dir

with cd('a/b/c', mk=True):
    # Moved into a/b/c (created it if it didn't exist)
    ...

with cd_tmpdir(dir='.', name='my_tmpdir') as tmpdir:
    # Inside a temporary subdirectory of previous working directory, with basename `my_tmpdir`
    ...

See also: test_cd.py.
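
A "change directory" contextmanager is a thin wrapper around `os.chdir` with a `try`/`finally`. This sketch (hypothetical name `cd_sketch`) mirrors the `mk=True` behavior shown above, but is not utz's implementation.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cd_sketch(path, mk=False):
    """Change into `path` (optionally creating it), then return to the previous directory."""
    if mk:
        os.makedirs(path, exist_ok=True)
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with cd_sketch(os.path.join(tmp, "a", "b", "c"), mk=True):
        inside = os.getcwd()  # somewhere under `tmp`, ending in a/b/c
after = os.getcwd()  # back where we started
```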

utz.gzip: deterministic GZip helpers

from utz import deterministic_gzip_open, hash_file
with deterministic_gzip_open('a.gz', 'w') as f:
    f.write('\n'.join(map(str, range(10))))
hash_file('a.gz')  # dfbe03625c539cbc2a2331d806cc48652dd3e1f52fe187ac2f3420dbfb320504

See also: test_gzip.py.
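
GZip output normally embeds a modification time (and sometimes a filename) in its header, which is why naive gzipping is not reproducible. A standard-library sketch of the usual fix, pinning `mtime=0` and an empty filename (`deterministic_gzip_bytes` is a hypothetical helper, and utz may take a different approach):

```python
import gzip
import io


def deterministic_gzip_bytes(data: bytes) -> bytes:
    """Gzip `data` so that identical input yields identical output bytes."""
    buffer = io.BytesIO()
    # mtime=0 and an empty filename keep the timestamp and file name out of the gzip header.
    with gzip.GzipFile(filename="", fileobj=buffer, mode="wb", mtime=0) as gz:
        gz.write(data)
    return buffer.getvalue()


payload = "\n".join(map(str, range(10))).encode()
first = deterministic_gzip_bytes(payload)
second = deterministic_gzip_bytes(payload)
```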

utz.s3: S3 utilities

  • client(): cached boto3 S3 client
  • parse_bkt_key(args: tuple[str, ...]) -> tuple[str, str]: parse bucket and key from s3:// URL or separate arguments
  • get_etag(*args: str, err_ok: bool = False, strip: bool = True) -> str | None: get ETag of S3 object
  • get_etags(*args: str) -> dict[str, str]: get ETags for all objects with the given prefix
  • atomic_edit(...) -> Iterator[str]: context manager for atomically editing S3 objects

from utz import s3, pd

url = 's3://bkt/key.parquet'
# `url`'s ETag is snapshotted on initial read
with s3.atomic_edit(url) as out_path:
    df = pd.read_parquet(url)
    df.sort_index(inplace=True)
    df.to_parquet(out_path)
    # On contextmanager exit, `out_path` is uploaded to `url`, iff
    # `url`'s ETag hasn't changed (no concurrent update has occurred).

utz.plot: Plotly helpers

Helpers for Plotly transformations I make frequently, e.g.:

from utz import plot
import plotly.express as px
fig = px.bar(x=[1, 2, 3], y=[4, 5, 6])
plot(
    fig,
    name='my-plot',  # Filename stem; saves my-plot.png, my-plot.json, and (optionally) my-plot.html
    title=['Some Title', 'Some subtitle'],  # Plot title, followed by "subtitle" line(s) (smaller font, just below)
    bg='white', xgrid='#ccc',  # white background, grey x-gridlines
    hoverx=True,  # show x-values on hover
    x="X-axis title",  # x-axis title or configs
    y=dict(title="Y-axis title", zerolines=True),  # y-axis title or configs
    # ...
)

Example usages: hudcostreets/nj-crashes, ryan-williams/arrayloader-benchmarks.

utz.setup: setup.py helper

utz/setup.py provides defaults for various setuptools.setup() params:

  • name: use parent directory name
  • version: parse from git tag (otherwise from git describe --tags)
  • install_requires: read requirements.txt
  • author_{name,email}: infer from last commit
  • long_description: parse README.md (and set long_description_content_type)
  • description: parse first <p> under opening <h1> from README.md
  • license: parse from LICENSE file (MIT and Apache v2 supported)

For an example, see gsmo==0.0.1 (and corresponding release).

This library also "self-hosts" using utz.setup; see pyproject.toml:

[build-system]
requires = ["setuptools", "utz[setup]==0.4.2", "wheel"]
build-backend = "setuptools.build_meta"

and setup.py:

from utz.setup import setup

extras_require = {
    # …
}

# Various fields auto-populated from git, README.md, requirements.txt, …
setup(
    name="utz",
    version="0.8.0",
    extras_require=extras_require,
    url="https://github.com/runsascoded/utz",
    python_requires=">=3.10",
)

The setup helper can be installed via a pip "extra":

pip install utz[setup]

utz.test: dataclass test cases, raises helper

utz.parametrize: pytest.mark.parametrize wrapper, accepts dataclass instances

from utz import parametrize
from dataclasses import dataclass


def fn(f: float, fmt: str) -> str:
    """Example function, to be tested with ``case``s below."""
    return f"{f:{fmt}}"


@dataclass
class case:
    """Container for a test-case; float, format, and expected output."""
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"


@parametrize(
    case(1.23, "0.1f", "1.2"),
    case(123.456, "0.1e", "1.2e+02"),
    case(-123.456, ".0f", "-123"),
)
def test_fn(f, fmt, expected):
    """Example test, "parametrized" by several ``case``s."""
    assert fn(f, fmt) == expected

test_parametrize.py contains more examples, customizing test "ID"s, adding parameter sweeps, etc.
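
Conceptually, a wrapper like this has to turn dataclass instances into the argument names, value tuples, and IDs that `pytest.mark.parametrize(argnames, argvalues, ids=...)` expects. Here is a pytest-free sketch of that conversion; `to_params` is a hypothetical helper, not part of utz.

```python
from dataclasses import dataclass, fields


@dataclass
class Case:
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"


def to_params(cases):
    """Derive (argnames, argvalues, ids) from a list of same-typed dataclass instances."""
    names = [field.name for field in fields(cases[0])]
    values = [tuple(getattr(case, name) for name in names) for case in cases]
    ids = [case.id for case in cases]
    return names, values, ids


names, values, ids = to_params([
    Case(1.23, "0.1f", "1.2"),
    Case(-123.456, ".0f", "-123"),
])
print(names)  # ['f', 'fmt', 'expected']
print(ids)    # ['fmt-1.23-0.1f', 'fmt--123.456-.0f']
```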

utz.raises: pytest.raises wrapper, match a regex or multiple strings

utz.tmpdir

from utz import TmpDir, tmp_ensure_dir, TmpPath

# ``TemporaryDirectory`` wrapper that creates ``dir`` (and parents), if necessary (and removes any dirs it created, on exit).
# Also adds support for specifying exact basename, via ``name`` kwarg.
with TmpDir(dir='nested/subdir', name='basename') as tmpdir:
    ...

# Yields a path with the requested basename, inside a ``TemporaryDirectory``.
# As with ``TmpDir``, ``dir`` (and parents) will be created, if necessary (and removed on exit, leaving the filesystem in the same state it started in)
with TmpPath('basename.txt', dir='nested/subdir') as tmppath:
    ...

# Multiple right-most path components can be specified exactly.
with TmpPath('dir1/dir2/basename.txt', dir='nested/subdir') as tmppath:
    ...

# Used by ``TmpDir``/``TmpPath`` above, creates ``dir`` (and parents), if necessary (and removes any dirs it created, on exit)
with tmp_ensure_dir(dir='nested/subdir'):
    ...

See also: test_tmpdir.py.

utz.docker, utz.bases, etc.

Misc other modules:

  • bases: encode/decode in various bases (62, 64, 90, …)
  • escape: split/join on an arbitrary delimiter, with backslash-escaping; utz.esc escapes a specific character in a string.
  • ctxs: compose contextmanagers
  • o: dict wrapper exposing keys as attrs (e.g.: o({'a':1}).a == 1)
  • docker: DSL for programmatically creating Dockerfiles (and building images from them)
  • tmpdir: make temporary directories with a specific basename
  • ssh: SSH tunnel wrapped in a context manager
  • backoff: exponential-backoff utility
  • git: Git helpers, wrappers around GitPython
  • pnds: pandas imports and helpers

Examples / Users

Some repos that use utz:
