pelutils

Various utilities useful for Python projects. Features include

  • Feature-rich logger using Rich for colourful printing
  • Parsing for combining config files and command-line arguments - especially useful for parametric methods
  • Time taking and profiling
  • Easy-to-use data storage class for saving and loading data
  • Table formatting
  • Miscellaneous standalone functions providing various functionalities - see pelutils/__init__.py
  • Data-science submodule with extra utilities for statistics, plotting, and machine learning using PyTorch
  • unique function similar to np.unique but in linear time (currently Linux x86_64 only) - see the sketch below
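
As a minimal sketch of unique - assuming it is importable from pelutils.ds (where the changelog places it in 0.5.0), that it mirrors the basic np.unique call, and that "does not sort" means elements come out in order of first appearance:

import numpy as np
from pelutils.ds import unique  # Assumed import path

a = np.array([3, 1, 3, 2, 1])
u = unique(a)  # Unique elements in linear time, without sorting
print(u)       # Assumed output: [3 1 2]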

pelutils supports Python 3.7+.

Logging

Easy-to-use logger that fits common needs.

# Assumed imports for the examples below; log, Levels, and collect_logs
# are taken to be top-level exports
import multiprocessing as mp
from tqdm import tqdm
from pelutils import log, Levels, collect_logs

# Configure logger for the script
log.configure("path/to/save/log.log", "Title of log")

# Start logging
for i in range(70):  # Nice
    log("Execution %i" % i)

# Sections
log.section("New section in the logfile")

# Adjust logging levels
log.warning("Will be logged")
with log.level(Levels.ERROR):  # Only log at ERROR level or above
    log.warning("Will not be logged")

# Error handling
# The zero-division error and stacktrace are logged
with log.log_errors:
    0 / 0
# Entire chained stacktrace is logged
with log.log_errors:
    try:
        0 / 0
    except ZeroDivisionError as e:
        raise ValueError("Denominator must be non-zero") from e

# Disable printing while looping over a tqdm object so as to not print a million progress bars
# Do not do this if the loop may be ended by a break statement, as printing may then not be re-enabled!
for elem in log.tqdm(tqdm(range(5))):
    log(elem)  # Will be logged, but not printed

# User input
inp = log.input("WHAT... is your favourite colour? ")

# Log all logs from a function at the same time
# This is especially useful when using multiprocessing so logging does not get mixed up
def fun(x):
    log("Hello there")
    log("General Kenobi!")

with mp.Pool() as p:
    p.map(collect_logs(fun), range(4))

Time Taking and Profiling

Simple time taker inspired by MATLAB's tic/toc, with profiling tooling on top.

import multiprocessing as mp
from time import sleep

# TT is the global TickTock instance, assumed to be a top-level export like log
from pelutils import TT

TT.tick()
sleep(1)  # <some task>
seconds_used = TT.tock()

for i in range(100):
    TT.profile("Repeated code")
    sleep(0.01)  # <some task>
    TT.profile("Subtask")
    sleep(0.001)  # <some subtask>
    TT.end_profile()
    TT.end_profile()
print(TT)  # Prints a table view of profiled code sections

# Alternative syntax using a with statement
with TT.profile("The best task"):
    sleep(0.01)  # <some task>

# Profile a loop
# Do not do this if the loop may be ended by a break statement!
for elem in TT.profile_iter(range(100), "The second best task"):
    sleep(0.01)  # <some task>

# When using multiprocessing, it can be useful to simulate multiple hits of the same profile
with mp.Pool() as p, TT.profile("Processing 100 items on multiple threads", hits=100):
    p.map(process_item, items)  # process_item and the 100 items are assumed defined elsewhere

Data Storage

A data class that saves/loads its fields from disk. Fields that can be serialized to JSON are stored in a single JSON file; other data types are saved to relevant file formats alongside it.

from dataclasses import dataclass
import numpy as np
# DataStorage is assumed to be a top-level export
from pelutils import DataStorage

@dataclass
class Person(DataStorage):
    name: str
    age: int
    numbers: np.ndarray
    subfolder = "older"
    json_name = "yoda.json"

yoda = Person(name="Yoda", age=900, numbers=np.array([69, 420]))
yoda.save("old")
# Saved data at old/older/yoda.json
# {
#     "name": "Yoda",
#     "age": 900
# }
# There will also be a file named numbers.npy
yoda = Person.load("old")

Parsing

Combined parsing of command-line and config file arguments, which allows for a powerful, easy-to-use workflow. Useful for parametric methods such as machine learning.

A file main.py could contain:

# Parser is assumed to be a top-level export
from pelutils import Parser

options = {
    "learning-rate": { "default": 1.5e-3, "help": "Controls size of parameter update", "type": float },
    "gamma": { "default": 1, "help": "Use of generator network in updating", "type": float },
    "initialize-zeros": { "help": "Whether to initialize all parameters to 0", "action": "store_true" },
}
parser = Parser(options)
location = parser.location  # Experiments are stored here
experiments = parser.parse()
parser.document_settings()  # Save a config file to reproduce the experiment

This could then be run with python main.py data/my-big-experiment --learning-rate 1e-5 or with python main.py data/my-big-experiment --config cfg.ini, where cfg.ini could contain

[DEFAULT]
gamma = 0.95
[RUN1]
learning-rate = 1e-4
initialize-zeros
[RUN2]
learning-rate = 1e-5
gamma = 0.9
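
A minimal sketch of consuming the parsed experiments - assuming parser.parse() returns one settings mapping per config section, which is an inference from the example above rather than documented behaviour:

# Hypothetical continuation of main.py: run_experiment stands in for whatever
# training or evaluation code the parsed settings parameterize
for experiment in experiments:
    run_experiment(experiment)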

pelutils.ds

This submodule contains various utility functions for data science and machine learning. To make sure the necessary requirements are installed, install using

pip install pelutils[ds]

Note that in some terminals (such as zsh, where square brackets are globbing characters), you will instead have to write

pip install pelutils\[ds\]

PyTorch

All PyTorch functions work independently of whether CUDA is available or not.

# reset_cuda, no_grad, and BatchFeedForward are assumed to be exported from pelutils.ds
from pelutils.ds import reset_cuda, no_grad, BatchFeedForward

# Clear CUDA cache and synchronize
reset_cuda()

# Inference only: No gradients should be tracked in the following function
# Same as putting entire function body inside with torch.no_grad()
@no_grad
def infer(x):
    return net(x)  # <code that includes feedforwarding>

# Feed forward in batches to prevent using too much memory
# Every time a memory allocation error is encountered, the number of batches is doubled
# Same as using y = net(x), but without risk of running out of memory
bff = BatchFeedForward(net, len(x))
y = bff(x)
# Change to another network
bff.update_net(net2)

Statistics

Includes various commonly used statistical functions.

import numpy as np
import scipy.stats
# z and corr_ci are assumed importable from the ds statistics module
from pelutils.ds.stats import z, corr_ci

# Get one-sided z value for an exponential(lambda=2) distribution at a 1% significance level
# Note that scipy parameterizes the exponential by scale = 1/lambda
zval = z(alpha=0.01, two_sided=False, distribution=scipy.stats.expon(scale=1/2))

# Get correlation, confidence interval, and p value for two vectors
a, b = np.random.randn(100), np.random.randn(100)
r, lower_r, upper_r, p = corr_ci(a, b, alpha=0.01)

Matplotlib

Contains predefined rc params, colours, and figure sizes.

import matplotlib.pyplot as plt

# Assumed import location - these names are provided by the pelutils.ds plotting utilities
from pelutils.ds import figsize_wide, rc_params, update_rc_params, colours

# Set wide figure size
plt.figure(figsize=figsize_wide)

# Use larger font for larger figures - works well with predefined figure sizes
update_rc_params(rc_params)

# 15 different, unique colours
c = iter(colours)
for i in range(15):  # x and y are assumed to each hold 15 data series
    plt.plot(x[i], y[i], color=next(c))

History

0.6.3 - Breaking changes

  • Fixed bug where TickTock profiles would sometimes not be printed in the correct order
  • Removed TickTock.reset
  • Added __len__ and __iter__ methods to TickTock
  • Added option to print standard deviation for profiles
  • Renamed TimeUnit to TimeUnits to follow enum naming scheme
  • Time unit lengths are now given in units/s rather than s/unit

0.6.2

  • TickTock.__str__ now raises a ValueError if profiling is still ongoing to prevent incorrect calculations
  • Printing a TickTock instance now indents percentage of time spent to indicate task subsets

0.6.1

  • Added subfolder argument to Parser.document_settings

0.6.0 - Breaking changes

  • A global instance of TickTock, TT, has been added - similar to log
  • Added TickTock.profile_iter for performing profiling over a for loop
  • Fixed wrong error being thrown when keyboard interrupting within with TT.profile(...)
  • All collected logs are now logged upon an exception being thrown when using log.log_errors and collect_logs
  • Made log.log_errors capable of handling chained exceptions
  • Made log.throw private, as it had little use and could be exploited
  • get_repo no longer throws an error if a repository has not been found
  • Added utility functions for reading and writing .jsonl files
  • Fixed incorrect torch installations breaking importing pelutils

0.5.9

  • Add split_path function which splits a path into components
  • Fix bug in MainTest where test files were not deleted

0.5.7

  • Logger prints to stderr instead of stdout at level WARNING or above
  • Added log.tqdm that disables printing while looping over a tqdm object
  • Fixed from __future__ import annotations breaking DataStorage

0.5.6

  • DataStorage can save all picklable formats + torch.Tensor specifically

0.5.5

  • Test logging now uses Levels.DEBUG by default
  • Added TickTock.fuse_multiple for combining several TickTock instances
  • Fixed bugs when using multiple TickTock instances
  • Allow multiple hits in single profile
  • Now possible to profile using with statement
  • Added method to logger to parse boolean user input
  • Added method to Table for adding vertical lines manually

0.5.4 - Breaking changes

  • Change log error colour

  • Replace default log level with print level that defaults to Levels.INFO

    __call__ now always defaults to Levels.INFO

  • Print microseconds as us instead of mus

0.5.3

  • Fixed missing regex requirement

0.5.2

  • Allowed disabling printing by default in logger

0.5.1

  • Fixed accidental rich formatting in logger
  • Fixed logger crashing when not configured

0.5.0 - Breaking changes

  • Added np.unique-style unique function to ds that runs in linear time but does not sort
  • Replaced verbose/non-verbose logging with logging levels similar to built-in logging module
  • Added with_print option to log.__call__
  • Undid change from 0.3.4 such that None is now logged again
  • Added format module. Currently supports tables
  • Updated stringification of profiles to include percentage of parent profile
  • Added throws function that checks whether a function throws an exception of a specific type
  • Use Rich for printing to console when logging

0.4.1

  • Added append mode to logger to append to old log files instead of overwriting

0.4.0

  • Added ds submodule for data science and machine learning utilities

    This includes PyTorch utility functions, statistics, and matplotlib default values

0.3.4

  • Logger now raises errors normally instead of using throw method

0.3.3

  • get_repo now accepts a custom path to search for a repository, as opposed to always using the working directory

0.3.2

  • log.input now also accepts iterables as input

    For such inputs, it will return a generator of user inputs

0.3.1 - Breaking changes

  • Added functionality to logger for logging repository commit

  • Removed function get_commit

  • Added function get_repo which returns repository path and commit

    It attempts to find a repository by searching from working directory and upwards

  • Updates to examples in README and other minor documentation changes

  • set_seeds no longer returns seed, as this is already given as input to the function

0.3.0 - Breaking changes

  • Only works for Python 3.7+

  • If logger has not been configured, it now does no logging instead of crashing

    This prevents dependencies that use the logger from crashing the program if it is not used

  • log.throw now also logs the actual error rather than just the stack trace

  • log now has public property is_verbose

  • Fixed with log.log_errors always throwing errors

  • Added code samples to README

  • Parser no longer automatically determines if experiments should be placed in subfolders

    Instead, this is given explicitly as an argument to __init__

    It also supports boolean flags in the config file

0.2.13

  • Re-add clean method to logger

0.2.12 - Breaking changes

  • The logger is now solely a global variable

    Different loggers are handled internally in the global _Logger instance

0.2.11

  • Add catch property to logger to allow automatically logging errors using a with statement
  • All code is now indented using spaces

0.2.10

  • Allow finer verbosity control in logger
  • Allow multiple log commands to be collected and logged at the same time
  • Add decorator for aforementioned feature
  • Change thousand_seps from TickTock method to stand-alone function in __init__
  • Verbose logging now has same signature as normal logging

0.2.8

  • Added support for executing code with specific environment variables

0.2.7

  • Fix error where the full stacktrace was not printed by log.throw

  • set_seeds now checks if torch is available

    This means torch seeds are still set without needing it as a dependency

0.2.6 - Breaking changes

  • Make Unverbose class private and update documentation
  • Update formatting when using .input

0.2.5

  • Add input method to logger

0.2.4

  • Better logging of errors

0.2.1 - Breaking changes

  • Removed torch as dependency

0.2.0 - Breaking changes

  • Logger is now a global variable, log

    Logging should happen by importing the log variable and calling .configure to set it up

    To reset the logger, .clean can be called

  • It is still possible to just import Logger and use it in the traditional way, though .configure should be called first

  • Changed timestamp function to give a cleaner output

  • get_commit now returns None if gitpython is not installed

0.1.2

  • Update documentation for logger and ticktock
  • Fix bug where separator was not an argument to Logger.__call__

0.1.0

  • Include DataStorage
  • Logger can throw errors and handle separators
  • TickTock includes time handling and units
  • Minor parser path changes

0.0.1

  • Logger, Parser, and TickTock added from previous projects
