# pelutils

Various utility functions that are often useful for Python projects. Features include:
- Feature-rich logger using `Rich` for colourful printing
- Parsing for combining config files and command-line arguments - especially useful for parametric methods
- Time taking and profiling
- Easy-to-use data storage class for saving and loading data
- Table formatting
- Miscellaneous standalone functions providing various functionalities - see `pelutils/__init__.py`
- Data-science submodule with extra utilities for statistics, plotting, and machine learning using `PyTorch`, including a `unique` function similar to `np.unique` but running in linear time (currently Linux x86_64 only)
`pelutils` supports Python 3.7+.
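The idea behind a linear-time `unique` can be sketched in pure Python (an illustration of the concept only, not the actual array-based implementation):

```python
# Hypothetical illustration of the linear-time unique idea (not the actual
# array-based implementation): dicts preserve insertion order, so one O(n)
# pass deduplicates while keeping first-seen order, with no O(n log n) sort.
def unique_linear(values):
    """Return unique elements in first-seen order in linear time."""
    return list(dict.fromkeys(values))

print(unique_linear([3, 1, 3, 2, 1]))  # [3, 1, 2]
```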
## Logging

An easy-to-use logger which fits common needs.
```py
import multiprocessing as mp

from tqdm import tqdm

from pelutils import log, Levels, collect_logs

# Configure logger for the script
log.configure("path/to/save/log.log", "Title of log")

# Start logging
for i in range(70):  # Nice
    log("Execution %i" % i)

# Sections
log.section("New section in the logfile")

# Adjust logging levels
log.warning("Will be logged")
with log.level(Levels.ERROR):  # Only log at ERROR level or above
    log.warning("Will not be logged")

# Error handling
# The zero-division error and stacktrace are logged
with log.log_errors:
    0 / 0
# The entire chained stacktrace is logged
with log.log_errors:
    try:
        0 / 0
    except ZeroDivisionError as e:
        raise ValueError("Denominator must be non-zero") from e

# Disable printing when using tqdm so as to not print a million progress bars
# Do not do this if the loop may be ended by a break statement!
for elem in log.tqdm(tqdm(range(5))):
    log(elem)  # Will be logged, but not printed

# User input
inp = log.input("WHAT... is your favourite colour? ")

# Log all logs from a function at the same time
# This is especially useful when using multiple threads so logging does not get mixed up
def fun(arg):
    log("Hello there")
    log("General Kenobi!")
with mp.Pool() as p:
    p.map(collect_logs(fun), args)
```
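Under the hood, `log.log_errors`-style error handling boils down to a context manager that logs the stacktrace and re-raises. A minimal sketch using the standard `logging` module (a hypothetical simplification, not pelutils' implementation):

```python
import logging
from contextlib import contextmanager

logging.basicConfig()
logger = logging.getLogger(__name__)

# Hypothetical simplification: log the full stacktrace of any exception,
# then re-raise it unchanged so normal error handling still applies
@contextmanager
def log_errors():
    try:
        yield
    except Exception:
        logger.exception("An error occurred")
        raise

try:
    with log_errors():
        1 / 0  # The ZeroDivisionError is logged, then propagates
except ZeroDivisionError:
    pass
```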
## Time Taking and Profiling

A simple time taker inspired by Matlab's Tic and Toc, which also has profiling tooling.
```py
import multiprocessing as mp

from pelutils import TT

TT.tick()
<some task>
seconds_used = TT.tock()

for i in range(100):
    TT.profile("Repeated code")
    <some task>
    TT.profile("Subtask")
    <some subtask>
    TT.end_profile()
    TT.end_profile()
print(TT)  # Prints a table view of profiled code sections

# Alternative syntax using with statement
with TT.profile("The best task"):
    <some task>

# Profile a loop
# Do not do this if the loop may be ended by a break statement!
for elem in TT.profile_iter(range(100), "The second best task"):
    <some task>

# When using multiprocessing, it can be useful to simulate multiple hits of the same profile
with mp.Pool() as p, TT.profile("Processing 100 items on multiple threads", hits=100):
    p.map(100 items)
```
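The tick/tock pattern itself is simple start-time bookkeeping around `time.perf_counter`. A minimal sketch of the concept (hypothetical `TimeTaker`, not the actual `TickTock` class):

```python
import time

# Minimal sketch of the tick/tock pattern (not the actual TickTock class):
# tick stores a start time, tock returns the seconds elapsed since it
class TimeTaker:
    def __init__(self):
        self._start = 0.0

    def tick(self):
        self._start = time.perf_counter()

    def tock(self) -> float:
        return time.perf_counter() - self._start

tt = TimeTaker()
tt.tick()
sum(range(100_000))  # Some task
print("Seconds used:", tt.tock())
```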
## Data Storage

A data class that saves/loads its fields from disk. Anything that can be saved to a `json` file will be; other data types will be saved to relevant file formats.
```py
from dataclasses import dataclass

import numpy as np

from pelutils import DataStorage

@dataclass
class Person(DataStorage):
    name: str
    age: int
    numbers: np.ndarray
    subfolder = "older"  # Save in this subfolder within the folder given to .save and .load. Don't set for no subfolder
    json_name = "yoda.json"

yoda = Person(name="Yoda", age=900, numbers=np.array([69, 420]))
yoda.save("old")  # Save to 'old' folder
# Data is saved at old/older/yoda.json
# {
#     "name": "Yoda",
#     "age": 900
# }
# There will also be a file named numbers.npy
yoda = Person.load("old")
```
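The save/load mechanism can be illustrated with a simplified pure-Python sketch (hypothetical `SimplePerson`, not the pelutils implementation; it handles only JSON-serializable fields and omits the separate `.npy` handling):

```python
import json
import os
from dataclasses import asdict, dataclass
from tempfile import TemporaryDirectory

# Hypothetical simplification: every field is assumed JSON-serializable and
# goes into a single .json file; loading reconstructs the dataclass from it
@dataclass
class SimplePerson:
    name: str
    age: int

    def save(self, folder: str):
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, "person.json"), "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, folder: str) -> "SimplePerson":
        with open(os.path.join(folder, "person.json")) as f:
            return cls(**json.load(f))

with TemporaryDirectory() as tmp:
    SimplePerson(name="Yoda", age=900).save(tmp)
    print(SimplePerson.load(tmp))  # SimplePerson(name='Yoda', age=900)
```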
## Parsing

A combination of parsing CLI and config file arguments, which allows for a powerful, easy-to-use workflow. Useful for parametric methods such as machine learning.

A file `main.py` could contain:
```py
from pelutils import Parser

options = {
    "learning-rate": { "default": 1.5e-3, "help": "Controls size of parameter update", "type": float },
    "gamma": { "default": 1, "help": "Use of generator network in updating", "type": float },
    "initialize-zeros": { "help": "Whether to initialize all parameters to 0", "action": "store_true" },
}

parser = Parser(options)
location = parser.location  # Experiments are stored here
experiments = parser.parse()
parser.document_settings()  # Save a config file to reproduce the experiment

# Run each experiment
for args in experiments:
    run_experiment(location, args)

# Alternatively, if there is only ever a single job
parser = Parser(options, multiple_jobs=False)
location = parser.location
args = parser.parse()
parser.document_settings()
run_experiment(location, args)

# Check if an argument has been given explicitly, either from the CLI or a config file, or if the default value is used
parser.is_explicit("learning-rate")
```
This could then be run with

```sh
python main.py data/my-big-experiment --learning-rate 1e-5
```

or with

```sh
python main.py data/my-big-experiment --config cfg.ini
```

or using a combination, where CLI arguments take precedence:

```sh
python main.py data/my-big-experiment --config cfg.ini --learning-rate 1e-5
```

where `cfg.ini` could contain:

```ini
[DEFAULT]
gamma = 0.95

[RUN1]
learning-rate = 1e-4
initialize-zeros

[RUN2]
learning-rate = 1e-5
gamma = 0.9
```
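The precedence rule - defaults, then config file, then CLI - can be sketched with the standard `configparser` module (a hypothetical illustration of the merging logic, not the pelutils `Parser` internals):

```python
import configparser

# Hypothetical illustration of the merging logic, not the Parser internals:
# defaults are overridden by the config file, which is overridden by the CLI
defaults = {"learning-rate": 1.5e-3, "gamma": 1.0}

cfg = configparser.ConfigParser()
cfg.read_string("""
[DEFAULT]
gamma = 0.95
[RUN1]
learning-rate = 1e-4
""")

cli_args = {"learning-rate": 1e-5}  # Given explicitly on the command line

args = dict(defaults)
args.update({key: float(value) for key, value in cfg["RUN1"].items()})
args.update(cli_args)
print(args)  # {'learning-rate': 1e-05, 'gamma': 0.95}
```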
## pelutils.ds

This submodule contains various utility functions for data science and machine learning. To make sure the necessary requirements are installed, install using

```sh
pip install pelutils[ds]
```

Note that in some terminals, you will instead have to write

```sh
pip install pelutils\[ds\]
```
### PyTorch

All PyTorch functions work independently of whether CUDA is available.
```py
from pelutils.ds import no_grad, BatchFeedForward

# Inference only: No gradients should be tracked in the following function
# Same as putting the entire function body inside with torch.no_grad()
@no_grad
def infer():
    <code that includes feedforwarding>

# Feed forward in batches to prevent using too much memory
# Every time a memory allocation error is encountered, the number of batches is doubled
# Same as using y = net(x), but without the risk of running out of memory
# Gradients are not tracked
bff = BatchFeedForward(net)
y = bff(x)
```
### Statistics

Includes various commonly used statistical functions.
```py
import numpy as np
import scipy.stats

from pelutils.ds.stats import z, corr_ci

# Get the one-sided z value for an exponential(lambda=2) distribution at a significance level of 1 %
zval = z(alpha=0.01, two_sided=False, distribution=scipy.stats.expon(scale=1/2))

# Get correlation, confidence interval, and p value for two vectors
a, b = np.random.randn(100), np.random.randn(100)
r, lower_r, upper_r, p = corr_ci(a, b, alpha=0.01)
```
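A confidence interval for a correlation coefficient is commonly built via the Fisher z-transform; a sketch of that standard construction follows (hypothetical `corr_with_ci` helper - not necessarily how `corr_ci` computes it):

```python
import math
from statistics import NormalDist

# Hypothetical helper sketching the standard Fisher z-transform construction
# (not necessarily how corr_ci computes it): atanh(r) is approximately normal
# with standard error 1/sqrt(n - 3), which yields a confidence interval for r
def corr_with_ci(a: list, b: list, alpha: float = 0.01):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    z_r = math.atanh(r)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = 1 / math.sqrt(n - 3)
    return r, math.tanh(z_r - z_crit * se), math.tanh(z_r + z_crit * se)

a = [1, 2, 3, 4, 5, 6]
b = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
r, lower_r, upper_r = corr_with_ci(a, b)
print(lower_r < r < upper_r)  # True
```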
### Matplotlib

Contains predefined rc params, colours, and figure sizes.
```py
import matplotlib.pyplot as plt

# update_rc_params, rc_params, figsize_wide, and colours are provided by the ds submodule

# Set wide figure size
plt.figure(figsize=figsize_wide)

# Use larger font for larger figures - works well with the predefined figure sizes
update_rc_params(rc_params)

# 15 different, unique colours
c = iter(colours)
for i in range(15):
    plt.plot(x[i], y[i], color=next(c))
```
## History

### 0.6.9 - Nice
- Made `load_jsonl` load the file lazily

### 0.6.7
- Logger can now be used without writing to file

### 0.6.6
- Fix parser naming when using config files and not `multiple_jobs`
- Fix parser naming when using CLI only and `multiple_jobs`

### 0.6.5 - Breaking changes
- `Parser.parse` now returns only a single experiment dict if `multiple_jobs` is False
- Improved logger error messages
- Added `Parser.is_explicit` to check if an argument was given explicitly, either from the CLI or a config file
- Fixed bug in the parser where, if a type was not given, values from config files would not be used
- Made fields that should not be used externally private in the parser
- Made `pelutils.ds.unique` slightly faster

### 0.6.4 - Breaking changes
- Commit is now logged as `DEBUG`
- Removed `BatchFeedForward.update_net`
- `BatchFeedForward` no longer requires batch size and increase factor as arguments
- Removed the `reset_cuda` function, as it was too small and obscure and broke distributed training
- Added `ignore_missing` field to `DataStorage` for ignoring missing fields in stored data

### 0.6.3 - Breaking changes
- Fixed bug where TickTock profiles would sometimes not be printed in the correct order
- Removed `TickTock.reset`
- Added `__len__` and `__iter__` methods to `TickTock`
- Added option to print standard deviation for profiles
- Renamed `TimeUnit` to `TimeUnits` to follow the `enum` naming scheme
- Time unit lengths are now given in units/s rather than s/unit

### 0.6.2
- `TickTock.__str__` now raises a `ValueError` if profiling is still ongoing, to prevent incorrect calculations
- Printing a `TickTock` instance now indents the percentage of time spent to indicate task subsets

### 0.6.1
- Added `subfolder` argument to `Parser.document_settings`

### 0.6.0 - Breaking changes
- A global instance of `TickTock`, `TT`, has been added - similar to `log`
- Added `TickTock.profile_iter` for performing profiling over a for loop
- Fixed wrong error being thrown when keyboard interrupting within `with TT.profile(...)`
- All collected logs are now logged upon an exception being thrown when using `log.log_errors` and `collect_logs`
- Made `log.log_errors` capable of handling chained exceptions
- Made `log.throw` private, as it had little use and could be exploited
- `get_repo` no longer throws an error if a repository has not been found
- Added utility functions for reading and writing `.jsonl` files
- Fixed incorrect `torch` installations breaking the import of `pelutils`
### 0.5.9
- Add `split_path` function which splits a path into components
- Fix bug in `MainTest` where test files were not deleted

### 0.5.7
- Logger prints to `stderr` instead of `stdout` at level WARNING or above
- Added `log.tqdm` that disables printing while looping over a `tqdm` object
- Fixed `from __future__ import annotations` breaking `DataStorage`

### 0.5.6
- `DataStorage` can save all picklable formats + `torch.Tensor` specifically

### 0.5.5
- Test logging now uses `Levels.DEBUG` by default
- Added `TickTock.fuse_multiple` for combining several `TickTock` instances
- Fixed bugs when using multiple `TickTock` instances
- Allow multiple hits in a single profile
- Now possible to profile using a `with` statement
- Added method to logger to parse boolean user input
- Added method to `Table` for adding vertical lines manually

### 0.5.4 - Breaking changes
- Changed log error colour
- Replaced default log level with print level that defaults to `Levels.INFO`
- `__call__` now always defaults to `Levels.INFO`
- Print microseconds as `us` instead of `mus`

### 0.5.3
- Fixed missing regex requirement

### 0.5.2
- Allowed disabling printing by default in logger

### 0.5.1
- Fixed accidental rich formatting in logger
- Fixed logger crashing when not configured

### 0.5.0 - Breaking changes
- Added an `np.unique`-style unique function to `ds` that runs in linear time but does not sort
- Replaced verbose/non-verbose logging with logging levels similar to the built-in `logging` module
- Added `with_print` option to `log.__call__`
- Undid change from 0.3.4 such that `None` is now logged again
- Added `format` module. Currently supports tables
- Updated stringification of profiles to include percentage of parent profile
- Added `throws` function that checks if a function throws an exception of a specific type
- Use `Rich` for printing to console when logging
### 0.4.1
- Added append mode to logger to append to old log files instead of overwriting

### 0.4.0
- Added `ds` submodule for data science and machine learning utilities. This includes `PyTorch` utility functions, statistics, and `matplotlib` default values

### 0.3.4
- Logger now raises errors normally instead of using the `throw` method

### 0.3.3
- `get_repo` now accepts a custom path to search for a repo, as opposed to always using the working dir

### 0.3.2
- `log.input` now also accepts iterables as input. For such inputs, it will return a generator of user inputs

### 0.3.1 - Breaking changes
- Added functionality to logger for logging repository commit
- Removed function `get_commit`
- Added function `get_repo`, which returns repository path and commit. It attempts to find a repository by searching from the working directory and upwards
- Updates to examples in `README` and other minor documentation changes
- `set_seeds` no longer returns the seed, as this is already given as input to the function
### 0.3.0 - Breaking changes
- Only works for Python 3.7+
- If the logger has not been configured, it now does no logging instead of crashing. This prevents dependencies that use the logger from crashing the program if it is not used
- `log.throw` now also logs the actual error rather than just the stack trace
- `log` now has the public property `is_verbose`
- Fixed `with log.log_errors` always throwing errors
- Added code samples to `README`
- `Parser` no longer automatically determines if experiments should be placed in subfolders. Instead, this is given explicitly as an argument to `__init__`. It also supports boolean flags in the config file

### 0.2.13
- Readd clean method to logger

### 0.2.12 - Breaking changes
- The logger is now solely a global variable. Different loggers are handled internally in the global `_Logger` instance

### 0.2.11
- Add catch property to logger to allow automatically logging errors with `with`
- All code is now indented using spaces

### 0.2.10
- Allow finer verbosity control in logger
- Allow multiple log commands to be collected and logged at the same time
- Add decorator for aforementioned feature
- Change `thousand_seps` from TickTock method to stand-alone function in `__init__`
- Verbose logging now has the same signature as normal logging

### 0.2.8
- Add code to execute code with specific environment variables

### 0.2.7
- Fix error where the full stacktrace was not printed by `log.throw`
- `set_seeds` now checks if torch is available. This means torch seeds are still set without needing it as a dependency

### 0.2.6 - Breaking changes
- Make Unverbose class private and update documentation
- Update formatting when using `.input`

### 0.2.5
- Add input method to logger

### 0.2.4
- Better logging of errors

### 0.2.1 - Breaking changes
- Removed torch as a dependency

### 0.2.0 - Breaking changes
- Logger is now a global variable, `log`. Logging should happen by importing the log variable and calling `.configure` to set it up. To reset the logger, `.clean` can be called
- It is still possible to just import `Logger` and use it in the traditional way, though `.configure` should be called first
- Changed timestamp function to give a cleaner output
- `get_commit` now returns `None` if `gitpython` is not installed

### 0.1.2
- Update documentation for logger and ticktock
- Fix bug where separator was not an argument to `Logger.__call__`

### 0.1.0
- Include `DataStorage`
- Logger can throw errors and handle separators
- TickTock includes time handling and units
- Minor parser path changes

### 0.0.1
- Logger, Parser, and TickTock added from previous projects