# pelutils
Various utilities useful for Python projects. Features include

- Feature-rich logger using Rich for colourful printing
- Parsing for combining config files and command-line arguments - especially useful for parametric methods
- Time taking and profiling
- Easy-to-use data storage class for easy data saving and loading
- Table formatting
- Miscellaneous standalone functions providing various functionalities - see `pelutils/__init__.py`
- Data-science submodule with extra utilities for statistics, plotting, and machine learning using PyTorch, including a `unique` function similar to `np.unique` but in linear time (currently Linux x86_64 only)

`pelutils` supports Python 3.7+.
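The linear-time `unique` mentioned above is shipped as a native extension, but its behaviour can be illustrated in pure Python: unlike `np.unique`, which sorts in O(n log n), a hash-based pass preserves order of first appearance in O(n). A minimal sketch (`unique_linear` is an illustrative name, not the pelutils API):

```python
import numpy as np

def unique_linear(a: np.ndarray) -> np.ndarray:
    """Return unique elements in order of first appearance, in O(n).

    np.unique would instead sort the result in O(n log n).
    """
    seen = dict.fromkeys(a.tolist())  # dicts preserve insertion order
    return np.array(list(seen))

print(unique_linear(np.array([3, 1, 3, 2, 1])))  # [3 1 2]
```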
## Logging

Easy-to-use logger which fits common needs.

```py
# Configure logger for the script
log.configure("path/to/save/log.log", "Title of log")

# Start logging
for i in range(70):  # Nice
    log("Execution %i" % i)

# Sections
log.section("New section in the logfile")

# Verbose logging for less important things
log.verbose("Will be logged")
with log.unverbose:
    log.verbose("Will not be logged")

# Error handling
# This explicitly logs a ValueError and then raises it
log.throw(ValueError("Your value is bad, and you should feel bad"))

# The zero-division error is logged
with log.log_errors:
    0 / 0

# User input
inp = log.input("WHAT... is your favourite colour? ")

# Log all logs from a function at the same time
# This is especially useful when using multiple threads so logging does not get mixed up
def fun():
    log("Hello there")
    log("General Kenobi!")
with mp.Pool() as p:
    p.map(collect_logs(fun), args)

# Disable printing when using tqdm so as to not print a million progress bars
for i in log.tqdm(tqdm(range(100))):
    log(i)  # i will be logged to logfile but not printed
```
## Time Taking and Profiling

Simple time taker inspired by Matlab's tic and toc, which also has profiling tooling.

```py
tt = TickTock()

tt.tick()
<some task>
seconds_used = tt.tock()

for i in range(100):
    tt.profile("Repeated code")
    <some task>
    tt.profile("Subtask")
    <some subtask>
    tt.end_profile()
    tt.end_profile()
print(tt)  # Prints a table view of profiled code sections

# Alternative syntax using with statement
with tt.profile("The best task"):
    <some task>

# When using multiprocessing, it can be useful to simulate multiple hits of the same profile
with mp.Pool() as p, tt.profile("Processing 100 items on multiple threads", hits=100):
    p.map(100 items)
```
## Data Storage

A data class that saves/loads its fields from disk. Anything that can be saved to a JSON file will be; other data types will be saved to relevant file formats.

```py
@dataclass
class Person(DataStorage):
    name: str
    age: int
    numbers: np.ndarray

    subfolder = "older"
    json_name = "yoda.json"

yoda = Person(name="Yoda", age=900, numbers=np.array([69, 420]))
yoda.save("old")
# Saved data at old/older/yoda.json
# {
#     "name": "Yoda",
#     "age": 900
# }
# There will also be a file named numbers.npy
yoda = Person.load("old")
```
## Parsing

A combination of parsing CLI and config file arguments which allows for a powerful, easy-to-use workflow. Useful for parametric methods such as machine learning.

A file `main.py` could contain:

```py
options = {
    "learning-rate": { "default": 1.5e-3, "help": "Controls size of parameter update", "type": float },
    "gamma": { "default": 1, "help": "Use of generator network in updating", "type": float },
    "initialize-zeros": { "help": "Whether to initialize all parameters to 0", "action": "store_true" },
}
parser = Parser(options)
location = parser.location  # Experiments are stored here
experiments = parser.parse()
parser.document_settings()  # Save a config file to reproduce the experiment
```

This could then be run by

```sh
python main.py data/my-big-experiment --learning-rate 1e-5
```

or by

```sh
python main.py data/my-big-experiment --config cfg.ini
```

where `cfg.ini` could contain

```ini
[DEFAULT]
gamma = 0.95

[RUN1]
learning-rate = 1e-4
initialize-zeros

[RUN2]
learning-rate = 1e-5
gamma = 0.9
```
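The `[DEFAULT]` section follows standard INI semantics as implemented by Python's `configparser` (assuming pelutils resolves values the same way): each run section inherits the defaults unless it overrides them, and a bare key like `initialize-zeros` acts as a flag. A minimal sketch of how the values above resolve:

```python
import configparser

# allow_no_value=True lets a bare "initialize-zeros" line act as a flag
cfg = configparser.ConfigParser(allow_no_value=True)
cfg.read_string("""
[DEFAULT]
gamma = 0.95
[RUN1]
learning-rate = 1e-4
initialize-zeros
[RUN2]
learning-rate = 1e-5
gamma = 0.9
""")

# RUN1 inherits gamma from [DEFAULT]; RUN2 overrides it
print(cfg["RUN1"]["gamma"])  # 0.95
print(cfg["RUN2"]["gamma"])  # 0.9
print("initialize-zeros" in cfg["RUN1"])  # True (value is None)
```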
## pelutils.ds

This submodule contains various utility functions for data science and machine learning. To make sure the necessary requirements are installed, install using

```sh
pip install pelutils[ds]
```

Note that in some terminals, you will instead have to write

```sh
pip install pelutils\[ds\]
```

### PyTorch

All PyTorch functions work independently of whether CUDA is available.

```py
# Clear CUDA cache and synchronize
reset_cuda()

# Inference only: no gradients should be tracked in the following function
# Same as putting the entire function body inside with torch.no_grad()
@no_grad
def infer():
    <code that includes feedforwarding>

# Feed forward in batches to prevent using too much memory
# Every time a memory allocation error is encountered, the number of batches is doubled
# Same as using y = net(x), but without the risk of running out of memory
bff = BatchFeedForward(net, len(x))
y = bff(x)

# Change to another network
bff.update_net(net2)
```
### Statistics

Includes various commonly used statistical functions.

```py
# Get the one-sided z value for an exponential(lambda=2) distribution
# at a significance level of 1 %
zval = z(alpha=0.01, two_sided=False, distribution=scipy.stats.expon(scale=1/2))

# Get correlation, confidence interval, and p value for two vectors
a, b = np.random.randn(100), np.random.randn(100)
r, lower_r, upper_r, p = corr_ci(a, b, alpha=0.01)
```
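The confidence interval returned by `corr_ci` is presumably based on the Fisher z-transformation, the standard construction for correlation intervals; as a reference, that construction can be sketched with only the standard library (`pearson_ci` is an illustrative name, not part of pelutils, and requires Python 3.8+ for `statistics.NormalDist`):

```python
import math
from statistics import NormalDist

def pearson_ci(x, y, alpha=0.01):
    """Pearson r with a (1 - alpha) confidence interval
    via the Fisher z-transformation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)
    z = math.atanh(r)                 # Fisher transform: z is ~normal
    se = 1 / math.sqrt(n - 3)         # standard error of z
    q = NormalDist().inv_cdf(1 - alpha / 2)
    # Transform the interval back to correlation space
    return r, math.tanh(z - q * se), math.tanh(z + q * se)

x = list(range(20))
y = [v + (-1) ** v for v in x]  # strongly but not perfectly correlated
r, lower_r, upper_r = pearson_ci(x, y, alpha=0.01)
```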
### Matplotlib

Contains predefined rc params, colours, and figure sizes.

```py
# Set wide figure size
plt.figure(figsize=figsize_wide)

# Use larger font for larger figures - works well with predefined figure sizes
update_rc_params(rc_params)

# 15 different, unique colours
c = iter(colours)
for i in range(15):
    plt.plot(x[i], y[i], color=next(c))
```
## History

### 0.5.9
- Add `split_path` function which splits a path into components
- Fix bug in `MainTest` where test files were not deleted
### 0.5.7
- Logger prints to `stderr` instead of `stdout` at level WARNING or above
- Added `log.tqdm` that disables printing while looping over a `tqdm` object
- Fixed `from __future__ import annotations` breaking `DataStorage`

### 0.5.6
- DataStorage can save all picklable formats + `torch.Tensor` specifically
### 0.5.5
- Test logging now uses `Levels.DEBUG` by default
- Added `TickTock.fuse_multiple` for combining several `TickTock` instances
- Fixed bugs when using multiple `TickTock` instances
- Allow multiple hits in single profile
- Now possible to profile using `with` statement
- Added method to logger to parse boolean user input
- Added method to `Table` for adding vertical lines manually

### 0.5.4 - Breaking changes
- Change log error colour
- Replace default log level with print level that defaults to `Levels.INFO`; `__call__` now always defaults to `Levels.INFO`
- Print microseconds as `us` instead of `mus`
### 0.5.3
- Fixed missing regex requirement

### 0.5.2
- Allowed disabling printing by default in logger

### 0.5.1
- Fixed accidental rich formatting in logger
- Fixed logger crashing when not configured
### 0.5.0 - Breaking changes
- Added np.unique-style unique function to `ds` that runs in linear time but does not sort
- Replaced verbose/non-verbose logging with logging levels similar to the built-in `logging` module
- Added `with_print` option to `log.__call__`
- Undid change from 0.3.4 such that `None` is now logged again
- Added `format` module; currently supports tables
- Updated stringification of profiles to include percentage of parent profile
- Added `throws` function that checks if a function throws an exception of a specific type
- Use Rich for printing to console when logging
### 0.4.1
- Added append mode to logger to append to old log files instead of overwriting

### 0.4.0
- Added `ds` submodule for data science and machine learning utilities; this includes PyTorch utility functions, statistics, and `matplotlib` default values

### 0.3.4
- Logger now raises errors normally instead of using the `throw` method
### 0.3.3
- `get_repo` now accepts a custom path to search for a repo as opposed to always using the working dir

### 0.3.2
- `log.input` now also accepts iterables as input; for such inputs, it will return a generator of user inputs

### 0.3.1 - Breaking changes
- Added functionality to logger for logging repository commit
- Removed function `get_commit`
- Added function `get_repo` which returns repository path and commit; it attempts to find a repository by searching from the working directory and upwards
- Updates to examples in `README` and other minor documentation changes
- `set_seeds` no longer returns the seed, as this is already given as input to the function
### 0.3.0 - Breaking changes
- Only works for Python 3.7+
- If the logger has not been configured, it now does no logging instead of crashing; this prevents dependencies that use the logger from crashing the program if it is not used
- `log.throw` now also logs the actual error rather than just the stack trace
- `log` now has the public property `is_verbose`
- Fixed `with log.log_errors` always throwing errors
- Added code samples to `README`
- `Parser` no longer automatically determines if experiments should be placed in subfolders; instead, this is given explicitly as an argument to `__init__`. It also supports boolean flags in the config file
### 0.2.13
- Re-add `clean` method to logger

### 0.2.12 - Breaking changes
- The logger is now solely a global variable; different loggers are handled internally in the global `_Logger` instance

### 0.2.11
- Add `catch` property to logger to allow automatically logging errors with `with`
- All code is now indented using spaces

### 0.2.10
- Allow finer verbosity control in logger
- Allow multiple log commands to be collected and logged at the same time
- Add decorator for aforementioned feature
- Change `thousand_seps` from TickTock method to stand-alone function in `__init__`
- Verbose logging now has the same signature as normal logging
### 0.2.8
- Add code to execute code with specific environment variables

### 0.2.7
- Fix error where the full stacktrace was not printed by `log.throw`
- `set_seeds` now checks if torch is available; this means torch seeds are still set without needing it as a dependency

### 0.2.6 - Breaking changes
- Make Unverbose class private and update documentation
- Update formatting when using `.input`

### 0.2.5
- Add input method to logger

### 0.2.4
- Better logging of errors

### 0.2.1 - Breaking changes
- Removed torch as dependency
### 0.2.0 - Breaking changes
- Logger is now a global variable, `log`; logging should happen by importing the log variable and calling `.configure` to set it up. To reset the logger, `.clean` can be called
- It is still possible to just import `Logger` and use it in the traditional way, though `.configure` should be called first
- Changed timestamp function to give a cleaner output
- `get_commit` now returns `None` if `gitpython` is not installed

### 0.1.2
- Update documentation for logger and ticktock
- Fix bug where separator was not an argument to `Logger.__call__`

### 0.1.0
- Include `DataStorage`
- Logger can throw errors and handle separators
- TickTock includes time handling and units
- Minor parser path changes

### 0.0.1
- Logger, Parser, and TickTock added from previous projects