# pelutils

Various utility functions that are often useful for Python projects. Features include:
- Feature-rich logger using `Rich` for colourful printing
- Parsing for combining config files and command-line arguments - especially useful for parametric methods
- Time taking and profiling
- Easy-to-use data storage class for saving and loading data
- Table formatting
- Miscellaneous standalone functions providing various functionalities - see `pelutils/__init__.py`
- Data-science submodule with extra utilities for statistics, plotting, and machine learning using `PyTorch`, including a `unique` function similar to `np.unique` but running in linear time (currently Linux x86_64 only)
`pelutils` supports Python 3.7+.
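The idea behind a linear-time `unique` can be sketched in pure Python (an illustration of the concept only, not the actual array-based implementation):

```python
# Hypothetical illustration of the linear-time unique idea (not the actual
# array-based implementation): dicts preserve insertion order, so one O(n)
# pass deduplicates while keeping first-seen order, with no O(n log n) sort.
def unique_linear(values):
    """Return unique elements in first-seen order in linear time."""
    return list(dict.fromkeys(values))

print(unique_linear([3, 1, 3, 2, 1]))  # [3, 1, 2]
```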
## Logging

An easy-to-use logger which fits common needs.
```py
import multiprocessing as mp

from tqdm import tqdm

from pelutils import log, Levels, collect_logs

# Configure logger for the script
log.configure("path/to/save/log.log", "Title of log")

# Start logging
for i in range(70):  # Nice
    log("Execution %i" % i)

# Sections
log.section("New section in the logfile")

# Adjust logging levels
log.warning("Will be logged")
with log.level(Levels.ERROR):  # Only log at ERROR level or above
    log.warning("Will not be logged")

# Error handling
# The zero-division error and stacktrace are logged
with log.log_errors:
    0 / 0
# The entire chained stacktrace is logged
with log.log_errors:
    try:
        0 / 0
    except ZeroDivisionError as e:
        raise ValueError("Denominator must be non-zero") from e

# Disable printing when using tqdm so as to not print a million progress bars
# Do not do this if the loop may be ended by a break statement!
for elem in log.tqdm(tqdm(range(5))):
    log(elem)  # Will be logged, but not printed

# User input
inp = log.input("WHAT... is your favourite colour? ")

# Log all logs from a function at the same time
# This is especially useful when using multiple threads so logging does not get mixed up
def fun(arg):
    log("Hello there")
    log("General Kenobi!")
with mp.Pool() as p:
    p.map(collect_logs(fun), args)
```
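Under the hood, `log.log_errors`-style error handling boils down to a context manager that logs the stacktrace and re-raises. A minimal sketch using the standard `logging` module (a hypothetical simplification, not pelutils' implementation):

```python
import logging
from contextlib import contextmanager

logging.basicConfig()
logger = logging.getLogger(__name__)

# Hypothetical simplification: log the full stacktrace of any exception,
# then re-raise it unchanged so normal error handling still applies
@contextmanager
def log_errors():
    try:
        yield
    except Exception:
        logger.exception("An error occurred")
        raise

try:
    with log_errors():
        1 / 0  # The ZeroDivisionError is logged, then propagates
except ZeroDivisionError:
    pass
```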
## Time Taking and Profiling

A simple time taker inspired by Matlab's Tic and Toc, which also has profiling tooling.
```py
import multiprocessing as mp

from pelutils import TT

TT.tick()
<some task>
seconds_used = TT.tock()

for i in range(100):
    TT.profile("Repeated code")
    <some task>
    TT.profile("Subtask")
    <some subtask>
    TT.end_profile()
    TT.end_profile()
print(TT)  # Prints a table view of profiled code sections

# Alternative syntax using with statement
with TT.profile("The best task"):
    <some task>

# Profile a loop
# Do not do this if the loop may be ended by a break statement!
for elem in TT.profile_iter(range(100), "The second best task"):
    <some task>

# When using multiprocessing, it can be useful to simulate multiple hits of the same profile
with mp.Pool() as p, TT.profile("Processing 100 items on multiple threads", hits=100):
    p.map(100 items)
```
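The tick/tock pattern itself is simple start-time bookkeeping around `time.perf_counter`. A minimal sketch of the concept (hypothetical `TimeTaker`, not the actual `TickTock` class):

```python
import time

# Minimal sketch of the tick/tock pattern (not the actual TickTock class):
# tick stores a start time, tock returns the seconds elapsed since it
class TimeTaker:
    def __init__(self):
        self._start = 0.0

    def tick(self):
        self._start = time.perf_counter()

    def tock(self) -> float:
        return time.perf_counter() - self._start

tt = TimeTaker()
tt.tick()
sum(range(100_000))  # Some task
print("Seconds used:", tt.tock())
```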
## Data Storage

A data class that saves/loads its fields from disk. Anything that can be saved to a `json` file will be; other data types will be saved to relevant file formats.
```py
from dataclasses import dataclass

import numpy as np

from pelutils import DataStorage

@dataclass
class Person(DataStorage):
    name: str
    age: int
    numbers: np.ndarray
    subfolder = "older"  # Save in this subfolder within the folder given to .save and .load. Don't set for no subfolder
    json_name = "yoda.json"

yoda = Person(name="Yoda", age=900, numbers=np.array([69, 420]))
yoda.save("old")  # Save to 'old' folder
# Data is saved at old/older/yoda.json
# {
#     "name": "Yoda",
#     "age": 900
# }
# There will also be a file named numbers.npy
yoda = Person.load("old")
```
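The save/load mechanism can be illustrated with a simplified pure-Python sketch (hypothetical `SimplePerson`, not the pelutils implementation; it handles only JSON-serializable fields and omits the separate `.npy` handling):

```python
import json
import os
from dataclasses import asdict, dataclass
from tempfile import TemporaryDirectory

# Hypothetical simplification: every field is assumed JSON-serializable and
# goes into a single .json file; loading reconstructs the dataclass from it
@dataclass
class SimplePerson:
    name: str
    age: int

    def save(self, folder: str):
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, "person.json"), "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, folder: str) -> "SimplePerson":
        with open(os.path.join(folder, "person.json")) as f:
            return cls(**json.load(f))

with TemporaryDirectory() as tmp:
    SimplePerson(name="Yoda", age=900).save(tmp)
    print(SimplePerson.load(tmp))  # SimplePerson(name='Yoda', age=900)
```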
## Parsing

A combination of parsing CLI and config file arguments, which allows for a powerful, easy-to-use workflow. Useful for parametric methods such as machine learning.

A file `main.py` could contain:
```py
from pelutils import Parser

options = {
    "learning-rate": { "default": 1.5e-3, "help": "Controls size of parameter update", "type": float },
    "gamma": { "default": 1, "help": "Use of generator network in updating", "type": float },
    "initialize-zeros": { "help": "Whether to initialize all parameters to 0", "action": "store_true" },
}

parser = Parser(options)
location = parser.location  # Experiments are stored here
experiments = parser.parse()
parser.document_settings()  # Save a config file to reproduce the experiment

# Run each experiment
for args in experiments:
    run_experiment(location, args)

# Alternatively, if there is only ever a single job
parser = Parser(options, multiple_jobs=False)
location = parser.location
args = parser.parse()
parser.document_settings()
run_experiment(location, args)

# Check if an argument has been given explicitly, either from the CLI or a config file, or if the default value is used
parser.is_explicit("learning-rate")
```
This could then be run with

```sh
python main.py data/my-big-experiment --learning-rate 1e-5
```

or with

```sh
python main.py data/my-big-experiment --config cfg.ini
```

or using a combination, where CLI arguments take precedence:

```sh
python main.py data/my-big-experiment --config cfg.ini --learning-rate 1e-5
```

where `cfg.ini` could contain:

```ini
[DEFAULT]
gamma = 0.95

[RUN1]
learning-rate = 1e-4
initialize-zeros

[RUN2]
learning-rate = 1e-5
gamma = 0.9
```
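The precedence rule - defaults, then config file, then CLI - can be sketched with the standard `configparser` module (a hypothetical illustration of the merging logic, not the pelutils `Parser` internals):

```python
import configparser

# Hypothetical illustration of the merging logic, not the Parser internals:
# defaults are overridden by the config file, which is overridden by the CLI
defaults = {"learning-rate": 1.5e-3, "gamma": 1.0}

cfg = configparser.ConfigParser()
cfg.read_string("""
[DEFAULT]
gamma = 0.95
[RUN1]
learning-rate = 1e-4
""")

cli_args = {"learning-rate": 1e-5}  # Given explicitly on the command line

args = dict(defaults)
args.update({key: float(value) for key, value in cfg["RUN1"].items()})
args.update(cli_args)
print(args)  # {'learning-rate': 1e-05, 'gamma': 0.95}
```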
## pelutils.ds

This submodule contains various utility functions for data science and machine learning. To make sure the necessary requirements are installed, install using

```sh
pip install pelutils[ds]
```

Note that in some terminals, you will instead have to write

```sh
pip install pelutils\[ds\]
```
### PyTorch

All PyTorch functions work independently of whether CUDA is available.
```py
from pelutils.ds import no_grad, BatchFeedForward

# Inference only: No gradients should be tracked in the following function
# Same as putting the entire function body inside with torch.no_grad()
@no_grad
def infer():
    <code that includes feedforwarding>

# Feed forward in batches to prevent using too much memory
# Every time a memory allocation error is encountered, the number of batches is doubled
# Same as using y = net(x), but without the risk of running out of memory
# Gradients are not tracked
bff = BatchFeedForward(net)
y = bff(x)
```
### Statistics

Includes various commonly used statistical functions.
```py
import numpy as np
import scipy.stats

from pelutils.ds.stats import z, corr_ci

# Get the one-sided z value for an exponential(lambda=2) distribution at a significance level of 1 %
zval = z(alpha=0.01, two_sided=False, distribution=scipy.stats.expon(scale=1/2))

# Get correlation, confidence interval, and p value for two vectors
a, b = np.random.randn(100), np.random.randn(100)
r, lower_r, upper_r, p = corr_ci(a, b, alpha=0.01)
```
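A confidence interval for a correlation coefficient is commonly built via the Fisher z-transform; a sketch of that standard construction follows (hypothetical `corr_with_ci` helper - not necessarily how `corr_ci` computes it):

```python
import math
from statistics import NormalDist

# Hypothetical helper sketching the standard Fisher z-transform construction
# (not necessarily how corr_ci computes it): atanh(r) is approximately normal
# with standard error 1/sqrt(n - 3), which yields a confidence interval for r
def corr_with_ci(a: list, b: list, alpha: float = 0.01):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    z_r = math.atanh(r)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = 1 / math.sqrt(n - 3)
    return r, math.tanh(z_r - z_crit * se), math.tanh(z_r + z_crit * se)

a = [1, 2, 3, 4, 5, 6]
b = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
r, lower_r, upper_r = corr_with_ci(a, b)
print(lower_r < r < upper_r)  # True
```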
### Matplotlib

Contains predefined rc params, colours, and figure sizes.
```py
import matplotlib.pyplot as plt

# update_rc_params, rc_params, figsize_wide, and colours are provided by the ds submodule

# Set wide figure size
plt.figure(figsize=figsize_wide)

# Use larger font for larger figures - works well with the predefined figure sizes
update_rc_params(rc_params)

# 15 different, unique colours
c = iter(colours)
for i in range(15):
    plt.plot(x[i], y[i], color=next(c))
```
## History

### 0.6.9 - Nice
- Made `load_jsonl` load the file lazily

### 0.6.7
- Logger can now be used without writing to file

### 0.6.6
- Fix parser naming when using config files and not `multiple_jobs`
- Fix parser naming when using CLI only and `multiple_jobs`

### 0.6.5 - Breaking changes
- `Parser.parse` now returns only a single experiment dict if `multiple_jobs` is False
- Improved logger error messages
- Added `Parser.is_explicit` to check if an argument was given explicitly, either from the CLI or a config file
- Fixed bug in the parser where, if a type was not given, values from config files would not be used
- Made fields that should not be used externally private in the parser
- Made `pelutils.ds.unique` slightly faster

### 0.6.4 - Breaking changes
- Commit is now logged as `DEBUG`
- Removed `BatchFeedForward.update_net`
- `BatchFeedForward` no longer requires batch size and increase factor as arguments
- Removed the `reset_cuda` function, as it was too small and obscure and broke distributed training
- Added `ignore_missing` field to `DataStorage` for ignoring missing fields in stored data

### 0.6.3 - Breaking changes
- Fixed bug where TickTock profiles would sometimes not be printed in the correct order
- Removed `TickTock.reset`
- Added `__len__` and `__iter__` methods to `TickTock`
- Added option to print standard deviation for profiles
- Renamed `TimeUnit` to `TimeUnits` to follow the `enum` naming scheme
- Time unit lengths are now given in units/s rather than s/unit

### 0.6.2
- `TickTock.__str__` now raises a `ValueError` if profiling is still ongoing, to prevent incorrect calculations
- Printing a `TickTock` instance now indents the percentage of time spent to indicate task subsets

### 0.6.1
- Added `subfolder` argument to `Parser.document_settings`

### 0.6.0 - Breaking changes
- A global instance of `TickTock`, `TT`, has been added - similar to `log`
- Added `TickTock.profile_iter` for performing profiling over a for loop
- Fixed wrong error being thrown when keyboard interrupting within `with TT.profile(...)`
- All collected logs are now logged upon an exception being thrown when using `log.log_errors` and `collect_logs`
- Made `log.log_errors` capable of handling chained exceptions
- Made `log.throw` private, as it had little use and could be exploited
- `get_repo` no longer throws an error if a repository has not been found
- Added utility functions for reading and writing `.jsonl` files
- Fixed incorrect `torch` installations breaking the import of `pelutils`
### 0.5.9
- Add `split_path` function which splits a path into components
- Fix bug in `MainTest` where test files were not deleted

### 0.5.7
- Logger prints to `stderr` instead of `stdout` at level WARNING or above
- Added `log.tqdm` that disables printing while looping over a `tqdm` object
- Fixed `from __future__ import annotations` breaking `DataStorage`

### 0.5.6
- `DataStorage` can save all picklable formats + `torch.Tensor` specifically

### 0.5.5
- Test logging now uses `Levels.DEBUG` by default
- Added `TickTock.fuse_multiple` for combining several `TickTock` instances
- Fixed bugs when using multiple `TickTock` instances
- Allow multiple hits in a single profile
- Now possible to profile using a `with` statement
- Added method to logger to parse boolean user input
- Added method to `Table` for adding vertical lines manually

### 0.5.4 - Breaking changes
- Changed log error colour
- Replaced default log level with print level that defaults to `Levels.INFO`
- `__call__` now always defaults to `Levels.INFO`
- Print microseconds as `us` instead of `mus`

### 0.5.3
- Fixed missing regex requirement

### 0.5.2
- Allowed disabling printing by default in logger

### 0.5.1
- Fixed accidental rich formatting in logger
- Fixed logger crashing when not configured

### 0.5.0 - Breaking changes
- Added an `np.unique`-style unique function to `ds` that runs in linear time but does not sort
- Replaced verbose/non-verbose logging with logging levels similar to the built-in `logging` module
- Added `with_print` option to `log.__call__`
- Undid change from 0.3.4 such that `None` is now logged again
- Added `format` module. Currently supports tables
- Updated stringification of profiles to include percentage of parent profile
- Added `throws` function that checks if a function throws an exception of a specific type
- Use `Rich` for printing to console when logging
### 0.4.1
- Added append mode to logger to append to old log files instead of overwriting

### 0.4.0
- Added `ds` submodule for data science and machine learning utilities. This includes `PyTorch` utility functions, statistics, and `matplotlib` default values

### 0.3.4
- Logger now raises errors normally instead of using the `throw` method

### 0.3.3
- `get_repo` now accepts a custom path to search for a repo, as opposed to always using the working dir

### 0.3.2
- `log.input` now also accepts iterables as input. For such inputs, it will return a generator of user inputs

### 0.3.1 - Breaking changes
- Added functionality to logger for logging repository commit
- Removed function `get_commit`
- Added function `get_repo`, which returns repository path and commit. It attempts to find a repository by searching from the working directory and upwards
- Updates to examples in `README` and other minor documentation changes
- `set_seeds` no longer returns the seed, as this is already given as input to the function
### 0.3.0 - Breaking changes
- Only works for Python 3.7+
- If the logger has not been configured, it now does no logging instead of crashing. This prevents dependencies that use the logger from crashing the program if it is not used
- `log.throw` now also logs the actual error rather than just the stack trace
- `log` now has the public property `is_verbose`
- Fixed `with log.log_errors` always throwing errors
- Added code samples to `README`
- `Parser` no longer automatically determines if experiments should be placed in subfolders. Instead, this is given explicitly as an argument to `__init__`. It also supports boolean flags in the config file

### 0.2.13
- Readd clean method to logger

### 0.2.12 - Breaking changes
- The logger is now solely a global variable. Different loggers are handled internally in the global `_Logger` instance

### 0.2.11
- Add catch property to logger to allow automatically logging errors with `with`
- All code is now indented using spaces

### 0.2.10
- Allow finer verbosity control in logger
- Allow multiple log commands to be collected and logged at the same time
- Add decorator for aforementioned feature
- Change `thousand_seps` from TickTock method to stand-alone function in `__init__`
- Verbose logging now has the same signature as normal logging

### 0.2.8
- Add code to execute code with specific environment variables

### 0.2.7
- Fix error where the full stacktrace was not printed by `log.throw`
- `set_seeds` now checks if torch is available. This means torch seeds are still set without needing it as a dependency

### 0.2.6 - Breaking changes
- Make Unverbose class private and update documentation
- Update formatting when using `.input`

### 0.2.5
- Add input method to logger

### 0.2.4
- Better logging of errors

### 0.2.1 - Breaking changes
- Removed torch as a dependency

### 0.2.0 - Breaking changes
- Logger is now a global variable, `log`. Logging should happen by importing the log variable and calling `.configure` to set it up. To reset the logger, `.clean` can be called
- It is still possible to just import `Logger` and use it in the traditional way, though `.configure` should be called first
- Changed timestamp function to give a cleaner output
- `get_commit` now returns `None` if `gitpython` is not installed

### 0.1.2
- Update documentation for logger and ticktock
- Fix bug where separator was not an argument to `Logger.__call__`

### 0.1.0
- Include `DataStorage`
- Logger can throw errors and handle separators
- TickTock includes time handling and units
- Minor parser path changes

### 0.0.1
- Logger, Parser, and TickTock added from previous projects