# pelutils
Various utilities useful for Python projects. Features include:

- Feature-rich logger using `Rich` for colourful printing
- Parsing that combines config files and command-line arguments - especially useful for parametric methods
- Time taking and profiling
- Easy-to-use data storage class for saving and loading data
- Table formatting
- Miscellaneous standalone functions providing various functionalities - see `pelutils/__init__.py`
- Data-science submodule with extra utilities for statistics, plotting, and machine learning using PyTorch
- `unique` function similar to `np.unique` but in linear time (currently Linux x86_64 only)
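The linear-time `unique` mentioned above can be sketched in pure Python; this illustrates the technique (hash-set deduplication, unsorted output) rather than the library's native implementation:

```python
def unique_linear(values):
    """Return unique elements in first-seen order in O(n) time.

    Unlike np.unique, the result is not sorted; a hash set tracks
    elements that have already been seen.
    """
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

result = unique_linear([3, 1, 3, 2, 1])  # [3, 1, 2]
```

Because membership tests on a hash set are constant time on average, the whole pass stays linear, whereas sorting-based deduplication costs O(n log n).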
## Parsing
A combination of parsing CLI and config file arguments which allows for a powerful, easy-to-use workflow. Useful for parametric methods such as machine learning.
A file `main.py` could contain:
```python
options = {
    "learning-rate": { "default": 1.5e-3, "help": "Controls size of parameter update", "type": float },
    "gamma": { "default": 1, "help": "Use of generator network in updating", "type": float },
    "initialize-zeros": { "help": "Whether to initialize all parameters to 0", "action": "store_true" },
}
parser = Parser(options)
location = parser.location  # Experiments are stored here
experiments = parser.parse()
parser.document_settings()  # Save a config file to reproduce the experiment
```
This could then be run with

```sh
python main.py data/my-big-experiment --learning-rate 1e-5
```

or with

```sh
python main.py data/my-big-experiment --config cfg.ini
```

where `cfg.ini` could contain:
```ini
[DEFAULT]
gamma = 0.95

[RUN1]
learning-rate = 1e-4
initialize-zeros

[RUN2]
learning-rate = 1e-5
gamma = 0.9
```
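Conceptually, each run's settings are built from three layers: the options' defaults, the `[DEFAULT]` section, and the run's own section, with later layers taking precedence. A minimal stdlib sketch of that merge (`parse_config` and its behaviour are illustrative assumptions, not pelutils' actual implementation):

```python
import configparser

# Defaults, as would be taken from the options dict
options = {"learning-rate": 1.5e-3, "gamma": 1.0, "initialize-zeros": False}

def parse_config(text, defaults):
    """Yield (run_name, settings) with precedence defaults < [DEFAULT] < run section."""
    cfg = configparser.ConfigParser(allow_no_value=True)
    cfg.read_string(text)
    for run in cfg.sections():  # [DEFAULT] is not a section; it is merged into each one
        settings = dict(defaults)
        for key in cfg[run]:
            raw = cfg[run][key]
            # A valueless key such as "initialize-zeros" acts as a boolean flag
            settings[key] = True if raw is None else float(raw)
        yield run, settings

cfg_text = """
[DEFAULT]
gamma = 0.95

[RUN1]
learning-rate = 1e-4
initialize-zeros

[RUN2]
learning-rate = 1e-5
gamma = 0.9
"""
runs = dict(parse_config(cfg_text, options))
# RUN1 inherits gamma from [DEFAULT]; RUN2 overrides it
```

Keeping the merge order explicit is what lets one config file describe several runs that differ only in a few parameters.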
## Logging

Easy-to-use logger that fits common needs.
```python
import multiprocessing as mp

from pelutils import log, collect_logs

# Configure logger for the script
log.configure("path/to/save/log.log", "Title of log")

# Start logging
for i in range(70):  # Nice
    log("Execution %i" % i)

# Sections
log.section("New section in the logfile")

# Verbose logging for less important things
log.verbose("Will be logged")
with log.unverbose:
    log.verbose("Will not be logged")

# Error handling
# This explicitly logs a ValueError and then raises it
log.throw(ValueError("Your value is bad, and you should feel bad"))

# The zero-division error is logged
with log.log_errors:
    0 / 0

# User input
inp = log.input("WHAT... is your favourite colour? ")

# Log all logs from a function at the same time
# This is especially useful with multiple threads, so logging does not get mixed up
def fun():
    log("Hello there")
    log("General Kenobi!")

with mp.Pool() as p:
    p.map(collect_logs(fun), args)
```
## Time Taking and Profiling

Simple time taker inspired by MATLAB's tic/toc, with profiling tooling included.
```python
tt = TickTock()

tt.tick()
<some task>
seconds_used = tt.tock()

for i in range(100):
    tt.profile("Repeated code")
    <some task>
    tt.profile("Subtask")
    <some subtask>
    tt.end_profile()
    tt.end_profile()

print(tt)  # Prints a table view of profiled code sections
```
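The tick/tock pattern itself is only a few lines around `time.perf_counter`; the sketch below is a hypothetical stand-in that is much simpler than the real `TickTock` (no nested profiles, no table output), but shows the core mechanics:

```python
import time

class SimpleTickTock:
    """Minimal tic/toc timer with named, accumulating profiles."""

    def __init__(self):
        self._start = None
        self.profiles = {}  # profile name -> total seconds spent

    def tick(self):
        # Start the stopwatch
        self._start = time.perf_counter()

    def tock(self):
        # Seconds elapsed since the last tick()
        return time.perf_counter() - self._start

    def profile(self, name, fn, *args):
        # Time a single call of fn and accumulate the duration under name
        t0 = time.perf_counter()
        result = fn(*args)
        self.profiles[name] = self.profiles.get(name, 0.0) + time.perf_counter() - t0
        return result

tt = SimpleTickTock()
tt.tick()
for _ in range(100):
    total = tt.profile("Repeated code", sum, range(1000))
seconds_used = tt.tock()
```

Accumulating per-name totals is what makes the final table view possible: each profiled section just sums its own durations across iterations.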
## Data Storage

A data class that saves/loads its fields from disk. Anything that can be saved to a `json` file will be; other data types are saved to relevant file formats. Currently, `numpy` arrays are the only supported data type not saved to the `json` file.
```python
from dataclasses import dataclass

import numpy as np
from pelutils import DataStorage

@dataclass
class Person(DataStorage):
    name: str
    age: int
    numbers: np.ndarray
    subfolder = "older"
    json_name = "yoda.json"

yoda = Person(name="Yoda", age=900, numbers=np.array([69, 420]))
yoda.save("old")
# Saved data at old/older/yoda.json
# {
#     "name": "Yoda",
#     "age": 900
# }
# There will also be a file named numbers.npy
yoda = Person.load("old")
```
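The save/load mechanics - JSON for plain fields, separate files for everything else - can be sketched with the standard library alone. `SimpleStorage` below is a toy stand-in, not the real `DataStorage` (which additionally writes `numpy` arrays to `.npy` files):

```python
import dataclasses
import json
import os
import tempfile

class SimpleStorage:
    """Toy stand-in: all JSON-serializable fields go to a single json file."""
    json_name = "data.json"

    def save(self, path):
        # Serialize all dataclass fields to <path>/<json_name>
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, self.json_name), "w") as f:
            json.dump(dataclasses.asdict(self), f)

    @classmethod
    def load(cls, path):
        # Rebuild the dataclass from the saved fields
        with open(os.path.join(path, cls.json_name)) as f:
            return cls(**json.load(f))

@dataclasses.dataclass
class Person(SimpleStorage):
    name: str
    age: int

folder = tempfile.mkdtemp()
Person(name="Yoda", age=900).save(folder)
loaded = Person.load(folder)
```

Because the dataclass field names double as JSON keys, `load` can reconstruct an instance simply by unpacking the parsed dictionary into the constructor.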
## pelutils.ds

This submodule contains various utility functions for data science and machine learning. To ensure the necessary requirements are installed, install with

```sh
pip install pelutils[ds]
```

Note that in some shells you will instead have to escape the brackets:

```sh
pip install pelutils\[ds\]
```
### PyTorch

All PyTorch functions work regardless of whether CUDA is available.
```python
# Clear CUDA cache and synchronize
reset_cuda()

# Inference only: no gradients should be tracked in the following function
# Same as putting the entire function body inside with torch.no_grad()
@no_grad
def infer():
    <code that includes feedforwarding>

# Feed forward in batches to prevent using too much memory
# Every time a memory allocation error is encountered, the number of batches is doubled
# Same as using y = net(x), but without the risk of running out of memory
bff = BatchFeedForward(net, len(x))
y = bff(x)

# Change to another network
bff.update_net(net2)
```
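The batching strategy described in the comments - retry with twice as many batches whenever memory runs out - can be illustrated generically. `batched_apply` and `tiny_model` below are hypothetical, pure-Python stand-ins for the CUDA case:

```python
def batched_apply(fn, xs, start_batches=1):
    """Apply fn to xs in chunks, doubling the number of chunks on MemoryError."""
    num_batches = start_batches
    while True:
        try:
            out = []
            size = -(-len(xs) // num_batches)  # ceiling division
            for i in range(0, len(xs), size):
                out.extend(fn(xs[i:i + size]))
            return out
        except MemoryError:
            if num_batches >= len(xs):
                raise  # batches of one element still fail; give up
            num_batches *= 2

# Example: a "model" that refuses inputs larger than 3 elements
def tiny_model(batch):
    if len(batch) > 3:
        raise MemoryError
    return [x * 2 for x in batch]

ys = batched_apply(tiny_model, list(range(10)))
```

Doubling on failure means only O(log n) retries are ever needed before a workable batch size is found, at the cost of redoing the successfully processed chunks of the failed attempt.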
### Statistics

Includes various commonly used statistical functions.
```python
# Get one-sided z value for exponential(lambda=2) distribution at a 1 % significance level
zval = z(alpha=0.01, two_sided=False, distribution=scipy.stats.expon(scale=1/2))

# Get correlation, confidence interval, and p value for two vectors
a, b = np.random.randn(100), np.random.randn(100)
r, lower_r, upper_r, p = corr_ci(a, b, alpha=0.01)
```
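A confidence interval for a Pearson correlation is classically built via the Fisher z-transform; the sketch below shows that standard construction using only the standard library (whether `corr_ci` uses exactly this method is an assumption):

```python
import math
from statistics import NormalDist

def pearson_r(a, b):
    # Plain Pearson correlation coefficient
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def corr_ci_fisher(a, b, alpha=0.01):
    """Pearson r with a (1 - alpha) confidence interval via the Fisher z-transform."""
    n = len(a)
    r = pearson_r(a, b)
    z = math.atanh(r)                  # Fisher transform: approximately normal
    se = 1 / math.sqrt(n - 3)          # standard error in z-space
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    lo = math.tanh(z - zcrit * se)     # transform bounds back to r-space
    hi = math.tanh(z + zcrit * se)
    return r, lo, hi

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2]
r, lower_r, upper_r = corr_ci_fisher(a, b, alpha=0.05)
```

Working in z-space and transforming back keeps the interval inside [-1, 1], which a naive normal interval on r itself would not guarantee.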
### Matplotlib

Contains predefined rc params, colours, and figure sizes.
```python
# Set wide figure size
plt.figure(figsize=figsize_wide)

# Use larger font for larger figures - works well with the predefined figure sizes
update_rc_params(rc_params)

# 15 different, unique colours
c = iter(colours)
for i in range(15):
    plt.plot(x[i], y[i], color=next(c))
```
## History
### 0.5.4 - Breaking changes
- Changed log error colour
- Replaced default log level with a print level that defaults to `Levels.INFO`; `__call__` now always defaults to `Levels.INFO`
- Print microseconds as `us` instead of `mus`
### 0.5.3
- Fixed missing `regex` requirement

### 0.5.2
- Allowed disabling printing by default in logger

### 0.5.1
- Fixed accidental rich formatting in logger
- Fixed logger crashing when not configured
### 0.5.0 - Breaking changes
- Added `np.unique`-style `unique` function to `ds` that runs in linear time but does not sort
- Replaced verbose/non-verbose logging with logging levels similar to the built-in `logging` module
- Added `with_print` option to `log.__call__`
- Undid change from 0.3.4 such that `None` is now logged again
- Added `format` module, which currently supports tables
- Updated stringification of profiles to include percentage of parent profile
- Added `throws` function that checks if a function throws an exception of a specific type
- Use `Rich` for printing to console when logging
### 0.4.1
- Added append mode to logger to append to old log files instead of overwriting

### 0.4.0
- Added `ds` submodule for data science and machine learning utilities. This includes PyTorch utility functions, statistics, and `matplotlib` default values
### 0.3.4
- Logger now raises errors normally instead of using the `throw` method

### 0.3.3
- `get_repo` now accepts a custom path to search for a repository, as opposed to always using the working directory

### 0.3.2
- `log.input` now also accepts iterables as input; for such inputs, it returns a generator of user inputs
### 0.3.1 - Breaking changes
- Added functionality to logger for logging the repository commit
- Removed function `get_commit`
- Added function `get_repo`, which returns the repository path and commit. It attempts to find a repository by searching from the working directory and upwards
- Updated examples in `README` and made other minor documentation changes
- `set_seeds` no longer returns the seed, as it is already given as input to the function
### 0.3.0 - Breaking changes
- Only works with Python 3.7+
- If the logger has not been configured, it now does no logging instead of crashing. This prevents dependencies that use the logger from crashing the program when it is not used
- `log.throw` now also logs the actual error rather than just the stack trace
- `log` now has the public property `is_verbose`
- Fixed `with log.log_errors` always throwing errors
- Added code samples to `README`
- `Parser` no longer automatically determines whether experiments should be placed in subfolders; instead, this is given explicitly as an argument to `__init__`. It also supports boolean flags in the config file
### 0.2.13
- Readded `clean` method to logger

### 0.2.12 - Breaking changes
- The logger is now solely a global variable; different loggers are handled internally in the global `_Logger` instance

### 0.2.11
- Added `catch` property to logger to allow automatically logging errors with `with`
- All code is now indented using spaces

### 0.2.10
- Allowed finer verbosity control in logger
- Allowed multiple log commands to be collected and logged at the same time
- Added a decorator for the aforementioned feature
- Changed `thousand_seps` from a `TickTock` method to a stand-alone function in `__init__`
- Verbose logging now has the same signature as normal logging
### 0.2.8
- Added code to execute code with specific environment variables

### 0.2.7
- Fixed error where the full stack trace was not printed by `log.throw`
- `set_seeds` now checks if `torch` is available, meaning `torch` seeds are still set without requiring it as a dependency

### 0.2.6 - Breaking changes
- Made `Unverbose` class private and updated documentation
- Updated formatting when using `.input`
### 0.2.5
- Added `input` method to logger

### 0.2.4
- Better logging of errors

### 0.2.1 - Breaking changes
- Removed `torch` as a dependency
### 0.2.0 - Breaking changes
- Logger is now a global variable. Logging should happen by importing the `log` variable and calling `.configure` to set it up; to reset the logger, `.clean` can be called
- It is still possible to just import `Logger` and use it in the traditional way, though `.configure` should be called first
- Changed timestamp function to give a cleaner output
- `get_commit` now returns `None` if `gitpython` is not installed
### 0.1.2
- Updated documentation for logger and `TickTock`
- Fixed bug where the separator was not an argument to `Logger.__call__`

### 0.1.0
- Included `DataStorage`
- Logger can throw errors and handle separators
- `TickTock` includes time handling and units
- Minor parser path changes

### 0.0.1
- `Logger`, `Parser`, and `TickTock` added from previous projects