SciTrack provides basic logging capabilities to track scientific computations.
About scitrack
One of the critical challenges in scientific analysis is to track all the elements involved. This includes the arguments provided to a specific application (including default values), input data files referenced by those arguments and output data generated by the application. It also includes tracking a minimal set of system-specific information.
scitrack is a simple package aimed at researchers writing scripts, or more substantial scientific software, to support the tracking of scientific computation. The package provides elementary functionality to support logging. The primary capabilities concern generating checksums on input and output files and facilitating logging of the computational environment.
To see some projects using scitrack, see the “Used by” link at the top of the project GitHub page.
Installing
For the released version:
$ pip install scitrack
For the very latest version:
$ pip install git+https://github.com/HuttleyLab/scitrack
Or clone it:
$ git clone git@github.com:HuttleyLab/scitrack.git
And then install:
$ pip install ~/path/to/scitrack
CachingLogger
There is a single object provided by scitrack, CachingLogger. This object is basically a wrapper around the Python standard library logging module. On invocation, CachingLogger captures basic information regarding the system and the command line call that was made to invoke the application.
In addition, the class provides convenience methods for logging both the path and the md5 hexdigest checksum [1] of input/output files. A method is also provided for producing checksums of text data. The latter is useful for the case when data are from a stream or a database, for instance.
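As a rough illustration of what such checksums are, the sketch below computes md5 hex digests with the standard library hashlib module. The helper names file_md5 and text_md5 are hypothetical and this is not scitrack's own implementation; encoding text as UTF-8 is an assumption made for the example.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=65536):
    """Return the md5 hex digest of a file, read in chunks (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as infile:
        # read fixed-size chunks so large files are never fully held in memory
        for chunk in iter(lambda: infile.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def text_md5(text):
    """Return the md5 hex digest of a string (assumes UTF-8 encoding)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def demo():
    """Write a small file and return its digest alongside the text digest."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as out:
            out.write(b"hello")
        return file_md5(path), text_md5("hello")


if __name__ == "__main__":
    print(demo())
```

The same bytes give the same digest whether they come from a file or from text, which is what makes a checksum usable for data read from a stream or a database.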
All logging calls are cached until a path for a logfile is provided. The logger can also, optionally, create directories.
Simple instantiation of the logger
Create the logger. Setting create_dir=True means that the directory path is also created when the logfile is created.
from scitrack import CachingLogger
LOGGER = CachingLogger(create_dir=True)
LOGGER.log_file_path = "somedir/some_path.log"
The last assignment triggers creation of somedir/some_path.log.
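The caching behaviour can be pictured with a small standard-library sketch. TinyCachingLogger is a hypothetical stand-in written for this illustration, not scitrack's CachingLogger: it holds messages in memory until log_file_path is assigned, then writes them out and appends later messages directly.

```python
import os
import tempfile


class TinyCachingLogger:
    """Illustrative stand-in only; not the scitrack implementation."""

    def __init__(self, create_dir=False):
        self.create_dir = create_dir
        self._path = None
        self._cache = []

    @property
    def log_file_path(self):
        return self._path

    @log_file_path.setter
    def log_file_path(self, path):
        dirname = os.path.dirname(path)
        if self.create_dir and dirname:
            # optionally create the enclosing directory
            os.makedirs(dirname, exist_ok=True)
        self._path = path
        # flush everything cached before the path was known
        with open(path, "w") as out:
            out.writelines(line + "\n" for line in self._cache)
        self._cache.clear()

    def log_message(self, msg, label=""):
        line = f"{label} : {msg}"
        if self._path is None:
            self._cache.append(line)  # no logfile yet, keep in memory
        else:
            with open(self._path, "a") as out:
                out.write(line + "\n")


def demo():
    """Log before and after setting the path; return the logfile lines."""
    logger = TinyCachingLogger(create_dir=True)
    logger.log_message("cached before any path exists", label="note")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "somedir", "some_path.log")
        logger.log_file_path = path  # assignment triggers file creation
        logger.log_message("written directly", label="note")
        with open(path) as infile:
            return infile.read().splitlines()


if __name__ == "__main__":
    print(demo())
```

Using a property setter keeps the calling code as simple as an attribute assignment while still letting the assignment trigger the directory creation and the flush.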
Capturing a program's arguments and options
scitrack will write the contents of sys.argv to the log file, prefixed by command_string. However, this only captures arguments specified on the command line. Tracking the value of optional arguments not specified, which may have default values, is critical to tracking the full command set. Doing this is now easy with the simple statement LOGGER.log_args(). The logger can also record the versions of named dependencies.
Here’s one approach to incorporating scitrack into a command line application built using the click command line interface library. Below we create a simple click app and capture the required and optional argument values.
import click

from scitrack import CachingLogger

LOGGER = CachingLogger()


@click.command()
@click.option("-i", "--infile", type=click.Path(exists=True))
@click.option("-t", "--test", is_flag=True, help="Run test.")
def main(infile, test):
    # capture the local variables, at this point just provided arguments
    LOGGER.log_args()
    LOGGER.log_versions("numpy")
    LOGGER.input_file(infile)
    LOGGER.log_file_path = "some_path.log"


if __name__ == "__main__":
    main()
The CachingLogger.log_message() method takes a message and a label. All other logging methods wrap log_message(), providing a specific label. For instance, the method input_file() writes out two lines in the log.
input_file_path, the absolute path to the input file
input_file_path md5sum, the hex digest of the file
output_file() behaves analogously. An additional method, text_data(), is useful for other data input/output sources (e.g. records from a database). For this to be of value with arbitrary data types, the conversion to text must be done systematically so that it is robust across platforms.
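One possible way to make the text conversion systematic is to serialise each record in a canonical form before computing the checksum. The sketch below uses only the standard library. The names stable_text and record_digest are hypothetical, and the choice of JSON with sorted keys is an assumption for the example, not something scitrack prescribes.

```python
import hashlib
import json


def stable_text(record):
    """Serialise a record to text in a canonical form (hypothetical helper).

    Sorted keys, fixed separators and ASCII-only output make the text
    independent of dict ordering.
    """
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def record_digest(record):
    """Return the md5 hex digest of the canonical text of a record."""
    return hashlib.md5(stable_text(record).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # the same record with a different key order gives the same digest
    print(record_digest({"a": 1, "b": 2}) == record_digest({"b": 2, "a": 1}))
```

The resulting text could then be passed to a method such as text_data(), so that the logged checksum does not depend on incidental details like key order.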
The log_args() method captures all local variables within a scope.
The log_versions() method captures the version of the current file and the versions of a list of named packages, e.g. LOGGER.log_versions(['numpy', 'sklearn']).
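A call that takes no arguments can still see its caller's local variables by inspecting the calling frame. The sketch below shows that general technique with the standard library inspect module. capture_caller_locals is a hypothetical helper, and this is not necessarily how scitrack implements log_args().

```python
import inspect


def capture_caller_locals():
    """Return a copy of the local variables of the calling function."""
    frame = inspect.currentframe().f_back  # frame of whoever called us
    try:
        return dict(frame.f_locals)
    finally:
        # drop the frame reference to avoid reference cycles
        del frame


def main(infile, test=False):
    # called first thing, so only the arguments (with defaults) are present
    return capture_caller_locals()


if __name__ == "__main__":
    print(main("data.fasta"))
```

Because the capture happens at the top of the function, the result contains both the supplied arguments and any defaults that were not overridden.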
Some sample output
2020-05-25 13:32:07 Eratosthenes:98447 INFO system_details : system=Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64
2020-05-25 13:32:07 Eratosthenes:98447 INFO python : 3.8.2
2020-05-25 13:32:07 Eratosthenes:98447 INFO user : gavin
2020-05-25 13:32:07 Eratosthenes:98447 INFO command_string : ./demo.py -i /Users/gavin/repos/SciTrack/tests/sample-lf.fasta
2020-05-25 13:32:07 Eratosthenes:98447 INFO params : {'infile': '/Users/gavin/repos/SciTrack/tests/sample-lf.fasta', 'test': False}
2020-05-25 13:32:07 Eratosthenes:98447 INFO version : __main__==None
2020-05-25 13:32:07 Eratosthenes:98447 INFO version : numpy==1.18.4
2020-05-25 13:32:07 Eratosthenes:98447 INFO input_file_path : /Users/gavin/repos/SciTrack/tests/sample-lf.fasta
2020-05-25 13:32:07 Eratosthenes:98447 INFO input_file_path md5sum : 96eb2c2632bae19eb65ea9224aaafdad
Other useful functions
Two other useful functions are get_file_hexdigest and get_text_hexdigest.
Reporting issues
Use the project issue tracker.
For Developers
We use flit for package building. After cloning the repository onto your machine, install flit:
$ python3 -m pip install flit
Do a developer install of scitrack using flit as:
$ cd path/to/cloned/repo
$ flit install -s --python `which python`