long_task_printer

Measure iterations in a long-running task, provide a time estimation

Project description

This is a Python rewrite of my Ruby tool progressor, so some of the code and examples might not feel pythonic. It may change over time as I use it more.

Basic example

Here's an example long-running task:

for product in fetch_products():
    if product.not_something_we_want_to_process():
        continue

    product.calculate_interesting_stats()

In order to understand how it's progressing, we might add some print statements:

for product in fetch_products():
    if product.not_something_we_want_to_process():
        print(f"Skipping product: {product.id}")
        continue

    print(f"Working on product: {product.id}")
    product.calculate_interesting_stats()

This gives us some indication of progress, but no idea how much time is left. We could take a count and maintain a manual index, and then eyeball it based on how fast the numbers are adding up. LongTask automates that process:

long_task = LongTask(total_count=Product.count())

for product in fetch_products():
    if product.not_something_we_want_to_process():
        long_task.skip(1)
        continue

    with long_task.measure() as progress:
        print(f"[{progress}] Working on product: {product.id}")
        product.calculate_interesting_stats()

Each invocation of measure times its block and records the result. The progress value bound by the with statement is an object that can be converted to a string to provide progress information.
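To make the idea of "measuring a block" concrete, here is a minimal sketch of a timing context manager. It is not the library's code: TimingSketch and its durations list are hypothetical names, and recording the time even when the block raises is a choice made for the sketch only.

```python
import time
from contextlib import contextmanager


class TimingSketch:
    """Hypothetical stand-in, not the library's LongTask."""

    def __init__(self):
        self.durations = []  # one entry per measured block, in seconds

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            # The yielded value is what `with ... as progress` binds.
            yield self
        finally:
            # Record the elapsed time even if the block raises.
            self.durations.append(time.perf_counter() - start)

    def __str__(self):
        return f"{len(self.durations)} measured"


task = TimingSketch()
for _ in range(3):
    with task.measure() as progress:
        time.sleep(0.01)
```

After the loop, task.durations holds three elapsed times, one per with block.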

The output might look like this:

...
[0038/1000, (004%), t/i: 0.5s, ETA: 8m:00s] Product 38
[0039/1000, (004%), t/i: 0.5s, ETA: 7m:58s] Product 39
[0040/1000, (004%), t/i: 0.5s, ETA: 7m:57s] Product 40
...

You can check the documentation for the LongTask class for details on the methods you can call to get the individual pieces of data shown in the report.

Limited and unlimited sequences

Initializing a LongTask with a provided total_count= parameter gives you a limited sequence, which can give you not only a progress report, but an estimation of when it'll be done:

[<current loop>/<total count>, (<progress>%), t/i: <time per iteration>, ETA: <time until it's done>]
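As a rough illustration of how the pieces of that line fit together, here is a sketch of a formatting helper. format_limited is a hypothetical function, not part of the library, and the rounding and zero-padding rules are guesses based on the sample output above.

```python
def format_limited(current, total, seconds_per_iteration):
    """Build a report line in the shape shown above (hypothetical helper)."""
    # Assumption: the percentage is rounded to the nearest whole number.
    percent = round(current * 100 / total)
    # ETA is the remaining count multiplied by the time per iteration.
    remaining_seconds = round((total - current) * seconds_per_iteration)
    minutes, seconds = divmod(remaining_seconds, 60)
    # Pad the current count to the width of the total count.
    width = len(str(total))
    return (
        f"[{current:0{width}d}/{total}, ({percent:03d}%), "
        f"t/i: {seconds_per_iteration:.1f}s, ETA: {minutes}m:{seconds:02d}s]"
    )


print(format_limited(40, 1000, 0.5))
# [0040/1000, (004%), t/i: 0.5s, ETA: 8m:00s]
```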

The calculation is done by maintaining a list of measurements with a limited size, and a list of averages of those measurements. The average of averages is the "time per iteration" and it's multiplied by the remaining count to produce the estimation.
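The description above can be sketched roughly like this. EtaSketch and its method names are hypothetical, and details such as bounding both lists to the same size are assumptions, so treat it as an illustration of the average-of-averages idea rather than the actual implementation.

```python
from collections import deque


class EtaSketch:
    """Hypothetical estimator; names and details are assumptions."""

    def __init__(self, total_count, max_samples=100):
        self.total_count = total_count
        self.current = 0
        # Both lists are bounded: the oldest entries drop off when full.
        self.measurements = deque(maxlen=max_samples)
        self.averages = deque(maxlen=max_samples)

    def push(self, duration):
        """Record one iteration's duration and the new running average."""
        self.current += 1
        self.measurements.append(duration)
        self.averages.append(sum(self.measurements) / len(self.measurements))

    def per_iteration(self):
        """Average of averages: the "time per iteration"."""
        return sum(self.averages) / len(self.averages)

    def eta(self):
        """Remaining count multiplied by the time per iteration."""
        return (self.total_count - self.current) * self.per_iteration()


est = EtaSketch(total_count=10)
for duration in [1.0, 3.0]:
    est.push(duration)

print(est.per_iteration())  # 1.5
print(est.eta())            # 12.0
```

With durations of 1.0 and 3.0 the stored averages are 1.0 and 2.0, so the time per iteration is 1.5 and the 8 remaining iterations give an estimate of 12 seconds.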

I can't really say how reliable this is, but it seems to provide smoothly changing estimations that seem more or less correct to me, for similarly-sized chunks of work per iteration.

Not providing a total_count= parameter leads to less available information:

import random
import time

long_task = LongTask()

for _ in range(100):
    with long_task.measure() as progress:
        print(progress)
        time.sleep(random.random())

A sample of output might look like this:

...
11, t: 5.32s, t/i: 442.39ms
12, t: 5.58s, t/i: 446.11ms
...

The format is:

<current>, t: <time from start>, t/i: <time per iteration>

Configuration

Apart from total_count, which is optional and affects the kind of sequence that will be stored, you can provide min_samples and max_samples. You can also provide a custom formatter:

long_task = LongTask(
    total_count=1000,
    min_samples=5,
    max_samples=10,
    formatter=lambda p: p.eta()
)

The option min_samples determines how many loops the tool will wait before trying to produce an estimation. A higher number means no information in the beginning, but no wild fluctuations, either. It needs to be at least 1 and the default is 1.

The option max_samples is how many measurements will be retained. Those measurements will be averaged, and then those averages averaged to get a time-per-iteration estimate. A smaller number gives more weight to later events, while a larger one averages over a larger number of samples. The default is 100.
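To get a feel for the two options, here is a simplified sketch. rolling_estimates is a hypothetical helper, and it uses a plain mean over the retained window instead of the average of averages, so the exact numbers will differ from what the library reports.

```python
from collections import deque


def rolling_estimates(durations, min_samples=1, max_samples=100):
    """Return the windowed mean after each duration, or None while warming up."""
    window = deque(maxlen=max_samples)  # oldest samples fall off automatically
    estimates = []
    for duration in durations:
        window.append(duration)
        if len(window) < min_samples:
            estimates.append(None)  # not enough samples for an estimate yet
        else:
            estimates.append(sum(window) / len(window))
    return estimates


# Four fast iterations followed by two slow ones.
durations = [1.0, 1.0, 1.0, 1.0, 4.0, 4.0]

print(rolling_estimates(durations, min_samples=3))
# [None, None, 1.0, 1.0, 1.6, 2.0]
print(rolling_estimates(durations, max_samples=2))
# [1.0, 1.0, 1.0, 1.0, 2.5, 4.0]
```

The first call stays silent for two loops and then moves slowly towards the new speed. The second, with a window of only two samples, catches up with the slowdown almost immediately.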

The formatter is a callback that receives a progress object as its argument and returns the string to output on every loop. Check LimitedSequence and UnlimitedSequence for the available methods and accessors you can use.
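A formatter can be any callable, not just a lambda. The sketch below runs without the library installed: FakeProgress is a hypothetical stand-in for the real progress object, and its current and total_count attributes, as well as eta() returning seconds, are assumptions made for illustration. Check the sequence classes for the real accessors.

```python
from dataclasses import dataclass


@dataclass
class FakeProgress:
    """Hypothetical stand-in for the progress object passed to a formatter."""

    current: int
    total_count: int
    seconds_left: float

    def eta(self):
        # Assumption: the estimate is a number of seconds.
        return self.seconds_left


def log_friendly(progress):
    """Formatter callback: takes a progress object, returns a string."""
    return (
        f"{progress.current}/{progress.total_count} done, "
        f"~{progress.eta():.0f}s left"
    )


line = log_friendly(FakeProgress(current=40, total_count=1000, seconds_left=480.0))
print(line)  # 40/1000 done, ~480s left
```

A function like log_friendly would be passed as formatter=log_friendly in place of the lambda in the configuration example.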

Related work

This project is based on my Ruby tool progressor. In terms of other Python libraries, there seem to be a few:

ProgressPrinter shows an animated progress bar, which likely means it moves the cursor back to the top when outputting its progress. This project just gives you a string to print, which could sit in logs, in between other output, etc.
longtask gives you a different interface where you create a separate class to encapsulate your long-running task.
progressor uses scikit to estimate the remaining time and shows an animated progress bar.

Release history

0.1.2: Dec 12, 2024
0.1.1: Dec 12, 2024
0.1.0: Aug 11, 2024 (this version)
