A utility that wraps an Iterable and regularly prints out progress on the processing of that Iterable

progress_tracker is an easy and flexible way to print custom progress messages while processing streams of events on the CLI.

It was originally developed at exactEarth Ltd. See this presentation to DevHouse Waterloo for the original motivation.

Built and tested with Python 3.6+

Quick Start

% pip install progress_tracker
>>> from progress_tracker import track_progress
>>> for _ in track_progress(list(range(1000)), every_n_records=100):
...     continue
...
100/1000 (10.0%) in 0:00:00.000114 (Time left: 0:00:00.001026)
200/1000 (20.0%) in 0:00:00.000274 (Time left: 0:00:00.001096)
300/1000 (30.0%) in 0:00:00.000374 (Time left: 0:00:00.000873)
400/1000 (40.0%) in 0:00:00.000473 (Time left: 0:00:00.000710)
500/1000 (50.0%) in 0:00:00.000572 (Time left: 0:00:00.000572)
600/1000 (60.0%) in 0:00:00.000671 (Time left: 0:00:00.000447)
700/1000 (70.0%) in 0:00:00.000770 (Time left: 0:00:00.000330)
800/1000 (80.0%) in 0:00:00.000868 (Time left: 0:00:00.000217)
900/1000 (90.0%) in 0:00:00.000979 (Time left: 0:00:00.000109)
1000 in 0:00:00.001086
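The "Time left" figures above are consistent with a simple linear extrapolation from the time taken so far. The sketch below reproduces that arithmetic; the helper name estimate_time_remaining is hypothetical and is not part of progress_tracker, and the guard for zero records is an assumption added to avoid dividing by zero.

```python
from datetime import timedelta
from typing import Optional


def estimate_time_remaining(
    time_taken: timedelta, records_seen: int, total: Optional[int]
) -> Optional[timedelta]:
    """Hypothetical helper: linear estimate of the time still needed."""
    if total is None or records_seen == 0:
        # Without a known total (or any progress) there is nothing to extrapolate from.
        return None
    # Average time per record so far, multiplied by the records still to come.
    return time_taken * (total - records_seen) / records_seen


# First line of the output above: 100/1000 records in 0:00:00.000114
print(estimate_time_remaining(timedelta(microseconds=114), 100, 1000))
# prints 0:00:00.001026
```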

Usage

progress_tracker is highly customizable, but ships with sensible defaults.

The core of progress_tracker is a function called track_progress. By changing the parameters passed to track_progress, you can customize how frequently (and with what messages) the tracker reports.

def track_progress(
    iterable: Iterable[T], # The iterable to iterate over
    total: Optional[int] = None, # Override for the total message count, defaults to len(iterable)
    callback: Callable[[str], Any] = print, # A function (f(str) -> None) that gets called each time a condition matches
    format_callback: Callable[[Dict[str, Any], Set[str]], str] = default_format_callback, # A function (f(report, reasons) -> str) that formats the progress values into a string.
    every_n_percent: Optional[float] = None, # Reports after every n percent
    every_n_records: Optional[int] = None, # Reports every n records
    every_n_seconds: Optional[float] = None, # Reports every n seconds
    every_n_seconds_idle: Optional[float] = None, # Report if there has not been a record processed in the past n seconds. Useful for infinite streams.
    every_n_seconds_since_report: Optional[float] = None, # Report if there hasn’t been any report in the past n seconds.
    report_first_record: bool = False, # Report after the first record
    report_last_record: bool = False # Report after the last record
    ) -> None

Examples

Combining trigger conditions

You can combine multiple conditions to dictate when a report is created.

The conditions are combined with a logical OR: if any condition is met, a report is created.

Even if multiple conditions are met simultaneously, only a single report will be created.
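The OR behaviour can be illustrated with a toy stand-in. This is a minimal sketch, not the library's implementation: sketch_track and the reason strings are hypothetical, and only two of the trigger conditions are modelled.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set, TypeVar

T = TypeVar("T")


def sketch_track(
    iterable: Iterable[T],
    every_n_records: Optional[int] = None,
    report_first_record: bool = False,
    callback: Callable[[str], None] = print,
) -> Iterator[T]:
    """Toy stand-in for track_progress: ORs two triggers, one report per record."""
    for seen, item in enumerate(iterable, start=1):
        yield item  # the caller processes the record here
        reasons: Set[str] = set()  # hypothetical reason names
        if report_first_record and seen == 1:
            reasons.add("report_first_record")
        if every_n_records is not None and seen % every_n_records == 0:
            reasons.add("every_n_records")
        if reasons:
            # Any matching condition triggers exactly one report for this record.
            callback(f"{seen} records ({', '.join(sorted(reasons))})")


messages: List[str] = []
for _ in sketch_track(range(4), every_n_records=1, report_first_record=True,
                      callback=messages.append):
    pass
print(len(messages))  # prints 4: the first record matched both conditions but reported once
```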

Report Creation Invariants

Report creation observes two invariants:

  1. At most a single report is created per processed record.
  2. Reports are only created in response to a record being processed.

Customizing the report formatting / Internationalization

By default, progress_tracker formats the report into an English-language string. This can be overridden by supplying a different function as the format_callback parameter to track_progress.

This can be used to perform advanced formatting, or to add internationalization/localization.

def format_en_francais(report: Dict[str, Any], reasons: Set[str]):
    records_seen = report["records_seen"]
    total = report["total"]
    if total is None or records_seen == total:
        format_string = "{records_seen} messages traités en {time_taken}"
    else:
        format_string = "{records_seen}/{total} messages traités en {time_taken} (temps restant: {estimated_time_remaining})"
    return format_string.format(**report)

for poste in track_progress(postes, every_n_records=100, format_callback=format_en_francais):
    traité(poste)

(Veuillez excuser toute erreur en français. C’est le résultat de Google Translate.)

Simple cases can also be handled with a lambda that accepts the report and the reasons:

>>> from progress_tracker import track_progress
>>>
>>> for _ in track_progress(list(range(5)), every_n_records=1, format_callback=lambda report, reasons: "Got one!"):
...     continue
...
Got one!
Got one!
Got one!
Got one!
Got one!

Report values available

The following values are available in every report for use in the format_callback:

Value | Type | Meaning
{records_seen} | int | The number of records processed so far.
{total} | Optional[int] | The total number of records in the iterable, if known; otherwise None.
{percent_complete} | Optional[float] | The percentage of records processed so far. None if {total} is None or {records_seen} is 0.
{time_taken} | timedelta | The amount of time that processing has taken thus far.
{estimated_time_remaining} | Optional[timedelta] | The estimated amount of time needed to process the rest of the records (simple linear estimate). None if {total} is None.
{items_per_second} | Optional[float] | The number of records processed so far divided by the number of seconds elapsed. None if no time has elapsed.
{idle_time} | timedelta | The amount of idle time between the previous record’s processing and this record’s arrival.
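Because several of these values can be None, a format_callback has to guard against them. The following is a minimal sketch; format_with_rate and the hand-built sample report are hypothetical, and the sample assumes percent_complete is expressed on a 0 to 100 scale.

```python
from datetime import timedelta
from typing import Any, Dict, Set


def format_with_rate(report: Dict[str, Any], reasons: Set[str]) -> str:
    """Hypothetical format_callback that skips any value that is None."""
    parts = [f"{report['records_seen']} records"]
    if report["percent_complete"] is not None:
        parts.append(f"{report['percent_complete']:.1f}%")
    if report["items_per_second"] is not None:
        parts.append(f"{report['items_per_second']:.0f} records/s")
    if report["estimated_time_remaining"] is not None:
        parts.append(f"about {report['estimated_time_remaining']} left")
    return ", ".join(parts)


# Hand-built report for illustration only; real reports come from the tracker.
sample = {
    "records_seen": 250,
    "total": 1000,
    "percent_complete": 25.0,
    "time_taken": timedelta(seconds=5),
    "estimated_time_remaining": timedelta(seconds=15),
    "items_per_second": 50.0,
    "idle_time": timedelta(0),
}
print(format_with_rate(sample, set()))
# prints 250 records, 25.0%, 50 records/s, about 0:00:15 left
```

Such a function would be passed as format_callback=format_with_rate in the call to track_progress.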

Customizing the print behaviour

By default, progress_tracker calls Python’s print function with the formatted report. This can be overridden by supplying a different function as the callback parameter to track_progress.
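For instance, reports could be routed to the logging module instead of standard output. This is a minimal sketch: the logger name and the log_progress helper are hypothetical, and the commented usage assumes an iterable named records.

```python
import logging

logger = logging.getLogger("progress")  # hypothetical logger name


def log_progress(message: str) -> None:
    """Forward a formatted report to the logging system instead of stdout."""
    logger.info(message)


# Hypothetical usage (requires progress_tracker and an iterable named `records`):
# for record in track_progress(records, every_n_records=100, callback=log_progress):
#     ...
```

Any callable that accepts a single string works the same way, for example the append method of a list when reports should be collected rather than displayed.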

every_n_seconds_idle

every_n_seconds_idle allows you to trigger a report if there is ever more than n seconds when no records were processed.

Note: If processing of a single record takes longer than every_n_seconds_idle, then it will be triggered after every record.

Difference between every_n_seconds and every_n_seconds_idle

  • every_n_seconds triggers a report any time at least n seconds have passed since every_n_seconds last triggered a report.
  • every_n_seconds_idle triggers a report any time no record has been processed in the past n seconds (i.e. the processing has been idle).

For example:

After | Records processed in interval | Cumulative records processed | every_n_seconds=3 | every_n_seconds_idle=3
0 seconds | 0 | 0 | |
1 second | 1 | 1 | |
2 seconds | 1 | 2 | |
3 seconds | 1 | 3 | Triggered, since it is the first record with T >= 3s (T >= 0s + 3s) |
4 seconds | 1 | 4 | |
5 seconds | 1 | 5 | |
6 seconds | 1 | 6 | Triggered, since it is the first record with T >= 6s (T >= 3s + 3s) |
7 seconds | 0 | 6 | |
8 seconds | 0 | 6 | |
9 seconds | 0 | 6 | |
10 seconds | 0 | 6 | |
11 seconds | 1 | 7 | Triggered, since it is the first record with T >= 9s (T >= 6s + 3s) | Triggered, since no record was processed in the previous 3 seconds (T >= 6s + 3s)
12 seconds | 1 | 8 | |
13 seconds | 1 | 9 | |
14 seconds | 1 | 10 | Triggered, since it is the first record with T >= 14s (T >= 11s + 3s) |
15 seconds | 1 | 11 | |

Note that every_n_seconds reports at 3 seconds and 6 seconds, as one would expect. It then reports at 11 seconds, since that is the first time a record was processed after the 9-second mark. Its next report comes at 14 seconds (11s + 3s), not at 12 seconds (9s + 3s), because the interval is measured from the last report.

every_n_seconds_idle only reported at 11 seconds, since that was the only time that a record was processed without other records being processed during the previous 3 seconds.
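The two behaviours can be replayed with a small simulation. This is a sketch of the semantics described above, not the library's code: simulate_triggers is hypothetical, and it assumes one record arrives at each of seconds 1 to 6 and 11 to 15, with both timers starting at 0 seconds.

```python
from typing import List, Tuple


def simulate_triggers(arrivals: List[int], n: int) -> Tuple[List[int], List[int]]:
    """Return the arrival times at which each condition would trigger."""
    last_periodic_report = 0  # when every_n_seconds last triggered
    previous_record = 0       # when the previous record was processed
    periodic: List[int] = []
    idle: List[int] = []
    for t in arrivals:
        # every_n_seconds: measured from its own last report.
        if t - last_periodic_report >= n:
            periodic.append(t)
            last_periodic_report = t
        # every_n_seconds_idle: measured from the previous record.
        if t - previous_record >= n:
            idle.append(t)
        previous_record = t
    return periodic, idle


arrivals = [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15]
print(simulate_triggers(arrivals, 3))
# prints ([3, 6, 11, 14], [11])
```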

Accessing tracker after processing

By default, track_progress hides the underlying ProgressTracker object. However, in some cases you might want to access the internals of the object after iteration. In these cases, you can use track_progress as an explicit context manager:

with track_progress(range(0, 101), every_n_percent=5) as tracker:
    for item in tracker:
        process(item)
    final_report = tracker.create_report()
    print(f"Processing took {final_report['time_taken']} and processed {final_report['records_seen']} records.")
