Simple and interactive multi-processing for Python and notebooks
BusyBee 🐝
- Simple, interactive multiprocessing for slow I/O and calculation in your notebook
- Simple to use as a drop-in replacement for the standard map function
- Prints the current progress and remaining time estimate
- No external dependencies and 100% test coverage
The PyPI project page is here: https://pypi.org/project/busybee/
Quick start
Install the BusyBee module via pip and use it as a replacement for your current map function. As BusyBee needs to know the total number of items, the data must expose its length to len() calls. The simplest approach is to provide it as a list.
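The length requirement mostly matters for generators and other lazy iterables, which do not support len(). The following is a minimal sketch using only the standard library; ensure_sized is a hypothetical helper name chosen for illustration and is not part of BusyBee:

```python
def ensure_sized(data):
    """Return data unchanged if it supports len(), else materialise it as a list."""
    try:
        len(data)
        return data
    except TypeError:
        # Generators and similar lazy iterables end up here.
        return list(data)


lazy = (x * x for x in range(5))  # a generator has no len()
items = ensure_sized(lazy)
print(len(items))

# The sized result can then be handed over, e.g.:
# result = busybee.map(func, items)
```

Note that materialising a lazy iterable loads all items into memory at once, so this only makes sense when the input itself is reasonably small.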
Install:
$ pip3 install busybee
Code:
import busybee
result = busybee.map(func, data)
Output:
BusyBee: Start processing 42 items with 8 processes...
BusyBee: 1/42, 2.4% (avg: 3.2s cpu, rem: 16.5s)
BusyBee: 15/42, 35.7% (avg: 2.4s cpu, rem: 8.1s)
BusyBee: 21/42, 50.0% (avg: 2.5s cpu, rem: 6.5s)
BusyBee: 24/42, 57.1% (avg: 2.6s cpu, rem: 5.8s)
BusyBee: 34/42, 81.0% (avg: 2.5s cpu, rem: 2.5s)
BusyBee: Finished processing 42 items in 16.1s (avg: 2.6s cpu)
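The rem values in the sample output are consistent with a simple estimate: average CPU time per item, times the number of remaining items, divided by the number of processes. The sketch below reproduces that arithmetic. It is an assumption inferred from the numbers above, not a confirmed description of BusyBee's internals, and estimate_remaining is a hypothetical name:

```python
def estimate_remaining(avg_cpu_seconds, done, total, processes):
    """Estimate the remaining wall-clock seconds (assumed formula, see lead-in)."""
    remaining_items = total - done
    return avg_cpu_seconds * remaining_items / processes


# Line "15/42 ... (avg: 2.4s cpu, rem: 8.1s)" with 8 processes:
print(round(estimate_remaining(2.4, 15, 42, 8), 1))  # 8.1

# Line "34/42 ... (avg: 2.5s cpu, rem: 2.5s)" with 8 processes:
print(round(estimate_remaining(2.5, 34, 42, 8), 1))  # 2.5
```

Small deviations are expected because the printed averages are rounded: for the first line (1/42, avg: 3.2s) the same arithmetic yields 16.4s rather than the printed 16.5s.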
Advanced usage 👩‍💻 👨‍💻
You can configure the number of cores to use with the processes argument. Provide either a number (e.g. 1, 8) or a simple formula such as n/2 or n-1. Here n refers to the number of logical CPU cores reported by the multiprocessing module.
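To illustrate what such a formula means, here is a rough sketch of how a value like n/2 or n-1 could be turned into a process count. This is not BusyBee's actual parser: resolve_processes is a hypothetical name, and rounding down on division and clamping to at least one process are my assumptions.

```python
import multiprocessing
import re

# Accepts "n", "n-<int>" or "n/<int>" (optional spaces around the operator).
_FORMULA = re.compile(r"n\s*(?:([-/])\s*(\d+))?")


def resolve_processes(spec, n=None):
    """Resolve an int or a simple formula to a process count (illustrative only)."""
    if n is None:
        n = multiprocessing.cpu_count()  # logical CPU cores
    if isinstance(spec, int):
        count = spec
    else:
        match = _FORMULA.fullmatch(str(spec).strip())
        if match is None:
            raise ValueError(f"unsupported processes value: {spec!r}")
        operator, operand = match.groups()
        if operator is None:
            count = n
        elif operator == "-":
            count = n - int(operand)
        else:
            divisor = int(operand)
            if divisor == 0:
                raise ValueError("cannot divide the core count by zero")
            count = n // divisor  # assumption: round down
    return max(1, count)  # assumption: never go below one process


print(resolve_processes("n-1", n=8))  # 7
print(resolve_processes("n/2", n=8))  # 4
```

Passing n explicitly keeps the example reproducible; leaving it out falls back to multiprocessing.cpu_count().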
Further, you can configure the output by providing a custom stdout sink and configuring how often you want to receive an update. You can do so by using the arguments update_every_n_seconds (default: 10) and update_every_n_percent (default: 50).
If you do not want any output, just set quiet=True.
Example:
import busybee
result = busybee.map(
    func, data,
    processes='n-1',
    update_every_n_seconds=10,
    update_every_n_percent=25,
)
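How the two update settings interact is not spelled out here. One plausible reading is that a progress line is printed as soon as either enough time has passed or enough additional progress has been made since the last line. The sketch below implements exactly that reading; it is an assumption, not BusyBee's confirmed behaviour, and should_report is a hypothetical name:

```python
def should_report(now, last_time, done, last_done, total,
                  update_every_n_seconds=10, update_every_n_percent=50):
    """Decide whether to print a progress line (assumed 'either threshold' rule)."""
    seconds_elapsed = now - last_time
    percent_progress = 100.0 * (done - last_done) / total
    return (seconds_elapsed >= update_every_n_seconds
            or percent_progress >= update_every_n_percent)


# 5 s and 10% since the last line: neither threshold reached.
print(should_report(now=5.0, last_time=0.0, done=10, last_done=0, total=100))   # False

# 12 s since the last line: the time threshold triggers an update.
print(should_report(now=12.0, last_time=0.0, done=10, last_done=0, total=100))  # True
```

Under this reading, lowering update_every_n_percent to 25 (as in the example above) produces more frequent lines for fast workloads, while the seconds setting bounds the silence for slow ones.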
Q&A 🤔
Why did you build it? And why shouldn't I just use the multiprocessing module?
I started building BusyBee when I was working with a lot of I/O and pre-processing in Python Notebooks. Parallelizing these cells made it much faster, but it was often more involved than a one-line change. For instance, the worker pool needs manual clean-up and leaks a substantial amount of memory otherwise.
Also, it was hard to predict the remaining time and to decide whether it was worth waiting and avoiding a context switch, or whether there was time to make some tea/coffee ☕.
Is there more than the map(func, data, ...) function?
Yes! While the map function is the most universal, there are more: the filter(func, data, ...) function works similarly to the regular filter function applied to lists. The mk_dict(func, keys, ...) function resembles the dictionary comprehension syntax {k: func(k) for k in keys}.
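For reference, these are the sequential standard-library counterparts that the two functions are described as mirroring. The BusyBee calls are shown only as comments; that they return a list and a dict respectively is my assumption based on the description above. is_even and square are illustrative names.

```python
def is_even(x):
    return x % 2 == 0


def square(x):
    return x * x


data = [1, 2, 3, 4, 5, 6]

# Sequential counterpart of: busybee.filter(is_even, data)
evens = list(filter(is_even, data))

# Sequential counterpart of: busybee.mk_dict(square, data)
squares = {k: square(k) for k in data}

print(evens)       # [2, 4, 6]
print(squares[4])  # 16
```

The helper functions are defined at module level rather than as lambdas because multiprocessing-based code typically needs to pickle the callable it sends to worker processes.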
I want a different output!
I would like to offer a choice between a few output styles. This is on my roadmap, but I do not have a specific date in mind. To keep things simple, I do not envision supporting custom output formatting. However, I am happy to be convinced otherwise.
Contribute 👋
Awesome that you are interested in improving this code! When contributing, please follow these (common-sense) steps:
- Create an issue before you write any code. This allows us to guide you in the right direction.
- If you are after a simple 1-5 line fix, you may skip the issue.
- In the pull-request explain the high-level goal and your approach. That provides valuable context.
- Convince others (and yourself) that the change is safe and sound.
- Run python3 -m unittest tests/test* after you have added test cases for your changes
- Run coverage3 run --source busybee setup.py test && coverage3 report to ensure that the code is actually fully covered
- Run
Reference/BibTex 📚
If you want to reference BusyBee in documentation or articles, feel free to use this suggested BibTex snippet:
@misc{hugenroth2020busybee,
author={{Daniel Hugenroth}},
title={BusyBee Python Software Library},
year={2020},
url={https://github.com/lambdapioneer/busybee},
}