Simple and interactive multi-processing for Python and notebooks

BusyBee 🐝

  • Simple, interactive multiprocessing for slow I/O and calculation in your notebook
  • Simple to use as a drop-in replacement for the standard map function
  • Prints the current progress and remaining time estimate
  • No external dependencies and 100% test coverage

The PyPI project page is here: https://pypi.org/project/busybee/

Quick start

Install the BusyBee module via pip and use it as a replacement for your current map function. Because BusyBee needs to know the total number of items, the data must expose its length to len() calls. The simplest approach is to pass it as a list.
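To illustrate the length requirement: a generator does not support len(), while a list built from it does. The sketch below uses only the standard library; squares and has_length are hypothetical helpers for this example and are not part of BusyBee.

```python
def squares(limit):
    """Generator: yields values lazily and therefore has no length."""
    for i in range(limit):
        yield i * i


def has_length(data):
    """Return True if len(data) works, False otherwise."""
    try:
        len(data)
        return True
    except TypeError:
        return False


lazy = squares(5)                # generator object, len() raises TypeError
materialized = list(squares(5))  # list, len() works

print(has_length(lazy))
print(has_length(materialized))
```

If your data comes from a generator or another lazy iterable, wrapping it in list(...) before passing it on is usually enough.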

Install:

$ pip3 install busybee

Code:

import busybee
result = busybee.map(func, data)

Output:

BusyBee: Start processing 42 items with 8 processes...
BusyBee:  1/42,  2.4% (avg: 3.2s cpu, rem: 16.5s)
BusyBee: 15/42, 35.7% (avg: 2.4s cpu, rem: 8.1s)
BusyBee: 21/42, 50.0% (avg: 2.5s cpu, rem: 6.5s)
BusyBee: 24/42, 57.1% (avg: 2.6s cpu, rem: 5.8s)
BusyBee: 34/42, 81.0% (avg: 2.5s cpu, rem: 2.5s)
BusyBee: Finished processing 42 items in 16.1s (avg: 2.6s cpu)
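The rem values in the sample output are approximately consistent with a simple estimate: the number of remaining items times the average CPU time per item, divided by the number of processes. The sketch below shows that arithmetic. It is an assumption that happens to match the numbers above, not a confirmed description of what BusyBee computes internally, and estimate_remaining is a hypothetical helper.

```python
def estimate_remaining(total, done, avg_cpu_seconds, processes):
    """Estimate wall-clock seconds left, assuming work is spread evenly.

    Assumed formula (not taken from the BusyBee source):
        remaining items * average CPU seconds per item / processes
    """
    remaining_items = total - done
    return remaining_items * avg_cpu_seconds / processes


# Values from the "15/42" line of the sample output: avg 2.4s cpu, 8 processes.
print(round(estimate_remaining(42, 15, 2.4, 8), 1))
```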

Advanced usage 👩‍💻 👨‍💻

You can configure the number of cores to use with the processes argument. You can provide either a number (e.g. 1, 8) or a simple formula such as n/2 or n-1. Here n refers to the number of logical CPU cores reported by the multiprocessing module.
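As a rough illustration of how such a formula could be turned into a process count, here is a minimal sketch. resolve_processes is a hypothetical helper written for this example; the accepted syntax, the integer division, and the clamping to at least one process are all assumptions, not BusyBee's actual parsing rules.

```python
import multiprocessing
import re


def resolve_processes(spec, n=None):
    """Turn a number or a formula like 'n/2' or 'n-1' into a process count.

    n defaults to the logical core count from the multiprocessing module.
    """
    if n is None:
        n = multiprocessing.cpu_count()
    if isinstance(spec, int):
        count = spec
    elif spec.strip().isdigit():
        count = int(spec)
    else:
        # Accept 'n', 'n/<int>' or 'n-<int>' (assumed syntax).
        match = re.fullmatch(r"\s*n\s*(?:([/-])\s*(\d+))?\s*", spec)
        if match is None:
            raise ValueError(f"unsupported processes spec: {spec!r}")
        operator, operand = match.groups()
        if operator is None:
            count = n
        elif operator == "/":
            count = n // int(operand)
        else:
            count = n - int(operand)
    # Never go below one worker (assumption).
    return max(1, count)


print(resolve_processes("n-1", n=8))
print(resolve_processes("n/2", n=8))
```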

Further, you can configure the output by providing a custom stdout sink and configuring how often you want to receive an update. You can do so by using the arguments update_every_n_seconds (default: 10) and update_every_n_percent (default: 50).

If you do not want any output, just set quiet=True.

Example:

import busybee
result = busybee.map(
    func, data,
    processes='n-1',
    update_every_n_seconds=10,
    update_every_n_percent=25,
)
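One plausible reading of the two update_every_* arguments is that a progress line is printed as soon as either threshold is crossed since the last update. The sketch below shows that logic. How BusyBee actually combines the two settings is an assumption here, and should_report is a hypothetical helper; only the argument names and defaults come from the text above.

```python
def should_report(seconds_since_update, percent_since_update,
                  update_every_n_seconds=10, update_every_n_percent=50):
    """Return True if a new progress line is due.

    Assumed rule: report when either the time threshold or the
    percentage threshold has been reached since the last update.
    """
    return (seconds_since_update >= update_every_n_seconds
            or percent_since_update >= update_every_n_percent)


# 3 seconds and 30% since the last update, with the 25% setting from the example.
print(should_report(3, 30, update_every_n_seconds=10, update_every_n_percent=25))
```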

Q&A 🤔

Why did you build it? And why shouldn't I just use the multiprocessing module?

I started building BusyBee when I was working with a lot of I/O and pre-processing in Python Notebooks. Parallelizing these cells made it much faster, but it was often more involved than a one-line change. For instance, the worker pool needs manual clean-up and leaks a substantial amount of memory otherwise.

Also, it was hard to predict the remaining time, and so hard to decide whether to stay put and avoid a context switch or to go and make some tea/coffee ☕.

Is there more than the map(func, data, ...) function?

Yes! While the map function is the most universal, there are more: The filter(func, data, ...) function works similarly to the regular filter function applied to lists. The mk_dict(func, keys, ...) function resembles the dictionary comprehension syntax {k: func(k) for k in keys}.
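For reference, these are the sequential, standard-library expressions that the three functions are described as resembling. The sequential_* names are hypothetical and exist only for this sketch. It is an assumption that the BusyBee variants return the same result types (in particular, that filter returns a list).

```python
def sequential_map(func, data):
    """Sequential counterpart of map(func, data, ...)."""
    return [func(x) for x in data]


def sequential_filter(func, data):
    """Sequential counterpart of filter(func, data, ...): keep items where func is truthy."""
    return [x for x in data if func(x)]


def sequential_mk_dict(func, keys):
    """Sequential counterpart of mk_dict(func, keys, ...)."""
    return {k: func(k) for k in keys}


def is_even(x):
    return x % 2 == 0


def square(x):
    return x * x


numbers = [1, 2, 3, 4]
print(sequential_map(square, numbers))
print(sequential_filter(is_even, numbers))
print(sequential_mk_dict(square, numbers))
```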

I want a different output!

I want to allow choosing from a set of predefined output styles. This is on my roadmap, but I do not have a specific date in mind. To keep things simple, I do not envision supporting custom output formatting. However, I am happy to be convinced otherwise.

Contribute 👋

Awesome that you are interested in improving this code! When contributing, please follow these (common-sense) steps:

  • Create an issue before you write any code. This allows us to guide you in the right direction.
    • If you are after a simple 1-5 line fix, you might ignore this.
  • In the pull-request explain the high-level goal and your approach. That provides valuable context.
  • Convince others (and yourself) that the change is safe and sound.
    • Run python3 -m unittest tests/test* after you have added test cases for your changes
    • Run coverage3 run --source busybee setup.py test && coverage3 report to ensure that the code is actually fully covered

Reference/BibTeX 📚

If you want to reference BusyBee in documentation or articles, feel free to use this suggested BibTeX snippet:

@misc{hugenroth2020busybee,
  author={{Daniel Hugenroth}},
  title={BusyBee Python Software Library},
  year={2020},
  url={https://github.com/lambdapioneer/busybee},
}
