Chunk-based, multiprocess processing of iterables.

multiprocess_chunks

Chunk-based, multiprocess processing of iterables. Uses the multiprocess package to run the work across processes and the cloudpickle package to pickle hard-to-pickle objects.

Why is this useful?

When you use Python's built-in multiprocessing.Pool.map method, the items being iterated are pickled individually. This can lead to a lot of pickling, which can hurt performance. The cost is particularly high, and not necessarily obvious, when extra data is passed into the mapped function f via a lambda. For example:

from multiprocessing import Pool
d = {...} # a large dict of some sort
p = Pool() # the pool whose map method is being discussed
p.map(lambda x: x + d[x], [1, 2, 3, ...])

In this case both x and d are pickled, individually, for every item in [1, 2, 3, ...].

The methods in this package divide the [1, 2, 3, ...] list into chunks and pickle each chunk and d a small number of times.
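To make the difference concrete, here is a minimal sketch that counts how many bytes would be pickled under each approach. It uses only the standard pickle module and models each task as a tuple of (item or chunk, d); that tuple is an assumption made for illustration, not how Pool or this package actually packages its tasks, and the dict contents and chunk size are made up.

```python
import pickle

# Stand-in for the large dict `d` from the example above (contents are made up).
d = {i: i * 2 for i in range(10_000)}
items = list(range(100))

# Per-item: every task carries its own pickled copy of `d`.
per_item_bytes = sum(len(pickle.dumps((x, d))) for x in items)

# Per-chunk: split the items into 4 chunks, each carrying one copy of `d`.
chunk_size = 25
chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
per_chunk_bytes = sum(len(pickle.dumps((chunk, d))) for chunk in chunks)

print(f"per item:  {per_item_bytes:,} bytes pickled")
print(f"per chunk: {per_chunk_bytes:,} bytes pickled")
```

With 100 items and 4 chunks, d is serialized 100 times in the first case and 4 times in the second, so the per-chunk total is much smaller.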

Installation

pip install multiprocess-chunks

Usage

There are two methods to choose from: map_list_as_chunks and map_list_in_chunks.

map_list_as_chunks

This method divides the iterable that is passed to it into chunks. The chunks are then processed in parallel across multiple processes. It returns the mapped chunks, one result per chunk.

Parameters: def map_list_as_chunks(l, f, extra_data, cpus=None, max_chunk_size=None)

l: The iterable to process across multiple processes.
f: The function that processes each chunk. It takes two parameters: chunk and extra_data.
extra_data: Data that is passed into f for each chunk.
cpus: The number of CPUs to use. If None, the number of cores on the system is used. This value determines how many chunks are created.
max_chunk_size: Limits the chunk size.
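The way cpus and max_chunk_size could interact can be sketched as follows. This is an assumption about the behavior, not the package's actual code: split_into_chunks is a hypothetical helper that creates roughly one chunk per CPU and then caps the chunk size, which produces more chunks when the cap is smaller.

```python
import math
import os


def split_into_chunks(items, cpus=None, max_chunk_size=None):
    """Hypothetical helper: about one chunk per CPU, optionally capped in size."""
    items = list(items)
    if not items:
        return []
    # Fall back to the machine's core count, mirroring the cpus=None description.
    cpus = cpus or os.cpu_count() or 1
    chunk_size = math.ceil(len(items) / cpus)
    if max_chunk_size is not None:
        chunk_size = min(chunk_size, max_chunk_size)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


chunks = split_into_chunks(range(10), cpus=2)
capped = split_into_chunks(range(10), cpus=2, max_chunk_size=3)
print(chunks)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(capped)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```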

Example:

from multiprocess_chunks import map_list_as_chunks

l = range(0, 10)
f = lambda chunk, ed: [c * ed for c in chunk]
result = map_list_as_chunks(l, f, 5, 2)
# result = [ [0, 5, 10, 15, 20], [25, 30, 35, 40, 45] ]

map_list_in_chunks

This method divides the iterable that is passed to it into chunks. The chunks are then processed in parallel across multiple processes. It then flattens the processed chunks and returns the processed items.

Parameters: def map_list_in_chunks(l, f, extra_data)

l: The iterable to process across multiple processes.
f: The function that processes each item. It takes two parameters: item and extra_data.
extra_data: Data that is passed into f along with each item.

Example:

from multiprocess_chunks import map_list_in_chunks

l = range(0, 10)
f = lambda item, ed: item * ed
result = map_list_in_chunks(l, f, 5)
# result = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]

Essentially, map_list_in_chunks gives the same output as multiprocessing.Pool.map but, behind the scenes, it chunks the input so that far less pickling is needed.
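The chunk-then-flatten idea can be sketched in a single process, without any multiprocessing, to show why the output matches a plain map. The names process_chunk and map_in_chunks_serial are hypothetical and exist only for this illustration; the real function distributes the chunks across processes rather than looping over them.

```python
from itertools import chain


def process_chunk(chunk, f, extra_data):
    """Apply f to every item of one chunk, passing extra_data along."""
    return [f(item, extra_data) for item in chunk]


def map_in_chunks_serial(items, f, extra_data, chunk_size):
    """Serial stand-in: split into chunks, map each chunk, then flatten."""
    items = list(items)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    processed = [process_chunk(chunk, f, extra_data) for chunk in chunks]
    # Flattening the per-chunk results restores the original item order.
    return list(chain.from_iterable(processed))


result = map_in_chunks_serial(range(10), lambda item, ed: item * ed, 5, chunk_size=4)
print(result)  # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
```

Because chunks are contiguous slices and are flattened in order, the result is the same list a per-item map would produce, whatever chunk size is chosen.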

A note on pickling

This package uses the pathos package to run the work across processes and the cloudpickle package to perform pickling. This allows it to pickle objects that Python's built-in multiprocessing cannot.
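A lambda is one common example of such an object. The sketch below shows the standard pickle module rejecting a lambda, and then, only if cloudpickle happens to be installed, round-trips the same lambda with cloudpickle.dumps and cloudpickle.loads. The variable names are made up for the illustration.

```python
import pickle

square = lambda x: x * x

# The standard pickle module stores functions by name, and a lambda has no
# importable name, so this attempt fails.
try:
    pickle.dumps(square)
    stdlib_ok = True
except (pickle.PicklingError, AttributeError):
    stdlib_ok = False
print("stdlib pickle handled the lambda:", stdlib_ok)

# cloudpickle is a third-party package; skip this part if it is not installed.
try:
    import cloudpickle
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function body itself, so the lambda survives.
    restored = cloudpickle.loads(cloudpickle.dumps(square))
    print("cloudpickle round trip:", restored(4))
```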

Performance

How much better will your code perform? There are many factors at play here. The only way to know is to do your own timings.
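One simple way to do such timings is a small best-of-N wall-clock helper built on time.perf_counter. The helper name best_of and the baseline workload below are made up for this sketch; replace the baseline with your own Pool.map call and your own map_list_in_chunks call and compare the two numbers.

```python
import time


def best_of(fn, repeats=3):
    """Run fn() several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload; swap in the calls you actually want to compare.
def baseline():
    return [x * 5 for x in range(100_000)]


elapsed = best_of(baseline)
print(f"baseline: {elapsed:.4f} s")
```

Taking the minimum of several runs reduces noise from other activity on the machine; process start-up cost is part of what you are measuring, so time the whole call rather than just the mapped function.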

Release history

1.0.0, released Mar 5, 2020

Source distribution

multiprocess_chunks-1.0.0.tar.gz (3.9 kB, source), uploaded Mar 5, 2020
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

Hashes for multiprocess_chunks-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`dfdb6e18979779340b7289c4c188168577218ccfd1f40b6ec841b1c4b328c3c6`
MD5	`d9cf53429420c9d18a711c9eb14a5187`
BLAKE2b-256	`99bc67af1aeab9efce27301e01b7e0fc94634d843c9a27f21ecfc12b97829201`
