Sensible multi-core apply/map/applymap functions for Pandas
mapply
mapply provides sensible multi-core apply/map/applymap functions for Pandas.
mapply vs. pandarallel vs. swifter
Where pandarallel only requires dill (and therefore has to rely on in-house multiprocessing and progress bars), swifter relies on the heavy dask framework, converting to Dask DataFrames and back. In an attempt to find the golden mean, mapply is highly customizable and remains lightweight. It leverages the powerful pathos framework, which mirrors Python's built-in multiprocessing module but uses dill for universal pickling.
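To illustrate why the serializer matters, here is a minimal standard-library check. It is not part of mapply, and stdlib_can_pickle is a hypothetical helper name. Python's built-in pickle serializes functions by reference to their qualified name, so an anonymous function such as lambda x: x ** 2 cannot be sent to a worker process that way. Closing that gap is what dill is for.

```python
import math
import pickle


def stdlib_can_pickle(obj) -> bool:
    """Return True if the standard-library pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions are pickled by reference to their qualified name;
        # a lambda's name ("<lambda>") cannot be looked up, so this fails.
        return False
    return True


print(stdlib_can_pickle(math.sqrt))         # True: a named module-level function
print(stdlib_can_pickle(lambda x: x ** 2))  # False: an anonymous function
```

With dill installed, dill.dumps(lambda x: x ** 2) succeeds instead. That is the capability pathos relies on when shipping user functions to worker processes.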
Installation
This pure-Python, OS-independent package is available on PyPI:
$ pip install mapply
Usage
For documentation, see mapply.readthedocs.io.
import pandas as pd
import mapply
mapply.init(
n_workers=-1,
chunk_size=100,
max_chunks_per_worker=8,
progressbar=False
)
df = pd.DataFrame({"a": list(range(100))})
# avoid unnecessary multiprocessing:
# due to chunk_size=100, this will act as regular apply.
# set chunk_size=1 to skip this check and let max_chunks_per_worker decide.
df["squared"] = df.mapply(lambda x: x ** 2)
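The comments above describe a threshold. chunk_size decides whether splitting the data is worthwhile at all, and max_chunks_per_worker caps how many chunks each worker receives. The sketch below is a simplified, hypothetical model of that decision, not mapply's actual code. It assumes that chunk_size is the minimum number of items per chunk, that n_workers=-1 means "use all available CPUs", and that max_chunks_per_worker=0 disables the cap. Check mapply.readthedocs.io for the real semantics. The names plan_chunks and resolve_n_workers are illustrative only.

```python
import os


def resolve_n_workers(n_workers: int) -> int:
    """Hypothetical: interpret n_workers=-1 as 'use every available CPU'."""
    if n_workers == -1:
        return os.cpu_count() or 1
    if n_workers < 1:
        raise ValueError("n_workers must be -1 or a positive integer")
    return n_workers


def plan_chunks(n_items: int, n_workers: int, chunk_size: int,
                max_chunks_per_worker: int) -> int:
    """Hypothetical model of how many chunks n_items would be split into.

    A result of 1 means no multiprocessing: the call behaves like a regular apply.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    workers = resolve_n_workers(n_workers)
    # Assumption: chunk_size is the minimum number of items per chunk,
    # so floor division keeps every chunk at or above that size.
    n_chunks = max(1, n_items // chunk_size)
    # Assumption: max_chunks_per_worker=0 disables the cap.
    if max_chunks_per_worker > 0:
        n_chunks = min(n_chunks, workers * max_chunks_per_worker)
    return n_chunks


# 100 rows with chunk_size=100: a single chunk, i.e. a regular apply.
print(plan_chunks(100, n_workers=4, chunk_size=100, max_chunks_per_worker=8))  # 1
# chunk_size=1: 100 candidate chunks, capped at 4 workers * 8 = 32.
print(plan_chunks(100, n_workers=4, chunk_size=1, max_chunks_per_worker=8))    # 32
```

In this model, raising chunk_size raises the bar for going multi-core, while max_chunks_per_worker bounds scheduling overhead once the data is large.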
Development
Run make help for options like installing for development, linting, testing, and building docs.
Download files
Hashes for mapply-0.1.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 46e1e3469975c3b6b2bd18ac0ed0ed59d2d321d269c47d4c2698321f6ccc7b0e
MD5 | f35a72542b126bbe60a827114b432421
BLAKE2b-256 | b5fd3227dd7bd11b73e4cbfbf82794f6200a060cc27eb31ced4997bd8c3f1af3
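A downloaded wheel can be checked against the digests listed above with Python's standard hashlib module. This is a minimal sketch. The helper names digest and verify_file are hypothetical, and the path in the usage line assumes the wheel was saved to the current directory under its original filename. BLAKE2b-256 corresponds to hashlib.blake2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# Digests copied from the hash table for mapply-0.1.2-py2.py3-none-any.whl.
EXPECTED = {
    "sha256": "46e1e3469975c3b6b2bd18ac0ed0ed59d2d321d269c47d4c2698321f6ccc7b0e",
    "md5": "f35a72542b126bbe60a827114b432421",
    "blake2b-256": "b5fd3227dd7bd11b73e4cbfbf82794f6200a060cc27eb31ced4997bd8c3f1af3",
}


def digest(data: bytes, algorithm: str) -> str:
    """Return the hex digest of data for one of the algorithms in EXPECTED."""
    if algorithm == "sha256":
        return hashlib.sha256(data).hexdigest()
    if algorithm == "md5":
        return hashlib.md5(data).hexdigest()
    if algorithm == "blake2b-256":
        # BLAKE2b truncated to 256 bits (32 bytes).
        return hashlib.blake2b(data, digest_size=32).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")


def verify_file(path, expected=EXPECTED) -> bool:
    """Return True only if every listed digest matches the file's contents."""
    data = Path(path).read_bytes()
    return all(digest(data, alg) == hexd for alg, hexd in expected.items())
```

Usage, assuming the wheel is in the working directory: verify_file("mapply-0.1.2-py2.py3-none-any.whl"). SHA256 alone is sufficient in practice; MD5 is listed for completeness but is not collision-resistant.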