
Extends Pandas to run apply methods for DataFrames, Series and groups on multiple cores at the same time.


MultiprocessPandas

The MultiprocessPandas package extends Pandas so that operations can easily be run on multiple cores, i.e. parallelized. The current version of the package can parallelize apply() methods on DataFrame, Series and DataFrameGroupBy objects.

Importing the applyparallel module adds an apply_parallel() method to DataFrame, Series and DataFrameGroupBy, which lets you run an operation on multiple cores.

Installation

The source code is currently hosted on GitHub at: https://github.com/akhtarshahnawaz/multiprocesspandas. The package can be built from the source on GitHub or installed directly from PyPI.

To install using pip:

pip install multiprocesspandas

Setting up the Library

To use the library, import the applyparallel module. The import attaches the required methods to pandas, so you can call them directly on pandas data objects.

from multiprocesspandas import applyparallel
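The way an import can attach a new method to an existing class can be illustrated with a minimal sketch. This is not the multiprocesspandas source: `Table` is a hypothetical stand-in for a pandas class, and the placeholder method below runs sequentially rather than across processes. It only shows the attachment mechanism.

```python
# Illustrative only: `Table` is a hypothetical stand-in for a pandas class,
# and this is not the actual multiprocesspandas implementation.


class Table:
    """Minimal container holding a list of values."""

    def __init__(self, values):
        self.values = list(values)


def apply_parallel(self, func):
    """Placeholder: applies func to each value sequentially, not in parallel."""
    return [func(v) for v in self.values]


# Assigning the function to the class makes it available as a method on
# every instance, which is how a module can extend a class at import time.
Table.apply_parallel = apply_parallel

table = Table([1, 2, 3])
result = table.apply_parallel(lambda v: v * 10)
print(result)
```

Because the assignment happens on the class itself, no subclassing or wrapper object is needed on the caller's side.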

Usage

Once imported, the library adds an apply_parallel() method to your DataFrame, Series and DataFrameGroupBy objects. The method accepts the function to apply and three named arguments:

  • static_data (external data required by the passed function; defaults to None)
  • num_processes (defaults to the maximum number of available cores on your CPU)
  • axis (only for DataFrames; defaults to 0, i.e. rows. For columns, set axis=1.)

Note: Any extra module required by the passed function must be imported again inside the function.
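The note above can be sketched with a standard-library module. The function name `root_mean_square` is hypothetical, and the stated reason (the function body may run in a worker process where the caller's module-level imports are not guaranteed to exist) is an assumption about why the rule is needed, not something the package documentation spells out.

```python
def root_mean_square(values):
    # Import inside the function body so that the name `math` is defined
    # wherever this function runs, not only in the module that defined it.
    import math

    return math.sqrt(sum(v * v for v in values) / len(values))


# Called directly here for illustration; the same function could be handed
# to apply_parallel() without relying on any import made by the caller.
result = root_mean_square([2.0, 2.0])
print(result)
```

Repeating an import inside a function is cheap in Python: after the first import, later ones are resolved from the module cache.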

Usage with DataFrameGroupBy

def func(x):
    import pandas as pd
    return pd.Series([x['C'].mean()])

df.groupby(["A","B"]).apply_parallel(func, num_processes=30)

If you need external data inside func(), it has to be passed and received as the named argument static_data. If more than one piece of external data is needed, static_data can be a list of all the required data, and each item can be accessed inside func by indexing.

import pandas as pd

data1 = pd.Series([1,2,3])
data2 = 20

def func(x, static_data):
    import pandas as pd
    output = static_data[0] - x['C'].mean()
    return output * static_data[1]

df.groupby(["A","B"]).apply_parallel(func, num_processes=30, static_data=[data1, data2])

Usage with DataFrame

Usage with DataFrames is very similar to the usage with DataFrameGroupBy, except that you can pass an extra axis argument, which tells whether to apply the function to rows or columns.

def func(x):
    return x.mean()

df.apply_parallel(func, num_processes=30, axis=1)

External data can be passed in a similar way:

data = pd.Series([1,2,3])

def func(x, static_data):
    return static_data.sum() + x.mean()

df.apply_parallel(func, num_processes=30, static_data=data)

Usage with Series

Usage with Series is very similar to the usage with DataFrames and DataFrameGroupBy.

data = pd.Series([1,2,3])

def func(x, static_data):
    return static_data - x

series.apply_parallel(func, num_processes=30, static_data=data)
