Construct a pipeline of functions to be applied independently to the groups of a groupby object.

Project description

gcGroupbyExtension: chain .apply methods on pandas groupby objects


The extension is available on PyPI:

pip install gcGroupbyExtension-gcalmettes

(If you do not want to install the package in your Python distribution, download this repo and place the gcGroupbyExtension folder in the folder you run your Python script or notebook from.)


Once installed, the extension can be imported via:

import gcGroupbyExtension

What problems does this extension try to solve?

pandas provides both the .pipe and .apply methods for working with groupby objects. In the groupby context, the main difference between them is scope: with .pipe, the function has access to the entire groupby object (every group), while with .apply it only has access to one subcomponent at a time. For a DataFrame groupby, the subcomponents are slices of the DataFrame that called groupby, and each slice is a DataFrame itself. A Series groupby is analogous.
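To make the difference in scope concrete, here is a minimal sketch using plain pandas only (no extension involved). The column names and values are made up for illustration, and a Series groupby is used to keep the output small.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
})
grouped = df.groupby("group")

# .pipe calls the function once and hands it the whole groupby object,
# so the function can see every group.
spread = grouped.pipe(lambda g: g["value"].max() - g["value"].min())

# .apply calls the function once per group and hands it only that
# group's slice (here a Series, since we selected a single column).
centered = grouped["value"].apply(lambda s: s - s.mean())
```

Here `spread` holds one value per group, while `centered` holds one value per original row, each computed from its own group's mean.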

  1. The .pipe method can be chained, while the .apply method can't.
  2. You can use the .agg method to restrict functions to particular columns of the groups, but it is cumbersome to apply specific functions independently to only a selection of the groups.
  3. There is no easy way to construct independent pipelines of functions for each group.

This extension provides this capability.

What does this extension actually do?

This extension lets you construct a pipeline of functions to be applied independently to the groups of a groupby object. The functions/transformations can be the same for all groups or scoped to one or more specific groups.
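The idea of a per-group pipeline can be sketched in plain pandas. This is not the extension's implementation or API: the helper `apply_to_groups` and its `only_groups` parameter are hypothetical names, and passing unselected groups through unchanged is an assumption made for the sketch.

```python
import pandas as pd

def apply_to_groups(groups, func, only_groups=None):
    """Apply func to each group's slice (hypothetical helper).

    Groups not listed in only_groups are passed through unchanged,
    which is an assumption of this sketch.
    """
    return {
        name: func(data) if only_groups is None or name in only_groups else data
        for name, data in groups.items()
    }

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
})

# Split the groupby object into one independent slice per group.
groups = dict(iter(df.groupby("group")["value"]))

# Step 1: the same function for every group.
groups = apply_to_groups(groups, lambda s: s * 5)
# Step 2: a function scoped to group "b" only.
groups = apply_to_groups(groups, lambda s: s - s.mean(), only_groups=["b"])

# Reassemble the groups; the group names become the outer index level.
result = pd.concat(groups, axis=0)
```

Keeping the groups in a dictionary between steps is what lets each step be chained and scoped independently.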

Details: This library registers a custom accessor on pandas DataFrame and Series objects. The methods of this extension are registered under the gc namespace.
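For readers unfamiliar with custom accessors, the sketch below shows how pandas lets a library register one on DataFrame objects. It is not the extension's source code: the `demo` namespace, the `DemoAccessor` class and the `column_ranges` method are hypothetical and exist only to illustrate the mechanism.

```python
import pandas as pd

# Register a hypothetical accessor under the "demo" namespace.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was called on.
        self._obj = pandas_obj

    def column_ranges(self):
        """Return max - min for each numeric column."""
        numeric = self._obj.select_dtypes("number")
        return numeric.max() - numeric.min()

df = pd.DataFrame({"x": [1, 4, 2], "y": [10.0, 30.0, 20.0]})

# Once registered, the methods are reachable through the namespace.
ranges = df.demo.column_ranges()
```

pandas offers `register_series_accessor` for Series objects in the same module.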

See the DEMO notebook for details.

Care to show the syntax?

Sure! See the DEMO notebook for more details, but basically, you can do things like this:

    .resetIndex() # this is a special method baked in
    .apply(lambda x: x * 5, lambda x: x + x.iloc[3]) # accepts multiple functions
    .apply(mySpecialFunction, onlyGroups=['group1']) # limit the function to specific group(s)
    .apply(lambda x: x - x.mean(), ignoreGroups=['group4', 'group6']) # skip specific group(s)
    .apply(lambda x: x.std(axis=1))
    .concat(axis=0, multiIndex=None).plot()

Files for gcGroupbyExtension-gcalmettes, version 0.0.3
  - gcGroupbyExtension_gcalmettes-0.0.3-py3-none-any.whl (6.3 kB), file type: Wheel, Python version: py3
  - gcGroupbyExtension-gcalmettes-0.0.3.tar.gz (5.0 kB), file type: Source