Skip to main content

Allows to construct a pipeline of functions to be applied independently on the groups of a groupby object.

Project description

gcGroupbyExtension: chain .apply methods on Pandas groupby object

Install

The extension is available on PyPi

pip install gcGroupbyExtension-gcalmettes

(Or if you do not want to install the package in your python distribution, just download this repo and place the gcGroupbyExtension folder in the folder you're running your python script/notebook in.)

Import

Once installed, the extension can be imported via:

import gcGroupbyExtension

What problems does this extension try to solve?

Pandas provides both the .pipe and .apply methods to work on its groupby object. The main difference between .pipe and .apply in the groupby context is that you have access to the entire scope of the groupby object (each group) with .pipe, while you only have access to the subcomponents scope (in the context of a groupby the subcomponents are slices of the dataframe that called groupby where each slice is a dataframe itself. This is analogous for a series groupby.)

  1. The .pipe method can be chained, while the .apply method can't.
  2. You can use the .agg method to limit the application of the functions on particular columns of the groups, but it is cumbersome to apply specific functions independantly on only a selection of the groups.
  3. There is no easy way to construct independant pipelines of functions for each group.

This extension provides this capability.

What does this extension actually do?

This extension allows to construct a pipeline of functions to be applied independently on the groups of a groupby object. The functions/transformations to be applied can be the same for all the groups or scoped to (a) specific group(s).

Details: This library registers a custom accessor on pandas DataFrame and Series objects. The methods of this extension are registered under the gc namespace.

See the DEMO notebook for details.

Care to show the syntax?

Sure! See the DEMO notebook for more details, but basically, you can do things like this:

(df.gc.groupby("nameOfColumn")
    .resetIndex() # this is a special method baked in
    .apply(lambda x: x * 5, lambda x: x + x.iloc[3]) # accepts multiple functions
    .apply(mySpecialFunction, onlyGroups=['group1']) # limit the function to specific group(s)
    .apply(lambda x: x - x.mean(), ignoreGroups=['group4', 'group6']) # limit the function to specific group(s)
    .apply(lambda x: x.std(axis=1))
    .concat(axis=0, multiIndex=None).plot()
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcGroupbyExtension-gcalmettes-0.0.3.tar.gz (5.0 kB view hashes)

Uploaded Source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page