Construct a pipeline of functions to be applied independently to the groups of a pandas groupby object.

Project description

gcGroupbyExtension: chain .apply methods on Pandas groupby object

Install

The extension is available on PyPI:

pip install gcGroupbyExtension-gcalmettes

(If you would rather not install the package into your Python distribution, download this repo and place the gcGroupbyExtension folder in the directory you run your Python script or notebook from.)

Import

Once installed, the extension can be imported via:

import gcGroupbyExtension

What problems does this extension try to solve?

Pandas provides both the .pipe and .apply methods to work on its groupby object. The main difference between them in the groupby context is scope: with .pipe, your function receives the entire groupby object (all the groups at once), while with .apply it receives one subcomponent at a time. For a DataFrame groupby, the subcomponents are slices of the DataFrame that called groupby, and each slice is itself a DataFrame. The same holds for a Series groupby, where each slice is a Series.
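The difference in scope can be seen with plain pandas alone. The snippet below is a minimal sketch that does not use this extension; the DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1, 2, 10, 20],
})
grouped = df.groupby("group", group_keys=False)["value"]

# .pipe calls the function once and hands it the whole groupby object.
spread = grouped.pipe(lambda g: g.max() - g.min())

# .apply calls the function once per group and hands it that group's Series.
centered = grouped.apply(lambda s: s - s.mean())
```

Here `spread` holds one value per group, computed from the groupby object as a whole, while `centered` holds one value per original row, computed group by group.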

  1. The .pipe method can be chained, while the .apply method can't.
  2. You can use the .agg method to restrict functions to particular columns of the groups, but it is cumbersome to apply specific functions independently to only a selection of the groups.
  3. There is no easy way to construct independent pipelines of functions for each group.

This extension provides this capability.
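To give a sense of points 2 and 3, here is a rough sketch of what per-group pipelines can look like in plain pandas. It is not how this extension is implemented; `pipelines` and `run_pipelines` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "value": [1.0, 2.0, 10.0, 20.0, 100.0, 200.0],
})

# Hypothetical mapping: group name -> ordered list of functions for that group.
pipelines = {
    "a": [lambda s: s * 5],
    "b": [lambda s: s - s.mean(), lambda s: s.abs()],
    # "c" has no entry, so it passes through unchanged.
}

def run_pipelines(frame, by, column, pipelines):
    """Apply each group's own chain of functions to one column, group by group."""
    results = []
    for name, group in frame.groupby(by):
        series = group[column]
        for func in pipelines.get(name, []):
            series = func(series)
        results.append(series)
    # Stitch the per-group results back together in group order.
    return pd.concat(results)

result = run_pipelines(df, "group", "value", pipelines)
```

The manual loop, the bookkeeping dictionary, and the final concatenation are the kind of boilerplate that a chainable, group-aware .apply is meant to avoid.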

What does this extension actually do?

This extension lets you construct a pipeline of functions that are applied independently to the groups of a groupby object. The functions/transformations can be the same for all groups or scoped to one or more specific groups.

Details: This library registers a custom accessor on pandas DataFrame and Series objects. The methods of this extension are registered under the gc namespace.
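For readers unfamiliar with custom accessors, the sketch below shows the general pandas mechanism using `pd.api.extensions.register_dataframe_accessor` (a matching `register_series_accessor` exists for Series). The `demo` namespace, the `DemoAccessor` class, and its `column_count` method are hypothetical and are not part of this library.

```python
import pandas as pd

# "demo" is a hypothetical namespace, chosen so it does not clash with "gc".
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is called on.
        self._obj = pandas_obj

    def column_count(self):
        """Return the number of columns of the wrapped DataFrame."""
        return self._obj.shape[1]

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
n_columns = df.demo.column_count()
```

Once an accessor is registered, every DataFrame exposes it as an attribute, which is how methods such as df.gc.groupby become available after importing the extension.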

See the DEMO notebook for details.

Care to show the syntax?

Sure! See the DEMO notebook for more details, but basically, you can do things like this:

(df.gc.groupby("nameOfColumn")
    .resetIndex() # this is a special method baked in
    .apply(lambda x: x * 5, lambda x: x + x.iloc[3]) # accepts multiple functions
    .apply(mySpecialFunction, onlyGroups=['group1']) # limit the function to specific group(s)
    .apply(lambda x: x - x.mean(), ignoreGroups=['group4', 'group6']) # skip specific group(s)
    .apply(lambda x: x.std(axis=1))
    .concat(axis=0, multiIndex=None).plot()
)

Download files

Source Distribution

gcGroupbyExtension-gcalmettes-0.0.3.tar.gz (5.0 kB view details)

Uploaded Source

Built Distribution

gcGroupbyExtension_gcalmettes-0.0.3-py3-none-any.whl (6.3 kB view details)

Uploaded Python 3

File details

Details for the file gcGroupbyExtension-gcalmettes-0.0.3.tar.gz.

File metadata

  • Download URL: gcGroupbyExtension-gcalmettes-0.0.3.tar.gz
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.20.0 setuptools/40.5.0 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.0

File hashes

Hashes for gcGroupbyExtension-gcalmettes-0.0.3.tar.gz
Algorithm Hash digest
SHA256 3fa55ea4dd1715859e4d8f1ce65dec856e205e6ed0f5c7d908a677a7c34bfd2a
MD5 b9767aea16c49d400676801373ebccf7
BLAKE2b-256 cfe3f14b24c4fdf727521fae4b9e2cfd53640f713643bee3d871793bcf74523b

File details

Details for the file gcGroupbyExtension_gcalmettes-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: gcGroupbyExtension_gcalmettes-0.0.3-py3-none-any.whl
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.20.0 setuptools/40.5.0 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.0

File hashes

Hashes for gcGroupbyExtension_gcalmettes-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f2791fa8acb118bfd3cb0df10369780d630a09b4d62144e83ff3b0ade19ec19f
MD5 d6b93204ebaed8195f6cccbfa06a0521
BLAKE2b-256 448f3bcb3da012970fe7495a188cc29543c88f232e0f91642681b17e016ba9b4
