Skip to main content

General level weighting of Pandas Dataframes

Project description

Weight Pandas Dataframes

pandas-weighting enables general level weighting (similar to spss) of dataframes. This makes it possible to calculate weighted means, frequencies etc. statistical figures without the need to write separate functions for applying weighting.

Weighting is done by repeating rows as many times as defined in 'weight' column. There are a few drawbacks related to weighting data this way:

  • Absolute weighting (all weights are above one, the sum of the weighted cases is more than sum of the unweighted cases) must be used instead of relative weighting (some weights are below zero, the sum of the weighted cases is the same as the sum of the unweighted cases), as it's not possible to repeat rows fractional times.

  • Weights are rounded to integers, which might cause inaccuracies, especially if the weights are small.

  • If dataframe / weights are large, weighting should be applied to individual columns in turns, instead of the whole dataframe, as this might cause memory issues.

Usage

from pandas_weighting.func import weight

df.col.pipe(weight, df.weights).describe()

or by monkey patching Series/Dataframe:

pd.Series.weight = weight
pd.DataFrame.weight = weight

df.col.weight(df.weights).describe()

Example

import pandas as pd
from pandas_weighting.func import weight

pd.Series.weight = weight
pd.DataFrame.weight = weight

df = pd.DataFrame({
    'val': [1, 2, 3, 4, 5, 6],
    'weights': [3, 2, 1, 1, 0, None],
})

# mean 3.5 =(1+2+3+4+5+6)/6
df.val.mean()

# weighted mean 2.0 =(3*1+2*2+1*3+1*4)/(3+2+1+1)
df.val.weight(df.weights).mean()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas-weighting-0.0.1.tar.gz (2.5 kB view hashes)

Uploaded Source

Built Distribution

pandas_weighting-0.0.1-py3-none-any.whl (3.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page