General level weighting of Pandas Dataframes
Project description
Weight Pandas Dataframes
pandas-weighting enables general level weighting (similar to spss) of dataframes. This makes it possible to calculate weighted means, frequencies etc. statistical figures without the need to write separate functions for applying weighting.
Weighting is done by repeating rows as many times as defined in 'weight' column. There are a few drawbacks related to weighting data this way:
-
Absolute weighting (all weights are above one, the sum of the weighted cases is more than sum of the unweighted cases) must be used instead of relative weighting (some weights are below zero, the sum of the weighted cases is the same as the sum of the unweighted cases), as it's not possible to repeat rows fractional times.
-
Weights are rounded to integers, which might cause inaccuracies, especially if the weights are small.
-
If dataframe / weights are large, weighting should be applied to individual columns in turns, instead of the whole dataframe, as this might cause memory issues.
Usage
from pandas_weighting.func import weight
df.col.pipe(weight, df.weights).describe()
or by monkey patching Series/Dataframe:
pd.Series.weight = weight
pd.DataFrame.weight = weight
df.col.weight(df.weights).describe()
Example
import pandas as pd
from pandas_weighting.func import weight
pd.Series.weight = weight
pd.DataFrame.weight = weight
df = pd.DataFrame({
'val': [1, 2, 3, 4, 5, 6],
'weights': [3, 2, 1, 1, 0, None],
})
# mean 3.5 =(1+2+3+4+5+6)/6
df.val.mean()
# weighted mean 2.0 =(3*1+2*2+1*3+1*4)/(3+2+1+1)
df.val.weight(df.weights).mean()
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for pandas_weighting-0.0.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4aa86efa22052b2a48b2ea98bd1575f7f8101484432cf9385b72d602ea2dfff1 |
|
MD5 | 7e05031b31d66d19c31bc9daf096fa93 |
|
BLAKE2b-256 | faeaf91946d76bbd37ac67ba3e87c5bbce5bdc11f2953554213896c792139d5a |