General level weighting of Pandas Dataframes

Project description

Weight Pandas Dataframes

pandas-weighting enables general-level weighting of DataFrames, similar to SPSS. This makes it possible to calculate weighted means, frequencies, and other statistics without writing a separate weighted version of each function.

Weighting is done by repeating each row as many times as its value in the weight column specifies. Weighting data this way has a few drawbacks:

  • Absolute weighting (all weights are at least one, so the sum of the weighted cases exceeds the sum of the unweighted cases) must be used instead of relative weighting (some weights are below one, and the sum of the weighted cases equals the sum of the unweighted cases), because rows cannot be repeated a fractional number of times.

  • Weights are rounded to integers, which might cause inaccuracies, especially if the weights are small.

  • If the DataFrame or the weights are large, weighting should be applied to individual columns one at a time rather than to the whole DataFrame, because repeating every row of the whole DataFrame might cause memory issues.
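
The row-repetition idea and the rounding drawback can be sketched without the package. The helper `repeat_rows` below is hypothetical and is not pandas-weighting's actual implementation; it assumes that missing weights count as zero and that weights are rounded to the nearest integer. The exact weighted mean from `numpy.average` is shown for comparison.

```python
import numpy as np
import pandas as pd


def repeat_rows(data, weights):
    """Hypothetical helper: repeat each row round(weight) times.

    Assumption: missing weights are treated as 0, so those rows drop out.
    Works on a Series (one column) or on a whole DataFrame.
    """
    counts = pd.Series(weights, index=data.index).fillna(0).round().astype(int)
    # Positional repetition, so it also behaves with a non-unique index.
    positions = np.repeat(np.arange(len(data)), counts.to_numpy())
    return data.iloc[positions]


df = pd.DataFrame({"val": [1, 2, 3, 4], "w": [1.4, 0.6, 2.4, 1.0]})

# Exact weighted mean, using the fractional weights directly.
exact = np.average(df["val"], weights=df["w"])

# Repetition-based mean on a single column: weights round to 1, 1, 2, 1.
approx = repeat_rows(df["val"], df["w"]).mean()

print(round(exact, 3))  # 2.556
print(approx)           # 2.6
```

The gap between the two results comes only from rounding the weights. Passing a single column, as here, keeps the repeated data smaller than repeating the whole DataFrame.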

Usage

from pandas_weighting import weight

df.col.pipe(weight, df.weights).describe()

or by monkey patching Series/DataFrame:

pd.Series.weight = weight
pd.DataFrame.weight = weight

df.col.weight(df.weights).describe()
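
Both call styles work for any function that takes the data as its first argument: `pipe` passes the Series in as the first argument, and a function assigned to the class receives the Series as `self`. A minimal sketch with a hypothetical stand-in function `repeat_by` (not the package's `weight`):

```python
import pandas as pd


def repeat_by(data, counts):
    """Hypothetical stand-in: repeat each row counts[i] times."""
    return data.loc[data.index.repeat(counts.to_numpy())]


s = pd.Series([10, 20, 30])
counts = pd.Series([2, 0, 1])

direct = repeat_by(s, counts)      # plain function call
piped = s.pipe(repeat_by, counts)  # same call, written as a method chain

pd.Series.repeat_by = repeat_by    # monkey patch: the Series becomes `data`
patched = s.repeat_by(counts)

print(direct.tolist())   # [10, 10, 30]
print(piped.tolist())    # [10, 10, 30]
print(patched.tolist())  # [10, 10, 30]
```

The `pipe` form leaves the pandas classes untouched, while the patched form changes `pd.Series` for the whole process.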

Example

import pandas as pd
from pandas_weighting import weight

pd.Series.weight = weight
pd.DataFrame.weight = weight

df = pd.DataFrame({
    'val': [1, 2, 3, 4, 5, 6],
    'weights': [3, 2, 1, 1, 0, None],
})

# unweighted mean: 3.5 = (1+2+3+4+5+6)/6
df.val.mean()

# weighted mean: 2.0 = (3*1 + 2*2 + 1*3 + 1*4)/(3+2+1+1)
# the rows with weight 0 and None do not contribute
df.val.weight(df.weights).mean()
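
The arithmetic in the comments can be cross-checked without pandas-weighting by applying the weighted-mean formula directly. This sketch assumes that a row with a missing weight is excluded, as the formula in the comment implies; it uses `numpy.average` rather than the package.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val": [1, 2, 3, 4, 5, 6],
    "weights": [3, 2, 1, 1, 0, None],
})

# Assumption: a missing weight excludes the row from the weighted mean.
valid = df.dropna(subset=["weights"])

unweighted = df["val"].mean()
weighted = np.average(valid["val"], weights=valid["weights"])

print(unweighted)  # 3.5
print(weighted)    # 2.0
```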

Download files

Source distribution: pandas-weighting-0.0.2.tar.gz (2.5 kB)

Built distribution: pandas_weighting-0.0.2-py3-none-any.whl (3.7 kB, Python 3)
