Project description

Weightipy

Weightipy is a cut-down version of Quantipy3 for weighting survey (people) data using the RIM (iterative raking) algorithm.
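
For readers new to raking, the following is a minimal pure-Python sketch of the general idea: weights are repeatedly rescaled, one variable at a time, until the weighted share of every category matches its target. It is an illustration only and not Weightipy's implementation; the function name `rake`, its parameters, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=100, tol=1e-9):
    """Return one weight per row so that weighted margins match `targets`.

    rows:    list of dicts, e.g. {"gender": "Male", "agecat": "young"}
    targets: {column: {category: share}}, shares per column summing to 1
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for col, shares in targets.items():
            # Current weighted total of each category in this column.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[col]] += w
            # Rescale each row so this column's margin hits its target.
            for i, row in enumerate(rows):
                factor = shares[row[col]] * n / totals[row[col]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all columns barely changes anything.
        if max_change < tol:
            break
    return weights


# Hypothetical sample of 10 respondents.
rows = (
    [{"gender": "Male", "agecat": "young"}] * 3
    + [{"gender": "Male", "agecat": "old"}] * 1
    + [{"gender": "Female", "agecat": "young"}] * 2
    + [{"gender": "Female", "agecat": "old"}] * 4
)
targets = {
    "gender": {"Male": 0.5, "Female": 0.5},
    "agecat": {"young": 0.4, "old": 0.6},
}
weights = rake(rows, targets)
```

Because each pass rescales to the sample size, the weights keep a mean of 1, which matches the "Mean weight factor" line that the engine report prints.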

Changes from Quantipy

Removed all Quantipy overhead. Weightipy supports the latest versions of Pandas and NumPy and is tested on Python 3.7, 3.8, 3.9, 3.10 and 3.11.
Weightipy runs up to 6 times faster than Quantipy, depending on the dataset.
The Rim class does not generate reports as Quantipy did, unless the verbose parameter is set to True in the Rim constructor.

Installation

pip install weightipy

python3 -m pip install weightipy

Create a virtual environment

If you want to create a virtual environment when using Weightipy:

with conda

conda create -n envwp python=3

with venv

python -m venv [your_env_name]

5 minutes to Weightipy

Get started

Assuming the dataframe my_df contains the variables gender and agecat, we can weight the dataset like this:

import weightipy as wp

targets = {
    "agecat": {"18-24": 5.0, "25-34": 30.0, "35-49": 26.0, "50-64": 19.0, "65+": 20.0},
    "gender": {"Male": 49, "Female": 51}
}
scheme = wp.scheme_from_dict(targets)

df_weighted = wp.weight_dataframe(
    df=my_df,
    scheme=scheme,
    weight_column="weights"
)
efficiency = wp.weighting_efficiency(df_weighted["weights"])

If we are working with census data that also includes a region variable, and we would like to weight the data by age and gender within each region, we can use the scheme_from_df function:

import weightipy as wp
import pandas as pd

df_data = pd.read_csv("data_to_weight.csv")
df_census = pd.read_csv("census_data.csv")

scheme = wp.scheme_from_df(
    df=df_census,
    cols_weighting=["agecat", "gender"],
    col_filter="region",
    col_freq="freq"
)
df_weighted = wp.weight_dataframe(
    df=df_data,
    scheme=scheme,
    weight_column="weights"
)
efficiency = wp.weighting_efficiency(df_weighted["weights"])
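
To make the idea of per-region targets concrete, here is a small pure-Python sketch that derives age and gender shares for each region from a census frequency table. It illustrates the concept only and is not a description of how scheme_from_df works internally. The helper name `targets_by_region`, the assumed table layout (one row per region/agecat/gender cell with a freq count), and the figures are all hypothetical.

```python
from collections import defaultdict


def targets_by_region(census_rows, cols_weighting, col_filter, col_freq):
    """Return {region: {column: {category: percent}}} from cell frequencies."""
    # sums[region][column][category] accumulates the census counts.
    sums = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in census_rows:
        region = row[col_filter]
        for col in cols_weighting:
            sums[region][col][row[col]] += row[col_freq]

    targets = {}
    for region, columns in sums.items():
        targets[region] = {}
        for col, counts in columns.items():
            total = sum(counts.values())
            # Convert counts to percentage shares within this region.
            targets[region][col] = {
                category: 100.0 * count / total
                for category, count in counts.items()
            }
    return targets


# Hypothetical census cells.
census_rows = [
    {"region": "North", "agecat": "young", "gender": "Male", "freq": 30},
    {"region": "North", "agecat": "young", "gender": "Female", "freq": 20},
    {"region": "North", "agecat": "old", "gender": "Male", "freq": 20},
    {"region": "North", "agecat": "old", "gender": "Female", "freq": 30},
    {"region": "South", "agecat": "young", "gender": "Male", "freq": 10},
    {"region": "South", "agecat": "young", "gender": "Female", "freq": 30},
    {"region": "South", "agecat": "old", "gender": "Male", "freq": 30},
    {"region": "South", "agecat": "old", "gender": "Female", "freq": 130},
]
region_targets = targets_by_region(
    census_rows,
    cols_weighting=["agecat", "gender"],
    col_filter="region",
    col_freq="freq",
)
```

Each region ends up with its own age and gender distribution, so respondents in one region are raked only against that region's census shares.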

For more control over the weighting process, we can use the underlying Rim and WeightEngine classes directly:

import weightipy as wp

# in this example, agecat and gender are int dtype

age_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}
gender_targets = {'gender':{0:49, 1:51}}
scheme = wp.Rim('gender_and_age')
scheme.set_targets(targets=[age_targets, gender_targets])

# add a unique row identifier to use as the engine's key column
my_df["identity"] = range(len(my_df))
engine = wp.WeightEngine(data=my_df)
engine.add_scheme(scheme=scheme, key="identity", verbose=False)
engine.run()
df_weighted = engine.dataframe()
col_weights = f"weights_{scheme.name}"

efficiency = wp.weighting_efficiency(df_weighted[col_weights])

print(engine.get_report())

Weight variable       weights_gender_and_age
Weight group                  _default_name_
Weight filter                           None
Total: unweighted                 582.000000
Total: weighted                   582.000000
Weighting efficiency               60.009826
Iterations required                14.000000
Mean weight factor                  1.000000
Minimum weight factor               0.465818
Maximum weight factor               6.187700
Weight factor ratio                13.283522
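
The diagnostics in the report above can be approximated from a plain list of weights. The sketch below assumes the common Kish-style definition of weighting efficiency, 100 * (sum of weights)^2 / (n * sum of squared weights), which is how the linked Quantipy documentation describes it; treat it as an illustration rather than Weightipy's exact code. The function name `weight_summary` and the sample weights are hypothetical.

```python
def weight_summary(weights):
    """Return basic diagnostics for a list of positive weights."""
    n = len(weights)
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    smallest, largest = min(weights), max(weights)
    return {
        # Assumed Kish-style efficiency, expressed as a percentage.
        "efficiency": 100.0 * total ** 2 / (n * sum_of_squares),
        "mean": total / n,
        "min": smallest,
        "max": largest,
        # Ratio of the largest to the smallest weight factor.
        "ratio": largest / smallest,
    }


# Hypothetical weights with a mean of 1.
summary = weight_summary([0.5, 0.5, 1.0, 2.0])
```

The weight factor ratio is simply the maximum divided by the minimum; in the report above, 6.187700 / 0.465818 gives roughly 13.28. An efficiency of 100 would mean all weights are equal, and lower values indicate that the sample had to be adjusted more heavily.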

For more details on the underlying classes, refer to the Quantipy documentation.

Overview of functions to get started:

Function	Description
weight_dataframe	Weights data by scheme, returns modified dataframe with new weight column.
weighting_efficiency	Takes weights and returns efficiency of weighting. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/03_diags.html#the-weighting-efficiency
scheme_from_dict	Turns a dict of dicts into a Rim scheme. Keys of the dict are column names and the values are distributions. These are normalized.
scheme_from_df	Creates a Rim scheme from a dataframe from specified weighting columns and frequency column. Useful when working with census data.
Rim class	Useful for creating more complex weighting schemes, for example when weighting subregions or groups that require filters. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class
WeightEngine class	Useful for more specialised manipulation of the weighting process.
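
The table notes that distributions passed to scheme_from_dict are normalized. A minimal sketch of what such normalization could look like is shown below, so that targets can be given as raw counts or as percentages. The helper name `normalize_targets` is hypothetical, and rescaling to a total of 100 is an assumption made for illustration; Weightipy may use a different scale internally.

```python
def normalize_targets(targets):
    """Rescale each column's distribution so its values sum to 100."""
    normalized = {}
    for column, distribution in targets.items():
        total = sum(distribution.values())
        if total <= 0:
            raise ValueError(f"targets for {column!r} must sum to a positive number")
        normalized[column] = {
            category: 100.0 * value / total
            for category, value in distribution.items()
        }
    return normalized


# Hypothetical targets given as raw counts rather than percentages.
raw_targets = {
    "gender": {"Male": 490, "Female": 510},
    "agecat": {"18-34": 1, "35+": 3},
}
normalized = normalize_targets(raw_targets)
```

With this approach, only the proportions between categories matter, not the absolute numbers supplied.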

Planned features

More utility functions to simplify the weighting process
More performance improvements, in order to better support batch weighting of many datasets
Support for multithreaded weighting (possibly using Polars)
Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs
Far future: Support for more weighting algorithms

Contributing

The test suite for Weightipy can be run with the command:

python3 -m pytest tests

When developing a specific aspect of Weightipy, it may be quicker to run only the relevant tests (e.g. for the Rim class):

python3 -m unittest tests.test_rim

We welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.

Quantipy

Origins

Quantipy was conceived and instigated by Gary Nelson: http://www.datasmoothie.com

Contributors on Quantipy

Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin Müller: https://yougov.co.uk
Datasmoothie’s Birgir Hrafn Sigurðsson and Geir Freysson: http://www.datasmoothie.com

Project details

Release history

This version

0.3.3

Jul 12, 2024

0.3.2

Jun 11, 2024

0.3.1

Jun 11, 2024

0.3.0

Jun 5, 2023

0.2.0

Jun 3, 2023

0.1.2

Jun 3, 2023

0.1.1

Jun 3, 2023

0.1.0

Mar 27, 2022

0.0.2

Mar 26, 2022

0.0.1

Mar 26, 2022

Download files

Source Distribution

weightipy-0.3.3.tar.gz (20.5 kB)

Uploaded Jul 12, 2024 Source

Built Distribution

weightipy-0.3.3-py3-none-any.whl (15.5 kB)

Uploaded Jul 12, 2024 Python 3

File details

Details for the file weightipy-0.3.3.tar.gz.

File metadata

Download URL: weightipy-0.3.3.tar.gz
Upload date: Jul 12, 2024
Size: 20.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for weightipy-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`0020603643974155fa637552ac708fae154f0013472a842405831f6d454eb62b`
MD5	`d5305c99cab7a55b2b796d92c7df6dda`
BLAKE2b-256	`27c9d4e91ebdbe3b92630d6dd9cbe1aefd5797946156ba7a7112c094d5b8b6e7`

File details

Details for the file weightipy-0.3.3-py3-none-any.whl.

File metadata

Download URL: weightipy-0.3.3-py3-none-any.whl
Upload date: Jul 12, 2024
Size: 15.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for weightipy-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5b51099a5b309e25cf2176c69b0edea049180a190dcf79eb39107c3a499c2d35`
MD5	`867c54ee4fb4cbccf20e6470c374f230`
BLAKE2b-256	`ebdfa2b082ff21de32fa8b8a578e65f392bfcbf66b673c14dc23d5b87d96242a`
