Surveyweights
Apply Census weighting to survey data.
Example Usage
from surveyweights import run_weighting_scheme, run_weighting_iteration
# Define what to weigh on
weigh_on = ['age', 'education', 'gender', 'income', 'race', 'urban_rural', 'vote2016']
# Run weighting
output = run_weighting_scheme(survey_data, iters=25, weigh_on=weigh_on)
# Get data back with weight column
survey_data = output['final_df']
# See balance of weights
run_weighting_iteration(survey_data, weigh_on=weigh_on)
# Look at unweighted outcome data
print(survey_data['outcome'].value_counts(normalize=True) * 100)
# Look at weighted outcome data
print(survey_data['outcome'].value_counts(normalize=True) * survey_data.groupby('outcome')['weight'].mean() * 100)
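The weighted line above multiplies each category's unweighted share by the mean weight in that category, which gives the weighted share only when the weights average 1 across the rows in the calculation. As a cross-check, here is a minimal sketch that computes weighted percentages directly from weight totals. It uses plain Python rather than the surveyweights API, and `weighted_percentages` and the toy data are hypothetical names made up for illustration.

```python
from collections import defaultdict


def weighted_percentages(values, weights):
    """Return {category: weighted percentage}; the result sums to 100."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: 100 * total / grand_total for value, total in totals.items()}


# Hypothetical toy data: four respondents whose weights average 1.
outcomes = ["yes", "yes", "no", "no"]
weights = [0.5, 0.5, 1.5, 1.5]

result = weighted_percentages(outcomes, weights)
print(result)  # {'yes': 25.0, 'no': 75.0}
```

Because this version divides by the total weight, it sums to 100 even when the weights do not average 1, which makes it a handy sanity check against the one-liner.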
Debugging
Help! The percentages don't sum to 100%!
If you subset the dataset, you subset the weights too. Their mean is then generally no longer 1, so the weighted percentages no longer sum to 100%. To fix this, re-normalize the subset's weights with normalize_weights:
# Subset df (copy so the weight column can be reassigned safely)
subset_df = df[df[var] == subset].copy()
# Look at weighted data (will be wrong and will not sum to 100%!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)
# Normalize the subset's weights
subset_df['weight'] = normalize_weights(subset_df['weight'])
# Look at weighted data (it is now fixed and still representative!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)
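To see why subsetting breaks the totals, here is a minimal sketch of rescaling weights so that they average 1 again. This assumes that mean-1 rescaling is what normalization amounts to. The library's own implementation may differ, and `rescale_to_mean_one` is a hypothetical helper, not part of surveyweights.

```python
def rescale_to_mean_one(weights):
    """Rescale weights so they average 1 (assumed behaviour, not the library's code)."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


# Hypothetical subset: the rows kept after filtering carry weights averaging 2, not 1.
subset_weights = [1.0, 3.0, 2.0, 2.0]

rescaled = rescale_to_mean_one(subset_weights)
print(rescaled)  # [0.5, 1.5, 1.0, 1.0]
```

The relative sizes of the weights are unchanged, so the subset stays as representative as before. Only the overall scale is corrected.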
~
Help! The percentages still don't sum to 100% and I used normalize_weights!
Another possible cause is missing values. Try removing them, then normalize the weights again.
subset_df = subset_df.dropna() # Remove NAs
subset_df['weight'] = normalize_weights(subset_df['weight']) # Normalize weights
# Look at weighted data (it is now fixed and still representative!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)
Note that you may prefer to drop NAs just for particular columns of interest, or you may prefer to impute NAs with a particular value.
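As an illustration of dropping NAs only for the columns you care about, here is a minimal plain-Python sketch. Rows are dictionaries, `None` stands in for a missing value, and `drop_missing` is a hypothetical helper rather than anything provided by surveyweights.

```python
def drop_missing(rows, columns):
    """Keep only rows where none of the named columns is None."""
    return [row for row in rows if all(row[col] is not None for col in columns)]


# Hypothetical toy data with one missing income and one missing outcome.
rows = [
    {"outcome": "yes", "income": None, "weight": 0.5},
    {"outcome": None, "income": "high", "weight": 2.0},
    {"outcome": "no", "income": "low", "weight": 2.5},
]

# Only 'outcome' is analysed, so the row with a missing income is kept.
kept = drop_missing(rows, ["outcome"])

# Rescale the surviving weights so they average 1 again.
mean_weight = sum(row["weight"] for row in kept) / len(kept)
kept = [{**row, "weight": row["weight"] / mean_weight} for row in kept]

# Alternative: impute instead of dropping, here with an explicit placeholder.
imputed = [{**row, "income": row["income"] or "unknown"} for row in rows]
print(len(kept), len(imputed))  # 2 3
```

With pandas, the selective drop corresponds to `df.dropna(subset=[...])` and the imputation to `fillna`.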
~
Help! Re-running changes my results!
The results should be deterministic, so re-running with the same inputs should not change them. However, the weights might still be unstable, and running the same weights in a different order could affect results. To fix this, try increasing the number of iterations (iters) and turning off early termination. Also keep in mind that fluctuations of about 0.1 percentage points can be perfectly normal, and potentially larger for very small sample sizes.
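One way order can matter is the order in which variables are adjusted. The sketch below uses a generic raking loop (iterative proportional fitting) to show how that effect shrinks as iterations increase. It is not the surveyweights implementation, and `rake`, `weighted_share`, `max_gap`, and the toy data and targets are all hypothetical.

```python
from collections import defaultdict


def rake(rows, targets, iters):
    """Generic raking: rescale weights so each variable's weighted shares
    match its target shares, one variable at a time, for `iters` passes."""
    weights = [1.0] * len(rows)
    for _ in range(iters):
        for var, target in targets.items():
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            grand = sum(weights)
            weights = [
                w * target[row[var]] / (totals[row[var]] / grand)
                for row, w in zip(rows, weights)
            ]
    return weights


def weighted_share(rows, weights, var, level):
    """Weighted share of rows where `var` equals `level`."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == level)
    return hit / sum(weights)


# Hypothetical sample of 10 respondents and made-up population targets.
rows = (
    [{"gender": "f", "age": "young"}] * 3
    + [{"gender": "f", "age": "old"}] * 1
    + [{"gender": "m", "age": "young"}] * 2
    + [{"gender": "m", "age": "old"}] * 4
)
gender_target = {"f": 0.5, "m": 0.5}
age_target = {"young": 0.4, "old": 0.6}
gender_first = {"gender": gender_target, "age": age_target}
age_first = {"age": age_target, "gender": gender_target}


def max_gap(iters):
    """Largest per-row weight difference between the two variable orders."""
    a = rake(rows, gender_first, iters)
    b = rake(rows, age_first, iters)
    return max(abs(x - y) for x, y in zip(a, b))


print(max_gap(1))   # clearly non-zero: after one pass, order matters
print(max_gap(50))  # close to zero: both orders settle on the same weights
```

In this toy case a single pass leaves the weights dependent on variable order, while many passes bring both orders to the same answer. That is the intuition behind raising the iteration count when results look unstable.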
Installation
pip3 install surveyweights