Skip to main content

Apply Census weighting to survey data

Project description

Surveyweights

Apply Census weighting to survey data.

Example Usage

from surveyweights import run_weighting_scheme, run_weighting_iteration

# Define what to weigh on
weigh_on = ['age', 'education', 'gender', 'income', 'race', 'urban_rural', 'vote2016']

# Run weighting
output = run_weighting_scheme(survey_data, iters=25, weigh_on=weigh_on)

# Get data back with weight column
survey_data = output['final_df']

# See balance of weights 
run_weighting_iteration(survey_data, weigh_on=weigh_on)

# Look at unweighted outcome data
print(survey_data['outcome'].value_counts(normalize=True) * 100)

# Look at weighted outcome data
print(survey_data['outcome'].value_counts(normalize=True) * survey_data.groupby('outcome')['weight'].mean() * 100)

Debugging

Help! The percentages don't sum to 100%!

If you subset the dataset, you subset the weights too and they will no longer work for the subsetted dataset. To fix this, use nomalize_weights:

# Subset df
subset_df = df[df[var] == subset]

# Look at weighted data (will be wrong and will not sum to 100%!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)

# Normalize weights
df['weight'] = nomalize_weights(df['weight'])

# Look at weighted data (it is now fixed and still representative!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)

~

Help! The percentages still don't sum to 100% and I used normalize_weights!

Another issue might be missing values. Try removing those.

df = df.dropna() # Remove NAs
df['weight'] = nomalize_weights(df['weight']) # Normalize weights

# Look at weighted data (it is now fixed and still representative!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)

Note that you may prefer to drop NAs just for particular columns of interest, or you may prefer to impute NAs with a particular value.

~

Help! Re-running changes my results!

The results should be deterministic, so re-running should not affect results. However, the weights still might be unstable and running the same weights in a different order could affect results. To fix this, try increasing the number of iterations and turning off early termination. Also, keep in mind that fluctuations of ~0.1 percentage point could be very normal - potentially a larger fluctuation for very small sample sizes.

Installation

pip3 install surveyweights

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

surveyweights-0.6.tar.gz (8.2 kB view hashes)

Uploaded source

Built Distribution

surveyweights-0.6-py3-none-any.whl (8.9 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page