Surveyweights

Apply Census weighting to survey data.

Example Usage

from surveyweights import run_weighting_scheme, run_weighting_iteration

# Define what to weigh on
weigh_on = ['age', 'education', 'gender', 'income', 'race', 'urban_rural', 'vote2016']

# Run weighting
output = run_weighting_scheme(survey_data, iters=25, weigh_on=weigh_on)

# Get data back with weight column
survey_data = output['final_df']

# See balance of weights 
run_weighting_iteration(survey_data, weigh_on=weigh_on)

# Look at unweighted outcome data
print(survey_data['outcome'].value_counts(normalize=True) * 100)

# Look at weighted outcome data
print(survey_data['outcome'].value_counts(normalize=True) * survey_data.groupby('outcome')['weight'].mean() * 100)

Debugging

Help! The percentages don't sum to 100%!

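If you subset the dataset, you also subset the weights, and they are no longer calibrated for the subset: the subset's weights generally no longer average 1, so the weighted percentages stop summing to 100%. To fix this, re-normalize the weights on the subset with normalize_weights: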

# Subset df (copy so the weight column can be reassigned safely)
subset_df = df[df[var] == subset].copy()

# Look at weighted data (will be wrong and will not sum to 100%!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)

# Normalize weights within the subset
subset_df['weight'] = normalize_weights(subset_df['weight'])

# Look at weighted data (it is now fixed and still representative!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)
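
To see why re-normalizing helps, here is a minimal pandas sketch on made-up data. The helper rescale_to_mean_one is hypothetical: it only assumes that normalizing means rescaling the weights so they average 1, and it is not the surveyweights implementation of normalize_weights. The column names and values are invented for illustration.

```python
import pandas as pd

def rescale_to_mean_one(weights):
    """Hypothetical stand-in for normalization: rescale weights so their mean is 1."""
    return weights / weights.mean()

def weighted_percentages(frame, column):
    """Same formula as above: share of rows times mean weight per category, times 100."""
    shares = frame[column].value_counts(normalize=True)
    mean_weights = frame.groupby(column)['weight'].mean()
    return shares * mean_weights * 100

# Made-up data whose weights average 1 over the full sample
df = pd.DataFrame({
    'region': ['north', 'north', 'north', 'south', 'south', 'south'],
    'outcome': ['yes', 'no', 'yes', 'no', 'yes', 'no'],
    'weight': [0.5, 0.7, 0.6, 1.2, 1.4, 1.6],
})

# Subsetting keeps the old weights, whose mean in this subset is 0.6
north = df[df['region'] == 'north'].copy()
before = weighted_percentages(north, 'outcome')

# Rescale within the subset so the weights average 1 again
north['weight'] = rescale_to_mean_one(north['weight'])
after = weighted_percentages(north, 'outcome')

print(before.sum(), after.sum())
```

In this toy case the percentages total roughly 60 before rescaling and roughly 100 afterwards, because the total is simply 100 times the mean weight of the rows being tabulated.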

~

Help! The percentages still don't sum to 100% and I used normalize_weights!

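Another possible cause is missing values. Rows with an NA in the column you tabulate are left out of the counts, but their weights were still included when the weights were normalized. Try removing those rows and normalizing again.

subset_df = subset_df.dropna() # Remove NAs
subset_df['weight'] = normalize_weights(subset_df['weight']) # Normalize weights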

# Look at weighted data (it is now fixed and still representative!)
print(subset_df[var].value_counts(normalize=True) * subset_df.groupby(var)['weight'].mean() * 100)

Note that you may prefer to drop NAs just for particular columns of interest, or you may prefer to impute NAs with a particular value.
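
Both alternatives can be sketched with standard pandas calls. The column names, the values, and the 'Unknown' fill category below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing outcome and one missing income
df = pd.DataFrame({
    'outcome': ['yes', 'no', None, 'yes'],
    'income': [50000.0, np.nan, 42000.0, 61000.0],
    'weight': [0.9, 1.1, 1.0, 1.0],
})

# Option 1: drop rows only where the columns being analyzed are missing
complete_outcome = df.dropna(subset=['outcome', 'weight'])

# Option 2: keep every row and give missing answers an explicit category
imputed = df.copy()
imputed['outcome'] = imputed['outcome'].fillna('Unknown')

# A blanket dropna() keeps 2 rows, the targeted drop keeps 3, imputing keeps all 4
print(len(df.dropna()), len(complete_outcome), len(imputed))
```

Whichever option you choose, normalize the weights again afterwards on the rows that remain.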

~

Help! Re-running changes my results!

The results should be deterministic, so re-running the same code on the same data should not change them. However, the weights may not have fully converged, in which case weighting on the same variables in a different order can change the results. To fix this, try increasing the number of iterations and turning off early termination. Also keep in mind that fluctuations of about 0.1 percentage points can be perfectly normal, and larger fluctuations are possible with very small sample sizes.
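
The effect of variable order and iteration count can be illustrated with a generic raking (iterative proportional fitting) sketch. This assumes the weighting scheme behaves broadly like raking, which the description above suggests but does not confirm; rake and max_order_gap are hypothetical helpers, not surveyweights functions, and the sample and targets are made up.

```python
import pandas as pd

def rake(frame, targets, order, iters):
    """Generic raking sketch (hypothetical helper, not the surveyweights code)."""
    weights = pd.Series(1.0, index=frame.index)
    for _ in range(iters):
        for column in order:
            # Current weighted share of each category in this column
            current = weights.groupby(frame[column]).sum() / weights.sum()
            # Rescale rows so this column's weighted shares match its targets
            factors = pd.Series(targets[column]) / current
            weights = weights * frame[column].map(factors)
    return weights / weights.mean()

# Made-up sample and made-up population targets
sample = pd.DataFrame({
    'gender': ['m', 'm', 'm', 'm', 'f', 'f', 'f', 'f'],
    'age': ['young', 'young', 'young', 'old', 'young', 'young', 'old', 'old'],
})
targets = {
    'gender': {'m': 0.4, 'f': 0.6},
    'age': {'young': 0.3, 'old': 0.7},
}

def max_order_gap(iters):
    """Largest per-row weight difference between the two variable orders."""
    a = rake(sample, targets, ['gender', 'age'], iters)
    b = rake(sample, targets, ['age', 'gender'], iters)
    return (a - b).abs().max()

diff_one_pass = max_order_gap(1)
diff_many_passes = max_order_gap(50)
print(diff_one_pass, diff_many_passes)
```

After a single pass the two orders give visibly different weights, because only the last variable raked matches its targets exactly. After many passes both orders settle on the same weights, which is why more iterations make the results less sensitive to order.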

Installation

pip3 install surveyweights
