Python parser and scraper for NHANES accelerometry and questionnaire
NHANES parser
https://wwwn.cdc.gov/nchs/nhanes/default.aspx
Features
- Scrape .DOC files to Pandas DataFrame
- Parse .XPT and mortality .DAT files and convert to Pandas DataFrame
- Parse accelerometry .XPT files for 2003-2006 and 2011-2014 surveys to NumPy arrays
Installation
pip install pynhanes
Introduction
The NHANES website has a hierarchical organization of data:
- Surveys (e.g. "2011-2012") ->
  - Components (e.g. "Questionnaire") ->
    - Categories (e.g. "Occupation") ->
      - Data variables (.DOC and .XPT files)
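As a rough illustration of this hierarchy, the sketch below models it as nested dictionaries and flattens it into one row per data file. The file names `example.DOC` and `example.XPT` are placeholders, and `flatten` is a hypothetical helper, not part of pynhanes.

```python
# Illustrative subset of the hierarchy; file names are placeholders.
hierarchy = {
    "2011-2012": {                      # survey
        "Questionnaire": {              # component
            "Occupation": [             # category
                "example.DOC",
                "example.XPT",
            ],
        },
    },
}


def flatten(tree):
    """Yield one (survey, component, category, file) tuple per data file."""
    for survey, components in tree.items():
        for component, categories in components.items():
            for category, files in categories.items():
                for name in files:
                    yield (survey, component, category, name)


rows = list(flatten(hierarchy))
for row in rows:
    print(row)
```

A list of such tuples can be passed directly to a Pandas DataFrame constructor if a tabular view is preferred.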
It is convenient to have all data in Pandas DataFrames or NumPy arrays for data analysis. This repo is here to help you do that.
NOTE: Please keep in mind that some NHANES data fields have been recoded since 1999. Make sure you have reviewed the NHANES website and understand how the code processes and changes the data. Pay particular attention to categorical data, since recoding may affect data analysis results.
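One way to spot recoded categorical fields is to compare the code-to-label mapping of the same field across two survey cycles. The sketch below uses made-up mappings for a hypothetical field; `recoded_values` is an illustrative helper, not a pynhanes function.

```python
def recoded_values(labels_a, labels_b):
    """Return codes whose label differs or that exist in only one cycle."""
    codes = set(labels_a) | set(labels_b)
    return sorted(c for c in codes if labels_a.get(c) != labels_b.get(c))


# Made-up code-to-label mappings for one hypothetical field in two cycles.
cycle_old = {1: "Yes", 2: "No", 9: "Don't know"}
cycle_new = {1: "Yes", 2: "No", 7: "Refused", 9: "Don't know"}

changed = recoded_values(cycle_old, cycle_new)
print(changed)  # codes that need a manual look before pooling cycles
```

An empty result suggests the two cycles use the same categories; anything else is worth checking against the NHANES documentation before merging data.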
Quick start
NHANES Parser converts data to Pandas and NumPy format.
- Make sure you have the `wget` and `unzip` utilities installed.
  For Mac OS use `brew install wget` and `brew install unzip`.
  For Ubuntu use `apt install wget` and `apt install unzip`.
- Make sure you have 1 GB of free disk space for downloading data from the NHANES website.
  Optionally, make sure you have an additional 30 GB of free disk space if you plan to download and parse NHANES accelerometry data.
- Download the template scripts and subfolders from this GitHub repository (35 kB). Unzip them to make a working folder for downloading and parsing raw data from the NHANES website (you can use another name instead of `workfolder` if you wish).
wget https://github.com/timpyrkov/pynhanes/archive/master/scripts.zip
unzip -j scripts.zip 'pynhanes-master/scripts/*' 'pynhanes-master/pynhanes/wgetxpt.py' -d workfolder
- Go to your working folder, create subfolders, and move `nhanes_variables.json` to the `CSV` subfolder.
cd workfolder
mkdir XPT; mkdir NPZ; mkdir CSV; mv nhanes_variables.json CSV
- `wgetxpt.py` downloads the .XPT category files you need.
  For example, to download `DEMO` category files to the `XPT/` subfolder, run:
python wgetxpt.py DEMO -o XPT
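The setup steps above can also be checked from Python before running anything. The following is a minimal sketch using only the standard library; the helper names are hypothetical and not part of pynhanes, and the 1 GB threshold is taken from the requirement stated above.

```python
import shutil
from pathlib import Path


def missing_tools(tools=("wget", "unzip")):
    """Return the required command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_gb(path="."):
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def make_subfolders(workdir, names=("XPT", "NPZ", "CSV")):
    """Create the XPT, NPZ and CSV subfolders if they do not exist yet."""
    root = Path(workdir)
    created = [root / name for name in names]
    for folder in created:
        folder.mkdir(parents=True, exist_ok=True)
    return created
```

For example, `missing_tools()` returning an empty list and `free_gb() >= 1` would indicate that the basic prerequisites are met, after which `make_subfolders("workfolder")` mirrors the `mkdir` commands.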
- `parse_codebook.ipynb` scrapes the hierarchy of NHANES data fields and saves it to the Pandas-readable `CSV/nhanes_codebook.csv`.
- `parse_userdata.ipynb` parses .XPT and mortality .DAT files to the Pandas-readable `CSV/nhanes_userdata.csv`.
  You need to manually download mortality .DAT files from the NHANES website, otherwise parsing mortality is skipped.
  You need to manually edit `CSV/nhanes_variables.json` to add or remove NHANES data fields which should be parsed.
- `parse_activity.ipynb` converts accelerometry .XPT files from the 2003-2006 and 2011-2014 surveys (`PAX` category) and saves them to NumPy-readable files:
  - `NPZ/nhanes_steps.npz` - step counts for the 2005-2006 survey;
  - `NPZ/nhanes_counts.npz` - activity counts for the 2003-2004/2005-2006 surveys;
  - `NPZ/nhanes_triax.npz` - activity counts for the 2011-2012/2013-2014 surveys.

  You need approximately 30 GB of free space to store raw accelerometry .XPT files.
  Note that the 2011-2014 surveys have a status prediction for each minute: 0 - Missing, 1 - Wake wear, 2 - Sleep wear, 3 - Non wear, 4 - Unknown.
- `load_and_plot.ipynb` provides an example of loading and handling the parsed data now stored in the `CSV/` subfolder.
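The per-minute status codes of the 2011-2014 surveys can be summarized with a small lookup table. This sketch assumes the status values are available as a sequence of integers (a plain list here; a 1-D NumPy array would work the same way). How the status array is stored inside the .npz file is not specified here, so loading is left out, and `minutes_per_status` is a hypothetical helper.

```python
from collections import Counter

# Status codes as listed for the 2011-2014 surveys.
STATUS_LABELS = {
    0: "Missing",
    1: "Wake wear",
    2: "Sleep wear",
    3: "Non wear",
    4: "Unknown",
}


def minutes_per_status(status):
    """Count how many minutes fall into each status category."""
    counts = Counter(int(s) for s in status)
    return {label: counts.get(code, 0) for code, label in STATUS_LABELS.items()}


# Made-up sequence of seven minutes, for illustration only.
example = [1, 1, 2, 2, 2, 3, 0]
summary = minutes_per_status(example)
print(summary)
```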
* parse_codebook.ipynb produces a codebook DataFrame which is a handy tool to convert numerically-encoded values to human-readable labels
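The idea of decoding values with a codebook can be sketched as follows. The layout of `CSV/nhanes_codebook.csv` is not described here, so this example assumes a simplified in-memory form, a mapping from variable name to a code-to-label dictionary. Both the mapping and the `decode` helper are illustrative assumptions.

```python
# Assumed, simplified codebook: variable name -> {numeric code: label}.
codebook = {
    "SMQ020": {1: "Yes", 2: "No", 7: "Refused", 9: "Don't know"},
}


def decode(variable, values, codebook):
    """Replace numeric codes with labels; unknown codes are left unchanged."""
    labels = codebook.get(variable, {})
    return [labels.get(value, value) for value in values]


decoded = decode("SMQ020", [1, 2, 9, None], codebook)
print(decoded)
```

Leaving unknown codes unchanged keeps missing values and continuous measurements intact when a variable is only partly categorical.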
** parse_userdata.ipynb may combine several variables into a single variable. Normally you would want to do that if:
a) The same data field has alternative names in different survey years (but be careful, since the range of values may have changed; see the codebook):
SMD090, SMD650 - Avg # cigarettes/day during past 30 days
b) It is more reasonable to treat data fields together:
SMQ020, SMQ120, SMQ150 - Smoked at least 100 cigarettes in life / a pipe / cigars at least 20 times in life
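For case (a), combining alternative names amounts to taking the first non-missing value among them. The sketch below shows this on made-up records with plain Python; it is not the notebook's actual implementation, and `coalesce` is a hypothetical helper. With Pandas, `Series.combine_first` offers a similar operation on columns.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def coalesce(record, names):
    """Return the first non-missing value among the given variable names."""
    for name in names:
        value = record.get(name)
        if not is_missing(value):
            return value
    return None


# Made-up records: each participant has the field under at most one name.
records = [
    {"SMD090": 10.0, "SMD650": None},
    {"SMD090": float("nan"), "SMD650": 5.0},
    {"SMD090": None, "SMD650": None},
]
combined = [coalesce(r, ["SMD090", "SMD650"]) for r in records]
print(combined)
```

Before merging, it is still worth confirming in the codebook that both names use the same value range.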
Download files
Source Distribution
File details
Details for the file pynhanes-0.0.21.tar.gz.
File metadata
- Download URL: pynhanes-0.0.21.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab13a7698b0fc8bc98efeaea1b3aa20e5fd958e427c4c3e2c985e0830a9a804c |
| MD5 | 50bfceee6130a751f49e8d81c6003411 |
| BLAKE2b-256 | 1a4443509b9cd0027ab691db6afa3456e76ad2945105e80f00ba60600b374cd6 |