Python parser and scraper for NHANES accelerometry and questionnaire
NHANES parser
https://wwwn.cdc.gov/nchs/nhanes/default.aspx
Features
- Scrape .DOC files to Pandas DataFrame
- Parse .XPT and mortality .DAT files and convert to Pandas DataFrame
- Parse accelerometry .XPT files for 2003-2006 and 2011-2014 surveys to NumPy arrays
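To give a feel for what converting .XPT files to a DataFrame involves, here is a minimal sketch that uses only `pandas.read_sas`, not the pynhanes API. The helper name `load_xpt_folder` is hypothetical, and it assumes the downloaded files carry an upper-case `.XPT` extension.

```python
from pathlib import Path

import pandas as pd


def load_xpt_folder(folder):
    """Read every .XPT file in `folder` into a dict of DataFrames.

    Hypothetical helper for illustration; keys are file names without
    the extension. The glob pattern is case-sensitive on Linux.
    """
    frames = {}
    for path in sorted(Path(folder).glob("*.XPT")):
        # SAS transport (XPORT) is the format NHANES uses for .XPT files.
        frames[path.stem] = pd.read_sas(path, format="xport")
    return frames
```

Calling `load_xpt_folder("XPT")` on a folder of downloaded files would return one DataFrame per file; an empty folder gives an empty dict.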
Installation
pip install pynhanes
Introduction
The NHANES website organizes its data hierarchically:
- Surveys (e.g. "2011-2012") ->
  - Components (e.g. "Questionnaire") ->
    - Categories (e.g. "Occupation") ->
      - Data variables (.DOC and .XPT files)
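As an illustration of this hierarchy (not the library's own data model), the four levels can be kept as columns of a flat Pandas DataFrame and filtered or grouped as needed. The survey, component and category strings follow the examples above; the variable names and the extra category are placeholders, not real NHANES fields.

```python
import pandas as pd

# Placeholder rows: (survey, component, category, variable).
# VAR_A, VAR_B, VAR_C and "Category X" are hypothetical names.
rows = [
    ("2011-2012", "Questionnaire", "Occupation", "VAR_A"),
    ("2011-2012", "Questionnaire", "Occupation", "VAR_B"),
    ("2011-2012", "Examination", "Category X", "VAR_C"),
    ("2013-2014", "Questionnaire", "Occupation", "VAR_A"),
]
catalog = pd.DataFrame(
    rows, columns=["survey", "component", "category", "variable"]
)

# All variables under one survey and category.
mask = (catalog["survey"] == "2011-2012") & (catalog["category"] == "Occupation")
variables = catalog.loc[mask, "variable"].tolist()

# Number of distinct variables available in each survey.
per_survey = catalog.groupby("survey")["variable"].nunique()

print(variables)
print(per_survey)
```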
It is convenient to have all data in a Pandas DataFrame or NumPy arrays for data analysis. This repo is here to help you do that.
NOTE: Please keep in mind that some NHANES data fields have been recoded since 1999. Make sure you have reviewed the NHANES website and understand how this code processes and changes the data. Pay particular attention to categorical data, since recoding may affect the results of your analysis.
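One possible way to spot a recoded categorical field is to tabulate its raw codes per survey and look for codes that are missing in some years. The sketch below uses synthetic data; the column names `survey` and `code` are assumptions made for the example and are not produced by the library.

```python
import pandas as pd

# Synthetic data: one categorical field observed in two surveys.
data = pd.DataFrame({
    "survey": ["1999-2000"] * 3 + ["2011-2012"] * 3,
    "code": [1, 2, 2, 1, 2, 3],
})

# Rows are codes, columns are surveys, cells are counts.
counts = pd.crosstab(data["code"], data["survey"])

# Codes that never occur in at least one survey deserve a closer look
# in the NHANES documentation before pooling the years.
suspicious = counts.index[(counts == 0).any(axis=1)].tolist()

print(counts)
print(suspicious)
```

A code that appears in only some surveys does not prove recoding, but it is a cheap signal that the field's documentation should be checked.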
Quick start
The NHANES Parser library offers tools to get the data into Pandas and NumPy:
- Create a working folder, e.g. `~/work/NHANES/`, copy the notebooks from the repository folder `scripts` to the working folder, and create subfolders `XPT`, `CSV`, `NPZ`
- Copy `nhanes_variables.json` from the repository folder `scripts` to your `CSV` subfolder
- Run `parse_codebook.ipynb` to scrape the hierarchical structure of the NHANES website to a Pandas DataFrame (saves data to the `CSV` subfolder)
- Use `pywgetxpt` to download the needed .XPT category files for all survey years (`pywgetxpt DEMO -o XPT` saves DEMO data to the `XPT` subfolder)
- Run `parse_userdata.ipynb` to get a list of selected data variable fields and convert .XPT and mortality .DAT files to a Pandas DataFrame (saves data to the `CSV` subfolder)
- Optionally run `parse_activity.ipynb` to convert 2003-2006 and 2011-2014 accelerometry data to NumPy arrays (saves data to the `NPZ` subfolder)
- Run `load_and_plot.ipynb` to see an example of how to load and handle the parsed data
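The folder layout from the first step can also be created from Python. This is a small sketch using only the standard library; the function name `make_workdir` is hypothetical, and copying the notebooks and `nhanes_variables.json` still has to be done separately.

```python
from pathlib import Path


def make_workdir(root):
    """Create `root` with XPT, CSV and NPZ subfolders and return their paths.

    Hypothetical helper; existing folders are left untouched.
    """
    root = Path(root).expanduser()
    paths = {name: root / name for name in ("XPT", "CSV", "NPZ")}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

For the example folder used above, the call would be `make_workdir("~/work/NHANES")`.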
* `parse_codebook.ipynb` produces a codebook DataFrame which is a handy tool to convert numerically-encoded values to human-readable labels
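A sketch of how such a codebook could be used to decode values is shown below. The layout of the real codebook DataFrame is not described here, so the columns `variable`, `code` and `label` are assumptions, and the code-to-label pairs are illustrative only; check the actual codebook for the true values.

```python
import pandas as pd

# Hypothetical codebook layout: one row per (variable, code) pair.
codebook = pd.DataFrame({
    "variable": ["SMQ020"] * 3,
    "code": [1, 2, 9],
    "label": ["Yes", "No", "Don't know"],  # illustrative labels
})

# Synthetic survey data with one missing answer.
data = pd.DataFrame({"SMQ020": [1, 2, 2, 9, None]})

# Build a code -> label lookup for one variable and apply it.
lookup = codebook[codebook["variable"] == "SMQ020"].set_index("code")["label"]
data["SMQ020_label"] = data["SMQ020"].map(lookup.to_dict())

print(data)
```

Codes that are absent from the lookup, including missing values, stay missing in the label column rather than being silently assigned a category.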
** `parse_userdata.ipynb` may combine several variables into a single variable. Normally you would want to do that if:
a) The same data field has alternative names in different survey years (but be careful, since the range of values may have changed; see the codebook): `SMD090`, `SMD650` - Avg # cigarettes/day during past 30 days
b) It is more reasonable to treat data fields together: `SMQ020`, `SMQ120`, `SMQ150` - Smoked at least 100 cigarettes in life / a pipe / cigars at least 20 times in life
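The two cases can be sketched in plain Pandas as follows. This is not necessarily how `parse_userdata.ipynb` combines variables; the data are synthetic, the output column names are hypothetical, and the sketch assumes that code 1 means "yes" for the three `SMQ` fields.

```python
import numpy as np
import pandas as pd

# Synthetic data: each respondent has at most one of SMD090 / SMD650.
df = pd.DataFrame({
    "SMD090": [10.0, np.nan, np.nan],
    "SMD650": [np.nan, 5.0, np.nan],
    "SMQ020": [1, 2, 2],
    "SMQ120": [2, 2, 1],
    "SMQ150": [2, 2, 2],
})

# a) Alternative names for the same field: take SMD090 where present,
#    otherwise fall back to SMD650.
df["cigs_per_day"] = df["SMD090"].combine_first(df["SMD650"])

# b) Related fields treated together: True if any of the three answers
#    is "yes" (assumed to be code 1).
df["ever_smoked_any"] = df[["SMQ020", "SMQ120", "SMQ150"]].eq(1).any(axis=1)

print(df[["cigs_per_day", "ever_smoked_any"]])
```

Before merging fields as in case a), compare their value ranges in the codebook, since the same numeric value may mean different things in different survey years.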