Python parser and scraper for NHANES accelerometry and questionnaire data

NHANES parser

https://wwwn.cdc.gov/nchs/nhanes/default.aspx

Features

Scrape .DOC files into a Pandas DataFrame
Parse .XPT and mortality .DAT files into Pandas DataFrames
Parse accelerometry .XPT files from the 2003-2006 and 2011-2014 surveys into NumPy arrays
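The mortality .DAT files are fixed-width text, so pandas' `read_fwf` can read them directly. The sketch below is not the pynhanes implementation. It assumes a layout for a subset of the public-use linked mortality file columns (SEQN, ELIGSTAT, MORTSTAT, PERMTH_INT, PERMTH_EXM). Those positions are an assumption, so check them against the read-in program CDC publishes with each mortality release. `read_mortality_dat` and `SAMPLE` are hypothetical names, and the two records in `SAMPLE` are synthetic.

```python
import io

import pandas as pd

# ASSUMED subset of the public-use linked mortality file layout:
# (column name, first position, last position), 1-based and inclusive.
# Verify against the read-in program CDC ships with the mortality files.
MORTALITY_LAYOUT = [
    ("SEQN", 1, 6),
    ("ELIGSTAT", 15, 15),
    ("MORTSTAT", 16, 16),
    ("PERMTH_INT", 43, 45),
    ("PERMTH_EXM", 46, 48),
]


def read_mortality_dat(source) -> pd.DataFrame:
    """Parse a fixed-width mortality .DAT file (path or file-like object)."""
    # pandas expects 0-based, half-open (start, end) column intervals.
    colspecs = [(first - 1, last) for _, first, last in MORTALITY_LAYOUT]
    names = [name for name, _, _ in MORTALITY_LAYOUT]
    # Blank fields become NaN; "." is also treated as missing (assumption).
    return pd.read_fwf(
        source, colspecs=colspecs, names=names, header=None, na_values=["."]
    )


# Two synthetic records padded to the assumed positions (not real data).
SAMPLE = (
    "62161".ljust(14) + "1" + "0" + " " * 26 + "120" + "118" + "\n"
    + "62162".ljust(14) + "2" + " " * 33 + "\n"
)

if __name__ == "__main__":
    print(read_mortality_dat(io.StringIO(SAMPLE)))
```

Because the layout lives in one list, adding further columns only means appending `(name, first, last)` entries.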

Installation

pip install pynhanes

Introduction

The NHANES website organizes data hierarchically:

Surveys (e.g. "2011-2012") ->
- Components (e.g. "Questionnaire") ->
  - Categories (e.g. "Occupation") ->
    - Data variables (.DOC and .XPT files)

For data analysis, however, it is more convenient to have all the data in Pandas DataFrames or NumPy arrays. This repo helps you do that.
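The survey level of this hierarchy also appears in file names. Most .XPT files carry a letter suffix that identifies the two-year cycle, e.g. DEMO_G.XPT for 2011-2012. The helper below is a sketch and is not part of pynhanes. It only covers the cycles from 1999-2000 to 2017-2018, and some files do not follow the pattern, so check the NHANES data page for each file. `xpt_filename` is a hypothetical name.

```python
# Letter suffixes NHANES appends to most data file names, per two-year cycle.
# 1999-2000 files carry no suffix (e.g. DEMO.XPT).
CYCLE_SUFFIX = {
    "1999-2000": "",
    "2001-2002": "_B",
    "2003-2004": "_C",
    "2005-2006": "_D",
    "2007-2008": "_E",
    "2009-2010": "_F",
    "2011-2012": "_G",
    "2013-2014": "_H",
    "2015-2016": "_I",
    "2017-2018": "_J",
}


def xpt_filename(stem: str, survey: str) -> str:
    """Return the conventional .XPT file name for a file stem in a given survey."""
    try:
        suffix = CYCLE_SUFFIX[survey]
    except KeyError:
        raise ValueError(f"unknown survey cycle: {survey!r}") from None
    return f"{stem.upper()}{suffix}.XPT"
```

For example, `xpt_filename("DEMO", "2011-2012")` returns `"DEMO_G.XPT"`.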

NOTE: Keep in mind that some NHANES data fields have been recoded since 1999. Make sure you have reviewed the NHANES website and understand how the code processes and changes the data. Pay particular attention to categorical data, since recoding can affect the results of your analysis.
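Categorical questionnaire fields often also contain special response codes. For many items 7 means "Refused" and 9 means "Don't know", and items with wider ranges use longer codes such as 77/99 or 777/999. Leaving these codes in place can distort means and category counts. The sketch below is not part of pynhanes. It replaces such codes with NaN, and the codes are passed per column because they differ between variables, so take them from the codebook. `drop_special_codes` is a hypothetical name.

```python
import pandas as pd


def drop_special_codes(df: pd.DataFrame, special_codes: dict) -> pd.DataFrame:
    """Return a copy of df with the given per-column codes replaced by NaN.

    special_codes maps a column name to the codes (e.g. 7 = Refused,
    9 = Don't know) that should be treated as missing for that column.
    """
    out = df.copy()
    for column, codes in special_codes.items():
        # mask() sets values to NaN wherever the condition is True.
        out[column] = out[column].mask(out[column].isin(codes))
    return out
```

Usage: `drop_special_codes(df, {"SMQ020": [7, 9]})` leaves the original DataFrame untouched and returns a cleaned copy.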

Quick start

The NHANES parser library offers tools to get the data into Pandas and NumPy:

Create a working folder, e.g. ~/work/NHANES/, copy the notebooks from the repository's scripts folder into it, and create the subfolders XPT, CSV and NPZ
parse_codebook.ipynb* scrapes the hierarchical structure of the NHANES website into a Pandas DataFrame (by default, saves data in the CSV subfolder)
pywgetxpt can download .XPT category files for all survey years (by default, saves data in the XPT subfolder)
parse_userdata.ipynb** takes a list of selected data variable fields and converts .XPT and mortality .DAT files into a Pandas DataFrame (by default, saves data in the CSV subfolder)
parse_activity.ipynb converts 2003-2006 and 2011-2014 accelerometry data into NumPy arrays (by default, saves data in the NPZ subfolder)
load_and_plot.ipynb shows an example of how to load and handle the parsed data
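The reshaping behind an accelerometry step like parse_activity.ipynb can be sketched as follows. This is not the notebook's code. It assumes the long-format layout of the 2003-2006 PAXRAW files: one row per participant-minute, with columns SEQN, PAXN (minute number starting at 1) and PAXINTEN (activity count). It also assumes a 7-day grid of 1440 minutes per day. Check these names against the NHANES documentation, and note that the 2011-2014 files use a different layout. `minutes_to_array`, `save_npz` and the `seqn`/`counts` array keys are hypothetical names.

```python
from pathlib import Path

import numpy as np
import pandas as pd

MINUTES_PER_DAY = 1440
N_DAYS = 7  # assumed length of the recording grid


def minutes_to_array(df: pd.DataFrame, seqn_col="SEQN", minute_col="PAXN",
                     value_col="PAXINTEN"):
    """Reshape long-format minute data into a (participants, days, minutes) array.

    Returns the sorted participant IDs and a float array holding NaN wherever
    no observation exists. minute_col is assumed to be 1-based.
    """
    seqns = np.unique(df[seqn_col].to_numpy())  # sorted unique IDs
    n_minutes = N_DAYS * MINUTES_PER_DAY
    out = np.full((len(seqns), n_minutes), np.nan)
    # Row index: position of each record's SEQN in the sorted ID list.
    rows = np.searchsorted(seqns, df[seqn_col].to_numpy())
    # Column index: 0-based minute within the whole recording.
    cols = df[minute_col].to_numpy().astype(int) - 1
    keep = (cols >= 0) & (cols < n_minutes)  # drop minutes outside the grid
    out[rows[keep], cols[keep]] = df[value_col].to_numpy()[keep]
    return seqns, out.reshape(len(seqns), N_DAYS, MINUTES_PER_DAY)


def save_npz(workdir, name, seqns, counts):
    """Save the arrays to <workdir>/NPZ/<name>.npz, creating the folder if needed."""
    npz_dir = Path(workdir) / "NPZ"
    npz_dir.mkdir(parents=True, exist_ok=True)
    path = npz_dir / f"{name}.npz"
    np.savez_compressed(path, seqn=seqns, counts=counts)
    return path
```

Filling a preallocated NaN array keeps missing minutes explicit, so non-wear gaps stay distinguishable from zero counts.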

* parse_codebook.ipynb produces a codebook DataFrame, which is a handy tool for converting numerically encoded values into human-readable labels.
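The sketch below shows how such a codebook can be used. It assumes a long-format table with one row per (variable, code, label) triple. These column names are hypothetical and may not match the DataFrame parse_codebook.ipynb actually produces, so adapt them to the real output. The SMQ020 entries follow its usual Yes/No/Refused/Don't know coding. `decode` is a hypothetical name.

```python
import pandas as pd


def decode(data: pd.DataFrame, codebook: pd.DataFrame, variable: str) -> pd.Series:
    """Map the numeric codes of one variable to labels using a long-format codebook.

    Assumes hypothetical codebook columns "variable", "code" and "label".
    Codes that are missing from the codebook become NaN.
    """
    entries = codebook[codebook["variable"] == variable]
    mapping = dict(zip(entries["code"], entries["label"]))
    return data[variable].map(mapping)


# Illustrative codebook rows for one questionnaire variable.
codebook = pd.DataFrame(
    {
        "variable": ["SMQ020"] * 4,
        "code": [1, 2, 7, 9],
        "label": ["Yes", "No", "Refused", "Don't know"],
    }
)
```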

** parse_userdata.ipynb can combine several variables into a single variable. You would normally want to do this if:

a) The same data field has alternative names in different survey years (but be careful, since the range of values may have changed; see the codebook):

SMD090, SMD650 - Avg # cigarettes/day during past 30 days

b) It makes more sense to treat the data fields together:

SMQ020, SMQ120, SMQ150 - Smoked at least 100 cigarettes in life / a pipe / cigars at least 20 times in life
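Both cases can be sketched in plain pandas. This is only an illustration of the idea, not the logic used in parse_userdata.ipynb. For case a), `coalesce` takes the first non-missing value among the alternative column names. For case b), `any_yes` derives one flag that is 1 if any of the listed questions was answered "Yes" (assumed code 1). Both function names are hypothetical. Check the codebook first, since alternative names may use different value ranges. Also remove special codes such as Refused/Don't know beforehand, because `any_yes` counts them as answered.

```python
import pandas as pd


def coalesce(df: pd.DataFrame, columns: list) -> pd.Series:
    """Case a): first non-missing value across alternative names of one field."""
    present = [c for c in columns if c in df.columns]
    # Back-fill along each row so column 0 holds the first non-missing value.
    return df[present].bfill(axis=1).iloc[:, 0]


def any_yes(df: pd.DataFrame, columns: list, yes_code=1) -> pd.Series:
    """Case b): 1.0 if any column equals yes_code, 0.0 if at least one column
    was answered and none equals yes_code, NaN if all columns are missing."""
    present = df[[c for c in columns if c in df.columns]]
    answered = present.notna().any(axis=1)
    flag = present.eq(yes_code).any(axis=1).astype(float)
    return flag.where(answered)
```

Usage: `coalesce(df, ["SMD090", "SMD650"])` and `any_yes(df, ["SMQ020", "SMQ120", "SMQ150"])` each return one Series that can be stored as a new column.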

Release history

0.0.20 - Jul 10, 2024
0.0.19 - Nov 29, 2023
0.0.18 - Sep 10, 2023
0.0.17 - Sep 9, 2023 (this version)
0.0.16 - Feb 25, 2023
0.0.15 - Feb 20, 2023
0.0.14 - Feb 20, 2023
0.0.12 - Feb 19, 2023
0.0.11 - Feb 16, 2023
0.0.10 - Jan 8, 2023
0.0.9 - Jan 8, 2023
0.0.8 - Dec 9, 2022
0.0.7 - Dec 9, 2022
0.0.6 - Dec 9, 2022
0.0.5 - Dec 9, 2022
0.0.4 - Nov 29, 2022
0.0.3 - Nov 29, 2022
0.0.2 - Nov 29, 2022
0.0.1 - Nov 26, 2022

Download files

Source distribution

pynhanes-0.0.17.tar.gz (22.4 kB), uploaded Sep 9, 2023

Hashes for pynhanes-0.0.17.tar.gz

Algorithm	Hash digest
SHA256	`bcea5c0070798646d68f46f91ab913912cb0f9b5ad4306eddd76ed840aebc4ac`
MD5	`ba8322c7b5af95721161810c3bed1c2f`
BLAKE2b-256	`9cb211adc804ac5e46c186c1e9dd46422f924d56d1e3f644f412481be668a1e6`