Python tools for working with data from the Healthcare Cost and Utilization Project (http://hcup-us.ahrq.gov).
Project description
PyHCUP is a Python library for parsing and importing data obtained from the Healthcare Cost and Utilization Project (http://hcup-us.ahrq.gov).
In particular, most of the data provided by HCUP is in fixed-width text (ASCII or *.asc) files, with metadata available in separate load files. This library is built to use the SAS-format load files (*.sas).
Example Usage
Load a datafile/loadfile combination.
import pyhcup

# specify where your data and loadfiles live
datafile = 'D:\\Users\\hcup\\sid\\NY_SID_2009_CORE.asc'
loadfile = 'D:\\Users\\hcup\\sid\\sasload\\NY_SID_2009_CORE.sas'

# pull basic meta from SAS loadfile
meta_df = pyhcup.meta_from_sas(loadfile)

# use meta knowledge to parse datafile into a pandas DataFrame
df = pyhcup.read(datafile, meta_df)
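To make the idea of a fixed-width file paired with separate metadata more concrete, here is a minimal standard-library sketch. It is not how PyHCUP is implemented; the `LAYOUT` column names and widths and the `parse_fixed_width` helper are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: (column name, width in characters). Real layouts come
# from the HCUP load files; these names and widths are made up for illustration.
LAYOUT: List[Tuple[str, int]] = [("AGE", 3), ("FEMALE", 2), ("ZIP", 5)]


def parse_fixed_width(line: str, layout: Sequence[Tuple[str, int]]) -> Dict[str, str]:
    """Slice one fixed-width record into a dict of whitespace-stripped strings."""
    record: Dict[str, str] = {}
    start = 0
    for name, width in layout:
        record[name] = line[start:start + width].strip()
        start += width
    return record


sample = " 45 110001"  # " 45" + " 1" + "10001"
print(parse_fixed_width(sample, LAYOUT))
# {'AGE': '45', 'FEMALE': '1', 'ZIP': '10001'}
```

The point of the sketch is only that a fixed-width record has no delimiters, so the column widths must come from somewhere else, which is the role the load file plays.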
Very large files that cannot be held in memory can be handled in two ways.
To import a subset of rows, such as for preliminary work or troubleshooting, specify nrows to read and/or skiprows to skip using pyhcup.read().
# optionally specify nrows and/or skiprows to handle larger files
df = pyhcup.read(datafile, meta_df, nrows=500000, skiprows=1000000)
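As a rough illustration of what row limits like these mean, the sketch below selects a window of lines from a file-like object using only the standard library. The `take_rows` helper is hypothetical, and it assumes that skiprows counts lines skipped from the start of the file before nrows lines are read; PyHCUP's own handling may differ.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, Optional


def take_rows(lines: Iterable[str], nrows: Optional[int] = None,
              skiprows: int = 0) -> Iterator[str]:
    """Skip the first `skiprows` lines, then yield at most `nrows` lines."""
    stop = None if nrows is None else skiprows + nrows
    return islice(lines, skiprows, stop)


# An in-memory stand-in for a large data file.
fake_file = io.StringIO("row0\nrow1\nrow2\nrow3\nrow4\n")
subset = [line.rstrip("\n") for line in take_rows(fake_file, nrows=2, skiprows=1)]
print(subset)  # ['row1', 'row2']
```

Because `islice` consumes the input lazily, lines outside the requested window are never collected into a list.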
To iterate through chunks of rows, such as for importing into a database, first use the metadata to build lists of column names and widths. Next, pass a chunksize to the pyhcup.read() function above to create a generator that yields chunks of a manageable size.
chunk_size = 500000
reader = pyhcup.read(datafile, meta_df, chunksize=chunk_size)
for df in reader:
    # do your business,
    # such as replacing sentinel values (below)
    # or inserting into a database with another Python library
    pass
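The chunked-iteration pattern itself can be sketched without PyHCUP. The generator below, a hypothetical `iter_chunks` helper, groups any iterable of rows into lists of at most `chunksize` items. It illustrates the general technique only and makes no claim about how pyhcup.read() builds its chunks.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable[str], chunksize: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `chunksize` rows."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


for chunk in iter_chunks(["a", "b", "c", "d", "e"], chunksize=2):
    print(len(chunk), chunk)
# 2 ['a', 'b']
# 2 ['c', 'd']
# 1 ['e']
```

Only one chunk is held in memory at a time, which is what makes this approach suitable for feeding a database insert loop.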
Whether you are pulling in all records or just a chunk of records, you can also replace all those pesky missing/invalid data placeholders from HCUP (this is less useful for generically parsing missing values for non-HCUP files).
# also, this bulldozes through all values in all columns with no per-column control
replaced = pyhcup.replace_sentinels(df)
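For readers who want a feel for what replacing sentinels across every column means, here is a small standard-library sketch. The `ASSUMED_SENTINELS` set and the `replace_sentinels_in_record` helper are hypothetical; the actual placeholder codes that pyhcup.replace_sentinels() recognizes are not listed in this description, so treat the values below as assumptions.

```python
from typing import Dict, FrozenSet, Optional

# Assumed placeholder codes, for illustration only. The exact set handled by
# pyhcup.replace_sentinels() is not documented here.
ASSUMED_SENTINELS: FrozenSet[str] = frozenset({"-9", "-8", "-99", "-88"})


def replace_sentinels_in_record(
    record: Dict[str, str],
    sentinels: FrozenSet[str] = ASSUMED_SENTINELS,
) -> Dict[str, Optional[str]]:
    """Return a copy of `record` with sentinel values replaced by None.

    The same sentinel set is applied to every column, with no per-column control.
    """
    return {
        key: (None if value.strip() in sentinels else value)
        for key, value in record.items()
    }


print(replace_sentinels_in_record({"AGE": "-9", "FEMALE": "1", "LOS": " -99"}))
# {'AGE': None, 'FEMALE': '1', 'LOS': None}
```

Applying one sentinel set to all columns is simple, but it would also blank out a column where a value such as -9 is legitimate, which is why the approach is less suited to non-HCUP files.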
Download files
Source Distribution
Built Distribution
File details
Details for the file PyHCUP-0.1.5.6.zip.
File metadata
- Download URL: PyHCUP-0.1.5.6.zip
- Upload date:
- Size: 7.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d90f732129e0cbd29be8afd2d2b7ec7586033823a3913ad8de656445defa386e |
| MD5 | ff8dbf3c44d760ac325be4b1e9e0b861 |
| BLAKE2b-256 | 58cf735ec49d81645347d75e976ebf639de906119982a01b1ee5b2297cf4c4a5 |
File details
Details for the file PyHCUP-0.1.5.6.win-amd64.exe.
File metadata
- Download URL: PyHCUP-0.1.5.6.win-amd64.exe
- Upload date:
- Size: 8.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4053be0e47fcd543e2256ecdc67629e280d62bcc2d00e4568054ad14e83b10d6 |
| MD5 | 7663766e696df5d4e5fc9c6989bc64ba |
| BLAKE2b-256 | 4ad174c890ca0abf0809a4607dcaac9fff93ba586003907522c3965429f6d655 |