
Python tools for working with data from the Healthcare Cost and Utilization Project (http://hcup-us.ahrq.gov).

Project description

PyHCUP is a Python library for parsing and importing data obtained from the Healthcare Cost and Utilization Project (http://hcup-us.ahrq.gov).

Most of the data provided by HCUP comes as fixed-width text files (ASCII, *.asc), with the metadata needed to parse them available in separate load files. This library is built to use the SAS-format load files (*.sas).
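To make the fixed-width idea concrete, here is a minimal sketch that uses pandas directly rather than PyHCUP. The column names, widths, and records are made up for illustration; in a real HCUP file this layout would come from the load file.

```python
import io

import pandas as pd

# Hypothetical layout: in practice, names and widths come from the load file.
names = ["KEY", "AGE", "DIED"]
widths = [5, 3, 2]

# Two made-up records, each exactly 10 characters wide.
raw = (
    "10001 45 0\n"
    "10002 -9 1\n"
)

# read_fwf slices each line by the given widths and strips the padding.
df = pd.read_fwf(io.StringIO(raw), widths=widths, names=names, header=None)
print(df)
```

Without the names and widths, the file is just an undifferentiated run of characters, which is why the load file matters.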

Example Usage

Load a datafile/loadfile combination.

import pyhcup

#specify where your data and loadfiles live
datafile = 'D:\\Users\\hcup\\sid\\NY_SID_2009_CORE.asc'
loadfile = 'D:\\Users\\hcup\\sid\\sasload\\NY_SID_2009_CORE.sas'

#pull basic meta from SAS loadfile
meta_df = pyhcup.meta_from_sas(loadfile)

#use meta knowledge to parse datafile into a pandas DataFrame
df = pyhcup.read(datafile, meta_df)

Files too large to hold in memory can be handled in two ways.

  1. To import a subset of rows, such as for preliminary work or troubleshooting, pass nrows (the number of rows to read) and/or skiprows (the number of rows to skip) to pyhcup.read().

#optionally specify nrows and/or skiprows to handle larger files
df = pyhcup.read(datafile, meta_df, nrows=500000, skiprows=1000000)

  2. To iterate through chunks of rows, such as for importing into a database, first use the metadata to build lists of column names and widths. Next, pass a chunksize to the pyhcup.read() function above to create a generator yielding manageably sized chunks.

chunk_size = 500000
reader = pyhcup.read(datafile, meta_df, chunksize=chunk_size)
for df in reader:
    # do your business,
    # such as replacing sentinel values (below)
    # or inserting into a database with another Python library
    pass  # placeholder so the loop body is valid Python
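As one possible shape for the database step, the sketch below appends each chunk to a SQLite table using pandas and the standard-library sqlite3 module. The helper name load_chunks, the table name, and the stand-in chunks are all hypothetical; with PyHCUP, the chunks would come from the chunked reader instead.

```python
import sqlite3

import pandas as pd


def load_chunks(chunks, conn, table):
    """Append each DataFrame chunk to a table; return the rows written."""
    total = 0
    for chunk in chunks:
        # if_exists="append" creates the table on the first chunk
        # and adds rows to it on every later chunk.
        chunk.to_sql(table, conn, if_exists="append", index=False)
        total += len(chunk)
    return total


# Stand-in chunks for illustration only.
demo_chunks = [
    pd.DataFrame({"KEY": [1, 2], "AGE": [45, 60]}),
    pd.DataFrame({"KEY": [3], "AGE": [31]}),
]

conn = sqlite3.connect(":memory:")
rows_written = load_chunks(demo_chunks, conn, "core")
count = conn.execute("SELECT COUNT(*) FROM core").fetchone()[0]
```

Only one chunk is held in memory at a time, so peak memory use is governed by the chunk size rather than the file size.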

Whether you are pulling in all records or just a chunk of records, you can also replace the placeholder (sentinel) values that HCUP uses for missing or invalid data. This is tailored to HCUP conventions, so it is less useful for generically parsing missing values in non-HCUP files.

# note: this is applied to all values in all columns, with no per-column control
replaced = pyhcup.replace_sentinels(df)
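If per-column control is needed, one alternative is to do the masking in plain pandas. The sketch below is not part of PyHCUP: the helper mask_codes is hypothetical, and the code -9 and the column names are used purely for illustration, so check the HCUP documentation for the codes that apply to each data element.

```python
import pandas as pd


def mask_codes(df, codes_by_column):
    """Return a copy of df with the listed codes set to NaN, per column."""
    out = df.copy()
    for column, codes in codes_by_column.items():
        # mask() replaces values with NaN wherever the condition is True.
        out[column] = out[column].mask(out[column].isin(codes))
    return out


# Made-up data; the codes shown are assumptions for illustration.
demo = pd.DataFrame({"AGE": [45, -9, 60], "LOS": [3, -9, -8]})

# Only AGE is cleaned; LOS is left untouched.
cleaned = mask_codes(demo, {"AGE": [-9]})
```

Integer columns that receive NaN are converted to floating point by pandas, which is worth keeping in mind before writing the result to a database.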


