HCSC is a Python package developed as part of an interview process.

Project description

COVID-19 Daily Cumulative Statistics

This project was developed as part of the interview process for the HCSC Machine Learning Engineer position.

As part of HCSC's COVID-19 response, the Data Science team needs to prepare daily/weekly updates of nationwide infection counts, organized by county. We use the numeric FIPS county code (https://en.wikipedia.org/wiki/FIPS_county_code) rather than state and county names to serve our results.

For every FIPS code and date, the program generates: population, daily cases, daily deaths, cumulative cases to date, and cumulative death counts to date.

Citations

The data is supplied by the New York Times.

For details on the data extraction, please refer to https://github.com/nytimes/covid-19-data

Program Execution

The goal of the project is to generate daily/weekly updates of nationwide infection counts, organized by county. The steps to execute the program are described below.

The user installs the HCSC library from pip by running the following command:

pip install HCSC

The program then opens a GUI in which the user has to provide the following:

  • Output Folder Path

Data Files

This project uses two CSV files, one provided by the New York Times and one from US Census data. The path of the output file directory is given by the user.

Libraries

Below are the libraries used in this project.

  • pandas
  • numpy
  • os
  • datetime

Project Files & Folders

  • HCSC
  • This folder contains only the __init__.py file required to initialize the package and program.

  • config.py
  • This file holds the initial configuration settings, such as paths.

  • LICENSE
  • The project is released under the MIT license.

  • setup.py
  • This is the setup file required by Python to package and distribute the code. It contains the detailed package description and specifications.

  • data_process.py
  • This file has all the classes and functions required to pre-process the data.

  • data_clean.py
  • This file has all the classes and functions required to clean the data.

  • IO_path.py
  • This file has all the functions required to set the input and output paths.

  • merge.py
  • This file has all the functions required to merge the data into a final data frame that can then be summarized.

  • summary_stats.py
  • This file has all the classes and functions required to generate the summary output and write it to the desired location.

  • HCSC.py
  • This is the main file of the project. The user runs this file, which takes the input path and files and generates the summary table in the given output path.

Data Dictionary

covid

Variable   Class     Description
date       date      Date of the report (ymd)
County     factor    US county name
State      factor    US state name
FIPS       factor    US FIPS code
Cases      integer   Covid cases reported per day
Deaths     integer   Covid deaths reported per day

population

Only the required columns are extracted from the US Census data.

Variable          Class     Description
STATE             factor    US state FIPS ID
County            factor    US county FIPS ID
POPESTIMATE2019   integer   US population estimate

Data Cleaning and Preprocessing

The following steps are used to clean and preprocess the data.

1. Reading the Data

The paths to the input files are given in config.py. These files are read using pandas for analysis purposes.
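As a rough illustration of the reading step, the sketch below loads two small in-memory CSVs with pandas. The function name read_inputs, the lowercase covid column names, the COUNTY and CTYNAME headers, and the sample rows (including the population figure) are assumptions made for this example and are not taken from the package. Reading the FIPS-related columns as strings keeps leading zeros intact.

```python
import io

import pandas as pd

# In-memory stand-ins for the two input files. The rows are illustrative only,
# and the population figure is made up.
COVID_CSV = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2020-03-01,Cook,Illinois,17031,2,0\n"
    "2020-03-02,Cook,Illinois,17031,3,1\n"
)
POPULATION_CSV = io.StringIO(
    "STATE,COUNTY,CTYNAME,POPESTIMATE2019\n"
    "17,031,Cook County,1000\n"
)


def read_inputs(covid_source, population_source):
    """Hypothetical helper: read both inputs, keeping ID columns as text."""
    covid = pd.read_csv(covid_source, dtype={"fips": str})
    # Only the columns needed downstream are loaded from the population file.
    population = pd.read_csv(
        population_source,
        usecols=["STATE", "COUNTY", "POPESTIMATE2019"],
        dtype={"STATE": str, "COUNTY": str},
    )
    return covid, population


covid_df, population_df = read_inputs(COVID_CSV, POPULATION_CSV)
```

In the package itself the sources would be the file paths from config.py rather than in-memory buffers; pandas accepts either.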

2. Cleaning the Data Files

The Data_Process class has the functions required to clean the data.

Below are the steps used to clean the data files.

  1. Cleaning and Mapping Columns

A column dictionary maps the raw column names to standardized names, so that both files use consistent column names.

  2. Standardizing the Dates

The date columns are converted to a single, consistent date format so that records can be compared and joined reliably.

  3. Sorting by Date

The data is sorted by the date column so that daily and cumulative figures follow chronological order.

  4. Standardizing the FIPS Columns

    1. Population: State_ID and County_ID are concatenated to generate the FIPS code in the population data, so that it can be joined with the daily covid data.

    2. Covid: Empty and unknown FIPS IDs are filled with a default value to standardize the column.
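The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the Data_Process implementation: the names COLUMN_MAP, UNKNOWN_FIPS, clean_covid and add_population_fips, the placeholder value "00000", and the sample rows are all assumptions. Only missing FIPS values are filled here; handling of other "unknown" markers is left out.

```python
import pandas as pd

# Hypothetical mapping from raw column names to standardized names.
COLUMN_MAP = {
    "date": "Date",
    "county": "County",
    "state": "State",
    "fips": "FIPS",
    "cases": "Cases",
    "deaths": "Deaths",
}
UNKNOWN_FIPS = "00000"  # assumed placeholder for missing FIPS codes


def clean_covid(df):
    """Rename columns, parse dates, fill missing FIPS codes, sort by date."""
    out = df.rename(columns=COLUMN_MAP)
    out["Date"] = pd.to_datetime(out["Date"])
    out["FIPS"] = out["FIPS"].fillna(UNKNOWN_FIPS)
    return out.sort_values("Date").reset_index(drop=True)


def add_population_fips(df):
    """Build a 5-character FIPS code from 2-digit state and 3-digit county IDs."""
    out = df.copy()
    state = out["STATE"].astype(str).str.zfill(2)
    county = out["COUNTY"].astype(str).str.zfill(3)
    out["FIPS"] = state + county
    return out


# Illustrative rows only; the population figure is made up.
raw_covid = pd.DataFrame(
    {
        "date": ["2020-03-02", "2020-03-01"],
        "county": ["Cook", "Unknown"],
        "state": ["Illinois", "Illinois"],
        "fips": ["17031", None],
        "cases": [3, 1],
        "deaths": [1, 0],
    }
)
raw_population = pd.DataFrame(
    {"STATE": [17], "COUNTY": [31], "POPESTIMATE2019": [1000]}
)

cleaned = clean_covid(raw_covid)
population = add_population_fips(raw_population)
```

Zero-padding before concatenation matters because state and county IDs that were read as integers lose their leading zeros, which would otherwise produce FIPS codes that fail to match during the join.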

Merging the Data Frames

After preprocessing and cleaning, we obtain clean data frames that can be merged. merge.final_merge takes in two data frames and outputs one final data frame on which the analysis is done.
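One plausible way to perform such a merge is a left join on the FIPS column, sketched below. The function name merge_frames, the choice of a left join, and the sample rows are assumptions for illustration; the source does not describe how merge.final_merge is implemented.

```python
import pandas as pd


def merge_frames(covid, population):
    """Hypothetical merge: attach the population estimate to each covid row."""
    return covid.merge(
        population[["FIPS", "POPESTIMATE2019"]],
        on="FIPS",
        how="left",              # keep every covid row, even without a match
        validate="many_to_one",  # each FIPS should appear once in population
    )


# Illustrative rows only; the population figure is made up.
covid = pd.DataFrame(
    {
        "FIPS": ["17031", "00000"],
        "Date": pd.to_datetime(["2020-03-01", "2020-03-01"]),
        "Cases": [2, 1],
        "Deaths": [0, 0],
    }
)
population = pd.DataFrame({"FIPS": ["17031"], "POPESTIMATE2019": [1000]})

merged = merge_frames(covid, population)
```

With a left join, rows carrying the placeholder FIPS value stay in the output with a missing population, so no reported cases are silently dropped.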

Generating Summary File

The final step is to generate the result. summary_stats.SummaryStats.summarize writes the summary file as a CSV, because CSV is easy to inspect and convenient for custom analysis.
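A minimal sketch of such a summary, assuming the merged data holds per-day counts as described in the data dictionary, is a running total per FIPS code. The function name summarize, the output column names, and the sample rows are assumptions and do not reflect the actual SummaryStats class.

```python
import pandas as pd


def summarize(merged):
    """Hypothetical summary: cumulative cases and deaths per FIPS code and date."""
    out = merged.sort_values(["FIPS", "Date"]).reset_index(drop=True)
    # Running totals are computed within each county, in date order.
    out["Cumulative_Cases"] = out.groupby("FIPS")["Cases"].cumsum()
    out["Cumulative_Deaths"] = out.groupby("FIPS")["Deaths"].cumsum()
    return out[
        [
            "FIPS",
            "Date",
            "POPESTIMATE2019",
            "Cases",
            "Deaths",
            "Cumulative_Cases",
            "Cumulative_Deaths",
        ]
    ]


# Illustrative rows only, deliberately out of date order; values are made up.
merged = pd.DataFrame(
    {
        "FIPS": ["17031", "17031", "17031", "06037"],
        "Date": pd.to_datetime(
            ["2020-03-02", "2020-03-01", "2020-03-03", "2020-03-01"]
        ),
        "POPESTIMATE2019": [1000, 1000, 1000, 2000],
        "Cases": [3, 2, 1, 4],
        "Deaths": [1, 0, 0, 0],
    }
)

summary = summarize(merged)
# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = summary.to_csv(index=False)
```

To write the file to the user-supplied output folder instead, a path can be passed to to_csv in place of collecting the text.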

Future Edition

Interactive Plots

Interactive plots could be added using pyplot, which would help the end user analyze the data more efficiently.
