HCSC is a Python package developed as part of an interview process.

COVID-19 Daily Cumulative Statistics

This project was developed as part of the interview process for the HCSC Machine Learning Engineer position.

As part of HCSC's COVID-19 response, the Data Science team needs to prepare daily/weekly updates of nationwide infection counts, organized by county. We use the numeric FIPS code (https://en.wikipedia.org/wiki/FIPS_county_code) rather than state and county name to serve our results.

For every FIPS code and date, the program generates: population, daily cases, daily deaths, cumulative cases to date, and cumulative death counts to date.

Citations

The data is supplied by The New York Times.

For details on the data extraction, please refer to https://github.com/nytimes/covid-19-data

Program Execution

The goal of the project is to generate daily/weekly updates of nationwide infection counts, organized by county. Below is the step-by-step process for executing this program.
The user installs the HCSC library with pip by running the command `pip install HCSC`. This opens up a GUI in which the user has to provide:

Output Folder Path

Data Files

This project uses two CSV files, one provided by The New York Times and one from US Census data. The path of the output file directory is given by the user.

Libraries

Below are the libraries used as part of this project.

pandas
numpy
os
datetime

Project Files & Folders

HCSC

This folder only contains the `__init__.py` file required to initialize the package and program.

config.py

This file holds the initial configuration settings, such as paths.

LICENSE

This is an MIT license

setup.py

This is the setup file required by Python to package and distribute the code. It contains the detailed description and specifications of the package.

data_process.py

This file has all the classes and functions required to pre-process the data.

data_clean.py

This file has all the classes and functions required to clean the data.

IO_path.py

This file has all the functions required to set the output and input paths.

merge.py

This file has all the functions required to merge the data into a final output on which we can summarize.

summary_stats.py

This file has all the classes and functions required to generate the summary output and write it to the desired location.

HCSC.py

This is the main file of the project. The user runs this file, which takes the input path and files and generates the summary table in the given output path.

Data Dictionary

`covid`

Variable	Class	Description
date	date	Date the counts were reported (ymd)
County	factor	US County Names
State	factor	US State Names
FIPS	factor	US FIPS code
Cases	integer	Covid Cases reported per day
Deaths	integer	Covid Deaths reported per day

`population`

Only the required columns are extracted from the US Census data.

Variable	Class	Description
STATE	factor	US State FIPS ID
County	factor	US County FIPS ID
POPESTIMATE2019	integer	US population estimate

Data Cleaning and Preprocessing

The following steps are used to clean and preprocess the data.

1. Reading the Data

The paths to the input files are given in config.py. These files are read with pandas for analysis.
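A minimal sketch of the reading step, not the package's actual code. The sample rows and column names are assumptions made for illustration, and an in-memory buffer stands in for a path from config.py. Reading the FIPS column as text keeps leading zeros intact.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV on disk; in the project the path would come from config.py.
covid_csv = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2020-03-01,Autauga,Alabama,01001,1,0\n"
)

# Read the FIPS column as text so the leading zero in "01001" is preserved.
covid = pd.read_csv(covid_csv, dtype={"fips": str})
print(covid.loc[0, "fips"])
```

Without the explicit `dtype`, pandas would parse the column as an integer and the code would lose its leading zero.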

2. Cleaning the Data Files

The Data_Process class has all the functions required to clean the data.

Below are the steps used to clean the data file.

Cleaning and Mapping Columns

A column dictionary maps the raw column names to standardized names, so that both data files use consistent column names.
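As an illustration of the dictionary-based mapping, here is a short sketch. The mapping `COLUMN_MAP` and the sample row are hypothetical; the package's own dictionary is not shown in this document.

```python
import pandas as pd

# Hypothetical mapping from raw column names to standardized ones.
COLUMN_MAP = {
    "date": "Date",
    "county": "County",
    "state": "State",
    "fips": "FIPS",
    "cases": "Cases",
    "deaths": "Deaths",
}

raw = pd.DataFrame({
    "date": ["2020-03-01"],
    "county": ["Cook"],
    "state": ["Illinois"],
    "fips": ["17031"],
    "cases": [1],
    "deaths": [0],
})

# rename() leaves any column that is not in the mapping untouched.
standardized = raw.rename(columns=COLUMN_MAP)
print(list(standardized.columns))
```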

Standardizing the Dates

As a best practice, date columns are standardized to a single format so that they can be compared, sorted, and joined reliably.
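One way to standardize a date column with pandas is sketched below. The ymd format string and the choice to coerce bad values to missing are assumptions, not confirmed behavior of the package.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020-03-02", "2020-03-01", "not a date"]})

# errors="coerce" turns unparseable values into NaT instead of raising an error.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")
print(df["Date"].isna().sum())
```

Coercing to `NaT` makes malformed rows easy to count and inspect before deciding whether to drop them.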

Sort by Dates

As a best practice, the data is sorted by its date column, which keeps the rows in chronological order for the cumulative counts.
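A small sketch of the sorting step. Sorting by FIPS before date is an assumption made here so that each county's rows end up contiguous and in chronological order; the sample values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "FIPS": ["17031", "17031", "01001"],
    "Date": pd.to_datetime(["2020-03-02", "2020-03-01", "2020-03-01"]),
    "Cases": [3, 1, 2],
})

# Sort by county code first, then by date, and rebuild a clean index.
df_sorted = df.sort_values(["FIPS", "Date"]).reset_index(drop=True)
print(df_sorted)
```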

Standardizing FIPS columns
1. Population: State_ID and County_ID are concatenated to generate FIPS in the population data, so that it can be joined with the daily covid data.
2. Covid: Empty and unknown FIPS IDs are filled with a default value to standardize the column.
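The two FIPS steps above could look roughly like the sketch below. The population figures are made-up illustrative values, and the default code `"00000"` is a hypothetical placeholder, since the document does not say which default value the package uses.

```python
import pandas as pd

# Illustrative rows only; the population figures are not real estimates.
population = pd.DataFrame({
    "STATE": [1, 17],
    "COUNTY": [1, 31],
    "POPESTIMATE2019": [1000, 2000],
})

# State codes are 2 digits and county codes are 3 digits, so zero-pad before joining.
population["FIPS"] = (
    population["STATE"].astype(str).str.zfill(2)
    + population["COUNTY"].astype(str).str.zfill(3)
)

covid = pd.DataFrame({"FIPS": ["17031", None, ""]})

# "00000" is a hypothetical default for rows with no usable FIPS code.
DEFAULT_FIPS = "00000"
missing = covid["FIPS"].isna() | (covid["FIPS"].str.strip() == "")
covid.loc[missing, "FIPS"] = DEFAULT_FIPS
print(population["FIPS"].tolist(), covid["FIPS"].tolist())
```

Keeping FIPS as a five-character string on both sides means the later join compares like with like.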

Merging the Data Frames

After preprocessing and cleaning, we obtain clean files that can be merged. merge.final_merge takes in two data frames and outputs one final data frame on which we can do our analysis.
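The sketch below shows one plausible shape for such a merge using pandas directly. It is not the body of merge.final_merge; the left join on a `FIPS` column and the sample values are assumptions.

```python
import pandas as pd

covid = pd.DataFrame({
    "FIPS": ["01001", "17031", "00000"],
    "Date": pd.to_datetime(["2020-03-01"] * 3),
    "Cases": [2, 1, 4],
    "Deaths": [0, 0, 1],
})
population = pd.DataFrame({
    "FIPS": ["01001", "17031"],
    "POPESTIMATE2019": [1000, 2000],  # illustrative values only
})

# A left join keeps every covid row; rows with no population match get NaN.
# validate="many_to_one" raises if a FIPS code appears twice in the population table.
merged = covid.merge(population, on="FIPS", how="left", validate="many_to_one")
print(merged)
```

A left join is used in this sketch so that rows carrying the default FIPS value are not silently dropped.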

Generating Summary File

The final step is to generate the result. summary_stats.SummaryStats.summarize writes the summary file as a CSV, because CSV files are easy to interpret and to use for custom analysis.
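A hedged sketch of how the cumulative columns could be derived and written out. It assumes the Cases and Deaths columns hold daily counts, as the data dictionary describes; the output column names are hypothetical, and this is not the actual SummaryStats.summarize implementation.

```python
import io

import pandas as pd

merged = pd.DataFrame({
    "FIPS": ["01001", "01001", "17031", "17031"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02"]),
    "Cases": [2, 3, 1, 4],
    "Deaths": [0, 1, 0, 0],
    "POPESTIMATE2019": [1000, 1000, 2000, 2000],  # illustrative values only
})

summary = merged.sort_values(["FIPS", "Date"]).copy()

# Running totals are computed within each county, in date order.
summary["Cumulative_Cases"] = summary.groupby("FIPS")["Cases"].cumsum()
summary["Cumulative_Deaths"] = summary.groupby("FIPS")["Deaths"].cumsum()

# Write to an in-memory buffer here; a real run would pass a file path instead.
buffer = io.StringIO()
summary.to_csv(buffer, index=False)
print(buffer.getvalue())
```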

Future Edition

Interactive Plots

Interactive plots, for example with pyplot, could be added to help the end user analyze the data more efficiently.

Release history

0.0.17 (this version): Oct 16, 2020
0.0.16: Oct 16, 2020
0.0.14: Oct 16, 2020
0.0.1: Oct 16, 2020

Download files

Source Distribution

HCSC-0.0.17.tar.gz (6.5 kB)

Uploaded Oct 16, 2020 Source

Built Distribution

HCSC-0.0.17-py3-none-any.whl (12.1 kB)

Uploaded Oct 16, 2020 Python 3

File details

Details for the file HCSC-0.0.17.tar.gz.

File metadata

Download URL: HCSC-0.0.17.tar.gz
Upload date: Oct 16, 2020
Size: 6.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.8

File hashes

Hashes for HCSC-0.0.17.tar.gz
Algorithm	Hash digest
SHA256	`5e17a502a8830fbc97039b5724c947b83d8c136bb128f4b9079029415c411cd5`
MD5	`064f69dc047c39e6a98b7d22dccf34a2`
BLAKE2b-256	`d797c67c0b43c330684c798bc719db142c88106573542d4d27ebcca5d1dbc0f2`

File details

Details for the file HCSC-0.0.17-py3-none-any.whl.

File metadata

Download URL: HCSC-0.0.17-py3-none-any.whl
Upload date: Oct 16, 2020
Size: 12.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.8

File hashes

Hashes for HCSC-0.0.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a3b5ddb42c99f9ff01e8201cff9b426b7a3fb29e2d4ad5cc964f413ee2a0b6d4`
MD5	`082dd680efe7f3ff99eb6e4c559a8c26`
BLAKE2b-256	`8900be085a8e3406317fa2f059a0464872f372626c80f9a50521019d8ffd6b64`
