Tiger_Assessment is a Python package developed as part of an interview process.
COVID-19 Daily Cumulative Statistics
This project is part of the interview process for the HCSC Machine Learning Engineer position.
As part of HCSC's COVID-19 response, the Data Science team needs to prepare daily/weekly updates of nationwide infection counts, organized by county. We use the numeric FIPS code (https://en.wikipedia.org/wiki/FIPS_county_code) rather than state and county name to serve our results.
For every FIPS code and date, the program generates: population, daily cases, daily deaths, cumulative cases to date, and cumulative death counts to date.
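As a rough illustration of the output described above, the sketch below derives cumulative counts from daily counts for each FIPS code using pandas. The column names and sample values are hypothetical and are not taken from the package itself.

```python
import pandas as pd

# Hypothetical daily counts for two counties (values are made up for illustration).
daily = pd.DataFrame({
    "fips": ["17031", "17031", "17031", "36061", "36061"],
    "date": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01", "2020-03-02"]
    ),
    "daily_cases": [1, 2, 4, 3, 5],
    "daily_deaths": [0, 1, 0, 0, 2],
})

# Sort so that the running totals follow calendar order within each county.
daily = daily.sort_values(["fips", "date"]).reset_index(drop=True)

# Running totals per FIPS code give the cumulative counts to date.
daily["cumulative_cases"] = daily.groupby("fips")["daily_cases"].cumsum()
daily["cumulative_deaths"] = daily.groupby("fips")["daily_deaths"].cumsum()

print(daily)
```

Grouping by FIPS code before taking the running sum keeps each county's totals separate.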
Citations
The data is supplied by The New York Times. For details on the data extraction, please refer to https://github.com/nytimes/covid-19-data
Program Execution
The goal of the project is to generate daily/weekly updates of nationwide infection counts, organized by county. The step-by-step process for executing the program is described below.
The user installs the Tiger_Assessment library from pip by running the following command:
pip install Tiger-Assessment
The program opens a GUI in which the user has to provide:
Output Folder Path
Data Files
This project uses two CSV files: one provided by The New York Times and one from US Census data. The path of the output file directory is given by the user.
Libraries
The following libraries are used in this project.
- pandas
- numpy
- os
- datetime
Project Files & Folders
- Tiger_Assessment: This folder only contains the __init__.py file required to initialize the package and program.
- config.py: This file holds the initial configuration settings, such as paths.
- LICENSE: This is an MIT license.
- setup.py: This is the setup file required by Python to package and distribute the code. It contains the detailed description and specifications.
- data_process.py: This file has all the classes and functions required to pre-process the data.
- data_clean.py: This file has all the classes and functions required to clean the data.
- IO_path.py: This file has all the functions required to set the input and output paths.
- merge.py: This file has all the functions required to merge the data into a final output that can be summarized.
- summary_stats.py: This file has all the classes and functions required to write the summary output to the desired location.
- Tiger_Assessment.py: This is the main file of the project. The user runs this file, which takes the input path and file and generates the summary table in the given output path.
Data Dictionary
covid
| Variable | Class | Description |
|---|---|---|
| date | date | Date of the report (ymd) |
| County | factor | US County Names |
| State | factor | US State Names |
| FIPS | factor | US FIPS code |
| Cases | integer | Covid Cases reported per day |
| Deaths | integer | Covid Deaths reported per day |
population
Only the required columns are extracted from the US Census data.
| Variable | Class | Description |
|---|---|---|
| STATE | factor | US State FIPS ID |
| County | factor | US County FIPS ID |
| POPESTIMATE2019 | integer | US population estimate |
Data Cleaning and Preprocessing
The following steps are used to clean and preprocess the data.
1. Reading the Data
The paths to the input files are given in config.py. These files are read using pandas for analysis.
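A minimal sketch of the reading step, assuming the covid file uses the column layout from the data dictionary above. The contents of config.py are not shown here, so an in-memory CSV with made-up rows stands in for the real file path.

```python
import io

import pandas as pd

# Stand-in for the real input file; the rows are made up for illustration.
covid_csv = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2020-03-01,Autauga,Alabama,01001,1,0\n"
    "2020-03-01,Unknown,Rhode Island,,2,0\n"
)

# Read FIPS as a string so leading zeros are preserved, and parse dates up front.
covid = pd.read_csv(covid_csv, dtype={"fips": str}, parse_dates=["date"])

print(covid.dtypes)
```

Reading FIPS as text rather than as a number matters because codes such as 01001 would otherwise lose their leading zero.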
2. Cleaning the Data Files
The Data_Process class has all the functions required to clean the data.
The steps used to clean the data files are:
- Cleaning and Mapping Columns
  A column dictionary is used to map the raw column names to standardized column names.
- Standardizing the Dates
  As a best practice, the date columns are converted to a standard format.
- Sort by Dates
  As a best practice, the data is sorted by the date columns.
- Standardizing FIPS columns
  - Population: State_ID and County_ID are concatenated to generate FIPS in the population data, so that it can be joined with the daily covid data.
  - Covid: Empty and unknown FIPS IDs are filled with a default value to standardize the column.
- Merging the Data Frames
  After preprocessing and cleaning, the resulting clean files can be merged. merge.final_merge takes in two data frames and outputs one final data frame on which the analysis is done.
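The cleaning and merging steps above can be sketched with plain pandas as follows. This is an illustrative approximation, not the code of Data_Process or merge.final_merge: the column dictionary, the default FIPS value "00000", the left join, and all sample values are assumptions.

```python
import pandas as pd

# Hypothetical raw inputs; all values are made up for illustration.
covid = pd.DataFrame({
    "date": ["2020-03-02", "2020-03-01", "2020-03-01"],
    "county": ["Autauga", "Autauga", "Unknown"],
    "state": ["Alabama", "Alabama", "Rhode Island"],
    "fips": ["01001", "01001", None],
    "cases": [2, 1, 3],
    "deaths": [0, 0, 1],
})
population = pd.DataFrame({
    "STATE": [1, 44],
    "COUNTY": [1, 7],
    "POPESTIMATE2019": [50000, 600000],
})

# Cleaning and mapping columns: a column dictionary (hypothetical) standardizes names.
COLUMN_MAP = {"date": "Date", "county": "County", "state": "State",
              "fips": "FIPS", "cases": "Cases", "deaths": "Deaths"}
covid = covid.rename(columns=COLUMN_MAP)

# Standardizing and sorting dates (stable sort keeps the order of same-day rows).
covid["Date"] = pd.to_datetime(covid["Date"])
covid = covid.sort_values("Date", kind="stable").reset_index(drop=True)

# Covid: fill missing FIPS with a default value ("00000" is an assumed placeholder).
covid["FIPS"] = covid["FIPS"].fillna("00000")

# Population: zero-pad the state (2 digits) and county (3 digits) IDs and concatenate.
population["FIPS"] = (
    population["STATE"].astype(str).str.zfill(2)
    + population["COUNTY"].astype(str).str.zfill(3)
)

# Merge on FIPS; a left join keeps every covid row even without a population match.
merged = covid.merge(population[["FIPS", "POPESTIMATE2019"]], on="FIPS", how="left")

print(merged)
```

With a left join, rows carrying the placeholder FIPS value stay in the result with a missing population, so they can be handled explicitly later instead of being dropped silently.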
Generating Summary File
The final step is to generate the result. summary_stats.SummaryStats.summarize writes the summary file as a CSV, because CSV files are easy to interpret and convenient for custom analysis.
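A minimal sketch of writing a summary table to a user-supplied output folder, assuming pandas is used for the export. The column names, the file name summary.csv, and the temporary directory standing in for the output folder are all hypothetical; the actual behavior of SummaryStats.summarize may differ.

```python
import os
import tempfile

import pandas as pd

# Hypothetical summary table; values are made up for illustration.
summary = pd.DataFrame({
    "FIPS": ["01001", "01001"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02"]),
    "Population": [50000, 50000],
    "Daily_Cases": [1, 2],
    "Daily_Deaths": [0, 0],
    "Cumulative_Cases": [1, 3],
    "Cumulative_Deaths": [0, 0],
})

# A temporary directory stands in for the output folder chosen by the user.
output_dir = tempfile.mkdtemp()
output_path = os.path.join(output_dir, "summary.csv")

# index=False avoids writing the DataFrame index as an extra column.
summary.to_csv(output_path, index=False)

# Reading the file back as a check; FIPS is read as text to keep leading zeros.
reloaded = pd.read_csv(output_path, dtype={"FIPS": str}, parse_dates=["Date"])
print(reloaded)
```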
Future Edition
Interactive Plots
Interactive plots could be added using pyplot, which would help the end user analyze the data more efficiently.