
Tiger_Assessment is a Python package developed as part of an interview process.

Project description

COVID-19 Daily Cumulative Statistics

This project is part of the interview process for the HCSC Machine Learning Engineer position.

As part of HCSC's COVID-19 response, the Data Science team needs to prepare daily/weekly updates of nationwide infection counts, organized by county. We use the numeric FIPS code (https://en.wikipedia.org/wiki/FIPS_county_code) rather than state and county name to serve our results.

For every FIPS code and date, the program generates: population, daily cases, daily deaths, cumulative cases to date, and cumulative death counts to date.

Citations

The data is supplied by The New York Times.

For details on the data extraction, please refer to https://github.com/nytimes/covid-19-data

Program Execution

The goal of the project is to generate daily/weekly updates of nationwide infection counts, organized by county. The steps to run the program are described below.
The user installs the Tiger_Assessment library with pip by running the following command:
pip install Tiger-Assessment
This opens a GUI in which the user has to provide:

Output Folder Path

Data Files

This project uses two CSV files, one provided by The New York Times and one from US Census data. The path of the output file directory is given by the user.

Libraries

The following libraries are used in this project.

  • pandas
  • numpy
  • os
  • datetime

Project Files & Folders

  • Tiger_Assessment
  • This folder contains only the __init__.py file required to initialize the package and program.

  • config.py
  • This file holds the initial configuration settings, such as paths.

  • LICENSE
  • This is an MIT license

  • setup.py
  • This is the setup file required by Python to package and distribute the code. It contains the detailed description and specifications of the package.

  • data_process.py
  • This file has all the classes and functions required to pre-process the data.

  • data_clean.py
  • This file has all the classes and functions required to clean the data.

  • IO_path.py
  • This file has all the functions required to set the output and input paths.

  • merge.py
  • This file has all the functions required to merge the data into a final output on which we can summarize.

  • summary_stats.py
  • This file has all the classes and functions required to generate the summary output and write it to the desired location.

  • Tiger_Assessment.py
  • This is the main file of the project. The user runs this file, which takes the input path and files and generates the summary table in the given output path.

Data Dictionary

covid

Variable Class Description
date date Date of the report (ymd)
County factor US County Names
State factor US State Names
FIPS factor US FIPS code
Cases integer Covid Cases reported per day
Deaths integer Covid Deaths reported per day

population

We extract only the required columns from the US Census data.

Variable Class Description
STATE factor US State FIPS ID
County factor US County FIPS ID
POPESTIMATE2019 integer US population estimate

Data Cleaning and Preprocessing

The following steps are used to clean and preprocess the data.

1. Reading the Data

The paths to the input files are given in config.py. These files are read using pandas for analysis purposes.
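As an illustration of the reading step, the sketch below loads two small in-memory CSVs with pandas. It is not the package's actual code: the helper name read_inputs, the sample rows, and the assumption that the files use exactly the column names from the Data Dictionary are all hypothetical. The one design point it tries to show is reading identifier columns as strings, so that leading zeros in FIPS-style codes are not lost.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the two input files (made-up rows, not real data).
COVID_CSV = io.StringIO(
    "date,County,State,FIPS,Cases,Deaths\n"
    "2020-03-01,Example County,Example State,01001,1,0\n"
    "2020-03-02,Example County,Example State,01001,3,0\n"
)
POPULATION_CSV = io.StringIO(
    "STATE,County,POPESTIMATE2019\n"
    "01,001,50000\n"
)


def read_inputs(covid_source, population_source):
    """Read both inputs, keeping identifier columns as strings.

    Hypothetical helper: in the real project the sources would be the
    file paths configured in config.py.
    """
    covid = pd.read_csv(covid_source, dtype={"FIPS": str}, parse_dates=["date"])
    population = pd.read_csv(
        population_source,
        dtype={"STATE": str, "County": str},
        usecols=["STATE", "County", "POPESTIMATE2019"],  # only the required columns
    )
    return covid, population


covid_df, population_df = read_inputs(COVID_CSV, POPULATION_CSV)
```

Without the dtype arguments, pandas would infer integers for these columns and a code such as 01001 would be read as 1001.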

2. Cleaning the Data Files

The Data_Process class has all the functions required to clean the data.

Below are the steps used to clean the data file.

  1. Cleaning and Mapping Columns

I have used a column dictionary to map the raw column names to standardized column names.

  2. Standardizing the Dates

As a best practice, it is always recommended to standardize date columns.

  3. Sort by Dates

As a best practice, it is always recommended to sort data by date columns.

  4. Standardizing FIPS columns

    1. Population: Concatenating State_ID and County_ID to generate FIPS in the population data, so that it can be joined with the daily covid data.

    2. Covid: Filling the empty and unknown FIPS IDs with a default value to standardize the column.
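The four cleaning steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the Data_Process implementation: the function names, the lowercase raw headers in COLUMN_MAP, the sample rows, and the default value "00000" for unknown FIPS IDs are all hypothetical, since the source does not document them.

```python
import pandas as pd

# Hypothetical mapping from raw headers to the standardized names.
COLUMN_MAP = {
    "date": "Date", "county": "County", "state": "State",
    "fips": "FIPS", "cases": "Cases", "deaths": "Deaths",
}
UNKNOWN_FIPS = "00000"  # assumed default; the package's actual value is not documented


def clean_covid(raw):
    """Map columns, standardize dates, fill missing FIPS, and sort by date."""
    df = raw.rename(columns=COLUMN_MAP)                     # step 1: column mapping
    df["Date"] = pd.to_datetime(df["Date"])                 # step 2: standardize dates
    fips = df["FIPS"].fillna("").astype(str).str.strip()    # step 4 (covid): empty/unknown
    df["FIPS"] = fips.where(fips != "", UNKNOWN_FIPS).str.zfill(5)
    return df.sort_values(["Date", "FIPS"]).reset_index(drop=True)  # step 3: sort


def add_population_fips(raw):
    """Step 4 (population): 2-digit state ID + 3-digit county ID = 5-digit FIPS."""
    df = raw.copy()
    state = df["STATE"].astype(str).str.zfill(2)
    county = df["County"].astype(str).str.zfill(3)
    df["FIPS"] = state + county
    return df


# Made-up sample rows for illustration only.
raw_covid = pd.DataFrame({
    "date": ["2020-03-02", "2020-03-01", "2020-03-01"],
    "county": ["Example County", "Example County", "Unknown"],
    "state": ["Example State"] * 3,
    "fips": ["1001", "1001", None],
    "cases": [3, 1, 2],
    "deaths": [0, 0, 0],
})
raw_population = pd.DataFrame({"STATE": [1], "County": [1], "POPESTIMATE2019": [50000]})

clean = clean_covid(raw_covid)
population = add_population_fips(raw_population)
```

Zero-padding both sides to the same width is what makes the later join on FIPS line up, regardless of whether the IDs arrived as integers or strings.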


Merging the Data Frames

After preprocessing and cleaning the data, we obtain clean files that we can merge. merge.final_merge takes in two data frames and outputs one final data frame on which we can do our analysis.
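One plausible shape for such a merge is a left join on the FIPS column, sketched below. The function name merge_covid_population, the output column name Population, and the sample rows are assumptions for illustration; the actual behavior of merge.final_merge is not shown in this description.

```python
import pandas as pd


def merge_covid_population(covid, population):
    """Left-join population onto the covid rows by FIPS (hypothetical helper).

    A left join keeps every covid row, including rows whose FIPS has no
    population match (for example, the default "unknown" FIPS value).
    """
    pop = population[["FIPS", "POPESTIMATE2019"]].rename(
        columns={"POPESTIMATE2019": "Population"}
    )
    # validate raises if a FIPS appears more than once in the population table.
    return covid.merge(pop, on="FIPS", how="left", validate="many_to_one")


# Made-up sample rows for illustration only.
covid = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01"]),
    "FIPS": ["01001", "01001", "00000"],
    "Cases": [1, 3, 2],
    "Deaths": [0, 0, 0],
})
population = pd.DataFrame({"FIPS": ["01001"], "POPESTIMATE2019": [50000]})

merged = merge_covid_population(covid, population)
```

Rows without a matching population keep a missing value in the Population column instead of being dropped.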

Generating Summary File

The final step is to generate the result. summary_stats.SummaryStats.summarize writes the summary file as a CSV, because CSV is easy to interpret and to use for custom analysis.
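The summary step can be sketched as a per-FIPS running total. This is an illustrative sketch, not the SummaryStats code: the function name build_summary, the output column names, and the sample rows are hypothetical. It follows the Data Dictionary in treating Cases and Deaths as per-day counts; if the input counts were already cumulative, the daily figures would instead be derived with a per-FIPS difference.

```python
import pandas as pd


def build_summary(merged):
    """Per FIPS and date: population, daily counts, and cumulative counts."""
    df = merged.sort_values(["FIPS", "Date"]).copy()
    # Running totals restart for each FIPS code.
    df["Cumulative_Cases"] = df.groupby("FIPS")["Cases"].cumsum()
    df["Cumulative_Deaths"] = df.groupby("FIPS")["Deaths"].cumsum()
    df = df.rename(columns={"Cases": "Daily_Cases", "Deaths": "Daily_Deaths"})
    columns = [
        "FIPS", "Date", "Population",
        "Daily_Cases", "Daily_Deaths",
        "Cumulative_Cases", "Cumulative_Deaths",
    ]
    return df[columns].reset_index(drop=True)


# Made-up sample rows for illustration only.
merged = pd.DataFrame({
    "FIPS": ["01001", "01001", "01002"],
    "Date": pd.to_datetime(["2020-03-02", "2020-03-01", "2020-03-01"]),
    "Population": [50000, 50000, 20000],
    "Cases": [3, 1, 5],
    "Deaths": [1, 0, 2],
})

summary = build_summary(merged)
# With no path argument, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = summary.to_csv(index=False)
```

In the real program the CSV would be written into the output folder chosen by the user rather than kept in memory.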

Future Edition

Interactive Plots

We can include interactive plots using pyplot, which would help the end user analyze the data much more efficiently.
