COVID-19 Rapid Visualization
CORVIS
CORVIS is a simple, flexible Python library designed to let small organizations and individuals easily analyze and visualize COVID-19 data. CORVIS has several key pieces of core functionality:
- Automated data acquisition. CORVIS automatically downloads data from two major public repositories: The COVID Tracking Project and the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. CORVIS minimizes download time by automatically storing the latest versions of each dataset locally, only updating these datasets when new data is available. CORVIS also joins these two datasets together into a single unified dataframe for easy analysis.
- Simple filtering and aggregation. CORVIS provides a simple, powerful function to filter and aggregate data at the county, state/province, and national/regional level.
- Straightforward data manipulation functions. CORVIS gives users a variety of functions to transform and align data. Quickly and easily apply moving averages, calculate per-capita cases, identify a common 'day zero' starting point across multiple areas, calculate daily changes, and more.
- Easy-to-use plotting tool. Quickly plot and compare data using a single plotting function.
- Standard data formats. All data is stored as pandas DataFrames, so advanced users can use their favorite tools and applications for deeper research.
Installing CORVIS
Install CORVIS quickly and easily using pip:
pip install corvis
Examples
Load, filter, analyze, and plot data with just a few lines of code!
from corvis.corvis import *
unifiedDataCORVIS = LoadCORVISData(verbose=True)
corvisDataToPlot = FilterCORVISData(unifiedDataCORVIS, country='US', aggregateBy=CORVISAggregations.COUNTRY, metric=CORVISMetrics.CONFIRMED, sourceData=CORVISDatasources.ALL, combineDatasources=CORVISCombineDatasourcesBy.MAX)
CreateCORVISPlot(corvisDataToPlot, graphTitle='United States: Confirmed Cases')
A more advanced example:
from corvis.corvis import *
# Load the unified dataset (downloads only when newer data is available)
unifiedDataCORVIS = LoadCORVISData(verbose=True)
# Group 1: New York and New Jersey
corvisDataToPlot = FilterCORVISData(unifiedDataCORVIS, country='US', state=['NY', 'NJ'], aggregateBy=CORVISAggregations.COUNTRY, metric=CORVISMetrics.CONFIRMED, sourceData=CORVISDatasources.ALL, combineDatasources=CORVISCombineDatasourcesBy.MAX)
# Group 2: statewide stay-at-home order, minus NY/NJ (a leading '!' excludes a state)
stayHomeCorvisDataToPlot = FilterCORVISData(unifiedDataCORVIS, country='US', state=['!NY', '!NJ', '!IA', '!NE', '!ND', '!SD', '!AR','!WY', '!UT', '!OK'], aggregateBy=CORVISAggregations.COUNTRY, metric=CORVISMetrics.CONFIRMED, sourceData=CORVISDatasources.ALL, combineDatasources=CORVISCombineDatasourcesBy.MAX)
# Group 3: stay-at-home order in some areas only
stayHomePartialCorvisDataToPlot = FilterCORVISData(unifiedDataCORVIS, country='US', state=['WY', 'UT', 'OK'], aggregateBy=CORVISAggregations.COUNTRY, metric=CORVISMetrics.CONFIRMED, sourceData=CORVISDatasources.ALL, combineDatasources=CORVISCombineDatasourcesBy.MAX)
# Group 4: no stay-at-home order
stayHomeNoneCorvisDataToPlot = FilterCORVISData(unifiedDataCORVIS, country='US', state=['IA', 'NE', 'ND', 'SD', 'AR'], aggregateBy=CORVISAggregations.COUNTRY, metric=CORVISMetrics.CONFIRMED, sourceData=CORVISDatasources.ALL, combineDatasources=CORVISCombineDatasourcesBy.MAX)
# Stack the four groups into one dataframe, in the same order as the legend below
corvisDataToPlot = corvisDataToPlot.append(stayHomeCorvisDataToPlot, ignore_index=True)
corvisDataToPlot = corvisDataToPlot.append(stayHomePartialCorvisDataToPlot, ignore_index=True)
corvisDataToPlot = corvisDataToPlot.append(stayHomeNoneCorvisDataToPlot, ignore_index=True)
# Convert to cases per 100,000 people
corvisDataToPlot = ComputeCORVISPerCapita(corvisDataToPlot, 100000)
# Smooth with two passes of a 14-day moving average (the "double moving average" in the title)
corvisDataToPlot = ComputeCORVISMovingAverage(corvisDataToPlot, 14)
corvisDataToPlot = ComputeCORVISMovingAverage(corvisDataToPlot, 14)
# Apply the daily-change transformation (two passes)
corvisDataToPlot = ComputeCORVISDailyChange(corvisDataToPlot)
corvisDataToPlot = ComputeCORVISDailyChange(corvisDataToPlot)
# One legend entry per group, then plot, starting the x-axis once a value exceeds 0.05
graphLegend = ['New York and New Jersey', 'Stay-at-home order, statewide (minus NY/NJ)', 'Stay-at-home order, some areas', 'No stay-at-home order']
CreateCORVISPlot(corvisDataToPlot, graphLegend, 'United States: Confirmed Cases – Per-Capita Rate of Change (14-day double moving average)', xLabel='Date', yLabel='Daily change per 100k people', startGraphAtThreshold=0.05)
Functions
LoadCORVISData()
LoadCORVISData() allows us to quickly and easily load the latest COVID-19 data directly from the server. Once loaded, it stores a copy of the data locally, along with the fingerprint for that data. On subsequent calls, it only downloads the data again if the server has updated its fingerprint, meaning there is new data.
LoadCORVISData() also performs some basic data cleaning, manipulation, and collation. It selects fields of primary interest to data researchers and discards others (such as ISO and FIPS codes). It also aligns data from different datasets to a single unified structure. Finally, it uses a lookup table to populate missing Population values in the dataset.
Parameters:
- datasourceToLoad: a single CORVISDatasources enumerated value. The datasource to load. Default is CORVISDatasources.ALL (load data from all available sources).
- dataPath: a raw string representing a file path. The location to which to save data files. Defaults to the home directory (~/). Note: all saved data files are hidden.
- forceDownload: a boolean value. When True, forces the application to download data from remote servers, bypassing the local saved data files. Default is False.
- verbose: a boolean value. Provides verbose output when True. Default is False.
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
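The fingerprint check described above can be pictured with a small sketch. This is only an illustration of the idea, not CORVIS's actual code: the function names, the fingerprint file, and the assumption that a fingerprint is a plain string are all hypothetical.

```python
from pathlib import Path


def needs_refresh(fingerprint_file: Path, remote_fingerprint: str,
                  force_download: bool = False) -> bool:
    """Return True when the locally saved data should be downloaded again."""
    if force_download:
        return True  # mirrors the idea behind forceDownload=True
    if not fingerprint_file.exists():
        return True  # nothing saved yet, so a download is required
    local_fingerprint = fingerprint_file.read_text().strip()
    # A changed fingerprint on the server means there is new data.
    return local_fingerprint != remote_fingerprint


def record_fingerprint(fingerprint_file: Path, remote_fingerprint: str) -> None:
    """Save the fingerprint next to the data after a successful download."""
    fingerprint_file.write_text(remote_fingerprint)
```

Under this sketch, a second call with an unchanged fingerprint skips the download entirely, which is where the time saving comes from.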
FilterCORVISData()
FilterCORVISData() allows users to quickly capture data for specific criteria, such as country, state, county, and metric.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
- country: a string or list of strings, used to filter by Country/Region. To exclude a country from a filter, add an exclamation point (!) to the beginning of the string.
- state: a string or list of strings, used to filter by State/Province. To exclude a state from a filter, add an exclamation point (!) to the beginning of the string.
- county: a string or list of strings, used to filter by County. To exclude a county from a filter, add an exclamation point (!) to the beginning of the string.
- region: an alias for country.
- province: an alias for state.
- aggregateBy: a single CORVISAggregations value. Determines how to aggregate your data: globally, nationally, by state, or not at all.
- metric: a single CORVISMetrics value or list of CORVISMetrics values. Determines which metrics to filter (e.g. CORVISMetrics.CONFIRMED, CORVISMetrics.RECOVERED).
- filterMissingPopulation: a boolean, defaults to True. Determines whether or not to filter out records that do not have a population associated with them (e.g. cruise ships, special departments). This is important when performing per-capita analysis.
- sourceData: a single CORVISDatasources value, defaults to CORVISDatasources.ALL. The datasource to filter on.
- allowStateCodesInFilters: a boolean. If True, then state codes (e.g. NY) will work when identifying US states. If False, then states must be spelled out (e.g. New York). Defaults to True.
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
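To make the exclamation-point convention concrete, here is a rough pure-Python sketch of how an include/exclude filter of this kind could behave. It is not taken from the CORVIS source: the helper name matches_filter is hypothetical, and how the library resolves a mix of included and excluded values is an assumption.

```python
def matches_filter(value, filters):
    """Return True if value passes an include/exclude filter ('!' = exclude)."""
    if isinstance(filters, str):
        filters = [filters]  # accept a single string as well as a list
    includes = {f for f in filters if not f.startswith('!')}
    excludes = {f[1:] for f in filters if f.startswith('!')}
    if value in excludes:
        return False
    # Assumption: with no explicit includes, everything not excluded passes.
    return not includes or value in includes


states = ['NY', 'NJ', 'IA', 'CA']
print([s for s in states if matches_filter(s, ['!NY', '!NJ'])])  # ['IA', 'CA']
print([s for s in states if matches_filter(s, ['NY', 'NJ'])])    # ['NY', 'NJ']
```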
TransformCORVISDataToDayZero()
We can use the TransformCORVISDataToDayZero() function to transform any of our CORVIS 'calendar day' datasets to a 'Day Zero' format. The thresholdValue parameter indicates the threshold in cases/deaths/recoveries that an area needs to meet or exceed in order to begin counting from day zero.
For example, if a dataframe of confirmed infections is fed into this function with a threshold of 200, then day zero for any given record will be the first day with 200 or more confirmed cases.
This function makes it easy to align different areas to a common starting point for an area-by-area comparison.
Note: use caution when using dataframes containing more than one metric. Dataframes with more than one metric will use the same threshold value for all metrics. As a result, a single location will likely identify a different day zero for each metric associated with that location.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
- thresholdValue: a single number. The minimum threshold value that must be met or exceeded to determine day zero.
- dropNAColumns: a boolean. If True, drops all trailing columns that contain only NA values. Defaults to True.
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
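The day-zero idea can be sketched on plain lists. This is a simplified illustration rather than the library's implementation; the helper name to_day_zero and the sample numbers are made up.

```python
def to_day_zero(values, threshold):
    """Drop leading values so index 0 is the first day at or above threshold."""
    for i, v in enumerate(values):
        if v >= threshold:
            return values[i:]
    return []  # the threshold was never reached


area_a = [50, 120, 210, 400, 800]
area_b = [10, 30, 90, 150, 260]
print(to_day_zero(area_a, 200))  # [210, 400, 800]
print(to_day_zero(area_b, 200))  # [260]
```

After the transformation, index 0 in each list is that area's own day zero, so the two areas can be compared position by position even though they crossed the threshold on different calendar days.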
ComputeCORVISMovingAverage()
This function allows us to compute a moving average over a period of days. This can be useful to eliminate noise or variances introduced to our data by poor reporting or day-of-week effects.
The built-in rolling() method in pandas does a great job of calculating the rolling average, with one caveat: it either handles the front end of our window as N/A or pushes the tail end of our window into the future, neither of which we really want. To get around this, the function adds dummy columns to the start of our dataframe, copies our first column values into those dummy columns, and drops them again after calculating the moving average.
This lets us have a bit of a lead-in on our front end. It isn't perfect; using this function on 'day zero' dataframes will have a slightly inaccurate start-up, but will quickly normalize once the moving average window is fully over our live data. This is an issue we can live with.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
- windowRange: the range of days the moving average should cover. Default = 7 (average data over one week).
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
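The padding trick is easier to see on a plain list. The sketch below repeats the first value in front of the data so that the trailing window is always full; it is a simplified stand-in for the dummy-column approach, and the function name padded_moving_average is hypothetical.

```python
def padded_moving_average(values, window=7):
    """Trailing moving average whose lead-in is padded with the first value."""
    if not values:
        return []
    # Prepend (window - 1) copies of the first value as a lead-in.
    padded = [values[0]] * (window - 1) + list(values)
    # One full window per original data point; the padding itself is never returned.
    return [sum(padded[i:i + window]) / window for i in range(len(values))]


print(padded_moving_average([3, 6, 9, 12], window=3))  # [3.0, 4.0, 6.0, 9.0]
```

The first few results lean on the repeated first value, which is the slightly inaccurate start-up mentioned above; from the point where the window sits entirely over real data, the result is an ordinary trailing average.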
ComputeCORVISDailyChange()
Computes the day-by-day change for a CORVIS dataframe as the difference from the previous day's total.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
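As a quick illustration of the calculation on a list of cumulative totals (the helper name daily_change is hypothetical, and how CORVIS treats the very first day is not stated here, so this sketch simply omits it):

```python
def daily_change(cumulative):
    """Difference between each day's total and the previous day's total."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


print(daily_change([100, 130, 170, 170, 240]))  # [30, 40, 0, 70]
```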
ComputeCORVISPerCapita()
Computes per-capita values for a CORVIS dataframe.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
- denominator: the "per" in "number of cases per". For example, a denominator of 1000 will return results for "number of cases per 1000 people". Defaults to 1.
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
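The arithmetic behind the denominator is simple enough to show directly. The function below is a hypothetical stand-alone helper with made-up numbers, not the library's code:

```python
def per_capita(count, population, denominator=1):
    """Scale a raw count to 'count per <denominator> people'."""
    return count * denominator / population


# 2,500 cases in a population of 1,000,000 is 250 cases per 100,000 people.
print(per_capita(2500, 1_000_000, 100000))  # 250.0
```

This also shows why records without a population need to be filtered out before a per-capita calculation: there is nothing to divide by.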
GetCORVISHighestValues()
Gets the numberToGet records containing the highest values in the given dataframe.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
- numberToGet: the number of highest value records to get.
Returns:
- a single pandas DataFrame containing a valid CORVIS dataset.
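A top-N selection of this kind might look like the sketch below. Note the assumption: the description above does not say which value a record is ranked by, so this example ranks by the most recent value in each series. The helper name and sample data are hypothetical.

```python
import heapq


def highest_by_latest(series_by_area, number_to_get):
    """Return the area names with the largest most-recent value."""
    # Iterating a dict yields its keys; rank each key by its series' last value.
    return heapq.nlargest(number_to_get, series_by_area,
                          key=lambda area: series_by_area[area][-1])


data = {'A': [1, 5, 9], 'B': [2, 8, 20], 'C': [3, 4, 6]}
print(highest_by_latest(data, 2))  # ['B', 'A']
```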
CreateCORVISPlot()
A convenience method for quickly creating a line graph from a CORVIS dataframe. This function will plot all records in the CORVIS dataframe, so it is strongly recommended the user filters and aggregates their data to their liking before using this plotting function.
Parameters:
- sourceCORVISDataframe: a CORVIS dataframe generated by this library.
- valuesForLegend: a single CORVISPlotValues enumerated value, or a list of strings: the values to use in the graph's legend. Defaults to None, which does not show any legend.
- graphTitle: a single string, the title of the graph.
- xLabel: a single string, the label for the x-axis. Optional; auto-generated by default.
- yLabel: a single string, the label for the y-axis. Optional; auto-generated by default.
- yScale: a single string, indicating what kind of scale to use on the y-axis. Main options are 'linear' (default) or 'log'. (Also supports any other axis scale supported by matplotlib.pyplot, but these two should be all you need.)
- startGraphAtThreshold: a number. If not None, the x-axis will begin the graph once a value greater than startGraphAtThreshold has been reached in the graph data. Default is None.
- saveToFile: a single string. Saves the generated plot to the file path/name provided. If not provided, the generated graph will be displayed in an interactive window.
Returns:
This function has no return value.
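The effect of startGraphAtThreshold can be pictured as trimming the leading, near-empty part of the graph. The sketch below is one possible reading, not the library's code: the helper name is hypothetical, and the choice to start as soon as any one series exceeds the threshold is an assumption.

```python
def first_index_above(series_list, threshold):
    """Index of the first day on which any series exceeds threshold, or None."""
    longest = max((len(s) for s in series_list), default=0)
    for day in range(longest):
        if any(day < len(s) and s[day] > threshold for s in series_list):
            return day
    return None


series = [[0.0, 0.01, 0.04, 0.2], [0.0, 0.0, 0.06, 0.3]]
start = first_index_above(series, 0.05)
print(start)                        # 2
print([s[start:] for s in series])  # [[0.04, 0.2], [0.06, 0.3]]
```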
Data Acquisition and Standardization
At present, we have two major sources of data: The COVID Tracking Project and the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. Each source provides its own tallies of daily data, at its own level of granularity and with its own set of metrics.
To facilitate analysis, CORVIS creates a standardized dataset from each of these sources.
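Because each source keeps its own tallies, the same day can carry two slightly different numbers. The examples earlier pass combineDatasources=CORVISCombineDatasourcesBy.MAX to settle this. A rough sketch of what combining by min, max, or mean could mean for two aligned series follows; the function name and the sample figures are made up, and this is not the library's implementation.

```python
from statistics import mean

# Maps the strategy names used by CORVISCombineDatasourcesBy to plain functions.
COMBINERS = {'min': min, 'max': max, 'mean': mean}


def combine_sources(values_a, values_b, how='max'):
    """Combine two date-aligned series value by value."""
    combine = COMBINERS[how]
    return [combine([a, b]) for a, b in zip(values_a, values_b)]


source_one = [100, 150, 210]
source_two = [98, 155, 210]
print(combine_sources(source_one, source_two, 'max'))  # [100, 155, 210]
print(combine_sources(source_one, source_two, 'min'))  # [98, 150, 210]
```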
Enumerated Types
Where practical, CORVIS uses enumerated types for parameter inputs. It is strongly recommended that you use these enumerated types wherever they are called for in the documentation. This helps to avoid confusion and invalid inputs, and helps protect your scripts from changes in future versions.
For example, instead of using FilterCORVISData( ... aggregateBy='country' ... ), use FilterCORVISData( ... aggregateBy=CORVISAggregations.COUNTRY ... ).
The enumerated types are as follows:
from enum import Enum

class CORVISDatasources(Enum):
    ALL = 'All Datasources'
    JHU = 'Johns Hopkins University'
    COVID_TRACKING = 'The Covid Tracking Project'
    CTP = 'The Covid Tracking Project'  # alias for COVID_TRACKING

class CORVISMetrics(Enum):
    ALL = 'all'
    CONFIRMED = 'Confirmed'
    DEATH = 'Death'
    RECOVERED = 'Recovered'
    NEGATIVE = 'Negative'
    HOSPITALIZED = 'Hospitalized'
    ICU = 'ICU'
    VENTILATOR = 'Ventilator'

class CORVISCombineDatasourcesBy(Enum):
    MIN = 'min'
    MAX = 'max'
    MEAN = 'mean'
    NONE = None

class CORVISAggregations(Enum):
    ALL = 'global'
    GLOBAL = 'global'
    COUNTRY = 'country'
    REGION = 'country'
    STATE = 'state'
    PROVINCE = 'state'
    COUNTY = None
    NONE = None

class CORVISPlotValues(Enum):
    SOURCE = 'Source'
    METRIC = 'Metric'
    COUNTRY = 'Country/Region'
    REGION = 'Country/Region'
    STATE = 'Province/State'
    PROVINCE = 'Province/State'
    COUNTY = 'County'
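Several of these names share a value (for example COUNTRY and REGION). In Python's standard Enum, a later name with a repeated value becomes an alias for the first one, so the two are interchangeable. The short demonstration below copies CORVISAggregations from the listing above so that it runs on its own:

```python
from enum import Enum


class CORVISAggregations(Enum):
    ALL = 'global'
    GLOBAL = 'global'     # alias of ALL
    COUNTRY = 'country'
    REGION = 'country'    # alias of COUNTRY
    STATE = 'state'
    PROVINCE = 'state'    # alias of STATE
    COUNTY = None
    NONE = None           # alias of COUNTY


print(CORVISAggregations.REGION is CORVISAggregations.COUNTRY)  # True
print(CORVISAggregations.REGION.name)                           # COUNTRY
# Iteration lists only the canonical (first-defined) names.
print([m.name for m in CORVISAggregations])  # ['ALL', 'COUNTRY', 'STATE', 'COUNTY']
```

One practical consequence is that aggregating by COUNTY and by NONE are the same member, which matches the "not at all" aggregation option described for FilterCORVISData().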