An interface for visualizing and analysing the see19 dataset
see19 Guide
A dataset and interface for visualizing and analyzing the epidemiology of Coronavirus Disease 2019 aka COVID19 aka C19
Current with version 0.4.0
Analysis
Please read my various deep dives with see19 exploring different aspects of COVID19.
How Effective Is Social Distancing?
What Factors Are Correlated With COVID19 Fatality Rates?
Contents
1. Purpose
2. Getting Started
3. the Data
3.1 Data Sources
3.2 Dataset Characteristics
3.3 The Testset
3.4 Disclaimer
4. the CaseStudy Interface
4.1 Basics
4.2 Filtering
4.3 Smoothing
4.4 Available Factors
4.5 Additional Flags
4.6 RayStudy v BaseStudy
4.7 Chart Objects
5. compchart - Visualizing Regional Impacts
5.1 Daily Fatalities Comparison - Italy
5.2 Daily Fatalities Comparison - 10 Most Impacted Regions
5.3 Varying the Categories
6. compchart4D - Visualizing Factors in 4D
6.1 From 3D to 4D
6.2 More on the X-Axis
6.3 How Far Can We Take It?
7. heatmap - Visualizing with Color Maps
7.1 Count Category v Single Factor
7.2 Count Category v Multiple Factors
8. barcharts - Comparing Regional Factors
9. ScatterFlow for Large Sets
9.1 substrinscat - for Strindex Sub-Categories
9.2 scatterflow
1. Purpose
See19 is the single most comprehensive international COVID-19 dataset available.
Ease of use is paramount, so all data from all sources have been compiled into a single structure that is readily consumed and manipulated in the ubiquitous csv format.
Along with the root data, a module is included with analysis and visualization tools.
2. Getting Started
See19 is a dataset and a python package.
The dataset can be accessed directly here. Files are timestamped with creation date.
The package can be installed via pip.
pip install see19
3. the Data
3.1 Data Sources
3.2 Dataset Characteristics
3.3 The Testset
3.4 Disclaimer
The See19 dataset aggregates global data on COVID19 in various regions, as available data allows, and marries that data with available datasets on exogenous regional factors that might impact the epidemiology of the virus.
The dataset is compiled using Selenium, Django, SQLite, and Pandas.
COVID19 Data Characteristics:
- Cumulative Cases for each region on each date
- Cumulative Fatalities for each region on each date
- State / Provincial-level data available for:
- Australia
- Brazil
- Canada
- China
- Italy
- United States
- Country-level available for all other regions
Factor Data Characteristics available for most regions:
- Longitude / Latitude
- I just wrote a script that searched the region name on this website and pulled the coordinates from the resulting url
- Population
- Population demographic segmentation
- Land Density
- City Density (typically the density of the largest city in the region)
- Climate Characteristics including:
- Average daily temperature
- Average daily dewpoint temperature
- Average daily relative humidity (derived from temperature and dewpoint temperature)
- Total daily UV-B Radiation
- Air quality measures
- Historical Health Outcomes
- Travel Popularity
- Social Distancing Implementation
Updated each morning.
3.1 Data Sources
COVID Case, Fatality, and Testing Data:
- cases and deaths and tests
- Brazil Regional Data compiled via the great from Wesley Cota and team.
- Note: Brazil data was previously available directly from the federal government, however, the fulsome CSV was removed from the site and a new source was required.
- Italy Regional Data from the government github repo
- Note: Italian testing has two categories that complicate the data somewhat
- tamponi refers to swabs. Swabs have been recorded since very early on. There are generally multiple swabs per individual, whereas most test counts are one test per individual.
- casi_testati refers to the more standard one test per person. This metric was not reliably tracked before mid-April.
- For metrics prior to mid-April, see19 adjusts the tamponi counts by finding the average tamponi per casi_testati across all the data, then dividing the tamponi by that average to estimate casi_testati.
- cases and deaths
- tests
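The tamponi adjustment described in the Italy note can be sketched with pandas as follows. This is only an illustration under stated assumptions: the figures are made up, the column name tests_est is hypothetical, and the average is taken as the mean of the daily ratios, which may differ from what see19 actually does.

```python
import pandas as pd

# Hypothetical daily figures for one Italian region (not real data).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-10", "2020-04-11", "2020-04-20", "2020-04-21"]),
    "tamponi": [1000.0, 1200.0, 2000.0, 2400.0],
    "casi_testati": [float("nan"), float("nan"), 1000.0, 1200.0],
})

# Average number of swabs per person tested, over rows where both are known.
known = df.dropna(subset=["casi_testati"])
swabs_per_person = (known["tamponi"] / known["casi_testati"]).mean()

# Where casi_testati is missing, estimate it by dividing tamponi by that average.
df["tests_est"] = df["casi_testati"].fillna(df["tamponi"] / swabs_per_person)
```

Rows that already report casi_testati are left unchanged; only the earlier rows are estimated.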
Other Data:
- Longitude & Latitude
- I just wrote a script that searched each region name on this site
- Any errors were fixed manually
- Population, Demographics, and Density from SEDAC
- Matched to regional case data by name, often manually
- Climate Data from European Centre for Medium-Range Weather Forecasts
- Climate data pulled from nearest matching longitude & latitude coordinate in the dataset
- Air Quality Data from the World Air Quality Project
- Air quality data recorded at city-level, with limited number of cities available
- City data is aggregated to the regional or country-level
- So, where a region has multiple cities reporting AQ data, the region value is an aggregate of the cities
- Where a region has only a single city, that city represents the whole region
- Where a region has no cities, NADA
- Social Distancing Stringency Index and Policy Indicators via Oxford Covid Government Response Tracker
- Google Mobility Data
- Apple Mobility Index
- GDP Per Capita via the OECD and WorldBank
- utilizing real 2016 Purchasing Power Parity figures indexed to 2015 US dollars
- Causes of Death
- Travel Popularity
- An even messier hodgepodge of data pulled from the World Tourism Organization via indexmundi
- State/Provincial data were derived from the country-level and other various sources in an ad-hoc fashion
- Good travel data is surprisingly difficult to come by. There are a number of services that offer data on flight statistics, however, it is prohibitively expensive
3.2 Dataset Characteristics
With see19 installed, we can download the dataset via get_baseframe
import numpy as np
import pandas as pd
from see19 import get_baseframe
bf = get_baseframe()
The dataset is arranged such that each row is a unique entry for each region_id on each date.
All other columns are the value of that particular factor in that particular region on that particular date.
bf.head(3)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | genito | childbirth | perinatal | congenital | other | external | visitors | travel_year | gdp | gdp_year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 282 | 110 | ABR | Abruzzo | ITA | Italy | 2020-01-01 | NaN | NaN | NaN | ... | 442.0 | 1.0 | 16.0 | 19.0 | 384.0 | 2059 | 181458.0 | 2017.0 | 4.560860e+10 | 2016.0 |
1 | 282 | 110 | ABR | Abruzzo | ITA | Italy | 2020-01-02 | NaN | NaN | NaN | ... | 442.0 | 1.0 | 16.0 | 19.0 | 384.0 | 2059 | 181458.0 | 2017.0 | 4.560860e+10 | 2016.0 |
2 | 282 | 110 | ABR | Abruzzo | ITA | Italy | 2020-01-03 | NaN | NaN | NaN | ... | 442.0 | 1.0 | 16.0 | 19.0 | 384.0 | 2059 | 181458.0 | 2017.0 | 4.560860e+10 | 2016.0 |
3 rows × 132 columns
This could perhaps be more appropriately structured as a multi-index frame; however, I find such indexes cumbersome to work with.
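For readers who do prefer the multi-index layout, a flat frame of this shape converts readily. The sketch below uses a small made-up frame rather than the real baseframe, so the values are illustrative only.

```python
import pandas as pd

# Toy frame in the same long layout: one row per region_id and date.
flat = pd.DataFrame({
    "region_id": [282, 282, 75, 75],
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02"]),
    "cases": [1.0, 3.0, 10.0, 25.0],
})

# Multi-index alternative: (region_id, date) becomes the row index.
indexed = flat.set_index(["region_id", "date"]).sort_index()

# Selecting one region's series by its region_id level.
one_region_cases = indexed.xs(75, level="region_id")["cases"]

# reset_index() returns to the flat layout used by the baseframe.
round_trip = indexed.reset_index()
```

The same two calls, set_index and reset_index, should apply to the full baseframe, since region_id and date together identify each row.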
'There are {} unique regions in the dataset'.format(bf.region_id.unique().size)
'There are 325 unique regions in the dataset'
Australia, Brazil, Canada, China, Italy, and the US have state/provincial level data.
For example, regions within Italy and Brazil are as follows:
bf[bf.country.isin(['Italy', 'Brazil'])].region_name.unique()
array(['Abruzzo', 'Acre', 'Alagoas', 'Amapa', 'Amazonas', 'Bahia',
'Basilicata', 'Calabria', 'Campania', 'Ceara', 'Distrito Federal',
'Emilia-Romagna', 'Espirito Santo', 'Friuli Venezia Giulia',
'Goias', 'Lazio', 'Liguria', 'Lombardia', 'Maranhao', 'Marche',
'Mato Grosso', 'Mato Grosso Do Sul', 'Minas Gerais', 'Molise',
'P.A. Bolzano', 'P.A. Trento', 'Para', 'Paraiba', 'Parana',
'Pernambuco', 'Piaui', 'Piemonte', 'Puglia', 'Rio De Janeiro',
'Rio Grande Do Norte', 'Rio Grande Do Sul', 'Rondonia', 'Roraima',
'Santa Catarina', 'Sao Paulo', 'Sardegna', 'Sergipe', 'Sicilia',
'Tocantins', 'Toscana', 'Umbria', "Valle d'Aosta", 'Veneto'],
dtype=object)
'Each region has {} dates in the dataset'.format(bf.date.unique().size)
'Each region has 202 dates in the dataset'
"""Thus, there are {:,.0f} rows in the dataset, with one row for each unique `region_id`-`date` combination""" \
.format(bf.date.shape[0])
'Thus, there are 65,650 rows in the dataset, with one row for each unique `region_id`-`date` combination'
"""There are currently {} columns in the dataset, most of which are observable factors""".format(bf.columns.size)
'There are currently 132 columns in the dataset, most of which are observable factors'
The factors can be seen as split between two types:
- Time-static factors, i.e. do not change by the date.
- population, density, population demographic ranges, cause of death outcomes, travel popularity
- Time-dynamic factors, i.e. change with each date.
- fatalities, climate, pollution, mobility, and the Oxford stringency index
They can be found as follows:
ny = bf[bf.region_name == 'New York']
static = []
dynamic = []
# A column is dynamic if it takes more than one value across New York's dates
for col in ny.columns:
    if ny[col].unique().size > 1:
        dynamic.append(col)
    else:
        static.append(col)
bold = '\033[1m'
end = '\033[0m'
print('{}***STATIC***{}\n'.format(bold, end), static)
print('\n')
print('{}***DYNAMIC***{}\n'.format(bold, end), dynamic)
***STATIC***
['region_id', 'country_id', 'region_code', 'region_name', 'country_code', 'country', 'population', 'land_KM2', 'land_dens', 'city_KM2', 'city_dens', 'A00_04B', 'A05_09B', 'A10_14B', 'A15_19B', 'A20_24B', 'A25_29B', 'A30_34B', 'A35_39B', 'A40_44B', 'A45_49B', 'A50_54B', 'A55_59B', 'A60_64B', 'A65_69B', 'A70_74B', 'A75_79B', 'A80_84B', 'A09UNDERB', 'A14UNDERB', 'A19UNDERB', 'A24UNDERB', 'A29UNDERB', 'A34UNDERB', 'A65PLUSB', 'A70PLUSB', 'A75PLUSB', 'A80PLUSB', 'A85PLUSB', 'A05_19B', 'A05_24B', 'A05_29B', 'A05_34B', 'A15_24B', 'A15_29B', 'A15_34B', 'A20_29B', 'A20_34B', 'A35_54B', 'A40_54B', 'A45_54B', 'A35_64B', 'A40_64B', 'A45_64B', 'pm10', 'precipitation', 'wd', 'uvi', 'aqi', 'pol', 'mepaqi', 'pm1', 'e3', 'e4', 'h4', 'h5', 'transit_apple', 'walking_apple', 'year', 'neoplasms', 'blood', 'endo', 'mental', 'nervous', 'circul', 'infectious', 'respir', 'digest', 'skin', 'musculo', 'genito', 'childbirth', 'perinatal', 'congenital', 'other', 'external', 'visitors', 'travel_year', 'gdp', 'gdp_year']
***DYNAMIC***
['date', 'cases', 'deaths', 'tests', 'co', 'dew', 'humidity', 'no2', 'o3', 'pm25', 'pressure', 'so2', 'temperature', 'wind gust', 'wind speed', 'wind-gust', 'wind-speed', 'temp', 'dewpoint', 'uvb', 'rhum', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'e1', 'e2', 'h1', 'h2', 'h3', 'strindex', 'retail_n_rec', 'groc_n_pharm', 'parks', 'transit', 'workplaces', 'residential', 'driving_apple']
'The entire set has {:,.0f} different data points'.format(bf.size)
'The entire set has 8,665,800 different data points'
3.3 The Testset
A separate dataset, referred to as the testset, is housed in the see19 repo in the testset folder.
The testset will include new data (either additional factors or new regions) that has not yet been incorporated in the see19 interface. The goal is to integrate the new data into the interface over time. The testset will be updated concurrently with the main dataset on an ad hoc basis.
The existing see19 package is NOT compatible with the testset; however, you can download the testset via get_baseframe by setting test=True.
See the readme for additional data currently available in the testset.
bf_test = get_baseframe(test=True)
3.4 Disclaimer
I have said it before and it bears repeating: This is an imperfect dataset. Specific problems are highlighted here.
GENERAL ISSUES
- Not all factors have available measurements for each region or each date.
- These are typically expressed as NaN
- Some factors are available at regional levels while others are not
- Measurements for a region are often compared to other measurements at the country level. This isn't necessarily problematic ... for geographically large and populous countries like the US, it is likely better that state-level data is used to compare to other smaller countries.
- State-level measurements are often estimated by mixing separate data sources. For instance, Visitor data for the provinces of Brazil was estimated by taking the country-level data from the World Tourism Organization and weighting it by the province's proportionate share in visitor travel from separate data from the Brazilian government.
- Some data is outdated.
- GDP data lags significantly, particularly for large groups of countries, so 2016 figures have been used, presuming that the relative mix among countries has remained constant
DENSITY
Population density is oft-cited as a potential explanatory factor in COVID19 infection rates. And I couldn't agree more that it is important to consider. However, the study of density suffers from many issues.
- Density is highly variable within regions, and case and fatality rates have been highly variable within regions and across densities. In New York City, for example, some of the least dense areas have had the highest infection rates.
- With only regional data available, the most rigorous option is simply to use the density of the region. However, this is often a poor reflection of reality. New York State actually has significant land mass despite most of its population residing on a tiny island on the southeastern edge.
- To account for this, See19 includes a factor city_dens. city_dens is the density of the largest city in the region, so:
- for New York State, city_dens is the density of New York City,
- for Taiwan, city_dens is the density of Taipei,
- for Japan, city_dens is the density of Tokyo, and so on.
This approach results in its own issues. For instance, at present, for all of Russia, city_dens reflects the density of Moscow.
Other geographic measurements, such as temperature and uvb radiation, suffer from similar issues.
The only true way to address these shortcomings is for daily case and fatality statistics to be released at the county-level (or equivalent) in every country around the globe.
CASE DATA
Aside from just the difficulties of aggregating data, there are well-documented issues with the underlying case and fatality counts as well.
- Confirmed cases are likely well below actual cases, given that up to 50% of all COVID19 cases may be asymptomatic and that limited testing in the early stages led to many symptomatic cases going unreported.
- The rapid improvement in testing likely exaggerated the growth of infections over time.
- Fatalities went unreported at peak periods due to lack of health care capacity.
- Fatalities have been retroactively added to data, without adjusting back to the days the fatalities actually occurred, so for regions like Hubei and New York state, there are massive spikes in fatalities that don't reflect the actual experience.
- China has been heavily criticized for under-reporting and late-reporting, and recently added a ~20% increase in cumulative fatalities on a random day in March. For these reasons, throughout this tutorial, you will see that China is often excluded from the dataset.
TESTING
Testing statistics are still a bit of a mess internationally. For instance, many European countries only report cumulative test counts on a weekly basis and many have only begun reporting in the very recent past. Different methods of interpolation are available in the CaseStudy interface.
- Brazil is not currently included in tests data. Brazil test counts are only currently available on the country level whereas case and fatality data is available on a regional level. Methods are being considered to allocate aggregate tests among the regions (perhaps simply as a percentage of population or case counts).
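One of the allocation ideas mentioned for Brazil, splitting a country-level test count by population share, might look like the sketch below. The region names, populations, test count, and the tests_est column are all hypothetical, and this is not something see19 currently does.

```python
import pandas as pd

# Hypothetical inputs: a country-level cumulative test count and regional populations.
country_tests = 100_000
regions = pd.DataFrame({
    "region_name": ["Region A", "Region B", "Region C"],
    "population": [5_000_000, 3_000_000, 2_000_000],
})

# Allocate tests in proportion to each region's share of total population.
share = regions["population"] / regions["population"].sum()
regions["tests_est"] = country_tests * share
```

Weighting by case counts instead would only require swapping the population column for a cases column when computing the share.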
4. the Casestudy Interface
4.1 Basics
4.2 Filtering
4.3 Smoothing
4.4 Available Factors
4.5 Additional Flags
4.6 RayStudy v BaseStudy
4.7 Chart Objects
See19 visualization and data analysis are completed via the CaseStudy class. CaseStudy provides attributes and methods for filtering, manipulating, appending, and visualizing data in the baseframe.
CaseStudy can be accessed directly from the see19 module. To initialize, simply pass the baseframe.
from see19 import CaseStudy
casestudy = CaseStudy(bf)
4.1 Basics
The original baseframe can be accessed via the baseframe attribute
casestudy.baseframe.head(2)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | genito | childbirth | perinatal | congenital | other | external | visitors | travel_year | gdp | gdp_year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 282 | 110 | ABR | Abruzzo | ITA | Italy | 2020-01-01 | NaN | NaN | NaN | ... | 442.0 | 1.0 | 16.0 | 19.0 | 384.0 | 2059 | 181458.0 | 2017.0 | 4.560860e+10 | 2016.0 |
1 | 282 | 110 | ABR | Abruzzo | ITA | Italy | 2020-01-02 | NaN | NaN | NaN | ... | 442.0 | 1.0 | 16.0 | 19.0 | 384.0 | 2059 | 181458.0 | 2017.0 | 4.560860e+10 | 2016.0 |
2 rows × 132 columns
CaseStudy automatically computes different adjustments including:
1. Daily new cases, fatalities, and tests (called count_types)
2. Daily Moving Average (DMA) for new and cumulative count_types
3. Population and density adjustments for new and cumulative count_types
4. Daily growth or change in 1. thru 3. above
These adjustments are referred to as count_categories. Additional adjustments are available via kwargs to be discussed below.
Adjustments are added to the dataset by calling the make method. The amended dataset is then accessible via the df attribute.
casestudy.make()
The amended dataframe can be accessed via the df attribute:
casestudy.df.head(2)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | growth_cases_per_person_per_city_KM2 | growth_deaths_per_1K | growth_deaths_per_1M | growth_deaths_per_person_per_land_KM2 | growth_deaths_per_person_per_city_KM2 | growth_tests_per_1K | growth_tests_per_1M | growth_tests_per_person_per_land_KM2 | growth_tests_per_person_per_city_KM2 | days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
43906 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-13 | 216.699585 | 1.87999 | 803.712436 | ... | 1.523364 | 2.0 | 2.0 | 2.0 | 2.0 | 1.426644 | 1.426644 | 1.426644 | 1.426644 | 0 days |
43907 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-14 | 273.865733 | 1.87999 | 955.714788 | ... | 1.263804 | 1.0 | 1.0 | 1.0 | 1.0 | 1.189125 | 1.189125 | 1.189125 | 1.189125 | 1 days |
2 rows × 140 columns
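To give a feel for what the four kinds of adjustment mean, here is a rough pandas sketch on a toy series. It is not the package's implementation: the 3-day window, the population figure, and the definition of growth as a day-over-day ratio are all assumptions made for illustration.

```python
import pandas as pd

# Toy cumulative counts for a single region (hypothetical values).
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=5),
    "deaths": [0.0, 2.0, 5.0, 9.0, 15.0],
})
population = 2_000_000  # hypothetical
window = 3              # assumed moving-average window

# 1. Daily new counts: first difference of the cumulative series.
df["deaths_new"] = df["deaths"].diff()

# 2. Daily moving average of the new counts.
df["deaths_new_dma"] = df["deaths_new"].rolling(window).mean()

# 3. Population adjustment: new deaths per 1 million people.
df["deaths_new_per_1M"] = df["deaths_new"] / population * 1_000_000

# 4. Daily growth, assumed here to be the ratio to the previous day's value.
df["growth_deaths_new"] = df["deaths_new"] / df["deaths_new"].shift(1)
```

The first rows of each derived column are NaN because a difference, a rolling window, or a ratio needs earlier observations to exist.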
NOTE: Ray and Numba are utilized to significantly improve the speed of make. Ray is not compatible with Windows. CaseStudy will attempt to detect incompatibility and revert to a single-process method where applicable.
More in Section 4.5
For ease of selection, CaseStudy has a number of class attributes with different groupings of count categories: BASECOUNT_CATS, PER_CATS, LOGNAT_CATS, LOG_CATS, ALL_CATS, DMA_COUNT_CATS, PER_COUNT_CATS.
DMA_COUNT_CATS is shown as an example:
CaseStudy.DMA_COUNT_CATS[:10]
['cases_dma',
'cases_new_dma',
'deaths_dma',
'deaths_new_dma',
'tests_dma',
'tests_new_dma',
'cases_dma_per_1K',
'cases_dma_per_1M',
'cases_dma_per_person_per_land_KM2',
'cases_dma_per_person_per_city_KM2']
Both the log10 and natural log of each of 1. thru 3. above are available for presentation purposes. Simply provide log=True and/or lognat=True.
casestudy.log = True
casestudy.lognat = True
casestudy.make()
casestudy.df[['region_name', 'date'] + [col for col in casestudy.df if 'log' in col]].head(2)
region_name | date | cases_dma_log | cases_new_log | cases_new_dma_log | deaths_dma_log | deaths_new_log | deaths_new_dma_log | tests_dma_log | tests_new_log | ... | growth_cases_per_person_per_land_KM2_lognat | growth_cases_per_person_per_city_KM2_lognat | growth_deaths_per_1K_lognat | growth_deaths_per_1M_lognat | growth_deaths_per_person_per_land_KM2_lognat | growth_deaths_per_person_per_city_KM2_lognat | growth_tests_per_1K_lognat | growth_tests_per_1M_lognat | growth_tests_per_person_per_land_KM2_lognat | growth_tests_per_person_per_city_KM2_lognat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
43906 | P.A. Trento | 2020-03-13 | 2.186879 | 1.871859 | 1.691872 | -0.026874 | -0.026874 | -0.202966 | 2.794193 | 2.380851 | ... | -1.014299 | -1.014299 | 0.890089 | 2.152714 | 0.867427 | 0.867427 | 4.976355 | 1.050782 | 1.304384 | 1.304384 |
43907 | P.A. Trento | 2020-03-14 | 2.324156 | 1.757139 | 1.757139 | 0.194974 | NaN | -0.202966 | 2.888888 | 2.181850 | ... | 2.104604 | 2.104604 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.389530 | 1.023559 | 1.113758 | 1.113758 |
2 rows × 242 columns
'In total, there are {} different `count_categories` to choose from.'.format(len(CaseStudy.ALL_COUNT_CATS))
'In total, there are 180 different `count_categories` to choose from.'
4.2 Filtering
Thankfully, casestudy.df can be limited to specific count categories via the count_categories attribute:
casestudy.count_categories = ['tests_new_dma_per_person_per_land_KM2']
casestudy.make()
casestudy.df.head(2)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | population | land_KM2 | land_dens | city_KM2 | city_dens | tests_new_dma_per_person_per_land_KM2 | days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
43906 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-13 | 216.699585 | 1.87999 | 803.712436 | 515201.0 | 2938.79544 | 175.310262 | 2938.79544 | 175.310262 | 0.807438 | 0 days |
43907 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-14 | 273.865733 | 1.87999 | 955.714788 | 515201.0 | 2938.79544 | 175.310262 | 2938.79544 | 175.310262 | 0.865241 | 1 days |
When passing kwargs to CaseStudy at initialization, most kwargs will accept either a string for a single category or a list (or other iterable) for multiple. When assigning to an instance attribute, an iterable must be passed.
casestudy = CaseStudy(bf, count_categories='tests_new_dma_per_person_per_land_KM2')
casestudy.make()
casestudy.df[['region_name', 'date', 'tests_new_dma_per_person_per_land_KM2']].head(2)
region_name | date | tests_new_dma_per_person_per_land_KM2 | |
---|---|---|---|
43906 | P.A. Trento | 2020-03-13 | 0.807438 |
43907 | P.A. Trento | 2020-03-14 | 0.865241 |
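Accepting either a single string or an iterable is a common pattern. A minimal sketch of how such an argument might be normalized is shown below; as_list is a hypothetical helper written for this guide and is not taken from the see19 source.

```python
def as_list(value):
    """Return a list from a single string or any iterable of items.

    Hypothetical helper for illustration; not part of see19.
    """
    if value is None:
        return []
    # A bare string is treated as one item rather than iterated character by character.
    if isinstance(value, str):
        return [value]
    return list(value)


single = as_list('tests_new_dma_per_person_per_land_KM2')
several = as_list(('deaths_new', 'cases_new'))
```

The string check matters because a str is itself iterable, so without it a single category name would be split into characters.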
casestudy.count_categories = ['deaths_new_dma_per_person_per_land_KM2', 'growth_cases_new_per_1M']
casestudy.make()
casestudy.df.head(2)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | population | land_KM2 | land_dens | city_KM2 | city_dens | deaths_new_dma_per_person_per_land_KM2 | growth_cases_new_per_1M | days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
43906 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-13 | 216.699585 | 1.87999 | 803.712436 | 515201.0 | 2938.79544 | 175.310262 | 2938.79544 | 175.310262 | 0.003575 | 1.866667 | 0 days |
43907 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-14 | 273.865733 | 1.87999 | 955.714788 | 515201.0 | 2938.79544 | 175.310262 | 2938.79544 | 175.310262 | 0.003575 | 0.767857 | 1 days |
CaseStudy can further filter baseframe as follows:
- regions to limit the frame to certain regions
- countries to limit the frame to certain countries
- exclude_regions to exclude certain regions
- exclude_countries to exclude certain countries
Specific regions can be included or excluded by providing the region_name, region_code, or region_id.
Specific countries can be included or excluded by providing the country, country_code, or country_id.
Each of the four parameters can accept a single region as a str object or multiple regions via several common iterables.
Below we select three regions:
regions = ['New York', 'FL', 35]
casestudy = CaseStudy(
    bf, regions=regions, count_categories=CaseStudy.BASECOUNT_CATS,
)
casestudy.make()
We can see that all three regions are indeed in the object by grouping:
pd.concat([df_group.iloc[:1] for region_id, df_group in casestudy.df.groupby('region_id')]).head(3)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | cases_dma | cases_new | cases_new_dma | deaths_dma | deaths_new | deaths_new_dma | tests_dma | tests_new | tests_new_dma | days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
53399 | 35 | 110 | SIC | Sicilia | ITA | Italy | 2020-03-12 | 102.712067 | 2.000000 | 973.321711 | ... | 77.406196 | 28.580749 | 15.778955 | 0.666667 | 2.000000 | 0.666667 | 796.493912 | 186.492921 | 140.803254 | 0 days |
17846 | 64 | 236 | FL | Florida | USA | United States of America (the) | 2020-03-11 | 28.000000 | 2.526828 | 329.000000 | ... | 21.666667 | 9.000000 | 3.666667 | 0.842276 | 2.526828 | 0.842276 | 242.666667 | 88.000000 | 64.666667 | 0 days |
40070 | 75 | 236 | NY | New York | USA | United States of America (the) | 2020-03-15 | 729.000000 | 3.143533 | 6916.080830 | ... | 558.000000 | 205.000000 | 171.000000 | 1.047844 | 3.143533 | 1.047844 | 5149.016931 | 2583.035500 | 2170.676861 | 0 days |
3 rows × 25 columns
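The mixed selection above (a name, a code, and an id) suggests some lookup across the three identifier columns. One way that matching could work is sketched here; resolve_region_ids is a hypothetical helper, and the actual matching logic inside CaseStudy may differ.

```python
import pandas as pd

# Toy lookup table with the three identifier columns found in the baseframe.
lookup = pd.DataFrame({
    "region_id": [35, 64, 75],
    "region_code": ["SIC", "FL", "NY"],
    "region_name": ["Sicilia", "Florida", "New York"],
})


def resolve_region_ids(keys, table):
    """Map each key (name, code, or id) to a region_id. Hypothetical helper."""
    ids = []
    for key in keys:
        if isinstance(key, int):
            match = table[table["region_id"] == key]
        else:
            match = table[(table["region_name"] == key) | (table["region_code"] == key)]
        if match.empty:
            raise KeyError(f"unknown region: {key!r}")
        ids.append(int(match["region_id"].iloc[0]))
    return ids


selected = resolve_region_ids(["New York", "FL", 35], lookup)
```

Raising on an unknown key is a design choice made for this sketch; silently ignoring unmatched keys would be an equally plausible behaviour.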
The region and country filters are important mechanisms for isolating data.
Here, we focus on US regions only, but exclude some of the most impacted ones:
casestudy.countries = ['USA']
casestudy.excluded_regions = ['NY', 'NJ']
casestudy.regions = None
casestudy.make()
Because certain regions were assigned in the previous CaseStudy instantiation, we must set regions=None above in order to consider ALL the regions of the baseframe.
And below we can see that we have various US states in the dataset and that neither New York nor New Jersey is included.
casestudy.df.region_name.unique()
array(['Alabama', 'Wyoming', 'Alaska', 'Arkansas', 'Delaware', 'Idaho',
'Maine', 'Mississippi', 'Montana', 'New Mexico', 'North Dakota',
'South Dakota', 'West Virginia', 'Michigan', 'Vermont', 'Georgia',
'Colorado', 'Florida', 'Oregon', 'Texas', 'Illinois',
'Pennsylvania', 'Iowa', 'Maryland', 'North Carolina', 'Washington',
'California', 'Massachusetts', 'Oklahoma', 'Arizona',
'Connecticut', 'Minnesota', 'Virginia', 'New Hampshire', 'Hawaii',
'Nevada', 'Indiana', 'Kentucky', 'District of Columbia',
'Missouri', 'Louisiana', 'Ohio', 'Wisconsin', 'Kansas', 'Utah',
'Tennessee', 'South Carolina', 'Nebraska'], dtype=object)
pd.concat([df_group.iloc[:1] for region_id, df_group in casestudy.df.groupby('region_id')]).head(3)
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | cases_dma | cases_new | cases_new_dma | deaths_dma | deaths_new | deaths_new_dma | tests_dma | tests_new | tests_new_dma | days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
691 | 44 | 236 | AL | Alabama | USA | United States of America (the) | 2020-03-26 | 558.514091 | 1.26695 | 10468.861581 | ... | 369.399307 | 246.143562 | 124.727455 | 0.422317 | 1.26695 | 0.422317 | 7859.521030 | 3287.002892 | 1929.975539 | 0 days |
64339 | 48 | 236 | WY | Wyoming | USA | United States of America (the) | 2020-04-13 | 316.114653 | 1.00000 | 9715.352851 | ... | 305.385913 | 16.093110 | 8.429724 | 0.333333 | 1.00000 | 0.333333 | 9166.923029 | 822.644733 | 529.424828 | 0 days |
1094 | 49 | 236 | AK | Alaska | USA | United States of America (the) | 2020-03-25 | 53.977249 | 1.00000 | 3783.772189 | ... | 42.839087 | 7.711036 | 8.567817 | 0.333333 | 1.00000 | 0.333333 | 2745.528371 | 1496.950677 | 539.260259 | 0 days |
3 rows × 25 columns
casestudy.df[casestudy.df.region_code.isin(['NY', 'NJ'])]
region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | cases_dma | cases_new | cases_new_dma | deaths_dma | deaths_new | deaths_new_dma | tests_dma | tests_new | tests_new_dma | days |
---|
0 rows × 25 columns
Limiting data via different start and tail hurdles
Parameters exist that allow you to filter the dataset such that regions and days appear only if they meet certain criteria.
start_factor and start_hurdle crop the beginning of a region's period of data. tail_factor and tail_hurdle do the same for the end of a region's period.
start_factor and tail_factor accept any dynamic factor in the dataset (including date).
The hurdle is the level of the specified factor that the region must reach to be included. For instance, if start_factor='cases_new_per_1M' and start_hurdle=100, each region's first row in casestudy.df will be the day on which that region met or exceeded 100 new cases per 1 million people.
These options are a convenient way to compare regions that have been impacted to a similar extent or, perhaps, to fairly compare regions that were impacted at different times.
The default parameters for start_factor and start_hurdle limit the data to regions with at least one cumulative fatality.
NOTE: a days column is added to casestudy.df. This is a count of the number of days from the current date back to the first date in the casestudy. When a start_factor is provided, this is the first date that the start_hurdle is met. When start_factor is not provided, this is the first date in the dataset.
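The cropping behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not the see19 implementation: the function name crop_from_hurdle, the row layout, and the sample counts are all assumptions, and days is kept as a plain integer here rather than the timedelta shown in casestudy.df.

```python
from datetime import date, timedelta

def crop_from_hurdle(rows, factor, hurdle):
    """Keep rows from the first day on which factor >= hurdle and add a 'days' count.

    Hypothetical helper: rows must already be sorted by date for one region.
    """
    for start, row in enumerate(rows):
        value = row.get(factor)
        if value is not None and value >= hurdle:
            break
    else:
        return []  # the region never reached the hurdle, so it is excluded
    first_date = rows[start]['date']
    return [dict(row, days=(row['date'] - first_date).days) for row in rows[start:]]

# Made-up cumulative case counts for a single region, one row per day.
rows = [
    {'date': date(2020, 3, 7) + timedelta(days=i), 'cases': cases}
    for i, cases in enumerate([400, 700, 1050, 1600, 2300])
]
cropped = crop_from_hurdle(rows, factor='cases', hurdle=1000)
print([(r['date'].isoformat(), r['days']) for r in cropped])
```

The first two rows fall below the hurdle and are dropped, and the day count restarts at zero on the first row that meets it.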
Examples are shown below.
casestudy = CaseStudy(
bf, regions='Spain', count_categories=CaseStudy.BASECOUNT_CATS,
start_factor='cases', start_hurdle=1000
)
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | cases_dma | cases_new | cases_new_dma | deaths_dma | deaths_new | deaths_new_dma | tests_dma | tests_new | tests_new_dma | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
55820 | 491 | 209 | ESP | Spain | ESP | Spain | 2020-03-09 | 1057.840245 | 27.344784 | NaN | ... | 738.089217 | 394.348647 | 221.163866 | 17.904323 | 10.742594 | 7.487262 | NaN | NaN | NaN | 0 days |
55821 | 491 | 209 | ESP | Spain | ESP | Spain | 2020-03-10 | 1671.052390 | 34.180981 | NaN | ... | 1130.794744 | 613.212146 | 392.705527 | 26.042652 | 6.836196 | 8.138329 | NaN | NaN | NaN | 1 days |
2 rows × 25 columns
casestudy = CaseStudy(
bf, countries='Sweden',
count_categories='deaths_new', start_factor='deaths_new', start_hurdle=100
)
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | population | land_KM2 | land_dens | city_KM2 | city_dens | deaths_new | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
56656 | 495 | 214 | SWE | Sweden | SWE | Sweden | 2020-04-06 | 7438.936775 | 675.770207 | NaN | 9415570.0 | 415314.854224 | 22.67092 | 2150.411192 | 4378.497486 | 107.669886 | 0 days |
56657 | 495 | 214 | SWE | Sweden | SWE | Sweden | 2020-04-07 | 7941.679240 | 837.275037 | NaN | 9415570.0 | 415314.854224 | 22.67092 | 2150.411192 | 4378.497486 | 161.504829 | 1 days |
To see the earliest dates in the dataframe, prior to any deaths being recorded, set start_factor to ''.
casestudy.countries = None
casestudy.regions = ['RJ']
casestudy.count_categories = ['tests_new_dma']
casestudy.factors = ['temp', 'strindex']
casestudy.start_factor = ''
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | population | land_KM2 | land_dens | city_KM2 | city_dens | tests_new_dma | temp | strindex | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
48480 | 557 | 31 | RJ | Rio De Janeiro | BRA | Brazil | 2020-01-01 | NaN | NaN | NaN | 15962668.0 | 42269.311478 | 377.642016 | 2203.766328 | 7243.357792 | NaN | 294.134674 | 0.0 | 0 days |
48481 | 557 | 31 | RJ | Rio De Janeiro | BRA | Brazil | 2020-01-02 | NaN | NaN | NaN | 15962668.0 | 42269.311478 | 377.642016 | 2203.766328 | 7243.357792 | NaN | 294.375153 | 0.0 | 1 days |
4.3 Smoothing
Smoothing is applied in two ways within the make method.
The first addresses NaN values within the count_type time-series. Sometimes there are artifacts and one-offs within the set. Other times, as with test counts in many regions, the count is only updated periodically and NaNs fill the gaps.
In these instances, make interpolates between the real values to fill in the gaps. The default method is linear interpolation, but this can be overridden by providing interpolation_method (see the pandas docs for options).
For instance, Spain's testing data appears as follows:
casestudy = CaseStudy(bf, regions='Spain')
casestudy.make()
casestudy.df.tests.tail(20)
55934 3.619554e+06
55935 3.644458e+06
55936 3.673778e+06
55937 3.703099e+06
55938 3.732419e+06
55939 3.761740e+06
55940 3.791060e+06
55941 3.820381e+06
55942 3.849701e+06
55943 3.881696e+06
55944 3.913690e+06
55945 3.945685e+06
55946 3.977680e+06
55947 4.009675e+06
55948 4.041669e+06
55949 4.073664e+06
55950 4.073664e+06
55951 4.073664e+06
55952 4.073664e+06
55953 4.073664e+06
Name: tests, dtype: float64
But when we set interpolate=False, we can see that Spain in fact updates its testing count only weekly.
casestudy = CaseStudy(bf, regions='Spain', interpolate=False)
casestudy.make()
casestudy.df.tests.tail(20)
55934 NaN
55935 3644458.0
55936 NaN
55937 NaN
55938 NaN
55939 NaN
55940 NaN
55941 NaN
55942 3849701.0
55943 NaN
55944 NaN
55945 NaN
55946 NaN
55947 NaN
55948 NaN
55949 4073664.0
55950 NaN
55951 NaN
55952 NaN
55953 NaN
Name: tests, dtype: float64
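The gap filling seen in the two outputs above can be reproduced with a short linear interpolation sketch. It is a plain-Python illustration, not the see19 code path (which relies on pandas); the function name interpolate_linear is hypothetical, and holding the last known value forward is an assumption chosen to match the repeated final value in the interpolated output.

```python
def interpolate_linear(values):
    """Fill None gaps between known values linearly and hold the last known value forward.

    Leading gaps (before the first known value) are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    if known:
        for i in range(known[-1] + 1, len(out)):
            out[i] = out[known[-1]]
    return out

# The three weekly Spanish test counts shown above, with None for the missing days.
weekly = [3644458.0] + [None] * 6 + [3849701.0] + [None] * 6 + [4073664.0] + [None] * 2
filled = interpolate_linear(weekly)
print([round(v) for v in filled[:3]])
```

The second element of filled corresponds to index 55936 in the output above and lands on the same interpolated value of roughly 3.673778e+06.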
The second approach is new in 0.3.6. CaseStudy automatically applies smoothing to negative values and large outliers in the main count_categories (cases, deaths, and tests).
Many regions have chosen to "adjust" or "catch up" their case or fatality counts, not by adjusting the actual dates on which the outcomes occurred, but instead on a seemingly random reporting date. This creates strange artifacts in the time series.
For example, Spain has a dip in daily case counts into negative territory in late April 2020:
casestudy = CaseStudy(bf, regions='Spain', smooth=False)
casestudy.make()
casestudy.compchart.make(x_category='date', y_category='deaths_new', figsize=(8,4))
[Figure: line chart of Daily Deaths by date for Spain with smooth=False]
With smooth=True (the default setting), this deep negative value is redistributed through prior dates according to the distribution of counts up to the date with the negative value.
This is a somewhat naive approach but has the benefit of maintaining a consistent shape to the time-series.
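One way to read the redistribution described above is as a proportional adjustment: each prior day absorbs a share of the negative value equal to its share of the counts recorded so far. The sketch below works under that assumption only. The function name redistribute_negative and the sample counts are hypothetical, and the actual see19 routine may differ in its details.

```python
def redistribute_negative(daily):
    """Spread each negative daily count over prior days in proportion to their size.

    The series total is preserved, and the day with the negative count is set to zero.
    """
    out = [float(v) for v in daily]
    for i, value in enumerate(out):
        if value >= 0 or i == 0:
            continue
        prior_total = sum(out[:i])
        if prior_total <= 0:
            continue  # nothing sensible to redistribute against
        for j in range(i):
            out[j] += value * out[j] / prior_total
        out[i] = 0.0
    return out

# Made-up daily counts with one negative "correction" on the fourth day.
daily = [10, 20, 30, -6, 5]
smoothed = redistribute_negative(daily)
print(smoothed)
```

Because every prior day is scaled by the same ratio, the earlier part of the curve keeps its shape, which matches the stated benefit of the approach.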
casestudy = CaseStudy(bf, regions='Spain', smooth=True)
casestudy.make()
casestudy.compchart.make(x_category='date', y_category='deaths_new', figsize=(8,4))
[Figure: line chart of Daily Deaths by date for Spain with smooth=True]
The same adjustment is made for VERY large increases in counts relative to the cumulative total and to the daily rate. For example, see New York below:
casestudy = CaseStudy(bf, regions='NY', smooth=False)
casestudy.make()
casestudy.compchart.make(x_category='date', y_category='deaths_new', figsize=(8,4))
[Figure: line chart of Daily Deaths by date for NY with smooth=False]
casestudy = CaseStudy(bf, regions='NY', smooth=True)
casestudy.make()
casestudy.compchart.make(x_category='date', y_category='deaths_new', figsize=(8,4))
[Figure: line chart of Daily Deaths by date for NY with smooth=True]
4.4 Available Factors
The remaining columns in the baseframe can be included in a CaseStudy instance on an opt-in basis via the factors attribute:
casestudy = CaseStudy(bf, count_categories='cases_new_per_person_per_land_KM2', factors=['no2', 'strindex'])
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | population | land_KM2 | land_dens | city_KM2 | city_dens | cases_new_per_person_per_land_KM2 | no2 | strindex | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
43905 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-12 | 131.523112 | 1.096661 | 652.429603 | 515201.0 | 2938.79544 | 175.310262 | 2938.79544 | 175.310262 | 0.210345 | NaN | 85.19 | 0 days |
43906 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-13 | 200.357639 | 2.193322 | 930.784897 | 515201.0 | 2938.79544 | 175.310262 | 2938.79544 | 175.310262 | 0.392644 | NaN | 85.19 | 1 days |
For convenience, a number of factor groupings can be accessed via CaseStudy attributes:
- GMOBIS, AMOBIS, CAUSES, MAJOR_CAUSES, POLLUTS, TEMP_MSMTS, MSMTS - various groupings for factor data
  - GMOBIS refers to Google Mobility data.
  - AMOBIS refers to Apple Mobility data.
- STRINDEX_CATS, CONTAIN_CATS, ECON_CATS, HEALTH_CATS - groupings for the Oxford Stringency Index
print (CaseStudy.MSMTS)
print (CaseStudy.MAJOR_CAUSES)
['uvb', 'rhum', 'temp', 'dewpoint']
['circul', 'infectious', 'respir', 'endo']
Different demographic population age groupings can be accessed as well:
- ALL_RANGES - all the possible demographic age ranges
- RANGES - a dictionary of various groupings of age ranges
from see19 import RANGES
RANGES.keys()
dict_keys(['UNDERS', 'OVERS', 'SCHOOL_GOERS', 'Y_MILLS', 'MILLS', 'MID', 'MID_PLUS'])
overs = RANGES['OVERS']['ranges']
casestudy = CaseStudy(bf, regions='Lombardia', count_categories='deaths_new_per_person_per_land_KM2', factors=overs)
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | A70PLUSB | A75PLUSB | A80PLUSB | A85PLUSB | A65PLUSB_% | A70PLUSB_% | A75PLUSB_% | A80PLUSB_% | A85PLUSB_% | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
31566 | 36 | 110 | LOM | Lombardia | ITA | Italy | 2020-02-24 | 216.225177 | 6.0 | 943.732875 | ... | 1490749.0 | 963768.0 | 0.0 | 0.0 | 0.208224 | 0.154784 | 0.100068 | 0.0 | 0.0 | 0 days |
31567 | 36 | 110 | LOM | Lombardia | ITA | Italy | 2020-02-25 | 301.709549 | 9.0 | 2386.747531 | ... | 1490749.0 | 963768.0 | 0.0 | 0.0 | 0.208224 | 0.154784 | 0.100068 | 0.0 | 0.0 | 1 days |
2 rows × 27 columns
casestudy = CaseStudy(bf, regions='LOM', count_categories='deaths_new_per_person_per_land_KM2', factors=CaseStudy.MAJOR_CAUSES)
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | deaths_new_per_person_per_land_KM2 | circul | infectious | respir | endo | circul_% | infectious_% | respir_% | endo_% | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
31566 | 36 | 110 | LOM | Lombardia | ITA | Italy | 2020-02-24 | 216.225177 | 6.0 | 943.732875 | ... | NaN | 74695 | 4630 | 20185 | 6566.0 | 0.007756 | 0.000481 | 0.002096 | 0.000682 | 0 days |
31567 | 36 | 110 | LOM | Lombardia | ITA | Italy | 2020-02-25 | 301.709549 | 9.0 | 2386.747531 | ... | 0.00507 | 74695 | 4630 | 20185 | 6566.0 | 0.007756 | 0.000481 | 0.002096 | 0.000682 | 1 days |
2 rows × 25 columns
Some factors are only available at a country level.
By setting country_level=True, casestudy will aggregate most data among the subregions up to the country level to allow for proper comparison across the broad range of countries.
The Oxford Stringency Index and its derivatives are one such data group only available at the country level.
casestudy = CaseStudy(bf,
count_categories='deaths_new_per_person_per_land_KM2',
factors='strindex',
country_level=True,
)
casestudy.make()
casestudy.df.tail(2)
/Users/spindicate/Documents/programming/zooscraper/casestudy/see19/see19/study/ray.py:16: UserWarning: smoothing is unavailable when country_level=True
super().__init__(*args, **kwargs)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | population | land_KM2 | land_dens | city_KM2 | city_dens | deaths_new_per_person_per_land_KM2 | strindex | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
36560 | id_for_USA | 236 | USA | name_for_USA | USA | United States of America (the) | 2020-07-19 | 3725463.0 | 131737.0 | 45313502.0 | 307692971.0 | 9.087502e+06 | 33.858916 | 710152.024025 | 433.277609 | 15.446448 | 68.98 | 144 days |
36561 | id_for_USA | 236 | USA | name_for_USA | USA | United States of America (the) | 2020-07-20 | 3782891.0 | 132095.0 | 46043131.0 | 307692971.0 | 9.087502e+06 | 33.858916 | 710152.024025 | 433.277609 | 10.573286 | 68.98 | 145 days |
Above you can see that all US states have been aggregated into a single region with a region_id of id_for_USA.
With respect to the STRINDEX_CATS subgroups, if all the required categories are provided, CaseStudy will sum the individual category values.
For example, if CONTAIN_CATS are provided, the aggregate of the eight categories will be included in the c_sum column.
Note that if all five h indicators are provided, CaseStudy will also tabulate a key3_sum, which aggregates the scores on the h1, h2, and h3 indicators.
casestudy = CaseStudy(bf,
count_categories='deaths_new_per_person_per_land_KM2',
factors=CaseStudy.CONTAIN_CATS,
country_level=True,
)
casestudy.make()
casestudy.df.tail(2)
/Users/spindicate/Documents/programming/zooscraper/casestudy/see19/see19/study/ray.py:16: UserWarning: smoothing is unavailable when country_level=True
super().__init__(*args, **kwargs)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c_sum | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
36560 | id_for_USA | 236 | USA | name_for_USA | USA | United States of America (the) | 2020-07-19 | 3725463.0 | 131737.0 | 45313502.0 | ... | 3.0 | 2.0 | 2.0 | 4.0 | 1.0 | 2.0 | 2.0 | 3.0 | 19.0 | 144 days |
36561 | id_for_USA | 236 | USA | name_for_USA | USA | United States of America (the) | 2020-07-20 | 3782891.0 | 132095.0 | 46043131.0 | ... | 3.0 | 2.0 | 2.0 | 4.0 | 1.0 | 2.0 | 2.0 | 3.0 | 19.0 | 145 days |
2 rows × 26 columns
Additional computations can be added for each factor via the factor_dmas attribute.
The attribute is a dictionary of the form str(factor_name): int(dma).
When provided, CaseStudy will automatically add _dma, _growth, and _growth_dma computations.
casestudy = CaseStudy(bf, count_categories='deaths_new_dma_per_1M',
factors=['temp', 'c1', 'strindex'],
factor_dmas={'temp': 7, 'c1': 14},
country_level=True,
)
casestudy.make()
casestudy.df.head(2)
/Users/spindicate/Documents/programming/zooscraper/casestudy/see19/see19/study/ray.py:16: UserWarning: smoothing is unavailable when country_level=True
super().__init__(*args, **kwargs)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | temp | c1 | strindex | temp_dma | temp_growth | temp_growth_dma | c1_dma | c1_growth | c1_growth_dma | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
81 | 293 | 1 | AFG | Afghanistan | AFG | Afghanistan | 2020-03-22 | 40.0 | 1.0 | NaN | ... | 10.778741 | 3.0 | 41.67 | 7.908977 | 1.067747 | 1.384819 | 1.928571 | 1.0 | NaN | 0 days |
82 | 293 | 1 | AFG | Afghanistan | AFG | Afghanistan | 2020-03-23 | 40.0 | 1.0 | NaN | ... | 8.560785 | 3.0 | 41.67 | 8.784692 | 0.794229 | 1.150845 | 2.142857 | 1.0 | NaN | 1 days |
2 rows × 26 columns
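The _dma and _growth columns can be approximated with the short sketch below. It rests on two assumptions: the moving average is a trailing window that averages whatever values are available at the start of the series, and growth is the ratio of a value to the previous day's value. The second assumption is inferred from the Afghanistan rows above, where the temp_growth of 0.794229 equals 8.560785 / 10.778741. The function names are hypothetical and the see19 internals may differ.

```python
def moving_average(values, window):
    """Trailing moving average; partial windows at the start use the values available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def growth(values):
    """Ratio of each value to the previous one; None where no ratio can be formed."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(cur / prev if prev else None)
    return out

# The two consecutive temp readings from the output above.
temps = [10.778741, 8.560785]
print(round(growth(temps)[1], 6))
print(moving_average([1, 2, 3, 4, 5], window=3))
```

A _growth_dma column would then be the same moving average applied to the growth series, skipping the leading None.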
NOTE: When country_level=True, smooth is currently NOT available (as per the warning above), and Ray multi-processing is also NOT available.
To provide a single dma for all the factors submitted, build the dictionary ahead of time:
factor_dmas = {msmt: 14 for msmt in CaseStudy.MSMTS}
casestudy = CaseStudy(
bf, count_categories='tests_new_per_1M',
factors=CaseStudy.MSMTS, factor_dmas=factor_dmas
)
casestudy.make()
casestudy.df.head(2)
| | region_id | country_id | region_code | region_name | country_code | country | date | cases | deaths | tests | ... | rhum_dma | rhum_growth | rhum_growth_dma | temp_dma | temp_growth | temp_growth_dma | dewpoint_dma | dewpoint_growth | dewpoint_growth_dma | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
43905 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-12 | 131.523112 | 1.096661 | 652.429603 | ... | 90.025840 | 1.050915 | 0.996733 | 3.513738 | 0.959184 | 1.105750 | -3.142554 | 1.896068 | -0.635699 | 0 days |
43906 | 32 | 110 | TRE | P.A. Trento | ITA | Italy | 2020-03-13 | 200.357639 | 2.193322 | 930.784897 | ... | 89.967379 | 0.995192 | 1.001809 | 3.242550 | 1.053689 | 1.114479 | -3.447804 | 1.026207 | -0.735813 | 1 days |
2 rows × 33 columns
Other factors are adjusted to population. These factors are appended with _% and can be seen via the pop_cats attribute.
These are typically time-static factors.
casestudy = CaseStudy(bf, count_categories='deaths_new_dma_per_1M', factors=['visitors', 'gdp', 'A65PLUSB' ])
print (casestudy.pop_cats)
casestudy.make()
casestudy.df[['region_name', 'date', 'visitors_%', 'gdp_%', 'A65PLUSB_%']].head(2)
['A65PLUSB', 'visitors', 'gdp']
| | region_name | date | visitors_% | gdp_% | A65PLUSB_% |
---|---|---|---|---|---|
43905 | P.A. Trento | 2020-03-12 | 19.864474 | 54504.746691 | 0.203018 |
43906 | P.A. Trento | 2020-03-13 | 19.864474 | 54504.746691 | 0.203018 |
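The _% suffix appears to denote the raw factor divided by the region's population, which would explain why gdp_% reads as a per-capita figure rather than a percentage. This is an inference from the printed outputs, not a statement about the see19 source. As a quick consistency check, the sketch below takes the Lombardia age-group figures shown earlier in this section, backs out the population implied by one column pair, and confirms that it reproduces the other. The helper name per_capita is hypothetical.

```python
def per_capita(value, population):
    """Divide a raw regional total by the region's population."""
    return value / population

# Lombardia figures taken from the age-range output earlier in this section.
a70_count, a70_share = 1490749.0, 0.154784
a75_count, a75_share = 963768.0, 0.100068

# Population implied by the 70+ columns (approximate, since the share is rounded).
implied_population = a70_count / a70_share
print(round(per_capita(a75_count, implied_population), 6))
```

Both column pairs imply a population of roughly 9.63 million, which is consistent with a simple division by population.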
4.5 Additional Flags
There are several additional flags and methods that will be touched on briefly; however, you are encouraged to read the analysis pages to see them in action.
- world_averages: when set to True, averages each date in the dataset across all the regions, to provide a per_region statistic for each factor
- favor_earlier: when set to True, scales any selected rows such that values earlier in the dataset receive more weight than later ones. A new column is added with the _earlier suffix. This is helpful when attempting to study the impacts of early moves to, say, social distance. Factors are selected by passing a list to the factors_to_favor_earlier parameter.
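The idea behind world_averages, averaging each date across all regions, can be sketched in plain Python as follows. This is only an illustration under assumed inputs: the function name average_by_date, the row layout, and the sample values are hypothetical, and the sketch says nothing about how see19 weights or filters regions.

```python
from collections import defaultdict
from statistics import mean

def average_by_date(rows, factor):
    """Average one factor across all regions for each date, skipping missing values."""
    by_date = defaultdict(list)
    for row in rows:
        value = row.get(factor)
        if value is not None:
            by_date[row['date']].append(value)
    return {day: mean(values) for day, values in sorted(by_date.items())}

# Made-up daily readings for two regions.
rows = [
    {'region': 'A', 'date': '2020-04-01', 'strindex': 60.0},
    {'region': 'B', 'date': '2020-04-01', 'strindex': 80.0},
    {'region': 'A', 'date': '2020-04-02', 'strindex': 65.0},
    {'region': 'B', 'date': '2020-04-02', 'strindex': None},
]
averages = average_by_date(rows, 'strindex')
print(averages)
```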
4.6 RayStudy v BaseStudy
The default implementation of make utilizes both Ray and Numba to significantly improve performance.
Ray is a third-party multi-processing package. For see19 purposes, Ray's key feature is the ability to share large objects (albeit read-only) among different live processes. Python's standard multiprocessing module does not allow for simple access to the baseframe and, therefore, did not provide any performance benefits.
Numba provides just-in-time compilation of certain numpy implementations. The custom Numba function typically provides a 10x speed improvement versus the equivalent built-in Pandas method.
At the time of writing, Ray is not compatible with Windows. CaseStudy will attempt to detect incompatibility and revert to a single-process method where necessary.
To support this, a root BaseStudy implementation provides single-process functionality, and a RayStudy child implements Ray functionality. CaseStudy inherits from either class automatically based on the operating system.
You can check which class is inherited as shown below (this is on a MacBook):
CaseStudy.__bases__
(casestudy.see19.see19.study.ray.RayStudy,)
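The pattern of choosing a parent class by operating system can be sketched as follows. The classes here are empty stand-ins that merely share names with the see19 classes, and the pick_base helper and its Windows check are assumptions made for illustration. The real detection logic in see19 may be more involved.

```python
import platform

class BaseStudy:
    """Stand-in for the single-process implementation."""
    def make(self):
        return 'single-process make'

class RayStudy(BaseStudy):
    """Stand-in for the child class that would add multi-processing."""
    def make(self):
        return 'multi-process make'

def pick_base(system_name=None):
    """Return the parent class to use for the given (or current) operating system."""
    system_name = system_name or platform.system()
    return BaseStudy if system_name == 'Windows' else RayStudy

# The parent is resolved once, when the class statement runs.
class CaseStudy(pick_base()):
    pass

print(CaseStudy.__bases__)
```

Because the parent is fixed at class-definition time, inspecting __bases__ afterwards reveals which implementation was selected.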
To use the non-Ray implementation, you can either import BaseStudy directly or set use_ray=False on CaseStudy.
We can see both approaches provide similar results below.
# from see19.study.base import BaseStudy
from casestudy.see19.see19.study.base import BaseStudy
from datetime import datetime as dt
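def clockwrap(func):
    # Time a single call to func and return the elapsed time.
    def wrapper(*args, **kwargs):
        start = dt.now()
        func(*args, **kwargs)
        end = dt.now()
        return end - start
    # The wrapper is invoked immediately, so clockwrap returns a timedelta.
    return wrapper()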
casestudy = BaseStudy(bf)
dur1 = clockwrap(casestudy.make)
print (dur1)
/Users/spindicate/Documents/programming/envs/zooenv/lib/python3.7/site-packages/ipykernel_launcher.py:1: UserWarning: It looks like you called BaseStudy directly. This is not recommended. Ray provides significant performance improvements and certain BaseStudy methods are not optimized.
"""Entry point for launching an IPython kernel.
0:00:28.674439
casestudy = CaseStudy(bf, use_ray=False)
dur2 = clockwrap(casestudy.make)
print (dur2)
/Users/spindicate/Documents/programming/envs/zooenv/lib/python3.7/site-packages/ipykernel_launcher.py:1: UserWarning: use_ray set to False. This is not recommended. Ray provides significant performance improvements and certain BaseStudy methods are not optimized.
"""Entry point for launching an IPython kernel.
0:00:27.573194
Now we'll compare that with the default Ray implementation on an 8-core MacBook Pro.
casestudy = CaseStudy(bf)
dur3 = clockwrap(casestudy.make)
print (dur3)
0:00:06.225569
diff = 1 - dur3 / (np.mean([dur1, dur2]))
print ('You can see that the Ray implementation is \033[4m\033[1m{:.2%}\033[0m faster.'.format(diff))
You can see that the Ray implementation is 77.86% faster.
Note: Both Numba and Ray perform caching on the first call of a function. Thus, the first call to the make() method in a session incurs additional delay (because many functions are being cached). All subsequent calls will experience the significant performance improvements.
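The speed-up figure reported above can be reproduced from the three printed durations using only the standard library. The helper names timed and fraction_faster are hypothetical. The calculation itself mirrors the one in the guide: one minus the Ray duration divided by the mean of the two non-Ray durations.

```python
import time
from datetime import timedelta

def timed(func, *args, **kwargs):
    """Call func once and return (result, elapsed time as a timedelta)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, timedelta(seconds=time.perf_counter() - start)

def fraction_faster(fast, slow_runs):
    """Return 1 - fast / mean(slow_runs), all given as timedeltas."""
    baseline = sum(slow_runs, timedelta()) / len(slow_runs)
    return 1 - fast / baseline

# Durations printed in the three runs above.
dur1 = timedelta(seconds=28, microseconds=674439)  # BaseStudy
dur2 = timedelta(seconds=27, microseconds=573194)  # CaseStudy with use_ray=False
dur3 = timedelta(seconds=6, microseconds=225569)   # default Ray implementation
print('{:.2%}'.format(fraction_faster(dur3, [dur1, dur2])))
```

When timing make yourself, it may be worth discarding the first call in a session, since that call includes the one-off caching cost mentioned in the note.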
4.7 Chart Objects
Each casestudy object currently contains 6 different chart objects that provide visual tools for analysing, assessing and comparing COVID-19's impact on different regions and factors. Each chart is created via matplotlib. Details of each chart object are provided in later sections.
The chart classes can be found in the chart module, along with the BaseChart root, which provides common functionality.
compchart from CompChart2D
compchart4d from CompChart4D
heatmap from HeatMap
barcharts from BarCharts
scatterflow from ScatterFlow
substrinscat from SubStrindexScatter
Each chart has been designed to align closely with the CaseStudy
functionality and with the underlying functionality of matplotlib.
For instance, each chart is called via the make
method.
casestudy.regions = ['NY', 'NJ']
casestudy.make()
leg = {'fontsize': 12, 'handlelength': 1}
casestudy.compchart.make(x_category='days', y_category='cases', figsize=(8,4), legend_params=leg)
[Figure: line chart of Cumulative Cases by days for NY and NJ]
Each chart object is automatically updated on each make call, so any changes to the casestudy object will also be reflected in the charts.
casestudy.regions = ['AB', 'ON']
casestudy.make()
casestudy.compchart.make(x_category='days', y_category='cases', figsize=(8,4), legend_params=leg)
[Figure: line chart of Cumulative Cases by days for AB and ON]
Note: a prior version of see19 implemented compchart using Bokeh. That chart is deprecated and has been replaced with a matplotlib version, but it is still available under CompChart2DBokeh.
5. compchart - Visualizing Regional Impacts
5.1 Daily Fatalities Comparison - Italy
5.2 Daily Fatalities Comparison - 5 Most Impacted Regions
5.3 Varying the Categories
The compchart attribute is an instance of the CompChart2D class and provides standard line graphs comparing regions on the categories provided to x_category and y_category. Time-series is supported when x_category='date'.
Charts are available in multi-line format with optional overlay of a second factor on a separate y-axis.
5.1 Daily Fatalities Comparison - Italy
We will illustrate with an example, focusing on only the three most impacted regions in Italy.
itaregs = bf[bf['country'] == 'Italy'] \
.sort_values(by='deaths', ascending=False).region_name.unique().tolist()[:3]
casestudy = CaseStudy(bf, regions=itaregs, start_hurdle=3, start_factor='deaths', smooth=False)
casestudy.make()
When CaseStudy
is instantiated, compchart
is also instantiated with its own attributes.
print (casestudy.compchart)
<casestudy.see19.see19.charts.CompChart2D object at 0x32dee3950>
In particular, all the various available categories are automatically provided labels via the labels attribute. A few are shown below for illustration purposes.
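The category names follow a regular pattern: a base count (cases, deaths, or tests), an optional new marker for daily values, an optional dma marker, and an optional per-unit suffix. The sketch below shows how such names could be turned into readable labels. It is a reconstruction from the printed labels only. The make_label function and its lookup tables are hypothetical and are not the see19 implementation, and the _lognat variants are left out for brevity.

```python
BASES = {'cases': 'Cases', 'deaths': 'Deaths', 'tests': 'Tests'}
PER_SUFFIXES = {
    '_per_1K': ' per 1K',
    '_per_1M': ' per 1M',
    '_per_person_per_land_KM2': ' / Person / Land KM²',
    '_per_person_per_city_KM2': ' / Person / City KM²',
}

def make_label(category, dma_days=3):
    """Build a display label such as 'Daily Deaths per 1M (3DMA)' from a category name."""
    base, _, rest = category.partition('_')
    if base not in BASES:
        raise ValueError('unknown base count: {}'.format(base))
    rest = '_' + rest if rest else ''
    per_label = ''
    for suffix, text in PER_SUFFIXES.items():
        if rest.endswith(suffix):
            per_label = text
            rest = rest[:-len(suffix)]
            break
    parts = [part for part in rest.split('_') if part]
    unknown = set(parts) - {'new', 'dma'}
    if unknown:
        raise ValueError('unrecognised parts: {}'.format(sorted(unknown)))
    timing = 'Daily' if 'new' in parts else 'Cumulative'
    dma = ' ({}DMA)'.format(dma_days) if 'dma' in parts else ''
    return '{} {}{}{}'.format(timing, BASES[base], per_label, dma)

for name in ['deaths_new', 'tests', 'cases_dma_per_person_per_land_KM2']:
    print('{}: {}'.format(name, make_label(name)))
```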
for k, v in casestudy.compchart.labels.items():
    print ('{}: {}'.format(k, v))
    if k == 'temp':
        break
cases_dma: Cumulative Cases (3DMA)
cases_new: Daily Cases
cases_new_dma: Daily Cases (3DMA)
deaths_dma: Cumulative Deaths (3DMA)
deaths_new: Daily Deaths
deaths_new_dma: Daily Deaths (3DMA)
tests_dma: Cumulative Tests (3DMA)
tests_new: Daily Tests
tests_new_dma: Daily Tests (3DMA)
cases: Cumulative Cases
deaths: Cumulative Deaths
tests: Cumulative Tests
cases_dma_per_1K: Cumulative Cases per 1K (3DMA)
cases_dma_per_1M: Cumulative Cases per 1M (3DMA)
cases_dma_per_person_per_land_KM2: Cumulative Cases / Person / Land KM² (3DMA)
cases_dma_per_person_per_city_KM2: Cumulative Cases / Person / City KM² (3DMA)
cases_new_per_1K: Daily Cases per 1K
cases_new_per_1M: Daily Cases per 1M
cases_new_per_person_per_land_KM2: Daily Cases / Person / Land KM²
cases_new_per_person_per_city_KM2: Daily Cases / Person / City KM²
cases_new_dma_per_1K: Daily Cases per 1K (3DMA)
cases_new_dma_per_1M: Daily Cases per 1M (3DMA)
cases_new_dma_per_person_per_land_KM2: Daily Cases / Person / Land KM² (3DMA)
cases_new_dma_per_person_per_city_KM2: Daily Cases / Person / City KM² (3DMA)
deaths_dma_per_1K: Cumulative Deaths per 1K (3DMA)
deaths_dma_per_1M: Cumulative Deaths per 1M (3DMA)
deaths_dma_per_person_per_land_KM2: Cumulative Deaths / Person / Land KM² (3DMA)
deaths_dma_per_person_per_city_KM2: Cumulative Deaths / Person / City KM² (3DMA)
deaths_new_per_1K: Daily Deaths per 1K
deaths_new_per_1M: Daily Deaths per 1M
deaths_new_per_person_per_land_KM2: Daily Deaths / Person / Land KM²
deaths_new_per_person_per_city_KM2: Daily Deaths / Person / City KM²
deaths_new_dma_per_1K: Daily Deaths per 1K (3DMA)
deaths_new_dma_per_1M: Daily Deaths per 1M (3DMA)
deaths_new_dma_per_person_per_land_KM2: Daily Deaths / Person / Land KM² (3DMA)
deaths_new_dma_per_person_per_city_KM2: Daily Deaths / Person / City KM² (3DMA)
tests_dma_per_1K: Cumulative Tests per 1K (3DMA)
tests_dma_per_1M: Cumulative Tests per 1M (3DMA)
tests_dma_per_person_per_land_KM2: Cumulative Tests / Person / Land KM² (3DMA)
tests_dma_per_person_per_city_KM2: Cumulative Tests / Person / City KM² (3DMA)
tests_new_per_1K: Daily Tests per 1K
tests_new_per_1M: Daily Tests per 1M
tests_new_per_person_per_land_KM2: Daily Tests / Person / Land KM²
tests_new_per_person_per_city_KM2: Daily Tests / Person / City KM²
tests_new_dma_per_1K: Daily Tests per 1K (3DMA)
tests_new_dma_per_1M: Daily Tests per 1M (3DMA)
tests_new_dma_per_person_per_land_KM2: Daily Tests / Person / Land KM² (3DMA)
tests_new_dma_per_person_per_city_KM2: Daily Tests / Person / City KM² (3DMA)
cases_per_1K: Cumulative Cases per 1K
cases_per_1M: Cumulative Cases per 1M
cases_per_person_per_land_KM2: Cumulative Cases / Person / Land KM²
cases_per_person_per_city_KM2: Cumulative Cases / Person / City KM²
deaths_per_1K: Cumulative Deaths per 1K
deaths_per_1M: Cumulative Deaths per 1M
deaths_per_person_per_land_KM2: Cumulative Deaths / Person / Land KM²
deaths_per_person_per_city_KM2: Cumulative Deaths / Person / City KM²
tests_per_1K: Cumulative Tests per 1K
tests_per_1M: Cumulative Tests per 1M
tests_per_person_per_land_KM2: Cumulative Tests / Person / Land KM²
tests_per_person_per_city_KM2: Cumulative Tests / Person / City KM²
cases_dma_lognat: Cumulative Cases (3DMA)
(Natural Log)
cases_new_lognat: Daily Cases
(Natural Log)
cases_new_dma_lognat: Daily Cases (3DMA)
(Natural Log)
deaths_dma_lognat: Cumulative Deaths (3DMA)
(Natural Log)
deaths_new_lognat: Daily Deaths
(Natural Log)
deaths_new_dma_lognat: Daily Deaths (3DMA)
(Natural Log)
tests_dma_lognat: Cumulative Tests (3DMA)
(Natural Log)
tests_new_lognat: Daily Tests
(Natural Log)
tests_new_dma_lognat: Daily Tests (3DMA)
(Natural Log)
cases_lognat: Cumulative Cases
(Natural Log)
deaths_lognat: Cumulative Deaths
(Natural Log)
tests_lognat: Cumulative Tests
(Natural Log)
cases_dma_per_1K_lognat: Cumulative Cases per 1K (3DMA)
(Natural Log)
cases_dma_per_1M_lognat: Cumulative Cases per 1M (3DMA)
(Natural Log)
cases_dma_per_person_per_land_KM2_lognat: Cumulative Cases / Person / Land KM² (3DMA)
(Natural Log)
cases_dma_per_person_per_city_KM2_lognat: Cumulative Cases / Person / City KM² (3DMA)
(Natural Log)
cases_new_per_1K_lognat: Daily Cases per 1K
(Natural Log)
cases_new_per_1M_lognat: Daily Cases per 1M
(Natural Log)
cases_new_per_person_per_land_KM2_lognat: Daily Cases / Person / Land KM²
(Natural Log)
cases_new_per_person_per_city_KM2_lognat: Daily Cases / Person / City KM²
(Natural Log)
cases_new_dma_per_1K_lognat: Daily Cases per 1K (3DMA)
(Natural Log)
cases_new_dma_per_1M_lognat: Daily Cases per 1M (3DMA)
(Natural Log)
cases_new_dma_per_person_per_land_KM2_lognat: Daily Cases / Person / Land KM² (3DMA)
(Natural Log)
cases_new_dma_per_person_per_city_KM2_lognat: Daily Cases / Person / City KM² (3DMA)
(Natural Log)
deaths_dma_per_1K_lognat: Cumulative Deaths per 1K (3DMA)
(Natural Log)
deaths_dma_per_1M_lognat: Cumulative Deaths per 1M (3DMA)
(Natural Log)
deaths_dma_per_person_per_land_KM2_lognat: Cumulative Deaths / Person / Land KM² (3DMA)
(Natural Log)
deaths_dma_per_person_per_city_KM2_lognat: Cumulative Deaths / Person / City KM² (3DMA)
(Natural Log)
deaths_new_per_1K_lognat: Daily Deaths per 1K
(Natural Log)
deaths_new_per_1M_lognat: Daily Deaths per 1M
(Natural Log)
deaths_new_per_person_per_land_KM2_lognat: Daily Deaths / Person / Land KM²
(Natural Log)
deaths_new_per_person_per_city_KM2_lognat: Daily Deaths / Person / City KM²
(Natural Log)
deaths_new_dma_per_1K_lognat: Daily Deaths per 1K (3DMA)
(Natural Log)
deaths_new_dma_per_1M_lognat: Daily Deaths per 1M (3DMA)
(Natural Log)
deaths_new_dma_per_person_per_land_KM2_lognat: Daily Deaths / Person / Land KM² (3DMA)
(Natural Log)
deaths_new_dma_per_person_per_city_KM2_lognat: Daily Deaths / Person / City KM² (3DMA)
(Natural Log)
tests_dma_per_1K_lognat: Cumulative Tests per 1K (3DMA)
(Natural Log)
tests_dma_per_1M_lognat: Cumulative Tests per 1M (3DMA)
(Natural Log)
tests_dma_per_person_per_land_KM2_lognat: Cumulative Tests / Person / Land KM² (3DMA)
(Natural Log)
tests_dma_per_person_per_city_KM2_lognat: Cumulative Tests / Person / City KM² (3DMA)
(Natural Log)
tests_new_per_1K_lognat: Daily Tests per 1K
(Natural Log)
tests_new_per_1M_lognat: Daily Tests per 1M
(Natural Log)
tests_new_per_person_per_land_KM2_lognat: Daily Tests / Person / Land KM²
(Natural Log)
tests_new_per_person_per_city_KM2_lognat: Daily Tests / Person / City KM²
(Natural Log)
tests_new_dma_per_1K_lognat: Daily Tests per 1K (3DMA)
(Natural Log)
tests_new_dma_per_1M_lognat: Daily Tests per 1M (3DMA)
(Natural Log)
tests_new_dma_per_person_per_land_KM2_lognat: Daily Tests / Person / Land KM² (3DMA)
(Natural Log)
tests_new_dma_per_person_per_city_KM2_lognat: Daily Tests / Person / City KM² (3DMA)
(Natural Log)
cases_per_1K_lognat: Cumulative Cases per 1K
(Natural Log)
cases_per_1M_lognat: Cumulative Cases per 1M
(Natural Log)
cases_per_person_per_land_KM2_lognat: Cumulative Cases / Person / Land KM²
(Natural Log)
cases_per_person_per_city_KM2_lognat: Cumulative Cases / Person / City KM²
(Natural Log)
deaths_per_1K_lognat: Cumulative Deaths per 1K
(Natural Log)
deaths_per_1M_lognat: Cumulative Deaths per 1M
(Natural Log)
deaths_per_person_per_land_KM2_lognat: Cumulative Deaths / Person / Land KM²
(Natural Log)
deaths_per_person_per_city_KM2_lognat: Cumulative Deaths / Person / City KM²
(Natural Log)
tests_per_1K_lognat: Cumulative Tests per 1K
(Natural Log)
tests_per_1M_lognat: Cumulative Tests per 1M
(Natural Log)
tests_per_person_per_land_KM2_lognat: Cumulative Tests / Person / Land KM²
(Natural Log)
tests_per_person_per_city_KM2_lognat: Cumulative Tests / Person / City KM²
(Natural Log)
cases_dma_log: Cumulative Cases (3DMA)
(Log Base 10)
cases_new_log: Daily Cases
(Log Base 10)
cases_new_dma_log: Daily Cases (3DMA)
(Log Base 10)
deaths_dma_log: Cumulative Deaths (3DMA)
(Log Base 10)
deaths_new_log: Daily Deaths
(Log Base 10)
deaths_new_dma_log: Daily Deaths (3DMA)
(Log Base 10)
tests_dma_log: Cumulative Tests (3DMA)
(Log Base 10)
tests_new_log: Daily Tests
(Log Base 10)
tests_new_dma_log: Daily Tests (3DMA)
(Log Base 10)
cases_log: Cumulative Cases
(Log Base 10)
deaths_log: Cumulative Deaths
(Log Base 10)
tests_log: Cumulative Tests
(Log Base 10)
cases_dma_per_1K_log: Cumulative Cases per 1K (3DMA)
(Log Base 10)
cases_dma_per_1M_log: Cumulative Cases per 1M (3DMA)
(Log Base 10)
cases_dma_per_person_per_land_KM2_log: Cumulative Cases / Person / Land KM² (3DMA)
(Log Base 10)
cases_dma_per_person_per_city_KM2_log: Cumulative Cases / Person / City KM² (3DMA)
(Log Base 10)
cases_new_per_1K_log: Daily Cases per 1K
(Log Base 10)
cases_new_per_1M_log: Daily Cases per 1M
(Log Base 10)
cases_new_per_person_per_land_KM2_log: Daily Cases / Person / Land KM²
(Log Base 10)
cases_new_per_person_per_city_KM2_log: Daily Cases / Person / City KM²
(Log Base 10)
cases_new_dma_per_1K_log: Daily Cases per 1K (3DMA)
(Log Base 10)
cases_new_dma_per_1M_log: Daily Cases per 1M (3DMA)
(Log Base 10)
cases_new_dma_per_person_per_land_KM2_log: Daily Cases / Person / Land KM² (3DMA)
(Log Base 10)
cases_new_dma_per_person_per_city_KM2_log: Daily Cases / Person / City KM² (3DMA)
(Log Base 10)
deaths_dma_per_1K_log: Cumulative Deaths per 1K (3DMA)
(Log Base 10)
deaths_dma_per_1M_log: Cumulative Deaths per 1M (3DMA)
(Log Base 10)
deaths_dma_per_person_per_land_KM2_log: Cumulative Deaths / Person / Land KM² (3DMA)
(Log Base 10)
deaths_dma_per_person_per_city_KM2_log: Cumulative Deaths / Person / City KM² (3DMA)
(Log Base 10)
deaths_new_per_1K_log: Daily Deaths per 1K
(Log Base 10)
deaths_new_per_1M_log: Daily Deaths per 1M
(Log Base 10)
deaths_new_per_person_per_land_KM2_log: Daily Deaths / Person / Land KM²
(Log Base 10)
deaths_new_per_person_per_city_KM2_log: Daily Deaths / Person / City KM²
(Log Base 10)
deaths_new_dma_per_1K_log: Daily Deaths per 1K (3DMA)
(Log Base 10)
deaths_new_dma_per_1M_log: Daily Deaths per 1M (3DMA)
(Log Base 10)
deaths_new_dma_per_person_per_land_KM2_log: Daily Deaths / Person / Land KM² (3DMA)
(Log Base 10)
deaths_new_dma_per_person_per_city_KM2_log: Daily Deaths / Person / City KM² (3DMA)
(Log Base 10)
tests_dma_per_1K_log: Cumulative Tests per 1K (3DMA)
(Log Base 10)
tests_dma_per_1M_log: Cumulative Tests per 1M (3DMA)
(Log Base 10)
tests_dma_per_person_per_land_KM2_log: Cumulative Tests / Person / Land KM² (3DMA)
(Log Base 10)
tests_dma_per_person_per_city_KM2_log: Cumulative Tests / Person / City KM² (3DMA)
(Log Base 10)
tests_new_per_1K_log: Daily Tests per 1K
(Log Base 10)
tests_new_per_1M_log: Daily Tests per 1M
(Log Base 10)
tests_new_per_person_per_land_KM2_log: Daily Tests / Person / Land KM²
(Log Base 10)
tests_new_per_person_per_city_KM2_log: Daily Tests / Person / City KM²
(Log Base 10)
tests_new_dma_per_1K_log: Daily Tests per 1K (3DMA)
(Log Base 10)
tests_new_dma_per_1M_log: Daily Tests per 1M (3DMA)
(Log Base 10)
tests_new_dma_per_person_per_land_KM2_log: Daily Tests / Person / Land KM² (3DMA)
(Log Base 10)
tests_new_dma_per_person_per_city_KM2_log: Daily Tests / Person / City KM² (3DMA)
(Log Base 10)
cases_per_1K_log: Cumulative Cases per 1K
(Log Base 10)
cases_per_1M_log: Cumulative Cases per 1M
(Log Base 10)
cases_per_person_per_land_KM2_log: Cumulative Cases / Person / Land KM²
(Log Base 10)
cases_per_person_per_city_KM2_log: Cumulative Cases / Person / City KM²
(Log Base 10)
deaths_per_1K_log: Cumulative Deaths per 1K
(Log Base 10)
deaths_per_1M_log: Cumulative Deaths per 1M
(Log Base 10)
deaths_per_person_per_land_KM2_log: Cumulative Deaths / Person / Land KM²
(Log Base 10)
deaths_per_person_per_city_KM2_log: Cumulative Deaths / Person / City KM²
(Log Base 10)
tests_per_1K_log: Cumulative Tests per 1K
(Log Base 10)
tests_per_1M_log: Cumulative Tests per 1M
(Log Base 10)
tests_per_person_per_land_KM2_log: Cumulative Tests / Person / Land KM²
(Log Base 10)
tests_per_person_per_city_KM2_log: Cumulative Tests / Person / City KM²
(Log Base 10)
: January 2020
population: Population
land_dens: Density of Land Area
city_dens: Population Density of Largest City
uvb: UV-B Radiation in J / M²
rhum: Relative Humidity
strindex: Oxford Stringency Index
visitors: Annual Visitors
visitors_%: Annual Visitors as % of Population
gdp: Gross Domestic Product
gdp_%: Gross Domestic Product per Capita
retail_n_rec: Change in Retail n Recreation Mobility
transit: Change in Transit Mobility
workplaces: Change in WorkPlace Mobility
residential: Change in Residential Mobility
parks: Change in Parks Mobility
groc_n_pharm: Change in Grocery & Pharmacy Mobility
transit_apple: Change in Transit Mobility - Apple
driving_apple: Change in Driving Mobility - Apple
walking_apple: Change in Walking Mobility - Apple
c1: School Closing
c2: Workplace Closing
c3: Cancel Public Events
c4: Restrictions on Gatherings
c5: Close Public Transport
c6: Stay-at-Home Requirements
c7: Restrictions on Internal Movement
c8: International Travel Controls
e1: Income Support
e2: Debt / Contract Relief
e3: Fiscal Measures
e4: International Support
h1: Public Information Campaigns
h2: Testing Policy
h3: Contact Tracing
h4: Emergency Investment in Health Care
h5: Investment in Vaccines
key3_sum: Sum of Key 3 Categories
key3_sum_earlier: Sum of Key 3 Oxford Stingency Factor Weighted to Earlier Dates
make_sum: Custom Stringency Aggregate
neoplasms: NeoPlasms Fatalities
blood: Blood-based Fatalities
endo: Endocrine Fatalities
mental: Mental Fatalities
nervous: Nervous System Fatalities
circul: Circulatory Fatalities
infectious: Infectious Fatalities
respir: Respiratory Fatalities
digest: Digestive Fatalities
skin: Skin-related Fatalities
musculo: Musculo-skeletal Fatalities
genito: Genitourinary Fatalities
childbirth: Maternal and Childbirth Fatalities
perinatal: Perinatal Fatalities
congenital: Congenital Fatalities
other: Other Fatalities
external: External Fatalities
date: Date
temp: Temperature (°C)
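The count-category keys in the listing above appear to follow a regular pattern: a base (`cases`, `deaths`, `tests`), optional `_new` and `_dma` parts, an optional per-capita or per-density suffix, and an optional `_log` or `_lognat` suffix. The sketch below rebuilds a label from a key under that assumption. It is not see19's implementation; `describe_category` and `PER_SUFFIXES` are hypothetical names, and the sketch only covers count categories, not factors such as `strindex`.

```python
# Hypothetical helper: rebuilds a chart label from a count-category key.
# The suffix table is inferred from the printed labels, not taken from see19.
PER_SUFFIXES = {
    '_per_1K': ' per 1K',
    '_per_1M': ' per 1M',
    '_per_person_per_land_KM2': ' / Person / Land KM²',
    '_per_person_per_city_KM2': ' / Person / City KM²',
}

def describe_category(key, count_dma=3):
    # Strip the log suffix first; '_lognat' is checked before '_log'.
    scale = ''
    for suffix, text in (('_lognat', '(Natural Log)'), ('_log', '(Log Base 10)')):
        if key.endswith(suffix):
            key = key[:-len(suffix)]
            scale = '\n' + text
            break
    # Then strip the per-capita / per-density suffix, if any.
    per = ''
    for suffix, text in PER_SUFFIXES.items():
        if key.endswith(suffix):
            key = key[:-len(suffix)]
            per = text
            break
    # What remains is the base plus optional 'new' and 'dma' parts.
    parts = key.split('_')
    period = 'Daily' if 'new' in parts else 'Cumulative'
    dma = ' ({}DMA)'.format(count_dma) if 'dma' in parts else ''
    return '{} {}{}{}{}'.format(period, parts[0].capitalize(), per, dma, scale)

label = describe_category('deaths_new_dma_per_1M_log', count_dma=21)
```

Reading the keys this way makes it easier to guess the right `y_category` for a chart without scanning the full list.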
make()
Similar to the main `casestudy` object, charts are rendered with the `make` method.
`x_category` and `y_category` accept any column header in `casestudy.df`.
`make` accepts many optional kwargs. Every effort is made to align these options with matplotlib standards. Appropriate options can be found via the matplotlib API. For example:
- `title`: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.suptitle.html (except for CompChart4D)
- `line_params`: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html
- `legend_params`: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html
- `xlabel_params`: https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlabel.html
- `xtick_params`: https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.tick_params.html
- `palette_base`: https://matplotlib.org/1.2.1/examples/pylab_examples/show_colormaps.html
All of the above kwargs and many others are shared among ALL the different see19 Chart Classes.
kwargs = {
'x_category': 'days',
'y_category': 'cases_new',
'width': 12,
'height': 8,
'title': {'t': 'Most Impacted Regions in Italy', 'fontsize': 24, 'weight': 'demi'},
'line_params': {'lw': 4},
'legend_params': {'fontsize': 14, 'handlelength': 1},
'xlabel_params': {'fontsize': 18, 'labelpad': 10},
'ylabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 14},
'ytick_params': {'labelsize': 14},
'colors': ['red', 'green', 'blue']
}
casestudy.compchart.make(**kwargs)
Daily Cases
An optional `regions` parameter lets you further reduce the number of regions presented in the chart. `regions` accepts a list of `region_id`, `region_code`, or `region_name` values in any combination.
Below, we also show that a matplotlib colormap can be provided via `palette_base` and that the x-axis label can be removed by setting `xlabel=False`.
kwargs = {
'regions': ['LOM', 'EMI'],
'x_category': 'date',
'y_category': 'deaths_new',
'width': 12,
'height': 8,
'title': {'t': 'Lombardia v Emilia-Romagna', 'fontsize': 24, 'weight': 'demi'},
'line_params': {'lw': 6},
'legend_params': {'fontsize': 14, 'handlelength': 1},
'xlabel': False,
'ylabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 14},
'ytick_params': {'labelsize': 14},
'palette_base': 'Accent',
}
casestudy.compchart.make(**kwargs)
Daily Deaths
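To make the "any combination" behaviour of `regions` concrete, here is a minimal sketch of how such a filter could work on plain records. It is an illustration only, not see19's code: `select_regions` is a hypothetical helper, and the `region_id` values and the `VEN` code are made up for the example.

```python
# Illustrative records; the ids and the 'VEN' code are invented for this sketch.
REGIONS = [
    {'region_id': 1, 'region_code': 'LOM', 'region_name': 'Lombardia'},
    {'region_id': 2, 'region_code': 'EMI', 'region_name': 'Emilia-Romagna'},
    {'region_id': 3, 'region_code': 'VEN', 'region_name': 'Veneto'},
]

def select_regions(records, regions):
    """Keep records whose id, code, or name appears in `regions`."""
    wanted = set(regions)
    return [
        r for r in records
        if r['region_id'] in wanted
        or r['region_code'] in wanted
        or r['region_name'] in wanted
    ]

# A code and an id mixed in the same list.
selected = select_regions(REGIONS, ['LOM', 2])
names = [r['region_name'] for r in selected]
```

Matching against all three identifiers means a list copied from a dataframe column and a list typed by hand can be passed the same way.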
5.2 Daily Fatalities Comparison - 5 Most Impacted Regions
Now we'll look at daily fatalities in the 5 most impacted regions globally, ranked by total fatalities.
regions = list(bf.sort_values(by='deaths', ascending=False).region_name.unique())[:5]
casestudy = CaseStudy(bf, regions=regions, start_hurdle=3, start_factor='deaths', count_dma=21, log=True)
casestudy.make()
title='5 Most Impacted Regions'
kwargs = {
'x_category': 'days',
'y_category': 'deaths_new',
'width': 12,
'height': 8,
'title': {'t': title, 'fontsize': 24, 'weight': 'demi'},
'line_params': {'lw': 3},
'legend_params': {'fontsize': 14},
'xlabel_params': {'fontsize': 18, 'labelpad': 10},
'ylabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 14},
'ytick_params': {'labelsize': 14},
'palette_base': 'Accent',
}
p = casestudy.compchart.make(**kwargs)
Daily Deaths
There are major outliers, particularly in the early days, that make the graph difficult to read. The log-adjusted category (here the base-10 `_log` variant) comes in handy.
Below we also demonstrate that the `regions` parameter can be passed to each `make` call to further reduce the regions covered in the chart (for convenience).
kwargs['y_category']= 'deaths_new_dma_per_1M_log'
kwargs['ylabel_params']= {'fontsize': 18, 'labelpad': 10}
kwargs['regions'] = ['France', 'India', 'United Kingdom']
p = casestudy.compchart.make(**kwargs)
Daily Deaths per 1M (21DMA)
(Log Base 10)
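A category such as `deaths_new_dma_per_1M_log` stacks three adjustments: a daily moving average, a per-million scaling, and a base-10 log. The sketch below walks through one plausible way to compute it on made-up numbers. The trailing window, the order of the steps, and the handling of zeros are all assumptions on my part, not a description of see19 internals; the function names are hypothetical.

```python
import math

def moving_average(values, window):
    """Trailing mean over up to `window` most recent values (assumed window type)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def per_million(values, population):
    """Scale counts to a rate per one million people."""
    return [v / population * 1_000_000 for v in values]

def log10_or_none(values):
    """Base-10 log; zero or negative values have no log, so return None for them."""
    return [math.log10(v) if v > 0 else None for v in values]

daily_deaths = [0, 0, 3, 6, 9, 12]   # made-up daily counts
population = 2_000_000               # made-up population

smoothed = moving_average(daily_deaths, window=3)
rate = per_million(smoothed, population)
logged = log10_or_none(rate)
```

The log step compresses the early outliers, which is why the log-adjusted chart is easier to read than the raw one.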
5.3 Varying the Categories
Oxford Stringency Index
`compchart` can be used to compare any `category` or `factor` in `casestudy.df` with `days` or `date` on the x-axis.
The chart below compares the Oxford Stringency Index for each selected region.
regions = ['Germany', 'Spain', 'Taiwan']
casestudy = CaseStudy(
bf, count_categories='cases_new_per_1M', regions=regions,
start_factor='', factors=['strindex']
)
casestudy.make()
kwargs = {
'x_category': 'date',
'y_category': 'strindex',
'width': 12,
'height': 8,
'line_params': {'lw': 3},
'legend_params': {'fontsize': 14},
'xlabel_params': {'fontsize': 18, 'labelpad': 10},
'ylabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 14},
'ytick_params': {'labelsize': 14},
'palette_base': 'Accent',
}
p = casestudy.compchart.make(**kwargs)
Oxford Stringency Index
These graphs work best as time series, but the `x_category` can also be any other category in `casestudy.df`. Below we can see that in New York, positive cases have steadily declined even as testing has increased. Texas and Arizona have not had the same success.
regions = ['New York', 'Texas', 'Arizona']
casestudy = CaseStudy(bf, regions=regions, count_dma=21)
casestudy.make()
kwargs = {
'x_category': 'tests_new_dma_per_1M',
'y_category': 'cases_new_dma_per_1M',
'width': 12,
'height': 8,
'line_params': {'lw': 3},
'legend_params': {'fontsize': 14},
'xlabel_params': {'fontsize': 18, 'labelpad': 10},
'ylabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 14},
'ytick_params': {'labelsize': 14},
'palette_base': 'Accent',
}
p = casestudy.compchart.make(**kwargs)
Daily Cases per 1M (21DMA)
Saving Files
All chart instances in `see19` have a `save_file` option. Simply set that option to `True` and provide a `filename`, and the file will be saved to your location of choice.
6. compchart4D - Visualizing Factors in 4D
6.1 From 3D to 4D
6.2 More on the X-Axis
6.3 How Far Can We Take It?
3D charts with color-mapping can be used to explore the impact of various factors in different regions at different times.
Such '4D' maps are often criticized for lack of readability, but they have been a valuable tool for recognizing patterns.
These charts are available in `CaseStudy` via the `compchart4d` attribute, which is an instance of the `CompChart4D` class. The 3D representation shows the `count_category` for each region on the z-axis, with each day from the `start_hurdle` on the y-axis and the individual regions separated on the x-axis.
The 3D chart is a cute trick, but the real power is derived from the `color_factor`. This maps the color of each 3D bar to the factor one wants to investigate.
The `CompChart4D` object utilizes `matplotlib` for chart creation.
6.1 From 3D to 4D
Most Impacted Regions - Brazil
First, we get region names from the baseframe, sorting as required.
Then we create the `casestudy` instance, including several factors that we'll cover in our analysis.
from casestudy.see19.see19 import CaseStudy
regions = bf[bf['country'] == 'Brazil'] \
.sort_values(by='population', ascending=False) \
.region_name.unique().tolist()[:20]
factor_dmas={'temp': 3}
casestudy = CaseStudy(
bf, count_dma=5,
factors=['temp', 'c1', 'A65PLUSB', 'A75PLUSB'], factor_dmas=factor_dmas,
regions=regions, start_hurdle=10, start_factor='cases', lognat=True,
)
casestudy.make()
4D charts are customizable in precisely the same way as `CompChart2D`, sharing many of the same keywords. `compchart4d` utilizes a couple of its own unique keywords, as per below:
- `z_category` is utilized to determine the z-axis (vertical). The x- and y-axes are automatically set to regions and days.
- `comp_size` will further trim the number of regions by ranking them on the `comp_category`.
  - a separate `rank_category` can be provided for this process if preferred
kwargs = {
'title': {'s': 'Most Impacted Regions in Brazil', 'x': .47, 'y': .74, 'fontsize': 24, 'rotation': -9, 'weight': 'demi'},
'ylabel_params': {'fontsize': 18, 'labelpad': 12},
'zlabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 18},
'ytick_params': {'labelsize': 12},
'tight': True, 'comp_size': 10,
}
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
`df_chart`: for most charts, the casestudy dataframe is morphed for presentation purposes. This morphed data is available via the `df_chart` attribute.
casestudy.compchart4d.df_chart.head()
| | region_id | region_name | region_code | country | date | days | deaths_new_dma_per_1M |
|---|---|---|---|---|---|---|---|
| 10585 | 566 | Ceara | CE | Brazil | 2020-03-22 | 6 days | 0.000000 |
| 10586 | 566 | Ceara | CE | Brazil | 2020-03-23 | 7 days | 0.000000 |
| 10587 | 566 | Ceara | CE | Brazil | 2020-03-24 | 8 days | 0.000000 |
| 10588 | 566 | Ceara | CE | Brazil | 2020-03-25 | 9 days | 0.000000 |
| 10589 | 566 | Ceara | CE | Brazil | 2020-03-26 | 10 days | 0.169566 |
Adding a Color Factor
By adding a color factor via the `color_category` option, we can see the impact, if any, of an exogenous factor on the `comp_category` over time.
We will start with `A65PLUSB_%`. As this is a time-static factor, the color for each region will be the same regardless of the day.
You must provide additional options to position the color bar.
kwargs = {
**kwargs,
'color_category': 'A65PLUSB_%',
'xy_cbar': (0.09, .225), 'wh_cbar': (.015, 14),
'cblabel_params': {'labelpad': -55},
}
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
Now we'll use `temp`, which is a time-dynamic factor and will provide a different color for each region on each day.
kwargs = {**kwargs,
'color_category': 'temp',
}
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
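The `{**kwargs, ...}` pattern used above is plain Python dictionary unpacking: it builds a new dictionary from the old one, and any key repeated later in the literal overrides the earlier value. A small standalone illustration (the option values are just examples taken from this guide):

```python
base = {'tight': True, 'comp_size': 10, 'color_category': 'A65PLUSB_%'}

# Unpack the existing options, then override a single key.
updated = {**base, 'color_category': 'temp'}

# The original dictionary is left untouched.
unchanged = base['color_category']
```

The same rule applies inside a single dictionary literal: if a key such as `'title'` is written twice, only the last value is kept.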
Fixing the Color Range
NOTE: The range of colors is automatically set by `make`. This can be somewhat misleading when:
- comparing multiple charts
- a single chart has temperatures in a narrow range. In the above example, for instance, temperatures range only between 18°C and 28°C, and yet the color map runs almost the entire red-blue spectrum.
Thus, there is a `color_interval` option that allows you to fix the color interval. `color_interval` expects a tuple, where the first item is the low end of the range and the second item is the high end.
Fixing the color interval provides a very different picture of Brazil's impacted regions.
kwargs = {**kwargs, 'color_interval': (20,30)}
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
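To see why a fixed interval changes the picture, consider how values are usually mapped onto a colormap: each value is rescaled to the range 0 to 1 before a color is looked up. The sketch below shows that rescaling with and without a fixed interval. It is a generic illustration, not see19's code; `normalize` is a hypothetical helper, and treating an empty tuple as "automatic" is my assumption.

```python
def normalize(values, color_interval=None):
    """Rescale values to 0..1, using a fixed (low, high) interval if given."""
    if color_interval:
        low, high = color_interval
    else:
        # Automatic range: stretch the observed min..max across the colormap.
        low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.5 for _ in values]
    # Values outside a fixed interval are clipped to the ends of the colormap.
    return [min(1.0, max(0.0, (v - low) / span)) for v in values]

temps = [18.0, 23.0, 28.0]
auto = normalize(temps)                            # spans the full colormap
fixed = normalize(temps, color_interval=(20, 30))  # stays in a narrower band
```

With the automatic range, 18°C and 28°C land on opposite ends of the colormap; with the fixed interval, the same temperatures occupy a smaller and comparable slice of it.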
6.2 More on the X-Axis
Top 30 US States
Now we investigate the Top 30 most impacted US states.
regions = bf[bf['country_code'] == 'USA'] \
.sort_values('cases', ascending=False) \
.region_name.unique().tolist()[:50]
countries = 'USA'
casestudy = CaseStudy(
bf, regions=regions, countries=countries, count_dma=14,
factors=['temp', 'uvb', 'rhum', 'A65PLUSB', 'A75PLUSB', 'A05_24B'], factor_dmas={'temp': 14, 'uvb': 14},
start_hurdle=10, start_factor='cases',
)
casestudy.make()
Here 4 charts are prepared in quick succession.
Additional options are shown for editing the background grey and removing gridlines.
NOTE: `CompChart4D` automatically sorts the regions on the x-axis such that the regions with the greatest z-axis values are furthest away. This improves readability.
kwargs = {
'regions': '',
'ylabel_params': {'fontsize': 18, 'labelpad': 12},
'zlabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 12},
'ytick_params': {'labelsize': 12},
'ztick_params': {'labelsize': 12},
'xy_cbar': (0.09, .225), 'wh_cbar': (.01, 20),
'title': {'s': 'Most Impacted States in US', 'x': .47, 'y': .74, 'fontsize': 24, 'rotation': -9, 'weight': 'demi'},
'cblabel_params': {'labelpad': -55},
'color_category': 'temp_dma', 'color_interval': (20,30),
'tight': True,
'comp_size': 30,
'rank_category': 'deaths_new_dma_per_1M',
}
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_person_per_city_KM2', **kwargs)
kwargs['color_category'] = 'uvb_dma'
kwargs['color_interval'] = ()
kwargs['gridlines'] = False
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_person_per_city_KM2', **kwargs)
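The trimming done by `comp_size` and the x-axis ordering described in the NOTE can be pictured with a small sketch. I am assuming here that regions are ranked on their peak value of the ranking category; that choice, and the `rank_regions` helper itself, are hypothetical and only meant to illustrate the idea.

```python
def rank_regions(series_by_region, comp_size):
    """Keep the `comp_size` regions with the highest peaks, tallest last."""
    peaks = {name: max(values) for name, values in series_by_region.items()}
    top = sorted(peaks, key=peaks.get, reverse=True)[:comp_size]
    # Smallest peak first, so the tallest bars sit at the far end of the x-axis.
    return sorted(top, key=peaks.get)

# Made-up daily values for four regions.
data = {
    'A': [0.1, 0.4],
    'B': [2.0, 5.0],
    'C': [1.0, 0.5],
    'D': [0.0, 0.2],
}
order = rank_regions(data, comp_size=3)
```

Placing the tallest series at the back keeps shorter bars from being hidden behind them.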
6.3 How Far Can We Take It?
101 Most Impacted Regions Globally
I acknowledge that using the chart in this way stretches its value; however, it has been a great way for me to consider trends globally. Try not to look at each individual region ... look at it more like a scatter plot and see what patterns you can identify, if any.
NOTE: If the number of regions exceeds 100, the region labels are removed automatically.
First, we sort the regions in the `baseframe` to find the 101 most populous.
Then, those regions are ranked on the `comp_category`.
compsize = 102
regions = bf[~(bf['country'] == 'China')].sort_values(by='population', ascending=False).region_name.unique().tolist()[:compsize]
factors = ['temp']
factor_dmas = {'temp': 7}
casestudy = CaseStudy(
bf, regions=regions, factors=factors, factor_dmas=factor_dmas,
start_hurdle=10, start_factor='cases', count_dma=3, lognat=True
)
casestudy.make()
kwargs = {
'ylabel_params': {'fontsize': 18, 'labelpad': 12},
'zlabel_params': {'fontsize': 18, 'labelpad': 10},
'xtick_params': {'labelsize': 12},
'ytick_params': {'labelsize': 12},
'ztick_params': {'labelsize': 12},
'xy_cbar': (0.09, .225), 'wh_cbar': (.01, 20),
'title': {'s': 'Most Impacted Regions Totally', 'x': .47, 'y': .74, 'fontsize': 24, 'rotation': -9, 'weight': 'demi'},
'cblabel_params': {'labelpad': -55},
'color_category': 'temp_dma', 'color_interval': (20,30),
'tight': True,
'comp_size': 102,
'rank_category': 'deaths_new_dma_per_1M',
}
p = casestudy.compchart4d.make(z_category='deaths_new_dma_per_1M', **kwargs)
Now, if temperature did for some reason impact the fatality rate associated with COVID19, we would expect regions at the far end of the x-axis to tend toward the blue end of the color spectrum and regions at the near end of the x-axis to tend toward red.
We would also expect regions with higher peaks to have more blue bars on the near end of the y-axis, that is, at times earlier in the outbreak.
7. heatmap - Visualizing with Color Maps
7.1 Count Category v Single Factor
7.2 Count Category v Multiple Factors
Hexbins?
See19 utilizes the `hexbin` function of `matplotlib` to generate HeatMap-style charts to investigate the impact of different factors on COVID19 virulence.
This is a bit of a repurposing, or bastardization, of `hexbin`'s intended usage. `hexbin` is more commonly used as a 2D histogram for very large datasets, counting the appearance of datapoints within a range of certain `(x,y)` coordinates (called `bins`) and then mapping a color scheme to the range of counts.
For our purposes, use of `hexbin` is a stylistic choice, with the resulting patterns more interesting and a bit more revealing than a scatter plot. The intention is for each `bin` to contain only one datapoint, with the color mapped to either the x-axis values or a third dimension of values.
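The 2D-histogram idea is easy to see in a few lines of plain Python. The sketch below counts points per bin using square bins for simplicity (`hexbin` itself uses hexagonal bins), and shows that shrinking the bins is what gets you to one datapoint per bin. `bin_counts` is a hypothetical helper and the points are made up.

```python
from collections import Counter

def bin_counts(points, bin_size):
    """Count how many (x, y) points fall into each square bin."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // bin_size), int(y // bin_size))] += 1
    return counts

points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.7, 1.9)]

coarse = bin_counts(points, bin_size=1.0)  # some bins hold several points
fine = bin_counts(points, bin_size=0.5)    # each point gets its own bin
```

In the classic use the color encodes the count per bin; once every bin holds a single point, the color is free to encode something else, such as a factor value.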
Structure
As with previous charts, heatmaps are available in `CaseStudy` via the `heatmap` attribute, which is in turn an instance of the `HeatMap` class.
Charts are generated via the `make` method, which further morphs `casestudy.df` to arrange data for visualization.
Average over Time v Daily Points
All of the analysis to this point has considered each daily datapoint for each region separately. `heatmap` is different. `heatmap` takes (at this point) a simple mean of the `x_category` and `y_category` in question. This is a sufficient method to explore potential relationships, but true time series analysis must also be considered to project COVID19 virulence forward.
While the average is used, the timing of that average can still affect the relevance of the analysis. At this stage, `heatmap` is capable of utilizing the daily moving average from the date of the peak of the `x_category` or from the date the region clears the `start_hurdle`.
This option is set via the `x_start` and `color_start` parameters of the `make` method.
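The difference between the two timing choices can be shown on a toy series. The sketch below picks a factor's moving-average value either on the day of peak fatalities or on the day the start hurdle is cleared. All numbers are made up, the helper names are hypothetical, and this is only my reading of the two options, not see19's code.

```python
# Made-up series for a single region, one entry per day.
cumulative_deaths = [0, 0, 1, 3, 7, 9]
daily_deaths_dma = [0.0, 0.0, 0.3, 1.0, 2.0, 1.5]
temp_dma = [12.0, 13.0, 15.0, 16.0, 18.0, 21.0]

def index_of_peak(values):
    """Day on which the series reaches its maximum."""
    return values.index(max(values))

def index_of_hurdle(cumulative, start_hurdle):
    """First day on which the cumulative count reaches the hurdle."""
    return next(i for i, v in enumerate(cumulative) if v >= start_hurdle)

# Temperature average as of the day of peak daily fatalities.
temp_at_peak = temp_dma[index_of_peak(daily_deaths_dma)]

# Temperature average as of the day the first fatality is recorded.
temp_at_start = temp_dma[index_of_hurdle(cumulative_deaths, start_hurdle=1)]
```

Because the moving average looks backward, the value picked at the peak reflects conditions in the weeks leading up to it, which is usually the period of interest.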
For this analysis, we need a large dataset, so we will start with the top 250 regions in terms of population and add many different factors.
excluded_countries = ['China']
excluded_regions = []
frame_filter = (~bf['country'].isin(excluded_countries)) & (~bf['region_name'].isin(excluded_regions))
regions = bf[frame_filter] \
.sort_values('population', ascending=False) \
.region_name.unique().tolist()[:250]
factors_with_dmas = CaseStudy.MSMTS + ['strindex']
factor_dmas = {factor: 28 for factor in factors_with_dmas}
factor_dmas['strindex'] = 14
factors = factors_with_dmas + CaseStudy.MAJOR_CAUSES + ['visitors', 'A75PLUSB', 'A65PLUSB', 'gdp']
casestudy = CaseStudy(
bf, regions=regions, count_dma=14, factors=factors,
factor_dmas=factor_dmas, start_hurdle=1, start_factor='deaths', log=True, lognat=True,
)
casestudy.make()
7.1 Count Category v Single Factor
`heatmap` takes a similar set of options as `compchart` and `compchart4d`. The biggest difference in approach relates to text annotations:
- In `compchart` and `compchart4d`, specific variables for `title`, `subtitle`, etc. generate text boxes for specific purposes.
- In `heatmap` this is replaced in favor of a more flexible approach of ad-hoc text annotations via the `annotations` parameter. `heatmap` has tended to require more lengthy notations / explanations and so this approach seemed more appropriate.
In addition to the standard comp_category
, the x-axis of heatmap
is now provided by the comp_factor
parameter.
The below chart is completed on a linear scale of daily fatalities. It hints at a potential relationship between fatalities and temperature for the most impacted regions, however, the scaling is negatively impacted by a handful of outliers.
NOTE: `color_factor` is not provided; therefore, the color map is a function of the `comp_factor` values (on the x-axis).
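To make the note concrete: with no separate color factor, each point's color comes from the same values that position it. A minimal sketch of such a mapping follows, using a hand-rolled linear blend between two RGB colors rather than whatever colormap see19 actually applies. The helper names, sample temperatures, and endpoint colors are all hypothetical.

```python
def normalize(values):
    """Rescale values to the 0..1 range (0.5 for a constant series)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.5 if span == 0 else (v - lo) / span for v in values]


def blend(t, start=(255, 255, 204), end=(128, 0, 38)):
    """Linear blend between two RGB colors for t in 0..1."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


# Hypothetical average temperatures for three regions.
temps = [5.0, 19.5, 34.0]
colors = [blend(t) for t in normalize(temps)]
print(colors)
```

The coldest region receives the start color and the warmest the end color, with everything else falling in between.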
Max Fatalities v Temperature
title = 'Max Daily Fatalities v Temperature by Region'
subtitle = '*Average temperature for two weeks prior to day of 3rd fatality'
note = '**{} Regions considered excluding mainland China'.format(casestudy.df.region_id.unique().shape[0])
kwargs = {
'x_category': 'deaths_new_dma_per_1M',
'y_category': 'temp_dma',
'annotations': [
[0, 1.09, title, {'color': 'black', 'fontsize': 16, 'ha': 'left', 'va': 'center',}],
[0, 1.05, subtitle, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
[0, 1.01, note, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
],
'xtick_params': {'size': 12},
'ytick_params': {'size': 12},
'xlabel_params': {'size': 12},
'ylabel_params': {'size': 16},
'width': 12, 'height': 8,
}
plt = casestudy.heatmap.make(**kwargs)
The root data for the chart is available via the `df_chart` attribute.
casestudy.heatmap.df_chart.head()
| | region_id | region_name | temp_dma | deaths_new_dma_per_1M |
|---|---|---|---|---|
| 9 | 52 | Idaho | 20.192015 | 0.428860 |
| 69 | 312 | Bahrain | 33.111273 | 0.274820 |
| 48 | 98 | Nebraska | 26.321220 | 0.240344 |
| 214 | 563 | Mato Grosso Do Sul | 23.137148 | 0.224056 |
| 219 | 568 | Sergipe | 26.239815 | 0.215220 |
Natural Log of Max Fatalities v Temperature
By taking the natural log of the fatality rate, we can scale the figure to reveal a potentially clearer relationship.
Viewers often struggle to interpret a natural-log scale, so an `hlines` option is provided that draws horizontal lines at the y-values supplied. `hlines` requires a `list` of y-values.
Text annotations are then included to show the unscaled `comp_category` value at each hline.
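Because the axis is in log units, the y-values given to `hlines` need to be log-transformed as well. One way to prepare them from human-readable rates is sketched here, together with matching labels for the annotations. The reference rates and label format are hypothetical, and the natural log is used as described in the text; a base-10 column would need `math.log10` instead.

```python
import math

# Hypothetical reference rates (daily fatalities per 1M) to mark on the chart.
unscaled = [0.01, 0.1, 1.0]

# Positions on the log-scaled axis, in the same order as `unscaled`.
hlines = [math.log(v) for v in unscaled]

# Labels that tell the reader what each line means in unscaled terms.
labels = [f'{v:g} per 1M' for v in unscaled]

for y, label in zip(hlines, labels):
    print(f'{y:8.3f} -> {label}')
```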
We also set `comp_factor_start` to `max`, which uses the 28DMA on the day of the peak fatality rate for each region.
title = 'Max Daily Fatalities v Temperature by Region'
kwargs = {
'x_category': 'deaths_new_dma_per_1M_log',
'y_category': 'temp_dma',
'x_start': 'start_hurdle',
'annotations': [
[0, 1.09, title, {'color': 'black', 'fontsize': 16, 'ha': 'left', 'va': 'center',}],
[0, 1.05, subtitle, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
[0, 1.01, note, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
],
'xtick_params': {'size': 12},
'ytick_params': {'size': 12},
'xlabel_params': {'size': 12, 'labelpad': 10},
'ylabel_params': {'size': 16},
'width': 12, 'height': 8,
}
plt = casestudy.heatmap.make(**kwargs)
As with the other chart instances, a chart-specific dataframe can be accessed for `heatmap` via the `df_chart` attribute.
casestudy.heatmap.df_chart.head(4)
| | region_id | region_name | temp_dma | deaths_new_dma_per_1M_log |
|---|---|---|---|---|
| 9 | 52 | Idaho | 20.192015 | -0.367684 |
| 69 | 312 | Bahrain | 33.111273 | -0.560952 |
| 48 | 98 | Nebraska | 26.321220 | -0.619168 |
| 214 | 563 | Mato Grosso Do Sul | 23.137148 | -0.649644 |
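The printed values offer a quick sanity check on the transform. Comparing this table with the linear-scale table earlier in this section, the `_log` column matches a base-10 logarithm of `deaths_new_dma_per_1M`. Presumably the `_lognat` suffix used in later examples holds the natural log, though that is an inference from the column names rather than something confirmed here.

```python
import math

# (region, deaths_new_dma_per_1M, deaths_new_dma_per_1M_log), copied from the
# two tables printed in this section.
rows = [
    ('Idaho', 0.428860, -0.367684),
    ('Bahrain', 0.274820, -0.560952),
    ('Nebraska', 0.240344, -0.619168),
    ('Mato Grosso Do Sul', 0.224056, -0.649644),
]

# Largest gap between the printed log value and each candidate transform.
err_log10 = max(abs(math.log10(rate) - logged) for _, rate, logged in rows)
err_ln = max(abs(math.log(rate) - logged) for _, rate, logged in rows)

print(f'max error assuming log10: {err_log10:.6f}')
print(f'max error assuming ln:    {err_ln:.6f}')
```

The base-10 error is at rounding level while the natural-log error is large, so any reference lines or labels for this column should be computed in base 10.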
Lognat of Max Daily New Fatalities and UVB Radiation
title = 'Max Daily Fatalities v UVB Radiation by Region'
subtitle = '*Color-mapped by average daily uvb radiation for two weeks prior to the day of max fatalities'
kwargs = {
'x_category': 'cases_new_dma_per_person_per_city_KM2_log',
'y_category': 'uvb_dma',
'x_start': 'max',
'annotations': [
[0, 1.09, title, {'color': 'black', 'fontsize': 16, 'ha': 'left', 'va': 'center',}],
[0, 1.05, subtitle, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
],
'xtick_params': {'size': 12},
'ytick_params': {'size': 12},
'xlabel_params': {'size': 12, 'labelpad': 10},
'ylabel_params': {'size': 16},
'width': 12, 'height': 8,
}
plt = casestudy.heatmap.make(**kwargs)
7.2 Count Category v Multiple Factors (w one factor color-mapped)
The `heatmap` is made all the more powerful when a second factor is used to map the color space of the chart.
This is done via the `color_factor` parameter, which can be adapted via the `color_factor_start` parameter to take place on the day the `start_hurdle` is cleared or the day of max count category.
title = 'Max Daily Fatalities v UVB Radiation v Oxford Stringency Index'
subtitle = '*Average UVB radiation and Oxford Stringency Index for two weeks prior to day of 1st fatality'
kwargs = {
'x_category': 'cases_new_dma_per_1M_lognat',
'color_category': 'strindex_dma',
'color_start': 'start_hurdle',
'y_category': 'uvb_dma',
'annotations': [
[0, 1.09, title, {'color': 'black', 'fontsize': 16, 'ha': 'left', 'va': 'center',}],
[0, 1.05, subtitle, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
],
'xtick_params': {'size': 12},
'ytick_params': {'size': 12},
'xlabel_params': {'size': 12, 'labelpad': 10},
'ylabel_params': {'size': 16},
'width': 12, 'height': 8,
}
plt = casestudy.heatmap.make(**kwargs)
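The two reference days discussed in this section, the day the start hurdle is cleared and the day of the maximum count, can be pictured with a small sketch. It is not see19's code: the helper names and sample series are hypothetical, and it assumes the averaging window covers the days strictly before the reference day.

```python
def first_hurdle_index(cumulative, hurdle):
    """Index of the first day the cumulative count reaches the hurdle."""
    for i, total in enumerate(cumulative):
        if total >= hurdle:
            return i
    return None


def peak_index(daily):
    """Index of the day with the largest daily value (first one on ties)."""
    return max(range(len(daily)), key=daily.__getitem__)


def prior_window_mean(series, end, window):
    """Mean of up to `window` values strictly before index `end`."""
    chunk = series[max(0, end - window):end]
    return sum(chunk) / len(chunk) if chunk else None


# Hypothetical series for one region.
cumulative_deaths = [0, 0, 1, 3, 6, 8, 9]
daily_deaths = [0, 0, 1, 2, 3, 2, 1]
strindex = [10, 20, 40, 60, 80, 80, 70]

hurdle_day = first_hurdle_index(cumulative_deaths, hurdle=1)
max_day = peak_index(daily_deaths)

at_hurdle = prior_window_mean(strindex, hurdle_day, window=2)
at_max = prior_window_mean(strindex, max_day, window=2)
print(hurdle_day, at_hurdle, max_day, at_max)
```

The same factor can look quite different depending on the reference day, which is why the choice is worth stating in the chart subtitle.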
The `heatmap` approach is even better suited to time-static variables like demographic age ranges, given they are not susceptible to issues around averages over time.
Below we compare `A75PLUSB_%` against the average `strindex` for the 14 days prior to the max fatality rate.
We can see that social distancing stringency was quite common across the spectrum and that population age was a much more important variable impacting fatalities.
title = 'Max Daily Fatalities v UVB Radiation v Oxford Stringency Index'
subtitle = '*Average UVB radiation and Oxford Stringency Index for two weeks prior to day of 1st fatality'
note = '**Excludes mainland China'
kwargs = {
'x_category': 'deaths_new_dma_per_person_per_city_KM2_lognat',
'y_category': 'A75PLUSB_%',
'color_category': 'strindex_dma',
'color_start': 'max',
'annotations': [
[0, 1.095, title, {'color': 'black', 'fontsize': 16, 'ha': 'left', 'va': 'center',}],
[0, 1.055, subtitle, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
[0, 1.015, note, {'color': 'black', 'fontsize': 12, 'ha': 'left', 'va': 'center', 'style': 'italic'}],
],
'xtick_params': {'size': 12},
'ytick_params': {'size': 12},
'xlabel_params': {'size': 12, 'labelpad': 10},
'ylabel_params': {'size': 16},
'width': 12, 'height': 8,
}
plt = casestudy.heatmap.make(**kwargs)
8. barcharts - Comparing Regional Factors
A `barcharts` attribute is available (via the `BarCharts` class) as another handy feature for comparing the impact in different regions across different categories.
The object plots a single category on a single plot comparing multiple regions. You can provide multiple categories and multiple subplots will be returned!
The `barcharts` object utilizes `matplotlib`.
First instantiate the casestudy. We will consider a couple of the more successful Asian regions.
dragons = ['Hong Kong', 'Taiwan', 'Korea, South', 'Japan']
notables = [ 'Texas', 'New York', 'Lombardia', 'Sao Paulo']
regions = notables + dragons
factors_with_dmas = ['uvb', 'temp'] + CaseStudy.STRINDEX_CATS
factor_dmas = {factor: 28 for factor in factors_with_dmas}
mobi_dmas = {'transit': 28, 'retail_n_rec': 28, 'parks': 28, 'workplaces': 28}
factors = factors_with_dmas + CaseStudy.GMOBIS + ['A15_34B', 'A65PLUSB'] \
+ ['visitors', 'gdp'] + CaseStudy.MAJOR_CAUSES
casestudy = CaseStudy(
bf, regions=regions, count_dma=21, factors=factors, factor_dmas=factor_dmas,
mobi_dmas=mobi_dmas, start_hurdle=1, start_factor='deaths',
favor_earlier=True, factors_to_favor_earlier='key3_sum',
)
casestudy.make()
`barcharts` accepts any category in the see19 dataset. `bar_colors` provides different coloring of groups in the chart. You can further highlight selected regions via `feature_regions`. Below we see a stark difference among the regions selected.
factors1 = ['cases_per_1M', 'deaths_per_1M']
kwargs = {'categories': factors1, 'height': 5, 'bar_colors': ['#3D7068', '#D4AFB9', '#529FD7']}
kwargs['feature_regions'] = ['HKG', 'TWN', 'KOR']
plt = casestudy.barcharts.make(**kwargs)
Once again, the chart data is available via `df_chart`:
casestudy.barcharts.df_chart
region_code | NY | SP | LOM | TX | JPN | KOR | HKG | TWN |
---|---|---|---|---|---|---|---|---|
region_id | 75 | 556 | 36 | 67 | 429 | 433 | 353 | 497 |
cases | 407326 | 416434 | 95548 | 332434 | 25706 | 13816 | 1655 | 451 |
deaths | 25056 | 19788 | 16796 | 4020 | 988 | 296 | 10 | 7 |
tests | 5.16481e+06 | 1.15885e+06 | 724365 | 2.98455e+06 | 639821 | 1.44335e+06 | 442256 | 79506 |
population | 1.93781e+07 | 4.1142e+07 | 9.63118e+06 | 2.51456e+07 | 1.28057e+08 | 4.79908e+07 | 7.02728e+06 | 2.25314e+07 |
city_dens | 13978.1 | 8184.1 | 2316.88 | 924.007 | 8440.43 | 5032.81 | 9261.85 | 7919.49 |
cases_per_1M | 21019.9 | 10121.9 | 9920.7 | 13220.4 | 200.738 | 287.889 | 235.511 | 20.0165 |
deaths_per_1M | 1293.01 | 480.969 | 1743.92 | 159.869 | 7.71529 | 6.16785 | 1.42303 | 0.310678 |
`barcharts` can compare daily case and fatality rates. When a daily figure is selected, `barcharts` will find the maximum value in the time-series.
factors2 = ['deaths_new_dma_per_1M', 'deaths_new_dma_per_person_per_city_KM2']
kwargs = {'categories': factors2, 'height': 5, 'bar_colors': ['#3D7068', '#D4AFB9', '#529FD7']}
kwargs['feature_regions'] = ['HKG', 'TWN', 'KOR']
plt = casestudy.barcharts.make(**kwargs)
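For daily categories, each bar therefore represents a region's peak rather than its latest value. A minimal sketch of that reduction in plain Python, using made-up records and a hypothetical `peaks` dictionary:

```python
# (region_code, date, deaths_new_dma_per_1M) -- hypothetical daily records
records = [
    ('KOR', '2020-03-20', 0.10),
    ('KOR', '2020-03-27', 0.14),
    ('KOR', '2020-04-03', 0.09),
    ('TWN', '2020-03-27', 0.01),
    ('TWN', '2020-04-03', 0.02),
]

# Keep the (date, value) pair with the largest value for each region.
peaks = {}
for region, day, value in records:
    if region not in peaks or value > peaks[region][1]:
        peaks[region] = (day, value)

for region, (day, value) in peaks.items():
    print(f'{region}: peak {value} on {day}')
```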
As a matter of convenience, `barcharts` will automatically structure a subplot grid for any number of categories greater than 2.
factors = [
'strindex_dma', 'tests_new_dma_per_1M',
'population', 'city_dens',
'A15_34B_%', 'A65PLUSB_%',
'temp_dma', 'uvb_dma',
'circul_%', 'endo_%',
'visitors_%'
]
factors = factors1 + factors2 + factors
kwargs = {'categories': factors, 'height': 50, 'bar_colors': ['#3D7068', '#D4AFB9', '#529FD7']}
kwargs['title'] = {'t': 'COVID Dragons v Other Regions', 'y': .895, 'fontsize': 20, 'fontweight': 'demi'}
kwargs['feature_regions'] = ['HKG', 'TWN', 'KOR']
plt = casestudy.barcharts.make(**kwargs)
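The exact grid rule that `barcharts` applies is not documented here. As a purely illustrative sketch, a two-column layout with one subplot per category would work out as follows; the `grid_shape` helper and the two-column assumption are hypothetical.

```python
import math


def grid_shape(n_categories, ncols=2):
    """Rows and columns for one subplot per category (assumed column count)."""
    if n_categories <= ncols:
        # Few enough categories to sit side by side on a single row.
        return 1, n_categories
    return math.ceil(n_categories / ncols), ncols


# The example above passes 15 categories in total.
shape = grid_shape(15)
print(shape)
```

Under this assumption the last row holds a single chart, and the tall `height` value in the example leaves room for the many rows.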
9. Scatterflow for Large Sets
9.1 SubStrindexScatter
9.2 ScatterFlow
The plots investigated above have limitations when investigating a large set of subjects. Multi-line plots tend to become unreadable when using more than, say, 5 lines, and bar charts have dimensionality limitations, etc.
The `scatterflow` and `substrinscat` charts were created to improve visualization in this case.
9.1 substrinscat - for Strindex Sub-Categories
We will start with `substrinscat`, which is a more specific case of a `scatterflow` that focuses on the Oxford Stringency Index (you can think of it as being short for "Sub-Strindex Category Scatterflow").
We can generate a single `substrinscat` for one region that shows each stringency indicator. The value of the indicator is denoted by the color at each point.
The `strindex` and its subcategories are tracked at the country level, so we will instantiate a `casestudy` setting the `country_level` flag to `True`. This aggregates all the `see19` data up from the province/state level to the country level (where province/state data exists). As previously noted, smoothing is not available when `country_level=True`.
NOTE: we will also instantiate with `start_factor=''`. This creates a dataset beginning on 2020-01-01.
factors = CaseStudy.STRINDEX_CATS
factor_dmas = {factor: 28 for factor in factors}
countries = ['United States of America (the)', 'Canada', 'Mexico', 'Brazil', 'Australia', 'Russia',
'Italy', 'Germany', 'Spain', 'Singapore', 'Japan', 'Hong Kong', 'TWN', 'KOR', 'Malaysia'
]
custom_sum = ['h1', 'h2', 'h3', 'c1', 'c8']
casestudy = CaseStudy(
bf, countries=countries, count_dma=21, factors=factors, factor_dmas=factor_dmas,
start_hurdle=1, start_factor='', lognat=True, country_level=True, custom_sum=custom_sum,
)
casestudy.make()
/Users/spindicate/Documents/programming/zooscraper/casestudy/see19/see19/study/ray.py:16: UserWarning: smoothing is unavailable when country_level=True
super().__init__(*args, **kwargs)
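The country-level aggregation described above can be pictured as summing province/state counts for each country and date. The sketch here shows only that idea with made-up records; how see19 handles non-additive factors (temperature, for instance) during aggregation is not covered.

```python
from collections import defaultdict

# (country, region, date, new_cases) -- hypothetical province/state records
records = [
    ('Canada', 'Ontario', '2020-04-01', 400),
    ('Canada', 'Quebec', '2020-04-01', 450),
    ('Canada', 'Ontario', '2020-04-02', 420),
    ('Canada', 'Quebec', '2020-04-02', 500),
    ('Japan', 'Japan', '2020-04-01', 260),
]

# Roll additive counts up to one total per (country, date).
country_totals = defaultdict(int)
for country, _region, date, new_cases in records:
    country_totals[(country, date)] += new_cases

for key in sorted(country_totals):
    print(key, country_totals[key])
```

Countries that are already reported as a single region, like Japan in this sample, pass through unchanged.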
First, we'll demonstrate a single region, using Japan.
kwargs = {
'regions': 'Japan', 'width': 6, 'height': 4.5,
'title': {'t': 'Japan Stringency Categories', 'x': .57, 'y': 1.07, 'fontsize': 20},
'xlabel_params': {'fontsize': 18, 'labelpad': 12},
'cblabel_params': {'fontsize': 14, 'labelpad': 6},
'palette_base': 'RdPu',
'xy_cbar': (1.05, .15), 'wh_cbar': (.35, .5),
}
plt = casestudy.substrinscat.make(**kwargs)
The single plot above expands to a multi-plot simply by adding more regions.
kwargs = {
'regions': ['name_for_USA', 'Hong Kong', 'Taiwan', 'Korea, South', 'Malaysia'],
'width': 14, 'height': 8,
'palette_base': 'RdPu',
'xy_cbar': (1.05, .3), 'wh_cbar': (.35, .5),
'xy_legend': (-.04, .49),
'legend': {'title': {'fontsize': 12}, 'text': {'fontsize': 12}},
}
plt = casestudy.substrinscat.make(**kwargs)
And the plot automatically rescales based on the number of regions considered:
kwargs = {
'width': 20, 'height': 18,
'palette_base': 'RdPu',
'xy_cbar': (1.05, .3), 'wh_cbar': (.35, .5),
'xy_legend': (-.04, .51),
'legend': {'title': {'fontsize': 12}, 'text': {'fontsize': 12}},
}
plt = casestudy.substrinscat.make(**kwargs)
9.2 scatterflow
`ScatterFlow`, available as the `scatterflow` attribute, is a generalization of the `SubStrinScatter` chart. It is best suited for comparing many regions along a single dimension. For example, we can compare countries on the core Oxford Stringency Index:
kwargs = {
'y_category': 'strindex',
'title': {'t': 'Oxford Stringency Index Over Time', 'y': 0.94, 'fontsize': 16},
'width': 8, 'height': 6,
'xy_cbar': (.7, .24), 'wh_cbar': (.35, 1),
'palette_base': 'Blues',
'xlabel_params': {'fontsize': 15, 'labelpad': 12},
}
plt = casestudy.scatterflow.make(**kwargs)
Above, we can clearly see the trends in stringency across the different regions and quickly isolate the outliers.
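One way to picture the data behind a chart like this is as a grid with one row per region and one column per date, where each cell's value drives the color of a marker. A minimal sketch of building such a grid from long-format records follows; the records and variable names are hypothetical, and this is not how see19 stores the data internally.

```python
# (region, date, strindex) -- hypothetical long-format records
records = [
    ('Italy', '2020-03-01', 70.0),
    ('Italy', '2020-03-02', 75.0),
    ('Japan', '2020-03-01', 40.0),
    ('Japan', '2020-03-02', 45.0),
    ('Spain', '2020-03-02', 30.0),
]

dates = sorted({day for _, day, _ in records})
regions = sorted({region for region, _, _ in records})
lookup = {(region, day): value for region, day, value in records}

# One row per region, one column per date; None marks a missing observation.
grid = [[lookup.get((region, day)) for day in dates] for region in regions]

for region, row in zip(regions, grid):
    print(region, row)
```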
`scatterflow` accepts any category in the see19 dataset.
Here we show the sum of the Key3 strindex subcategories.
kwargs = {
'y_category': 'key3_sum',
'title': {
't': 'The Key 3: Information, Contact Tracing, and Testing Over Time',
'fontsize': 16,
'y': 0.94
},
'xlabel_params': {'fontsize': 14},
'width': 8, 'height': 6,
'xy_cbar': (.7, .24), 'wh_cbar': (.35, 1),
'palette_base': 'Blues'
}
plt = casestudy.scatterflow.make(**kwargs)
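For intuition, a summed category such as `key3_sum` can be thought of as adding several stringency indicators for each day. The chart title names information, contact tracing, and testing, which correspond to indicators H1, H2, and H3 in the Oxford codebook. That `key3_sum` is built from exactly `h1`, `h2`, and `h3` is an assumption, as is the handling of missing values in the hypothetical `indicator_sum` helper here.

```python
KEY3 = ['h1', 'h2', 'h3']  # assumed members of key3_sum


def indicator_sum(row, indicators):
    """Sum the selected indicator values, treating missing ones as 0."""
    return sum(row.get(name) or 0 for name in indicators)


# Hypothetical indicator values for one country on one day.
day = {'h1': 2, 'h2': 1, 'h3': 2, 'c1': 3, 'c8': None}

key3 = indicator_sum(day, KEY3)
custom = indicator_sum(day, ['h1', 'h2', 'h3', 'c1', 'c8'])
print(key3, custom)
```

The second call mirrors the list passed as `custom_sum` when the casestudy was instantiated earlier in this section.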
Below we compare US states on new fatalities.
First, we select the 25 most impacted states in terms of total fatalities, then instantiate a new CaseStudy with them.
region_ids = bf[bf.country_code == 'USA'].groupby('region_id').deaths.max().sort_values(ascending=False).index.values[:25]
casestudy = CaseStudy(bf, regions=region_ids, count_dma=3,
start_factor='date', start_hurdle=dt(2020, 3, 1)
)
casestudy.make()
HBox(children=(FloatProgress(value=0.0, description='Creating CaseStudy', layout=Layout(flex='2'), max=2.0, st…
HBox(children=(FloatProgress(value=0.0, description='changes', max=66.0, style=ProgressStyle(description_width…
HBox(children=(FloatProgress(value=0.0, max=25.0), HTML(value='')))
kwargs = {
'y_category': 'deaths_new_dma_per_1M',
'title': {
't': 'Daily Fatalities in US States',
'fontsize': 16,
'y': 0.94
},
'marker': 's',
'ms': 225,
'xlabel_params': {'fontsize': 14},
'width': 8, 'height': 6,
'xy_cbar': (.7, .24), 'wh_cbar': (.35, 1),
'palette_base': 'RdYlGn_r'
}
casestudy.scatterflow.make(**kwargs)