Unified data hub for a better understanding of COVID-19 https://covid19datahub.io

Python Interface to COVID-19 Data Hub

The goal of COVID-19 Data Hub is to provide the research community with a unified dataset by collecting worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19. Please agree to the Terms of Use and cite the following reference when using it:

Reference

Guidotti, E., Ardia, D., (2020).
COVID-19 Data Hub
Journal of Open Source Software, 5(51):2376
https://doi.org/10.21105/joss.02376

Setup and usage

Install with pip:

pip install covid19dh

Import the main covid19() function with

from covid19dh import covid19

x,src = covid19("ITA") # load data

The package is updated regularly. Update it with

pip install --upgrade covid19dh

Return values

A call to covid19() returns two values, both pandas DataFrames:

  • the data and
  • references to the sources.

Parametrization

Country

Country specifies the administrative region that the data are fetched for, which in turn determines the sources the data come from. It can be given as an ISO3 code, an ISO2 code, a numeric ISO code or a country name (case-insensitive).

Fetching data from a particular country is done with

x,src = covid19("ESP")

List of ISO codes can be found here.

Multiple countries can be requested at the same time by passing a list

x,src = covid19(["ESP","PT","andorra",250])

The country can be omitted, in which case data for the whole world are returned.

x,src = covid19()
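
The identifier forms accepted above (ISO3, ISO2, numeric ISO, case-insensitive name) can be illustrated with a small normalizer. This is only a sketch: the normalize_country helper and its four-entry lookup table are hypothetical and are not part of covid19dh, which resolves identifiers internally.

```python
# Hypothetical lookup table, limited to the four countries in the list example.
# Each row: (ISO3, ISO2, numeric ISO, English name).
_COUNTRIES = [
    ("ESP", "ES", 724, "Spain"),
    ("PRT", "PT", 620, "Portugal"),
    ("AND", "AD", 20, "Andorra"),
    ("FRA", "FR", 250, "France"),
]


def normalize_country(identifier):
    """Map an ISO3/ISO2 code, numeric ISO code or country name to ISO3."""
    if isinstance(identifier, int):
        # Numeric ISO code given as an integer.
        for iso3, _, numeric, _ in _COUNTRIES:
            if numeric == identifier:
                return iso3
    else:
        # Text identifiers are compared case-insensitively.
        key = str(identifier).strip().casefold()
        for iso3, iso2, numeric, name in _COUNTRIES:
            if key in (iso3.casefold(), iso2.casefold(), name.casefold(), str(numeric)):
                return iso3
    raise ValueError(f"unknown country identifier: {identifier!r}")


codes = [normalize_country(c) for c in ["ESP", "PT", "andorra", 250]]
print(codes)  # ['ESP', 'PRT', 'AND', 'FRA']
```

Normalizing to a single code form like this makes it easy to check which countries a mixed list actually refers to before requesting the data.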

Date filter

Dates can be specified as datetime.datetime, datetime.date or a str in the format YYYY-mm-dd.

from datetime import datetime

x,src = covid19("SWE", start = datetime(2020,4,1), end = "2020-05-01")
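
To show how the three accepted date forms relate to each other, here is a minimal sketch that coerces each of them to a datetime.date. The to_date helper is hypothetical, written for illustration only, and does not claim to mirror how covid19dh parses dates internally.

```python
from datetime import date, datetime


def to_date(value):
    """Coerce a datetime, a date or a 'YYYY-mm-dd' string to datetime.date."""
    if isinstance(value, datetime):  # checked first: datetime subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise TypeError(f"unsupported date value: {value!r}")


start = to_date(datetime(2020, 4, 1))
end = to_date("2020-05-01")
print(start, end)  # 2020-04-01 2020-05-01
```

A helper like this is handy for validating user input (for instance, checking that start is not after end) before passing the values to covid19().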

Level

Levels work the same way as in all our other data fetchers.

  1. Country level
  2. State, region or canton level
  3. City or municipality level

from datetime import date

x,src = covid19("USA", level = 2, start = date(2020,5,1))
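
The three numeric levels listed above can be given readable names in calling code. The Level enum and check_level helper below are hypothetical conveniences, not something covid19dh provides; the library itself is documented here as taking the plain integer.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical names for the three granularity levels."""
    COUNTRY = 1  # country level
    STATE = 2    # state, region or canton level
    CITY = 3     # city or municipality level


def check_level(level):
    """Return the level as a plain int, raising ValueError if it is not 1, 2 or 3."""
    return int(Level(level))


print(check_level(Level.STATE))  # 2
```

The resulting integer can then be passed as the level argument, e.g. level = check_level(Level.STATE).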

Cache

The library keeps downloaded data and sources in a simple in-memory cache for the duration of the runtime. Using the cached data is enabled by default.

Caching can be disabled (e.g. for long-running programs) with

x,src = covid19("FRA", cache=False)

More advanced caching is coming.

Vintage

Data Hub also provides vintage data, an archive of the data as collected on each day. The data collection is stable.

To fetch, for example, the US data that were accessible on 22 April 2020, type

x,src = covid19("USA", end = "2020-04-22", vintage = True)

The vintage data are collected at the end of the day, but published with a delay of approximately 48 hours, once the day is completed in all time zones.

Hence, if vintage = True but end is not set, a warning is raised and None is returned.

x,src = covid19("USA", vintage=True) # too early to get today's vintage
UserWarning: vintage data not available yet
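
Because of the publication delay, it can be useful to check an end date before requesting a vintage. The sketch below treats the "approximately 48 hours" as exactly two days, which is an assumption; is_vintage_available is a hypothetical helper, and the actual publication time may differ.

```python
from datetime import date, timedelta

# Assumption: the "approximately 48 hour" delay is modeled as exactly two days.
VINTAGE_DELAY = timedelta(days=2)


def is_vintage_available(end, today=None):
    """Estimate whether the vintage for the date `end` has likely been published."""
    if today is None:
        today = date.today()
    return end <= today - VINTAGE_DELAY


print(is_vintage_available(date(2020, 4, 22), today=date(2020, 4, 25)))  # True
print(is_vintage_available(date(2020, 4, 24), today=date(2020, 4, 25)))  # False
```

When the check passes, the date can be passed on as end together with vintage = True; when it fails, an earlier end date is the safer choice.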

Citations

The sources of the data are returned as the second value.

from covid19dh import covid19
x,src = covid19("CZE")

In addition, the following message is printed on each covid19() call.

We have invested a lot of time and effort in creating COVID-19 Data Hub, please cite the following when using it:

        Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

A BibTeX entry for LaTeX users is

        @Article{,
                title = {COVID-19 Data Hub},
                year = {2020},
                doi = {10.21105/joss.02376},
                author = {Emanuele Guidotti and David Ardia},
                journal = {Journal of Open Source Software},
                volume = {5},
                number = {51},
                pages = {2376},
        }

To hide this message use 'verbose = FALSE'.

This message can be turned off by setting verbose to False.

from covid19dh import covid19
x,src = covid19("CZE", verbose = False) 

To get the sources of data that have already been acquired, use the function cite(). It returns only the relevant sources and filters out the irrelevant ones.

from covid19dh import cite # assumes cite is exported at the package top level
src = cite(x) # get sources of x

The function prints formatted citations to stdout.

Data References:
        Czech Statistical Office (2018), https://www.czso.cz/

        Johns Hopkins Center for Systems Science and Engineering (2020), https://github.com/

        Ministery of Health of Czech Republic (2020), https://onemocneni-aktualne.mzcr.cz/

        Our World in Data (2020), https://github.com/

        Hale Thomas, Sam Webster, Anna Petherick, Toby Phillips, and Beatriz Kira (2020). Oxford COVID-19 Government Response Tracker, Blavatnik School of Government.

        World Bank Open Data (2018), https://data.worldbank.org/

Switch the printing off by setting the verbose parameter to False.

src = cite(x, verbose = False)

The pandas DataFrame src has the following structure

    iso_alpha_3  administrative_area_level  ...                     institution                                        textVersion
137         CZE                        1.0  ...                             NaN                                                NaN
138         CZE                        1.0  ...                             NaN                                                NaN
139         CZE                        2.0  ...                             NaN                                                NaN
140         CZE                        2.0  ...                             NaN                                                NaN
141         CZE                        2.0  ...                             NaN                                                NaN
142         CZE                        2.0  ...                             NaN                                                NaN
143         CZE                        3.0  ...                             NaN                                                NaN
144         CZE                        3.0  ...                             NaN                                                NaN
145         CZE                        3.0  ...                             NaN                                                NaN
539         NaN                        NaN  ...                             NaN                                                NaN
540         NaN                        NaN  ...                             NaN                                                NaN
541         NaN                        NaN  ...                             NaN                                                NaN
542         NaN                        NaN  ...                             NaN                                                NaN
543         NaN                        NaN  ...                             NaN                                                NaN
544         NaN                        NaN  ...                             NaN                                                NaN
545         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
546         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
547         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
548         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
549         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
550         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
551         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
552         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
553         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
554         NaN                        NaN  ...  Blavatnik School of Government  Hale Thomas, Sam Webster, Anna Petherick, Toby...
555         NaN                        NaN  ...                             NaN                                                NaN

The DataFrame columns are

  • iso_alpha_3, administrative_area_level,
  • data_type
  • url
  • title, author, institution
  • year
  • bibtype, textVersion

All sources can be acquired with

src_all = get_sources() # get all sources

Contribution

Developed by Martin Benes
