Web Scraper for Poland COVID19 data.
Web Scraper of COVID-19 data for Poland
The Python package covid19poland is part of the MFRatio project.
It provides access to data on COVID-19 deaths in Poland as well as overall death data.
Setup and usage
Install from pip with
pip install covid19poland
The current version includes several data sources:
- Covid-19 deaths in Poland (offline), manually checked
- Parser of the Twitter account of the Polish Ministry of Health
- Covid-19 deaths from Wikipedia
The package is regularly updated. Update with
pip install --upgrade covid19poland
Covid-19 deaths
Deaths can be acquired as a dataframe of individual death cases with their attributes
import covid19poland as PL
x = PL.covid_death_cases()
or as death counts aggregated over 5-year age groups, sex and region.
x = PL.covid_deaths()
The granularity of the region is set with the level parameter: 0 (whole of Poland), 2 (NUTS-2) or 3 (NUTS-3, default).
x = PL.covid_deaths(level = 2) # setting region to be NUTS-2
The NUTS-2 and NUTS-3 classification is done using an offline clone of the file from https://ec.europa.eu/eurostat/web/nuts/local-administrative-units.
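The aggregation described above can be pictured with a small standard-library sketch. It is not the package's implementation: the record layout, the sample values and the helper names (age_group, region_at_level, aggregate) are assumptions made for illustration. It relies only on the fact that NUTS codes are hierarchical, so a NUTS-3 code truncated to four characters gives its NUTS-2 parent and its first two characters give the country.

```python
from collections import Counter

# Hypothetical individual records: (age, sex, NUTS-3 code).
# The layout and values are illustrative, not the package's schema.
cases = [
    (83, "F", "PL911"),
    (81, "F", "PL911"),
    (67, "M", "PL213"),
    (69, "M", "PL214"),
]

def age_group(age, width=5):
    """Return the lower bound of the age bin, e.g. 83 -> 80."""
    return (age // width) * width

def region_at_level(nuts3, level):
    """Truncate a NUTS-3 code to the requested level (0, 2 or 3)."""
    length = {0: 2, 2: 4, 3: 5}[level]
    return nuts3[:length]

def aggregate(cases, level=3):
    """Count cases per (age group, sex, region) at the given level."""
    return Counter(
        (age_group(age), sex, region_at_level(nuts3, level))
        for age, sex, nuts3 in cases
    )

counts = aggregate(cases, level=2)
```

With level = 2 the two male cases from different NUTS-3 units fall into the same NUTS-2 region and are counted together; with level = 3 they stay separate.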
Online reading
It is recommended to use the offline data, since they have been acquired
this way and manually checked. The offline data is available through the covid19poland package.
If online data from Twitter is needed, it can be downloaded and parsed as well.
data,filtered,checklist = PL.twitter(start = "2020-06-01", end = "2020-07-01")
Turn on logging by running the following code before the twitter() function call.
import logging
logging.basicConfig(level = logging.INFO)
The twitter() call returns three values:
- data - the deceased people with their place and date of death
- filtered - tweets that were filtered out, kept only to validate that nothing was missed
- checklist - list of dates that the parser is not sure about
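When a longer period is needed, one option is to call twitter() one month at a time. The helper below is a hypothetical sketch (month_ranges is not part of covid19poland) that generates start and end strings in the same shape as the call above, where the end is the first day of the following month. Whether the package treats the end date as inclusive is not stated here, so check the overlap before merging results.

```python
from datetime import date

def month_ranges(first, last):
    """Yield (start, end) ISO date strings, one pair per calendar month.

    `first` and `last` are (year, month) tuples, both inclusive. Each end
    is the first day of the following month.
    """
    year, month = first
    while (year, month) <= last:
        nxt = (year + 1, 1) if month == 12 else (year, month + 1)
        yield date(year, month, 1).isoformat(), date(nxt[0], nxt[1], 1).isoformat()
        year, month = nxt

ranges = list(month_ranges((2020, 11), (2021, 1)))

# Possible use with the package (not executed here):
# for start, end in ranges:
#     data, filtered, checklist = PL.twitter(start = start, end = end)
```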
The data can be saved to output files with
import json
import os
os.makedirs("data", exist_ok = True) # make sure the output directory exists
with open("data/6_in.json", "w") as fd:
    json.dump(data, fd)
with open("data/6_out.json", "w") as fd:
    json.dump(filtered, fd)
print(checklist)
Offline data can be validated against the deaths reported by the covid19dh package.
The mismatching days are acquired by
x = PL.mismatching_days()
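The idea behind such a check can be sketched as a comparison of two daily series. This is an illustration only, not the logic of mismatching_days(): the function name find_mismatches, the dictionary layout and the sample counts are all assumptions.

```python
def find_mismatches(local, reference):
    """Return sorted dates whose counts differ between two daily series.

    Both arguments map ISO date strings to death counts. A date missing
    from one series is treated as a count of 0.
    """
    days = set(local) | set(reference)
    return sorted(d for d in days if local.get(d, 0) != reference.get(d, 0))

# Hypothetical daily counts, for illustration only.
local = {"2020-06-01": 12, "2020-06-02": 9, "2020-06-03": 15}
reference = {"2020-06-01": 12, "2020-06-02": 10, "2020-06-04": 3}

diff = find_mismatches(local, reference)
```

Treating a missing date as zero is a design choice of this sketch; it flags days that appear in only one source instead of silently skipping them.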
Covid-19 tests
The test counts come from two sources and are merged together:
- Parsed from the Twitter account of the Polish Ministry of Health (@MZ_GOV_PL)
- Wayback Machine snapshots of government pages (NUTS-3 data, not connected yet)
Fetch the data with
x = PL.covid_tests()
By default, the local copy of the data shipped with the package is used. To live-parse the data from the source, type
x = PL.covid_tests(offline = False)
Deaths
The covid19poland package can also fetch death data from GUS (Główny Urząd Statystyczny,
the Central Statistical Office of Poland). The data is taken from http://demografia.stat.gov.pl/bazademografia/Tables.aspx
and contains deaths per month and gender for the years 2010 - 2018.
x = PL.deaths()
By default, the local copy of the data shipped with the package is used. To live-parse the data from the source, type
x = PL.deaths(offline = False)
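Monthly figures of this kind are often collapsed to yearly totals before comparison. The sketch below shows one way to do that with the standard library. The row layout (year, month, gender, deaths) and the numbers are made up for illustration and do not reflect the columns returned by deaths().

```python
from collections import defaultdict

# Hypothetical rows: (year, month, gender, deaths). Values are made up.
rows = [
    (2017, 1, "M", 100),
    (2017, 1, "F", 90),
    (2017, 2, "M", 80),
    (2018, 1, "F", 70),
]

def yearly_totals(rows):
    """Sum monthly deaths into totals keyed by (year, gender)."""
    totals = defaultdict(int)
    for year, _month, gender, deaths in rows:
        totals[(year, gender)] += deaths
    return dict(totals)

totals = yearly_totals(rows)
```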
Wikipedia
Obsolete
The table comes from the version of the Wikipedia page https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Poland from the beginning of June.
x = PL.wiki()
Once a better tabular source is found, it will replace the current one.
The level parameter sets the granularity of the data:
1. Country level (default)
2. State level
# country level
x1 = PL.fetch(level = 1)
# state level
x2 = PL.fetch(level = 2)
Contribution
Developed by Martin Benes.
Join on GitHub.