Web Scraper for Poland COVID19 data.

Project description

Web Scraper of COVID-19 data for Poland

The Python package covid19poland is part of the MFRatio project.

It provides access to data on COVID-19 deaths in Poland as well as overall death data.

Setup and usage

Install from pip with

pip install covid19poland

The current version includes several data sources

  • Covid-19 deaths in Poland (offline) - manually checked
  • Parser of the Twitter account of the Polish Ministry of Health
  • Covid-19 deaths from Wikipedia

The package is regularly updated. Update with

pip install --upgrade covid19poland

Covid-19 deaths

Deaths can be acquired as a dataframe of individual death cases with their attributes

import covid19poland as PL

x = PL.covid_death_cases()

or as death counts aggregated over 5-year age groups, sex and region.

x = PL.covid_deaths()

The granularity of the region is set with the level parameter: 0 (whole Poland), 2 (NUTS-2) or 3 (NUTS-3, default).

x = PL.covid_deaths(level = 2) # setting region to be NUTS-2
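
To illustrate what aggregation over 5-year age groups, sex and region means, here is a minimal pandas sketch on toy data. The column names age, sex and region, the region codes and the helper aggregate_cases are assumptions made for illustration. They are not taken from the package, whose actual columns may differ.

```python
import pandas as pd


def aggregate_cases(cases: pd.DataFrame) -> pd.DataFrame:
    """Count deaths per 5-year age group, sex and region."""
    # Lower bound of the 5-year bin, e.g. 67 -> 65, 82 -> 80.
    age_start = (cases["age"] // 5) * 5
    return (
        cases.assign(age_start=age_start)
        .groupby(["age_start", "sex", "region"])
        .size()
        .reset_index(name="deaths")
    )


# Toy input; column names and values are hypothetical.
cases = pd.DataFrame({
    "age": [67, 69, 82, 67],
    "sex": ["M", "M", "F", "F"],
    "region": ["PL21", "PL21", "PL22", "PL21"],
})
counts = aggregate_cases(cases)
print(counts)
```

Each row of the result is one observed combination of age group, sex and region, with the number of individual cases that fall into it.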

The NUTS-2 and NUTS-3 classification uses an offline copy of the file from https://ec.europa.eu/eurostat/web/nuts/local-administrative-units.

Online reading

Using the offline data is recommended, since it was acquired in this way and then manually checked. The offline data is available through the covid19poland package.

If online data from Twitter is needed, it can be downloaded and parsed as well.

data,filtered,checklist = PL.twitter(start = "2020-06-01", end = "2020-07-01")

Turn on logging by running the following code before the twitter() function call.

import logging
logging.basicConfig(level = logging.INFO)

The twitter() call returns three values

  • data - the deceased people with their place and date of death
  • filtered - tweets that were filtered out, kept only to validate that nothing was missed
  • checklist - list of dates that the parser is not sure about

The data can be saved to output files with

import json

# the data/ directory must already exist
with open("data/6_in.json", "w") as fd:
    json.dump(data, fd)
with open("data/6_out.json", "w") as fd:
    json.dump(filtered, fd)
print(checklist)
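
Writing to data/6_in.json fails if the data directory does not exist yet, and the saved files are only useful if they can be read back. The sketch below wraps both steps. The helpers save_json and load_json and the sample record are hypothetical. The sketch assumes the parsed result is JSON-serializable, such as a list of dictionaries. The actual structure returned by twitter() may differ.

```python
import json
import tempfile
from pathlib import Path


def save_json(obj, path):
    """Write obj as JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as fd:
        # ensure_ascii=False keeps Polish characters readable in the file
        json.dump(obj, fd, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a JSON file back into Python objects."""
    with open(path, encoding="utf-8") as fd:
        return json.load(fd)


# Hypothetical record, standing in for one parsed entry.
sample = [{"date": "2020-06-01", "place": "Kraków", "age": 67}]

# A temporary directory keeps the example free of side effects.
target = Path(tempfile.mkdtemp()) / "data" / "6_in.json"
save_json(sample, target)
restored = load_json(target)
```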

Offline data can be validated against the deaths from the covid19dh package. The mismatching days are acquired by

x = PL.mismatching_days()
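
The idea of such a validation can be sketched in plain Python: compare two daily death counts and report the days on which they disagree. This is not the package's implementation. The function find_mismatching_days, the input format (ISO date string mapped to a count) and the toy numbers are assumptions, and a day missing from one source is treated here as a count of 0.

```python
def find_mismatching_days(local, reference):
    """Return the sorted days on which the two daily counts differ."""
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    days = sorted(set(local) | set(reference))
    return [d for d in days if local.get(d, 0) != reference.get(d, 0)]


# Toy counts; the values are made up for illustration.
local = {"2020-06-01": 12, "2020-06-02": 9, "2020-06-03": 14}
reference = {"2020-06-01": 12, "2020-06-02": 10, "2020-06-04": 3}

mismatches = find_mismatching_days(local, reference)
print(mismatches)
```

Days that appear in only one of the two sources are reported as mismatches unless the count there is 0.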

Covid-19 tests

The test counts come from two sources and are merged together:

  • Parsed from the Twitter account of the Polish Ministry of Health (@MZ_GOV_PL)
  • Wayback Machine (NUTS-3 data) from government pages (not connected yet)

Fetch the data with

x = PL.covid_tests()

A local copy of the data in the package is used by default. To live-parse the data from the source, use

x = PL.covid_tests(offline = False)

Deaths

The covid19poland package can also fetch death data from GUS (Główny Urząd Statystyczny, the Central Statistical Office of Poland). The data is taken from http://demografia.stat.gov.pl/bazademografia/Tables.aspx and contains deaths per month and gender for the years 2010-2018.

x = PL.deaths()

A local copy of the data in the package is used by default. To live-parse the data from the source, type

x = PL.deaths(offline = False)

Wikipedia

Obsolete

The table comes from the version of the Wikipedia page https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Poland from the beginning of June.

x = PL.wiki()

Once a better tabular source is found, it will replace the current one.

The level parameter sets the granularity of the data

  1. Country level (default)
  2. State level

# country level
x1 = PL.fetch(level = 1)
# state level
x2 = PL.fetch(level = 2)

Contribution

Developed by Martin Benes.

Join on GitHub.
