Web Scraper of COVID-19 data for Poland
The Python package covid19poland is part of the MFRatio project.
It provides access to data on COVID-19 deaths in Poland as well as data on overall deaths.
Setup and usage
Install from PyPI with pip:
pip install covid19poland
The current version includes several data sources:
- COVID-19 deaths from Wikipedia
- Online parser of the Polish Ministry of Health's Twitter feed
- Offline, manually checked data from the online parser
The package is regularly updated. Upgrade with
pip install --upgrade covid19poland
Wikipedia
The table is taken from the version of the Wikipedia page https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Poland as of the beginning of June.
import covid19poland as PL
x = PL.wiki()
Once a better tabular source is found, it will replace the current one.
Parametrization
The level parameter sets the granularity of the data:
- Country level (default)
- State level
import covid19poland as PL
# country level
x1 = PL.fetch(level = 1)
# state level
x2 = PL.fetch(level = 2)
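The two numeric levels are easy to mix up in longer scripts. As a minimal sketch, they can be given names with an IntEnum. The class Level and its member names are hypothetical and not part of covid19poland; the sketch also assumes that fetch() treats an IntEnum member like a plain integer, which has not been verified against the package.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical names for the two documented values of `level`."""

    COUNTRY = 1  # country level, the documented default
    STATE = 2    # state level


# IntEnum members are integers, so they compare equal to 1 and 2.
# Assumed usage (not verified): PL.fetch(level=Level.STATE)
selected = Level.STATE
```

If fetch() turns out to reject enum members, int(Level.STATE) yields the plain integer.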
Twitter data
The data from Twitter can be downloaded and parsed with
data, filtered, checklist = PL.twitter(start="2020-06-01", end="2020-07-01")
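The call above covers one month, from the first day of June to the first day of July. To process a longer period in the same monthly steps, the start and end strings can be generated rather than typed by hand. The following is a sketch only: month_windows is a hypothetical helper, not part of covid19poland, and whether twitter() treats the end date as inclusive or exclusive is not stated here, so the boundaries are worth checking.

```python
from datetime import date


def month_windows(first: date, last: date):
    """Yield (start, end) ISO date strings, one pair per calendar month.

    Each window runs from the first day of a month to the first day of
    the following month. `first` is snapped back to the start of its month.
    """
    year, month = first.year, first.month
    while date(year, month, 1) <= last:
        start = date(year, month, 1)
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield start.isoformat(), date(year, month, 1).isoformat()


windows = list(month_windows(date(2020, 11, 15), date(2021, 1, 10)))

# Assumed usage with the package (not executed here):
# for start, end in windows:
#     data, filtered, checklist = PL.twitter(start=start, end=end)
```

Splitting by month keeps each checklist short enough to review by hand before moving on to the next window.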
To turn on logging, run the following code before calling twitter().
import logging
logging.basicConfig(level = logging.INFO)
The twitter() call returns three values:
- data - the deceased people, with their place and date of death
- filtered - tweets that were filtered out, kept only to validate that nothing was missed
- checklist - list of dates that the parser is not sure about
The data can be saved to output files with
import json

# the data/ directory must already exist
with open("data/6_in.json", "w") as fd:
    json.dump(data, fd)
with open("data/6_out.json", "w") as fd:
    json.dump(filtered, fd)
print(checklist)
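Writing to data/6_in.json fails if the data directory does not exist, and the default JSON encoding escapes Polish characters. The sketch below shows one way to handle both and to read the file back. The record in data is a stand-in with hypothetical field names; the real structure returned by twitter() is not documented here, beyond the fact that it can be passed to json.dump.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the `data` value returned by twitter(); the field names
# are hypothetical and serve only as an illustration.
data = [{"date": "2020-06-01", "place": "Łódź"}]

# A temporary directory keeps the sketch self-contained; a real script
# would use a project path such as Path("data").
out_dir = Path(tempfile.mkdtemp()) / "data"
out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing

path = out_dir / "6_in.json"
with open(path, "w", encoding="utf-8") as fd:
    # ensure_ascii=False keeps Polish characters readable in the file.
    json.dump(data, fd, ensure_ascii=False, indent=2)

# Read the file back to confirm that nothing was lost on the way.
with open(path, encoding="utf-8") as fd:
    restored = json.load(fd)
```

The same pattern applies to the filtered output.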
Offline data
The Twitter data has already been manually checked and is shipped as part of the package.
Use the read() function from the offline submodule to load it:
import covid19poland as PL
x = PL.offline.read()
The result is a pandas.DataFrame with one row per deceased person.
TODO
- Transform place of death to NUTS codes
Contribution
Developed by Martin Benes.
Join on GitHub.