A scraper of statistical data from Vantetider.se built on top of Statscraper.
This is a scraper for statistical data from http://www.vantetider.se built on top of the Statscraper package <https://github.com/jplusplus/statscraper>.
Install
pip install -r requirements.txt
The scraper makes a large number of HTTP requests and uses requests-cache <https://pypi.python.org/pypi/requests-cache> to cache the responses, so repeated queries do not hit the site again.
Example usage
from vantetider import VantetiderScraper
scraper = VantetiderScraper()
scraper.items  # List _implemented_ datasets
# [<VantetiderDataset: VantatKortareAn60Dagar (Väntat kortare än 60 dagar )>, <VantetiderDataset: Overbelaggning (Överbeläggningar)>, <VantetiderDataset: PrimarvardTelefon (Telefontillgänglighet)>, <VantetiderDataset: PrimarvardBesok (Läkarbesök)>, <VantetiderDataset: SpecialiseradBesok (Förstabesök)>, <VantetiderDataset: SpecialiseradOperation (Operation/åtgärd)>]
dataset = scraper.get("Overbelaggning") # Get a specific dataset
# List all available dimensions
print(dataset.dimensions)
print(dataset.regions)  # List available regions
print(dataset.years)  # List available years
# Make a query. You have to explicitly specify every dimension value you want
# to query; for dimensions you leave out, the scraper fetches default values.
res = dataset.fetch({
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    # Currently, dimension values can only be queried by id
    "type_of_overbelaggning": ["0", "1"],  # "Somatik" and "Psykiatri"
})
# Do something with the result
df = res.pandas
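Because dimension values can currently only be queried by id, a small lookup helper can keep queries readable. The sketch below assumes the allowed values are available as (id, label) pairs, which is how the "0"/"1" and "Somatik"/"Psykiatri" comment in the query above reads. The `ALLOWED` list and the `ids_for_labels` helper are hypothetical and not part of the vantetider package.

```python
# Hypothetical (id, label) pairs, mirroring the comment in the query above.
ALLOWED = [("0", "Somatik"), ("1", "Psykiatri")]


def ids_for_labels(labels, allowed):
    """Translate human-readable labels to the ids the query expects."""
    by_label = {label: id_ for id_, label in allowed}
    missing = [label for label in labels if label not in by_label]
    if missing:
        raise ValueError("Unknown labels: {}".format(", ".join(missing)))
    return [by_label[label] for label in labels]


# Build the query dict with labels instead of hard-coded ids.
query = {
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    "type_of_overbelaggning": ids_for_labels(["Somatik", "Psykiatri"], ALLOWED),
}
print(query["type_of_overbelaggning"])
```

The resulting `query` dict has the same shape as the one passed to `dataset.fetch` above. Failing loudly on an unknown label is a deliberate choice, so that a misspelled value does not silently fall back to defaults.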
Practical application, using dataset.py for storage.
from vantetider import VantetiderScraper
from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
import dataset
db = dataset.connect('sqlite:///vantetider.db')
TOPIC = "Overbelaggning"
# Set up local db
table = db.create_table(TOPIC)
scraper = VantetiderScraper()
dataset = scraper.get(TOPIC)
# Get all available regions and years for the query
years = [x.value for x in dataset.years]
regions = [x.value for x in dataset.regions]
# Query in chunks so each result can be stored in the database as it arrives
for region in regions:
    for year in years:
        res = dataset.fetch({
            "year": year,
            "type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
            "period": PERIODS,
            "region": region,
        })
        df = res.pandas
        data = res.list_of_dicts
        table.insert_many(data)
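The chunked pattern above (one query per region and year, stored immediately) can be tried without network access. The following is a minimal sketch using only the standard library's sqlite3 instead of dataset.py. `fetch_chunk` is a hypothetical stand-in for `dataset.fetch(...).list_of_dicts`, and the table name, column names and placeholder values are assumptions made for illustration only.

```python
import itertools
import sqlite3


def fetch_chunk(region, year):
    """Hypothetical stand-in for dataset.fetch(...).list_of_dicts.

    Returns one placeholder row per call; real results would hold scraped values.
    """
    return [{"region": region, "year": year, "value": 0}]


def store_in_chunks(conn, regions, years):
    """Fetch one (region, year) chunk at a time and commit it right away."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS overbelaggning "
        "(region TEXT, year TEXT, value INTEGER)"
    )
    stored = 0
    for region, year in itertools.product(regions, years):
        rows = fetch_chunk(region, year)
        conn.executemany(
            "INSERT INTO overbelaggning VALUES (:region, :year, :value)", rows
        )
        conn.commit()  # a failure in a later chunk does not lose this one
        stored += len(rows)
    return stored


conn = sqlite3.connect(":memory:")
total = store_in_chunks(conn, ["Blekinge", "Skåne"], ["2015", "2016"])
print(total)
```

Committing after every chunk trades some speed for safety: if a long scrape is interrupted, everything fetched so far is already in the database.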
TODO
Implement scraping of “Aterbesok”, “Undersokningar”, “BUPdetalj”, “BUP”.
Enable querying on label names on all dimensions
Add more allowed values to vantetider/allowed_values.py
Make requests-cache optional.
Develop
Run tests:
make tests