A scraper of statistical data from Vantetider.se built on top of Statscraper.
Project description
This is a scraper for statistical data from http://www.vantetider.se built on top of the Statscraper package <https://github.com/jplusplus/statscraper>.
Install
pip install -r requirements.txt
The scraper makes a large number of requests and uses requests-cache <https://pypi.python.org/pypi/requests-cache> to cache the responses.
Example usage
from vantetider import VantetiderScraper
scraper = VantetiderScraper()
scraper.items # List _implemented_ datasets
# [<VantetiderDataset: VantatKortareAn60Dagar (Väntat kortare än 60 dagar )>, <VantetiderDataset: Overbelaggning (Överbeläggningar)>, <VantetiderDataset: PrimarvardTelefon (Telefontillgänglighet)>, <VantetiderDataset: PrimarvardBesok (Läkarbesök)>, <VantetiderDataset: SpecialiseradBesok (Förstabesök)>, <VantetiderDataset: SpecialiseradOperation (Operation/åtgärd)>]
dataset = scraper.get("Overbelaggning") # Get a specific dataset
# List all available dimensions
print(dataset.dimensions)
print(dataset.regions) # List available regions
print(dataset.years) # List available years
# Make a query. You have to explicitly define every dimension value you want
# to query; for dimensions you leave out, the scraper fetches the default values.
res = dataset.fetch({
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    # Currently we can only query by the id of a dimension value
    "type_of_overbelaggning": ["0", "1"], # "Somatik" and "Psykiatri"
})
# Do something with the result
df = res.pandas
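Because values can currently only be queried by id, a small label-to-id lookup can keep queries readable. The following is a minimal sketch and not part of the package: it assumes the allowed values are available as (id, label) pairs, and both the pairs shown and the helper name `ids_for_labels` are hypothetical, modelled on the two labels in the comment of the query above.

```python
# Hypothetical (id, label) pairs, mirroring the comment in the query above.
# The real values live in vantetider/allowed_values.py and may differ.
TYPE_OF_OVERBELAGGNING = [("0", "Somatik"), ("1", "Psykiatri")]


def ids_for_labels(pairs, labels):
    """Translate human-readable labels into the ids the query expects."""
    label_to_id = {label: id_ for id_, label in pairs}
    missing = [label for label in labels if label not in label_to_id]
    if missing:
        raise KeyError("Unknown label(s): " + ", ".join(missing))
    return [label_to_id[label] for label in labels]


# Build the query dict with labels instead of hard-coded ids
query = {
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    "type_of_overbelaggning": ids_for_labels(
        TYPE_OF_OVERBELAGGNING, ["Somatik", "Psykiatri"]
    ),
}
```

The resulting `query` dict could then be passed to `dataset.fetch(query)` in place of the literal one.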
Practical application, using the dataset library for storage.
from vantetider import VantetiderScraper
from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
import dataset
db = dataset.connect('sqlite:///vantetider.db')
TOPIC = "Overbelaggning"
# Set up local db
table = db.create_table(TOPIC)
scraper = VantetiderScraper()
dataset = scraper.get(TOPIC)
# Get all available regions and years for query
years = [x.value for x in dataset.years]
regions = [x.value for x in dataset.regions]
# Query in chunks to be able to store to database on the run
for region in regions:
    for year in years:
        res = dataset.fetch({
            "year": year,
            "type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
            "period": PERIODS,
            "region": region,
        })
        df = res.pandas
        data = res.list_of_dicts
        table.insert_many(data)
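The chunked loop above can also be written with `itertools.product`, which makes it easier to record chunks that fail and retry them later. This is only a sketch under stated assumptions: `store_in_chunks`, `fake_fetch` and `stored` are hypothetical stand-ins, so it runs without network access or a database, and the error handling is an addition rather than something the scraper does itself.

```python
from itertools import product


def store_in_chunks(fetch, insert_many, regions, years):
    """Fetch one (region, year) chunk at a time and store it immediately.

    Returns the chunks that failed so they can be retried later.
    """
    failed = []
    for region, year in product(regions, years):
        try:
            rows = fetch({"region": region, "year": year})
        except Exception:
            # Remember the failed chunk and carry on with the next one
            failed.append((region, year))
            continue
        insert_many(rows)
    return failed


# Stand-in for dataset.fetch(...).list_of_dicts (hypothetical)
def fake_fetch(query):
    if query["year"] == "2015":
        raise RuntimeError("simulated network error")
    return [dict(query, value=1)]


stored = []  # stand-in for table.insert_many (hypothetical)
failed = store_in_chunks(
    fake_fetch, stored.extend, ["Blekinge", "Skåne"], ["2015", "2016"]
)
```

With the real scraper, `fetch` could be a small wrapper returning `dataset.fetch(query).list_of_dicts`, and `insert_many` could be `table.insert_many`.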
TODO
Implement scraping of “Aterbesok”, “Undersokningar”, “BUPdetalj”, “BUP”.
Enable querying by label name on all dimensions.
Add more allowed values to vantetider/allowed_values.py.
Make requests-cache optional.
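One possible approach to the last item is a guarded import, so that the scraper still works when requests-cache is not installed. The sketch below is an assumption about how this could look, not how the package currently wires up its cache; the function name `enable_cache` and the cache name are hypothetical.

```python
def enable_cache(name="vantetider_cache"):
    """Turn on HTTP caching if requests-cache is available.

    Returns True when caching was enabled, False when the
    package is missing and requests will go out uncached.
    """
    try:
        import requests_cache
    except ImportError:
        return False
    # The in-memory backend is used here to avoid writing a cache file;
    # a persistent backend would be the more likely choice in practice.
    requests_cache.install_cache(name, backend="memory")
    return True


caching_enabled = enable_cache()
```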
Develop
Run tests:
make tests