
A scraper of statistical data from Vantetider.se built on top of Statscraper.

Project description

This is a scraper for statistical data from http://www.vantetider.se built on top of the Statscraper package <https://github.com/jplusplus/statscraper>.

Install

pip install -r requirements.txt

The scraper has to make a lot of requests, so it uses requests-cache <https://pypi.python.org/pypi/requests-cache> to cache the responses to its queries.
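To illustrate what caching queries means here, the following is a minimal sketch of the idea only. It does not show how requests-cache or this scraper is implemented: `QueryCache` and `fake_fetch` are hypothetical names, and the fetch function is a stand-in that never touches the network.

```python
import json


class QueryCache:
    """Tiny in-memory cache keyed by URL and parameters (illustration only)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}
        self.misses = 0  # number of times the underlying fetch was called

    def get(self, url, params=None):
        params = params or {}
        # Serialize the parameters in a stable order so equal queries share a key.
        key = (url, json.dumps(params, sort_keys=True))
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(url, params)
        return self._store[key]


def fake_fetch(url, params):
    """Hypothetical stand-in for a real HTTP request; no network access."""
    return "response for {} {}".format(url, sorted(params.items()))


cache = QueryCache(fake_fetch)
first = cache.get("http://www.vantetider.se", {"year": "2016"})
second = cache.get("http://www.vantetider.se", {"year": "2016"})  # served from cache
```

Repeating an identical query is answered from the stored copy, which is why a long scraping run can be restarted without re-requesting everything.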

Example usage

from vantetider import VantetiderScraper

scraper = VantetiderScraper()
scraper.items  # List _implemented_ datasets
# [<VantetiderDataset: VantatKortareAn60Dagar (Väntat kortare än 60 dagar )>, <VantetiderDataset: Overbelaggning (Överbeläggningar)>, <VantetiderDataset: PrimarvardTelefon (Telefontillgänglighet)>, <VantetiderDataset: PrimarvardBesok (Läkarbesök)>, <VantetiderDataset: SpecialiseradBesok (Förstabesök)>, <VantetiderDataset: SpecialiseradOperation (Operation/åtgärd)>]

dataset = scraper.get("Overbelaggning")  # Get a specific dataset

# List all available dimensions
print(dataset.dimensions)

print(dataset.regions)  # List available regions
print(dataset.years)  # List available years

# Make a query. You have to explicitly define every dimension value you want
# to query. For dimensions you leave out, the scraper fetches default values.
res = dataset.fetch({
  "region": "Blekinge",
  "year": "2016",
  "period": "Februari",
  # Currently we can only query by the id of a dimension value
  "type_of_overbelaggning": ["0", "1"], # "Somatik" and "Psykiatri"
  })

# Do something with the result
df = res.pandas
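
Since queries currently accept only ids, a small lookup table can make them easier to read. This is a sketch under assumptions: `OVERBELAGGNING_IDS` and `labels_to_ids` are hypothetical helpers, not part of the package, and the id values are copied from the comment in the example above.

```python
# Hypothetical label -> id table, taken from the comment in the example above.
# The actual ids come from vantetider.se and may differ or change.
OVERBELAGGNING_IDS = {"Somatik": "0", "Psykiatri": "1"}


def labels_to_ids(labels, mapping):
    """Translate human-readable labels to the ids used in a query."""
    unknown = [label for label in labels if label not in mapping]
    if unknown:
        raise KeyError("Unknown labels: {}".format(", ".join(unknown)))
    return [mapping[label] for label in labels]


# Build the same query as above, but starting from labels.
query = {
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    "type_of_overbelaggning": labels_to_ids(
        ["Somatik", "Psykiatri"], OVERBELAGGNING_IDS
    ),
}
```

The resulting `query` dict can then be passed to `dataset.fetch(query)`.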

Practical application, using dataset.py for storage.

from vantetider import VantetiderScraper
from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
import dataset

db = dataset.connect('sqlite:///vantetider.db')

TOPIC = "Overbelaggning"

# Set up local db
table = db.create_table(TOPIC)
scraper = VantetiderScraper()

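# Note: this rebinds the name `dataset`, shadowing the imported module.
# That is safe here only because `db` has already been created above.
dataset = scraper.get(TOPIC)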

# Get all available regions and years for query
years = [x.value for x in dataset.years]
regions = [x.value for x in dataset.regions]

# Query in chunks to be able to store to database on the run
for region in regions:
    for year in years:
        res = dataset.fetch({
            "year": year,
            "type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
            "period": PERIODS,
            "region": region,
            })
        df = res.pandas
        data = res.list_of_dicts
        table.insert_many(data)
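
The chunked pattern above can be tried without network access. The sketch below is an illustration under assumptions, not the package's own code: `fake_fetch` is a hypothetical stand-in for `dataset.fetch()`, the table layout is invented, the second region name is only an example value, and the standard-library `sqlite3` module replaces dataset.py.

```python
import itertools
import sqlite3


def fake_fetch(query):
    """Hypothetical stand-in for dataset.fetch(); returns one row per query."""
    return [{"region": query["region"], "year": query["year"], "value": 0}]


def store_in_chunks(conn, regions, years, fetch=fake_fetch):
    """Fetch one (region, year) chunk at a time and commit after each one."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS overbelaggning "
        "(region TEXT, year TEXT, value INTEGER)"
    )
    stored = 0
    for region, year in itertools.product(regions, years):
        rows = fetch({"region": region, "year": year})
        conn.executemany(
            "INSERT INTO overbelaggning VALUES (:region, :year, :value)", rows
        )
        conn.commit()  # chunks stored so far survive a failure in a later chunk
        stored += len(rows)
    return stored


conn = sqlite3.connect(":memory:")
count = store_in_chunks(conn, ["Blekinge", "Gotland"], ["2015", "2016"])
```

Committing after every chunk keeps each transaction small and means an interrupted run loses at most the chunk that was in progress.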

TODO

  • Implement scraping of “Aterbesok”, “Undersokningar”, “BUPdetalj”, “BUP”.

  • Enable querying on label names on all dimensions

  • Add more allowed values to vantetider/allowed_values.py

  • Make requests-cache optional.

Develop

Run tests:

make tests


Download files


Source Distribution

vantetider_scraper-0.2.0.tar.gz (9.6 kB)

Uploaded Source

Built Distribution

vantetider_scraper-0.2.0-py3-none-any.whl (10.8 kB)

Uploaded Python 3

File details

Details for the file vantetider_scraper-0.2.0.tar.gz.

File metadata

  • Download URL: vantetider_scraper-0.2.0.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.44.1 CPython/3.6.9

File hashes

Hashes for vantetider_scraper-0.2.0.tar.gz
Algorithm Hash digest
SHA256 84e303e815e9a58f0da359be447b8667a761b1c23489aac7de70d3579253686a
MD5 dff669616331e6bb7f8eaceee3bd0aed
BLAKE2b-256 e123576f9fc9b758001e27f7c78ce013bb636188c9a291f04165c9f77a64025f
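
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library. `sha256_of` and `matches_published` are hypothetical helper names, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("vantetider_scraper-0.2.0.tar.gz", "84e303e815e9a58f0da359be447b8667a761b1c23489aac7de70d3579253686a")` should return `True` for an intact download saved under that name in the current directory.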


File details

Details for the file vantetider_scraper-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: vantetider_scraper-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.44.1 CPython/3.6.9

File hashes

Hashes for vantetider_scraper-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 db45b01c31335a236e80abb6e4362b4942780ca46f44ad78e617887ea70e539c
MD5 8b4d44c5822c34f038d504a9b0b46f54
BLAKE2b-256 b6ccfbcd047897177fe4759cef6c07d5d3f7188166bca84988bb709472365b41

