A base class for building web scrapers for statistical data.

Project description

Statscraper is a base library for building web scrapers for statistical data, with a helper ontology for (primarily Swedish) statistical data. A set of ready-to-use scrapers is included.

For users

You can use Statscraper as a foundation for your next scraper, or try out any of the included scrapers. With Statscraper comes a unified interface for scraping, and some useful helper methods for scraper authors.

Full documentation: ReadTheDocs

For updates and discussion: Facebook

By Journalism++ Stockholm, and Robin Linderborg.


pip install statscraper

Using a scraper

Scrapers act like “cursors” that move around a hierarchy of datasets and collections of datasets. Collections and datasets are referred to as “items”.

      ┏━ Collection ━━━ Collection ━┳━ Dataset
ROOT ━╋━ Collection ━┳━ Dataset     ┣━ Dataset
      ┗━ Collection  ┣━ Dataset     ┗━ Dataset
                     ┗━ Dataset
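
To make the hierarchy concrete, here is a minimal sketch in plain Python. It does not use Statscraper: the nested dict, the item names and the `dataset_paths` helper are all hypothetical, chosen only to show how collections nest and how datasets sit at the leaves.

```python
# Hypothetical item tree, not Statscraper's own data model:
# a dict stands for a collection, None marks a dataset.
tree = {
    "Collection A": {"Collection A1": {"Dataset 1": None, "Dataset 2": None}},
    "Collection B": {"Dataset 3": None},
    "Collection C": {},  # a collection may be empty
}


def dataset_paths(node, path=()):
    """Yield the path from the root to every dataset below `node`."""
    for name, child in node.items():
        if child is None:
            yield path + (name,)
        else:
            yield from dataset_paths(child, path + (name,))


for item_path in dataset_paths(tree):
    print(" / ".join(item_path))
```

A cursor-style scraper visits the same structure one level at a time instead of walking it all at once.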


Here’s a simple example, with a scraper that returns only a single dataset: the number of cranes spotted at Hornborgarsjön each day, as scraped from Länsstyrelsen i Västra Götalands län.

>>> from statscraper.scrapers import Cranes

>>> scraper = Cranes()
>>> scraper.items  # List available datasets
[<Dataset: Number of cranes>]

>>> dataset = scraper["Number of cranes"]
>>> dataset.dimensions
[<Dimension: date (Day of the month)>, <Dimension: month>, <Dimension: year>]

>>> row = dataset.data[0]  # first row in this dataset
>>> row
<Result: 7 (value)>
>>> row.dict
{'value': '7', u'date': u'7', u'month': u'march', u'year': u'2015'}

>>> df = dataset.data.pandas  # get this dataset as a Pandas dataframe

Building a scraper

Scrapers are built by extending a base scraper, or a derivative of that. You need to provide a method for listing datasets or collections of datasets, and a method for fetching data.

Statscraper is built for statistical data, meaning that it’s most useful when the data you are scraping/fetching can be organized with a numerical value in each row:

city      year  value
Voi       2009  45483
Kabarnet  2006  10191
Taveta    2009  67505
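
As a rough illustration of this "one numerical value per row" shape, the table can be written as plain Python dictionaries. This is a sketch only: the `rows` list and both helper functions are hypothetical and not part of Statscraper.

```python
# Hypothetical in-memory rows mirroring the table above: one numeric
# value per row, plus the dimensions (city, year) that describe it.
rows = [
    {"city": "Voi", "year": 2009, "value": 45483},
    {"city": "Kabarnet", "year": 2006, "value": 10191},
    {"city": "Taveta", "year": 2009, "value": 67505},
]


def dimensions_of(row):
    """Return every key except the measured value, sorted by name."""
    return sorted(key for key in row if key != "value")


def total_for_year(rows, year):
    """Sum the values of all rows whose year dimension matches."""
    return sum(row["value"] for row in rows if row["year"] == year)


print(dimensions_of(rows[0]))      # ['city', 'year']
print(total_for_year(rows, 2009))  # 112988
```

Data that fits this shape can be filtered and aggregated by dimension without per-dataset special cases.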

A scraper can override these methods:

  • _fetch_itemslist(item) to yield collections or datasets at the current cursor position
  • _fetch_data(dataset) to yield rows from the currently selected dataset
  • _fetch_dimensions(dataset) to yield dimensions available for the currently selected dataset
  • _fetch_allowed_values(dimension) to yield allowed values for a dimension
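
The generator pattern behind these four methods can be sketched without the library. In the example below, `SketchBase` is a hypothetical stand-in, not Statscraper's real `BaseScraper`, and items, dimensions and rows are plain strings and dicts rather than the library's own classes. Only the method names and their arguments are taken from the list above.

```python
class SketchBase:
    """Hypothetical stand-in for a base scraper (not the real BaseScraper)."""

    def _fetch_itemslist(self, item):
        raise NotImplementedError("subclasses must list their items")

    def _fetch_data(self, dataset):
        raise NotImplementedError("subclasses must yield rows")

    def _fetch_dimensions(self, dataset):
        return iter(())  # assumed optional: no dimensions by default

    def _fetch_allowed_values(self, dimension):
        return iter(())  # assumed optional: no restrictions by default


class CityFigures(SketchBase):
    """Toy scraper serving the city/year/value table from memory."""

    ROWS = [("Voi", 2009, 45483), ("Kabarnet", 2006, 10191), ("Taveta", 2009, 67505)]

    def _fetch_itemslist(self, item):
        yield "City figures"  # a single dataset at the root

    def _fetch_dimensions(self, dataset):
        yield "city"
        yield "year"

    def _fetch_allowed_values(self, dimension):
        index = {"city": 0, "year": 1}[dimension]
        for value in sorted({row[index] for row in self.ROWS}):
            yield value

    def _fetch_data(self, dataset):
        for city, year, value in self.ROWS:
            yield {"city": city, "year": year, "value": value}


scraper = CityFigures()
datasets = list(scraper._fetch_itemslist(None))
rows = list(scraper._fetch_data(datasets[0]))
print(datasets)
print(list(scraper._fetch_allowed_values("year")))
print(rows[0])
```

Because every method yields rather than returns a list, a scraper can produce results lazily, for example one page of a remote source at a time.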

A number of hooks are available for more advanced scrapers. These are used by adding the on decorator to a method:

@BaseScraper.on("up")
def my_method(self):
  # Do something when the user moves up one level
  pass
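
For readers curious how a decorator-based hook could work, here is one possible mechanism in plain Python. It is not Statscraper's actual implementation: `HookedBase`, the `_hook_name` attribute, `_call_hooks` and the hook name "up" are all assumptions made for illustration.

```python
class HookedBase:
    """Hypothetical stand-in; not Statscraper's actual implementation."""

    @staticmethod
    def on(hook):
        """Mark the decorated method as a handler for the named hook."""
        def decorator(func):
            func._hook_name = hook
            return func
        return decorator

    def _call_hooks(self, hook):
        """Call every method of this class that was marked for `hook`."""
        for name in dir(type(self)):
            method = getattr(type(self), name)
            if getattr(method, "_hook_name", None) == hook:
                method(self)

    def move_up(self):
        # A real cursor would also change position; here only hooks fire.
        self._call_hooks("up")


class MyScraper(HookedBase):
    def __init__(self):
        self.events = []

    @HookedBase.on("up")
    def my_method(self):
        self.events.append("moved up")


scraper = MyScraper()
scraper.move_up()
print(scraper.events)  # ['moved up']
```

Marking methods with an attribute keeps registration declarative: a subclass adds a handler by decorating a method, without overriding the cursor logic itself.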

For developers

These instructions are for developers working on the BaseScraper. See above for instructions for developing a scraper using the BaseScraper.


git clone
python setup.py install


python setup.py test

Run python setup.py test from the root directory. This installs everything needed for testing, then runs the tests with nosetests.


  • 1.0.7
    • Remove logic from SCBScraper that is already handled by BaseScraper
  • 1.0.6
    • Added dialect:skatteverket (two/four digit county/municipality codes)
    • Added data type for road category
    • Make SCB scraper treat a “Region” as, well, a region
  • 1.0.5 - Added station key to SMHI scraper
  • 1.0.4 - Added SMHI scraper
  • 1.0.3 - Re-add demo scrapers that accidentally got left out in the first release
  • 1.0.0 - First release
  • 1.0.0.dev2
    • Implement translation
    • Add Dataset.fetch_next() as generator for results
  • 1.0.0.dev1
    • Semantic versioning starts here
    • Implement datatypes and dialects
  • 0.0.2
    • Added some demo scrapers
    • The cursor is now moved when accessing datasets
    • Renamed methods for moving cursor: move_up(), move_to()
    • Added tests
    • Added datatypes subtree
  • 0.0.1 - First version
