Scrape Dutch voting advice (StemWijzer) data across live and archived elections

nl-voting-data-scraper

Scrape Dutch voting advice (StemWijzer) data across live and archived elections: municipal, national, European, and provincial.

Outputs structured JSON with party positions, policy statements, and metadata. Reusable across election cycles, including historical timeline analysis.

Key Features

Hybrid scraping: API-first (fast HTTP) with Playwright browser automation fallback
Historical extraction: Works against archived StemWijzer apps when live API endpoints no longer exist
Multiple capture modes: Direct API fetch, runtime JSON.parse capture, page-global extraction, and JS bundle parsing
Election-agnostic: Municipal, national (Tweede Kamer), European Parliament, and provincial elections
Historical timeline ready: Includes archived parliamentary elections back to 2006
CLI + Library: Use from the command line or import in Python
Caching & resume: File-based cache for interrupted batch scrapes
Rate limiting: Token-bucket rate limiter with exponential backoff
Base64/AES decoding: Handles encoded StemWijzer API responses automatically
Structured output: Legacy flat JSON layout or engine-friendly snapshot layout for downstream apps
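
To illustrate the rate-limiting feature, here is a minimal sketch of a token bucket combined with an exponential backoff schedule. It is not the package's implementation: `TokenBucket`, `backoff_delays`, and all their parameters are hypothetical names chosen for this example. The injectable `clock` argument exists only to make the behaviour easy to check.

```python
import time


class TokenBucket:
    """Allow roughly `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Refill according to elapsed time, then take one token if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delays(base: float, factor: float, retries: int, cap: float) -> list[float]:
    """Delays for successive retries: base, base * factor, ... limited to `cap`."""
    return [min(cap, base * factor**attempt) for attempt in range(retries)]
```

A caller would sleep briefly whenever `try_acquire()` returns False, and sleep for the next value from `backoff_delays(...)` after a failed request. A rate of 2.0 would correspond to the CLI's documented default of two requests per second.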

Installation

pip install nl-voting-data-scraper

For browser automation fallback and archived elections:

pip install "nl-voting-data-scraper[browser]"
playwright install chromium

Quick Start

CLI

# List known elections
nl-voting-data-scraper list-elections

# Scrape all municipalities for 2026 municipal elections
nl-voting-data-scraper scrape gr2026 -o ./output

# Scrape a specific municipality
nl-voting-data-scraper scrape gr2026 -m GM0014 -o ./output

# Scrape a live national election
nl-voting-data-scraper scrape tk2025 -o ./output

# Scrape an archived historical election
nl-voting-data-scraper scrape tk2017 --browser-only -o ./output

# Write engine snapshots for downstream apps
nl-voting-data-scraper scrape tk2023 --layout engine -o ./snapshots

# List municipalities for an election
nl-voting-data-scraper list-municipalities gr2026

# Discover endpoints and browser capture details
nl-voting-data-scraper discover tk2021

Python Library

import asyncio

from nl_voting_data_scraper import StemwijzerScraper


async def main():
    async with StemwijzerScraper("tk2023") as scraper:
        results = await scraper.scrape()
        for data in results:
            print(
                f"{data.votematch.name}: "
                f"{len(data.parties)} parties, {len(data.statements)} statements"
            )

    async with StemwijzerScraper("gr2026") as scraper:
        data = await scraper.scrape_one("GM0014")
        if data:
            print(f"Municipality: {data.votematch.name}")


asyncio.run(main())

Supported Elections

Slug	Type	Year	Source mode	Notes
`gr2026`	Municipal	2026	Live API + browser fallback	258+ municipalities
`tk2025`	National	2025	Live API + browser fallback	Single national contest
`eu2024`	European	2024	Live API + browser fallback	Single national contest
`tk2023`	National	2023	Live API + browser fallback	Single national contest
`ps2023`	Provincial	2023	Live API + browser fallback	Multi-jurisdiction provincial dataset
`tk2021`	National	2021	Archived browser extraction	Wayback-backed
`tk2017`	National	2017	Archived browser extraction	Wayback-backed
`tk2012`	National	2012	Archived browser extraction	Wayback-backed
`tk2010`	National	2010	Archived browser extraction	Wayback-backed
`tk2006`	National	2006	Archived browser extraction	Wayback-backed

Election slugs not listed above fall back to auto-detected URL patterns when used through the library API.

Historical Extraction

Archived elections no longer always expose the modern JSON index and data endpoints. For those cases, the scraper can draw on several sources, including the frontend itself:

Live API fetch when a data endpoint still exists.
Runtime capture by intercepting JSON.parse(...) before the app boots.
Page-global extraction for older builds that expose config, objectNames, or related globals.
Static bundle parsing for embedded JSON, URL-encoded JSON, or base64-wrapped payloads.
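
The payload encodings named in the last item can be handled with the standard library alone. The sketch below shows one way to try them in sequence; `decode_payload` is a hypothetical helper, not a function exported by this package, and it does not cover the AES-encrypted case.

```python
import base64
import binascii
import json
from urllib.parse import unquote


def decode_payload(text: str):
    """Try plain JSON, then URL-encoded JSON, then base64-wrapped JSON."""
    candidates = [text, unquote(text)]
    try:
        # validate=True rejects characters outside the base64 alphabet
        candidates.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, ValueError):
        pass  # not base64-wrapped text; rely on the other candidates
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("payload is not JSON in any of the tried encodings")
```

Trying the cheapest interpretation first keeps the common case fast, and raising at the end makes an unrecognised payload visible instead of silently returning nothing.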

For archived elections, install the browser extra and Chromium. --browser-only is the safest mode when you already know the election is archive-backed.
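
When driving the CLI from a script, the choice of `--browser-only` can be derived from the election slug. The following sketch assumes the archive-backed slugs are exactly those marked "Archived browser extraction" in the Supported Elections table; `ARCHIVED_SLUGS` and `build_scrape_command` are hypothetical names for this example and are not part of the package.

```python
# Slugs listed as "Archived browser extraction" in the Supported Elections table.
ARCHIVED_SLUGS = {"tk2021", "tk2017", "tk2012", "tk2010", "tk2006"}


def build_scrape_command(slug: str, output_dir: str = "./output") -> list[str]:
    """Build the argument list for `nl-voting-data-scraper scrape`."""
    command = ["nl-voting-data-scraper", "scrape", slug]
    if slug in ARCHIVED_SLUGS:
        # Skip the API attempt for elections known to be archive-backed.
        command.append("--browser-only")
    command += ["-o", output_dir]
    return command
```

The returned list can be passed directly to `subprocess.run(...)`.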

How It Works

graph TD
    A[StemwijzerScraper\nOrchestrator] --> B[API Scraper\nPrimary]
    A --> C[Browser Scraper\nFallback]
    B --> D[HTTP fetch + base64 decode]
    C --> E[Playwright runtime capture\nbrowser state + JS bundles]
    D --> F[Structured JSON\nper election]
    E --> F

API-first (fast): Fetches data from StemWijzer data endpoints via HTTP. Handles base64-encoded responses and optional AES decryption.
Browser fallback: If the API fails or no longer exists, Playwright loads the frontend and captures usable contest payloads from runtime state, page globals, or JS bundles.
Synthetic indexing for single contests: When archived national or EU elections expose the contest but not the legacy index, the scraper generates a stable single-entry index automatically.
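
The API-first, browser-fallback flow can be sketched as a small asynchronous helper. This only illustrates the pattern and is not the package's orchestrator: `fetch_with_fallback`, the `Fetcher` type, and the three stub fetchers are hypothetical stand-ins for the real API and Playwright scrapers.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Fetcher = Callable[[str], Awaitable[Optional[dict]]]


async def fetch_with_fallback(slug: str, primary: Fetcher, fallback: Fetcher) -> Optional[dict]:
    """Return the primary result, or the fallback result if the primary fails or is empty."""
    try:
        payload = await primary(slug)
        if payload:
            return payload
    except Exception:
        pass  # any primary failure is treated as a signal to try the fallback
    return await fallback(slug)


# Stub fetchers standing in for the real scrapers (illustration only).
async def stub_api(slug: str) -> Optional[dict]:
    return {"source": "api", "slug": slug}


async def stub_missing_api(slug: str) -> Optional[dict]:
    raise RuntimeError("endpoint no longer exists")


async def stub_browser(slug: str) -> Optional[dict]:
    return {"source": "browser", "slug": slug}
```

For example, `asyncio.run(fetch_with_fallback("tk2017", stub_missing_api, stub_browser))` returns the browser stub's payload, while passing `stub_api` as the primary returns the API payload without touching the fallback.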

Output Layouts

Legacy layout

The default legacy layout matches the original package behavior:

output/
  index.json
  GM0014.json
  tk2023.json
  combined.json  # optional

Engine layout

Use --layout engine to write a reusable snapshot structure for downstream applications:

output/
  tk2023/
    index.json
    manifest.json
    raw/
      tk2023.json

The engine layout is especially useful when another project wants to ingest scraped elections directly without any post-processing.
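
A downstream project could read an engine snapshot with a few lines of standard-library code. The sketch below relies only on the directory structure shown above; it does not read `index.json` or `manifest.json`, because their fields are not documented here. `load_engine_snapshot` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_engine_snapshot(root: Path, slug: str) -> dict[str, dict]:
    """Map each file stem under <root>/<slug>/raw/ to its parsed JSON content."""
    raw_dir = root / slug / "raw"
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(raw_dir.glob("*.json"))
    }
```

Calling `load_engine_snapshot(Path("./snapshots"), "tk2023")` on the tree above would return a dictionary with a single `tk2023` key.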

Output Format

Each output entry contains structured party, statement, and contest metadata:

{
  "parties": [
    {
      "id": 206919,
      "name": "Party Name",
      "fullName": "Full Party Name",
      "website": "https://...",
      "hasSeats": true,
      "statements": [
        { "id": 206987, "position": "agree", "explanation": "..." }
      ]
    }
  ],
  "statements": [
    {
      "id": 206987,
      "theme": "Housing",
      "title": "The municipality should build more affordable housing.",
      "index": 1
    }
  ],
  "shootoutStatements": [],
  "votematch": {
    "id": 206918,
    "name": "Municipality Name",
    "context": "2026GR",
    "remote_id": "GM0014",
    "langcode": "nl"
  }
}
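
Because each party's `statements` entries refer to the top-level statements by `id`, the two lists can be joined to summarise positions per statement. The sketch below depends only on the fields shown in the example above; `positions_by_statement` is a hypothetical helper. Position values other than "agree" are not documented here, so the code counts whatever strings it finds.

```python
from collections import Counter


def positions_by_statement(entry: dict) -> dict[str, Counter]:
    """Count party positions for each statement title in one output entry."""
    titles = {statement["id"]: statement["title"] for statement in entry["statements"]}
    counts: dict[str, Counter] = {title: Counter() for title in titles.values()}
    for party in entry["parties"]:
        for answer in party["statements"]:
            title = titles.get(answer["id"])
            if title is not None:  # ignore answers to statements not in the list
                counts[title][answer["position"]] += 1
    return counts
```

Pass it the dictionary obtained from `json.load(...)` on a single output file such as `GM0014.json`.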

CLI Options

nl-voting-data-scraper scrape ELECTION [OPTIONS]

Options:
  -m, --municipality TEXT   Specific GM codes (repeatable)
  -l, --language TEXT       Languages to scrape (default: nl)
  -o, --output TEXT         Output directory (default: ./output)
  --layout [legacy|engine]  Output layout (default: legacy)
  --combined                Also write combined.json
  --rate-limit FLOAT        Requests per second (default: 2.0)
  --no-cache                Disable caching
  --resume                  Resume interrupted scrape from cache
  --browser-only            Only use browser scraping
  --api-only                Only use API scraping
  -v, --verbose             Verbose output

Development

git clone https://github.com/rhnfzl/nl-voting-data-scraper.git
cd nl-voting-data-scraper
pip install -e ".[dev,browser]"
playwright install chromium
ruff check src/ tests/
ruff format --check src/ tests/
mypy src/
pytest -v
python -m build

Release

The repository publishes to PyPI from GitHub Releases:

Update the package version in pyproject.toml and src/nl_voting_data_scraper/__init__.py.
Push to main.
Create a GitHub Release such as v0.3.0.
The publish.yml workflow builds the package and uploads it to PyPI.

Acknowledgements

Inspired by afvanwoudenberg/stemwijzer.

License

This project is licensed under the MIT License. See LICENSE for details.

Release history

0.3.0 (this version): Mar 18, 2026
0.2.0: Mar 14, 2026
0.1.1: Mar 14, 2026
0.1.0: Mar 14, 2026

Download files

Source Distribution

nl_voting_data_scraper-0.3.0.tar.gz (32.9 kB)

Uploaded Mar 18, 2026 Source

Built Distribution

nl_voting_data_scraper-0.3.0-py3-none-any.whl (30.3 kB)

Uploaded Mar 18, 2026 Python 3

File details

Details for the file nl_voting_data_scraper-0.3.0.tar.gz.

File metadata

Download URL: nl_voting_data_scraper-0.3.0.tar.gz
Upload date: Mar 18, 2026
Size: 32.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nl_voting_data_scraper-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`64adb81cfd052412abbf966d34ea41a73a1aa0e293580e740c92d54048caded7`
MD5	`741b902e9ccf66cb2ea4bd02602edc51`
BLAKE2b-256	`f68715164b409e21e71eb8f4caf3b767cb96829eefc6f05ca754841a5a0d6f06`

Provenance

The following attestation bundles were made for nl_voting_data_scraper-0.3.0.tar.gz:

Publisher: publish.yml on rhnfzl/nl-voting-data-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nl_voting_data_scraper-0.3.0.tar.gz
- Subject digest: 64adb81cfd052412abbf966d34ea41a73a1aa0e293580e740c92d54048caded7
- Sigstore transparency entry: 1123171946
- Sigstore integration time: Mar 18, 2026
Source repository:
- Permalink: rhnfzl/nl-voting-data-scraper@c554a2c64342ea5cd90cfc522454b3ec81bb856b
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/rhnfzl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c554a2c64342ea5cd90cfc522454b3ec81bb856b
- Trigger Event: release

File details

Details for the file nl_voting_data_scraper-0.3.0-py3-none-any.whl.

File metadata

Download URL: nl_voting_data_scraper-0.3.0-py3-none-any.whl
Upload date: Mar 18, 2026
Size: 30.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nl_voting_data_scraper-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8a473c2c1863c87f3623301d58e5be55245b5fad79607953ed235b583c303783`
MD5	`fe12130c7db861716ecf3e20c8b11598`
BLAKE2b-256	`08a0191dd49c30d75d6d0ed6166c15bec1597179dd09b7b61484dbc16eebae92`

Provenance

The following attestation bundles were made for nl_voting_data_scraper-0.3.0-py3-none-any.whl:

Publisher: publish.yml on rhnfzl/nl-voting-data-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nl_voting_data_scraper-0.3.0-py3-none-any.whl
- Subject digest: 8a473c2c1863c87f3623301d58e5be55245b5fad79607953ed235b583c303783
- Sigstore transparency entry: 1123171967
- Sigstore integration time: Mar 18, 2026
Source repository:
- Permalink: rhnfzl/nl-voting-data-scraper@c554a2c64342ea5cd90cfc522454b3ec81bb856b
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/rhnfzl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c554a2c64342ea5cd90cfc522454b3ec81bb856b
- Trigger Event: release
