Scrape Dutch voting advice (StemWijzer) data for any election

nl-voting-data-scraper

Scrape Dutch voting advice (StemWijzer) data for any election: municipal, national, European, or provincial.

Outputs structured JSON with party positions, policy statements, and metadata. Reusable across election cycles.

Key Features

  • Hybrid scraping: API-first (fast HTTP) with Playwright browser automation fallback
  • Election-agnostic: Municipal, national (Tweede Kamer), European Parliament, and provincial elections
  • CLI + Library: Use from the command line or import in Python
  • Caching & resume: File-based cache so interrupted batch scrapes (258+ municipalities) can pick up where they stopped
  • Rate limiting: Token-bucket rate limiter with exponential backoff
  • Base64 decoding / AES decryption: Handles base64-encoded and optionally AES-encrypted StemWijzer API responses automatically
  • Structured output: JSON format compatible with downstream vote guide applications
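To illustrate the rate-limiting bullet above, here is a minimal sketch of a token bucket paired with exponential backoff. It is not the package's own implementation: the `TokenBucket` class, `backoff_delays` helper, and their parameters are hypothetical names chosen for this example. The rate of 2.0 mirrors the documented `--rate-limit` default.

```python
import random
import time


class TokenBucket:
    """Hypothetical limiter: about `rate` acquisitions per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self):
        """Block until one token is available; return the seconds spent waiting."""
        waited = 0.0
        self._refill()
        while self.tokens < 1:
            delay = (1 - self.tokens) / self.rate
            self._sleep(delay)
            waited += delay
            self._refill()
        self.tokens -= 1
        return waited


def backoff_delays(retries, base=0.5, cap=30.0, jitter=0.0):
    """Yield retry delays that double each attempt (base, 2*base, ...), capped at `cap`."""
    for attempt in range(retries):
        delay = min(cap, base * 2 ** attempt)
        yield delay + random.uniform(0, jitter)


bucket = TokenBucket(rate=2.0, capacity=2)  # 2 requests/second, burst of 2
bucket.acquire()                            # returns immediately while tokens remain
```

The bucket smooths steady-state traffic, while the backoff delays apply only after a failed request. A small `jitter` value helps avoid many clients retrying in lockstep.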

Installation

pip install nl-voting-data-scraper

For browser automation fallback (optional):

pip install "nl-voting-data-scraper[browser]"
playwright install chromium

Quick Start

CLI

# List known elections
nl-voting-data-scraper list-elections

# Scrape all municipalities for 2026 municipal elections
nl-voting-data-scraper scrape gr2026 -o ./output

# Scrape a specific municipality
nl-voting-data-scraper scrape gr2026 -m GM0014 -o ./output

# Scrape national election
nl-voting-data-scraper scrape tk2025 -o ./output

# List municipalities for an election
nl-voting-data-scraper list-municipalities gr2026

# Discover API endpoints
nl-voting-data-scraper discover gr2026

Python Library

import asyncio
from nl_voting_data_scraper import StemwijzerScraper

async def main():
    async with StemwijzerScraper("gr2026") as scraper:
        # Scrape a single municipality
        data = await scraper.scrape_one("GM0014")
        print(f"{data.votematch.name}: {len(data.parties)} parties, {len(data.statements)} statements")

        # Scrape all
        results = await scraper.scrape()
        print(f"Scraped {len(results)} entries")

asyncio.run(main())

Supported Elections

Slug     Type        Year  Description
gr2026   Municipal   2026  Gemeenteraadsverkiezingen 2026
tk2025   National    2025  Tweede Kamerverkiezingen 2025
tk2023   National    2023  Tweede Kamerverkiezingen 2023
eu2024   European    2024  Europees Parlement 2024
ps2023   Provincial  2023  Provinciale Staten 2023

New elections are auto-detected from URL patterns. You can also pass custom election slugs.
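The slugs in the table follow a visible pattern: a two-letter prefix for the election type, then a four-digit year. The sketch below shows how such a slug could be split into its parts. The `parse_slug` helper and `ElectionInfo` type are hypothetical and not part of the package. The prefix mapping is inferred from the table only, and how the package itself detects new elections is not documented here.

```python
import re
from typing import NamedTuple

# Prefix-to-type mapping inferred from the table of supported elections (an assumption).
ELECTION_TYPES = {
    "gr": "Municipal",
    "tk": "National",
    "eu": "European",
    "ps": "Provincial",
}


class ElectionInfo(NamedTuple):
    slug: str
    type: str
    year: int


def parse_slug(slug: str) -> ElectionInfo:
    """Split a slug such as 'gr2026' into election type and year."""
    match = re.fullmatch(r"([a-z]+)(\d{4})", slug.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised election slug: {slug!r}")
    prefix, year = match.group(1), int(match.group(2))
    # Unknown prefixes are kept rather than rejected, so custom slugs still parse.
    return ElectionInfo(match.group(0), ELECTION_TYPES.get(prefix, "Unknown"), year)


info = parse_slug("gr2026")
print(info.type, info.year)
```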

How It Works

graph TD
    A[StemwijzerScraper\nOrchestrator] --> B[API Scraper\nPrimary]
    A --> C[Browser Scraper\nFallback]
    B --> D[HTTP fetch + base64 decode]
    C --> E[Playwright network intercept\nor DOM scraping]
    D --> F[Structured JSON\nper election]
    E --> F

  1. API-first (fast): Fetches data from StemWijzer data endpoints via HTTP. Handles base64-encoded responses and optional AES decryption.
  2. Browser fallback: If the API fails, uses Playwright to load the frontend, intercept network requests, and capture the data. Falls back to DOM extraction as a last resort.
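The two steps above can be sketched as a decode helper plus an ordered list of strategies. This is only an illustration under assumptions: `decode_payload` and `fetch_with_fallback` are hypothetical names, the fetchers are stand-in callables rather than real HTTP or Playwright calls, and AES decryption is left out because the key and cipher mode are not described here.

```python
import base64
import binascii
import json


def decode_payload(raw: bytes) -> dict:
    """Return the JSON object in `raw`, whether it is plain JSON or base64-encoded JSON."""
    try:
        data = json.loads(raw)
        if isinstance(data, dict):
            return data
    except (json.JSONDecodeError, UnicodeDecodeError):
        pass
    try:
        decoded = base64.b64decode(raw.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is neither JSON nor base64") from exc
    return json.loads(decoded)


def fetch_with_fallback(strategies):
    """Try each (name, callable) in order; return the first result that succeeds."""
    errors = []
    for name, fetch in strategies:
        try:
            return name, fetch()
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


def failing_api():
    # Stand-in for an API request that fails.
    raise ConnectionError("endpoint unavailable")


def fake_browser():
    # Stand-in for a browser capture that returns a base64-encoded payload.
    return decode_payload(base64.b64encode(b'{"parties": [], "statements": []}'))


used, data = fetch_with_fallback([("api", failing_api), ("browser", fake_browser)])
print(used, sorted(data))
```

Ordering the strategies in a list keeps the cheap HTTP path first and makes it straightforward to restrict a run to one strategy, much like the `--api-only` and `--browser-only` options do.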

Output Format

Each municipality/election produces a JSON file:

{
  "parties": [
    {
      "id": 206919,
      "name": "Party Name",
      "fullName": "Full Party Name",
      "website": "https://...",
      "hasSeats": true,
      "statements": [
        { "id": 206987, "position": "agree", "explanation": "..." }
      ]
    }
  ],
  "statements": [
    {
      "id": 206987,
      "theme": "Housing",
      "title": "The municipality should build more affordable housing.",
      "index": 1
    }
  ],
  "shootoutStatements": [...],
  "votematch": {
    "id": 206918,
    "name": "Municipality Name",
    "context": "2026GR",
    "remote_id": "GM0014",
    "langcode": "nl"
  }
}
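As a usage sketch for this format, the snippet below joins each party's positions to the statement titles through the shared `id` field. The sample mirrors the structure shown above, with `shootoutStatements` set to an empty list because its contents are elided in the example. The `positions_by_statement` helper is a hypothetical name for illustration.

```python
import json

# Trimmed sample in the documented output shape; values come from the example above.
sample = json.loads("""
{
  "parties": [
    {
      "id": 206919,
      "name": "Party Name",
      "statements": [
        {"id": 206987, "position": "agree", "explanation": "..."}
      ]
    }
  ],
  "statements": [
    {
      "id": 206987,
      "theme": "Housing",
      "title": "The municipality should build more affordable housing.",
      "index": 1
    }
  ],
  "shootoutStatements": []
}
""")


def positions_by_statement(data: dict) -> dict:
    """Map each statement title to {party name: position}."""
    titles = {s["id"]: s["title"] for s in data["statements"]}
    table = {}
    for party in data["parties"]:
        for stance in party["statements"]:
            title = titles.get(stance["id"])
            if title is None:
                continue  # skip positions on statements missing from the list
            table.setdefault(title, {})[party["name"]] = stance["position"]
    return table


for title, stances in positions_by_statement(sample).items():
    print(title, stances)
```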

CLI Options

nl-voting-data-scraper scrape ELECTION [OPTIONS]

Options:
  -m, --municipality TEXT   Specific GM codes (repeatable)
  -l, --language TEXT       Languages to scrape (default: nl)
  -o, --output TEXT         Output directory (default: ./output)
  --combined                Also write combined.json
  --rate-limit FLOAT        Requests per second (default: 2.0)
  --no-cache                Disable caching
  --resume                  Resume interrupted scrape
  --browser-only            Only use browser scraping
  --api-only                Only use API scraping
  -v, --verbose             Verbose output
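To show the idea behind `--resume` and `--no-cache`, here is a minimal sketch of a file-based cache that skips municipalities already written to disk. The package's actual cache layout is not documented here, so the one-file-per-code scheme, the `scrape_with_resume` helper, and the `fake_fetch` stand-in are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def scrape_with_resume(codes, fetch, cache_dir, resume=True):
    """Fetch each code once, reusing cached JSON files when `resume` is true."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for code in codes:
        path = cache_dir / f"{code}.json"
        if resume and path.exists():
            results[code] = json.loads(path.read_text(encoding="utf-8"))
            continue
        data = fetch(code)
        # Write to a temporary file first so an interruption never leaves a partial cache entry.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        tmp.replace(path)
        results[code] = data
    return results


with tempfile.TemporaryDirectory() as tmp_dir:
    fetched = []

    def fake_fetch(code):
        # Stand-in for a real scrape; records which codes were actually fetched.
        fetched.append(code)
        return {"votematch": {"remote_id": code}}

    scrape_with_resume(["GM0014", "GM0034"], fake_fetch, tmp_dir)
    second = scrape_with_resume(["GM0014", "GM0034", "GM0037"], fake_fetch, tmp_dir)

print(fetched)  # only the third code is fetched on the second run
```

Passing `resume=False` in this sketch refetches everything, which is roughly the behaviour one would expect from disabling the cache.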

Development

git clone https://github.com/rhnfzl/nl-voting-data-scraper.git
cd nl-voting-data-scraper
pip install -e ".[dev,browser]"
playwright install chromium
pytest

Acknowledgements

Inspired by afvanwoudenberg/stemwijzer.

License

This project is licensed under the MIT License - see the LICENSE file for details.
