Swedish legal data collection tool

juris

A command-line tool for collecting and normalizing Swedish legal documents from official government sources.

Sweden has a wealth of public legal information — laws, government bills, public inquiries, court decisions — scattered across multiple government websites and APIs with inconsistent formats. juris collects documents from these sources, normalizes them into a unified format, and saves them as browsable, version-controlled files (Markdown + JSON). Think of it as a git-native open database for Swedish law.

Features

  • 8 data sources covering Swedish parliament, government, courts, authorities, and EU law
  • 21 document types from bills and motions to court decisions and EU regulations
  • Dual output format — Markdown (human-readable, browsable on GitHub) and JSON (machine-parseable)
  • Incremental collection with state tracking to resume where you left off
  • Async I/O with built-in rate limiting to respect source servers
  • PDF text extraction from document attachments
  • Date and session filtering for targeted collection

Data sources

| Source | Method | Document types |
|---|---|---|
| Riksdagen | JSON API | `prop`, `sou`, `mot`, `bet`, `dir`, `skr`, `sfs` |
| Regeringen.se | Web scraping | `prop`, `sou`, `ds`, `lagr`, `dir`, `skr` |
| Domstolsverket | REST API | `nja`, `ad`, `hfd`, `mod`, `pmod` |
| JO | Web scraping | `jo` |
| JK | Web scraping | `jk` |
| Lagrummet | Web scraping | `foreskrift` |
| EUR-Lex | SPARQL | `eu_reg`, `eu_dir` |
| CURIA / HUDOC | SPARQL / JSON API | `cjeu`, `echr` |

Document types

Swedish Parliament

| Type | Swedish | English |
|---|---|---|
| `prop` | Propositioner | Government bills |
| `mot` | Motioner | Parliamentary motions |
| `bet` | Betänkanden | Committee reports |
| `skr` | Skrivelser | Government communications |

Swedish Government

| Type | Swedish | English |
|---|---|---|
| `sou` | Statens offentliga utredningar | State public inquiries |
| `ds` | Departementsserien | Department series |
| `dir` | Kommittédirektiv | Committee directives |
| `lagr` | Lagrådsremisser | Council on Legislation referrals |
| `sfs` | Svensk författningssamling | Swedish Code of Statutes |

Courts

| Type | Swedish | English |
|---|---|---|
| `nja` | Nytt Juridiskt Arkiv | Supreme Court precedents |
| `ad` | Arbetsdomstolens domar | Labour Court decisions |
| `hfd` | Högsta förvaltningsdomstolens årsbok | Supreme Administrative Court yearbook |
| `mod` | Mark- och miljööverdomstolen | Land and Environment Court of Appeal |
| `pmod` | Patent- och marknadsöverdomstolen | Patent and Market Court of Appeal |

Authorities

| Type | Swedish | English |
|---|---|---|
| `jo` | Justitieombudsmannens beslut | Parliamentary Ombudsman decisions |
| `jk` | Justitiekanslerns beslut | Chancellor of Justice decisions |
| `foreskrift` | Myndighetsföreskrifter | Agency regulations |

EU law

| Type | Swedish | English |
|---|---|---|
| `eu_reg` | EU-förordningar | EU regulations |
| `eu_dir` | EU-direktiv | EU directives |
| `cjeu` | EU-domstolens domar | Court of Justice of the EU |
| `echr` | Europadomstolens domar | European Court of Human Rights |

Installation

pip install -e .

Requires Python 3.11 or later.

Usage

# Collect government bills from the 2024/25 parliamentary session
juris collect riksdagen --type prop --session 2024/25

# Collect SOU reports published since a specific date
juris collect riksdagen --type sou --since 2024-01-01

# Collect from the government website with a limit
juris collect regeringen --type prop --session 2024/25 --limit 5

# Collect Supreme Court decisions
juris collect domstol --type nja --since 2024-01-01

# Collect agency regulations
juris collect lagrummet --type foreskrift --limit 10

# Collect EU regulations
juris collect eur_lex --type eu_reg --since 2024-01-01

# Check collection progress
juris status

# Count collected documents
juris stats

Options

| Option | Description |
|---|---|
| `--type TYPE` | Document type to collect (required) |
| `--session SESSION` | Parliamentary session, e.g. 2024/25 |
| `--since DATE` | Collect documents from this date (YYYY-MM-DD) |
| `--until DATE` | Collect documents until this date (YYYY-MM-DD) |
| `--limit N` | Maximum number of documents to collect |
| `--skip-existing` / `--no-skip-existing` | Skip already collected documents (default: on) |
| `--skip-content` / `--no-skip-content` | Metadata only, skip full text (default: off) |
| `--data-dir PATH` | Output directory (default: data) |
| `-v`, `--verbose` | Enable debug logging |

Output format

Each document is saved in two formats:

Markdown (human-readable, browsable on GitHub):

---
doc_id: "prop-2024/25:208"
doc_type: prop
title: "Ett mer heltäckande straffansvar vid angrepp på företagshemligheter"
date: "2025-09-08"
source: riksdagen
department: Justitiedepartementet
session: "2024/25"
source_url: "https://..."
---

# Ett mer heltäckande straffansvar vid angrepp på företagshemligheter

Proposition 2024/25:208

[full text...]
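
The front matter in the example above can be read back without extra dependencies. The following is a minimal sketch, assuming the header contains only flat `key: value` lines with optional double quotes, as in the example; `parse_front_matter` is a hypothetical helper, not part of juris, and it is not a general YAML parser.

```python
def parse_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a Markdown document into (metadata, body).

    Handles only flat `key: value` lines with optional double quotes.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the document body.
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    # No closing delimiter found: treat the whole input as body.
    return {}, markdown


SAMPLE = """---
doc_id: "prop-2024/25:208"
doc_type: prop
date: "2025-09-08"
source_url: "https://..."
---

# Ett mer heltäckande straffansvar vid angrepp på företagshemligheter

Proposition 2024/25:208
"""

meta, body = parse_front_matter(SAMPLE)
```

For files with nested or multi-line metadata, a real YAML library would be the safer choice.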

JSON (machine-readable, full metadata):

{
  "doc_id": "prop-2024/25:208",
  "doc_type": "prop",
  "title": "Ett mer heltäckande straffansvar...",
  "date": "2025-09-08",
  "text": "...",
  "source": "riksdagen",
  "attachments": [...]
}
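
Because the JSON files are plain records, they can be loaded with the standard library alone. This is a sketch under two assumptions: the files for one type sit somewhere under `<data-dir>/<type>/`, and every record carries an ISO `date` field as in the example. `load_documents` and `published_since` are hypothetical names, not juris APIs.

```python
import json
from pathlib import Path


def load_documents(data_dir: Path, doc_type: str) -> list[dict]:
    """Return all JSON documents of one type, oldest first."""
    docs = []
    for path in sorted((data_dir / doc_type).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            docs.append(json.load(fh))
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    docs.sort(key=lambda d: d.get("date", ""))
    return docs


def published_since(docs: list[dict], since: str) -> list[dict]:
    """Keep documents dated on or after `since` (YYYY-MM-DD)."""
    return [d for d in docs if d.get("date", "") >= since]
```

For example, `published_since(load_documents(Path("data"), "prop"), "2025-01-01")` would return the collected bills dated 2025 or later.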

Documents are organized by type, then by session or year:

data/
├── prop/
│   └── 2024-25/
│       ├── prop-2024-25_208.json
│       └── prop-2024-25_208.md
├── sou/
│   └── 2024/
├── nja/
└── .state/
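
The listing suggests that `/` in a session or document ID becomes `-` and `:` becomes `_` in file names (`prop-2024/25:208` is stored as `prop-2024-25_208`). The sketch below reproduces that mapping; the rule is inferred from this single example, and `safe_name` and `document_paths` are hypothetical helpers rather than juris functions.

```python
from pathlib import Path


def safe_name(value: str) -> str:
    """Replace characters that are awkward in file names (inferred rule)."""
    return value.replace("/", "-").replace(":", "_")


def document_paths(
    data_dir: Path, doc_type: str, session: str, doc_id: str
) -> tuple[Path, Path]:
    """Return the (JSON, Markdown) paths for one document."""
    folder = data_dir / doc_type / safe_name(session)
    stem = safe_name(doc_id)
    return folder / f"{stem}.json", folder / f"{stem}.md"


json_path, md_path = document_paths(
    Path("data"), "prop", "2024/25", "prop-2024/25:208"
)
```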

Project structure

src/juris/
├── cli.py              # Command-line interface (Click)
├── models.py           # Document data models (Pydantic)
├── storage.py          # File storage (JSON + Markdown)
├── state.py            # Incremental collection state
├── pdf.py              # PDF text extraction
├── utils.py            # Shared utilities
└── collectors/
    ├── base.py         # Abstract base collector
    ├── riksdagen.py    # Riksdagen API
    ├── regeringen.py   # Regeringen.se scraper
    ├── domstol.py      # Court decisions API
    ├── jo_jk.py        # JO/JK decisions
    ├── lagrummet.py    # Agency regulations
    ├── eurlex.py       # EUR-Lex SPARQL
    ├── curia.py        # CJEU SPARQL
    └── hudoc.py        # ECtHR API

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Lint
ruff check src/

# Type check
mypy src/

Contributing

See CONTRIBUTING.md for development setup, coding standards, and how to add new collectors.

Please report security vulnerabilities via GitHub's private reporting — see SECURITY.md for details.

This project follows the Contributor Covenant v2.1.

License

MIT
