Swedish legal data collection tool

juris

A command-line tool for collecting and normalizing Swedish legal documents from official government sources.

Sweden has a wealth of public legal information — laws, government bills, public inquiries, court decisions — scattered across multiple government websites and APIs with inconsistent formats. juris collects documents from these sources, normalizes them into a unified format, and saves them as browsable, version-controlled files (Markdown + JSON). Think of it as a git-native open database for Swedish law.

Features

  • 8 data sources covering Swedish parliament, government, courts, authorities, and EU law
  • 21 document types from bills and motions to court decisions and EU regulations
  • Dual output format — Markdown (human-readable, browsable on GitHub) and JSON (machine-parseable)
  • Incremental collection with state tracking to resume where you left off
  • Async I/O with built-in rate limiting to respect source servers
  • PDF text extraction from document attachments
  • Date and session filtering for targeted collection

Data sources

| Source | Method | Document types |
| --- | --- | --- |
| Riksdagen | JSON API | prop, sou, mot, bet, dir, skr, sfs |
| Regeringen.se | Web scraping | prop, sou, ds, lagr, dir, skr |
| Domstolsverket | REST API | nja, ad, hfd, mod, pmod |
| JO | Web scraping | jo |
| JK | Web scraping | jk |
| Lagrummet | Web scraping | foreskrift |
| EUR-Lex | SPARQL | eu_reg, eu_dir |
| CURIA / HUDOC | SPARQL / JSON API | cjeu, echr |

Document types

Swedish Parliament

| Type | Swedish | English |
| --- | --- | --- |
| prop | Propositioner | Government bills |
| mot | Motioner | Parliamentary motions |
| bet | Betänkanden | Committee reports |
| skr | Skrivelser | Government communications |

Swedish Government

| Type | Swedish | English |
| --- | --- | --- |
| sou | Statens offentliga utredningar | Swedish Government Official Reports |
| ds | Departementsserien | Ministry Publications Series |
| dir | Kommittédirektiv | Committee directives |
| lagr | Lagrådsremisser | Council on Legislation referrals |
| sfs | Svensk författningssamling | Swedish Code of Statutes |

Courts

| Type | Swedish | English |
| --- | --- | --- |
| nja | Nytt Juridiskt Arkiv | Supreme Court precedents |
| ad | Arbetsdomstolens domar | Labour Court decisions |
| hfd | Högsta förvaltningsdomstolens årsbok | Supreme Administrative Court yearbook |
| mod | Mark- och miljööverdomstolen | Land and Environment Court of Appeal |
| pmod | Patent- och marknadsöverdomstolen | Patent and Market Court of Appeal |

Authorities

| Type | Swedish | English |
| --- | --- | --- |
| jo | Justitieombudsmannens beslut | Parliamentary Ombudsman decisions |
| jk | Justitiekanslerns beslut | Chancellor of Justice decisions |
| foreskrift | Myndighetsföreskrifter | Agency regulations |

EU law

| Type | Swedish | English |
| --- | --- | --- |
| eu_reg | EU-förordningar | EU regulations |
| eu_dir | EU-direktiv | EU directives |
| cjeu | EU-domstolens domar | Court of Justice of the EU |
| echr | Europadomstolens domar | European Court of Human Rights |

Installation

pip install -e .

Requires Python 3.11 or later.

Usage

# Collect government bills from the 2024/25 parliamentary session
juris collect riksdagen --type prop --session 2024/25

# Collect SOU reports published since a specific date
juris collect riksdagen --type sou --since 2024-01-01

# Collect from the government website with a limit
juris collect regeringen --type prop --session 2024/25 --limit 5

# Collect Supreme Court decisions
juris collect domstol --type nja --since 2024-01-01

# Collect agency regulations
juris collect lagrummet --type foreskrift --limit 10

# Collect EU regulations
juris collect eur_lex --type eu_reg --since 2024-01-01

# Check collection progress
juris status

# Count collected documents
juris stats

Options

| Option | Description |
| --- | --- |
| --type TYPE | Document type to collect (required) |
| --session SESSION | Parliamentary session, e.g. 2024/25 |
| --since DATE | Collect documents from this date (YYYY-MM-DD) |
| --until DATE | Collect documents until this date (YYYY-MM-DD) |
| --limit N | Maximum number of documents to collect |
| --skip-existing / --no-skip-existing | Skip already collected documents (default: on) |
| --skip-content / --no-skip-content | Metadata only, skip full text (default: off) |
| --data-dir PATH | Output directory (default: data) |
| -v, --verbose | Enable debug logging |
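
How `--since`, `--until`, and `--limit` might combine can be sketched as a plain filter. This is an illustration only: `select_documents` is a hypothetical function, and treating both date bounds as inclusive is an assumption, not something the options table states.

```python
from datetime import date


def select_documents(docs, since=None, until=None, limit=None):
    """Filter (doc_id, published) pairs by a date window, then cap the count.

    Both bounds are treated as inclusive here, which is an assumption.
    """
    selected = []
    for doc_id, published in docs:
        if limit is not None and len(selected) >= limit:
            break
        if since is not None and published < since:
            continue
        if until is not None and published > until:
            continue
        selected.append(doc_id)
    return selected


if __name__ == "__main__":
    docs = [
        ("a", date(2023, 12, 31)),
        ("b", date(2024, 1, 1)),
        ("c", date(2024, 6, 1)),
    ]
    print(select_documents(docs, since=date(2024, 1, 1), limit=1))
```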

Output format

Each document is saved in two formats:

Markdown (human-readable, browsable on GitHub):

---
doc_id: "prop-2024/25:208"
doc_type: prop
title: "Ett mer heltäckande straffansvar vid angrepp på företagshemligheter"
date: "2025-09-08"
source: riksdagen
department: Justitiedepartementet
session: "2024/25"
source_url: "https://..."
---

# Ett mer heltäckande straffansvar vid angrepp på företagshemligheter

Proposition 2024/25:208

[full text...]
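
The metadata block at the top of a Markdown file like the one above can be read back with a few lines of standard-library Python. This is a minimal sketch that only handles flat `key: value` lines as shown in the example; `parse_front_matter` is a hypothetical helper, and a real YAML parser would be the safer choice for anything more complex.

```python
def parse_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a document into (metadata, body); handles flat key: value lines only."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the document body.
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    # No closing delimiter found: treat the whole text as body.
    return {}, markdown


if __name__ == "__main__":
    sample = '---\ndoc_id: "prop-2024/25:208"\ndoc_type: prop\n---\n\n# Title\n'
    print(parse_front_matter(sample))
```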

JSON (machine-readable, full metadata):

{
  "doc_id": "prop-2024/25:208",
  "doc_type": "prop",
  "title": "Ett mer heltäckande straffansvar...",
  "date": "2025-09-08",
  "text": "...",
  "source": "riksdagen",
  "attachments": [...]
}

Documents are organized by type, then by parliamentary session or year:

data/
├── prop/
│   └── 2024-25/
│       ├── prop-2024-25_208.json
│       └── prop-2024-25_208.md
├── sou/
│   └── 2024/
├── nja/
└── .state/
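
The example file names suggest that the `/` and `:` in a document ID are replaced to make it safe as a path. The sketch below reproduces that mapping under this assumption; `doc_path` is a hypothetical helper, and the actual rules in juris may differ.

```python
from pathlib import Path


def doc_path(data_dir: str, doc_type: str, session: str, doc_id: str, suffix: str) -> Path:
    """Map a document ID to an output path (assumed rule: "/" -> "-", ":" -> "_")."""
    safe_session = session.replace("/", "-")
    safe_id = doc_id.replace("/", "-").replace(":", "_")
    return Path(data_dir) / doc_type / safe_session / f"{safe_id}{suffix}"


if __name__ == "__main__":
    print(doc_path("data", "prop", "2024/25", "prop-2024/25:208", ".json").as_posix())
```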

Project structure

src/juris/
├── cli.py              # Command-line interface (Click)
├── models.py           # Document data models (Pydantic)
├── storage.py          # File storage (JSON + Markdown)
├── state.py            # Incremental collection state
├── pdf.py              # PDF text extraction
├── utils.py            # Shared utilities
└── collectors/
    ├── base.py         # Abstract base collector
    ├── riksdagen.py    # Riksdagen API
    ├── regeringen.py   # Regeringen.se scraper
    ├── domstol.py      # Court decisions API
    ├── jo_jk.py        # JO/JK decisions
    ├── lagrummet.py    # Agency regulations
    ├── eurlex.py       # EUR-Lex SPARQL
    ├── curia.py        # CJEU SPARQL
    └── hudoc.py        # ECtHR API
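
As an illustration of what incremental collection state could look like, here is a minimal sketch that records collected document IDs in a JSON file so a later run can skip them. `CollectionState`, its methods, and the file layout are hypothetical and are not taken from `state.py`.

```python
import json
from pathlib import Path


class CollectionState:
    """Hypothetical state store: a JSON list of already collected document IDs."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.seen: set[str] = set()
        if path.exists():
            self.seen = set(json.loads(path.read_text(encoding="utf-8")))

    def is_collected(self, doc_id: str) -> bool:
        return doc_id in self.seen

    def mark_collected(self, doc_id: str) -> None:
        self.seen.add(doc_id)
        # Persist after every document so an interrupted run can resume.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")


if __name__ == "__main__":
    state = CollectionState(Path("data/.state/example.json"))
    if not state.is_collected("prop-2024/25:208"):
        state.mark_collected("prop-2024/25:208")
```

Writing the state after each document trades some I/O for the ability to resume cleanly after a crash or a rate-limit error.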

Development

# Install with dev dependencies (or use: make install)
pip install -e ".[dev]"

# Lint and format check
ruff check src/ tests/
ruff format --check src/ tests/

# Type check (strict mode)
mypy src/

# Run unit tests
pytest tests/ --ignore=tests/test_e2e.py

# Or use the Makefile shortcuts
make lint        # Lint + format check
make typecheck   # Type check
make test        # Unit tests
make format      # Auto-format code
make test-e2e    # End-to-end tests (hits live APIs)

Contributing

See CONTRIBUTING.md for development setup, coding standards, and how to add new collectors.

Please report security vulnerabilities via GitHub's private reporting — see SECURITY.md for details.

This project follows the Contributor Covenant v2.1.

License

MIT
