juris
A command-line tool for collecting and normalizing Swedish legal documents from official government sources.
Sweden has a wealth of public legal information — laws, government bills, public inquiries, court decisions — scattered across multiple government websites and APIs with inconsistent formats. juris collects documents from these sources, normalizes them into a unified format, and saves them as browsable, version-controlled files (Markdown + JSON). Think of it as a git-native open database for Swedish law.
Features
- 8 data sources covering Swedish parliament, government, courts, authorities, and EU law
- 21 document types from bills and motions to court decisions and EU regulations
- Dual output format — Markdown (human-readable, browsable on GitHub) and JSON (machine-parseable)
- Incremental collection with state tracking to resume where you left off
- Async I/O with built-in rate limiting to respect source servers
- PDF text extraction from document attachments
- Date and session filtering for targeted collection
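The async I/O and rate limiting feature can be pictured roughly as follows. This is a minimal sketch, not the code juris ships: `RateLimiter`, `fetch`, `fetch_all`, and the interval value are hypothetical names chosen for illustration, and the HTTP request is simulated with `asyncio.sleep`.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical helper: at most one request start per `min_interval` seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0

    async def wait(self) -> None:
        # The lock makes concurrent tasks queue up behind the limiter.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
            # Reserve the next slot relative to the slot just used.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


async def fetch(doc_id: str, limiter: RateLimiter) -> str:
    await limiter.wait()
    await asyncio.sleep(0)  # stand-in for a real HTTP request (assumption)
    return f"fetched {doc_id}"


async def fetch_all(doc_ids: list[str], min_interval: float = 0.05) -> list[str]:
    limiter = RateLimiter(min_interval)
    # gather() returns results in the same order as the input IDs.
    return await asyncio.gather(*(fetch(d, limiter) for d in doc_ids))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["prop-1", "prop-2", "prop-3"])))
```

All tasks are scheduled at once, but the shared limiter spaces out when each request is allowed to start, which keeps the load on a source server predictable.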
Data sources
| Source | Method | Document types |
|---|---|---|
| Riksdagen | JSON API | prop, sou, mot, bet, dir, skr, sfs |
| Regeringen.se | Web scraping | prop, sou, ds, lagr, dir, skr |
| Domstolsverket | REST API | nja, ad, hfd, mod, pmod |
| JO | Web scraping | jo |
| JK | Web scraping | jk |
| Lagrummet | Web scraping | foreskrift |
| EUR-Lex | SPARQL | eu_reg, eu_dir |
| CURIA / HUDOC | SPARQL / JSON API | cjeu, echr |
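The table above can also be read as a lookup from source to document types. The sketch below transcribes it into a dictionary to show which sources offer a given type. The keys are the display names from the table, not necessarily the source names the CLI accepts, and `SOURCE_TYPES`, `sources_for`, and `all_types` are illustrative names rather than part of juris.

```python
# Transcribed from the data sources table; keys are the table's display names.
SOURCE_TYPES: dict[str, tuple[str, ...]] = {
    "Riksdagen": ("prop", "sou", "mot", "bet", "dir", "skr", "sfs"),
    "Regeringen.se": ("prop", "sou", "ds", "lagr", "dir", "skr"),
    "Domstolsverket": ("nja", "ad", "hfd", "mod", "pmod"),
    "JO": ("jo",),
    "JK": ("jk",),
    "Lagrummet": ("foreskrift",),
    "EUR-Lex": ("eu_reg", "eu_dir"),
    "CURIA / HUDOC": ("cjeu", "echr"),
}


def sources_for(doc_type: str) -> list[str]:
    """Return every source in the table that lists `doc_type`."""
    return [name for name, types in SOURCE_TYPES.items() if doc_type in types]


def all_types() -> set[str]:
    """Return the set of distinct document types across all sources."""
    return {t for types in SOURCE_TYPES.values() for t in types}


if __name__ == "__main__":
    print(sources_for("prop"))
    print(len(SOURCE_TYPES), "sources,", len(all_types()), "document types")
```

Several types, such as `prop` and `sou`, are available from more than one source, so the source argument and `--type` are chosen together.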
Document types
Swedish Parliament
| Type | Swedish | English |
|---|---|---|
| `prop` | Propositioner | Government bills |
| `mot` | Motioner | Parliamentary motions |
| `bet` | Betänkanden | Committee reports |
| `skr` | Skrivelser | Government communications |
Swedish Government
| Type | Swedish | English |
|---|---|---|
| `sou` | Statens offentliga utredningar | State public inquiries |
| `ds` | Departementsserien | Department series |
| `dir` | Kommittédirektiv | Committee directives |
| `lagr` | Lagrådsremisser | Council on Legislation referrals |
| `sfs` | Svensk författningssamling | Swedish Code of Statutes |
Courts
| Type | Swedish | English |
|---|---|---|
| `nja` | Nytt Juridiskt Arkiv | Supreme Court precedents |
| `ad` | Arbetsdomstolens domar | Labour Court decisions |
| `hfd` | Högsta förvaltningsdomstolens årsbok | Supreme Administrative Court |
| `mod` | Mark- och miljööverdomstolen | Land and Environment Court of Appeal |
| `pmod` | Patent- och marknadsöverdomstolen | Patent and Market Court of Appeal |
Authorities
| Type | Swedish | English |
|---|---|---|
| `jo` | Justitieombudsmannens beslut | Parliamentary Ombudsman decisions |
| `jk` | Justitiekanslerns beslut | Chancellor of Justice decisions |
| `foreskrift` | Myndighetsföreskrifter | Agency regulations |
EU law
| Type | Swedish | English |
|---|---|---|
| `eu_reg` | EU-förordningar | EU regulations |
| `eu_dir` | EU-direktiv | EU directives |
| `cjeu` | EU-domstolens domar | Court of Justice of the EU |
| `echr` | Europadomstolens domar | European Court of Human Rights |
Installation
```bash
pip install -e .
```
Requires Python 3.11 or later.
Usage
```bash
# Collect government bills from the 2024/25 parliamentary session
juris collect riksdagen --type prop --session 2024/25

# Collect SOU reports published since a specific date
juris collect riksdagen --type sou --since 2024-01-01

# Collect from the government website with a limit
juris collect regeringen --type prop --session 2024/25 --limit 5

# Collect Supreme Court decisions
juris collect domstol --type nja --since 2024-01-01

# Collect agency regulations
juris collect lagrummet --type foreskrift --limit 10

# Collect EU regulations
juris collect eur_lex --type eu_reg --since 2024-01-01

# Check collection progress
juris status

# Count collected documents
juris stats
```
Options
| Option | Description |
|---|---|
| `--type TYPE` | Document type to collect (required) |
| `--session SESSION` | Parliamentary session, e.g. 2024/25 |
| `--since DATE` | Collect documents from this date (YYYY-MM-DD) |
| `--until DATE` | Collect documents until this date (YYYY-MM-DD) |
| `--limit N` | Maximum number of documents to collect |
| `--skip-existing` / `--no-skip-existing` | Skip already collected documents (default: on) |
| `--skip-content` / `--no-skip-content` | Metadata only, skip full text (default: off) |
| `--data-dir PATH` | Output directory (default: data) |
| `-v`, `--verbose` | Enable debug logging |
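To illustrate how `--skip-existing` and state tracking might interact, here is a minimal sketch. It assumes one JSON file of collected document IDs per type under `.state/`. That file layout, and the names `CollectionState` and `pending`, are assumptions for illustration. The README only says that a `.state/` directory exists.

```python
import json
from pathlib import Path


class CollectionState:
    """Hypothetical state store: one JSON list of collected IDs per document type."""

    def __init__(self, data_dir: Path, doc_type: str) -> None:
        self.path = data_dir / ".state" / f"{doc_type}.json"
        self.seen: set[str] = set()
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text(encoding="utf-8")))

    def is_collected(self, doc_id: str) -> bool:
        return doc_id in self.seen

    def mark_collected(self, doc_id: str) -> None:
        # Persist after every document so an interrupted run can resume.
        self.seen.add(doc_id)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")


def pending(doc_ids: list[str], state: CollectionState, skip_existing: bool = True) -> list[str]:
    """Return the IDs that still need collecting, preserving input order."""
    if not skip_existing:
        return list(doc_ids)
    return [d for d in doc_ids if not state.is_collected(d)]
```

With `skip_existing=False` every ID is returned again, mirroring what `--no-skip-existing` would be expected to do.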
Output format
Each document is saved in two formats:
Markdown (human-readable, browsable on GitHub):
```markdown
---
doc_id: "prop-2024/25:208"
doc_type: prop
title: "Ett mer heltäckande straffansvar vid angrepp på företagshemligheter"
date: "2025-09-08"
source: riksdagen
department: Justitiedepartementet
session: "2024/25"
source_url: "https://..."
---

# Ett mer heltäckande straffansvar vid angrepp på företagshemligheter

Proposition 2024/25:208

[full text...]
```
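The front matter in a Markdown file like this can be read back without extra dependencies. The sketch below handles only flat `key: value` lines like those in the example. `split_front_matter` is a hypothetical helper, and a real YAML parser would be more robust for anything beyond this simple shape.

```python
def split_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a document into (metadata, body); handles flat `key: value` lines only."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        return {}, markdown
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body


if __name__ == "__main__":
    sample = '---\ndoc_id: "prop-2024/25:208"\ndoc_type: prop\n---\n# Title\n'
    meta, body = split_front_matter(sample)
    print(meta["doc_id"], "|", body)
```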
JSON (machine-readable, full metadata):
```json
{
  "doc_id": "prop-2024/25:208",
  "doc_type": "prop",
  "title": "Ett mer heltäckande straffansvar...",
  "date": "2025-09-08",
  "text": "...",
  "source": "riksdagen",
  "attachments": [...]
}
```
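Downstream scripts can load these JSON files into a typed structure. The sketch below uses a standard-library dataclass as a stand-in; juris's own models are Pydantic classes whose exact fields are not shown here, so `Document`, `parse_document`, and the catch-all `extra` field are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Illustrative stand-in covering the fields shown in the JSON example."""

    doc_id: str
    doc_type: str
    title: str
    date: str
    source: str
    text: str = ""
    attachments: list[Any] = field(default_factory=list)
    extra: dict[str, Any] = field(default_factory=dict)  # keys not modelled above


def parse_document(raw: str) -> Document:
    data = json.loads(raw)
    known = {"doc_id", "doc_type", "title", "date", "source", "text", "attachments"}
    fields = {k: v for k, v in data.items() if k in known}
    extra = {k: v for k, v in data.items() if k not in known}
    return Document(**fields, extra=extra)


if __name__ == "__main__":
    raw = json.dumps({
        "doc_id": "prop-2024/25:208",
        "doc_type": "prop",
        "title": "Ett mer heltäckande straffansvar...",
        "date": "2025-09-08",
        "source": "riksdagen",
    })
    print(parse_document(raw))
```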
Documents are organized by type and session:
```
data/
├── prop/
│   └── 2024-25/
│       ├── prop-2024-25_208.json
│       └── prop-2024-25_208.md
├── sou/
│   └── 2024/
├── nja/
└── .state/
```
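The mapping from a document ID to its file paths can be sketched as below. The substitution rule (`/` becomes `-`, `:` becomes `_`) is inferred from the single example `prop-2024/25:208` → `prop-2024-25_208`; other document types may be named differently. `safe_name` and `document_paths` are hypothetical helpers.

```python
from pathlib import Path


def safe_name(doc_id: str) -> str:
    """Make a doc ID filesystem-friendly (rule inferred from one example)."""
    return doc_id.replace("/", "-").replace(":", "_")


def document_paths(data_dir: Path, doc_type: str, group: str, doc_id: str) -> tuple[Path, Path]:
    """Return the (JSON, Markdown) paths for a document.

    `group` is the session or year subdirectory, e.g. "2024/25" or "2024".
    """
    folder = data_dir / doc_type / group.replace("/", "-")
    stem = safe_name(doc_id)
    return folder / f"{stem}.json", folder / f"{stem}.md"


if __name__ == "__main__":
    json_path, md_path = document_paths(Path("data"), "prop", "2024/25", "prop-2024/25:208")
    print(json_path)
    print(md_path)
```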
Project structure
```
src/juris/
├── cli.py              # Command-line interface (Click)
├── models.py           # Document data models (Pydantic)
├── storage.py          # File storage (JSON + Markdown)
├── state.py            # Incremental collection state
├── pdf.py              # PDF text extraction
├── utils.py            # Shared utilities
└── collectors/
    ├── base.py         # Abstract base collector
    ├── riksdagen.py    # Riksdagen API
    ├── regeringen.py   # Regeringen.se scraper
    ├── domstol.py      # Court decisions API
    ├── jo_jk.py        # JO/JK decisions
    ├── lagrummet.py    # Agency regulations
    ├── eurlex.py       # EUR-Lex SPARQL
    ├── curia.py        # CJEU SPARQL
    └── hudoc.py        # ECtHR API
```
Development
```bash
# Install with dev dependencies (or use: make install)
pip install -e ".[dev]"

# Lint and format check
ruff check src/ tests/
ruff format --check src/ tests/

# Type check (strict mode)
mypy src/

# Run unit tests
pytest tests/ --ignore=tests/test_e2e.py

# Or use the Makefile shortcuts
make lint        # Lint + format check
make typecheck   # Type check
make test        # Unit tests
make format      # Auto-format code
make test-e2e    # End-to-end tests (hits live APIs)
```
Contributing
See CONTRIBUTING.md for development setup, coding standards, and how to add new collectors.
Please report security vulnerabilities via GitHub's private reporting — see SECURITY.md for details.
This project follows the Contributor Covenant v2.1.
License