Build a local SQLite database from the lido-export.ttl.gz linked-data export for use with rechtspraak-extractor
rechtspraak-lido-sqlite
Self-hosted pipeline that downloads the Dutch LiDO linked-data export and converts it into a local SQLite database.
The resulting database is a drop-in backend for the rechtspraak-extractor package: its fetch_eclis_via_sqlite() function queries the metadata table built here, avoiding any dependency on a live SPARQL endpoint or the Rechtspraak API.
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Python | ≥ 3.10 | |
| serdi or rapper | any | Converts the source Turtle file to N-Triples in lax mode; required because the source file contains syntax violations that strict parsers reject |
Install the system converter:
# macOS
brew install serd # provides serdi (recommended)
# or
brew install raptor # provides rapper
# Ubuntu / Debian
sudo apt install serdi
# or
sudo apt install raptor2-utils
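To confirm that one of the two converters is actually on the PATH before starting a long build, a check along the following lines can help. This is a minimal sketch: `find_converter` and `CONVERTERS` are illustrative names, not part of this package.

```python
import shutil

# Converters named in the prerequisites, in order of preference.
CONVERTERS = ("serdi", "rapper")


def find_converter(candidates=CONVERTERS, which=shutil.which):
    """Return the first converter found on PATH, or None if none is installed.

    `which` is injectable so the lookup can be exercised without touching the system.
    """
    for name in candidates:
        if which(name):
            return name
    return None


if __name__ == "__main__":
    found = find_converter()
    print(found or "no converter found: install serd or raptor first")
```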
Installation
pip install rechtspraak-lido-sqlite
Or from source:
git clone https://github.com/shashankmc/rechtspraak-lido-sqlite
cd rechtspraak-lido-sqlite
pip install -e .
Quick start
# 1. Download (~3 GB compressed) and build the SQLite database in one step
build-lido-sqlite --download
# 2. Verify the result
test-lido-query
# 3. Inspect the first N lines (useful for debugging)
inspect-lido --mode cases --lines 500000
The setup and build steps are also available as make targets:
make install-tools # brew install serd
make download # download + build → data/lido.db
make build # build from an already-downloaded file
CLI reference
build-lido-sqlite
build-lido-sqlite [--input PATH] [--output PATH] [--download]
--input PATH Source .ttl.gz file (default: data/lido-export.ttl.gz)
--output PATH SQLite output file (default: data/lido.db)
--download Download the source file before building
inspect-lido
Shows subject URI patterns and predicates from a sample of the file — useful for verifying the predicate map or understanding the data structure.
inspect-lido [--input PATH] [--mode subjects|cases] [--skip N] [--lines N]
--mode subjects Show subject URI distribution and top predicates (default)
--mode cases Find ECLI subjects and case-link objects
--skip N Skip the first N N-Triple lines before sampling
--lines N Maximum lines to sample (default: 200 000)
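The sampling idea behind the subjects mode can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: `sample_predicates` is a hypothetical helper, and it uses a simple regular expression on N-Triples lines rather than a full parser.

```python
import re
from collections import Counter
from itertools import islice

# Matches the subject and predicate IRIs at the start of an N-Triples line.
# Lines with blank-node subjects, comments, and empty lines do not match.
_SUBJECT_PREDICATE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+")


def sample_predicates(lines, skip=0, limit=200_000):
    """Count predicate IRIs in a window of `limit` lines, starting after `skip` lines."""
    counts = Counter()
    for line in islice(lines, skip, skip + limit):
        match = _SUBJECT_PREDICATE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Calling `sample_predicates(open_file).most_common(20)` on an N-Triples stream would then list the twenty most frequent predicates in the sampled window.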
test-lido-query
Runs the exact SQL query used by fetch_eclis_via_sqlite() and prints results plus per-column fill rates.
test-lido-query [--db PATH] [--ecli ECLI ...]
--db PATH SQLite database (default: data/lido.db)
--ecli ECLI ECLI to look up (repeat for multiple; omit to sample 5 rows)
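A per-column fill rate can also be computed directly with the standard sqlite3 module. The sketch below assumes that a value counts as filled when it is neither NULL nor an empty string; the command itself may use a different definition, and `fill_rates` is an illustrative name.

```python
import sqlite3


def fill_rates(conn, table="metadata"):
    """Return {column: fraction of rows with a non-NULL, non-empty value}."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    rates = {}
    for col in columns:
        # Column names are quoted because some (e.g. "references") are SQL keywords.
        filled = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" '
            f'WHERE "{col}" IS NOT NULL AND "{col}" != \'\''
        ).fetchone()[0]
        rates[col] = filled / total if total else 0.0
    return rates


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE metadata (ecli TEXT, title TEXT)")
    demo.executemany(
        "INSERT INTO metadata VALUES (?, ?)",
        [("ECLI:NL:HR:2010:BN2349", "Example"), ("ECLI:NL:RBAMS:2023:1234", None)],
    )
    for column, rate in fill_rates(demo).items():
        print(f"{column}: {rate:.0%}")
```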
Integration with rechtspraak-extractor
from rechtspraak_extractor import rechtspraak_metadata as rm
df = rm.fetch_eclis_via_sqlite(
ecli_list=["ECLI:NL:HR:2010:BN2349", "ECLI:NL:RBAMS:2023:1234"],
sqlite_db_path="data/lido.db",
columns=rm.METADATA_COLUMNS,
)
print(df.head())
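The database can also be queried without rechtspraak-extractor, using only the standard library. The SQL that fetch_eclis_via_sqlite() runs is not reproduced in this README, so the following is a sketch of an equivalent lookup, assuming the table is named metadata and has an ecli column as described in this README; `lookup_eclis` is a hypothetical helper.

```python
import sqlite3


def lookup_eclis(conn, eclis, columns=("ecli", "date_decision", "creator")):
    """Return one dict per matching case for the given ECLI identifiers."""
    eclis = list(eclis)
    if not eclis:
        return []
    column_sql = ", ".join(f'"{c}"' for c in columns)
    # One "?" placeholder per ECLI keeps the values out of the SQL string.
    placeholders = ", ".join("?" for _ in eclis)
    sql = f"SELECT {column_sql} FROM metadata WHERE ecli IN ({placeholders})"
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(sql, eclis)]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE metadata (ecli TEXT, date_decision TEXT, creator TEXT)")
    demo.execute(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        ("ECLI:NL:HR:2010:BN2349", "2010-01-01", "Hoge Raad"),  # made-up sample row
    )
    print(lookup_eclis(demo, ["ECLI:NL:HR:2010:BN2349"]))
```

For very long ECLI lists, the list would need to be split into chunks, since SQLite limits the number of bound parameters per statement.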
The metadata table has 25 columns matching the MAP_RS keys in rechtspraak-extractor:
| Column | Source predicate | Notes |
|---|---|---|
| ecli | dcterms:identifier | |
| issued | dcterms:issued | date of publication on Rechtspraak.nl |
| language | dcterms:language | |
| creator | dcterms:creator / lx:creator | name of court |
| date_decision | lido:heeftUitspraakdatum | date of court decision |
| zaaknummer | lido:heeftZaaknummer | internal case number |
| type | dcterms:type | uitspraak or conclusie |
| procedure | lido:heeftProceduresoort | procedure type |
| spatial | dcterms:spatial | court municipality |
| subject | lido:heeftRechtsgebied | area of law |
| relation | dcterms:isReplacedBy / dcterms:replaces | predecessor/successor cases |
| references | — | applicable legislation titles; empty (not in lido) |
| hasVersion | dcterms:hasVersion / lx:hasVersion | alternative publishers |
| link | constructed from ECLI | deeplink to Rechtspraak.nl |
| title | dcterms:title | |
| inhoudsindicatie | — | case summary; empty (not in lido) |
| info | lido:heeftBron | source information |
| full_text | — | full case text; empty (not in lido) |
| jurisdiction_country | — | country; empty (added by downstream script) |
| source | — | "Rechtspraak" (static) |
| citations_incoming | — | cases citing this case; empty (reverse relation, not in lido) |
| citations_outgoing | lido:refereertAan | cases cited by this case |
| legislations_cited | lido:linkt | legislation cited |
| summary | — | empty (not in lido) |
| bwb_id | — | BWB legislation ID; empty (not in lido) |
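For orientation, the 25 columns can be written out as a table definition. The column names below come from the table above; the TEXT types and the primary key on ecli are assumptions made for this sketch, and the schema the build actually creates may differ.

```python
import sqlite3

# Column names as listed in the README table, in the same order.
COLUMNS = (
    "ecli", "issued", "language", "creator", "date_decision",
    "zaaknummer", "type", "procedure", "spatial", "subject",
    "relation", "references", "hasVersion", "link", "title",
    "inhoudsindicatie", "info", "full_text", "jurisdiction_country", "source",
    "citations_incoming", "citations_outgoing", "legislations_cited", "summary", "bwb_id",
)


def create_metadata_table(conn):
    """Create a metadata table with every column stored as TEXT (assumed types)."""
    # Names are double-quoted because "references" is a reserved word in SQL.
    column_sql = ", ".join(
        f'"{name}" TEXT PRIMARY KEY' if name == "ecli" else f'"{name}" TEXT'
        for name in COLUMNS
    )
    conn.execute(f"CREATE TABLE IF NOT EXISTS metadata ({column_sql})")


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    create_metadata_table(demo)
    print(len(COLUMNS), "columns created")
```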
How it works
- Download: src/download.py streams lido-export.ttl.gz from linkeddata.overheid.nl with a progress bar.
- Convert: src/parse.py pipes the decompressed Turtle through serdi -l (lax mode) or rapper -q to produce N-Triples, working around systematic syntax violations in the source file (invalid escape sequences, non-IRI characters).
- Parse: Each N-Triple line is parsed by pyoxigraph. Triples whose subject matches linkeddata.overheid.nl/terms/jurisprudentie/id/ECLI:… are accumulated per case.
- Insert: Cases are batch-inserted into the metadata table in SQLite (10 000 rows per transaction).
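The parse and insert steps can be sketched as follows. This is not the code in src/parse.py: to stay dependency-free it uses a regular expression as a stand-in for pyoxigraph, the two-entry `PREDICATE_TO_COLUMN` map is hypothetical, the table has only three columns, and all cases are held in memory before insertion, which a full-size export may not allow.

```python
import re
import sqlite3
from itertools import islice

# Subjects containing this marker are treated as cases (scheme left unspecified).
CASE_MARKER = "linkeddata.overheid.nl/terms/jurisprudentie/id/ECLI:"
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$")

# Hypothetical excerpt of a predicate map; the real one covers more columns.
PREDICATE_TO_COLUMN = {
    "http://purl.org/dc/terms/title": "title",
    "http://purl.org/dc/terms/type": "type",
}


def object_text(term):
    """Reduce an N-Triples object to text; escape sequences are not decoded."""
    if term.startswith("<"):
        return term[1:-1]  # strip the angle brackets of an IRI
    match = re.match(r'^"(.*)"', term)  # drop language tag or datatype suffix
    return match.group(1) if match else term


def accumulate_cases(lines):
    """Group mapped predicate values by ECLI; later values overwrite earlier ones."""
    cases = {}
    for line in lines:
        match = TRIPLE.match(line)
        if not match:
            continue
        subject, predicate, obj = match.groups()
        column = PREDICATE_TO_COLUMN.get(predicate)
        if column is None or CASE_MARKER not in subject:
            continue
        ecli = subject[subject.index("ECLI:"):]
        cases.setdefault(ecli, {"ecli": ecli})[column] = object_text(obj)
    return cases


def insert_batched(conn, cases, batch_size=10_000):
    """Insert cases in batches, committing one transaction per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (ecli TEXT PRIMARY KEY, title TEXT, type TEXT)"
    )
    rows = ((c["ecli"], c.get("title"), c.get("type")) for c in cases.values())
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)", batch)
```

Committing once per batch rather than once per row keeps the number of disk syncs low, which is usually the dominant cost of bulk inserts into SQLite.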
Development
pip install -e .
python test_query.py --db data/lido.db
License
MIT