Build a local SQLite database from the lido-export.ttl.gz linked-data export for use with rechtspraak-extractor

rechtspraak-lido-sqlite

Self-hosted pipeline that downloads the Dutch LiDO linked-data export and converts it into a local SQLite database.

The resulting database is a drop-in backend for the rechtspraak-extractor package: its fetch_eclis_via_sqlite() function queries the metadata table built here, avoiding any dependency on a live SPARQL endpoint or the Rechtspraak API.


Prerequisites

| Requirement | Version | Notes |
| --- | --- | --- |
| Python | ≥ 3.10 | |
| serdi or rapper | any | Converts the source Turtle file to N-Triples in lax mode; required because the source file contains syntax violations that strict parsers reject |

Install the system converter:

# macOS
brew install serd          # provides serdi (recommended)
# or
brew install raptor        # provides rapper

# Ubuntu / Debian
sudo apt install serdi
# or
sudo apt install raptor2-utils
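
Before starting a build, it can help to confirm that one of the two converters is actually on the PATH. The following is a minimal sketch using only the standard library; the helper name find_converter is hypothetical and not part of this package.

```python
import shutil


def find_converter(candidates=("serdi", "rapper")):
    """Return the path of the first converter found on PATH, or None.

    The default order mirrors the README, which recommends serdi.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_converter()
    print(found or "Neither serdi nor rapper is on PATH; install one first.")
```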

Installation

pip install rechtspraak-lido-sqlite

Or from source:

git clone https://github.com/shashankmc/rechtspraak-lido-sqlite
cd rechtspraak-lido-sqlite
pip install -e .

Quick start

# 1. Download (~3 GB compressed) and build the SQLite database in one step
build-lido-sqlite --download

# 2. Verify the result
test-lido-query

# 3. Inspect the first N lines (useful for debugging)
inspect-lido --mode cases --lines 500000

The setup and build steps are also available as make targets:

make install-tools   # brew install serd
make download        # download + build  →  data/lido.db
make build           # build from an already-downloaded file

CLI reference

build-lido-sqlite

build-lido-sqlite [--input PATH] [--output PATH] [--download]

  --input   PATH   Source .ttl.gz file  (default: data/lido-export.ttl.gz)
  --output  PATH   SQLite output file   (default: data/lido.db)
  --download       Download the source file before building

inspect-lido

Shows subject URI patterns and predicates from a sample of the file — useful for verifying the predicate map or understanding the data structure.

inspect-lido [--input PATH] [--mode subjects|cases] [--skip N] [--lines N]

  --mode subjects   Show subject URI distribution and top predicates (default)
  --mode cases      Find ECLI subjects and case-link objects
  --skip  N         Skip the first N N-Triple lines before sampling
  --lines N         Maximum lines to sample (default: 200 000)
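
To illustrate the kind of sampling described above, here is a rough sketch that counts predicates in a slice of N-Triples lines. It is not the code behind inspect-lido: the function name count_predicates is hypothetical, and the whitespace split is a simplification that assumes one well-formed triple per line.

```python
from collections import Counter


def count_predicates(lines, skip=0, limit=200_000):
    """Count predicate IRIs in a sample of N-Triples lines.

    skip  -- number of leading lines to ignore
    limit -- maximum number of lines to sample after skipping
    """
    counts = Counter()
    sampled = 0
    for index, line in enumerate(lines):
        if index < skip:
            continue
        if sampled >= limit:
            break
        sampled += 1
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) < 3:
            continue  # not a complete triple
        counts[parts[1].strip("<>")] += 1
    return counts
```

Calling count_predicates(f).most_common(20) on an open N-Triples file would give a quick view of the most frequent predicates in the sample.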

test-lido-query

Runs the exact SQL query used by fetch_eclis_via_sqlite() and prints results plus per-column fill rates.

test-lido-query [--db PATH] [--ecli ECLI ...]

  --db    PATH   SQLite database  (default: data/lido.db)
  --ecli  ECLI   ECLI to look up (repeat for multiple; omit to sample 5 rows)
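
A per-column fill rate can be read as the share of rows in which a column is neither NULL nor an empty string. The sketch below computes that with the standard sqlite3 module. It is an illustration under that assumed definition, not the tool's own implementation; the function name fill_rates is hypothetical, and the table name must be a trusted identifier because it is interpolated into the SQL.

```python
import sqlite3


def fill_rates(conn, table="metadata"):
    """Return {column: fraction of rows with a non-empty value}."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rates = {}
    for col in columns:
        # Column names are double-quoted because some (e.g. references)
        # are SQL keywords.
        filled = conn.execute(
            f'SELECT COUNT(*) FROM {table} '
            f'WHERE "{col}" IS NOT NULL AND "{col}" <> \'\''
        ).fetchone()[0]
        rates[col] = filled / total if total else 0.0
    return rates
```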

Integration with rechtspraak-extractor

from rechtspraak_extractor import rechtspraak_metadata as rm

df = rm.fetch_eclis_via_sqlite(
    ecli_list=["ECLI:NL:HR:2010:BN2349", "ECLI:NL:RBAMS:2023:1234"],
    sqlite_db_path="data/lido.db",
    columns=rm.METADATA_COLUMNS,
)
print(df.head())
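
If rechtspraak-extractor is not installed, the database can also be queried directly. The sketch below uses only the standard sqlite3 module and assumes the metadata table and column names listed in this README. It is not the query that fetch_eclis_via_sqlite() runs, and the function name fetch_metadata is hypothetical.

```python
import sqlite3


def fetch_metadata(conn, eclis, columns=("ecli", "title", "date_decision"),
                   chunk_size=500):
    """Look up rows in the metadata table for a list of ECLIs.

    ECLIs are bound as parameters and queried in chunks to stay well below
    SQLite's limit on bound variables per statement.
    """
    eclis = list(eclis)
    col_sql = ", ".join(f'"{name}"' for name in columns)
    results = []
    for start in range(0, len(eclis), chunk_size):
        chunk = eclis[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))
        sql = f"SELECT {col_sql} FROM metadata WHERE ecli IN ({placeholders})"
        for row in conn.execute(sql, chunk):
            results.append(dict(zip(columns, row)))
    return results


if __name__ == "__main__":
    with sqlite3.connect("data/lido.db") as db:
        for record in fetch_metadata(db, ["ECLI:NL:HR:2010:BN2349"]):
            print(record)
```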

The metadata table has 25 columns matching the MAP_RS keys in rechtspraak-extractor:

| Column | Source predicate | Notes |
| --- | --- | --- |
| ecli | dcterms:identifier | |
| issued | dcterms:issued | date of publication on Rechtspraak.nl |
| language | dcterms:language | |
| creator | dcterms:creator / lx:creator | name of court |
| date_decision | lido:heeftUitspraakdatum | date of court decision |
| zaaknummer | lido:heeftZaaknummer | internal case number |
| type | dcterms:type | uitspraak or conclusie |
| procedure | lido:heeftProceduresoort | procedure type |
| spatial | dcterms:spatial | court municipality |
| subject | lido:heeftRechtsgebied | area of law |
| relation | dcterms:isReplacedBy / dcterms:replaces | predecessor/successor cases |
| references | | applicable legislation titles; empty (not in lido) |
| hasVersion | dcterms:hasVersion / lx:hasVersion | alternative publishers |
| link | constructed from ECLI | deeplink to Rechtspraak.nl |
| title | dcterms:title | |
| inhoudsindicatie | | case summary; empty (not in lido) |
| info | lido:heeftBron | source information |
| full_text | | full case text; empty (not in lido) |
| jurisdiction_country | | country; empty (added by downstream script) |
| source | | "Rechtspraak" (static) |
| citations_incoming | | cases citing this case; empty (reverse relation, not in lido) |
| citations_outgoing | lido:refereertAan | cases cited by this case |
| legislations_cited | lido:linkt | legislation cited |
| summary | | empty (not in lido) |
| bwb_id | | BWB legislation ID; empty (not in lido) |
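
The link column is the only one derived from the ECLI itself rather than read from a predicate. As a rough illustration, a deeplink could be built as follows. The URL pattern used here is an assumption about the Rechtspraak.nl deeplink format and is not confirmed by this README; the function name build_link is hypothetical.

```python
# Assumed deeplink pattern; verify against the database before relying on it.
DEEPLINK_BASE = "https://deeplink.rechtspraak.nl/uitspraak?id="


def build_link(ecli):
    """Return a Rechtspraak.nl deeplink for an ECLI (assumed URL pattern)."""
    ecli = ecli.strip()
    if not ecli.upper().startswith("ECLI:"):
        raise ValueError(f"not an ECLI: {ecli!r}")
    return DEEPLINK_BASE + ecli
```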

How it works

  1. Download: src/download.py streams lido-export.ttl.gz from linkeddata.overheid.nl with a progress bar.
  2. Convert: src/parse.py pipes the decompressed Turtle through serdi -l (lax mode) or rapper -q to produce N-Triples, working around systematic syntax violations in the source file (invalid escape sequences, non-IRI characters).
  3. Parse: Each N-Triple line is parsed by pyoxigraph. Triples whose subject matches linkeddata.overheid.nl/terms/jurisprudentie/id/ECLI:… are accumulated per case.
  4. Insert: Cases are batch-inserted into the metadata table in SQLite (10 000 rows per transaction).
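
Steps 3 and 4 can be sketched with the standard library alone. This is a simplified illustration, not the package's code: it swaps pyoxigraph for a small regular expression that only handles well-formed lines, uses a hypothetical two-entry predicate map and a three-column table instead of the full 25 columns, and the function names are made up for the example.

```python
import re
import sqlite3
from collections import defaultdict

# Subject marker taken from the README; the scheme (http/https) is left open.
SUBJECT_MARKER = "linkeddata.overheid.nl/terms/jurisprudentie/id/"
# Simplified N-Triples pattern: IRI subject, IRI predicate, any object.
TRIPLE_RE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$")
# Hypothetical, truncated predicate map (the real one covers more columns).
PREDICATE_TO_COLUMN = {
    "http://purl.org/dc/terms/title": "title",
    "http://purl.org/dc/terms/issued": "issued",
}


def object_text(term):
    """Strip IRI brackets, or literal quotes plus language/datatype suffix."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    match = re.match(r'^"((?:[^"\\]|\\.)*)"', term)
    return match.group(1) if match else term


def accumulate(lines):
    """Group mapped predicate values per ECLI subject."""
    cases = defaultdict(dict)
    for line in lines:
        match = TRIPLE_RE.match(line)
        if not match:
            continue
        subject, predicate, obj = match.groups()
        column = PREDICATE_TO_COLUMN.get(predicate)
        if column is None or SUBJECT_MARKER not in subject:
            continue
        ecli = subject.partition(SUBJECT_MARKER)[2]
        cases[ecli][column] = object_text(obj)
    return dict(cases)


def insert_batches(conn, cases, batch_size=10_000):
    """Insert cases into metadata, committing one transaction per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(ecli TEXT PRIMARY KEY, title TEXT, issued TEXT)"
    )
    rows = [(e, f.get("title"), f.get("issued")) for e, f in cases.items()]
    for start in range(0, len(rows), batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
                rows[start:start + batch_size],
            )
```

Committing per batch rather than per row keeps the number of disk syncs low, which is the usual reason for grouping SQLite inserts into larger transactions.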

Development

pip install -e .
python test_query.py --db data/lido.db

License

MIT


