Library for extracting rechtspraak data
Rechtspraak extractor
This library provides functions to retrieve Rechtspraak data and metadata, either live from the Rechtspraak API or from a local SQLite database.
Version
Python 3.9+
Contributors
- Pranav Bapat
- running-machin
- Piotr Lewandowski
- shashankmc
- gijsvd
How to install?
pip install rechtspraak_extractor
What are the functions?
get_rechtspraak
Gets all the ECLIs and saves them in a CSV file or returns them in memory.
- It gets the ECLI, title, summary, updated date, and link.
- The link attribute contains the link to each ECLI's metadata.
get_rechtspraak_metadata
Gets the metadata of the ECLIs produced by get_rechtspraak and saves it in a new CSV file or returns it in memory.
- It gets instantie (court), datum uitspraak (decision date), datum publicatie (publication date), zaaknummer (case number), rechtsgebieden (areas of law), bijzondere kenmerken (special characteristics), inhoudsindicatie (summary), and vindplaatsen (publication references).
- Supports two extraction methods: method='api' (default, fetches live from the Rechtspraak API) and method='sqlite' (fetches from a local pre-built SQLite database, see below).
fetch_eclis_via_sqlite
Low-level function to look up a list of ECLIs directly in a local SQLite database and return a DataFrame.
- Requires the rechtspraak-lido-sqlite package to be installed and its database populated first (see SQLite method below).
What are the parameters?
- get_rechtspraak(max_ecli=100, sd='2022-05-01', ed='2022-10-01', save_file='y') Parameters:
  - max_ecli: int, optional. Maximum number of ECLIs to retrieve. Default: 100
  - sd: date, optional, default '2022-08-01'. The start publication date (yyyy-mm-dd)
  - ed: date, optional, default current date. The end publication date (yyyy-mm-dd)
  - save_file: ['y', 'n'], default 'y'
    - y - Save data as a CSV file in the data folder
    - n - Save data as a DataFrame in memory
- get_rechtspraak_metadata(...)
  - save_file: ['y', 'n'], default 'n'
    - y - Save data as a CSV file in the data folder
    - n - Return data as a DataFrame in memory
  - dataframe: DataFrame, optional. DataFrame containing the ECLIs to retrieve metadata for. Cannot be combined with filename
  - filename: string, optional. CSV file containing the ECLIs to retrieve metadata for. Cannot be combined with dataframe
  - method: ['api', 'sqlite'], default 'api'
    - api - Fetch metadata live from the Rechtspraak API
    - sqlite - Fetch metadata from a local SQLite database (requires rechtspraak-lido-sqlite)
  - sqlite_db_path: string, default 'data/lido_metadata.db'. Path to the SQLite database file. Only used when method='sqlite'
  - fallback_to_api: bool, default True. When using method='sqlite', fall back to the live API for any ECLIs not found in the database
  - multi_threading: bool, default True. Use multi-threading for API-based metadata extraction. Set to False for single-threaded execution
- fetch_eclis_via_sqlite(ecli_list, sqlite_db_path, columns)
  - ecli_list: list[str]. List of ECLI identifiers to look up
  - sqlite_db_path: string. Path to the SQLite database file produced by rechtspraak-lido-sqlite
  - columns: list[str]. Column names to select from the database
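The sd and ed parameters expect dates written as yyyy-mm-dd. The following is a minimal sketch of a pre-flight check you could run before calling get_rechtspraak. The helper name check_date_range is hypothetical and not part of rechtspraak_extractor, and the library may perform its own validation; the sketch only mirrors the documented default of using the current date when ed is omitted.

```python
from datetime import date, datetime


def check_date_range(sd, ed=None):
    """Parse yyyy-mm-dd strings and confirm that sd is not after ed.

    Hypothetical helper for illustration; not part of rechtspraak_extractor.
    """
    start = datetime.strptime(sd, "%Y-%m-%d").date()
    # Mirror the documented default: a missing end date means "today".
    end = datetime.strptime(ed, "%Y-%m-%d").date() if ed else date.today()
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end
```

A call such as check_date_range("2022-05-01", "2022-10-01") returns the two parsed dates, while a string in another layout (for example dd-mm-yyyy) or a reversed range raises ValueError before any request is made.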
Examples
Downloading ECLIs
    import rechtspraak_extractor as rex

    # Get rechtspraak data as a DataFrame (100 ECLIs since 2022-08-01)
    df = rex.get_rechtspraak(max_ecli=100, sd="2022-08-01", save_file="n")

    # Save rechtspraak data directly to CSV in the data/ folder
    rex.get_rechtspraak(max_ecli=100, sd="2022-08-01", save_file="y")
Extracting metadata via the live API (default)
    # Get metadata into a DataFrame from an existing DataFrame
    df_metadata = rex.get_rechtspraak_metadata(save_file="n", dataframe=df)

    # Get metadata into a DataFrame from a CSV produced by get_rechtspraak
    df_metadata = rex.get_rechtspraak_metadata(save_file="n", filename="rechtspraak.csv")

    # Produce metadata CSV from an in-memory DataFrame
    rex.get_rechtspraak_metadata(save_file="y", dataframe=df)

    # Produce metadata CSV from files already in data/ (processes all files)
    rex.get_rechtspraak_metadata(save_file="y")
filename refers to a file in the data/ folder created by get_rechtspraak. df is the DataFrame returned by get_rechtspraak.
Extracting metadata via SQLite (offline, faster)
The SQLite method fetches metadata from a local pre-built database instead of making live API calls. This is significantly faster for large batches and works offline.
Prerequisite: The rechtspraak-lido-sqlite package must be installed and its database must be built locally before using this method.
pip install rechtspraak-lido-sqlite
After installing, follow the rechtspraak-lido-sqlite instructions to build the local database (typically produces a file at data/lido.db or a path you configure).
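Before passing a path as sqlite_db_path, it can help to confirm that the file exists and actually contains tables. The sketch below uses only Python's standard sqlite3 module. The helper name list_sqlite_tables is hypothetical, and the table names produced by rechtspraak-lido-sqlite are not documented here, so the function simply reports whatever it finds.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_sqlite_tables(db_path):
    """Return the table names in a SQLite file, or [] if the file is missing.

    Hypothetical helper for illustration; not part of rechtspraak_extractor.
    """
    path = Path(db_path)
    if not path.is_file():
        # Connecting to a missing path would create an empty database file,
        # so return early instead.
        return []
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

An empty result for a path such as data/lido.db suggests the database has not been built yet, or that the path points to the wrong file.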
Using get_rechtspraak_metadata with method='sqlite'
    import rechtspraak_extractor as rex

    df = rex.get_rechtspraak(max_ecli=500, sd="2025-01-01", save_file="n")

    # Fetch metadata from local SQLite database
    df_metadata = rex.get_rechtspraak_metadata(
        save_file="n",
        dataframe=df,
        method="sqlite",
        sqlite_db_path="data/lido.db",  # path to the database built by rechtspraak-lido-sqlite
        fallback_to_api=True,  # fall back to live API for ECLIs not found in the DB
    )
Using fetch_eclis_via_sqlite directly
    from rechtspraak_extractor.rechtspraak_metadata import fetch_eclis_via_sqlite

    eclis = ["ECLI:NL:HR:2023:1", "ECLI:NL:HR:2023:2"]
    columns = ["ecli", "document_type", "date_decision", "instance", "full_text"]

    df = fetch_eclis_via_sqlite(
        ecli_list=eclis,
        sqlite_db_path="data/lido.db",
        columns=columns,
    )
Note: If the database file does not exist or an ECLI is not found in it, fetch_eclis_via_sqlite returns an empty DataFrame rather than raising an error. Use fallback_to_api=True in get_rechtspraak_metadata to automatically cover missing ECLIs via the live API.
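Because a failed lookup does not raise an error, callers of fetch_eclis_via_sqlite may want to check for themselves which ECLIs came back. The following is a minimal sketch under stated assumptions: the helper find_missing_eclis is hypothetical, it is written against plain iterables of strings so that it does not depend on the shape of the returned DataFrame, and the case-insensitive comparison is a design choice of this sketch rather than documented library behaviour.

```python
def find_missing_eclis(requested, found):
    """Return the requested ECLIs that do not appear in found, in request order.

    Hypothetical helper for illustration; not part of rechtspraak_extractor.
    Comparison ignores case and surrounding whitespace.
    """
    found_set = {str(ecli).strip().upper() for ecli in found}
    return [ecli for ecli in requested if ecli.strip().upper() not in found_set]
```

With the example above, and assuming the result has an "ecli" column as requested in columns, the second argument could be df["ecli"] when the DataFrame is non-empty and an empty list otherwise. The resulting list can then be retried through get_rechtspraak_metadata with method='api'.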
License
Previously under the MIT License; as of 28/10/2022 this work is licensed under the Apache License, Version 2.0.
Apache License, Version 2.0
Copyright (c) 2022 Maastricht Law & Tech Lab
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.