Library for extracting rechtspraak data

Rechtspraak extractor

This library provides functions to retrieve Rechtspraak data and metadata, either live from the Rechtspraak API or from a local SQLite database.

Version

Python 3.9+

Contributors

pranavnbapat (Pranav Bapat)
running-machin
Cloud956 (Piotr Lewandowski)
shashankmc
gijsvd

How to install?

pip install rechtspraak_extractor

What are the functions?

  • Rechtspraak Extractor
    1. get_rechtspraak
      Gets all the ECLIs and either saves them to a CSV file or returns them in memory.
      For each ECLI it retrieves the ECLI, title, summary, updated date, and link.
    2. get_rechtspraak_metadata
      Gets the metadata of the ECLIs retrieved by get_rechtspraak and either saves it to a new CSV file or returns it in memory.
      The link attribute returned by get_rechtspraak points to the metadata of each ECLI.
      It retrieves instantie, datum uitspraak, datum publicatie, zaaknummer, rechtsgebieden, bijzondere kenmerken, inhoudsindicatie, and vindplaatsen.
      Supports two extraction methods: method='api' (default, fetches live from the Rechtspraak API) and method='sqlite' (fetches from a local pre-built SQLite database; see the SQLite section below).
    3. fetch_eclis_via_sqlite
      Low-level function that looks up a list of ECLIs directly in a local SQLite database and returns a DataFrame. Requires the rechtspraak-lido-sqlite package to be installed and its database populated first (see the SQLite section below).
What are the parameters?

    1. get_rechtspraak(max_ecli=100, sd='2022-05-01', ed='2022-10-01', save_file='y')
      • max_ecli: int, optional, default 100
        Maximum number of ECLIs to retrieve
      • sd: date, optional, default '2022-08-01'
        The start publication date (yyyy-mm-dd)
      • ed: date, optional, default current date
        The end publication date (yyyy-mm-dd)
      • save_file: ['y', 'n'], default 'y'
        y - Save data as a CSV file in the data folder
        n - Return data as a DataFrame in memory
    2. get_rechtspraak_metadata(...)
      • save_file: ['y', 'n'], default 'n'
        y - Save data as a CSV file in the data folder
        n - Return data as a DataFrame in memory
      • dataframe: DataFrame, optional
        DataFrame containing the ECLIs to retrieve metadata for. Cannot be combined with filename
      • filename: string, optional
        CSV file containing the ECLIs to retrieve metadata for. Cannot be combined with dataframe
      • method: ['api', 'sqlite'], default 'api'
        api - Fetch metadata live from the Rechtspraak API
        sqlite - Fetch metadata from a local SQLite database (requires rechtspraak-lido-sqlite)
      • sqlite_db_path: string, default 'data/lido_metadata.db'
        Path to the SQLite database file. Only used when method='sqlite'
      • fallback_to_api: bool, default True
        When using method='sqlite', fall back to the live API for any ECLIs not found in the database
      • multi_threading: bool, default True
        Use multi-threading for API-based metadata extraction. Set to False for single-threaded execution
    3. fetch_eclis_via_sqlite(ecli_list, sqlite_db_path, columns)
      • ecli_list: list[str]
        List of ECLI identifiers to look up
      • sqlite_db_path: string
        Path to the SQLite database file produced by rechtspraak-lido-sqlite
      • columns: list[str]
        Column names to select from the database
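
The rule that dataframe and filename cannot be combined can be checked before calling the library. The helper below is a hypothetical illustration (resolve_metadata_input is not part of rechtspraak_extractor). It only mirrors the parameter descriptions above, and assumes that passing neither argument means "process all files in data/", as the usage examples suggest.

```python
from typing import Any, Optional


def resolve_metadata_input(dataframe: Optional[Any] = None,
                           filename: Optional[str] = None) -> str:
    """Return which input source a metadata call would use.

    Hypothetical helper, not part of rechtspraak_extractor.
    """
    if dataframe is not None and filename is not None:
        # The parameter list states these two arguments cannot be combined.
        raise ValueError("Pass either dataframe or filename, not both")
    if dataframe is not None:
        return "dataframe"
    if filename is not None:
        return "filename"
    # Assumption: with neither argument, all files in data/ are processed.
    return "all_files"
```

Calling such a check first gives a clear error message up front instead of relying on however the library itself reacts to conflicting arguments.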

    Examples

    Downloading ECLIs

    import rechtspraak_extractor as rex
    
    # Get rechtspraak data as a DataFrame (100 ECLIs since 2022-08-01)
    df = rex.get_rechtspraak(max_ecli=100, sd="2022-08-01", save_file="n")
    
    # Save rechtspraak data directly to CSV in the data/ folder
    rex.get_rechtspraak(max_ecli=100, sd="2022-08-01", save_file="y")
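
Because sd and ed are passed as yyyy-mm-dd strings, it can help to validate them before making a request. The following is a minimal sketch using only the standard library; normalize_date_range is a hypothetical helper, not a function of rechtspraak_extractor, and it assumes an omitted end date means today, as in the parameter description.

```python
from datetime import date
from typing import Optional, Tuple


def normalize_date_range(sd: str, ed: Optional[str] = None) -> Tuple[str, str]:
    """Validate date strings and return (start, end) as yyyy-mm-dd.

    Hypothetical helper, not part of rechtspraak_extractor.
    """
    start = date.fromisoformat(sd)  # raises ValueError on malformed input
    # Assumption: a missing end date defaults to the current date.
    end = date.fromisoformat(ed) if ed is not None else date.today()
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start.isoformat(), end.isoformat()
```

The returned strings could then be passed on as the sd and ed arguments of get_rechtspraak.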
    

    Extracting metadata via the live API (default)

    # Get metadata into a DataFrame from an existing DataFrame
    df_metadata = rex.get_rechtspraak_metadata(save_file="n", dataframe=df)
    
    # Get metadata into a DataFrame from a CSV produced by get_rechtspraak
    df_metadata = rex.get_rechtspraak_metadata(save_file="n", filename="rechtspraak.csv")
    
    # Produce metadata CSV from an in-memory DataFrame
    rex.get_rechtspraak_metadata(save_file="y", dataframe=df)
    
    # Produce metadata CSV from files already in data/ (processes all files)
    rex.get_rechtspraak_metadata(save_file="y")
    
    • filename refers to a file in the data/ folder created by get_rechtspraak.
    • df is the DataFrame returned by get_rechtspraak.
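
The multi_threading parameter switches between parallel and single-threaded metadata extraction. How the library implements this internally is not described here, so the sketch below is only a generic illustration of the idea with concurrent.futures. fetch_all and fake_fetch are hypothetical names, and fake_fetch stands in for a real network request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(links: List[str],
              fetch_one: Callable[[str], Dict[str, str]],
              multi_threading: bool = True,
              max_workers: int = 8) -> List[Dict[str, str]]:
    """Apply fetch_one to every link, optionally using a thread pool."""
    if not multi_threading:
        # Single-threaded path: process links one after another.
        return [fetch_one(link) for link in links]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(fetch_one, links))


def fake_fetch(link: str) -> Dict[str, str]:
    # Stand-in for a network request; returns a stub record.
    return {"link": link, "status": "ok"}
```

Threads suit this kind of workload because most of the time is spent waiting on network responses rather than computing.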

    Extracting metadata via SQLite (offline, faster)

    The SQLite method fetches metadata from a local pre-built database instead of making live API calls. This is significantly faster for large batches and works offline.

    Prerequisite: The rechtspraak-lido-sqlite package must be installed and its database must be built locally before using this method.

    pip install rechtspraak-lido-sqlite
    

    After installing, follow the rechtspraak-lido-sqlite instructions to build the local database (typically produces a file at data/lido.db or a path you configure).
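
To illustrate what a local lookup involves, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. The table name (cases), its columns, and the sample rows are assumptions made for this example. They are placeholder values, and the actual schema produced by rechtspraak-lido-sqlite may differ.

```python
import sqlite3
from typing import Dict, List

# Assumed column names for this sketch only.
ALLOWED_COLUMNS = {"ecli", "instance", "date_decision"}


def lookup_eclis(conn: sqlite3.Connection, ecli_list: List[str],
                 columns: List[str]) -> List[Dict[str, str]]:
    """Select the given columns for every ECLI present in the table."""
    if not ecli_list:
        return []
    unknown = [c for c in columns if c not in ALLOWED_COLUMNS]
    if unknown:
        # Column names cannot be bound as parameters, so whitelist them.
        raise ValueError(f"unknown columns: {unknown}")
    placeholders = ", ".join("?" for _ in ecli_list)
    sql = (f"SELECT {', '.join(columns)} FROM cases "
           f"WHERE ecli IN ({placeholders})")
    rows = conn.execute(sql, ecli_list).fetchall()
    return [dict(zip(columns, row)) for row in rows]


# Build a tiny in-memory database with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (ecli TEXT PRIMARY KEY, instance TEXT, date_decision TEXT)"
)
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", [
    ("ECLI:NL:HR:2023:1", "Example Court", "2023-01-01"),
    ("ECLI:NL:HR:2023:2", "Example Court", "2023-01-02"),
])
```

ECLIs that are not in the table are simply absent from the result. For very long lists the query would need batching, since SQLite limits the number of bound parameters per statement.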

    Using get_rechtspraak_metadata with method='sqlite'

    import rechtspraak_extractor as rex
    
    df = rex.get_rechtspraak(max_ecli=500, sd="2025-01-01", save_file="n")
    
    # Fetch metadata from local SQLite database
    df_metadata = rex.get_rechtspraak_metadata(
        save_file="n",
        dataframe=df,
        method="sqlite",
        sqlite_db_path="data/lido.db",   # path to the database built by rechtspraak-lido-sqlite
        fallback_to_api=True,            # fall back to live API for ECLIs not found in the DB
    )
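
The fallback_to_api behaviour can be pictured as "try the local database first, then ask the API for whatever is missing". The sketch below shows that control flow with stand-ins. fetch_with_fallback, local_db, and stub_api are hypothetical names for illustration and do not reflect the library's internal implementation.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


def fetch_with_fallback(eclis: Iterable[str],
                        local_lookup: Callable[[str], Optional[Record]],
                        api_lookup: Callable[[str], Optional[Record]],
                        fallback_to_api: bool = True) -> List[Record]:
    """Look up each ECLI locally; optionally fall back to a second source."""
    results: List[Record] = []
    for ecli in eclis:
        record = local_lookup(ecli)
        if record is None and fallback_to_api:
            # Not in the local store: ask the fallback source instead.
            record = api_lookup(ecli)
        if record is not None:
            results.append(record)
    return results


# Stand-in for the local database: one known ECLI.
local_db = {"ECLI:NL:HR:2023:1": {"ecli": "ECLI:NL:HR:2023:1", "source": "sqlite"}}


def stub_api(ecli: str) -> Record:
    # Stand-in for a live API call.
    return {"ecli": ecli, "source": "api"}
```

With fallback_to_api=False, ECLIs missing from the local store are dropped from the result instead of being fetched elsewhere.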
    

    Using fetch_eclis_via_sqlite directly

    from rechtspraak_extractor.rechtspraak_metadata import fetch_eclis_via_sqlite
    
    eclis = ["ECLI:NL:HR:2023:1", "ECLI:NL:HR:2023:2"]
    
    columns = ["ecli", "document_type", "date_decision", "instance", "full_text"]
    
    df = fetch_eclis_via_sqlite(
        ecli_list=eclis,
        sqlite_db_path="data/lido.db",
        columns=columns,
    )
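
The identifiers in ecli_list follow the ECLI layout of five colon-separated parts: the literal ECLI, a country code, a court code, a year, and an ordinal number. A quick sanity check before a lookup can catch typos. The pattern below is deliberately loose and is a sketch rather than a full implementation of the ECLI specification; parse_ecli is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Loose pattern: ECLI:<country>:<court>:<year>:<ordinal>
ECLI_PATTERN = re.compile(
    r"^ECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z0-9]+)"
    r":(?P<year>\d{4}):(?P<ordinal>[A-Z0-9.]+)$"
)


def parse_ecli(ecli: str) -> Optional[Dict[str, str]]:
    """Split an ECLI into its parts, or return None if it does not match."""
    match = ECLI_PATTERN.match(ecli.strip().upper())
    return match.groupdict() if match else None
```

For example, parse_ecli("ECLI:NL:HR:2023:1") yields country NL, court HR, year 2023, and ordinal 1, while a malformed string yields None.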
    

    Note: If the database file does not exist or an ECLI is not found in it, fetch_eclis_via_sqlite returns an empty DataFrame rather than raising an error. Use fallback_to_api=True in get_rechtspraak_metadata to automatically cover missing ECLIs via the live API.
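
Since a missing database or missing ECLIs do not raise an error, it can be worth checking for both explicitly. The helpers below are a minimal sketch with hypothetical names (database_exists, missing_eclis) and are not part of the library.

```python
import os
from typing import Iterable, List


def database_exists(sqlite_db_path: str) -> bool:
    """Return True if the database file is present on disk."""
    return os.path.isfile(sqlite_db_path)


def missing_eclis(requested: Iterable[str], found: Iterable[str]) -> List[str]:
    """Return the requested ECLIs that are absent from the found ones."""
    found_set = set(found)
    # Preserve the order of the original request.
    return [ecli for ecli in requested if ecli not in found_set]
```

Here found would be the values of the ecli column in the returned DataFrame, assuming that column was requested and the result is not empty.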

    License

    License: Apache 2.0

    This work was previously released under the MIT License; as of 28/10/2022 it is licensed under the Apache License, Version 2.0.

    Apache License, Version 2.0
    
    Copyright (c) 2022 Maastricht Law & Tech Lab
    
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
        
        http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    
