Copyright notice: Automated retrieval of decisions from federal and state databases is permitted for non-commercial purposes only. Since gesp accesses these databases, the use of gesp is also permitted for non-commercial purposes only.

gesp: convenient scraping of German court decisions

The federal and state governments in Germany make court decisions available for download on individual online platforms. These platforms are not uniform, and out of the box they only allow decisions to be retrieved one at a time. With gesp, decisions can be downloaded in bulk in a filter-based and reproducible manner.

A. Installation

Download & build the package:

git clone https://github.com/niklaswais/gesp && cd gesp && python -m build

Install the local .tar.gz:

python -m pip install dist/gesp-0.1.tar.gz

B. Basic Usage

A call without command-line arguments retrieves all machine-readable (= non-PDF) court decisions. To download only a subset, use the arguments "-s" (followed by abbreviations of states) and "-c" (followed by abbreviations of court types); see G. for the abbreviations. Multiple states or court types are separated by commas.

python -m gesp -s bund,by,hh,nw -c bgh,ag,lg,olg
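For scripted runs, the same call can be assembled programmatically. The following is a minimal sketch, not part of gesp: the helper name `build_gesp_command` is hypothetical, and it relies only on the "-s" and "-c" arguments described above.

```python
import sys


def build_gesp_command(states=None, courts=None):
    """Return an argv list for a gesp call.

    Hypothetical helper for wrapper scripts; it is not provided by gesp.
    """
    # Use the current interpreter so the call matches "python -m gesp".
    cmd = [sys.executable, "-m", "gesp"]
    if states:
        # Multiple states are joined with commas, as in the call above.
        cmd += ["-s", ",".join(states)]
    if courts:
        # Multiple court types are joined with commas as well.
        cmd += ["-c", ",".join(courts)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; calling the helper without arguments yields the plain `python -m gesp` call.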

Since Saxony and Bremen provide court decisions only as PDF files, they are excluded when gesp is run without flags. An explicit call nevertheless makes the corresponding files available (-s sn,hb).

A specific path under which the decisions are to be stored can be specified with the argument "-p". If the folder has not been created yet, gesp will take care of that. If the folder has already been created and contains the results of a previous execution, this will cause an update of the dataset.

python -m gesp -p path/to/folder

An existing fingerprint (see D.) can be used to reconstruct a dataset. To do so, pass the path to the fingerprint file as an argument using "-fp". Since the fingerprint already determines the dataset, the "-c" and "-s" arguments are not allowed in this case.

python -m gesp -fp /path/to/fingerprint
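A wrapper script can check the rule that a fingerprint excludes the filter arguments before it starts a run. This is a minimal sketch under that assumption only: `check_options` is a hypothetical helper, and gesp's own argument checks may behave differently.

```python
def check_options(states=None, courts=None, fingerprint=None):
    """Reject option combinations that the text above rules out.

    Hypothetical helper for wrapper scripts; not part of gesp.
    """
    if fingerprint and (states or courts):
        # A fingerprint already fixes the dataset, so filters make no sense.
        raise ValueError('"-fp" cannot be combined with "-s" or "-c"')
    return True
```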

C. Results

If no specific path is passed with "-p", gesp will create a folder for the results in the current working directory ("results/"). Each run gets its own subfolder, named after the date and time of execution to avoid conflicts in subsequent runs. Decisions that are available as HTML/XHTML files are preferentially downloaded as such. However, some federal states unfortunately provide decisions only as PDF files. The HTML/XHTML documents are minimally cleaned up (e.g., print dialogs and navigation menus are removed), but not pre-processed, unless "-pp" is used (see E.).
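The idea of a per-run subfolder named after the execution time can be sketched as follows. The exact naming scheme used by gesp is not stated here, so the timestamp format below is an assumption, and `timestamped_results_dir` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def timestamped_results_dir(base="results", now=None):
    """Build a results path whose last component encodes date and time.

    The "%Y-%m-%d_%H-%M-%S" format is an assumption for illustration;
    gesp may name its subfolders differently.
    """
    now = now or datetime.now()
    return Path(base) / now.strftime("%Y-%m-%d_%H-%M-%S")
```

Because the name changes with every second, two runs started at different times do not write into the same subfolder.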

D. Reproducibility

To create a fingerprint file alongside the downloaded court decisions in the results folder ("fp.xz"), set the "-fp" flag. If you want to reconstruct the dataset of a previous run, e.g. because you are working on multiple machines or in a team, simply share the fingerprint file. Using the fingerprint file by means of "-fp" as an argument will result in the assembly of an identical collection.

The fingerprinting feature of gesp can also be used to meet good scientific practice standards without the need to provide large collections of data. Since it is part of good scientific practice to disclose the data basis of the results obtained, publications on the empirical study of court decisions must be accompanied by relatively large data sets. Instead of making the entire collection of decisions available for retrieval online, simply share the fingerprint file that others may use to retrieve your data.
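When a fingerprint file is passed between machines or collaborators, a checksum is a simple way to confirm that everyone holds the same file. The sketch below uses only Python's standard library; `sha256_of_file` is a hypothetical helper and not a gesp feature, and it makes no assumption about the contents of the fingerprint file.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Sender and recipient can each run the helper on their copy of the fingerprint file and compare the two digests.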

E. Pre-Processing

The use of "-pp" activates pre-processing. A separate subfolder is created in the "results" folder for the subsequent outputs.

F. Delayed Retrieval

You can use the argument "-w" to add a delay between two consecutive downloads of decisions from the same source. This reduces the server load for the provider of the decisions and can prevent bans.
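The effect of such a delay can be illustrated with a short loop. This is a generic sketch of delayed retrieval, not gesp's implementation: `fetch_with_delay`, its parameters, and the use of seconds as the unit are all assumptions made for the example.

```python
import time


def fetch_with_delay(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive calls.

    Hypothetical helper; `fetch` is any callable supplied by the caller.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause only between downloads, not before the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

For n downloads from one source the loop pauses n - 1 times, so the total added waiting time grows linearly with the size of the dataset.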

G. Appendix

1. Abbreviations for "-s" (federal/states)

Name                     Abbreviation
Federal                  bund
Baden-Württemberg        bw
Bavaria                  by
Berlin                   be
Brandenburg              bb
Bremen                   hb
Hamburg                  hh
Hesse                    he
Mecklenburg-Vorpommern   mv
Lower Saxony             ni
North Rhine-Westphalia   nw
Rhineland-Palatinate     rp
Saarland                 sl
Saxony                   sn
Saxony-Anhalt            st
Schleswig-Holstein       sh
Thuringia                th

2. Abbreviations for "-c" (court types)

Name                                     Abbreviation
Amtsgerichte                             ag
Arbeitsgerichte                          arbg
Bundesgerichtshof                        bgh
Bundesfinanzhof                          bfh
Bundesverwaltungsgericht                 bverwg
Bundesverfassungsgericht                 bverfg
Bundespatentgericht                      bpatg
Bundesarbeitsgericht                     bag
Bundessozialgericht                      bsg
Finanzgerichte                           fg
Landesarbeitsgerichte                    lag
Landgerichte                             lg
Landessozialgerichte                     lsg
Landesverfassungsgerichte                verfgh
Oberlandesgerichte (incl. KG, BayObLG)   olg
Oberverwaltungsgerichte (incl. VGH)      ovg
Sozialgerichte                           sg
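The two abbreviation tables can be used to catch typos before a long download is started. The sketch below copies the abbreviations from the tables above; `unknown_codes` is a hypothetical helper, not part of gesp, and how gesp itself reacts to unknown abbreviations is not covered here.

```python
# Abbreviations copied from the "-s" table (federal/states).
STATES = {
    "bund", "bw", "by", "be", "bb", "hb", "hh", "he", "mv",
    "ni", "nw", "rp", "sl", "sn", "st", "sh", "th",
}

# Abbreviations copied from the "-c" table (court types).
COURTS = {
    "ag", "arbg", "bgh", "bfh", "bverwg", "bverfg", "bpatg", "bag", "bsg",
    "fg", "lag", "lg", "lsg", "verfgh", "olg", "ovg", "sg",
}


def unknown_codes(value, valid):
    """Return the entries of a comma-separated string that are not in `valid`."""
    codes = [code.strip().lower() for code in value.split(",")]
    return [code for code in codes if code not in valid]
```

An empty list means every abbreviation in the string appears in the corresponding table.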
