
oldp-ingestor

Ingesting legal data like laws and court decisions via OLDP API.

Data sources:

CLI provider  Type          Source
ris           laws + cases  Rechtsinformationssystem des Bundes (RIS)
gii           laws          Gesetze im Internet (gesetze-im-internet.de)
rii           cases         Rechtsprechung im Internet (RII) — federal courts
by            cases         Gesetze Bayern — Bavarian courts
nrw           cases         NRWE Rechtsprechungsdatenbank — NRW courts
ns            cases         NI-VORIS Niedersachsen
eu            cases         EUR-Lex — EU court decisions
juris-bb      cases         Landesrecht Berlin-Brandenburg
juris-bw      cases         Landesrecht Baden-Württemberg
juris-he      cases         Landesrecht Hessen
juris-hh      cases         Landesrecht Hamburg
juris-mv      cases         Landesrecht Mecklenburg-Vorpommern
juris-rlp     cases         Landesrecht Rheinland-Pfalz
juris-sa      cases         Landesrecht Sachsen-Anhalt
juris-sh      cases         Landesrecht Schleswig-Holstein
juris-sl      cases         Landesrecht Saarland
juris-th      cases         Landesrecht Thüringen
dummy         laws + cases  Django fixture JSON files (for testing)

Installation

pip install oldp-ingestor

Some providers require Playwright browsers. Install them after running pip install:

playwright install chromium

For development, clone the repo and use Make (auto-detects uv or falls back to pip):

git clone https://github.com/openlegaldata/oldp-ingestor.git
cd oldp-ingestor
make install

Configuration

Set the following environment variables (or add them to a .env file):

Variable            Description
OLDP_API_URL        Base URL of the OLDP instance (e.g. http://localhost:8000)
OLDP_API_TOKEN      API authentication token
OLDP_API_HTTP_AUTH  Optional HTTP basic auth in user:password format
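
The variables above could be read and validated along the following lines. This is a minimal sketch, not the ingestor's own loader: the IngestorConfig class and load_config function are hypothetical names, and treating OLDP_API_URL and OLDP_API_TOKEN as required is an assumption drawn from the table (only the HTTP auth entry is marked optional).

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class IngestorConfig:
    """Hypothetical container for the three documented variables."""

    api_url: str
    api_token: str
    http_auth: Optional[Tuple[str, str]] = None  # (user, password)


def load_config(env: Mapping[str, str] = os.environ) -> IngestorConfig:
    """Build a config from an environment-like mapping."""
    # Assumption: URL and token are mandatory, HTTP basic auth is optional.
    missing = [k for k in ("OLDP_API_URL", "OLDP_API_TOKEN") if not env.get(k)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    http_auth = None
    raw = env.get("OLDP_API_HTTP_AUTH", "")
    if raw:
        # Split on the first colon only, so passwords may contain colons.
        user, sep, password = raw.partition(":")
        if not sep:
            raise ValueError("OLDP_API_HTTP_AUTH must be in user:password format")
        http_auth = (user, password)

    # Drop a trailing slash so paths can be appended consistently.
    return IngestorConfig(env["OLDP_API_URL"].rstrip("/"), env["OLDP_API_TOKEN"], http_auth)
```

Passing a plain dict instead of os.environ keeps the function easy to exercise in tests without touching the real environment.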

Usage

Show API info

oldp-ingestor info

Ingest laws

From the RIS API (rechtsinformationen.bund.de)

# Ingest all available legislation
oldp-ingestor laws --provider ris

# Search for specific legislation
oldp-ingestor laws --provider ris --search-term "EinbTestV"

# Limit the number of law books to ingest
oldp-ingestor laws --provider ris --limit 5

# Combine search and limit
oldp-ingestor laws --provider ris --search-term "BGB" --limit 1

Incremental fetching and request pacing

# Only fetch legislation adopted since a given date
oldp-ingestor laws --provider ris --date-from 2025-12-01

# Fetch legislation within a date range
oldp-ingestor laws --provider ris --date-from 2025-01-01 --date-to 2025-06-30

# Override the default request delay (0.2s) for slower pacing
oldp-ingestor laws --provider ris --request-delay 0.5
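
For a backfill, the date flags above can be driven from a small script that splits a long period into consecutive windows. The sketch below only builds the argument lists; date_windows and ris_laws_command are hypothetical helpers, and it assumes that --date-to is inclusive and that --request-delay can be combined with the date flags, neither of which is confirmed here.

```python
from datetime import date, timedelta
from typing import Iterator, List, Optional, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive, non-overlapping (from, to) windows covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def ris_laws_command(
    date_from: date, date_to: date, request_delay: Optional[float] = None
) -> List[str]:
    """Build the argv for one incremental RIS law run (flags as documented above)."""
    argv = [
        "oldp-ingestor", "laws", "--provider", "ris",
        "--date-from", date_from.isoformat(),
        "--date-to", date_to.isoformat(),
    ]
    if request_delay is not None:
        argv += ["--request-delay", str(request_delay)]
    return argv


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(argv, check=True) to execute.
    for window_start, window_end in date_windows(date(2025, 1, 1), date(2025, 6, 30), 30):
        print(" ".join(ris_laws_command(window_start, window_end, request_delay=0.5)))
```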

From gesetze-im-internet.de (gii)

Pulls all German federal laws from the official BMJ/juris feed. See docs/providers/de/gii.md for usage, incremental-run setup, and continuous-ingestion examples.

From a JSON fixture file (dummy provider)

oldp-ingestor laws --provider dummy --path /path/to/fixture.json

Ingest cases

From the RIS API (rechtsinformationen.bund.de)

# Ingest all cases from all federal courts
oldp-ingestor cases --provider ris

# Filter by court and date range
oldp-ingestor cases --provider ris --court BGH --date-from 2026-01-01

# Limit for testing
oldp-ingestor cases --provider ris --limit 10 -v

From a JSON fixture file (dummy provider)

oldp-ingestor cases --provider dummy --path /path/to/fixture.json

# Limit the number of cases to ingest
oldp-ingestor cases --provider dummy --path /path/to/fixture.json --limit 10

The fixture file should contain Django fixture entries with courts.court and cases.case models. Court foreign keys are resolved to court_name strings for the OLDP cases API.
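
The foreign-key resolution described above can be illustrated roughly as follows. The model, pk, and fields keys are the standard Django fixture layout; the field names court (on the case) and name (on the court) are assumptions made for this sketch, and resolve_court_names is a hypothetical helper rather than the ingestor's own function.

```python
from typing import Any, Dict, List


def resolve_court_names(fixture: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return case payloads with the court foreign key replaced by court_name."""
    # Index courts by primary key. Assumption: the court's display name is in "name".
    courts = {
        entry["pk"]: entry["fields"]["name"]
        for entry in fixture
        if entry["model"] == "courts.court"
    }

    cases = []
    for entry in fixture:
        if entry["model"] != "cases.case":
            continue
        fields = dict(entry["fields"])  # copy so the input stays untouched
        court_pk = fields.pop("court")  # assumption: the FK field is called "court"
        fields["court_name"] = courts[court_pk]
        cases.append(fields)
    return cases


if __name__ == "__main__":
    sample = [
        {"model": "courts.court", "pk": 1, "fields": {"name": "Bundesgerichtshof"}},
        {"model": "cases.case", "pk": 10, "fields": {"court": 1, "file_number": "I ZR 1/24"}},
    ]
    print(resolve_court_names(sample))
```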

Output sinks

By default, data is written to the OLDP REST API. Use --sink json-file to write JSON files to disk instead:

# Export laws to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    laws --provider ris --search-term BGB --limit 1

# Export cases to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    cases --provider ris --court BGH --limit 5

See docs/sinks.md for details on directory structure and implementing custom sinks.

Architecture

The ingestor uses a provider-based architecture. Each data source implements a provider class (LawProvider or CaseProvider), and shared RIS HTTP logic (retry, pacing, User-Agent) lives in RISBaseClient. Output is routed through a sink (ApiSink or JSONFileSink).

Provider
├── LawProvider   →  DummyLawProvider, RISProvider
└── CaseProvider  →  DummyCaseProvider, RISCaseProvider,
                     RiiCaseProvider, ByCaseProvider,
                     NrwCaseProvider, NsCaseProvider,
                     EuCaseProvider, JurisCaseProvider (10 state variants)

Sink
├── ApiSink        →  OLDP REST API (default)
└── JSONFileSink   →  local JSON files

See docs/architecture.md for the full design.
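
To make the provider/sink split concrete, here is a minimal sketch of the pattern. It does not reproduce the project's classes: CaseSource, CaseSink, ListCaseSource, JsonDirSink, and ingest_cases are hypothetical stand-ins for the roles played by CaseProvider and the sinks, and the one-file-per-case layout is an assumption (docs/sinks.md describes the real directory structure).

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, Optional

Record = Dict[str, Any]


class CaseSource(ABC):
    """Role of a case provider: yields case records from one data source."""

    @abstractmethod
    def get_cases(self) -> Iterator[Record]: ...


class CaseSink(ABC):
    """Role of a sink: receives records, regardless of where they came from."""

    @abstractmethod
    def write_case(self, case: Record) -> None: ...


class ListCaseSource(CaseSource):
    """In-memory source, comparable in spirit to a fixture-backed dummy provider."""

    def __init__(self, cases: Iterable[Record]) -> None:
        self._cases = list(cases)

    def get_cases(self) -> Iterator[Record]:
        yield from self._cases


class JsonDirSink(CaseSink):
    """Writes each case to its own JSON file (hypothetical naming scheme)."""

    def __init__(self, output_dir: Path) -> None:
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.count = 0

    def write_case(self, case: Record) -> None:
        path = self.output_dir / f"case_{self.count:05d}.json"
        path.write_text(json.dumps(case, ensure_ascii=False), encoding="utf-8")
        self.count += 1


def ingest_cases(source: CaseSource, sink: CaseSink, limit: Optional[int] = None) -> int:
    """Route records from a source to a sink, stopping after `limit` records."""
    written = 0
    for case in source.get_cases():
        if limit is not None and written >= limit:
            break
        sink.write_case(case)
        written += 1
    return written
```

Because the loop depends only on the two abstract roles, swapping the data source or the output target does not touch the ingestion logic, which is the point of the design described above.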

Politeness and rate limiting

The RIS API allows 600 requests per minute. The ingestor stays under this limit with:

  • Request pacing — 0.2 s delay between requests (configurable)
  • Retry with backoff — exponential backoff on 429/503, respects Retry-After
  • Descriptive User-Agent: oldp-ingestor/0.1.0

See docs/politeness.md for details.
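
As a rough check on the numbers above: with sequential requests spaced 0.2 s apart, the ingestor sends at most 60 / 0.2 = 300 requests per minute, half the stated limit. The sketch below shows that arithmetic together with one common way to choose a retry wait. It is not the logic in RISBaseClient: the function names, the base of 1 s, and the 60 s cap are assumptions, and only the delta-seconds form of Retry-After is handled.

```python
from typing import Optional

# Status codes the text above names as retryable.
RETRYABLE_STATUSES = {429, 503}


def max_requests_per_minute(request_delay: float) -> float:
    """Upper bound on sequential request rate for a fixed delay between requests."""
    if request_delay <= 0:
        raise ValueError("request_delay must be positive")
    return 60.0 / request_delay


def retry_wait(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After header takes precedence; otherwise the wait doubles
    with every attempt, bounded by `cap`. `base` and `cap` are assumed values.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back to backoff
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for retry_wait(attempt, response_headers.get("Retry-After")) whenever the status code is in RETRYABLE_STATUSES, and give up after a fixed number of attempts.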

Further documentation

Provider docs

Detailed documentation about each provider can be found in docs/providers/.

Provider             Doc
RIS (laws + cases)   docs/providers/de/ris.md
GII (laws)           docs/providers/de/gii.md
RII (federal courts) docs/providers/de/rii.md
Bayern               docs/providers/de/by.md
NRW                  docs/providers/de/nrw.md
Niedersachsen        docs/providers/de/ns.md
EUR-Lex (EU)         docs/providers/de/eu.md
Bremen               docs/providers/de/hb.md
Sachsen OVG          docs/providers/de/sn_ovg.md
Sachsen ESAMOSplus   docs/providers/de/sn.md
Sachsen VerfGH       docs/providers/de/sn_verfgh.md
Juris (10 states)    docs/providers/de/juris.md
Dummy (test/dev)     docs/providers/dummy/dummy.md

Development

# Run tests
make test

# Run tests with coverage
make test-cov

# Lint
make lint

# Auto-format
make format

See CONTRIBUTING.md for the full development setup, how to add new providers, and pull request guidelines.
