Skip to main content

Ingesting legal data like laws and court decisions via OLDP API

Project description

oldp-ingestor

Ingesting legal data like laws and court decisions via OLDP API.

Data sources:

CLI provider Type Source
ris laws + cases Rechtsinformationssystem des Bundes (RIS)
rii cases Rechtsprechung im Internet (RII) — federal courts
by cases Gesetze Bayern — Bavarian courts
nrw cases NRWE Rechtsprechungsdatenbank — NRW courts
ns cases NI-VORIS Niedersachsen
eu cases EUR-Lex — EU court decisions
juris-bb cases Landesrecht Berlin-Brandenburg
juris-bw cases Landesrecht Baden-Württemberg
juris-he cases Landesrecht Hessen
juris-hh cases Landesrecht Hamburg
juris-mv cases Landesrecht Mecklenburg-Vorpommern
juris-rlp cases Landesrecht Rheinland-Pfalz
juris-sa cases Landesrecht Sachsen-Anhalt
juris-sh cases Landesrecht Schleswig-Holstein
juris-sl cases Landesrecht Saarland
juris-th cases Landesrecht Thüringen
dummy laws + cases Django fixture JSON files (for testing)

Installation

pip install oldp-ingestor

Some providers require Playwright browsers. Install them after pip:

playwright install chromium

For development, clone the repo and use Make (auto-detects uv or falls back to pip):

git clone https://github.com/openlegaldata/oldp-ingestor.git
cd oldp-ingestor
make install

Configuration

Set the following environment variables (or add them to a .env file):

Variable Description
OLDP_API_URL Base URL of the OLDP instance (e.g. http://localhost:8000)
OLDP_API_TOKEN API authentication token
OLDP_API_HTTP_AUTH Optional HTTP basic auth in user:password format

Usage

Show API info

oldp-ingestor info

Ingest laws

From the RIS API (rechtsinformationen.bund.de)

# Ingest all available legislation
oldp-ingestor laws --provider ris

# Search for specific legislation
oldp-ingestor laws --provider ris --search-term "EinbTestV"

# Limit the number of law books to ingest
oldp-ingestor laws --provider ris --limit 5

# Combine search and limit
oldp-ingestor laws --provider ris --search-term "BGB" --limit 1

Incremental fetching and request pacing

# Only fetch legislation adopted since a given date
oldp-ingestor laws --provider ris --date-from 2025-12-01

# Fetch legislation within a date range
oldp-ingestor laws --provider ris --date-from 2025-01-01 --date-to 2025-06-30

# Override the default request delay (0.2s) for slower pacing
oldp-ingestor laws --provider ris --request-delay 0.5

For automated cron usage, see dev-deployment/ingest-ris.sh (laws) and dev-deployment/ingest-ris-cases.sh (cases) which track the last successful run date in a state file and pass it as --date-from on subsequent runs.

From a JSON fixture file (dummy provider)

oldp-ingestor laws --provider dummy --path /path/to/fixture.json

Ingest cases

From the RIS API (rechtsinformationen.bund.de)

# Ingest all cases from all federal courts
oldp-ingestor cases --provider ris

# Filter by court and date range
oldp-ingestor cases --provider ris --court BGH --date-from 2026-01-01

# Limit for testing
oldp-ingestor cases --provider ris --limit 10 -v

From a JSON fixture file (dummy provider)

oldp-ingestor cases --provider dummy --path /path/to/fixture.json

# Limit the number of cases to ingest
oldp-ingestor cases --provider dummy --path /path/to/fixture.json --limit 10

The fixture file should contain Django fixture entries with courts.court and cases.case models. Court foreign keys are resolved to court_name strings for the OLDP cases API.

Output sinks

By default, data is written to the OLDP REST API. Use --sink json-file to write JSON files to disk instead:

# Export laws to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    laws --provider ris --search-term BGB --limit 1

# Export cases to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    cases --provider ris --court BGH --limit 5

See docs/sinks.md for details on directory structure and implementing custom sinks.

Architecture

The ingestor uses a provider-based architecture. Each data source implements a provider class (LawProvider or CaseProvider), and shared RIS HTTP logic (retry, pacing, User-Agent) lives in RISBaseClient. Output is routed through a sink (ApiSink or JSONFileSink).

Provider
├── LawProvider   →  DummyLawProvider, RISProvider
└── CaseProvider  →  DummyCaseProvider, RISCaseProvider,
                     RiiCaseProvider, ByCaseProvider,
                     NrwCaseProvider, NsCaseProvider,
                     EuCaseProvider, JurisCaseProvider (10 state variants)

Sink
├── ApiSink        →  OLDP REST API (default)
└── JSONFileSink   →  local JSON files

See docs/architecture.md for the full design.

Politeness and rate limiting

The RIS API allows 600 req/min. The ingestor stays under this with:

  • Request pacing — 0.2 s delay between requests (configurable)
  • Retry with backoff — exponential backoff on 429/503, respects Retry-After
  • Descriptive User-Agentoldp-ingestor/0.1.0

See docs/politeness.md for details.

Further documentation

Provider docs

Provider Doc
RIS (laws + cases) docs/providers/de/ris.md
RII (federal courts) docs/providers/de/rii.md
Bayern docs/providers/de/by.md
NRW docs/providers/de/nrw.md
Niedersachsen docs/providers/de/ns.md
EUR-Lex (EU) docs/providers/de/eu.md
Bremen docs/providers/de/hb.md
Sachsen OVG docs/providers/de/sn_ovg.md
Sachsen ESAMOSplus docs/providers/de/sn.md
Sachsen VerfGH docs/providers/de/sn_verfgh.md
Juris (10 states) docs/providers/de/juris.md
Dummy (test/dev) docs/providers/dummy/dummy.md

Development

# Run tests
make test

# Run tests with coverage
make test-cov

# Lint
make lint

# Auto-format
make format

See CONTRIBUTING.md for the full development setup, how to add new providers, and pull request guidelines.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oldp_ingestor-0.1.3.tar.gz (157.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

oldp_ingestor-0.1.3-py3-none-any.whl (64.7 kB view details)

Uploaded Python 3

File details

Details for the file oldp_ingestor-0.1.3.tar.gz.

File metadata

  • Download URL: oldp_ingestor-0.1.3.tar.gz
  • Upload date:
  • Size: 157.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oldp_ingestor-0.1.3.tar.gz
Algorithm Hash digest
SHA256 2a0734d3696c6caafca0d84e00b430bd253af49e318e73d04bdce8879ded95e9
MD5 961a8621f8c20a0255be9484186897b7
BLAKE2b-256 300a38ce99f432e35afb7c07c68c16349dfebc8020239eb3a72a078db36e6c4d

See more details on using hashes here.

Provenance

The following attestation bundles were made for oldp_ingestor-0.1.3.tar.gz:

Publisher: publish.yml on openlegaldata/oldp-ingestor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oldp_ingestor-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: oldp_ingestor-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 64.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oldp_ingestor-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 14d28d456344296e52f7757c6cf8de10b42c433d1e9fa8ac711e0f4072069551
MD5 42a4bdd751a62007cbb2b7deb7e2d9a0
BLAKE2b-256 ad015e49c5f2117755c537a225f7bf2a73694f707b77370dbfb375f8de845b76

See more details on using hashes here.

Provenance

The following attestation bundles were made for oldp_ingestor-0.1.3-py3-none-any.whl:

Publisher: publish.yml on openlegaldata/oldp-ingestor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page