oldp-ingestor

Ingesting legal data like laws and court decisions via OLDP API.

Data sources:

CLI provider  Type          Source
ris           laws + cases  Rechtsinformationssystem des Bundes (RIS)
gii           laws          Gesetze im Internet (gesetze-im-internet.de)
rii           cases         Rechtsprechung im Internet (RII), federal courts
by            cases         Gesetze Bayern, Bavarian courts
nrw           cases         NRWE Rechtsprechungsdatenbank, NRW courts
ns            cases         NI-VORIS Niedersachsen
eu            cases         EUR-Lex, EU court decisions
juris-bb      cases         Landesrecht Berlin-Brandenburg
juris-bw      cases         Landesrecht Baden-Württemberg
juris-he      cases         Landesrecht Hessen
juris-hh      cases         Landesrecht Hamburg
juris-mv      cases         Landesrecht Mecklenburg-Vorpommern
juris-rlp     cases         Landesrecht Rheinland-Pfalz
juris-sa      cases         Landesrecht Sachsen-Anhalt
juris-sh      cases         Landesrecht Schleswig-Holstein
juris-sl      cases         Landesrecht Saarland
juris-th      cases         Landesrecht Thüringen
dummy         laws + cases  Django fixture JSON files (for testing)

Installation

pip install oldp-ingestor

Some providers require a Playwright browser. Install Chromium after the pip install has finished:

playwright install chromium

For development, clone the repo and use Make (auto-detects uv or falls back to pip):

git clone https://github.com/openlegaldata/oldp-ingestor.git
cd oldp-ingestor
make install

Configuration

Set the following environment variables (or add them to a .env file):

Variable            Description
OLDP_API_URL        Base URL of the OLDP instance (e.g. http://localhost:8000)
OLDP_API_TOKEN      API authentication token
OLDP_API_HTTP_AUTH  Optional HTTP basic auth in user:password format
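
When scripting around the ingestor, it can help to validate these variables before starting a long run. The following is a minimal sketch, not the ingestor's own configuration code: `OldpConfig` and `load_config` are hypothetical names, and the sketch reads only the process environment (it does not parse a .env file).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OldpConfig:
    """Hypothetical container for the three documented variables."""
    api_url: str
    api_token: str
    http_auth: Optional[str] = None  # "user:password" or None


def load_config(env: Mapping[str, str] = os.environ) -> OldpConfig:
    """Read the documented variables and fail early if a required one is missing."""
    missing = [name for name in ("OLDP_API_URL", "OLDP_API_TOKEN") if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    http_auth = env.get("OLDP_API_HTTP_AUTH") or None
    if http_auth is not None and ":" not in http_auth:
        raise ValueError("OLDP_API_HTTP_AUTH must look like user:password")
    # Strip a trailing slash so later URL joins do not produce "//".
    return OldpConfig(env["OLDP_API_URL"].rstrip("/"), env["OLDP_API_TOKEN"], http_auth)
```

Passing a plain dict instead of `os.environ` keeps the check easy to test.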

Usage

Show API info

oldp-ingestor info

Ingest laws

From the RIS API (rechtsinformationen.bund.de)

# Ingest all available legislation
oldp-ingestor laws --provider ris

# Search for specific legislation
oldp-ingestor laws --provider ris --search-term "EinbTestV"

# Limit the number of law books to ingest
oldp-ingestor laws --provider ris --limit 5

# Combine search and limit
oldp-ingestor laws --provider ris --search-term "BGB" --limit 1

Incremental fetching and request pacing

# Only fetch legislation adopted since a given date
oldp-ingestor laws --provider ris --date-from 2025-12-01

# Fetch legislation within a date range
oldp-ingestor laws --provider ris --date-from 2025-01-01 --date-to 2025-06-30

# Override the default request delay (0.2s) for slower pacing
oldp-ingestor laws --provider ris --request-delay 0.5
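
For scheduled incremental runs, the command line can be assembled from dates in a small wrapper. This is a sketch under assumptions: `ris_laws_command` and `next_window_start` are hypothetical helpers, only the flags shown above are used, and the one-day overlap is an illustrative choice rather than documented behavior.

```python
from datetime import date, timedelta
from typing import List, Optional


def ris_laws_command(date_from: date, date_to: Optional[date] = None,
                     request_delay: Optional[float] = None) -> List[str]:
    """Build the argv for one incremental RIS laws run from the flags shown above."""
    argv = ["oldp-ingestor", "laws", "--provider", "ris",
            "--date-from", date_from.isoformat()]
    if date_to is not None:
        if date_to < date_from:
            raise ValueError("date_to must not be earlier than date_from")
        argv += ["--date-to", date_to.isoformat()]
    if request_delay is not None:
        argv += ["--request-delay", str(request_delay)]
    return argv


def next_window_start(last_run: date, overlap_days: int = 1) -> date:
    """Start the next window slightly before the last run (assumed safety margin)."""
    return last_run - timedelta(days=overlap_days)
```

The returned list can be passed to `subprocess.run(argv, check=True)`. An overlapping window may fetch some items twice, so it only makes sense if the target handles duplicates gracefully.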

From gesetze-im-internet.de (gii)

Pulls all German federal laws from the official BMJ/juris feed. See docs/providers/de/gii.md for usage, incremental-run setup, and continuous-ingestion examples.

From a JSON fixture file (dummy provider)

oldp-ingestor laws --provider dummy --path /path/to/fixture.json

Ingest cases

From the RIS API (rechtsinformationen.bund.de)

# Ingest all cases from all federal courts
oldp-ingestor cases --provider ris

# Filter by court and date range
oldp-ingestor cases --provider ris --court BGH --date-from 2026-01-01

# Limit for testing
oldp-ingestor cases --provider ris --limit 10 -v

From a JSON fixture file (dummy provider)

oldp-ingestor cases --provider dummy --path /path/to/fixture.json

# Limit the number of cases to ingest
oldp-ingestor cases --provider dummy --path /path/to/fixture.json --limit 10

The fixture file should contain Django fixture entries with courts.court and cases.case models. Court foreign keys are resolved to court_name strings for the OLDP cases API.
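
The foreign-key resolution described above can be sketched as follows. The outer `model` / `pk` / `fields` layout is the standard Django fixture format, but the field names inside `fields` (`name`, `court`, `file_number`) and the function `resolve_court_names` are assumptions made for illustration, not the provider's actual schema or code.

```python
import json
from typing import Any, Dict, List

# Illustrative fixture; the field names inside "fields" are assumptions.
FIXTURE = json.loads("""
[
  {"model": "courts.court", "pk": 1, "fields": {"name": "Example Court"}},
  {"model": "cases.case", "pk": 10,
   "fields": {"court": 1, "file_number": "5 T 16/26"}}
]
""")


def resolve_court_names(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return case payloads with the court foreign key replaced by court_name."""
    courts = {e["pk"]: e["fields"]["name"]
              for e in entries if e["model"] == "courts.court"}
    cases = []
    for entry in entries:
        if entry["model"] != "cases.case":
            continue
        fields = dict(entry["fields"])           # copy so the fixture stays untouched
        court_pk = fields.pop("court")
        fields["court_name"] = courts[court_pk]  # KeyError if the court is missing
        cases.append(fields)
    return cases
```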

Targeted citation-based lookup

When OLDP can't resolve an extracted citation, an AI agent can use the lookup subcommand group to fetch the specific decision from the right upstream portal. Each call is a single upstream request returning JSON with a three-status contract (ok / not_found / error).

# Which lookup-capable providers exist, and which courts do they cover?
oldp-ingestor lookup providers

# Search RIS by ECLI
oldp-ingestor lookup search --provider ris \
    --ecli "ECLI:DE:BGH:2026:020626BVIAZR482.23.0"

# Search a juris state portal by Aktenzeichen
oldp-ingestor lookup search --provider juris-rlp --file-number "5 T 16/26"

# Fetch the full case (no OLDP write)
oldp-ingestor lookup fetch --provider ris --doc-id KORE615402026

# Fetch + POST to OLDP (idempotent: 409 → already_exists)
oldp-ingestor lookup ingest --provider ris --doc-id KORE615402026

See docs/lookup.md for the full agent loop, JSON schema, and per-provider capability matrix.
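
A caller can branch on the three-status contract roughly as shown below. This is a sketch only: the top-level field name `status` and the action labels are assumptions, and the authoritative JSON schema is the one in docs/lookup.md.

```python
import json
from typing import Any, Dict


def classify_lookup_result(raw_json: str) -> str:
    """Map one lookup response onto a next action for an agent loop."""
    payload: Dict[str, Any] = json.loads(raw_json)
    status = payload.get("status")  # assumed field name
    if status == "ok":
        return "use_result"
    if status == "not_found":
        return "try_next_provider"
    if status == "error":
        return "retry_or_report"
    # Anything outside the three documented statuses is treated as a bug.
    raise ValueError(f"unexpected status: {status!r}")
```

Separating `not_found` from `error` lets the caller distinguish "this portal does not have the decision" from "the request itself failed", which call for different follow-up actions.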

Output sinks

By default, data is written to the OLDP REST API. Use --sink json-file to write JSON files to disk instead:

# Export laws to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    laws --provider ris --search-term BGB --limit 1

# Export cases to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    cases --provider ris --court BGH --limit 5

See docs/sinks.md for details on directory structure and implementing custom sinks.
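
To illustrate what a file-based sink does in principle, here is a minimal sketch that writes one record per JSON file. The `<kind>/<record_id>.json` layout and the function `write_record` are hypothetical; the directory structure actually produced by `--sink json-file` is described in docs/sinks.md.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_record(output_dir: Path, kind: str, record_id: str,
                 record: Dict[str, Any]) -> Path:
    """Write one record to <output_dir>/<kind>/<record_id>.json and return the path."""
    target_dir = output_dir / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{record_id}.json"
    # ensure_ascii=False keeps umlauts readable in the exported files.
    target.write_text(json.dumps(record, ensure_ascii=False, indent=2),
                      encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_record(Path(tmp), "cases", "example", {"title": "demo"}))
```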

Architecture

The ingestor uses a provider-based architecture. Each data source implements a provider class (LawProvider or CaseProvider), and shared RIS HTTP logic (retry, pacing, User-Agent) lives in RISBaseClient. Output is routed through a sink (ApiSink or JSONFileSink).

Provider
├── LawProvider   →  DummyLawProvider, RISProvider
└── CaseProvider  →  DummyCaseProvider, RISCaseProvider,
                     RiiCaseProvider, ByCaseProvider,
                     NrwCaseProvider, NsCaseProvider,
                     EuCaseProvider, JurisCaseProvider (10 state variants)

Sink
├── ApiSink        →  OLDP REST API (default)
└── JSONFileSink   →  local JSON files

See docs/architecture.md for the full design.
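
The provider/sink split can be sketched in a few lines. All class and method names below (`SketchProvider`, `SketchSink`, `fetch`, `write`, `ingest`) are hypothetical and chosen for illustration; the real interfaces of LawProvider, CaseProvider, ApiSink, and JSONFileSink are documented in docs/architecture.md.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List, Optional


class SketchProvider(ABC):
    """Yields records from one upstream source."""

    @abstractmethod
    def fetch(self) -> Iterator[Dict[str, Any]]: ...


class SketchSink(ABC):
    """Receives records, regardless of where they came from."""

    @abstractmethod
    def write(self, record: Dict[str, Any]) -> None: ...


class ListProvider(SketchProvider):
    """Test provider backed by an in-memory list."""

    def __init__(self, records: Iterable[Dict[str, Any]]) -> None:
        self._records = list(records)

    def fetch(self) -> Iterator[Dict[str, Any]]:
        yield from self._records


class MemorySink(SketchSink):
    """Test sink that collects records in a list."""

    def __init__(self) -> None:
        self.written: List[Dict[str, Any]] = []

    def write(self, record: Dict[str, Any]) -> None:
        self.written.append(record)


def ingest(provider: SketchProvider, sink: SketchSink,
           limit: Optional[int] = None) -> int:
    """Stream records from provider to sink; stop after `limit` records if given."""
    count = 0
    for record in provider.fetch():
        if limit is not None and count >= limit:
            break
        sink.write(record)
        count += 1
    return count
```

Because the loop only depends on the two abstract interfaces, any provider can be combined with any sink, which is the property the diagram above expresses.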

Politeness and rate limiting

The RIS API allows 600 req/min. The ingestor stays under this with:

  • Request pacing: 0.2 s delay between requests (configurable)
  • Retry with backoff: exponential backoff on 429/503, respects Retry-After
  • Descriptive User-Agent: oldp-ingestor/0.1.0

See docs/politeness.md for details.
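
As a worked check of the numbers: 600 req/min is 10 req/s, while a 0.2 s pause between sequential requests allows at most 5 req/s (300 req/min), even before response time is counted. The pacing and retry behavior can be sketched as below. This is not the code in RISBaseClient: the names `backoff_delay` and `paced_request`, the 1 s base, the 60 s cap, and the retry count are assumptions, and only the delta-seconds form of Retry-After is handled (an HTTP-date value falls back to exponential backoff).

```python
import time
from typing import Callable, Optional, Tuple

# (status code, Retry-After header value or None)
Response = Tuple[int, Optional[str]]


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)               # the server's hint wins
    return min(base * (2 ** attempt), cap)      # 1, 2, 4, ... capped at `cap`


def paced_request(send: Callable[[], Response], request_delay: float = 0.2,
                  max_retries: int = 5,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send`, retrying on 429/503, and pause `request_delay` after success."""
    for attempt in range(max_retries + 1):
        status, retry_after = send()
        if status not in (429, 503):
            sleep(request_delay)                # pacing between successive requests
            return status
        if attempt < max_retries:
            sleep(backoff_delay(attempt, retry_after))
    return status                               # still 429/503 after all retries
```

Injecting `send` and `sleep` keeps the sketch independent of any HTTP library and lets the timing logic be tested without waiting.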

Further documentation

Provider docs

Detailed documentation about each provider can be found in docs/providers/.

Provider             Doc
RIS (laws + cases)   docs/providers/de/ris.md
GII (laws)           docs/providers/de/gii.md
RII (federal courts) docs/providers/de/rii.md
Bayern               docs/providers/de/by.md
NRW                  docs/providers/de/nrw.md
Niedersachsen        docs/providers/de/ns.md
EUR-Lex (EU)         docs/providers/de/eu.md
Bremen               docs/providers/de/hb.md
Sachsen OVG          docs/providers/de/sn_ovg.md
Sachsen ESAMOSplus   docs/providers/de/sn.md
Sachsen VerfGH       docs/providers/de/sn_verfgh.md
Juris (10 states)    docs/providers/de/juris.md
Dummy (test/dev)     docs/providers/dummy/dummy.md

Development

# Run tests
make test

# Run tests with coverage
make test-cov

# Lint
make lint

# Auto-format
make format

See CONTRIBUTING.md for the full development setup, how to add new providers, and pull request guidelines.
