oldp-ingestor

Ingesting legal data like laws and court decisions via OLDP API.

Data sources:

CLI provider  Type          Source
ris           laws + cases  Rechtsinformationssystem des Bundes (RIS)
gii           laws          Gesetze im Internet (gesetze-im-internet.de)
rii           cases         Rechtsprechung im Internet (RII), federal courts
by            cases         Gesetze Bayern, Bavarian courts
nrw           cases         NRWE Rechtsprechungsdatenbank, NRW courts
ns            cases         NI-VORIS Niedersachsen
eu            cases         EUR-Lex, EU court decisions
juris-bb      cases         Landesrecht Berlin-Brandenburg
juris-bw      cases         Landesrecht Baden-Württemberg
juris-he      cases         Landesrecht Hessen
juris-hh      cases         Landesrecht Hamburg
juris-mv      cases         Landesrecht Mecklenburg-Vorpommern
juris-rlp     cases         Landesrecht Rheinland-Pfalz
juris-sa      cases         Landesrecht Sachsen-Anhalt
juris-sh      cases         Landesrecht Schleswig-Holstein
juris-sl      cases         Landesrecht Saarland
juris-th      cases         Landesrecht Thüringen
dummy         laws + cases  Django fixture JSON files (for testing)

Installation

pip install oldp-ingestor

Some providers require Playwright browsers. Install them after installing the package:

playwright install chromium

For development, clone the repo and use Make (auto-detects uv or falls back to pip):

git clone https://github.com/openlegaldata/oldp-ingestor.git
cd oldp-ingestor
make install

Configuration

Set the following environment variables (or add them to a .env file):

Variable            Description
OLDP_API_URL        Base URL of the OLDP instance (e.g. http://localhost:8000)
OLDP_API_TOKEN      API authentication token
OLDP_API_HTTP_AUTH  Optional HTTP basic auth in user:password format
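
As an illustration of how these three variables fit together, the following is a minimal sketch of a loader. The names `OldpConfig` and `load_config` are hypothetical and not part of oldp-ingestor; only the variable names and the user:password format come from the table above. Treating the URL and the token as required is also an assumption of the sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class OldpConfig:
    """Hypothetical container for the documented settings."""

    api_url: str
    api_token: str
    http_auth: Optional[Tuple[str, str]] = None


def load_config(env: Mapping[str, str] = os.environ) -> OldpConfig:
    """Read the three documented variables from a mapping (default: os.environ)."""
    api_url = env.get("OLDP_API_URL", "").rstrip("/")
    api_token = env.get("OLDP_API_TOKEN", "")
    if not api_url or not api_token:
        # Assumption: both values are mandatory for talking to the API.
        raise ValueError("OLDP_API_URL and OLDP_API_TOKEN must be set")

    http_auth = None
    raw_auth = env.get("OLDP_API_HTTP_AUTH", "")
    if raw_auth:
        # Split on the first colon only, so the password may contain ':'.
        user, sep, password = raw_auth.partition(":")
        if not sep:
            raise ValueError("OLDP_API_HTTP_AUTH must use the user:password format")
        http_auth = (user, password)

    return OldpConfig(api_url=api_url, api_token=api_token, http_auth=http_auth)
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.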

Usage

Show API info

oldp-ingestor info

Ingest laws

From the RIS API (rechtsinformationen.bund.de)

# Ingest all available legislation
oldp-ingestor laws --provider ris

# Search for specific legislation
oldp-ingestor laws --provider ris --search-term "EinbTestV"

# Limit the number of law books to ingest
oldp-ingestor laws --provider ris --limit 5

# Combine search and limit
oldp-ingestor laws --provider ris --search-term "BGB" --limit 1

Incremental fetching and request pacing

# Only fetch legislation adopted since a given date
oldp-ingestor laws --provider ris --date-from 2025-12-01

# Fetch legislation within a date range
oldp-ingestor laws --provider ris --date-from 2025-01-01 --date-to 2025-06-30

# Override the default request delay (0.2s) for slower pacing
oldp-ingestor laws --provider ris --request-delay 0.5
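
For scheduled runs, the date flags above can be assembled programmatically. The sketch below is one possible way to do that: `build_laws_command` is a hypothetical helper, and it uses only the flags shown in the examples above (`--provider`, `--date-from`, `--date-to`, `--request-delay`).

```python
from datetime import date
from typing import List, Optional


def build_laws_command(
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    request_delay: Optional[float] = None,
) -> List[str]:
    """Return an argv list for an incremental RIS laws run."""
    cmd = ["oldp-ingestor", "laws", "--provider", "ris"]
    if date_from is not None:
        # isoformat() yields YYYY-MM-DD, the format used in the examples.
        cmd += ["--date-from", date_from.isoformat()]
    if date_to is not None:
        cmd += ["--date-to", date_to.isoformat()]
    if request_delay is not None:
        cmd += ["--request-delay", str(request_delay)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. How the date of the previous run is stored is left open here.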

From gesetze-im-internet.de (gii)

Pulls all German federal laws from the official BMJ/juris feed. See docs/providers/de/gii.md for usage, incremental-run setup, and continuous-ingestion examples.

From a JSON fixture file (dummy provider)

oldp-ingestor laws --provider dummy --path /path/to/fixture.json

Ingest cases

From the RIS API (rechtsinformationen.bund.de)

# Ingest all cases from all federal courts
oldp-ingestor cases --provider ris

# Filter by court and date range
oldp-ingestor cases --provider ris --court BGH --date-from 2026-01-01

# Limit for testing
oldp-ingestor cases --provider ris --limit 10 -v

From a JSON fixture file (dummy provider)

oldp-ingestor cases --provider dummy --path /path/to/fixture.json

# Limit the number of cases to ingest
oldp-ingestor cases --provider dummy --path /path/to/fixture.json --limit 10

The fixture file should contain Django fixture entries with courts.court and cases.case models. Court foreign keys are resolved to court_name strings for the OLDP cases API.
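
The foreign-key resolution described above can be pictured with a small sketch. The outer structure (`model`, `pk`, `fields`) is the standard Django fixture layout. The field names `name`, `court`, `file_number`, and `date`, the sample values, and the helper `resolve_court_names` are assumptions made for illustration and may not match the real OLDP models or the ingestor's code.

```python
import json
from typing import Any, Dict, List

# Made-up example data; the field names inside "fields" are assumptions.
FIXTURE_JSON = """
[
  {"model": "courts.court", "pk": 1, "fields": {"name": "Bundesgerichtshof"}},
  {"model": "cases.case", "pk": 10,
   "fields": {"court": 1, "file_number": "I ZR 1/25", "date": "2025-01-15"}}
]
"""


def resolve_court_names(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replace each case's court foreign key with a court_name string."""
    # Index courts by primary key so cases can look them up.
    courts = {
        entry["pk"]: entry["fields"]["name"]
        for entry in entries
        if entry["model"] == "courts.court"
    }
    cases = []
    for entry in entries:
        if entry["model"] != "cases.case":
            continue
        fields = dict(entry["fields"])  # copy, so the input stays untouched
        fields["court_name"] = courts[fields.pop("court")]
        cases.append(fields)
    return cases


cases = resolve_court_names(json.loads(FIXTURE_JSON))
```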

Output sinks

By default, data is written to the OLDP REST API. Use --sink json-file to write JSON files to disk instead:

# Export laws to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    laws --provider ris --search-term BGB --limit 1

# Export cases to local files
oldp-ingestor --sink json-file --output-dir /tmp/export \
    cases --provider ris --court BGH --limit 5

See docs/sinks.md for details on directory structure and implementing custom sinks.

Architecture

The ingestor uses a provider-based architecture. Each data source implements a provider class (LawProvider or CaseProvider), and shared RIS HTTP logic (retry, pacing, User-Agent) lives in RISBaseClient. Output is routed through a sink (ApiSink or JSONFileSink).

Provider
├── LawProvider   →  DummyLawProvider, RISProvider
└── CaseProvider  →  DummyCaseProvider, RISCaseProvider,
                     RiiCaseProvider, ByCaseProvider,
                     NrwCaseProvider, NsCaseProvider,
                     EuCaseProvider, JurisCaseProvider (10 state variants)

Sink
├── ApiSink        →  OLDP REST API (default)
└── JSONFileSink   →  local JSON files

See docs/architecture.md for the full design.
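
To make the provider and sink split concrete, here is a minimal sketch of the pattern. The class names `CaseProvider` and `JSONFileSink` are borrowed from the diagram above, but the method names (`get_cases`, `write_case`), the `ListCaseProvider` toy class, the `ingest` function, and the file naming scheme are all assumptions. The real interfaces are described in docs/architecture.md and docs/sinks.md.

```python
import json
from abc import ABC, abstractmethod
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, Optional


class CaseProvider(ABC):
    """A data source that yields case records (method name is hypothetical)."""

    @abstractmethod
    def get_cases(self) -> Iterator[Dict[str, Any]]:
        ...


class Sink(ABC):
    """A destination for ingested records (method name is hypothetical)."""

    @abstractmethod
    def write_case(self, case: Dict[str, Any]) -> None:
        ...


class ListCaseProvider(CaseProvider):
    """Toy provider that serves cases from an in-memory list."""

    def __init__(self, cases: Iterable[Dict[str, Any]]) -> None:
        self._cases = list(cases)

    def get_cases(self) -> Iterator[Dict[str, Any]]:
        return iter(self._cases)


class JSONFileSink(Sink):
    """Writes each case to its own numbered JSON file in output_dir."""

    def __init__(self, output_dir: Path) -> None:
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self._count = 0

    def write_case(self, case: Dict[str, Any]) -> None:
        self._count += 1
        path = self.output_dir / f"case_{self._count:05d}.json"
        path.write_text(
            json.dumps(case, ensure_ascii=False, indent=2), encoding="utf-8"
        )


def ingest(provider: CaseProvider, sink: Sink, limit: Optional[int] = None) -> int:
    """Route up to `limit` cases from the provider to the sink; return the count."""
    written = 0
    for case in islice(provider.get_cases(), limit):
        sink.write_case(case)
        written += 1
    return written
```

Because `ingest` depends only on the two abstract interfaces, a new data source or output target can be added without changing the loop itself.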

Politeness and rate limiting

The RIS API allows 600 requests per minute. The ingestor stays under this limit with:

  • Request pacing: 0.2 s delay between requests (configurable)
  • Retry with backoff: exponential backoff on 429/503, respects Retry-After
  • Descriptive User-Agent: oldp-ingestor/0.1.0

See docs/politeness.md for details.
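
The pacing and retry behaviour listed above can be sketched as follows. With a 0.2 s pause before every request, a sequential client sends at most 60 / 0.2 = 300 requests per minute, which is half the stated limit. This is an illustration, not the code in RISBaseClient: the function names, the `send` callable, the retry count, and the 1 s base delay are assumptions. The sketch handles only the delta-seconds form of Retry-After and looks the header up under that exact key.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by the hypothetical `send`.
Response = Tuple[int, Mapping[str, str], bytes]
RETRYABLE = {429, 503}


def backoff_delay(attempt: int, headers: Mapping[str, str], base: float = 1.0) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # The server stated how long to wait, so honour it.
        return float(retry_after)
    # Otherwise back off exponentially: base, 2*base, 4*base, ...
    return base * (2 ** attempt)


def fetch_with_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    request_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` with pacing, retrying on 429/503 up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        sleep(request_delay)  # pacing: pause before every request
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, headers, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt, headers))
    # Retries exhausted: hand the last response back to the caller.
    return status, headers, body
```

Injecting `sleep` keeps the function testable without real waiting, and `send` can wrap any HTTP client.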

Further documentation

Provider docs

Detailed documentation about each provider can be found in docs/providers/.

Provider              Doc
RIS (laws + cases)    docs/providers/de/ris.md
GII (laws)            docs/providers/de/gii.md
RII (federal courts)  docs/providers/de/rii.md
Bayern                docs/providers/de/by.md
NRW                   docs/providers/de/nrw.md
Niedersachsen         docs/providers/de/ns.md
EUR-Lex (EU)          docs/providers/de/eu.md
Bremen                docs/providers/de/hb.md
Sachsen OVG           docs/providers/de/sn_ovg.md
Sachsen ESAMOSplus    docs/providers/de/sn.md
Sachsen VerfGH        docs/providers/de/sn_verfgh.md
Juris (10 states)     docs/providers/de/juris.md
Dummy (test/dev)      docs/providers/dummy/dummy.md

Development

# Run tests
make test

# Run tests with coverage
make test-cov

# Lint
make lint

# Auto-format
make format

See CONTRIBUTING.md for the full development setup, how to add new providers, and pull request guidelines.
