Structured parsing, normalisation, and resolution of German federal law references

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

harshildarji

These details have not been verified by PyPI

Project links

Project description

bundesrecht

Python package for parsing, normalising, and resolving German federal law references.

Zero dependencies. Pure Python 3.10+.

Installation
Quick start
Simplified architecture
Parsing references
Data model
Normalising references
What the normaliser handles
Resolving references
Corpus cache
QueryResult
LawData
Resolved depth reference
Complete example
Citation

Installation

pip install bundesrecht

Quick start

The library is split into two parts:

parse_reference() and normalise() work completely offline with zero dependency on any dataset.

To use the resolver, create a Bundesrecht instance. This corpus-backed entry point needs the structured corpus, which is downloaded automatically on first use.

You do not need the dataset if you only want to parse or normalise references.

from bundesrecht import parse_reference, normalise, Bundesrecht

# No corpus needed.
ref = parse_reference('§ 312 i.V.m. § 355 BGB')
refs = normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')

# Only create Bundesrecht() when you need the resolver.
lib = Bundesrecht()
results = lib.query('§ 433 Abs. 1 BGB')

Bundesrecht() loads the corpus version pinned to the installed package. It downloads the corpus on first use, then reuses the local cache.

Simplified architecture

The library is built in three layers. The parser is the foundational brick, identifying the structure of any German citation string. The normaliser builds on the parser to handle expansion and produce canonical strings. The resolver builds on both to look up actual statutory text from the corpus.

All three layers are exposed as public APIs. Use parse_reference() when you only need structured extraction. Use normalise() when you need canonical strings without corpus lookup. Use query() when you need the actual statutory text.

Simplified architecture of the bundesrecht library

Parsing references

Parses a raw citation string into a structured LawReference object without resolving it against any law data.

from bundesrecht import parse_reference

ref = parse_reference('§ 2 Abs. 1 Nr. 1 UrhG')

ref.law                   # → 'UrhG'
ref.paragraphs            # → [ParagraphRef(...)]
str(ref)                  # → '§ 2 Abs. 1 Nr. 1 UrhG'

para = ref.paragraphs[0]
para.paragraph            # → '2'
para.sub_refs             # → [SubReference(Abs, '1'), SubReference(Nr, '1')]
str(para.sub_refs[0])     # → 'Abs. 1'
str(para.sub_refs[1])     # → 'Nr. 1'

Data model

Three dataclasses represent a parsed reference at increasing levels of specificity. These objects are returned by parse_reference() and are also exposed through QueryResult.reference.

LawReference

@dataclass
class LawReference:
    paragraphs: list[ParagraphRef]   # one or more paragraphs
    law: str | None                  # e.g. 'BGB', 'UrhG'
    raw: str                         # original input string

ParagraphRef

@dataclass
class ParagraphRef:
    paragraph: str                   # '312', '312a', '1'
    sub_refs: list[SubReference]     # Abs, Satz, Nr, Buchst, etc.
    range_end: str | None            # set for '§ 312 bis 314'
    is_ff: bool                      # § 312 ff.
    is_f: bool                       # § 312 f.
    ivm_refs: list[SubReference]     # sub-refs after 'iVm' within a paragraph

SubReference

@dataclass
class SubReference:
    level: str      # 'Abs', 'Satz', 'Nr', 'Buchst', 'Alt', 'Halbsatz'
    number: str     # '1', '2', 'a', '1a'
    range_end: str  # set for 'Abs. 2 bis 4'

String representations:

level	example output
Abs	`Abs. 2`
Satz	`Satz 1`
Nr	`Nr. 3`
Buchst	`Buchst. a`
Alt	`Alt. 1`
Halbsatz	`Halbsatz 2`

Normalising references

Available directly without loading any law data.

from bundesrecht import normalise

normalise('§ 312 i.V.m. § 355 BGB')
# → ['§ 312 BGB', '§ 355 BGB']

normalise('§§ 12-15 BGB')
# → ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']

normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
# → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']

normalise('§§ 137 S. 2, 398, 903 BGB')
# → ['§ 137 Satz 2 BGB', '§ 398 BGB', '§ 903 BGB']

normalise('§§ 46 Abs. 2 ArbGG, 91 Abs. 1 ZPO')
# → ['§ 46 Abs. 2 ArbGG', '§ 91 Abs. 1 ZPO']

# iVm variants - all recognised
normalise('§ 1 iVm § 2 BGB')
normalise('§ 1 i.V.m. § 2 BGB')
normalise('§ 1 i. V. m. § 2 BGB')
# → ['§ 1 BGB', '§ 2 BGB']  in all cases

# S. expands to Satz
normalise('§ 1 S. 2 BGB')
# → ['§ 1 Satz 2 BGB']

# f. always expands to exactly 2 paragraphs
normalise('§ 312 f. BGB')
# → ['§ 312 BGB', '§ 313 BGB']

# ff. is preserved by default - pass ff_expansion to expand
normalise('§ 312 ff. BGB')
# → ['§ 312 ff. BGB']

normalise('§ 312 ff. BGB', ff_expansion=3)
# → ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']

normalise('§ 312 ff. BGB', ff_expansion=5)
# → ['§ 312 BGB', '§ 313 BGB', '§ 314 BGB', '§ 315 BGB', '§ 316 BGB']

What the normaliser handles

Input form	Output
`§ 312 i.V.m. § 355 BGB`	`['§ 312 BGB', '§ 355 BGB']`
`§ 312 iVm § 355 BGB`	`['§ 312 BGB', '§ 355 BGB']`
`§§ 12-15 BGB`	`['§ 12 BGB', ..., '§ 15 BGB']`
`§§ 12 bis 15 BGB`	same
`§§ 137 S. 2, 398 BGB`	`['§ 137 Satz 2 BGB', '§ 398 BGB']`
`§§ 46 Abs. 2 ArbGG, 91 ZPO`	`['§ 46 Abs. 2 ArbGG', '§ 91 ZPO']`
`§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2`	three separate canonical refs
`§ 1 S. 2 BGB`	`['§ 1 Satz 2 BGB']`
`§ 312 f. BGB`	`['§ 312 BGB', '§ 313 BGB']`
`§ 312 ff. BGB`	`['§ 312 ff. BGB']` (preserved by default)
`§ 312 ff. BGB` (ff_expansion=3)	`['§ 312 BGB', '§ 313 BGB', '§ 314 BGB']`
`§312 BGB` (no space)	`['§ 312 BGB']`

Ranges with letter suffixes (§§ 12a-12c) are left unchanged because intermediate values are not predictable.

Resolving references

Bundesrecht is the dataset-backed entry point for resolving references. Load once, query as many times as you like.

from bundesrecht import Bundesrecht

lib = Bundesrecht()

By default, Bundesrecht() uses the corpus version pinned to the installed package. It loads the compatible cached corpus if present, or downloads the matching public gesetze.jsonl from Hugging Face on first use.

For offline or reproducible work with an explicit corpus file:

lib = Bundesrecht(local_path='data/gesetze.jsonl')

lib.query(raw)

Normalises a raw citation string and resolves each canonical reference. Returns list[QueryResult].

# Simple paragraph
results = lib.query('§ 242 BGB')

# Paragraph + Absatz
results = lib.query('§ 433 Abs. 1 BGB')

# Paragraph + Absatz + Nummer
results = lib.query('§ 2 Abs. 1 Nr. 1 UrhG')

# Multi-target: expands into 3 separate results
results = lib.query('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
# → QueryResult for § 2 Abs. 1 Nr. 1 UrhG
# → QueryResult for § 2 Abs. 1 Nr. 7 UrhG
# → QueryResult for § 2 Abs. 2 UrhG

# i.V.m.: expands into 2 separate results
results = lib.query('§ 312 i.V.m. § 355 BGB')
# → QueryResult for § 312 BGB
# → QueryResult for § 355 BGB

# §§ range: expands into one result per paragraph
results = lib.query('§§ 12-15 BGB')
# → § 12, § 13, § 14, § 15

# §§ with separate laws per chunk
results = lib.query('§§ 46 Abs. 2 ArbGG, 91 Abs. 1 ZPO')
# → § 46 Abs. 2 ArbGG
# → § 91 Abs. 1 ZPO

# Satz reference
results = lib.query('§ 1 Satz 2 BGB')

# Buchstabe reference
results = lib.query('§ 2 Abs. 1 Nr. 1 Buchst. a UrhG')

lib.query_canonical(canonical)

Skips normalisation and resolves a pre-cleaned reference directly. Use this when you have already normalised the string yourself.

results = lib.query_canonical('§ 2 Abs. 1 Nr. 1 UrhG')

lib.normalise(raw)

Normalises a citation string without resolving it. Returns list[str] of canonical strings.

lib.normalise('§ 312 i.V.m. § 355 BGB')
# → ['§ 312 BGB', '§ 355 BGB']

lib.normalise('§§ 12-15 BGB')
# → ['§ 12 BGB', '§ 13 BGB', '§ 14 BGB', '§ 15 BGB']

lib.normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
# → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']

lib.get_law(abbreviation)

Returns a LawData object for a law by its abbreviation. Case-insensitive. Returns None if not found.

bgb = lib.get_law('BGB')
bgb = lib.get_law('bgb')   # same result

lib.available_laws

Sorted list of all law abbreviations currently loaded.

lib.available_laws[:5]
# → ['1-DM-GOLDMÜNZG', '1. BESVNG', '1. BIMSCHV', '1. BMELDDÜV', '1. DV LUFTBO']

lib.law_count

Number of distinct laws loaded.

lib.law_count   # → 6873

Corpus cache

The PyPI package ships code only. It does not bundle the full corpus and does not download data during installation.

On first Bundesrecht() use, the package checks a commit-keyed cache:

~/.cache/bundesrecht/<pinned-data-commit>/gesetze.jsonl

If the compatible file is missing, it downloads the exact Hugging Face dataset commit pinned by this package version and validates the JSONL structure before loading it. Later calls reuse the cached file.

To choose a different cache root, set:

export BUNDESRECHT_CACHE_DIR=/path/to/cache

To avoid network access entirely, pass a local file:

lib = Bundesrecht(local_path='data/gesetze.jsonl')

Local files are validated before loading. If a local file does not match the expected corpus shape, use Bundesrecht() to load the package-managed corpus.

QueryResult

Returned by query() and query_canonical(). One object per resolved reference.

r = lib.query('§ 433 Abs. 1 BGB')[0]

r.full_text()

Returns the text at the resolved depth - Satz text if a Satz was resolved, Nummer text if a Nummer was resolved, Absatz text if an Absatz was resolved, or the full section content if only the paragraph was found.

r.full_text()
# → 'Durch den Kaufvertrag wird der Verkäufer einer Sache verpflichtet...'

r.titel()

Returns the section heading (Überschrift), if one exists.

r.titel()
# → 'Vertragstypische Pflichten beim Kaufvertrag'

r.resolved_depth

String indicating how deeply the reference was resolved. One of: 'section', 'absatz', 'satz', 'nummer', 'buchstabe', 'unterbuchstabe'.

r.resolved_depth   # → 'absatz'  (Absatz found, but no Nummer requested)

r.resolution_note

Human-readable explanation when the requested depth was not fully resolved. Empty string when resolution was complete.

r.resolution_note
# → ''  (fully resolved)
# → 'Buchstabe c not found in Nr. 1'  (partial resolution)

r.reference

The parsed LawReference object for this result.

r.reference.law           # → 'BGB'
r.reference.paragraphs    # → [ParagraphRef(paragraph='433', ...)]
str(r.reference)          # → '§ 433 Abs. 1 BGB'

r.law_data

The LawData object for the parent statute.

r.law_data.jurabk                            # → 'BGB'
r.law_data.gesetze_id                        # → 'BGB::BJNR001950896'
r.law_data.metadaten.get('langtitel')         # → 'Bürgerliches Gesetzbuch'
r.law_data.metadaten.get('ausfertigung_datum') # → '1896-08-18'
len(r.law_data.sections)                     # → 2541

r.section

Raw dict of the resolved section, or None if not found.

r.section.get('titel')    # same as r.titel()
r.section.get('content')  # list of content blocks

r.resolved_para

The specific ParagraphRef that was matched (after multi-target expansion).

str(r.resolved_para)   # → '433 Abs. 1'

LawData

Returned by lib.get_law() and available as result.law_data.

bgb = lib.get_law('BGB')

Attributes

bgb.jurabk        # → 'BGB'          abbreviation
bgb.gesetze_id    # → 'BGB::BJNR001950896'  internal corpus ID
bgb.metadaten     # → dict            full metadata
bgb.sections      # → dict            all sections keyed by paragraph string
bgb.fussnoten     # → list            footnotes at law level
bgb.quelle        # → dict            source metadata

Useful metadaten keys

bgb.metadaten.get('langtitel')                        # → 'Bürgerliches Gesetzbuch'
bgb.metadaten.get('kurztitel')                        # short title if present
bgb.metadaten.get('ausfertigung_datum')               # → '1896-08-18'
bgb.metadaten.get('fundstelle', {}).get('periodikum') # → 'RGBl'
bgb.metadaten.get('fundstelle', {}).get('zitstelle')  # → '1896, 195'

bgb.get_section(paragraph)

Look up a section by paragraph number string.

sec = bgb.get_section('433')
sec['titel']    # → 'Vertragstypische Pflichten beim Kaufvertrag'
sec['content']  # → list of Absatz dicts

bgb.get_absatz(paragraph, absatz)

Look up a specific Absatz within a section.

abs1 = bgb.get_absatz('433', 1)
abs1 = bgb.get_absatz('433', '1')   # string also works

Resolved depth reference

`resolved_depth`	Meaning
`'section'`	Only the paragraph was found (no sub-ref match)
`'absatz'`	Absatz resolved, Nummer was not requested/found
`'nummer'`	Nummer resolved, Buchstabe not requested/found
`'buchstabe'`	Buchstabe resolved, Unterbuchstabe not requested/found
`'unterbuchstabe'`	Fully resolved to Unterbuchstabe level (`aa)`, `bb)`)

Complete example

from bundesrecht import Bundesrecht, normalise, parse_reference

# Load
lib = Bundesrecht()
print(lib)   # → Bundesrecht(6873 laws loaded)

# Parse only
ref = parse_reference('§ 433 Abs. 1 Satz 1 BGB')
ref.law                          # → 'BGB'
ref.paragraphs[0].paragraph      # → '433'
ref.paragraphs[0].sub_refs       # → [SubReference(Abs,1), SubReference(Satz,1)]

# Normalise only
normalise('§ 2 Abs. 1 Nr. 1, Nr. 7, Abs. 2 UrhG')
# → ['§ 2 Abs. 1 Nr. 1 UrhG', '§ 2 Abs. 1 Nr. 7 UrhG', '§ 2 Abs. 2 UrhG']

# Resolve
results = lib.query('§ 433 Abs. 1 BGB')
r = results[0]

r.titel()          # → 'Vertragstypische Pflichten beim Kaufvertrag'
r.full_text()      # → actual statutory text of Abs. 1
r.resolved_depth   # → 'absatz'  (Absatz found, but no Nummer requested)
str(r.reference)   # → '§ 433 Abs. 1 BGB'

# Inspect a law directly
bgb = lib.get_law('BGB')
bgb.metadaten.get('langtitel')                        # → 'Bürgerliches Gesetzbuch'
bgb.metadaten.get('ausfertigung_datum')  # → '1896-08-18'
len(bgb.sections)                        # → 2541

# List all laws
lib.available_laws[:5]    # → ['1-DM-GOLDMÜNZG', '1. BESVNG', ...]
lib.law_count             # → 6873

Citation

Please consider citing our paper when using the dataset:

@article{darji2026bundesrecht,
  title={Bundesrecht: An Open Library and Corpus for German Statutory Reference Processing},
  author={Darji, Harshil and Heckelmann, Martin and Kratsch, Christina and de Melo, Gerard},
  journal={arXiv preprint arXiv:2605.31338},
  year={2026}
}

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

harshildarji

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.3

Jun 19, 2026

0.1.2

Jun 8, 2026

0.1.1

Jun 8, 2026

0.1.0

May 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bundesrecht-0.1.3.tar.gz (35.2 kB view details)

Uploaded Jun 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

bundesrecht-0.1.3-py3-none-any.whl (32.7 kB view details)

Uploaded Jun 19, 2026 Python 3

File details

Details for the file bundesrecht-0.1.3.tar.gz.

File metadata

Download URL: bundesrecht-0.1.3.tar.gz
Upload date: Jun 19, 2026
Size: 35.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bundesrecht-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`12ecf9b1304c01b885d124ccdb3067548e21a50f70537a5ade18318a1c9c3fa6`
MD5	`3f63576182815c45384da099be5402ee`
BLAKE2b-256	`74cc0ce6a43b36baa8d3d665d903cd21b5aec33d7e752ec2c718e943d501a892`

See more details on using hashes here.

Provenance

The following attestation bundles were made for bundesrecht-0.1.3.tar.gz:

Publisher: release.yml on harshildarji/bundesrecht

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bundesrecht-0.1.3.tar.gz
- Subject digest: 12ecf9b1304c01b885d124ccdb3067548e21a50f70537a5ade18318a1c9c3fa6
- Sigstore transparency entry: 1875121354
- Sigstore integration time: Jun 19, 2026
Source repository:
- Permalink: harshildarji/bundesrecht@e5c9c98e07040adf00f976b8f2daafd7418ff6a7
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/harshildarji
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e5c9c98e07040adf00f976b8f2daafd7418ff6a7
- Trigger Event: release

File details

Details for the file bundesrecht-0.1.3-py3-none-any.whl.

File metadata

Download URL: bundesrecht-0.1.3-py3-none-any.whl
Upload date: Jun 19, 2026
Size: 32.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bundesrecht-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`942791f5124dde381ad4f3f3ee893b03d6b4b0e88904c538a090ad49a31f79be`
MD5	`c679fdcb799f8cdb1aed251063aa1f19`
BLAKE2b-256	`531d4e2f09cec46a02bc247e7ea846753804dde28f390f1926322866fd9b18ad`

See more details on using hashes here.

Provenance

The following attestation bundles were made for bundesrecht-0.1.3-py3-none-any.whl:

Publisher: release.yml on harshildarji/bundesrecht

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bundesrecht-0.1.3-py3-none-any.whl
- Subject digest: 942791f5124dde381ad4f3f3ee893b03d6b4b0e88904c538a090ad49a31f79be
- Sigstore transparency entry: 1875121743
- Sigstore integration time: Jun 19, 2026
Source repository:
- Permalink: harshildarji/bundesrecht@e5c9c98e07040adf00f976b8f2daafd7418ff6a7
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/harshildarji
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e5c9c98e07040adf00f976b8f2daafd7418ff6a7
- Trigger Event: release

bundesrecht 0.1.3

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

bundesrecht

Contents

Installation

Quick start

Simplified architecture

Parsing references

Data model

LawReference

ParagraphRef

SubReference

Normalising references

What the normaliser handles

Resolving references

lib.query(raw)

lib.query_canonical(canonical)

lib.normalise(raw)

lib.get_law(abbreviation)

lib.available_laws

lib.law_count

Corpus cache

QueryResult

r.full_text()

r.titel()

r.resolved_depth

r.resolution_note

r.reference

r.law_data

r.section

r.resolved_para

LawData

Attributes

Useful metadaten keys

bgb.get_section(paragraph)

bgb.get_absatz(paragraph, absatz)

Resolved depth reference

Complete example

Citation

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance