Extract references from legal documents
Legal Reference Extraction
A toolkit for extracting references and citations from legal documents. References to law sections and case files are supported.
Supported countries:
- Germany (used by de.openlegaldata.io)
Install
# latest from git
pip install git+https://github.com/openlegaldata/legal-reference-extraction.git
# specific version (using git tag)
pip install git+https://github.com/openlegaldata/legal-reference-extraction.git@v0.3.0
# local dev
make install
Usage
from refex.extractor import RefExtractor
extractor = RefExtractor()
content, markers = extractor.extract('<p>Ein Satz mit § 3b AsylG, und weiteren Sachen.</p>')
Examples
Single law reference -- a basic § citation is extracted with the section number and law book code:
from refex.extractor import RefExtractor
extractor = RefExtractor()
content, markers = extractor.extract(
"Die Entscheidung beruht auf § 42 VwGO."
)
for marker in markers:
    for ref in marker.get_references():
        print(ref)
# <Ref(law: vwgo/42)>
The returned content wraps each matched reference with marker tags:
Die Entscheidung beruht auf [ref=<uuid>]§ 42 VwGO[/ref].
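The marker tags can be post-processed with plain string tools. The following is a minimal sketch, not part of the library: it assumes the tags always have the shape `[ref=<id>]text[/ref]` shown in the sample above and are never nested. The names `REF_TAG` and `split_marked_content` are hypothetical helpers introduced for illustration.

```python
import re

# Assumed tag shape, taken from the sample output above: [ref=<id>]text[/ref]
REF_TAG = re.compile(r"\[ref=(?P<id>[^\]]+)\](?P<text>.*?)\[/ref\]", re.DOTALL)


def split_marked_content(content):
    """Return (plain_text, spans); spans is a list of (id, marked_text) pairs."""
    spans = [(m.group("id"), m.group("text")) for m in REF_TAG.finditer(content)]
    # Replace every tagged span with just its inner text.
    plain = REF_TAG.sub(lambda m: m.group("text"), content)
    return plain, spans


# The id below is a made-up placeholder, not a real marker id.
sample = "Die Entscheidung beruht auf [ref=0a1b2c3d]§ 42 VwGO[/ref]."
plain, spans = split_marked_content(sample)
print(plain)  # Die Entscheidung beruht auf § 42 VwGO.
print(spans)  # [('0a1b2c3d', '§ 42 VwGO')]
```

This can be handy for recovering the untouched text or for locating which passages were marked, without relying on any further library calls.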
Multiple sections from the same law -- §§ with comma-separated or semicolon-separated sections:
content, markers = extractor.extract(
"Bar und bar §§ 1, 2 Abs. 2, 3, 10 Abs. 1 Nr. 1 BGB foo."
)
refs = [ref for m in markers for ref in m.get_references()]
print(sorted(refs))
# [<Ref(law: bgb/1)>, <Ref(law: bgb/10)>, <Ref(law: bgb/2)>, <Ref(law: bgb/3)>]
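The printed order above places bgb/10 before bgb/2, which suggests a lexicographic comparison. If a numeric section order is preferred for display, a custom sort key can be applied. The sketch below works on plain "book/section" strings, as they appear in the printed form, rather than on the library's reference objects; `section_sort_key` is a hypothetical helper, not part of refex.

```python
import re


def section_sort_key(ref_id):
    """Sort key for 'book/section' strings: book, then numeric section, then suffix."""
    book, _, section = ref_id.partition("/")
    match = re.match(r"(\d+)(.*)", section)
    if match:
        # "3b" becomes (3, "b"), so it sorts right after "3".
        return (book, int(match.group(1)), match.group(2))
    # Sections without a leading number sort last within their book.
    return (book, float("inf"), section)


ids = ["bgb/1", "bgb/10", "bgb/2", "bgb/3"]
print(sorted(ids, key=section_sort_key))
# ['bgb/1', 'bgb/2', 'bgb/3', 'bgb/10']
```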
Cross-references between laws -- i.V.m. (in conjunction with) linking sections across different law books:
content, markers = extractor.extract(
"Die Entscheidung über die vorläufige Vollstreckbarkeit folgt aus "
"§ 167 VwGO i.V.m. §§ 708 Nr. 11, 711 ZPO."
)
refs = [ref for m in markers for ref in m.get_references()]
print(sorted(refs))
# [<Ref(law: vwgo/167)>, <Ref(law: zpo/708)>, <Ref(law: zpo/711)>]
Case references -- court names and file numbers are extracted from citations:
extractor = RefExtractor()
extractor.do_law_refs = False # only extract case references
extractor.do_case_refs = True
content, markers = extractor.extract(
"Das OVG Schleswig habe bereits in seinem Urteil vom 22.04.2010 "
"(1 KN 19/09) zur im Wesentlichen gleichlautenden Vorgängervorschrift "
"im LROP-TF 2004 festgestellt, dass dieser Vorschrift die erforderliche "
"Bestimmtheit nicht zukomme."
)
for marker in markers:
    for ref in marker.get_references():
        print(ref)
# <Ref(case: OVG Schleswig/1 KN 19/09/)>
Multiple case references -- multiple courts and file numbers from a single passage:
content, markers = extractor.extract(
"(vgl. BVerwG, Beschluss vom 12.11.1987 - 4 B 216/87 -, juris [Rn. 2]; "
"VGH BW, Urteil vom 10.01.2007 - 3 S 1251/06 -, juris [Rn. 25])"
)
for marker in markers:
    for ref in marker.get_references():
        print(ref.court, ref.file_number)
# BVerwG 4 B 216/87
# VGH BW 3 S 1251/06
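Once several case references have been collected, they can be grouped by court. The sketch below is an illustration only: `CaseRef` is a stand-in tuple, not the library's class, and the only assumption about real reference objects is that they expose the `court` and `file_number` attributes used in the example above. `group_by_court` is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Stand-in for the library's reference objects (assumption: only the
# attributes `court` and `file_number` are needed).
CaseRef = namedtuple("CaseRef", ["court", "file_number"])


def group_by_court(refs):
    """Map each court name to the list of file numbers cited for it."""
    grouped = defaultdict(list)
    for ref in refs:
        grouped[ref.court].append(ref.file_number)
    return dict(grouped)


refs = [CaseRef("BVerwG", "4 B 216/87"), CaseRef("VGH BW", "3 S 1251/06")]
print(group_by_court(refs))
# {'BVerwG': ['4 B 216/87'], 'VGH BW': ['3 S 1251/06']}
```

Because the helper only reads two attributes, it should also accept the objects returned by `marker.get_references()`, provided they carry those attributes as shown above.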
Law book context -- when extracting from within a specific law's text, set law_book_context to resolve bare § references without an explicit book code:
extractor = RefExtractor()
extractor.do_case_refs = False
extractor.law_book_context = "bgb"
content, markers = extractor.extract(
"Der Vorsitzende kann einen solchen Vertreter auch bestellen, "
"wenn in den Fällen des § 20 eine nicht prozessfähige Person bei dem "
"Gericht ihres Aufenthaltsortes verklagt werden soll."
)
refs = [ref for m in markers for ref in m.get_references()]
print(refs[0])
# <Ref(law: bgb/20)>
Development
make install # create venv + install in editable mode with dev deps
make test # run pytest
make lint # ruff check + format check
make format # auto-fix lint + format
License
MIT