Extract compliance rules from regulatory documents using LLM-assisted parsing. Phase 1 only.
kg-extractor
Install from PyPI: `pip install ski-kg-extractor` (publishing starts with the first release after June 2026).
Status: v3.0 — first production-target release.
Extract structured compliance rules from regulatory documents
(LLM-assisted) and emit a v3 Knowledge Graph ready for
kg-validator and the SKI Model runtime.
kg-extractor is a Phase 1 (compilation) tool. The output goes
through kg-validator (schema + §3.6 validation passes) and is then
signed before crossing the sovereign boundary. The extractor's LLM
backend may be local or cloud — its output is not part of the runtime
path.
Highlights (v3.0)
- Emits v3 KG JSON by default. Each extracted rule is wrapped into the typed-graph shape from spec v3.0 §3: a Rule node, an Obligation with a guessed `obligation_type` and numeric `value`/`unit`, a Subject, a Citation, and the four edges connecting them (`applies_to`, `consists_of`, `scoped_to`, `cited_by`).
- `--jurisdiction` flag. Every KG carries a Jurisdiction node and every rule is `scoped_to` it. Defaults to `global`.
- Pluggable LLM backend. Default is Ollama (local). Cloud backends (`anthropic`, `openai`) exist for compilation-phase use only.
- Deterministic by default. Temperature 0, fixed seed, and the prompt SHA-256 recorded in extraction metadata for reproducibility audits.
- `IMPLIED` is prohibited (B2.1 Anchor Constraint). The prompt instructs the model not to emit `IMPLIED`, and the parser downgrades any that slip through to `DISCRETIONARY` with a warning. Rules cannot be inferred beyond source text.
- `extraction_quality` replaces `confidence`. The per-rule trust signal is now `extraction_quality` (EXPLICIT / DISCRETIONARY / CONFLICTING), a Phase-1 authoring concept that is separate from the runtime's categorical verdict (Axiom 2 prohibits confidence scores in the audit trail).
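The `IMPLIED` downgrade and the prompt-hash record can be illustrated with a short sketch. This is not the package's parser: the names `normalize_quality` and `prompt_sha256`, the metadata keys, the seed value, and the choice to raise on unknown labels are all assumptions made for the example. Only the three allowed labels and the downgrade-with-warning behaviour come from the highlights above.

```python
import hashlib
import warnings

# The three labels named in this README.
ALLOWED = {"EXPLICIT", "DISCRETIONARY", "CONFLICTING"}


def normalize_quality(label: str) -> str:
    """Return an allowed label, downgrading IMPLIED to DISCRETIONARY."""
    cleaned = label.strip().upper()
    if cleaned == "IMPLIED":
        # Mirrors the documented behaviour: downgrade and warn.
        warnings.warn("IMPLIED is prohibited; downgraded to DISCRETIONARY")
        return "DISCRETIONARY"
    if cleaned not in ALLOWED:
        # Assumption: unknown labels are rejected rather than guessed at.
        raise ValueError(f"unknown extraction_quality: {label!r}")
    return cleaned


def prompt_sha256(prompt: str) -> str:
    """Hex SHA-256 of the prompt text, suitable for an audit record."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


# Hypothetical metadata record; the key names and seed are illustrative.
metadata = {
    "temperature": 0,
    "seed": 0,
    "prompt_sha256": prompt_sha256("Extract compliance rules from the text."),
}
```

Hashing the exact prompt text means two runs can be compared for reproducibility without storing the prompt itself in every output file.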
Installation
pip install -e tools/kg-extractor
Quick start
# Extract a v3 KG from a regulation:
kg-extractor extract --file regulation.txt --sector energy \
--jurisdiction us.federal \
--output kg-extracted-v3.json
# Validate the emitted KG:
kg-validator validate --input kg-extracted-v3.json
# Batch:
kg-extractor batch --input-dir ./regulations --output-dir ./kgs \
--sector energy --jurisdiction us.federal
# Debugging — dump the raw flat-rule list before wrapping:
kg-extractor extract --file regulation.txt --emit-raw --output flat.json
Architecture
kg_extractor/
├── __init__.py public API: Extractor, ExtractionResult, ExtractionQuality, emit_v3_kg
├── extractor.py chunk → LLM → flat ComplianceRule list
├── models.py ComplianceRule, ExtractionResult, ExtractionQuality enum
├── backends.py Ollama / Anthropic / OpenAI backends (compilation phase only)
├── v3_emitter.py wraps ExtractionResult into a v3 KG dict (spec §3 shape)
├── utils.py chunking, document parsing, rule validation
└── cli.py click-based CLI (`extract`, `batch`, `filter`, `examples`, `version`)
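To make the wrapping step in `v3_emitter.py` more concrete, here is a rough sketch of turning one flat rule into the node and edge types listed in the highlights. It is not the real `emit_v3_kg`: the function name `wrap_rule_sketch`, the dict keys (`id`, `type`, `source`, `target`, and the flat-rule fields), the id scheme, and the direction of each edge are assumptions. Spec v3.0 §3 is the authority on the actual shape.

```python
from typing import Any


def wrap_rule_sketch(rule: dict[str, Any], index: int,
                     jurisdiction: str = "global") -> dict[str, list]:
    """Wrap one flat rule into a small typed graph (illustrative only)."""
    rid = f"rule-{index}"            # hypothetical id scheme
    jid = f"jurisdiction-{jurisdiction}"
    nodes = [
        {"id": rid, "type": "Rule", "text": rule["text"]},
        {"id": f"{rid}-obligation", "type": "Obligation",
         "obligation_type": rule["obligation_type"],
         "value": rule["value"], "unit": rule["unit"]},
        {"id": f"{rid}-subject", "type": "Subject", "name": rule["subject"]},
        {"id": f"{rid}-citation", "type": "Citation", "ref": rule["citation"]},
        {"id": jid, "type": "Jurisdiction", "name": jurisdiction},
    ]
    # Assumed direction: every edge starts at the Rule node.
    edges = [
        {"type": "consists_of", "source": rid, "target": f"{rid}-obligation"},
        {"type": "applies_to", "source": rid, "target": f"{rid}-subject"},
        {"type": "cited_by", "source": rid, "target": f"{rid}-citation"},
        {"type": "scoped_to", "source": rid, "target": jid},
    ]
    return {"nodes": nodes, "edges": edges}


# Made-up rule, used only to exercise the sketch.
example_rule = {
    "text": "Operators must report outages within 24 hours.",
    "obligation_type": "deadline",
    "value": 24,
    "unit": "hours",
    "subject": "operators",
    "citation": "Section 1.2",
}
kg = wrap_rule_sketch(example_rule, 0, jurisdiction="us.federal")
```

A batch emitter would presumably share a single Jurisdiction node across all rules rather than repeating it per rule; the sketch handles one rule at a time to stay short.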
Notes on extraction_quality
- EXPLICIT: the LLM produced a verbatim source quote in `reasoning`. The rule is safe to ship without further review.
- DISCRETIONARY: the LLM could not anchor the rule in a verbatim source quote. A human reviewer must approve before shipping.
- CONFLICTING: the LLM detected internal contradiction in the source. Surface for legal review.
This is not a confidence score in the probabilistic sense. The runtime never reads it. It exists solely to drive the Phase-1 human-review queue.
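As an illustration of how these labels could drive a Phase-1 review queue, the sketch below routes rules by `extraction_quality`. The `Quality` enum is a stand-in for the package's `ExtractionQuality`, and the queue names, the `route_rules` helper, and the rule dict layout are assumptions made for the example, not part of the tool.

```python
from enum import Enum


class Quality(Enum):
    """Stand-in for the package's ExtractionQuality enum."""
    EXPLICIT = "EXPLICIT"
    DISCRETIONARY = "DISCRETIONARY"
    CONFLICTING = "CONFLICTING"


# Hypothetical queue names, following the notes above.
QUEUE_FOR = {
    Quality.EXPLICIT: "ship",
    Quality.DISCRETIONARY: "human_review",
    Quality.CONFLICTING: "legal_review",
}


def route_rules(rules: list[dict]) -> dict[str, list[str]]:
    """Group rule ids by the review queue their quality label maps to."""
    queues: dict[str, list[str]] = {name: [] for name in QUEUE_FOR.values()}
    for rule in rules:
        quality = Quality(rule["extraction_quality"])  # ValueError if unknown
        queues[QUEUE_FOR[quality]].append(rule["id"])
    return queues


# Made-up rules, used only to exercise the sketch.
queues = route_rules([
    {"id": "r1", "extraction_quality": "EXPLICIT"},
    {"id": "r2", "extraction_quality": "DISCRETIONARY"},
    {"id": "r3", "extraction_quality": "CONFLICTING"},
    {"id": "r4", "extraction_quality": "DISCRETIONARY"},
])
```

Because the mapping is categorical, no threshold needs tuning: a rule either carries a verbatim anchor or it goes to a reviewer.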