Universal Code Database
UCDB converts legal documents into a portable SQLite database with Akoma Ntoso / LegalDocML as the canonical XML format.
The database stores canonical Akoma Ntoso XML per expression and derives query, diff, blame, search, and downstream data surfaces from normalized node tables.
See docs/architecture/020-akoma-ntoso-design.md for the design rationale.
Pipeline
source document or Akoma Ntoso XML
-> normalized legal model
-> Akoma Ntoso / LegalDocML XML
-> SQLite works + expressions + nodes
-> diff / blame / search / exports
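The step from Akoma Ntoso XML to node rows can be illustrated with a minimal sketch using only the standard library. It assumes the Akoma Ntoso 3.0 namespace and treats every element carrying an `eId` attribute as a node; the row fields (`parent_eid`, `kind`, `ordinal`, `text_hash`) and the function names are hypothetical and do not reflect ucdb's actual schema or ingest code.

```python
import hashlib
import xml.etree.ElementTree as ET

AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

# Tiny illustrative document, not taken from any real statute.
SAMPLE = f"""<akomaNtoso xmlns="{AKN_NS}">
  <act><body>
    <chapter eId="chp_1">
      <num>Chapter 1</num><heading>General</heading>
      <article eId="chp_1__art_1">
        <num>Article 1</num>
        <content><p>A contract is formed by agreement.</p></content>
      </article>
    </chapter>
  </body></act>
</akomaNtoso>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of a direct child such as <num> or <heading>, if any."""
    child = elem.find(f"{{{AKN_NS}}}{name}")
    return "".join(child.itertext()).strip() if child is not None else None


def extract_nodes(xml_text):
    """Walk the tree in document order and emit one row per eId-bearing element."""
    rows = []

    def walk(elem, parent_eid):
        eid = elem.get("eId")
        if eid is not None:
            # Whitespace-normalized text of the whole subtree (assumed hashing rule).
            text = " ".join("".join(elem.itertext()).split())
            rows.append({
                "eid": eid,
                "parent_eid": parent_eid,
                "kind": local_name(elem.tag),
                "num": child_text(elem, "num"),
                "heading": child_text(elem, "heading"),
                "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "ordinal": len(rows),  # position in document order
            })
            parent_eid = eid
        for child in elem:
            walk(child, parent_eid)

    walk(ET.fromstring(xml_text), None)
    return rows
```

Each row could then be inserted into a SQLite table; keeping `parent_eid` and `ordinal` is one simple way to preserve hierarchy and document ordering.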
The storage artifact is a single SQLite database. It includes:
- source hashes and processing provenance;
- canonical Akoma Ntoso XML and canonical hash;
- normalized structural nodes with eId, num, heading, text, XML fragment, text hashes, hierarchy, and document ordering;
- revision summaries and node-level changes;
- line-level provenance for ucdb query blame;
- FTS5 trigram search for CJK-friendly substring matching;
- placeholders for RAG chunks and reproducible exports.
Install
pip install ucdb
# or
uv tool install ucdb
For local development:
uv sync
uv run ucdb --help
Quick Start
ucdb init
# Import canonical Akoma Ntoso XML produced elsewhere.
ucdb import-akn ./law.xml --work-id civil-code --version 2026-04-29 --no-schema
ucdb query works
ucdb query expressions civil-code
ucdb query nodes 1
ucdb query search "契約" --work-id civil-code
ucdb query akn 1
AI-assisted processing is also available for PDF, DOCX, ODT, TXT, and Markdown inputs:
export OPENAI_API_KEY=sk-...
ucdb process ./input
Input repositories are scanned using this directory layout:
./input/<work-id>/<version-label>/<document>.{pdf,docx,odt,txt,md}
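The layout above can be walked with a short sketch like the following. It is not the implementation behind ucdb scan; `SourceFile` and `scan_inputs` are hypothetical names, and the sketch assumes that files sit exactly two directories below the root and that extension matching is case-insensitive.

```python
from pathlib import Path
from typing import Iterator, NamedTuple

# Extensions taken from the documented layout.
SUPPORTED = {".pdf", ".docx", ".odt", ".txt", ".md"}


class SourceFile(NamedTuple):
    work_id: str
    version_label: str
    path: Path


def scan_inputs(root) -> Iterator[SourceFile]:
    """Yield one entry per supported document under <root>/<work-id>/<version-label>/."""
    root = Path(root)
    # '*/*/*' matches only paths exactly three levels below the root.
    for path in sorted(root.glob("*/*/*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            version_dir = path.parent
            yield SourceFile(version_dir.parent.name, version_dir.name, path)
```

For example, a file at ./input/civil-code/2026-04-29/law.pdf would be reported with work id civil-code and version label 2026-04-29.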
Configuration
| Environment variable | Purpose | Default |
|---|---|---|
| UCDB_DB | Default SQLite path | ucdb.sqlite3 |
| OPENAI_API_KEY | API key for AI-assisted normalization | required for process |
| OPENAI_BASE_URL | OpenAI-compatible endpoint | OpenAI default |
| UCDB_MODEL | Model used for structured extraction | gpt-5.4-mini |
| UCDB_AKN_XSD | Optional Akoma Ntoso XSD path for strict validation | off |
| UCDB_JSON | Emit JSON summaries for process/import commands | off |
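For scripts that wrap the CLI, the table can be mirrored in a small settings loader. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical, and the set of values treated as "on" for UCDB_JSON is an assumption, since the accepted values are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    api_key: Optional[str]
    base_url: Optional[str]
    model: str
    akn_xsd: Optional[str]
    json_output: bool


def _truthy(value: Optional[str]) -> bool:
    # Assumed convention; the real CLI may accept different values.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        db_path=env.get("UCDB_DB", "ucdb.sqlite3"),
        api_key=env.get("OPENAI_API_KEY"),    # only needed for process
        base_url=env.get("OPENAI_BASE_URL"),  # None means the OpenAI default
        model=env.get("UCDB_MODEL", "gpt-5.4-mini"),
        akn_xsd=env.get("UCDB_AKN_XSD"),      # None means strict validation is off
        json_output=_truthy(env.get("UCDB_JSON")),
    )
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.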
Main Commands
ucdb init
ucdb scan <root>
ucdb process <root>
ucdb process-one <file> --work-id ... --version ...
ucdb import-akn <xml> --work-id ... --version ...
ucdb export json <expression-id>
ucdb export rag <expression-id>
ucdb export markdown <expression-id>
ucdb export html <expression-id>
ucdb serve
ucdb query works
ucdb query expressions <work-id>
ucdb query nodes <expression-id>
ucdb query node <node-id> [--xml]
ucdb query search <text> [--work-id ...]
ucdb query akn <expression-id>
ucdb query revisions <work-id>
ucdb query revision <revision-id>
ucdb query diff <change-id>
ucdb query diff-expressions <work-id> --from <v1> --to <v2> [--node-eid ...]
ucdb query blame <work-id> <node-eid> [--version ...]
ucdb query history <work-id> <node-eid>
ucdb query log
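To give an intuition for what line-level blame reports, the sketch below attributes each line of a node's latest text to the version that introduced it. It is an illustration built on the standard library's `difflib`, not the algorithm used by ucdb query blame; the `blame` function and the version labels are hypothetical, and special cases such as repeal and re-enactment are not modeled.

```python
from difflib import SequenceMatcher


def blame(versions):
    """Attribute each line of the newest text to the version that introduced it.

    `versions` is a list of (label, text) pairs, oldest first.
    Returns a list of (label, line) pairs for the newest text.
    """
    prev_lines, prev_labels = [], []
    for label, text in versions:
        lines = text.splitlines()
        # Start by assuming every line is new in this version.
        labels = [label] * len(lines)
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            # Lines carried over unchanged keep their earlier attribution.
            for offset in range(block.size):
                labels[block.b + offset] = prev_labels[block.a + offset]
        prev_lines, prev_labels = lines, labels
    return list(zip(prev_labels, prev_lines))
```

A line that is edited in a later version is attributed to that later version, while untouched lines keep the label of the version where they first appeared.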
Core Modules
model.py normalized LegalDocument / LegalNode dataclasses
akn.py Akoma Ntoso parser, serializer, and validation helpers
tw_profile.py Taiwan profile helpers and identifier normalization
db.py SQLite schema and data access
ingest.py Akoma Ntoso XML -> normalized tables
ai.py OpenAI-compatible extraction -> normalized model -> AKN XML
process.py pipeline orchestration
revisions.py structural node diff engine
blame.py line-level provenance
web.py read-only browser data layer and HTTP server
exporters.py JSON, RAG JSONL, Markdown, and HTML renderers
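The idea behind a structural node diff can be sketched by comparing two expressions keyed by eId. This is a simplified illustration, not the engine in revisions.py: `diff_nodes` is a hypothetical helper, each expression is assumed to be a mapping from eId to a text hash (or the text itself), and moved or renumbered nodes are not detected.

```python
from typing import Dict, List, Tuple


def diff_nodes(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Classify nodes as removed, modified, or added between two expressions."""
    changes: List[Tuple[str, str]] = []
    for eid, value in old.items():
        if eid not in new:
            changes.append(("removed", eid))
        elif new[eid] != value:
            changes.append(("modified", eid))
    for eid in new:
        if eid not in old:
            changes.append(("added", eid))
    return changes
```

Comparing hashes rather than full text keeps the comparison cheap, and the per-node results are the kind of record a revision summary could aggregate.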
Development
Run the current no-network regression tests:
uv run python tests/test_history.py
uv run python tests/test_web.py
The fixture imports ten Akoma Ntoso snapshots and verifies search, arbitrary version-pair diff, repeal/re-enactment handling, line blame, history, and web store queries.