Universal Code Database

UCDB converts legal documents into a portable SQLite database with Akoma Ntoso / LegalDocML as the canonical XML format.

The database stores canonical Akoma Ntoso XML per expression and derives query, diff, blame, search, and downstream data surfaces from normalized node tables.

See docs/architecture/020-akoma-ntoso-design.md for the design rationale.
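
As a rough illustration of how normalized nodes could be derived from Akoma Ntoso XML, the sketch below walks a document with the standard library and records every element that carries an eId. The sample document is simplified and not schema-valid, and `extract_nodes` and its dictionary keys are hypothetical names, not UCDB's actual ingest code.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace (OASIS LegalDocML).
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

# Simplified sample for illustration only; a real document also needs <meta>.
SAMPLE = f"""<akomaNtoso xmlns="{AKN_NS}">
  <act name="act">
    <body>
      <chapter eId="chp_1">
        <num>Chapter 1</num>
        <heading>General Provisions</heading>
        <article eId="chp_1__art_1">
          <num>Article 1</num>
          <content><p>This Act governs civil matters.</p></content>
        </article>
      </chapter>
    </body>
  </act>
</akomaNtoso>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the flattened text of the first direct child called `name`."""
    for child in elem:
        if local_name(child.tag) == name:
            return "".join(child.itertext()).strip()
    return None


def extract_nodes(xml_text):
    """Collect eId-bearing elements in document order with their parent eId."""
    nodes = []

    def walk(elem, parent_eid):
        eid = elem.get("eId")
        if eid is not None:
            nodes.append({
                "eid": eid,
                "parent": parent_eid,
                "order": len(nodes),
                "num": child_text(elem, "num"),
                "heading": child_text(elem, "heading"),
                "text": child_text(elem, "content"),
            })
            parent_eid = eid
        for child in elem:
            walk(child, parent_eid)

    walk(ET.fromstring(xml_text), None)
    return nodes


if __name__ == "__main__":
    for node in extract_nodes(SAMPLE):
        print(node)
```

Matching on local names keeps the sketch independent of the exact namespace version a source document declares.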

Pipeline

source document or Akoma Ntoso XML
-> normalized legal model
-> Akoma Ntoso / LegalDocML XML
-> SQLite works + expressions + nodes
-> diff / blame / search / exports
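
The "SQLite works + expressions + nodes" step could look roughly like the sketch below. The table and column names are assumptions made for illustration and are not UCDB's real schema; the canonical hash here is simply SHA-256 over the XML string, since the source does not say how UCDB canonicalizes XML before hashing.

```python
import hashlib
import sqlite3

# Hypothetical minimal schema: one work has many expressions (versions),
# and each expression has ordered structural nodes.
SCHEMA = """
CREATE TABLE works (work_id TEXT PRIMARY KEY);
CREATE TABLE expressions (
    id INTEGER PRIMARY KEY,
    work_id TEXT NOT NULL REFERENCES works(work_id),
    version TEXT NOT NULL,
    akn_xml TEXT NOT NULL,
    canonical_hash TEXT NOT NULL,
    UNIQUE (work_id, version)
);
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    expression_id INTEGER NOT NULL REFERENCES expressions(id),
    eid TEXT NOT NULL,
    text TEXT NOT NULL,
    text_hash TEXT NOT NULL,
    ord INTEGER NOT NULL
);
"""


def sha256_hex(value):
    """Hex SHA-256 digest of a string, encoded as UTF-8."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def store_expression(conn, work_id, version, akn_xml, nodes):
    """Insert one expression and its (eid, text) nodes; return the expression id."""
    conn.execute("INSERT OR IGNORE INTO works (work_id) VALUES (?)", (work_id,))
    cur = conn.execute(
        "INSERT INTO expressions (work_id, version, akn_xml, canonical_hash) "
        "VALUES (?, ?, ?, ?)",
        (work_id, version, akn_xml, sha256_hex(akn_xml)),
    )
    expression_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO nodes (expression_id, eid, text, text_hash, ord) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (expression_id, eid, text, sha256_hex(text), position)
            for position, (eid, text) in enumerate(nodes)
        ],
    )
    conn.commit()
    return expression_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    expr_id = store_expression(
        conn, "civil-code", "2026-04-29", "<akomaNtoso/>",
        [("art_1", "First article."), ("art_2", "Second article.")],
    )
    print(expr_id, conn.execute("SELECT eid, ord FROM nodes").fetchall())
```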

The storage artifact is a single SQLite database. It includes:

  • source hashes and processing provenance;
  • canonical Akoma Ntoso XML and canonical hash;
  • normalized structural nodes with eId, num, heading, text, XML fragment, text hashes, hierarchy, and document ordering;
  • revision summaries and node-level changes;
  • line-level provenance for ucdb query blame;
  • FTS5 trigram search for CJK-friendly substring matching;
  • placeholders for RAG chunks and reproducible exports.

Install

pip install ucdb
# or
uv tool install ucdb

For local development:

uv sync
uv run ucdb --help

Quick Start

ucdb init

# Import canonical Akoma Ntoso XML produced elsewhere.
ucdb import-akn ./law.xml --work-id civil-code --version 2026-04-29 --no-schema

ucdb query works
ucdb query expressions civil-code
ucdb query nodes 1
ucdb query search "契約" --work-id civil-code
ucdb query akn 1

AI-assisted processing is also available for PDF, DOCX, ODT, TXT, and Markdown inputs. It requires an API key for an OpenAI-compatible endpoint:

export OPENAI_API_KEY=sk-...
ucdb process ./input

Input repositories are expected to follow this layout, with one directory per work and one subdirectory per version:

./input/<work-id>/<version-label>/<document>.{pdf,docx,odt,txt,md}
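
The layout above can be walked with a few lines of pathlib. This is a sketch of the directory convention only, not the implementation behind `ucdb scan`; `scan_inputs` is a hypothetical helper, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed in the layout above.
SUPPORTED = {".pdf", ".docx", ".odt", ".txt", ".md"}


def scan_inputs(root):
    """Return (work_id, version_label, file_name) for each supported document."""
    root = Path(root)
    found = []
    # Exactly three levels deep: <work-id>/<version-label>/<document>
    for path in sorted(root.glob("*/*/*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            work_id, version_label = path.relative_to(root).parts[:2]
            found.append((work_id, version_label, path.name))
    return found


if __name__ == "__main__":
    for entry in scan_inputs("./input"):
        print(entry)
```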

Configuration

Environment variable  Purpose                                              Default
UCDB_DB               Default SQLite path                                  ucdb.sqlite3
OPENAI_API_KEY        API key for AI-assisted normalization                required for process
OPENAI_BASE_URL       OpenAI-compatible endpoint                           OpenAI default
UCDB_MODEL            Model used for structured extraction                 gpt-5.4-mini
UCDB_AKN_XSD          Optional Akoma Ntoso XSD path for strict validation  off
UCDB_JSON             Emit JSON summaries for process/import commands      off
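
For scripts that wrap the CLI, the variables above could be resolved as in the sketch below. The `Settings` class and `load_settings` function are hypothetical, and the set of values accepted as "on" for UCDB_JSON is an assumption, since the source only states that the default is off.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    api_key: Optional[str]
    base_url: Optional[str]
    model: str
    akn_xsd: Optional[str]
    json_output: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from the environment, using the documented defaults."""
    return Settings(
        db_path=env.get("UCDB_DB", "ucdb.sqlite3"),
        api_key=env.get("OPENAI_API_KEY"),        # only required for `process`
        base_url=env.get("OPENAI_BASE_URL"),      # None means the OpenAI default
        model=env.get("UCDB_MODEL", "gpt-5.4-mini"),
        akn_xsd=env.get("UCDB_AKN_XSD") or None,  # unset or empty: validation off
        # Assumption: these spellings count as "on"; anything else is off.
        json_output=env.get("UCDB_JSON", "").lower() in {"1", "true", "yes", "on"},
    )


if __name__ == "__main__":
    print(load_settings())
```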

Main Commands

ucdb init
ucdb scan <root>
ucdb process <root>
ucdb process-one <file> --work-id ... --version ...
ucdb import-akn <xml> --work-id ... --version ...
ucdb export json <expression-id>
ucdb export rag <expression-id>
ucdb export markdown <expression-id>
ucdb export html <expression-id>
ucdb serve

ucdb query works
ucdb query expressions <work-id>
ucdb query nodes <expression-id>
ucdb query node <node-id> [--xml]
ucdb query search <text> [--work-id ...]
ucdb query akn <expression-id>
ucdb query revisions <work-id>
ucdb query revision <revision-id>
ucdb query diff <change-id>
ucdb query diff-expressions <work-id> --from <v1> --to <v2> [--node-eid ...]
ucdb query blame <work-id> <node-eid> [--version ...]
ucdb query history <work-id> <node-eid>
ucdb query log
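
Conceptually, a command such as `ucdb query diff-expressions` compares the nodes of two versions of a work. The sketch below shows one simple way to do that by keying nodes on their eId; it is an illustration of the idea and not the algorithm in revisions.py. The function name and the change labels are hypothetical.

```python
def diff_nodes(old, new):
    """Compare two {eid: text} mappings and list node-level changes.

    Returns (eid, change) pairs where change is "removed", "modified",
    or "added". Unchanged nodes are omitted.
    """
    changes = []
    for eid, old_text in old.items():
        if eid not in new:
            changes.append((eid, "removed"))
        elif new[eid] != old_text:
            changes.append((eid, "modified"))
    for eid in new:
        if eid not in old:
            changes.append((eid, "added"))
    return changes


if __name__ == "__main__":
    v1 = {"art_1": "Original wording.", "art_2": "To be repealed."}
    v2 = {"art_1": "Amended wording.", "art_3": "Newly inserted."}
    for eid, change in diff_nodes(v1, v2):
        print(eid, change)
```

Comparing stored text hashes instead of full strings would give the same result with less data to load, which is presumably one use of the per-node hashes in the database.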

Core Modules

model.py       normalized LegalDocument / LegalNode dataclasses
akn.py         Akoma Ntoso parser, serializer, and validation helpers
tw_profile.py  Taiwan profile helpers and identifier normalization
db.py          SQLite schema and data access
ingest.py      Akoma Ntoso XML -> normalized tables
ai.py          OpenAI-compatible extraction -> normalized model -> AKN XML
process.py     pipeline orchestration
revisions.py   structural node diff engine
blame.py       line-level provenance
web.py         read-only browser data layer and HTTP server
exporters.py   JSON, RAG JSONL, Markdown, and HTML renderers
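
Line-level provenance, as provided by blame.py, can be pictured as carrying each line's origin forward through successive versions. The sketch below does this with difflib from the standard library; it is an assumption about the general technique and not UCDB's actual blame implementation. The `blame` function and its input shape are hypothetical.

```python
from difflib import SequenceMatcher


def blame(versions):
    """Attribute each line of the newest version to the version that introduced it.

    `versions` is a non-empty list of (label, lines) pairs, oldest first.
    Returns a list of (origin_label, line) pairs for the newest version.
    """
    label, prev_lines = versions[0]
    origins = [label] * len(prev_lines)
    for label, lines in versions[1:]:
        # Lines start out attributed to the current version ...
        new_origins = [label] * len(lines)
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # ... unless they survive unchanged from the previous version.
                new_origins[j1:j2] = origins[i1:i2]
        origins, prev_lines = new_origins, lines
    return list(zip(origins, prev_lines))


if __name__ == "__main__":
    history = [
        ("v1", ["a", "b"]),
        ("v2", ["a", "b2"]),
        ("v3", ["a", "b2", "c"]),
    ]
    for origin, line in blame(history):
        print(origin, line)
```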

Development

Run the current regression tests, which need no network access:

uv run python tests/test_history.py
uv run python tests/test_web.py

The fixture imports ten Akoma Ntoso snapshots and verifies search, arbitrary version-pair diff, repeal/re-enactment handling, line blame, history, and web store queries.
