
Git pre-commit hook + web dashboard for detecting LLM-induced document corruption


Document Integrity Layer

Detect LLM-induced document corruption before it ships.

What is this?

Document Integrity Layer is a Git pre-commit hook and web dashboard that catches document corruption introduced by AI assistants before it is committed. It scans Word, PDF, and Markdown files for hallucinated citations, broken cross-references, malformed tables, and formatting inconsistencies, then generates audit trails showing what changed and why. It is built for developers and technical writers who delegate writing to Claude, ChatGPT, or similar tools and need to verify that the AI didn't silently break their work.

Features

  • Git pre-commit scanning — Automatically checks staged documents before commits
  • Multi-format support — Analyzes .docx, .pdf, and .md files with semantic understanding
  • Corruption detection — Identifies hallucinated URLs, broken internal links, table structure corruption, and citation inconsistencies
  • Web dashboard — Visual history of all integrity checks across your repository
  • Audit trails — Export compliance-ready reports showing what changed and when
  • Slack/Discord alerts — Real-time notifications when corruption is detected
  • Custom rule configuration — Define project-specific validation rules in .dil.toml
  • Docker-ready — Run locally or containerized; no external dependencies required
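
To make the "broken internal links" check concrete, here is a minimal sketch of how such a check could work for Markdown. It is not the package's implementation: the function names (`slugify`, `find_broken_anchors`) are hypothetical, and the slug rule only approximates GitHub-style heading anchors.

```python
import re

# ATX headings such as "## Quick Start" (optional trailing #'s are ignored).
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*#*[ \t]*$", re.MULTILINE)
# Inline links that point at an anchor in the same file: [text](#anchor)
ANCHOR_LINK_RE = re.compile(r"\[[^\]]*\]\(#([^)\s]+)\)")


def slugify(heading: str) -> str:
    """Approximate a GitHub-style slug: lowercase, drop punctuation, spaces to hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)
    return re.sub(r"\s+", "-", slug)


def find_broken_anchors(markdown_text: str) -> list[str]:
    """Return anchor targets that are linked to but match no heading (hypothetical helper)."""
    known = {slugify(h) for h in HEADING_RE.findall(markdown_text)}
    targets = ANCHOR_LINK_RE.findall(markdown_text)
    return [t for t in targets if t.lower() not in known]


sample = """# Quick Start

See [setup](#quick-start) and [usage](#usage-notes).
"""
broken = find_broken_anchors(sample)
print(broken)
```

In this sample, `#quick-start` resolves to the heading while `#usage-notes` has no matching heading, so only the latter is reported.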

Quick Start

Installation

pip install document-integrity-layer

Setup

Initialize in your repository:

dil init

This creates .dil.toml with default configuration. Install the Git hook:

dil install-hook

Configuration

Edit .dil.toml to customize detection rules:

[scanner]
check_citations = true
check_cross_references = true
check_table_integrity = true
check_formatting = true

[alerts]
slack_webhook = "https://hooks.slack.com/services/YOUR/WEBHOOK"
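
As an illustration of how a file like this can be consumed, the sketch below parses the configuration above and lists the enabled checks. It is an assumption about usage, not the package's own loader: `enabled_checks` is a hypothetical helper, and `tomllib` requires Python 3.11+ (on older versions the third-party `tomli` package offers the same interface).

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

EXAMPLE_CONFIG = """
[scanner]
check_citations = true
check_cross_references = true
check_table_integrity = true
check_formatting = true

[alerts]
slack_webhook = "https://hooks.slack.com/services/YOUR/WEBHOOK"
"""


def enabled_checks(config_text: str) -> list[str]:
    """Return the names of [scanner] options set to true (hypothetical helper)."""
    config = tomllib.loads(config_text)
    scanner = config.get("scanner", {})
    return sorted(name for name, value in scanner.items() if value is True)


checks = enabled_checks(EXAMPLE_CONFIG)
print(checks)
```

Setting any option to `false` in the `[scanner]` table would drop it from the returned list.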

Usage

CLI

Run a one-time scan:

dil scan document.docx

Scan an entire directory:

dil scan ./docs --recursive

View integrity history:

dil history

Web Dashboard

Start the dashboard server:

dil server --port 8000

Navigate to http://localhost:8000 to explore:

  • Scan history across commits
  • Corruption reports with side-by-side diffs
  • Citation and link validation results
  • Custom audit exports

Pre-commit Hook

Once installed, the hook runs automatically:

git add my-document.docx
git commit -m "Update docs"
# → Pre-commit hook scans my-document.docx
# → Reports corruption if found
# → Blocks commit if severity threshold exceeded
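
For readers curious how a pre-commit hook can find the documents to scan, here is a rough sketch. It is not the hook that `dil install-hook` installs; `staged_paths` and `select_documents` are hypothetical names. It relies on two standard Git behaviors: `git diff --cached --name-only` lists staged files, and a pre-commit hook that exits with a non-zero status aborts the commit.

```python
import subprocess
from pathlib import PurePosixPath

# File types named in this README.
DOCUMENT_SUFFIXES = {".docx", ".pdf", ".md"}


def staged_paths() -> list[str]:
    """List files staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def select_documents(paths: list[str]) -> list[str]:
    """Keep only paths whose extension marks them as scannable documents."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in DOCUMENT_SUFFIXES]


# Example with a fixed list, so it runs outside a Git repository too.
example = ["docs/a.DOCX", "src/main.py", "README.md", "report.pdf"]
print(select_documents(example))
```

Inside a real hook script, one would call `select_documents(staged_paths())`, scan each result, and call `sys.exit(1)` to block the commit when a finding is severe enough.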

Tech Stack

  • Python 3.9+ — Core language
  • Flask — Web dashboard and API
  • python-docx — DOCX parsing and analysis
  • PyPDF2 — PDF text extraction
  • Markdown — Native MD support
  • SQLite — Audit trail storage
  • Docker — Containerized deployment
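
To show what SQLite-backed audit storage can look like, the following sketch records findings in a single table using Python's built-in `sqlite3` module. The schema, table name, and function names are purely illustrative assumptions; the package's actual database layout is not documented here.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical findings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS findings (
            id INTEGER PRIMARY KEY,
            scanned_at TEXT NOT NULL,
            file_path TEXT NOT NULL,
            rule TEXT NOT NULL,
            detail TEXT NOT NULL
        )
        """
    )
    return conn


def record_finding(conn: sqlite3.Connection, file_path: str, rule: str, detail: str) -> None:
    """Append one finding with a UTC timestamp."""
    conn.execute(
        "INSERT INTO findings (scanned_at, file_path, rule, detail) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), file_path, rule, detail),
    )
    conn.commit()


def findings_for(conn: sqlite3.Connection, file_path: str) -> list[tuple[str, str]]:
    """Return (rule, detail) pairs for a file in insertion order."""
    rows = conn.execute(
        "SELECT rule, detail FROM findings WHERE file_path = ? ORDER BY id",
        (file_path,),
    )
    return rows.fetchall()


conn = open_audit_db()
record_finding(conn, "docs/guide.md", "broken_anchor", "#usage-notes")
print(findings_for(conn, "docs/guide.md"))
```

An append-only table like this keeps the history queryable per file, which is the property an exported audit report would depend on.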

License

MIT
