High-performance document intelligence gateway and safety guardrail engine.

DocGaurd (Document Intelligence Gateway)

DocGaurd (Document Intelligence Gateway) is a high-performance document validation, security scanning, quality guardrail, and exact token counting engine. Built in Rust with native Python bindings via PyO3, DocGaurd sits between raw document ingestion and downstream LLM/RAG pipelines to prevent system exploitation, database bloat, and unexpected API costs.

Features • Installation • Quick Start • Python API • Telemetry Schema • Supported Formats • Examples • License

Features

✨ Real GPT Tokenization - Integrates tiktoken-rs in Rust to compute exact token counts (not approximations) for OpenAI models such as GPT-4 and GPT-3.5. Models that use a different tokenizer, such as Claude or LLaMA, get an estimate rather than an exact count.
⚡ Multi-Format Support - Seamlessly extracts text and parses metadata from PDF, TXT, MD, DOCX, PPTX, XLSX, CSV, JSON, XML, and HTML files.
🛡️ Ingestion Security - Built-in scanners inspect compressed documents and file headers to intercept Zip bombs, other compression bombs, and oversized files before they are loaded into memory.
🔍 Text Quality & OCR Necessity Detection - Evaluates page text density, whitespace-to-character ratio, and empty page signals to flag scanned/image-only documents (requires_ocr) before vector database embedding.
🚀 Native Parallel Batch Processing - Uses Rayon, a work-stealing thread pool for Rust, to process thousands of files or whole directory trees in parallel without serializing on Python's GIL.
💾 Global De-duplication - Computes high-performance SHA-256 content hashes in parallel to identify and skip exact duplicate files inside a batch queue automatically.
💰 Dynamic Cost Estimation - Estimates LLM input cost and vector database embedding cost dynamically before making external API requests.
🎯 Intelligent Agent Routing - Classifies text based on heuristic token frequencies and assigns a target downstream AI Agent (e.g., LegalAgent, ProcurementAgent).
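
The ingestion security check can be pictured with a small standard-library sketch. This is not DocGaurd's implementation (the real scanner lives in the Rust core); the names `looks_like_zip_bomb` and `make_zip` and both thresholds are assumptions chosen for illustration. The check reads only the sizes declared in the archive's directory, so nothing is decompressed.

```python
import io
import zipfile


def looks_like_zip_bomb(data: bytes, max_ratio: float = 100.0,
                        max_total_bytes: int = 50 * 1024 * 1024) -> bool:
    """Flag archives whose declared uncompressed size looks abusive.

    Hypothetical helper: the thresholds are illustrative, not DocGaurd defaults.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
    total_uncompressed = sum(info.file_size for info in infos)
    total_compressed = sum(info.compress_size for info in infos)
    if total_uncompressed > max_total_bytes:
        return True
    if total_compressed == 0:
        return False  # empty archive, or only empty members
    return total_uncompressed / total_compressed > max_ratio


def make_zip(payload: bytes) -> bytes:
    """Build a one-member deflate archive in memory for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("member.txt", payload)
    return buffer.getvalue()


if __name__ == "__main__":
    print(looks_like_zip_bomb(make_zip(b"0" * 5_000_000)))  # highly compressible
    print(looks_like_zip_bomb(make_zip(b"hello world")))    # ordinary payload
```

Since DOCX, PPTX, and XLSX files are Zip containers, a check of this kind applies to them as well as to plain `.zip` uploads. Declared sizes can be forged, so a production scanner would also cap the bytes it actually inflates.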

Installation

From PyPI (Recommended)

Install pre-compiled native binary wheels instantly on Windows, Linux, or macOS:

pip install docgaurd

No Rust compiler, C libraries, or build tools are required on the host system when installing from these wheels.

From Source

git clone https://github.com/JIVTESH28/docgaurd.git
cd docgaurd
pip install .

Quick Start

Initialize the Analyzer

import json
import docgaurd

# Initialize the gateway analyzer with custom thresholds
analyzer = docgaurd.DocumentAnalyzer({
    "target_model": "gpt-4",                   # Target context window check
    "tokenizer_name": "cl100k_base",           # Tiktoken profile
    "embedding_rate_per_million": 0.02,        # Cost per 1M tokens ($)
    "llm_input_rate_per_million": 5.00,        # Cost per 1M tokens ($)
    "max_file_size": 52428800                  # Max file size (50MB)
})

Python API Usage

Single File Ingestion (Local Disk)

report_str = analyzer.analyze_file("contract.pdf")
report = json.loads(report_str)
print(f"Tokens: {report['token_count']} | RAG Ready: {report['rag_ready']}")

In-Memory Bytes Ingestion (API Uploads)

uploaded_bytes = b"Sample document text buffer."
report_str = analyzer.analyze_bytes(uploaded_bytes, "invoice.txt")
report = json.loads(report_str)
print(f"Domain Class: {report['document_class']} | RAG Ready: {report['rag_ready']}")

Natively Parallel Batch Processing

file_list = ["agreement.docx", "data.xlsx", "spec.pdf"]
batch_report_str = analyzer.analyze_batch(file_list)
batch_report = json.loads(batch_report_str)

print(f"Successful files: {batch_report['summary']['successful_files']}")
print(f"Duplicates skipped: {batch_report['summary']['duplicate_files']}")
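
The duplicate count in the batch summary comes from comparing SHA-256 content hashes. A minimal sketch of that idea in plain Python follows; `split_duplicates` is a hypothetical helper, and keeping the first file seen per digest is an assumption, since the source does not say which copy is retained.

```python
import hashlib


def split_duplicates(payloads: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Return (unique_names, duplicate_names) for a batch of file contents.

    Files are compared by SHA-256 digest; the first file seen for each
    digest is kept and later ones are reported as duplicates.
    """
    seen: set[str] = set()
    unique: list[str] = []
    duplicates: list[str] = []
    for name, data in payloads.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append(name)
        else:
            seen.add(digest)
            unique.append(name)
    return unique, duplicates


if __name__ == "__main__":
    batch = {"a.txt": b"same", "b.txt": b"other", "c.txt": b"same"}
    print(split_duplicates(batch))
```

Because the comparison is on content, two files with different names but identical bytes count as duplicates, while a renamed file with even one changed byte does not.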

Directory Ingestion (Recursive Scan)

dir_report_str = analyzer.analyze_directory("./archive", recursive=True)
dir_report = json.loads(dir_report_str)
print(f"Total directory tokens: {dir_report['summary']['total_tokens']}")

Ultra-Fast Single-Metric Bypasses

If you only need a single metric and want to bypass the rest of the gateway analysis pipeline (such as security checks, cost estimation, and domain classification), use the sub-millisecond helpers:

# Raw metric count helpers (File-based)
word_count = analyzer.count_words("document.docx")
char_count = analyzer.count_chars("document.docx")
token_count = analyzer.count_tokens("document.docx")

# Raw metric count helpers (Byte-based)
token_count = analyzer.count_tokens_bytes(uploaded_bytes, "invoice.txt")

Telemetry Output Schema

DocGaurd returns a JSON telemetry report with the following fields for every analyzed file:

{
  "file_name": "contract_agreement.pdf",
  "file_type": "pdf",
  "sha256": "07c270b274dae324f906e0aa3a8d606471931e9c1afc241ddbc8f9ae52baffe7",
  "token_count": 2424,
  "word_count": 1612,
  "character_count": 11448,
  "page_count": 4,
  "requires_ocr": false,
  "quality_score": 0.8,
  "duplicate": false,
  "security_risk": "low",
  "fits_context": true,
  "rag_ready": true,
  "requires_summarization": false,
  "recommended_chunking": "semantic chunking",
  "document_class": "Legal",
  "recommended_agent": "LegalAgent",
  "estimated_embedding_cost": 0.0,
  "estimated_llm_cost": 0.0121,
  "processing_time_ms": 12.34
}

Telemetry Field Descriptions

Field	Type	Description
`file_name`	String	Base name of the analyzed file.
`file_type`	String	Lowercase file extension (e.g. `pdf`, `docx`, `txt`).
`sha256`	String	Cryptographic SHA-256 hash representing the exact content payload.
`token_count`	Integer	Exact token count matching the selected model tokenizer profile.
`word_count`	Integer	Number of words counted based on unicode whitespace dividers.
`character_count`	Integer	UTF-8 character length of the extracted document text.
`page_count`	Integer	Page count (e.g. PDF pages, PowerPoint slides, Excel sheets, estimated text lines).
`requires_ocr`	Boolean	Flags `true` if document has page structures but low text density (image-only scanned).
`quality_score`	Float	Cleanliness index (`0.0` - `1.0`) graded by density, metadata, ratio, and OCR markers.
`duplicate`	Boolean	Flags `true` if identical SHA-256 has already been processed in the concurrent batch queue.
`security_risk`	String	Security rating (`low`, `medium`, `high`) based on Zip-bomb and size-threshold checks.
`fits_context`	Boolean	Checks if `token_count` fits inside the target model's context window.
`rag_ready`	Boolean	Evaluates suitability for search databases (`true` if secure, non-scanned, and clean).
`requires_summarization`	Boolean	Recommends pre-summarizing if the token count or page density is excessively large.
`recommended_chunking`	String	Suggested chunking strategy (`no chunking`, `fixed`, `semantic`, `hierarchical`, `agentic`).
`document_class`	String	Classified topical domain (Finance, Procurement, Legal, HR, Tech Doc, Research, etc.).
`recommended_agent`	String	Recommended downstream AI agent (e.g. `LegalAgent`).
`estimated_embedding_cost`	Float	Predicted vector database indexing cost.
`estimated_llm_cost`	Float	Predicted input processing cost.
`processing_time_ms`	Float	Internal Gateway execution latency in milliseconds.
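
The two cost fields in the sample report are consistent with a simple per-million-token rate applied to `token_count`, using the Quick Start rates (5.00 for LLM input, 0.02 for embeddings). The sketch below reproduces the sample values; `estimate_cost` is a hypothetical helper, and rounding to four decimals is an assumption inferred from the sample output rather than documented behavior.

```python
def estimate_cost(token_count: int, rate_per_million: float, digits: int = 4) -> float:
    """Cost in dollars for token_count tokens at a per-million-token rate."""
    return round(token_count * rate_per_million / 1_000_000, digits)


if __name__ == "__main__":
    tokens = 2424  # token_count from the sample report
    print(estimate_cost(tokens, 5.00))  # LLM input: 2424 * 5.00 / 1e6 = 0.01212
    print(estimate_cost(tokens, 0.02))  # embedding: 2424 * 0.02 / 1e6 = 0.00004848
```

Under that rounding assumption, the embedding cost of a small document shows up as `0.0` even though it is not literally free.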

Supported Formats

Format	Extension	Extraction Method	Key Features
PDF	`.pdf`	Native lopdf Parser	Structural reading, scanned detection, page extraction
Word	`.docx`	Native docx XML Parser	Direct paragraph and table text extraction
PowerPoint	`.pptx`	Native pptx XML Parser	Shape text, slide processing, bullet analysis
Excel	`.xlsx`	Calamine Engine	Spreadsheet parsing, cell extraction, row estimation
CSV	`.csv`	CSV Parser	Row and column parsing, delimiter validation
Plain Text	`.txt`, `.md`	Unicode Parser	Streaming flat extraction, lossy fallback encoding
JSON	`.json`	Serde JSON	Recursive nested key-value string extraction
XML	`.xml`	Quick XML Parser	Tag-stripped text, element-wise traversal
HTML	`.html`	Quick XML Parser	Element parsing, script/style extraction filtering
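
To make the "recursive nested key-value string extraction" entry for JSON concrete, here is a rough Python sketch of that kind of traversal. It is an illustration only: DocGaurd does this in Rust with Serde, and whether keys are included in the extracted text, as they are here, is an assumption. The names `iter_strings` and `json_to_text` are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield keys and string values from a decoded JSON tree, depth first."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans, and null carry no text and are skipped.


def json_to_text(raw: str) -> str:
    """Flatten a JSON document into one space-separated text string."""
    return " ".join(iter_strings(json.loads(raw)))


if __name__ == "__main__":
    sample = '{"title": "Lease", "parties": ["A", {"role": "tenant"}], "pages": 4}'
    print(json_to_text(sample))
```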

Configuration Limits

Setting	Default Value	Purpose
`target_model`	`"gpt-4"`	Target context size limit check
`tokenizer_name`	`"cl100k_base"`	Tokenizer profile (cl100k_base, r50k_base, p50k_base)
`max_file_size`	`52,428,800` bytes (50MB)	Intercept oversized documents
`embedding_rate_per_million`	`$0.02`	Custom embedding cost rate

How the OCR Integration Works

DocGaurd implements a hybrid OCR gateway in the OcrDocumentAnalyzer class:

1. Rust-Native Gatekeeping: When a file is submitted, DocGaurd first uses its sub-millisecond Rust parsers to check the file type and structure.
   - If the document is a clean digital file (e.g., text PDF, Word doc, or markdown), the text is extracted immediately and the heavy OCR engine is bypassed entirely.
   - If the file is an image (.png, .jpg, .jpeg, etc.) or is flagged by the Rust quality scanner as a scanned/text-empty PDF (requires_ocr: true), the OCR engine is initialized.
2. Lazy Loading: To keep package imports fast, PyTorch and EasyOCR model weights are loaded on demand, only when the first scanned document or raw image is encountered.
3. Hardware Auto-Detection: The engine detects the host hardware and picks the fastest available device for the deep learning models:
   - macOS (Apple Silicon): Offloads tensor computations to the GPU via Metal Performance Shaders (MPS).
   - Windows/Linux with GPU: Targets the Nvidia GPU via CUDA.
   - Fallback: Runs on multi-threaded CPU.
4. Rust Telemetry Reconciliation: Once text is extracted via OCR, the raw text bytes are passed back into DocGaurd's Rust core as a virtual text buffer. The Rust engine then computes token counts (tiktoken-rs), counts words and characters, runs domain classification, and generates cost estimates, reconciling all statistics into a single unified JSON schema.
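
The routing and device-selection rules described above can be sketched as two small pure functions. These are hypothetical helpers, not part of the docgaurd API. The image suffix set covers only the extensions named in the text, and the device flags are passed in as plain booleans; in a PyTorch program they would typically come from `torch.backends.mps.is_available()` and `torch.cuda.is_available()`.

```python
from pathlib import Path

# Only the extensions named in the text; the source says "etc.", so the
# real set is presumably larger.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def needs_ocr(file_name: str, report: dict) -> bool:
    """Images always go to OCR; other files only when flagged requires_ocr."""
    if Path(file_name).suffix.lower() in IMAGE_SUFFIXES:
        return True
    return bool(report.get("requires_ocr", False))


def pick_device(has_mps: bool, has_cuda: bool) -> str:
    """Preference order from the text: Metal (MPS), then CUDA, then CPU."""
    if has_mps:
        return "mps"
    if has_cuda:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(needs_ocr("scanned_receipt.jpg", {}))
    print(needs_ocr("contract.pdf", {"requires_ocr": False}))
    print(pick_device(has_mps=False, has_cuda=True))
```

Keeping the decision separate from the model loading is what makes lazy loading possible: the heavy imports can wait until `needs_ocr` first returns true.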

Examples

Example 1: RAG Ingestion Security & Quality Gatekeeper

Ensure that only secure, high-quality, digital documents enter your vector database:

import json
import docgaurd

analyzer = docgaurd.DocumentAnalyzer()
report = json.loads(analyzer.analyze_file("user_upload.pdf"))

# Intercept risks at the gateway
if report["security_risk"] == "high":
    raise ValueError(f"CRITICAL: Security exception triggered for {report['file_name']}")

if report["requires_ocr"]:
    print(f"Routing {report['file_name']} to hardware-accelerated OCR pipeline.")
elif not report["rag_ready"]:
    print(f"Skipping {report['file_name']} due to low text quality score: {report['quality_score']}")
else:
    print(f"Ingesting clean document text. Context Size: {report['token_count']} tokens.")
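
For pipelines that need the same decision in several places, the branching above can be folded into one function that maps a report to an action. This is a sketch with a hypothetical name (`gate`) and hypothetical action labels; it relies only on the `security_risk`, `requires_ocr`, and `rag_ready` fields from the telemetry schema.

```python
def gate(report: dict) -> str:
    """Map a telemetry report to one of: reject, ocr, skip, ingest."""
    if report["security_risk"] == "high":
        return "reject"
    if report["requires_ocr"]:
        return "ocr"
    if not report["rag_ready"]:
        return "skip"
    return "ingest"


if __name__ == "__main__":
    clean = {"security_risk": "low", "requires_ocr": False, "rag_ready": True}
    scanned = {"security_risk": "low", "requires_ocr": True, "rag_ready": False}
    print(gate(clean), gate(scanned))
```

The order of the checks matters: the security check runs first so that a risky file is never forwarded to OCR.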

Example 2: API Cost Budgeting & Model Window Check

Calculate API transaction costs and verify if a document fits within a model's context window:

import json
import docgaurd

analyzer = docgaurd.DocumentAnalyzer({
    "target_model": "gpt-3.5-turbo",
    "llm_input_rate_per_million": 1.50
})

report = json.loads(analyzer.analyze_file("long_transcript.txt"))

if not report["fits_context"]:
    print(f"Document exceeds target context window. Recommended chunking strategy: {report['recommended_chunking']}")
else:
    print(f"Document fits. Estimated processing cost: ${report['estimated_llm_cost']:.4f}")
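
When a document does not fit, a quick estimate of how many chunks it will need helps with budgeting. The sketch below is independent of DocGaurd: `fits_context` and `chunks_needed` are hypothetical helpers, the context window is passed in by the caller because the source does not list per-model window sizes, and fixed-size chunks with a fixed overlap are an assumed strategy.

```python
import math


def fits_context(token_count: int, context_window: int) -> bool:
    """True when the whole document fits in the given context window."""
    return token_count <= context_window


def chunks_needed(token_count: int, chunk_size: int, overlap: int = 0) -> int:
    """Number of fixed-size chunks needed to cover token_count tokens.

    Consecutive chunks share `overlap` tokens, so each chunk after the
    first advances by (chunk_size - overlap) tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if token_count <= 0:
        return 0
    stride = chunk_size - overlap
    return max(1, math.ceil((token_count - overlap) / stride))


if __name__ == "__main__":
    print(fits_context(10_000, 8_192))
    print(chunks_needed(10_000, chunk_size=4_096, overlap=256))
```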

Example 3: Hardware-Accelerated OCR Integration (Metal/CUDA)

Incorporate unified OCR for scanned files directly from the installed package:

import json
from docgaurd import OcrDocumentAnalyzer

# Initialize unified OcrDocumentAnalyzer (auto-routes to Apple Metal MPS or CUDA)
gateway = OcrDocumentAnalyzer()

report_json = gateway.analyze_file("scanned_receipt.jpg")
report = json.loads(report_json)

print(f"OCR Text: {report['text']}")
print(f"OCR Tokens: {report['token_count']} | RAG Ready: {report['rag_ready']}")

License

This project is licensed under the MIT License.

Release history

- 0.1.12 (this version), yanked, Jun 2, 2026
- 0.1.11, yanked, Jun 2, 2026
- 0.1.10, yanked, Jun 2, 2026

Download files

Source Distribution

docgaurd-0.1.12.tar.gz (34.2 kB)

Uploaded Jun 2, 2026. Source

Built Distributions

docgaurd-0.1.12-cp38-abi3-win_amd64.whl (3.0 MB)

Uploaded Jun 2, 2026. CPython 3.8+, Windows x86-64

docgaurd-0.1.12-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)

Uploaded Jun 2, 2026. CPython 3.8+, manylinux: glibc 2.17+ x86-64

docgaurd-0.1.12-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.2 MB)

Uploaded Jun 2, 2026. CPython 3.8+, manylinux: glibc 2.17+ ARM64

docgaurd-0.1.12-cp38-abi3-macosx_11_0_arm64.whl (3.1 MB)

Uploaded Jun 2, 2026. CPython 3.8+, macOS 11.0+ ARM64

docgaurd-0.1.12-cp38-abi3-macosx_10_12_x86_64.whl (3.1 MB)

Uploaded Jun 2, 2026. CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file docgaurd-0.1.12.tar.gz.

File metadata

Download URL: docgaurd-0.1.12.tar.gz
Upload date: Jun 2, 2026
Size: 34.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docgaurd-0.1.12.tar.gz
Algorithm	Hash digest
SHA256	`973197489781f1a9ec2240a7d60faa47a3140825c9a79d94e42771d2596c4ce6`
MD5	`1bf23392d6deb3ae466adfe5afa7c649`
BLAKE2b-256	`70ca0aee9c52df6b9978a91d7b4ad5fe0ee6f37c0364953baadb24fef98241d2`
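
A downloaded file can be checked against the SHA256 digest listed for it with a few lines of standard-library Python. This is a generic sketch, not a DocGaurd feature; `sha256_of` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For the source distribution, the expected value would be the SHA256 digest from the table above, passed as `expected_sha256` together with the local path of `docgaurd-0.1.12.tar.gz`.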

Provenance

The following attestation bundles were made for docgaurd-0.1.12.tar.gz:

Publisher: pypi.yml on JIVTESH28/docgaurd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docgaurd-0.1.12.tar.gz
- Subject digest: 973197489781f1a9ec2240a7d60faa47a3140825c9a79d94e42771d2596c4ce6
- Sigstore transparency entry: 1704710625
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: JIVTESH28/docgaurd@484a5beebdb56d08a1afd471dadb710cd3e02010
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/JIVTESH28
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@484a5beebdb56d08a1afd471dadb710cd3e02010
- Trigger Event: push

File details

Details for the file docgaurd-0.1.12-cp38-abi3-win_amd64.whl.

File metadata

Download URL: docgaurd-0.1.12-cp38-abi3-win_amd64.whl
Upload date: Jun 2, 2026
Size: 3.0 MB
Tags: CPython 3.8+, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docgaurd-0.1.12-cp38-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`4fa85ae5acd847098b630b9d732da1877952da72b208c6eb509cfc821cb307a4`
MD5	`0d3e1f7027e09a43b0b38d883125f3b9`
BLAKE2b-256	`1145cd66d844e46c725a7d9ee2c103c03de237d84af004746e03f6e4c2e58f7b`

Provenance

The following attestation bundles were made for docgaurd-0.1.12-cp38-abi3-win_amd64.whl:

Publisher: pypi.yml on JIVTESH28/docgaurd

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docgaurd-0.1.12-cp38-abi3-win_amd64.whl
- Subject digest: 4fa85ae5acd847098b630b9d732da1877952da72b208c6eb509cfc821cb307a4
- Sigstore transparency entry: 1704710668
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: JIVTESH28/docgaurd@484a5beebdb56d08a1afd471dadb710cd3e02010
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/JIVTESH28
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@484a5beebdb56d08a1afd471dadb710cd3e02010
- Trigger Event: push

File details

Details for the file docgaurd-0.1.12-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: docgaurd-0.1.12-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Jun 2, 2026
Size: 3.3 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docgaurd-0.1.12-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`9bc2e4752046824d92bc74f51c5c1c4e0ba807519cba6a1a8854ff5a900d17cc`
MD5	`fdeaae7e0fb983e7be05f6ebfb1cdd4a`
BLAKE2b-256	`b765dff442572cfdbb31d6fd5133f1ab9d0d329b6a9d1322bc5cc5cfe5168bf3`

Provenance

The following attestation bundles were made for docgaurd-0.1.12-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: pypi.yml on JIVTESH28/docgaurd

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docgaurd-0.1.12-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 9bc2e4752046824d92bc74f51c5c1c4e0ba807519cba6a1a8854ff5a900d17cc
- Sigstore transparency entry: 1704710638
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: JIVTESH28/docgaurd@484a5beebdb56d08a1afd471dadb710cd3e02010
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/JIVTESH28
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@484a5beebdb56d08a1afd471dadb710cd3e02010
- Trigger Event: push

File details

Details for the file docgaurd-0.1.12-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: docgaurd-0.1.12-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Jun 2, 2026
Size: 3.2 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docgaurd-0.1.12-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`03a6c89a6328fad85d2e108c709d3b6a5843983caaf13912b2456a74a8398e16`
MD5	`563c7bb4b91b66f633e83efc84983d7b`
BLAKE2b-256	`2257d788a9aa9e91c94967c56f16c90d1a992d6605a201ccd28523121e208e53`

Provenance

The following attestation bundles were made for docgaurd-0.1.12-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: pypi.yml on JIVTESH28/docgaurd

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docgaurd-0.1.12-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 03a6c89a6328fad85d2e108c709d3b6a5843983caaf13912b2456a74a8398e16
- Sigstore transparency entry: 1704710630
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: JIVTESH28/docgaurd@484a5beebdb56d08a1afd471dadb710cd3e02010
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/JIVTESH28
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@484a5beebdb56d08a1afd471dadb710cd3e02010
- Trigger Event: push

File details

Details for the file docgaurd-0.1.12-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: docgaurd-0.1.12-cp38-abi3-macosx_11_0_arm64.whl
Upload date: Jun 2, 2026
Size: 3.1 MB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docgaurd-0.1.12-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`11c0c638ff9d10c6754db1004d8fe69f7d3560ba668ca861aac0e58ccbb6c6c6`
MD5	`f3413b3a254841525c4bba3a06d2dfef`
BLAKE2b-256	`8dddd9ecc0ffc9e9a701d0c685e8c4926ce3e055ffffb93bdb2aad6bc30ec7b9`

Provenance

The following attestation bundles were made for docgaurd-0.1.12-cp38-abi3-macosx_11_0_arm64.whl:

Publisher: pypi.yml on JIVTESH28/docgaurd

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docgaurd-0.1.12-cp38-abi3-macosx_11_0_arm64.whl
- Subject digest: 11c0c638ff9d10c6754db1004d8fe69f7d3560ba668ca861aac0e58ccbb6c6c6
- Sigstore transparency entry: 1704710660
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: JIVTESH28/docgaurd@484a5beebdb56d08a1afd471dadb710cd3e02010
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/JIVTESH28
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@484a5beebdb56d08a1afd471dadb710cd3e02010
- Trigger Event: push

File details

Details for the file docgaurd-0.1.12-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: docgaurd-0.1.12-cp38-abi3-macosx_10_12_x86_64.whl
Upload date: Jun 2, 2026
Size: 3.1 MB
Tags: CPython 3.8+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docgaurd-0.1.12-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`f935ab1e6d9dbb0f5fbff4007b484a93c4bff3d8f5f8edb669583aa6ae6f90f8`
MD5	`2eaef56752d042c630c3b09703d6cb04`
BLAKE2b-256	`6b80260f9bd32a0bc99cc4872f82aacc06079962c79f093e838dce4cc44ed79f`

Provenance

The following attestation bundles were made for docgaurd-0.1.12-cp38-abi3-macosx_10_12_x86_64.whl:

Publisher: pypi.yml on JIVTESH28/docgaurd

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docgaurd-0.1.12-cp38-abi3-macosx_10_12_x86_64.whl
- Subject digest: f935ab1e6d9dbb0f5fbff4007b484a93c4bff3d8f5f8edb669583aa6ae6f90f8
- Sigstore transparency entry: 1704710651
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: JIVTESH28/docgaurd@484a5beebdb56d08a1afd471dadb710cd3e02010
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/JIVTESH28
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@484a5beebdb56d08a1afd471dadb710cd3e02010
- Trigger Event: push
