
AI Knowledge Filler

Validation pipeline for LLM-generated structured Markdown. CLI · Python API · REST API.

Tests · Lint · Validate · PyPI · Python 3.10+ · Coverage · License: MIT


The Problem

LLMs generate text. You need structured, schema-compliant files.

Without a validation layer, AI-generated Markdown produces:

| Error | Raw LLM output | What you need |
|---|---|---|
| Enum violation | `level: expert` | `beginner` \| `intermediate` \| `advanced` |
| Domain violation | `domain: Technology` | `domain: system-design` |
| Type mismatch | `tags: security` | `tags: [security, api, auth]` |
| Date format | `created: 12-02-2026` | `created: 2026-02-12` |

One file? Fixable manually. A hundred files? The schema collapses.

AKF enforces the contract at generation time, not review time.


How It Works

Prompt
  → LLM                  (only non-deterministic component)
  → Validation Engine    (binary: VALID or INVALID + typed E-codes)
  → Error Normalizer     (deterministic repair instructions from E-codes)
  → Retry Controller     (max 3 attempts — aborts on identical failure hash)
  → Commit Gate          (atomic write — only VALID output reaches disk)

No silent failures. No partial commits. No guessing.

Retry = ontology signal. When a domain triggers elevated retries, the taxonomy has a boundary problem — not the model. Telemetry captures this.
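The control flow above can be sketched in a few lines of Python. This is an illustrative model only, not AKF's actual internals; `call_llm`, `validate`, and `normalize_errors` are hypothetical stand-ins for the pipeline stages:

```python
import hashlib

MAX_ATTEMPTS = 3

def generate_with_gate(prompt, call_llm, validate, normalize_errors):
    """Retry until VALID, abort on an identical failure hash, commit atomically."""
    last_hash = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        output = call_llm(prompt)
        errors = validate(output)              # empty list means VALID
        if not errors:
            return {"converged": True, "attempt": attempt, "content": output}
        failure_hash = hashlib.sha256("|".join(sorted(errors)).encode()).hexdigest()
        if failure_hash == last_hash:          # same failure twice: retrying is pointless
            break
        last_hash = failure_hash
        prompt = prompt + "\n" + normalize_errors(errors)  # deterministic repair hints
    return {"converged": False, "attempt": attempt, "content": None}
```

The commit gate falls out of the return shape: only a `converged: True` result carries content, so nothing invalid can reach disk.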


Quick Start

pip install ai-knowledge-filler

export GROQ_API_KEY="gsk_..."   # free tier, fastest

# Generate new file
akf generate "Create a Docker networking guide"
# → Docker_Networking_Guide.md (validated, schema-compliant)

# Enrich existing files — add YAML to files that have none
akf enrich docs/

# Validate an entire directory
akf validate --path docs/

AKF Documents Itself

This repo uses AKF to validate its own documentation on every PR.

Setup:

# 1. Define your taxonomy
cat akf.yaml
schema_version: "1.0.0"
vault_path: "./docs"
taxonomy:
  domains:
    - akf-core
    - akf-docs
    - akf-ops
    - akf-spec
# 2. Enrich existing docs — AKF adds frontmatter via LLM
akf enrich docs/ --model groq

# 3. Validate
akf validate --path docs/
# ✅ docs/cli-reference.md
# ✅ docs/user-guide.md
# → Total: 2 | OK: 2 | Errors: 0

CI gate (.github/workflows/validate.yml):

- name: Validate docs/
  run: akf validate --path docs/

Every PR that introduces invalid metadata fails the check. The Validate badge above is AKF validating AKF's own docs.


akf enrich

Add YAML frontmatter to existing Markdown files — bulk or single.

akf enrich docs/                    # enrich all .md files
akf enrich docs/ --dry-run          # preview only, no writes
akf enrich docs/ --force            # overwrite valid frontmatter
akf enrich docs/ --output enriched/ # copy to output dir
| File state | Default | `--force` |
|---|---|---|
| No frontmatter | Generate + validate + write | Same |
| Incomplete frontmatter | Fill missing fields only | Regenerate all |
| Valid frontmatter | Skip | Regenerate all |
| Empty file | Skip with warning | Skip |

Enrich runs through the same validation pipeline as generate — retry loop, commit gate, telemetry.
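The state table above amounts to a small dispatch. The sketch below is illustrative only (the function name and return labels are hypothetical, and completeness stands in for full schema validity):

```python
def enrich_action(frontmatter, required, force=False, empty=False):
    """Map a file's current state to an enrich action (simplified model)."""
    if empty:
        return "skip" if force else "skip_with_warning"
    if frontmatter is None:
        return "generate"                # no frontmatter: generate + validate + write
    if force:
        return "regenerate"              # --force regenerates everything
    missing = [f for f in required if f not in frontmatter]
    if missing:
        return "fill_missing"            # incomplete: fill gaps only
    return "skip"                        # complete (assumed valid): leave untouched
```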


Python API

from akf import Pipeline

pipeline = Pipeline(output="./vault/", model="groq")

# Generate new file
result = pipeline.generate("Create API rate limiting guide")
print(result.success)        # True
print(result.path)           # PosixPath('vault/API_Rate_Limiting_Guide.md')
print(result.attempts)       # 1 (retried if schema violation)

# Enrich existing file
result = pipeline.enrich("docs/old-note.md")
print(result.status)         # "enriched" | "skipped" | "failed"

# Enrich directory
results = pipeline.enrich_dir("docs/")

# Batch generate
results = pipeline.batch_generate([
    "Docker deployment best practices",
    "Kubernetes security hardening",
    "API authentication strategies",
])

# Validate
v = pipeline.validate("vault/my_file.md")
print(v.valid, v.errors)

REST API

akf serve --port 8000

curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Create Docker security checklist", "model": "groq"}'

curl -X POST http://localhost:8000/v1/batch \
  -H "Content-Type: application/json" \
  -d '{"prompts": ["Docker guide", "Kubernetes guide"]}'

curl -X POST http://localhost:8000/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "---\ntitle: Test\n..."}'

Endpoints: POST /v1/generate · POST /v1/enrich · POST /v1/validate · POST /v1/batch · GET /v1/models · GET /health

Swagger UI: http://localhost:8000/docs


What Every Committed File Guarantees

  • Required fields: title, type, domain, level, status, tags, created, updated
  • Valid enums: type, level, status from controlled sets
  • Domain from configured taxonomy (akf.yaml) — not hardcoded
  • ISO 8601 dates with created ≤ updated
  • tags as array (≥3), title as string
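For illustration, a frontmatter block that satisfies all of these guarantees might look like this (field values chosen to match the default enums and the sample taxonomy in Configuration):

```yaml
---
title: "API Rate Limiting Guide"
type: guide                              # from the type enum
domain: api-design                       # must exist in akf.yaml taxonomy
level: intermediate                      # beginner | intermediate | advanced
status: draft                            # draft | active | completed | archived
tags: [api, rate-limiting, throttling]   # array, >= 3 entries
created: 2026-02-12                      # ISO 8601, created <= updated
updated: 2026-02-27
---
```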

Error Codes

| Code | Field | Meaning |
|---|---|---|
| E001 | `type` / `level` / `status` | Invalid enum value |
| E002 | any | Required field missing |
| E003 | `created` / `updated` | Date not ISO 8601 |
| E004 | `title` / `tags` | Type mismatch |
| E005 | frontmatter | General schema violation |
| E006 | `domain` | Not in taxonomy |
| E007 | `created` / `updated` | `created` > `updated` |
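As a sketch of how E003 and E007 interact, the two date rules can be checked with the standard library alone (a hypothetical helper, not AKF's validator):

```python
from datetime import date

def check_dates(created, updated):
    """Return E-codes for the two date rules: ISO 8601 format and ordering."""
    errors = []
    parsed = {}
    for field, value in (("created", created), ("updated", updated)):
        try:
            parsed[field] = date.fromisoformat(value)   # E003: must be ISO 8601
        except (TypeError, ValueError):
            errors.append(f"E003:{field}")
    # E007 only applies once both dates parse
    if len(parsed) == 2 and parsed["created"] > parsed["updated"]:
        errors.append("E007:created")
    return errors
```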

Configuration

# akf.yaml
schema_version: "1.0.0"
vault_path: "./vault"

taxonomy:
  domains:
    - ai-system
    - api-design
    - devops
    - security
    - system-design
    # add your own

enums:
  type: [concept, guide, reference, checklist, project, roadmap, template, audit]
  level: [beginner, intermediate, advanced]
  status: [draft, active, completed, archived]
akf init          # creates akf.yaml in current directory
akf init --force  # overwrite existing

CLI Reference

# Generate
akf generate "prompt" [--model groq|claude|gemini|gpt4|ollama] [--output PATH]

# Enrich
akf enrich PATH [--dry-run] [--force] [--model MODEL] [--output DIR]

# Validate
akf validate [--file FILE] [--path PATH] [--strict]

# Server
akf serve [--host HOST] [--port PORT]

# Models / Init
akf models
akf init [--path DIR] [--force]

Model Selection

| Model | Key | Speed | Cost | Notes |
|---|---|---|---|---|
| Groq | `GROQ_API_KEY` | Fastest | Free tier | Recommended for CI, high volume |
| Claude | `ANTHROPIC_API_KEY` | Medium | $$$ | Technical docs, architecture |
| Gemini | `GOOGLE_API_KEY` | Fast | $ | Quick drafts |
| GPT-4 | `OPENAI_API_KEY` | Medium | $$ | General purpose |
| Grok | `XAI_API_KEY` | Fast | $$ | General purpose |
| Ollama | — | Fast | Free | Local / offline / private |

Auto-selection order: Groq → Grok → Claude → Gemini → GPT-4 → Ollama.
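Under that order, auto-selection reduces to a first-match scan over environment keys, with Ollama as the keyless local fallback. A sketch (hypothetical helper; the key names come from the table above):

```python
import os

# Documented auto-selection order, paired with each provider's env var.
PRIORITY = [
    ("groq", "GROQ_API_KEY"),
    ("grok", "XAI_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
    ("gpt4", "OPENAI_API_KEY"),
]

def auto_select(env=os.environ):
    """Return the first model whose API key is set; fall back to local Ollama."""
    for model, key in PRIORITY:
        if env.get(key):
            return model
    return "ollama"   # no keys configured: local / offline
```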


Telemetry

Each generation appends a structured event to telemetry/events.jsonl:

{
  "generation_id": "uuid-v4",
  "document_id": "abc123",
  "schema_version": "1.0.0",
  "attempt": 1,
  "converged": true,
  "timestamp": "2026-02-27T14:22:01Z",
  "model": "groq",
  "temperature": 0
}

Append-only. Never influences the pipeline at runtime.
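Because the log is plain JSONL, retry friction can be measured offline with a few lines of Python. A sketch (hypothetical helper, using only the event fields shown above):

```python
import json
from collections import defaultdict

def retry_stats(lines):
    """Aggregate events.jsonl lines into mean attempts and convergence rate per model."""
    totals = defaultdict(lambda: {"events": 0, "attempts": 0, "converged": 0})
    for line in lines:
        event = json.loads(line)
        t = totals[event["model"]]
        t["events"] += 1
        t["attempts"] += event["attempt"]
        t["converged"] += event["converged"]   # bool counts as 0/1
    return {
        model: {
            "mean_attempts": t["attempts"] / t["events"],
            "convergence_rate": t["converged"] / t["events"],
        }
        for model, t in totals.items()
    }
```

A rising `mean_attempts` for one domain's prompts is exactly the taxonomy-boundary signal described in How It Works.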


Security

export AKF_API_KEY="your-secret"          # optional — unset = dev mode
export AKF_CORS_ORIGINS="https://app.com"

Rate limits: POST /v1/generate 10/min · POST /v1/validate 30/min · POST /v1/batch 3/min


Quality

  • 542 tests, 93.74% coverage
  • CI green on Python 3.10 / 3.11 / 3.12
  • Type hints: 100%
  • Pylint: 9.55/10

Roadmap

Shipped

  • akf generate, akf enrich, akf validate, akf serve, akf init
  • Validation pipeline — E001–E007, retry loop, commit gate
  • Telemetry — append-only JSONL, ontology friction metrics
  • Config layer — external akf.yaml, no code changes for taxonomy
  • Pipeline API — from akf import Pipeline
  • REST API — FastAPI, rate limiting, optional auth
  • Self-documentation — AKF validates its own docs/ on every PR

Planned

  • akf generate --batch topics.txt
  • Graph extraction layer
  • n8n / Make integration templates

Documentation


License

MIT — Free for commercial and personal use.


PyPI: https://pypi.org/project/ai-knowledge-filler/ | Version: 0.5.2

Project details


Download files

Download the file for your platform.

Source Distribution

ai_knowledge_filler-0.5.2.tar.gz (68.7 kB)

Uploaded Source

Built Distribution


ai_knowledge_filler-0.5.2-py3-none-any.whl (53.3 kB)

Uploaded Python 3

File details

Details for the file ai_knowledge_filler-0.5.2.tar.gz.

File metadata

  • Download URL: ai_knowledge_filler-0.5.2.tar.gz
  • Upload date:
  • Size: 68.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_knowledge_filler-0.5.2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `306dfa403f66ae5c28dfe0946374094d49f6a03c2d06eecceda250a5d4999275` |
| MD5 | `7dfdd553c2778a72235e9fcf4c01b79c` |
| BLAKE2b-256 | `70215699672a615385a7407d084791ee8615540fa9bdaf91596a5f70144776c1` |


File details

Details for the file ai_knowledge_filler-0.5.2-py3-none-any.whl.

File metadata

File hashes

Hashes for ai_knowledge_filler-0.5.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | `0d66f3758de238eee7cfc6d6b05a7adba1b5aa3fecc472ccf64af3e83d491c20` |
| MD5 | `7d40aeb422501b6ac837dd9f7a5f76d1` |
| BLAKE2b-256 | `03b09d2dda29d5eaab6dd9a8b51c422cf2748a01c95016fab67918e2d0cd7ec8` |

