
Schema validation pipeline for LLM-generated structured Markdown. CLI · Python API · REST API.


AI Knowledge Filler

Validation pipeline for LLM-generated structured Markdown

Badges: Tests · Lint · Validate · PyPI · Python 3.10+ · Coverage · License: MIT


The Problem

LLMs generate text. You need structured, schema-compliant files.

Without a validation layer, AI-generated Markdown produces:

Error              Raw LLM output         What you need
Enum violation     level: expert          beginner | intermediate | advanced
Domain violation   domain: Technology     domain: system-design
Type mismatch      tags: security         tags: [security, api, auth]
Date format        created: 12-02-2026    created: 2026-02-12

One file? Fixable manually. A hundred files? The schema collapses.

AKF enforces the contract at generation time, not review time.


How It Works

Prompt
  → LLM                  (only non-deterministic component)
  → Validation Engine    (binary: VALID or INVALID + typed E-codes)
  → Error Normalizer     (deterministic repair instructions from E-codes)
  → Retry Controller     (max 3 attempts — aborts on identical failure hash)
  → Commit Gate          (atomic write — only VALID output reaches disk)

No silent failures. No partial commits. No guessing.

Retry = ontology signal. When a domain triggers elevated retries, the taxonomy has a boundary problem — not the model. Telemetry captures this.
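
The same control flow, sketched in Python. This is illustrative only: generate_fn, validate_fn and normalize_fn are hypothetical stand-ins for the LLM call, the validation engine and the error normalizer, not AKF internals.

import hashlib
from pathlib import Path
from typing import Callable

def generate_with_retries(
    prompt: str,
    generate_fn: Callable[[str], str],         # stand-in for the LLM call
    validate_fn: Callable[[str], list[str]],   # stand-in: returns E-codes, empty list = VALID
    normalize_fn: Callable[[list[str]], str],  # stand-in: E-codes -> repair instructions
    out_path: Path,
    max_attempts: int = 3,
) -> bool:
    last_hash, hint = None, ""
    for _ in range(max_attempts):
        draft = generate_fn(prompt + hint)              # only non-deterministic step
        errors = validate_fn(draft)
        if not errors:
            out_path.write_text(draft)                  # commit gate: only VALID output reaches disk
            return True
        failure_hash = hashlib.sha256("|".join(errors).encode()).hexdigest()
        if failure_hash == last_hash:
            return False                                # identical failure twice: abort early
        last_hash = failure_hash
        hint = "\n" + normalize_fn(errors)              # deterministic repair instructions
    return False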


Quick Start

pip install ai-knowledge-filler

export GROQ_API_KEY="gsk_..."   # free tier, fastest

# Generate new file
akf generate "Create a Docker networking guide"
# → Docker_Networking_Guide.md (validated, schema-compliant)

# Enrich existing files — add YAML to files that have none
akf enrich docs/

# Validate an entire directory
akf validate --path docs/

AKF Documents Itself

This repo uses AKF to validate its own documentation on every PR.

Setup:

# 1. Define your taxonomy
cat akf.yaml
schema_version: "1.0.0"
vault_path: "./docs"
taxonomy:
  domains:
    - akf-core
    - akf-docs
    - akf-ops
    - akf-spec
# 2. Enrich existing docs — AKF adds frontmatter via LLM
akf enrich docs/ --model groq

# 3. Validate
akf validate --path docs/
# ✅ docs/cli-reference.md
# ✅ docs/user-guide.md
# → Total: 2 | OK: 2 | Errors: 0

CI gate (.github/workflows/validate.yml):

- name: Validate docs/
  run: akf validate --path docs/

Every PR that introduces invalid metadata fails the check. The Validate badge above is AKF validating AKF's own docs.


akf enrich

Add YAML frontmatter to existing Markdown files — bulk or single.

akf enrich docs/                    # enrich all .md files
akf enrich docs/ --dry-run          # preview only, no writes
akf enrich docs/ --force            # overwrite valid frontmatter
akf enrich docs/ --output enriched/ # copy to output dir

File state               Default                        --force
No frontmatter           Generate + validate + write    Same
Incomplete frontmatter   Fill missing fields only       Regenerate all
Valid frontmatter        Skip                           Regenerate all
Empty file               Skip with warning              Skip

Enrich runs through the same validation pipeline as generate — retry loop, commit gate, telemetry.
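
For bulk runs the Python API (documented below) exposes the same behaviour. A small sketch that tallies outcomes per file, assuming each item returned by enrich_dir carries the same status field as a single enrich call:

from collections import Counter
from akf import Pipeline

pipeline = Pipeline(output="./vault/", model="groq")

# Tally outcomes per file; assumes enrich_dir() items expose the same
# .status field ("enriched" | "skipped" | "failed") as enrich().
results = pipeline.enrich_dir("docs/")
print(Counter(r.status for r in results))
# e.g. Counter({'enriched': 5, 'skipped': 2})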


Python API

from akf import Pipeline

pipeline = Pipeline(output="./vault/", model="groq")

# Generate new file
result = pipeline.generate("Create API rate limiting guide")
print(result.success)        # True
print(result.path)           # PosixPath('vault/API_Rate_Limiting_Guide.md')
print(result.attempts)       # 1 (retried if schema violation)

# Enrich existing file
result = pipeline.enrich("docs/old-note.md")
print(result.status)         # "enriched" | "skipped" | "failed"

# Enrich directory
results = pipeline.enrich_dir("docs/")

# Batch generate
results = pipeline.batch_generate([
    "Docker deployment best practices",
    "Kubernetes security hardening",
    "API authentication strategies",
])

# Validate
v = pipeline.validate("vault/my_file.md")
print(v.valid, v.errors)
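
One way to act on batch results; this sketch assumes batch_generate returns results in prompt order, each with the same fields as a single generate() result:

from akf import Pipeline

pipeline = Pipeline(output="./vault/", model="groq")
prompts = [
    "Docker deployment best practices",
    "Kubernetes security hardening",
]
results = pipeline.batch_generate(prompts)

# Assumes results come back in prompt order and expose the same fields
# (.success, .path, .attempts) shown above for a single generate() call.
for prompt, result in zip(prompts, results):
    if result.success:
        print(f"OK      {result.path} (attempts: {result.attempts})")
    else:
        print(f"FAILED  {prompt!r}")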

REST API

akf serve --port 8000

curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Create Docker security checklist", "model": "groq"}'

curl -X POST http://localhost:8000/v1/batch \
  -H "Content-Type: application/json" \
  -d '{"prompts": ["Docker guide", "Kubernetes guide"]}'

curl -X POST http://localhost:8000/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "---\ntitle: Test\n..."}'

Endpoints: POST /v1/generate · POST /v1/enrich · POST /v1/validate · POST /v1/batch · GET /v1/models · GET /health

Swagger UI: http://localhost:8000/docs
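
The same endpoints are easy to drive from Python. A minimal client sketch using requests, with payload fields taken from the curl examples above; the response shape depends on the running AKF version:

import requests

resp = requests.post(
    "http://localhost:8000/v1/generate",
    json={"prompt": "Create Docker security checklist", "model": "groq"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())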


What Every Committed File Guarantees

  • Required fields: title, type, domain, level, status, tags, created, updated
  • Valid enums: type, level, status from controlled sets
  • Domain from configured taxonomy (akf.yaml) — not hardcoded
  • ISO 8601 dates with created ≤ updated
  • tags as array (≥3), title as string
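
To make the contract concrete, here is a spot check that mirrors these guarantees. It is illustrative only, not AKF's validator, and it skips the enum checks for brevity:

from datetime import date

REQUIRED = {"title", "type", "domain", "level", "status", "tags", "created", "updated"}

def spot_check(frontmatter: dict, domains: set[str]) -> list[str]:
    """Illustrative re-statement of the guarantees above, not AKF's validator."""
    problems = [f"missing field: {f}" for f in REQUIRED - frontmatter.keys()]
    if not isinstance(frontmatter.get("title"), str):
        problems.append("title must be a string")
    tags = frontmatter.get("tags")
    if not isinstance(tags, list) or len(tags) < 3:
        problems.append("tags must be an array with at least 3 entries")
    if frontmatter.get("domain") not in domains:
        problems.append("domain not in the configured taxonomy")
    try:
        created = date.fromisoformat(str(frontmatter.get("created")))
        updated = date.fromisoformat(str(frontmatter.get("updated")))
        if created > updated:
            problems.append("created must be <= updated")
    except ValueError:
        problems.append("created/updated must be ISO 8601 dates")
    return problems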

Error Codes

Code   Field                   Meaning
E001   type / level / status   Invalid enum value
E002   any                     Required field missing
E003   created / updated       Date not ISO 8601
E004   title / tags            Type mismatch
E005   frontmatter             General schema violation
E006   domain                  Not in taxonomy
E007   created / updated       created > updated
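
A sketch of what the error-normalizer step can look like for these codes; the repair strings are examples, not AKF's actual normalizer output:

REPAIR_HINTS = {
    "E001": "Use only the allowed enum values for type, level and status.",
    "E002": "Add every required frontmatter field.",
    "E003": "Write created/updated as ISO 8601 dates (YYYY-MM-DD).",
    "E004": "title must be a string; tags must be a YAML array.",
    "E005": "Emit a single valid YAML frontmatter block.",
    "E006": "Pick a domain from the taxonomy in akf.yaml.",
    "E007": "created must not be later than updated.",
}

def normalize(errors: list[str]) -> str:
    # Deterministic: same E-codes always produce the same repair instructions.
    return "\n".join(REPAIR_HINTS[code] for code in errors if code in REPAIR_HINTS)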

Configuration

# akf.yaml
schema_version: "1.0.0"
vault_path: "./vault"

taxonomy:
  domains:
    - ai-system
    - api-design
    - devops
    - security
    - system-design
    # add your own

enums:
  type: [concept, guide, reference, checklist, project, roadmap, template, audit]
  level: [beginner, intermediate, advanced]
  status: [draft, active, completed, archived]

akf init          # creates akf.yaml in current directory
akf init --force  # overwrite existing
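
Because the taxonomy lives in plain YAML, other tooling can read it too. A sketch using PyYAML (an assumption about tooling on your side; it says nothing about how AKF parses the file internally):

import yaml  # PyYAML

with open("akf.yaml", encoding="utf-8") as fh:
    config = yaml.safe_load(fh)

domains = set(config["taxonomy"]["domains"])
print("system-design" in domains)  # True with the example config above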

CLI Reference

# Generate
akf generate "prompt" [--model groq|claude|gemini|gpt4|ollama] [--output PATH]

# Enrich
akf enrich PATH [--dry-run] [--force] [--model MODEL] [--output DIR]

# Validate
akf validate [--file FILE] [--path PATH] [--strict]

# Server
akf serve [--host HOST] [--port PORT]

# Models / Init
akf models
akf init [--path DIR] [--force]

Model Selection

Model    Key                 Speed     Cost        Notes
Groq     GROQ_API_KEY        Fastest   Free tier   Recommended for CI, high volume
Claude   ANTHROPIC_API_KEY   Medium    $$$         Technical docs, architecture
Gemini   GOOGLE_API_KEY      Fast      $           Quick drafts
GPT-4    OPENAI_API_KEY      Medium    $$          General purpose
Grok     XAI_API_KEY         Fast      $$          General purpose
Ollama   (no key)            Fast      Free        Local / offline / private

Auto-selection order: Groq → Grok → Claude → Gemini → GPT-4 → Ollama.
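
A sketch of that fallback order as plain environment-variable checks; the env var names come from the table above, while the model identifiers and the selection logic itself are illustrative:

import os

ORDER = [
    ("groq", "GROQ_API_KEY"),
    ("grok", "XAI_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
    ("gpt4", "OPENAI_API_KEY"),
]

def pick_model() -> str:
    # First provider with a configured key wins; Ollama needs no key.
    for name, env_var in ORDER:
        if os.environ.get(env_var):
            return name
    return "ollama"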


Telemetry

Each generation appends a structured event to telemetry/events.jsonl:

{
  "generation_id": "uuid-v4",
  "document_id": "abc123",
  "schema_version": "1.0.0",
  "attempt": 1,
  "converged": true,
  "timestamp": "2026-02-27T14:22:01Z",
  "model": "groq",
  "temperature": 0
}

Append-only. Never influences the pipeline at runtime.
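
Because the log is plain JSONL, the retry analysis described above can be done offline in a few lines; field names are taken from the example event, and the aggregation itself is just a sketch:

import json
from collections import Counter

# Count how often generations needed 1, 2 or 3 attempts.
# A fat tail of retries points at taxonomy friction, not at the model.
attempts = Counter()
with open("telemetry/events.jsonl", encoding="utf-8") as fh:
    for line in fh:
        event = json.loads(line)
        attempts[event["attempt"]] += 1

print(attempts)  # e.g. Counter({1: 37, 2: 4, 3: 1})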


Security

export AKF_API_KEY="your-secret"          # optional — unset = dev mode
export AKF_CORS_ORIGINS="https://app.com"

Rate limits: POST /v1/generate 10/min · POST /v1/validate 30/min · POST /v1/batch 3/min


Quality

  • 542 tests, 93.74% coverage
  • CI green on Python 3.10 / 3.11 / 3.12
  • Type hints: 100%
  • Pylint: 9.55/10

Roadmap

Shipped

  • akf generate, akf enrich, akf validate, akf serve, akf init
  • Validation pipeline — E001–E007, retry loop, commit gate
  • Telemetry — append-only JSONL, ontology friction metrics
  • Config layer — external akf.yaml, no code changes for taxonomy
  • Pipeline API — from akf import Pipeline
  • REST API — FastAPI, rate limiting, optional auth
  • Self-documentation — AKF validates its own docs/ on every PR

Planned

  • akf generate --batch topics.txt
  • Graph extraction layer
  • n8n / Make integration templates



License

MIT — Free for commercial and personal use.


PyPI: https://pypi.org/project/ai-knowledge-filler/ | Version: 0.6.1



Download files


Source Distribution

ai_knowledge_filler-0.6.1.tar.gz (73.8 kB)


Built Distribution


ai_knowledge_filler-0.6.1-py3-none-any.whl (56.4 kB)


File details

Details for the file ai_knowledge_filler-0.6.1.tar.gz.

File metadata

  • Download URL: ai_knowledge_filler-0.6.1.tar.gz
  • Upload date:
  • Size: 73.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for ai_knowledge_filler-0.6.1.tar.gz
Algorithm Hash digest
SHA256 2957fcece4e55edca24c5f63bd9741c9e8831c89e39f3f173f9784d3c58bae08
MD5 5a4894dddd55acc32e5f11e81c20d244
BLAKE2b-256 f69a9ec6fd6a325436850c03b8442da153b046ffa16a96328181a5c14abb4914


File details

Details for the file ai_knowledge_filler-0.6.1-py3-none-any.whl.


File hashes

Hashes for ai_knowledge_filler-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ed9df08e7cd757db88a55a530a1883e2a039cb3e3e1c3d6496bb082e4d3eb2f0
MD5 37f6c36a420240a3536e4761de29f539
BLAKE2b-256 90df3070ff11c0281751a54f36dedf2dfa17a3ed00826b96b471992fb5bd8b95

