Schema validation pipeline for LLM-generated structured Markdown. CLI · Python API · REST API.

AI Knowledge Filler

Validation pipeline for LLM-generated structured Markdown

Tests Lint Validate PyPI Python 3.10+ Coverage License: MIT


The Problem

LLMs generate text. You need structured, schema-compliant files.

Without a validation layer, AI-generated Markdown produces:

| Error | Raw LLM output | What you need |
|---|---|---|
| Enum violation | `level: expert` | one of `beginner`, `intermediate`, `advanced` |
| Domain violation | `domain: Technology` | `domain: system-design` |
| Type mismatch | `tags: security` | `tags: [security, api, auth]` |
| Date format | `created: 12-02-2026` | `created: 2026-02-12` |

One file? Fixable manually. A hundred files? The schema collapses.

AKF enforces the contract at generation time, not review time.


How It Works

Prompt
  → LLM                  (only non-deterministic component)
  → Validation Engine    (binary: VALID or INVALID + typed E-codes)
  → Error Normalizer     (deterministic repair instructions from E-codes)
  → Retry Controller     (max 3 attempts — aborts on identical failure hash)
  → Commit Gate          (atomic write — only VALID output reaches disk)

No silent failures. No partial commits. No guessing.

Retry = ontology signal. When a domain triggers elevated retries, the taxonomy has a boundary problem — not the model. Telemetry captures this.
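The abort-on-identical-failure behavior can be sketched as below; `failure_hash` and `retry_loop` are illustrative names, not AKF's actual internals:

```python
import hashlib


def failure_hash(errors):
    """Hash the sorted error codes so identical failures can be detected."""
    return hashlib.sha256("|".join(sorted(errors)).encode()).hexdigest()


def retry_loop(generate, validate, max_attempts=3):
    """Retry until validation passes; abort early when two consecutive
    attempts fail with the same error signature."""
    last_hash = None
    for attempt in range(1, max_attempts + 1):
        output = generate(attempt)
        errors = validate(output)
        if not errors:
            return output, attempt  # VALID: hand off to the commit gate
        h = failure_hash(errors)
        if h == last_hash:
            raise RuntimeError("identical failure twice, aborting")
        last_hash = h
    raise RuntimeError("did not converge within max_attempts")
```

Hashing the sorted error codes rather than the raw output means cosmetic changes between attempts don't mask a repeated schema failure.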


Quick Start

pip install ai-knowledge-filler

export GROQ_API_KEY="gsk_..."   # free tier, fastest

# Generate new file
akf generate "Create a Docker networking guide"
# → Docker_Networking_Guide.md (validated, schema-compliant)

# Enrich existing files — add YAML to files that have none
akf enrich docs/

# Validate an entire directory
akf validate --path docs/

AKF Documents Itself

This repo uses AKF to validate its own documentation on every PR.

Setup:

# 1. Define your taxonomy
cat akf.yaml
schema_version: "1.0.0"
vault_path: "./docs"
taxonomy:
  domains:
    - akf-core
    - akf-docs
    - akf-ops
    - akf-spec
# 2. Enrich existing docs — AKF adds frontmatter via LLM
akf enrich docs/ --model groq

# 3. Validate
akf validate --path docs/
# ✅ docs/cli-reference.md
# ✅ docs/user-guide.md
# → Total: 2 | OK: 2 | Errors: 0

CI gate (.github/workflows/validate.yml):

- name: Validate docs/
  run: akf validate --path docs/

Every PR that introduces invalid metadata fails the check. The Validate badge above is AKF validating AKF's own docs.


akf enrich

Add YAML frontmatter to existing Markdown files — bulk or single.

akf enrich docs/                    # enrich all .md files
akf enrich docs/ --dry-run          # preview only, no writes
akf enrich docs/ --force            # overwrite valid frontmatter
akf enrich docs/ --output enriched/ # copy to output dir

| File state | Default | `--force` |
|---|---|---|
| No frontmatter | Generate + validate + write | Same |
| Incomplete frontmatter | Fill missing fields only | Regenerate all |
| Valid frontmatter | Skip | Regenerate all |
| Empty file | Skip with warning | Skip |

Enrich runs through the same validation pipeline as generate — retry loop, commit gate, telemetry.
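The per-file decision table above reduces to a small dispatch; this is a sketch with assumed flag names, not AKF's internal code:

```python
def enrich_action(has_frontmatter, is_valid, is_empty, force=False):
    """Return the action `akf enrich` takes for one file,
    mirroring the file-state table above."""
    if is_empty:
        return "skip"            # empty files are never enriched
    if not has_frontmatter:
        return "generate"        # add frontmatter from scratch
    if force:
        return "regenerate"      # --force rewrites even valid metadata
    if is_valid:
        return "skip"            # already schema-compliant
    return "fill_missing"        # incomplete: fill only the missing fields
```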


Python API

from akf import Pipeline

pipeline = Pipeline(output="./vault/", model="groq")

# Generate new file
result = pipeline.generate("Create API rate limiting guide")
print(result.success)        # True
print(result.path)           # PosixPath('vault/API_Rate_Limiting_Guide.md')
print(result.attempts)       # 1 (retried if schema violation)

# Enrich existing file
result = pipeline.enrich("docs/old-note.md")
print(result.status)         # "enriched" | "skipped" | "failed"

# Enrich directory
results = pipeline.enrich_dir("docs/")

# Batch generate
results = pipeline.batch_generate([
    "Docker deployment best practices",
    "Kubernetes security hardening",
    "API authentication strategies",
])

# Validate
v = pipeline.validate("vault/my_file.md")
print(v.valid, v.errors)

REST API

akf serve --port 8000

curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Create Docker security checklist", "model": "groq"}'

curl -X POST http://localhost:8000/v1/batch \
  -H "Content-Type: application/json" \
  -d '{"prompts": ["Docker guide", "Kubernetes guide"]}'

curl -X POST http://localhost:8000/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "---\ntitle: Test\n..."}'

Endpoints: POST /v1/generate · POST /v1/enrich · POST /v1/validate · POST /v1/batch · GET /v1/models · GET /health

Swagger UI: http://localhost:8000/docs
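A minimal stdlib client for the generate endpoint might look like this; `build_generate_request` is an illustrative helper name, and the endpoint and payload shape follow the curl examples above:

```python
import json
from urllib import request

BASE = "http://localhost:8000"  # akf serve default port from above


def build_generate_request(prompt, model="groq", base=BASE):
    """Build the POST /v1/generate request with a JSON body."""
    body = json.dumps({"prompt": prompt, "model": model}).encode()
    return request.Request(
        f"{base}/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def akf_generate(prompt, **kwargs):
    """Send the request and decode the JSON response."""
    with request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.load(resp)
```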


What Every Committed File Guarantees

  • Required fields: title, type, domain, level, status, tags, created, updated
  • Valid enums: type, level, status from controlled sets
  • Domain from configured taxonomy (akf.yaml) — not hardcoded
  • ISO 8601 dates with created ≤ updated
  • tags as array (≥3), title as string
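A simplified sketch of how such guarantees map to checks (AKF's real validator is richer; field coverage here is partial and the function name is illustrative):

```python
from datetime import date

REQUIRED = ["title", "type", "domain", "level", "status",
            "tags", "created", "updated"]
LEVELS = ["beginner", "intermediate", "advanced"]


def check(fm, domains):
    """Return E-codes (per the table in this README) for a frontmatter dict."""
    errors = []
    for field in REQUIRED:
        if field not in fm:
            errors.append("E002")          # required field missing
    if fm.get("level") not in LEVELS:
        errors.append("E001")              # invalid enum value
    if fm.get("domain") not in domains:
        errors.append("E006")              # domain not in taxonomy
    if not (isinstance(fm.get("tags"), list) and len(fm["tags"]) >= 3):
        errors.append("E004")              # tags must be an array of >= 3
    try:
        created = date.fromisoformat(fm.get("created", ""))
        updated = date.fromisoformat(fm.get("updated", ""))
        if created > updated:
            errors.append("E007")          # created must not exceed updated
    except ValueError:
        errors.append("E003")              # date not ISO 8601
    return errors
```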

Error Codes

| Code | Field | Meaning |
|---|---|---|
| E001 | type / level / status | Invalid enum value |
| E002 | any | Required field missing |
| E003 | created / updated | Date not ISO 8601 |
| E004 | title / tags | Type mismatch |
| E005 | frontmatter | General schema violation |
| E006 | domain | Not in taxonomy |
| E007 | created / updated | created > updated |
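The Error Normalizer step turns these codes into repair instructions; the messages below are illustrative wording, not AKF's actual strings:

```python
# Illustrative repair messages keyed by E-code.
REPAIRS = {
    "E001": "Replace the value with one of the allowed enum values.",
    "E002": "Add the missing required field.",
    "E003": "Rewrite the date as ISO 8601 (YYYY-MM-DD).",
    "E004": "Fix the field type (title: string, tags: array).",
    "E005": "Regenerate the frontmatter block to match the schema.",
    "E006": "Use a domain from the configured taxonomy.",
    "E007": "Ensure created is not later than updated.",
}


def repair_instructions(error_codes):
    """Deterministic: the same codes always yield the same instructions,
    which is what lets the retry loop detect repeated failures."""
    return [f"{code}: {REPAIRS[code]}" for code in error_codes]
```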

Configuration

# akf.yaml
schema_version: "1.0.0"
vault_path: "./vault"

taxonomy:
  domains:
    - ai-system
    - api-design
    - devops
    - security
    - system-design
    # add your own

enums:
  type: [concept, guide, reference, checklist, project, roadmap, template, audit]
  level: [beginner, intermediate, advanced]
  status: [draft, active, completed, archived]

akf init          # creates akf.yaml in current directory
akf init --force  # overwrite existing

CLI Reference

# Generate
akf generate "prompt" [--model groq|claude|gemini|gpt4|ollama] [--output PATH]

# Enrich
akf enrich PATH [--dry-run] [--force] [--model MODEL] [--output DIR]

# Validate
akf validate [--file FILE] [--path PATH] [--strict]

# Server
akf serve [--host HOST] [--port PORT]

# Models / Init
akf models
akf init [--path DIR] [--force]

Model Selection

| Model | Key | Speed | Cost | Notes |
|---|---|---|---|---|
| Groq | `GROQ_API_KEY` | Fastest | Free tier | Recommended for CI, high volume |
| Claude | `ANTHROPIC_API_KEY` | Medium | $$$ | Technical docs, architecture |
| Gemini | `GOOGLE_API_KEY` | Fast | $ | Quick drafts |
| GPT-4 | `OPENAI_API_KEY` | Medium | $$ | General purpose |
| Grok | `XAI_API_KEY` | Fast | $$ | General purpose |
| Ollama | (none) | Fast | Free | Local / offline / private |

Auto-selection order: Groq → Grok → Claude → Gemini → GPT-4 → Ollama.
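A plausible sketch of that fallback chain, assuming selection is driven by which API key environment variables are set (the mapping follows the table above):

```python
import os

# First provider with a key set wins; Ollama is the keyless local fallback.
PRIORITY = [
    ("groq", "GROQ_API_KEY"),
    ("grok", "XAI_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
    ("gpt4", "OPENAI_API_KEY"),
]


def auto_select(env=os.environ):
    """Return the first model whose API key is present in env."""
    for model, key in PRIORITY:
        if env.get(key):
            return model
    return "ollama"  # local / offline, no key required
```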


Telemetry

Each generation appends a structured event to telemetry/events.jsonl:

{
  "generation_id": "uuid-v4",
  "document_id": "abc123",
  "schema_version": "1.0.0",
  "attempt": 1,
  "converged": true,
  "timestamp": "2026-02-27T14:22:01Z",
  "model": "groq",
  "temperature": 0
}

Append-only. Never influences the pipeline at runtime.
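Because the log is plain JSONL, convergence and retry pressure can be summarized offline with a few lines; this is a sketch over the event schema shown above, not a shipped AKF command:

```python
import json


def convergence_stats(jsonl_lines):
    """Summarize attempts and convergence from telemetry events."""
    events = [json.loads(line) for line in jsonl_lines if line.strip()]
    total = len(events)
    converged = sum(1 for e in events if e["converged"])
    mean_attempts = sum(e["attempt"] for e in events) / total if total else 0.0
    return {"total": total, "converged": converged,
            "mean_attempts": mean_attempts}
```

An elevated `mean_attempts` for one slice of the log is exactly the "retry = ontology signal" described earlier.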


Security

export AKF_API_KEY="your-secret"          # optional — unset = dev mode
export AKF_CORS_ORIGINS="https://app.com"

Rate limits: POST /v1/generate 10/min · POST /v1/validate 30/min · POST /v1/batch 3/min


Quality

  • 542 tests, 93.74% coverage
  • CI green on Python 3.10 / 3.11 / 3.12
  • Type hints: 100%
  • Pylint: 9.55/10

Roadmap

Shipped

  • akf generate, akf enrich, akf validate, akf serve, akf init
  • Validation pipeline — E001–E007, retry loop, commit gate
  • Telemetry — append-only JSONL, ontology friction metrics
  • Config layer — external akf.yaml, no code changes for taxonomy
  • Pipeline API — from akf import Pipeline
  • REST API — FastAPI, rate limiting, optional auth
  • Self-documentation — AKF validates its own docs/ on every PR

Planned

  • akf generate --batch topics.txt
  • Graph extraction layer
  • n8n / Make integration templates

Documentation


License

MIT — Free for commercial and personal use.


PyPI: https://pypi.org/project/ai-knowledge-filler/ | Version: 0.5.0
