Skip to main content

Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration

Project description

okf-generator banner

PyPI version Python Tests License: MIT OKF v0.1 Claude Skill PRs Welcome

Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for AI agents like OpenCode.

Installation · Quick Start · CLI Reference · OpenCode Integration · Contributing


What is this?

okf-generator converts your source code into an Open Knowledge Format (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.

Instead of giving an AI your entire codebase, you give it exactly the concept it needs:

# Before touching WorldBankConnector, look it up
okf lookup WorldBankConnector

# CLASS: WorldBankConnector
# Source      : StockAI/RnD/python/connectors/economic_data.py  line 51
# Description : Fetches World Bank development indicators via wbdata API.
# Methods     : get_indicator, search
# Signature   : class WorldBankConnector

Features

  • 6 languages — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter)
  • Zero LLM required for extraction — deterministic, fast, offline-capable
  • OKF v0.1 conformant — type, description, resource, tags, timestamp
  • Domain/resource-path layout — bundle mirrors your source tree exactly
  • Resumable LLM enrichment — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
  • OpenCode integrationAGENTS.md + custom commands for pinpoint context injection
  • Training data pipeline — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
  • Claude Skill — install SKILL.md to trigger the full pipeline from natural language

Installation

# Core (extraction only — no LLM required)
pip install okf-generator

# With LLM enrichment + training pair generation
pip install okf-generator[llm]

Requirements: Python 3.11+

Quick Start

# 1. Generate a knowledge bundle from your codebase
okf generate ./my_project ./okf_bundle

# 2. Look up a concept (works instantly, zero LLM)
okf lookup WorldBankConnector

# 3. Find all concepts from one file
okf lookup --file src/connectors/economic_data.py

# 4. Generate training pairs from the bundle
okf pairs ./okf_bundle ./train.jsonl

# 5. Regenerate SUMMARY.md after enrichment
okf summarize ./okf_bundle

Bundle Layout

The output mirrors your source tree — not flat buckets:

okf_bundle/
├── SUMMARY.md                        ← bird's-eye view for AI agents
├── index.md                          ← root navigation
├── log.md                            ← generation history
└── StockAI/
    └── RnD/
        └── python/
            └── connectors/
                ├── index.md          ← lists all concepts in this folder
                ├── economic_data.md  ← Module concept
                └── economic_data/
                    ├── WorldBankConnector.md   ← Class
                    ├── get_indicator.md        ← Function
                    └── search.md               ← Function

Each file is OKF v0.1 conformant:

---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
  - lang:python
  - type:Class
  - module:StockAI
  - domain:RnD
  - git:branch:main
  - git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---

# WorldBankConnector

...signature, docstring, params, returns, methods, related concepts...

CLI Reference

okf generate

okf generate <source_dir> [output_dir]

Options:
  --summarize <bundle_dir>   Regenerate SUMMARY.md only (no re-scan)

Environment variables (LLM enrichment):
  OKF_ENRICH=1               Enable LLM enrichment
  OKF_BASE_URL               OpenAI-compat base URL (default: https://api.anthropic.com/v1)
  OKF_API_KEY                API key
  OKF_MODEL                  Model name (default: claude-sonnet-4-6)
  OKF_MAX_WORKERS            Parallel workers (default: 2)

okf lookup

okf lookup [query] [options]

Options:
  --bundle PATH     Bundle directory (default: ./okf_bundle)
  --file PATH       Filter by source file
  --type TYPE       Filter by concept type: Function | Class | Module
  --tag TAG         Filter by tag, repeatable: --tag lang:python
  --limit N         Max results (default: 10)
  --compact         One-line output per result
  --json            JSON output for programmatic use
  --full            Raw .md file content
  --min-score N     Minimum relevance score 0-1 (default: 0.1)

okf pairs

okf pairs <bundle_dir> [output_file]

Environment variables:
  SKIP_SYNTH=1          Static pairs only (no LLM)
  SYNTH_BASE_URL        API endpoint
  SYNTH_API_KEY         API key
  SYNTH_MODEL           Model name
  MAX_WORKERS           Parallel workers (default: 3)
  QA_PER_CONCEPT        Q&A pairs per concept (default: 3)
  PAIR_TYPES            Comma-separated: codegen,qa,doc,summarize,crosslink

Supported Languages

Language Parser Extracts
Python stdlib ast Functions, classes, params, return types, docstrings
JavaScript / TypeScript tree-sitter Functions, arrow fns, classes, JSDoc
Go tree-sitter Funcs, methods, structs, interfaces, GoDoc
Java tree-sitter Classes, methods, constructors, Javadoc
Rust tree-sitter Fns, structs, enums, traits, impl blocks, ///
Ruby tree-sitter Defs, classes, modules, # comments

LLM Enrichment

Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:

# Using a local llama.cpp server
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle

Enrichment is resumable — interrupt and rerun freely. Already-enriched concepts are skipped.

OpenCode Integration

# 1. Tell OpenCode about the bundle (auto-loaded every session)
cat >> AGENTS.md << 'EOF'
## OKF Knowledge Bundle
Before working on any class or function, look it up:
  okf lookup --bundle ./okf_bundle <ConceptName>
EOF

# 2. Add a custom command
mkdir -p .opencode/commands
echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md

Then in OpenCode: /lookup NAME=WorldBankConnector

See docs/opencode-integration.md for full setup.

Python API

from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search

# Generate bundle
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})

# Search concepts
bundle = load_bundle("./okf_bundle")
results = search(bundle, tokens=["WorldBankConnector"])
print(results[0]["description"])

Training Data

Convert your OKF bundle into JSONL training pairs for fine-tuning:

# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl

Each pair is in chat format compatible with most fine-tuning pipelines.

Claude Skill

Install SKILL.md to trigger the full pipeline from natural language inside Claude:

"Index my codebase" → generates OKF bundle
"Look up WorldBankConnector" → returns exact concept
"Generate training pairs from my bundle" → outputs JSONL

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/umairbaig/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/

Good first issues: adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.

License

MIT — Copyright © 2026 Umair Baig

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

okf_generator-0.1.2.tar.gz (35.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

okf_generator-0.1.2-py3-none-any.whl (33.5 kB view details)

Uploaded Python 3

File details

Details for the file okf_generator-0.1.2.tar.gz.

File metadata

  • Download URL: okf_generator-0.1.2.tar.gz
  • Upload date:
  • Size: 35.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okf_generator-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b16502a36cf62d870cee369f13a3b38c2c2df5d90118fa679151ba52201071ef
MD5 2e1902ee1c9cb8eda7093b18297eaf2c
BLAKE2b-256 92a2d8945457ea589d661e11b2f8cc4727116112786a7fb96787b4f05624e5f1

See more details on using hashes here.

Provenance

The following attestation bundles were made for okf_generator-0.1.2.tar.gz:

Publisher: publish.yml on UmairBaig8/okf-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file okf_generator-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: okf_generator-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 33.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okf_generator-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8b459393ae85e1bec6fb9436170e0d3405767560dffc2f6529616f194245a9d9
MD5 a5008be112888fe40f3e2beabdeafc05
BLAKE2b-256 c0c2d8851c9ec1728afbae945b0abcf887e6619f9bde3131116cf4acf90e75fd

See more details on using hashes here.

Provenance

The following attestation bundles were made for okf_generator-0.1.2-py3-none-any.whl:

Publisher: publish.yml on UmairBaig8/okf-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page