Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration
Project description
Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for AI agents like OpenCode.
Installation · Quick Start · CLI Reference · OpenCode Integration · Contributing
What is this?
okf-generator converts your source code into an Open Knowledge Format (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.
Instead of giving an AI your entire codebase, you give it exactly the concept it needs:
# Before touching WorldBankConnector, look it up
okf lookup WorldBankConnector
# CLASS: WorldBankConnector
# Source : StockAI/RnD/python/connectors/economic_data.py line 51
# Description : Fetches World Bank development indicators via wbdata API.
# Methods : get_indicator, search
# Signature : class WorldBankConnector
Features
- 6 languages — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter)
- Zero LLM required for extraction — deterministic, fast, offline-capable
- OKF v0.1 conformant — type, description, resource, tags, timestamp
- Domain/resource-path layout — bundle mirrors your source tree exactly
- Resumable LLM enrichment — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
- OpenCode integration —
AGENTS.md+ custom commands for pinpoint context injection - Training data pipeline — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
- Claude Skill — install
SKILL.mdto trigger the full pipeline from natural language
Installation
One-liner — paste into any terminal:
curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash
This installs okf-generator[llm] + the Claude Code skill in one shot.
Requirements: Python 3.11+ with pip.
Or manually:
# Core (extraction only — no LLM required)
pip install okf-generator
# With LLM enrichment + training pair generation
pip install "okf-generator[llm]"
Quick Start
# 1. Generate a knowledge bundle from your codebase
okf generate ./my_project ./okf_bundle
# 2. Look up a concept (works instantly, zero LLM)
okf lookup WorldBankConnector
# 3. Find all concepts from one file
okf lookup --file src/connectors/economic_data.py
# 4. Generate training pairs from the bundle
okf pairs ./okf_bundle ./train.jsonl
# 5. Regenerate SUMMARY.md after enrichment
okf summarize ./okf_bundle
Bundle Layout
The output mirrors your source tree — not flat buckets:
okf_bundle/
├── SUMMARY.md ← bird's-eye view for AI agents
├── index.md ← root navigation
├── log.md ← generation history
└── StockAI/
└── RnD/
└── python/
└── connectors/
├── index.md ← lists all concepts in this folder
├── economic_data.md ← Module concept
└── economic_data/
├── WorldBankConnector.md ← Class
├── get_indicator.md ← Function
└── search.md ← Function
Each file is OKF v0.1 conformant:
---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
- lang:python
- type:Class
- module:StockAI
- domain:RnD
- git:branch:main
- git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---
# WorldBankConnector
...signature, docstring, params, returns, methods, related concepts...
CLI Reference
okf generate
okf generate <source_dir> [output_dir]
Options:
--summarize <bundle_dir> Regenerate SUMMARY.md only (no re-scan)
Environment variables (LLM enrichment):
OKF_ENRICH=1 Enable LLM enrichment
OKF_BASE_URL OpenAI-compat base URL (default: https://api.anthropic.com/v1)
OKF_API_KEY API key
OKF_MODEL Model name (default: claude-sonnet-4-6)
OKF_MAX_WORKERS Parallel workers (default: 2)
okf lookup
okf lookup [query] [options]
Options:
--bundle PATH Bundle directory (default: ./okf_bundle)
--file PATH Filter by source file
--type TYPE Filter by concept type: Function | Class | Module
--tag TAG Filter by tag, repeatable: --tag lang:python
--limit N Max results (default: 10)
--compact One-line output per result
--json JSON output for programmatic use
--full Raw .md file content
--min-score N Minimum relevance score 0-1 (default: 0.1)
okf pairs
okf pairs <bundle_dir> [output_file]
Environment variables:
SKIP_SYNTH=1 Static pairs only (no LLM)
SYNTH_BASE_URL API endpoint
SYNTH_API_KEY API key
SYNTH_MODEL Model name
MAX_WORKERS Parallel workers (default: 3)
QA_PER_CONCEPT Q&A pairs per concept (default: 3)
PAIR_TYPES Comma-separated: codegen,qa,doc,summarize,crosslink
Supported Languages
| Language | Parser | Extracts |
|---|---|---|
| Python | stdlib ast |
Functions, classes, params, return types, docstrings |
| JavaScript / TypeScript | tree-sitter | Functions, arrow fns, classes, JSDoc |
| Go | tree-sitter | Funcs, methods, structs, interfaces, GoDoc |
| Java | tree-sitter | Classes, methods, constructors, Javadoc |
| Rust | tree-sitter | Fns, structs, enums, traits, impl blocks, /// |
| Ruby | tree-sitter | Defs, classes, modules, # comments |
LLM Enrichment
Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:
# Using a local llama.cpp server
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle
Enrichment is resumable — interrupt and rerun freely. Already-enriched concepts are skipped.
OpenCode Integration
# 1. Tell OpenCode about the bundle (auto-loaded every session)
cat >> AGENTS.md << 'EOF'
## OKF Knowledge Bundle
Before working on any class or function, look it up:
okf lookup --bundle ./okf_bundle <ConceptName>
EOF
# 2. Add a custom command
mkdir -p .opencode/commands
echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md
Then in OpenCode: /lookup NAME=WorldBankConnector
See docs/opencode-integration.md for full setup.
Python API
from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search
# Generate bundle
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})
# Search concepts
bundle = load_bundle("./okf_bundle")
results = search(bundle, tokens=["WorldBankConnector"])
print(results[0]["description"])
Training Data
Convert your OKF bundle into JSONL training pairs for fine-tuning:
# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl
Each pair is in chat format compatible with most fine-tuning pipelines.
Claude Skill
Install the skill in one step:
curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash
Or via pip:
pip install okf-generator && okf install-skill
Once installed, Claude Code automatically triggers the skill on phrases like:
"Index my codebase" → generates OKF bundle
"Look up WorldBankConnector" → returns exact concept
"Generate training pairs from my bundle" → outputs JSONL
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
git clone https://github.com/UmairBaig8/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/
Good first issues: adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.
License
MIT — Copyright © 2026 Umair Baig
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file okf_generator-0.1.8.tar.gz.
File metadata
- Download URL: okf_generator-0.1.8.tar.gz
- Upload date:
- Size: 36.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2030cd4cb2479258f36883b6d949e091adf770e585c431e0ebf0d21c9c13c234
|
|
| MD5 |
1ce56e7276e2a517557a8fcd9c86f1db
|
|
| BLAKE2b-256 |
6f3554aeafce5b6538fb66b5efe1c3cabe45dfa07b9aa75ea5c75d234c779759
|
Provenance
The following attestation bundles were made for okf_generator-0.1.8.tar.gz:
Publisher:
publish.yml on UmairBaig8/okf-generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
okf_generator-0.1.8.tar.gz -
Subject digest:
2030cd4cb2479258f36883b6d949e091adf770e585c431e0ebf0d21c9c13c234 - Sigstore transparency entry: 2011889437
- Sigstore integration time:
-
Permalink:
UmairBaig8/okf-generator@d6431c329f334ce598b2ec50b39f5bf2f60cdddf -
Branch / Tag:
refs/tags/v0.1.8 - Owner: https://github.com/UmairBaig8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d6431c329f334ce598b2ec50b39f5bf2f60cdddf -
Trigger Event:
push
-
Statement type:
File details
Details for the file okf_generator-0.1.8-py3-none-any.whl.
File metadata
- Download URL: okf_generator-0.1.8-py3-none-any.whl
- Upload date:
- Size: 37.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b991ba121e247d2166594d3ffbe70fb8e47ef0951dc6d1010ed232aa4ba3ee41
|
|
| MD5 |
9d84f749ff0d05e0891ff2eff51056a1
|
|
| BLAKE2b-256 |
46d01b4667904b87fcda239f9b7a7655d09cf716bd306940201470c2c05f8c84
|
Provenance
The following attestation bundles were made for okf_generator-0.1.8-py3-none-any.whl:
Publisher:
publish.yml on UmairBaig8/okf-generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
okf_generator-0.1.8-py3-none-any.whl -
Subject digest:
b991ba121e247d2166594d3ffbe70fb8e47ef0951dc6d1010ed232aa4ba3ee41 - Sigstore transparency entry: 2011889516
- Sigstore integration time:
-
Permalink:
UmairBaig8/okf-generator@d6431c329f334ce598b2ec50b39f5bf2f60cdddf -
Branch / Tag:
refs/tags/v0.1.8 - Owner: https://github.com/UmairBaig8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d6431c329f334ce598b2ec50b39f5bf2f60cdddf -
Trigger Event:
push
-
Statement type: