Skip to main content

Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration

Project description

okf-generator banner

PyPI version Python Tests License: MIT OKF v0.1 Claude Skill PRs Welcome

Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for AI agents like OpenCode.

Installation · Quick Start · CLI Reference · OpenCode Integration · Contributing


What is this?

okf-generator converts your source code into an Open Knowledge Format (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.

Instead of giving an AI your entire codebase, you give it exactly the concept it needs:

# Before touching WorldBankConnector, look it up
okf lookup WorldBankConnector

# CLASS: WorldBankConnector
# Source      : StockAI/RnD/python/connectors/economic_data.py  line 51
# Description : Fetches World Bank development indicators via wbdata API.
# Methods     : get_indicator, search
# Signature   : class WorldBankConnector

Features

  • 6 languages — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter)
  • Zero LLM required for extraction — deterministic, fast, offline-capable
  • OKF v0.1 conformant — type, description, resource, tags, timestamp
  • Domain/resource-path layout — bundle mirrors your source tree exactly
  • Resumable LLM enrichment — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
  • OpenCode integrationAGENTS.md + custom commands for pinpoint context injection
  • Training data pipeline — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
  • Claude Skill — install SKILL.md to trigger the full pipeline from natural language

Installation

# Core (extraction only — no LLM required)
pip install okf-generator

# With LLM enrichment + training pair generation
pip install okf-generator[llm]

Requirements: Python 3.11+

Quick Start

# 1. Generate a knowledge bundle from your codebase
okf generate ./my_project ./okf_bundle

# 2. Look up a concept (works instantly, zero LLM)
okf lookup WorldBankConnector

# 3. Find all concepts from one file
okf lookup --file src/connectors/economic_data.py

# 4. Generate training pairs from the bundle
okf pairs ./okf_bundle ./train.jsonl

# 5. Regenerate SUMMARY.md after enrichment
okf summarize ./okf_bundle

Bundle Layout

The output mirrors your source tree — not flat buckets:

okf_bundle/
├── SUMMARY.md                        ← bird's-eye view for AI agents
├── index.md                          ← root navigation
├── log.md                            ← generation history
└── StockAI/
    └── RnD/
        └── python/
            └── connectors/
                ├── index.md          ← lists all concepts in this folder
                ├── economic_data.md  ← Module concept
                └── economic_data/
                    ├── WorldBankConnector.md   ← Class
                    ├── get_indicator.md        ← Function
                    └── search.md               ← Function

Each file is OKF v0.1 conformant:

---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
  - lang:python
  - type:Class
  - module:StockAI
  - domain:RnD
  - git:branch:main
  - git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---

# WorldBankConnector

...signature, docstring, params, returns, methods, related concepts...

CLI Reference

okf generate

okf generate <source_dir> [output_dir]

Options:
  --summarize <bundle_dir>   Regenerate SUMMARY.md only (no re-scan)

Environment variables (LLM enrichment):
  OKF_ENRICH=1               Enable LLM enrichment
  OKF_BASE_URL               OpenAI-compat base URL (default: https://api.anthropic.com/v1)
  OKF_API_KEY                API key
  OKF_MODEL                  Model name (default: claude-sonnet-4-6)
  OKF_MAX_WORKERS            Parallel workers (default: 2)

okf lookup

okf lookup [query] [options]

Options:
  --bundle PATH     Bundle directory (default: ./okf_bundle)
  --file PATH       Filter by source file
  --type TYPE       Filter by concept type: Function | Class | Module
  --tag TAG         Filter by tag, repeatable: --tag lang:python
  --limit N         Max results (default: 10)
  --compact         One-line output per result
  --json            JSON output for programmatic use
  --full            Raw .md file content
  --min-score N     Minimum relevance score 0-1 (default: 0.1)

okf pairs

okf pairs <bundle_dir> [output_file]

Environment variables:
  SKIP_SYNTH=1          Static pairs only (no LLM)
  SYNTH_BASE_URL        API endpoint
  SYNTH_API_KEY         API key
  SYNTH_MODEL           Model name
  MAX_WORKERS           Parallel workers (default: 3)
  QA_PER_CONCEPT        Q&A pairs per concept (default: 3)
  PAIR_TYPES            Comma-separated: codegen,qa,doc,summarize,crosslink

Supported Languages

Language Parser Extracts
Python stdlib ast Functions, classes, params, return types, docstrings
JavaScript / TypeScript tree-sitter Functions, arrow fns, classes, JSDoc
Go tree-sitter Funcs, methods, structs, interfaces, GoDoc
Java tree-sitter Classes, methods, constructors, Javadoc
Rust tree-sitter Fns, structs, enums, traits, impl blocks, ///
Ruby tree-sitter Defs, classes, modules, # comments

LLM Enrichment

Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:

# Using a local llama.cpp server
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle

Enrichment is resumable — interrupt and rerun freely. Already-enriched concepts are skipped.

OpenCode Integration

# 1. Tell OpenCode about the bundle (auto-loaded every session)
cat >> AGENTS.md << 'EOF'
## OKF Knowledge Bundle
Before working on any class or function, look it up:
  okf lookup --bundle ./okf_bundle <ConceptName>
EOF

# 2. Add a custom command
mkdir -p .opencode/commands
echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md

Then in OpenCode: /lookup NAME=WorldBankConnector

See docs/opencode-integration.md for full setup.

Python API

from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search

# Generate bundle
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})

# Search concepts
bundle = load_bundle("./okf_bundle")
results = search(bundle, tokens=["WorldBankConnector"])
print(results[0]["description"])

Training Data

Convert your OKF bundle into JSONL training pairs for fine-tuning:

# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl

Each pair is in chat format compatible with most fine-tuning pipelines.

Claude Skill

Install SKILL.md to trigger the full pipeline from natural language inside Claude:

"Index my codebase" → generates OKF bundle
"Look up WorldBankConnector" → returns exact concept
"Generate training pairs from my bundle" → outputs JSONL

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/umairbaig/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/

Good first issues: adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.

License

MIT — Copyright © 2026 Umair Baig

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

okf_generator-0.1.3.tar.gz (35.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

okf_generator-0.1.3-py3-none-any.whl (33.6 kB view details)

Uploaded Python 3

File details

Details for the file okf_generator-0.1.3.tar.gz.

File metadata

  • Download URL: okf_generator-0.1.3.tar.gz
  • Upload date:
  • Size: 35.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okf_generator-0.1.3.tar.gz
Algorithm Hash digest
SHA256 fcd7733dbfd943bc352a533285f500a074cb739fd5bc5926e58f0b8b7acf58bb
MD5 e94f52e3b36da075964c6bc6a0cd5ac1
BLAKE2b-256 e07e39a35a8d945ecbc3844215adeb42c40ed98c2cf00d4407261d3dfda75797

See more details on using hashes here.

Provenance

The following attestation bundles were made for okf_generator-0.1.3.tar.gz:

Publisher: publish.yml on UmairBaig8/okf-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file okf_generator-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: okf_generator-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 33.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okf_generator-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6ecdc5e4517e193678d07d5f314db88bb59951454502774c99bf4fe3bc378cf6
MD5 60e8d75f651af05b926f5bbceaff1e21
BLAKE2b-256 d52be408fdb58171a1256e728e510ca0364cb0cebbdf695fd70a5e4d9100b28b

See more details on using hashes here.

Provenance

The following attestation bundles were made for okf_generator-0.1.3-py3-none-any.whl:

Publisher: publish.yml on UmairBaig8/okf-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page