Skip to main content

Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration

Project description

okf-generator banner

PyPI version Python Tests License: MIT OKF v0.1 Claude Skill OpenCode Cursor Windsurf PRs Welcome

Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for any AI coding agent.

Installation · Quick Start · CLI Reference · AI Agent Integration · Contributing


What is this?

okf-generator converts your source code into an Open Knowledge Format (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.

Instead of giving an AI your entire codebase, you give it exactly the concept it needs:

# Before touching WorldBankConnector, look it up
okf lookup WorldBankConnector

# CLASS: WorldBankConnector
# Source      : StockAI/RnD/python/connectors/economic_data.py  line 51
# Description : Fetches World Bank development indicators via wbdata API.
# Methods     : get_indicator, search
# Signature   : class WorldBankConnector

Features

  • 7 languages — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter), SQL (dialect-tolerant regex)
  • Zero LLM required for extraction — deterministic, fast, offline-capable
  • OKF v0.1 conformant — type, description, resource, tags, timestamp
  • Domain/resource-path layout — bundle mirrors your source tree exactly
  • Resumable LLM enrichment — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
  • Any AI agent — OpenCode, Claude Code, Cursor, Windsurf, Cline, GitHub Copilot, and more
  • Training data pipeline — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
  • Claude Skill included — install SKILL.md to trigger the full pipeline from natural language

Installation

One-liner — paste into any terminal:

curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash

This installs okf-generator[llm] + the Claude Code skill in one shot.
Requirements: Python 3.11+ with pip.

Or manually:

# Core (extraction only — no LLM required)
pip install okf-generator

# With LLM enrichment + training pair generation
pip install "okf-generator[llm]"

Quick Start

# 1. Generate a knowledge bundle from your codebase
okf generate ./my_project ./okf_bundle

# 2. Look up a concept (works instantly, zero LLM)
okf lookup WorldBankConnector

# 3. Find all concepts from one file
okf lookup --file src/connectors/economic_data.py

# 4. Generate training pairs from the bundle
okf pairs ./okf_bundle ./train.jsonl

# 5. Regenerate SUMMARY.md after enrichment
okf summarize ./okf_bundle

Bundle Layout

The output mirrors your source tree — not flat buckets:

okf_bundle/
├── SUMMARY.md                        ← bird's-eye view for AI agents
├── index.md                          ← root navigation
├── log.md                            ← generation history
└── StockAI/
    └── RnD/
        └── python/
            └── connectors/
                ├── index.md          ← lists all concepts in this folder
                ├── economic_data.md  ← Module concept
                └── economic_data/
                    ├── WorldBankConnector.md   ← Class
                    ├── get_indicator.md        ← Function
                    └── search.md               ← Function

Each file is OKF v0.1 conformant:

---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
  - lang:python
  - type:Class
  - module:StockAI
  - domain:RnD
  - git:branch:main
  - git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---

# WorldBankConnector

...signature, docstring, params, returns, methods, related concepts...

CLI Reference

okf generate

okf generate <source_dir> [output_dir]

Options:
  --summarize <bundle_dir>   Regenerate SUMMARY.md only (no re-scan)

Environment variables (LLM enrichment):
  OKF_ENRICH=1               Enable LLM enrichment
  OKF_BASE_URL               OpenAI-compat base URL (default: https://api.anthropic.com/v1)
  OKF_API_KEY                API key
  OKF_MODEL                  Model name (default: claude-sonnet-4-6)
  OKF_MAX_WORKERS            Parallel workers (default: 2)

okf lookup

okf lookup [query] [options]

Options:
  --bundle PATH     Bundle directory (default: ./okf_bundle)
  --file PATH       Filter by source file
  --type TYPE       Filter by concept type: Function | Class | Module
  --tag TAG         Filter by tag, repeatable: --tag lang:python
  --limit N         Max results (default: 10)
  --compact         One-line output per result
  --json            JSON output for programmatic use
  --full            Raw .md file content
  --min-score N     Minimum relevance score 0-1 (default: 0.1)
  --no-cache        Bypass and skip writing the lookup cache

okf pairs

okf pairs <bundle_dir> [output_file]

Environment variables:
  SKIP_SYNTH=1          Static pairs only (no LLM)
  SYNTH_BASE_URL        API endpoint
  SYNTH_API_KEY         API key
  SYNTH_MODEL           Model name
  MAX_WORKERS           Parallel workers (default: 3)
  QA_PER_CONCEPT        Q&A pairs per concept (default: 3)
  PAIR_TYPES            Comma-separated: codegen,qa,doc,summarize,crosslink

Supported Languages

Language Parser Extracts
Python stdlib ast Functions, classes, params, return types, docstrings
JavaScript / TypeScript tree-sitter Functions, arrow fns, classes, JSDoc
Go tree-sitter Funcs, methods, structs, interfaces, GoDoc
Java tree-sitter Classes, methods, constructors, Javadoc
Rust tree-sitter Fns, structs, enums, traits, impl blocks, ///
Ruby tree-sitter Defs, classes, modules, # comments
SQL regex (dialect-tolerant) CREATE TABLE/VIEW/FUNCTION/PROCEDURE/INDEX, preceding --//* */ comments

LLM Enrichment

Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:

# Using a local llama.cpp server
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle

Enrichment is resumable — interrupt and rerun freely. Already-enriched concepts are skipped.

AI Agent Integration

okf-generator works with any AI coding agent — the output is plain markdown files that every agent can read.

OpenCode / Claude Code

# Tell your agent about the bundle
cat >> AGENTS.md << 'EOF'
## OKF Knowledge Bundle
Before working on any class or function, look it up:
  okf lookup --bundle ./okf_bundle <ConceptName>
EOF

# Add a custom command (OpenCode)
mkdir -p .opencode/commands
echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md

Then: /lookup NAME=WorldBankConnector

Cursor / Windsurf / Cline

Add to .cursorrules or agent instructions:

Before editing a function or class, run:
  okf lookup --bundle ./okf_bundle <Name>
To see dependencies:
  okf lookup --bundle ./okf_bundle --type Dependency

GitHub Copilot

Reference OKF bundle files in your /.github/copilot-instructions.md:

Project knowledge is indexed in ./okf_bundle/
  - okf lookup <Name> returns full concept context
  - okf lookup --type Dependency returns dependency info

Any agent with RUN capability

# Prime full context
cat ./okf_bundle/SUMMARY.md

# Look up a specific concept
okf lookup --bundle ./okf_bundle WorldBankConnector

# List dependencies
okf lookup --bundle ./okf_bundle --type Dependency

# JSON for programmatic agent use
okf lookup --bundle ./okf_bundle --json WorldBankConnector

See docs/opencode-integration.md for full OpenCode setup.

Python API

from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search

# Generate bundle
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})

# Search concepts
bundle = load_bundle("./okf_bundle")
results = search(bundle, tokens=["WorldBankConnector"])
print(results[0]["description"])

Training Data

Convert your OKF bundle into JSONL training pairs for fine-tuning:

# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl

Each pair is in chat format compatible with most fine-tuning pipelines.

Claude Skill

Install the skill in one step:

curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash

Or via pip:

pip install okf-generator && okf install-skill

Once installed, Claude Code automatically triggers the skill on phrases like:

"Index my codebase" → generates OKF bundle
"Look up WorldBankConnector" → returns exact concept
"Generate training pairs from my bundle" → outputs JSONL

The same .md output works with any agent — no vendor lock-in. Point Cursor, Windsurf, Cline, or Copilot at your bundle and they get the same structured knowledge.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/UmairBaig8/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/

Good first issues: adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.

License

MIT — Copyright © 2026 Umair Baig

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

okf_generator-0.1.11.tar.gz (54.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

okf_generator-0.1.11-py3-none-any.whl (45.8 kB view details)

Uploaded Python 3

File details

Details for the file okf_generator-0.1.11.tar.gz.

File metadata

  • Download URL: okf_generator-0.1.11.tar.gz
  • Upload date:
  • Size: 54.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okf_generator-0.1.11.tar.gz
Algorithm Hash digest
SHA256 d68152fc12b7c200729469171c8894f8ddfd37b31fffc1b301ad1609d3c89422
MD5 c7bb56d7d79c1ab49fd5bae379169465
BLAKE2b-256 1dc23b3338583c385619d35d555d2e92194c306bca410be6b965d576514bb33e

See more details on using hashes here.

Provenance

The following attestation bundles were made for okf_generator-0.1.11.tar.gz:

Publisher: publish.yml on UmairBaig8/okf-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file okf_generator-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: okf_generator-0.1.11-py3-none-any.whl
  • Upload date:
  • Size: 45.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okf_generator-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 3be095d717638965e521bbb82cb59846f38af4a0da5f18a552296832b16fef61
MD5 b647bbd799cd4a8a33f4d14796172a87
BLAKE2b-256 61b54c9816167eb49bbeab74f96767049e888045919554d540af9fa9b5882bdb

See more details on using hashes here.

Provenance

The following attestation bundles were made for okf_generator-0.1.11-py3-none-any.whl:

Publisher: publish.yml on UmairBaig8/okf-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page