Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration

Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for any AI coding agent.

Installation · Quick Start · CLI Reference · AI Agent Integration · Contributing


What is this?

okf-generator converts your source code into an Open Knowledge Format (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.

Instead of giving an AI your entire codebase, you give it exactly the concept it needs:

# Before touching WorldBankConnector, look it up
okf lookup WorldBankConnector

# CLASS: WorldBankConnector
# Source      : StockAI/RnD/python/connectors/economic_data.py  line 51
# Description : Fetches World Bank development indicators via wbdata API.
# Methods     : get_indicator, search
# Signature   : class WorldBankConnector

Features

  • 7 languages — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter), SQL (dialect-tolerant regex)
  • 12 manifest formats: requirements.txt, pyproject.toml, package.json, Cargo.toml, go.mod, composer.json, pom.xml, Gemfile, build.gradle, Package.swift, project.clj, mix.exs; each dependency becomes a Dependency concept with ecosystem, version, and dev-flag metadata
  • Zero LLM required for extraction — deterministic, fast, offline-capable
  • OKF v0.1 conformant — type, description, resource, tags, timestamp
  • Domain/resource-path layout — bundle mirrors your source tree exactly; dependencies organized in _dependencies/{ecosystem}/ folders
  • Resumable LLM enrichment — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
  • Lookup cache — auto-caches parsed concepts; subsequent lookups skip re-parsing (~2x faster)
  • Any AI agent — OpenCode, Claude Code, Cursor, Windsurf, Cline, GitHub Copilot, and more
  • Training data pipeline — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
  • Claude Skill included — install SKILL.md to trigger the full pipeline from natural language

Installation

One-liner — paste into any terminal:

curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash

This installs okf-generator[llm] + the Claude Code skill in one shot.
Requirements: Python 3.11+ with pip.

Or manually:

# Core (extraction only — no LLM required)
pip install okf-generator

# With LLM enrichment + training pair generation
pip install "okf-generator[llm]"

Quick Start

# 1. Generate a knowledge bundle from your codebase
okf generate ./my_project ./okf_bundle

# 2. Look up a concept (works instantly, zero LLM)
okf lookup WorldBankConnector

# 3. Find all concepts from one file
okf lookup --file src/connectors/economic_data.py

# 4. Generate training pairs from the bundle
okf pairs ./okf_bundle ./train.jsonl

# 5. Regenerate SUMMARY.md after enrichment
okf summarize ./okf_bundle

Bundle Layout

The output mirrors your source tree — dependencies get their own organized namespace:

okf_bundle/
├── SUMMARY.md                        ← bird's-eye view for AI agents
├── index.md                          ← root navigation
├── log.md                            ← generation history
├── _dependencies/                    ← all dependency concepts
│   ├── index.md                      ← lists ecosystems: pip, npm, cargo, ...
│   ├── pip/
│   │   ├── index.md
│   │   ├── requests.md               ← Dependency concept
│   │   └── flask.md
│   └── npm/
│       ├── index.md
│       ├── express.md
│       └── react.md
└── StockAI/
    └── RnD/
        └── python/
            └── connectors/
                ├── index.md          ← lists all concepts in this folder
                ├── economic_data.md  ← Module concept
                └── economic_data/
                    ├── WorldBankConnector.md   ← Class
                    ├── get_indicator.md        ← Function
                    └── search.md               ← Function
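
The mapping from a source file to its bundle location can be read off the tree above. The sketch below reproduces that mapping with pathlib only; the helper names (`module_doc`, `concept_doc`, `dependency_doc`) are hypothetical and are not part of the okf package, and the rule itself is inferred from the example layout rather than from the generator's code.

```python
from pathlib import PurePosixPath


def module_doc(bundle: str, resource: str) -> PurePosixPath:
    """Module concept: same path as the source file, with a .md suffix."""
    return PurePosixPath(bundle) / PurePosixPath(resource).with_suffix(".md")


def concept_doc(bundle: str, resource: str, name: str) -> PurePosixPath:
    """Class/Function concept: lives in a folder named after the module."""
    src = PurePosixPath(resource)
    return PurePosixPath(bundle) / src.parent / src.stem / f"{name}.md"


def dependency_doc(bundle: str, ecosystem: str, package: str) -> PurePosixPath:
    """Dependency concept: grouped under _dependencies/{ecosystem}/."""
    return PurePosixPath(bundle) / "_dependencies" / ecosystem / f"{package}.md"


if __name__ == "__main__":
    src = "StockAI/RnD/python/connectors/economic_data.py"
    print(module_doc("okf_bundle", src))
    print(concept_doc("okf_bundle", src, "WorldBankConnector"))
    print(dependency_doc("okf_bundle", "pip", "requests"))
```

An agent without shell access could use a mapping like this to open the right .md file directly instead of calling okf lookup.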

Each file is OKF v0.1 conformant:

---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
  - lang:python
  - type:Class
  - module:StockAI
  - domain:RnD
  - git:branch:main
  - git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---

# WorldBankConnector

...signature, docstring, params, returns, methods, related concepts...
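
Because each concept is plain markdown with a front-matter header, it can be read without okf installed. The following is a minimal sketch, assuming the header only uses flat `key: value` pairs and simple `- item` lists as in the example above; it is not a general YAML parser. `parse_front_matter` and `missing_fields` are hypothetical names, and treating all five fields from the Features list as required is an assumption.

```python
# Fields the Features list names for OKF v0.1 (treated as required here by assumption).
REQUIRED_FIELDS = ("type", "description", "resource", "tags", "timestamp")


def parse_front_matter(text: str):
    """Split a concept file into (metadata dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta, key = {}, None
    for i, line in enumerate(lines[1:], start=1):
        stripped = line.strip()
        if stripped == "---":  # closing delimiter: the rest is the body
            return meta, "\n".join(lines[i + 1:])
        if stripped.startswith("- ") and isinstance(meta.get(key), list):
            meta[key].append(stripped[2:].strip())  # list item under the last key
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip().strip("'\"")
            meta[key] = value if value else []  # empty value starts a list
    return {}, text  # no closing delimiter found


def missing_fields(meta: dict) -> list:
    """Return the expected fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not meta.get(f)]
```

Splitting on the first colon only is what keeps tags such as `git:branch:main` and the quoted timestamp intact.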

CLI Reference

okf generate

okf generate <source_dir> [output_dir]

Options:
  --summarize <bundle_dir>   Regenerate SUMMARY.md only (no re-scan)

Environment variables (LLM enrichment):
  OKF_ENRICH=1               Enable LLM enrichment
  OKF_BASE_URL               OpenAI-compat base URL (default: https://api.anthropic.com/v1)
  OKF_API_KEY                API key
  OKF_MODEL                  Model name (default: claude-sonnet-4-6)
  OKF_MAX_WORKERS            Parallel workers (default: 2)

okf lookup

okf lookup [query] [options]

Options:
  --bundle PATH     Bundle directory (default: ./okf_bundle)
  --file PATH       Filter by source file
  --type TYPE       Filter by concept type: Function | Class | Module | Dependency
  --tag TAG         Filter by tag, repeatable: --tag lang:python or --tag ecosystem:npm
  --limit N         Max results (default: 10)
  --compact         One-line output per result
  --json            JSON output for programmatic use
  --full            Raw .md file content
  --min-score N     Minimum relevance score 0-1 (default: 0.1)
  --no-cache        Bypass the lookup cache: neither read from it nor write to it
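
To make the interaction between --min-score and --limit concrete, here is a small sketch of score-then-filter ranking. It uses difflib.SequenceMatcher from the standard library as a stand-in similarity measure; the scoring okf lookup actually uses is not described here, so treat `rank` as a hypothetical illustration only. The defaults mirror the documented ones (0.1 and 10).

```python
from difflib import SequenceMatcher


def rank(names, query, min_score=0.1, limit=10):
    """Score each name against the query, drop weak matches, keep the best few."""
    q = query.lower()
    scored = [(name, SequenceMatcher(None, q, name.lower()).ratio()) for name in names]
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return kept[:limit]


if __name__ == "__main__":
    concepts = ["WorldBankConnector", "get_indicator", "search"]
    for name, score in rank(concepts, "worldbank"):
        print(f"{score:.2f}  {name}")
```

Raising min_score trims loosely related results before limit is applied, which is why the two options are worth tuning together.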

okf pairs

okf pairs <bundle_dir> [output_file]

Environment variables:
  SKIP_SYNTH=1          Static pairs only (no LLM)
  SYNTH_BASE_URL        API endpoint
  SYNTH_API_KEY         API key
  SYNTH_MODEL           Model name
  MAX_WORKERS           Parallel workers (default: 3)
  QA_PER_CONCEPT        Q&A pairs per concept (default: 3)
  PAIR_TYPES            Comma-separated: codegen,qa,doc,summarize,crosslink

Supported Languages & Manifests

Code Languages

| Language | Parser | Extracts |
| --- | --- | --- |
| Python | stdlib ast | Functions, classes, params, return types, docstrings |
| JavaScript / TypeScript | tree-sitter | Functions, arrow fns, classes, JSDoc |
| Go | tree-sitter | Funcs, methods, structs, interfaces, GoDoc |
| Java | tree-sitter | Classes, methods, constructors, Javadoc |
| Rust | tree-sitter | Fns, structs, enums, traits, impl blocks, /// doc comments |
| Ruby | tree-sitter | Defs, classes, modules, # comments |
| SQL | regex (dialect-tolerant) | CREATE TABLE/VIEW/FUNCTION/PROCEDURE/INDEX, preceding -- and /* */ comments |
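
As an illustration of what deterministic, LLM-free extraction looks like for Python, the sketch below walks a module with the standard ast library and collects the fields listed for Python above. It is a simplified stand-in, not the generator's own code; `extract_concepts` and the dictionary keys are hypothetical.

```python
import ast


def extract_concepts(source: str, filename: str = "<string>") -> list:
    """Collect classes and functions with line, docstring, params, return type."""
    concepts = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ClassDef):
            methods = [
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            concepts.append({
                "type": "Class",
                "name": node.name,
                "line": node.lineno,
                "description": ast.get_docstring(node) or "",
                "methods": methods,
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            concepts.append({
                "type": "Function",
                "name": node.name,
                "line": node.lineno,
                "description": ast.get_docstring(node) or "",
                "params": [arg.arg for arg in node.args.args],
                # ast.unparse turns the annotation node back into source text
                "returns": ast.unparse(node.returns) if node.returns else None,
            })
    return concepts
```

Since ast.walk visits nested nodes too, methods show up both in the class's `methods` list and as their own Function entries, much like the one-file-per-method layout shown under Bundle Layout.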

Manifest / Build Files

| Format | Parser | Extracts |
| --- | --- | --- |
| requirements.txt | regex | pip package names + version constraints |
| pyproject.toml | tomllib | PEP 621 deps + optional-dependencies + Poetry legacy |
| package.json | json | npm/Node dependencies + devDependencies |
| Cargo.toml | tomllib | Rust crate deps + dev/build-dependencies |
| go.mod | regex | Go module deps + // indirect flag |
| composer.json | json | PHP packages (skips php/ext-* platform entries) |
| pom.xml | xml.etree.ElementTree | Maven dependencies + test/provided scope → dev |
| Gemfile | regex | Ruby gems + group :test/:development → dev |
| build.gradle / .kts | regex | Gradle deps (Groovy + Kotlin DSL) + testImplementation → dev |
| Package.swift | regex | SwiftPM packages from .package(url:from:) |
| project.clj | regex | Clojars deps + :dev profile |
| mix.exs | regex | Hex packages + only: :dev/:test → dev |
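
For a sense of how a regex-based manifest parser can turn lines into Dependency records, here is a minimal sketch for requirements.txt. The pattern and the `parse_requirements` helper are illustrative assumptions, not the package's actual regex; the sketch skips comments and option lines (such as -r or -e) and ignores extras and environment markers.

```python
import re

# name, optional [extras], then the version constraint up to a marker or comment
_REQ_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*([^;#]*)")


def parse_requirements(text: str) -> list:
    """Return one record per requirement line with name, version, ecosystem."""
    deps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue  # blank line, comment, or pip option
        match = _REQ_LINE.match(line)
        if match:
            deps.append({
                "name": match.group(1),
                "version": match.group(2).strip(),
                "ecosystem": "pip",
            })
    return deps
```

Each record carries the ecosystem so it could be filed under a folder such as _dependencies/pip/.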

LLM Enrichment

Works with any OpenAI-compatible endpoint (Claude, Ollama, llama.cpp, etc.):

# Using a local llama.cpp server
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle

Enrichment is resumable — interrupt and rerun freely. Already-enriched concepts are skipped.
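
The same invocation can be driven from a Python script, for example in CI. This sketch only sets the environment variables documented under CLI Reference and shells out to okf generate; `enrichment_env` and `generate_enriched` are hypothetical helper names, and running the second one requires okf to be installed and on PATH.

```python
import os
import subprocess


def enrichment_env(base_url, api_key, model, max_workers=2, base_env=None):
    """Copy the current environment and add the OKF_* enrichment settings."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "OKF_ENRICH": "1",
        "OKF_BASE_URL": base_url,
        "OKF_API_KEY": api_key,
        "OKF_MODEL": model,
        "OKF_MAX_WORKERS": str(max_workers),  # env values must be strings
    })
    return env


def generate_enriched(source_dir, bundle_dir, **settings):
    """Run `okf generate` with enrichment enabled; raises if the CLI fails."""
    return subprocess.run(
        ["okf", "generate", source_dir, bundle_dir],
        env=enrichment_env(**settings),
        check=True,
    )
```

Because enrichment is described as resumable, a wrapper like this can simply be called again after an interruption.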

AI Agent Integration

okf-generator works with any AI coding agent — the output is plain markdown files that every agent can read.

OpenCode / Claude Code

# Tell your agent about the bundle
cat >> AGENTS.md << 'EOF'
## OKF Knowledge Bundle
Before working on any class or function, look it up:
  okf lookup --bundle ./okf_bundle <ConceptName>
EOF

# Add a custom command (OpenCode)
mkdir -p .opencode/commands
echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md

Then: /lookup NAME=WorldBankConnector

Cursor / Windsurf / Cline

Add to .cursorrules or agent instructions:

Before editing a function or class, run:
  okf lookup --bundle ./okf_bundle <Name>
To see dependencies:
  okf lookup --bundle ./okf_bundle --type Dependency

GitHub Copilot

Reference OKF bundle files in your .github/copilot-instructions.md:

Project knowledge is indexed in ./okf_bundle/
  - okf lookup <Name> returns full concept context
  - okf lookup --type Dependency returns dependency info

Any agent with RUN capability

# Prime full context
cat ./okf_bundle/SUMMARY.md

# Look up a specific concept
okf lookup --bundle ./okf_bundle WorldBankConnector

# List dependencies
okf lookup --bundle ./okf_bundle --type Dependency

# JSON for programmatic agent use
okf lookup --bundle ./okf_bundle --json WorldBankConnector
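
For agents or scripts that prefer structured data, the --json form can be wrapped in a few lines of Python. This is a sketch under assumptions: it uses only flags listed under CLI Reference, the helper names are hypothetical, and the shape of the JSON that okf prints is not specified here, so the result is returned as parsed without interpreting it.

```python
import json
import subprocess


def build_lookup_cmd(query=None, bundle="./okf_bundle", concept_type=None, limit=None):
    """Assemble the argument list for `okf lookup --json`."""
    cmd = ["okf", "lookup", "--bundle", bundle, "--json"]
    if concept_type:
        cmd += ["--type", concept_type]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if query:
        cmd.append(query)
    return cmd


def lookup(query=None, **options):
    """Run the lookup and parse stdout as JSON (requires okf on PATH)."""
    proc = subprocess.run(
        build_lookup_cmd(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return json.loads(proc.stdout)
```

Passing the arguments as a list avoids shell quoting issues when a concept name contains spaces or special characters.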

See docs/opencode-integration.md for full OpenCode setup.

Python API

from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search

# Generate bundle
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})

# Search concepts
bundle = load_bundle("./okf_bundle")
results = search(bundle, tokens=["WorldBankConnector"])
print(results[0]["description"])

Training Data

Convert your OKF bundle into JSONL training pairs for fine-tuning:

# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl

Each pair is in chat format compatible with most fine-tuning pipelines.
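
Before feeding the file to a fine-tuning job, it can help to sanity-check it. The sketch below reads the JSONL one record per line and counts message roles. It assumes each record holds a `messages` list of `{"role", "content"}` objects, which is a common chat layout but is not confirmed here; check a line of your own train.jsonl first. The function names are hypothetical.

```python
import json
from collections import Counter


def read_pairs(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{number}: invalid JSON") from exc


def summarize_pairs(path):
    """Return (record count, Counter of roles), assuming a `messages` key."""
    total, roles = 0, Counter()
    for record in read_pairs(path):
        total += 1
        for message in record.get("messages", []):  # assumed field name
            roles[message.get("role", "unknown")] += 1
    return total, roles
```

A record count of zero or an unexpected role distribution is a quick signal that the pair generation step did not produce what the training pipeline expects.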

Claude Skill

Install the skill in one step:

curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash

Or via pip:

pip install okf-generator && okf install-skill

Once installed, Claude Code automatically triggers the skill on phrases like:

"Index my codebase" → generates OKF bundle
"Look up WorldBankConnector" → returns exact concept
"Generate training pairs from my bundle" → outputs JSONL

The same .md output works with any agent — no vendor lock-in. Point Cursor, Windsurf, Cline, or Copilot at your bundle and they get the same structured knowledge.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/UmairBaig8/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/

Good first issues: adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.

License

MIT — Copyright © 2026 Umair Baig
