Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration
Project description
Index any codebase into a structured knowledge bundle — then look up exact concepts for any AI coding agent.
Demo · Quick Start · Installation · Agent Integration · Supported Languages · How it compares · FAQ
Demo
Features
🧠 AI-agent ready — Claude, Cursor, Copilot, Windsurf, Cline, OpenCode — any agent reads the bundle instantly.
⚡ Zero LLM extraction — fully offline, deterministic, no API key needed.
🌍 7 languages + 15 manifest formats — Python, JS/TS, Go, Java, Rust, Ruby, SQL + pip, npm, cargo, go, maven, gradle, and more.
🔗 Cross-reference linker — imports → dependencies, function calls → caller/callee across all languages.
📦 Training data pipeline — convert any bundle to JSONL pairs for fine-tuning.
🔍 Instant lookup — find any function, class, or dependency in milliseconds.
Quick Start
# Install
pip install okf-generator
# Generate a bundle from your project
okf generate ./my_project ./okf_bundle
# Look up a concept (zero LLM, instant)
okf lookup WorldBankConnector
# List all pip dependencies
okf lookup --type Dependency --tag ecosystem:pip
Installation
One-liner:
curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash
Or via pip:
pip install okf-generator # core (offline extraction)
pip install "okf-generator[llm]" # with LLM enrichment + training pairs
Why this exists
AI coding agents waste enormous amounts of context re-reading entire files to find one function, class, or dependency version. Ask an agent "what does WorldBankConnector do?" and it either guesses from a stale memory, or burns thousands of tokens reading the whole file to find a 12-line answer.
okf-generator solves this by converting your source code into the Open Knowledge Format (OKF) v0.1 — a directory of small, structured markdown files, one per concept (function, class, module, dependency). An agent then asks a surgical question and gets a surgical answer:
# Before touching WorldBankConnector, look it up
okf lookup WorldBankConnector
# CLASS: WorldBankConnector
# Source : StockAI/RnD/python/connectors/economic_data.py line 51
# Description : Fetches World Bank development indicators via wbdata API.
# Methods : get_indicator, search
# Signature : class WorldBankConnector
No re-reading the file. No guessing. No LLM call required.
How it compares
The OKF ecosystem is moving fast — here's where okf-generator sits relative to other producers:
| okf-generator | Other OKF producers | |
|---|---|---|
| Language coverage | 7 languages (Python, JS/TS, Go, Java, Rust, Ruby, SQL) | Usually 1 language or doc-only |
| Cross-reference linking | Imports → dependencies, function calls → caller/callee across all languages | Not typically supported |
| Dependency/manifest parsing | 12 formats (pip, npm, cargo, go, maven, gradle, composer, rubygems, swiftpm, clojars, hex, +1) | Not typically supported |
| Extraction | Zero-LLM, deterministic, offline | Often LLM-required for every concept |
| Optional enrichment | Any OpenAI-compatible endpoint (Claude, local llama.cpp, Ollama) | Often locked to one vendor |
| Training data export | Built-in JSONL pair generator (5 pair types) | Not typically included |
| Agent compatibility | Any agent that can run a CLI (Claude Code, Cursor, Windsurf, Copilot, OpenCode, Cline) | Often single-agent focused |
If you're choosing between OKF producers: pick okf-generator when you want broad language + dependency coverage with zero mandatory LLM cost, and you want the bundle to double as a fine-tuning data source.
How it works
flowchart LR
A[Your codebase] -->|okf generate| B[Scanners: AST / tree-sitter / regex]
B --> C[Concepts: Function / Class / Module / Dependency]
C --> D[OKF Bundle: markdown + YAML frontmatter]
D -->|okf lookup| E[AI Agent]
D -->|okf pairs| F[JSONL training data]
Extraction is fully deterministic and offline-capable. LLM enrichment is an optional second pass, resumable on interrupt.
After scanning, a cross-reference linker builds two edge types:
- Imports → Dependencies — module imports matched against the dependency index.
- Calls → Callees — function call sites resolved to concept IDs.
Used by / Built for
okf-generator was originally built to index a large, multi-domain codebase (StockAI/TrainLLMs) spanning Python data connectors, ML pipelines, and SQL schemas — the kind of project where giving an agent the whole repo as context is both slow and unaffordable in tokens. If you're working in a sprawling codebase and tired of re-explaining your own code to your AI agent every session, this is the tool that problem was built to solve.
Bundle Layout
The output mirrors your source tree — dependencies get their own organized namespace:
okf_bundle/
├── SUMMARY.md ← bird's-eye view for AI agents
├── index.md ← root navigation
├── log.md ← generation history
├── _dependencies/ ← all dependency concepts
│ ├── index.md ← lists ecosystems: pip, npm, cargo, ...
│ ├── pip/
│ │ ├── index.md
│ │ ├── requests.md ← Dependency concept
│ │ └── flask.md
│ └── npm/
│ ├── index.md
│ ├── express.md
│ └── react.md
└── StockAI/
└── RnD/
└── python/
└── connectors/
├── index.md ← lists all concepts in this folder
├── economic_data.md ← Module concept
└── economic_data/
├── WorldBankConnector.md ← Class
├── get_indicator.md ← Function
└── search.md ← Function
Each file is OKF v0.1 conformant:
---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
- lang:python
- type:Class
- module:StockAI
- domain:RnD
- git:branch:main
- git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---
# WorldBankConnector
...signature, docstring, params, returns, methods, related concepts...
CLI Reference
Full documentation for every command:
okf --help Show available commands
okf <command> --help Show options for a specific command
okf --version Show version
| Command | Usage |
|---|---|
generate |
okf generate <source_dir> [output_dir] |
lookup |
okf lookup <query> |
diff |
okf diff <old_bundle> <new_bundle> |
pairs |
okf pairs <bundle_dir> [output_file] |
summarize |
okf summarize <bundle_dir> |
install |
okf install [claude | opencode | copilot | cursor | windsurf | cline] |
visualize |
okf visualize <bundle_dir> [output.html] |
See docs/cli-reference.md for full options, environment variables, and examples.
Supported Languages & Manifests
#supported-languages--manifests
Code Languages
| Language | Parser | Extracts |
|---|---|---|
| Python | stdlib ast |
Functions, classes, params, return types, docstrings |
| JavaScript / TypeScript | tree-sitter | Functions, arrow fns, classes, JSDoc |
| Go | tree-sitter | Funcs, methods, structs, interfaces, GoDoc |
| Java | tree-sitter | Classes, methods, constructors, Javadoc |
| Rust | tree-sitter | Fns, structs, enums, traits, impl blocks, /// |
| Ruby | tree-sitter | Defs, classes, modules, # comments |
| SQL | tree-sitter | Tables, views, functions, indexes, types, triggers with preceding --//* */ comments |
Manifest / Build Files
| Format | Parser | Extracts |
|---|---|---|
requirements.txt |
regex | pip package names + version constraints |
pyproject.toml |
tomllib |
PEP 621 deps + optional-dependencies + Poetry legacy |
package.json |
json |
npm/Node dependencies + devDependencies |
Cargo.toml |
tomllib |
Rust crate deps + dev/build-dependencies |
Cargo.lock |
tomllib |
Rust lockfile — pinned versions from [[package]] entries |
go.mod |
regex | Go module deps + // indirect flag |
go.sum |
regex | Go checksum lockfile — deduplicated module versions |
poetry.lock |
tomllib |
Python Poetry lockfile — [[package]] with dev category detection |
composer.json |
json |
PHP packages (skips php/ext-* platform entries) |
pom.xml |
xml.etree.ElementTree |
Maven dependencies + test/provided scope → dev |
Gemfile |
regex | Ruby gems + group :test/:development → dev |
build.gradle / .kts |
regex | Gradle deps (Groovy + Kotlin DSL) + testImplementation → dev |
Package.swift |
regex | SwiftPM packages from .package(url:from:) |
project.clj |
regex | Clojars deps + :dev profile |
mix.exs |
regex | Hex packages + only: :dev/:test → dev |
LLM Enrichment
Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:
# Using a local llama.cpp server
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle
Enrichment is resumable — interrupt and rerun freely. Already-enriched concepts are skipped.
AI Agent Integration
okf-generator works with any AI coding agent — the output is plain markdown files that every agent can read.
OpenCode / Claude Code
# Tell your agent about the bundle
cat >> AGENTS.md << 'EOF'
## OKF Knowledge Bundle
Before working on any class or function, look it up:
okf lookup --bundle ./okf_bundle <ConceptName>
EOF
# Add a custom command (OpenCode)
mkdir -p .opencode/commands
echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md
Then: /lookup NAME=WorldBankConnector
Cursor / Windsurf / Cline
Add to .cursorrules or agent instructions:
Before editing a function or class, run:
okf lookup --bundle ./okf_bundle <Name>
To see dependencies:
okf lookup --bundle ./okf_bundle --type Dependency
GitHub Copilot
Reference OKF bundle files in your /.github/copilot-instructions.md:
Project knowledge is indexed in ./okf_bundle/
- okf lookup <Name> returns full concept context
- okf lookup --type Dependency returns dependency info
Recommended system prompt
When setting up agent instructions, include:
This project has an OKF knowledge bundle at ./okf_bundle/.
- Use `okf lookup <Name>` to get full concept context.
- Use `okf lookup --type <Type>` to filter by type (Class, Function, Dependency).
- Use `okf lookup --tag ecosystem:<name>` for ecosystem-specific queries.
- Read `SUMMARY.md` for the full knowledge map.
Token efficiency
| Optimization | How okf-generator helps | Agent impact |
|---|---|---|
| Deterministic types | Every concept has type: Function, type: Class, type: Dependency |
Agent filters by type precisely |
| Incremental access | okf lookup <Name> returns one concept, not whole files |
Saves 80-95% token cost vs reading source |
| Structured metadata | Signature, params, returns in YAML frontmatter | Agent extracts info without parsing code |
| Cross-reference edges | Calls / Called By / Used By in each concept | Enables multi-hop reasoning without grep |
Any agent with RUN capability
#any-agent-with-run-capability
# Prime full context
cat ./okf_bundle/SUMMARY.md
# Look up a specific concept
okf lookup --bundle ./okf_bundle WorldBankConnector
# List dependencies
okf lookup --bundle ./okf_bundle --type Dependency
# JSON for programmatic agent use
okf lookup --bundle ./okf_bundle --json WorldBankConnector
See docs/opencode-integration.md for full OpenCode setup.
Python API
See docs/python-api.md for the full API reference.
from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})
Training Data
Convert your OKF bundle into JSONL training pairs for fine-tuning:
# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl
Each pair is in chat format compatible with most fine-tuning pipelines.
Agent Installation
Install integration for any AI agent in one command:
# Install for all detected agents
okf install all
# Or pick specific agents
okf install claude # Claude Code skill
okf install opencode # OpenCode /lookup command
okf install copilot # GitHub Copilot instructions
okf install cursor # Cursor rules
okf install windsurf # Windsurf rules
okf install cline # Cline rules
What each install does:
| Agent | Files created | Effect |
|---|---|---|
| Claude Code | ~/.config/opencode/skills/okf-generator/SKILL.md |
Auto-triggers on phrases like "index my codebase" |
| OpenCode | .opencode/commands/lookup.md |
/lookup NAME=<ConceptName> |
| Copilot | .github/copilot-instructions.md |
Auto-loaded in VS Code |
| Cursor | .cursorrules |
Auto-loaded by Cursor |
| Windsurf | .windsurfrules |
Auto-loaded by Windsurf |
| Cline | .clinerules |
Auto-loaded by Cline |
Or via the one-liner installer:
curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash
FAQ
Does this require an API key or internet connection?
No. Core extraction (okf generate) is fully offline and deterministic — no LLM call is made unless you explicitly enable OKF_ENRICH=1.
How is this different from RAG / vector search?
RAG retrieves chunks by semantic similarity, which is approximate and can miss exact symbols. okf lookup is exact: it indexes real functions, classes, modules, and dependencies by name and resolves to the precise concept, with zero embedding/vector infrastructure required.
What happens if my language isn't supported?
Unsupported files are skipped, not dropped silently from the bundle log — log.md records what was scanned. Adding a new language is a self-contained tree-sitter grammar mapping; see CONTRIBUTING.md for a starting point — it's a listed good-first-issue.
Does this work on monorepos / very large codebases?
Yes — the bundle mirrors your source tree, so scanning is linear in file count. For very large repos, scope okf generate to a subdirectory if you only need part of the codebase indexed.
Can I use this without any LLM at all, ever?
Yes. okf generate + okf lookup together form a complete, zero-LLM workflow. LLM enrichment and okf pairs synthesis are optional layers on top.
Is the bundle safe to commit to git? Yes, and that's the intended workflow — bundles are plain markdown, diff cleanly, and version alongside the code they describe.
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
git clone https://github.com/UmairBaig8/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/
Good first issues: adding a new language parser, improving fuzzy search scoring, adding incremental/diff-based regeneration.
Acknowledgments
okf-generator is an independent, third-party implementation of the Open Knowledge Format (OKF) v0.1, a knowledge-representation spec introduced by Google Cloud in June 2026. See the full v0.1 specification for the conformance rules this generator targets. This project is not built, maintained, or endorsed by Google.
License
MIT — Copyright © 2026 Umair Baig
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file okf_generator-0.1.22.tar.gz.
File metadata
- Download URL: okf_generator-0.1.22.tar.gz
- Upload date:
- Size: 72.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1e4c8f02e56272434537da09ba8bda44eeab02c22cdb8bda39201920c7a58d11
|
|
| MD5 |
340d09be0c1d761ceabcdd3005809454
|
|
| BLAKE2b-256 |
8ea67ba0849deed22d4c9fd05b9d976aa8d5dc3029aca7af4bcf7ad737605999
|
Provenance
The following attestation bundles were made for okf_generator-0.1.22.tar.gz:
Publisher:
publish.yml on UmairBaig8/okf-generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
okf_generator-0.1.22.tar.gz -
Subject digest:
1e4c8f02e56272434537da09ba8bda44eeab02c22cdb8bda39201920c7a58d11 - Sigstore transparency entry: 2040066847
- Sigstore integration time:
-
Permalink:
UmairBaig8/okf-generator@441b3364c13dd7abcce3fbca3c265ed24fc5809e -
Branch / Tag:
refs/tags/v0.1.22 - Owner: https://github.com/UmairBaig8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@441b3364c13dd7abcce3fbca3c265ed24fc5809e -
Trigger Event:
push
-
Statement type:
File details
Details for the file okf_generator-0.1.22-py3-none-any.whl.
File metadata
- Download URL: okf_generator-0.1.22-py3-none-any.whl
- Upload date:
- Size: 64.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4ae52ff534deaa7456cccbe1f5e67f258451e7f1c9dc4d8dbf77b35396ea6645
|
|
| MD5 |
864cae5e98852db1bcd1938757893a72
|
|
| BLAKE2b-256 |
5901c789ae35806052829aad5ae3656de4953057f0920af7957f8124c7c09588
|
Provenance
The following attestation bundles were made for okf_generator-0.1.22-py3-none-any.whl:
Publisher:
publish.yml on UmairBaig8/okf-generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
okf_generator-0.1.22-py3-none-any.whl -
Subject digest:
4ae52ff534deaa7456cccbe1f5e67f258451e7f1c9dc4d8dbf77b35396ea6645 - Sigstore transparency entry: 2040067075
- Sigstore integration time:
-
Permalink:
UmairBaig8/okf-generator@441b3364c13dd7abcce3fbca3c265ed24fc5809e -
Branch / Tag:
refs/tags/v0.1.22 - Owner: https://github.com/UmairBaig8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@441b3364c13dd7abcce3fbca3c265ed24fc5809e -
Trigger Event:
push
-
Statement type: