Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration
Project description
Map any codebase into an interactive knowledge graph — for AI agents, local SLMs, and human architectural review.
Installation · Quick Start · Architecture · Agents · Local AI · CI/CD · Languages · FAQ
Visual Showcase
okf generate scans any repo using tree-sitter AST parsers, resolves cross-references across 10 languages, and outputs a structured knowledge graph. Explore it interactively or consume it programmatically — no LLM required.
# Generate a knowledge bundle from any codebase
okf generate ./my_project ./okf_bundle
# Explore as an interactive HTML dashboard
okf visualize ./okf_bundle
# Browse via local HTTP
okf serve ./okf_bundle --open
# Look up any concept in milliseconds
okf lookup WorldBankConnector
Quick Start
# Install
pip install okf-generator
# Generate a bundle from your project
okf generate ./my_project ./okf_bundle
# Look up a concept (zero LLM, instant)
okf lookup WorldBankConnector
# List all dependencies
okf lookup --deps
# Interactive bundle setup wizard
okf init
# Visualize as interactive HTML
okf visualize ./okf_bundle
Installation
# One-liner (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/UmairBaig8/okf-generator/main/scripts/install.sh | bash
# Or via pip
pip install okf-generator # core (offline extraction)
pip install "okf-generator[llm]" # with LLM enrichment + training pairs
Why — Code-Level Knowledge Graphs
AI coding agents waste enormous amounts of context re-reading entire files to find one function signature or dependency version. Cloud models with 200K token windows mask this cost; local SLMs (Gemma, Llama, Phi) on a MacBook run out of memory immediately.
okf-generator solves this by converting source code into a deterministic, cross-referenced knowledge graph. Using tree-sitter AST parsers across 10 languages, every function, class, module, and dependency becomes a structured node with typed edges (calls, called-by, imports, depends-on).
# Before touching WorldBankConnector, get its full graph context
okf lookup WorldBankConnector
CLASS: WorldBankConnector
Source : StockAI/RnD/python/connectors/economic_data.py line 51
Description : Fetches World Bank development indicators via wbdata API.
Methods : get_indicator, search
Signature : class WorldBankConnector
Calls : [wbdata.get_indicator, pandas.DataFrame]
Called-by : [DataPipeline.fetch_economic]
No re-reading the file. No guessing. No LLM call required.
How It Works
1. Scan — tree-sitter AST parsers extract every function, class, method, and module with signature, params, docstring, and return types across 10 languages.
2. Link — the cross-reference linker resolves two edge types:
- Imports → Dependencies — module imports matched against the dependency index.
- Calls → Callees — function call sites resolved to concept IDs.
3. Write — outputs an OKF v0.1 bundle: structured markdown files (one per concept) mirroring the source tree.
4. Consume — 8 commands: lookup, pairs, diff, visualize, mcp, serve, init, summarize.
LLM enrichment is optional, resumable, and works with any OpenAI-compatible endpoint (Claude, Ollama, llama.cpp). Extraction itself is fully deterministic and offline-capable.
Used by / Built for
okf-generator was originally built to index a large, multi-domain codebase (StockAI/TrainLLMs) spanning Python data connectors, ML pipelines, and SQL schemas — the kind of project where giving an agent the whole repo as context is both slow and unaffordable in tokens. If you are working in a sprawling codebase and tired of re-explaining your own code to your AI agent every session, this is the tool that problem was built to solve.
Bundle at a Glance
The output mirrors your source tree — dependencies get their own organized namespace:
okf_bundle/
├── SUMMARY.md ← bird's-eye view for AI agents
├── index.md ← root navigation
├── log.md ← generation history
├── _dependencies/ ← all dependency concepts
│ ├── index.md ← lists ecosystems: pip, npm, cargo, ...
│ ├── pip/
│ │ ├── index.md
│ │ ├── requests.md ← Dependency concept
│ │ └── flask.md
│ └── npm/
│ ├── index.md
│ ├── express.md
│ └── react.md
└── StockAI/
└── RnD/
└── python/
└── connectors/
├── index.md ← lists all concepts in this folder
├── economic_data.md ← Module concept
└── economic_data/
├── WorldBankConnector.md ← Class
├── get_indicator.md ← Function
└── search.md ← Function
Each file is OKF v0.1 conformant with YAML frontmatter:
---
type: Class
title: WorldBankConnector
description: Fetches World Bank development indicators via wbdata API.
resource: StockAI/RnD/python/connectors/economic_data.py
tags:
- lang:python
- type:Class
- module:StockAI
- domain:RnD
- git:branch:main
- git:repo:TrainLLMs
timestamp: '2026-05-23T09:01:21Z'
---
Interactive Visualization
okf visualize generates a self-contained HTML dashboard — no server, no installation, works offline:
okf visualize ./okf_bundle ./viz.html
# Open viz.html in any browser
The visualization uses D3.js with:
- Force-directed graph — color-coded nodes by concept type (Class, Function, Module, Dependency)
- Relationship edges — calls, called-by, imports, related
- Search/filter — by name, type, ecosystem
- Tooltip on hover — description + resource location
- Pan/zoom — navigate large graphs
- Dark/light theme — toggle at runtime
Multi-bundle monorepo support
If your bundle contains sub-bundles (detected by SUMMARY.md in subdirectories), the viz adds a bundle selector dropdown in the topbar to filter by project. Each sub-bundle's dependencies and source files are scoped under its own namespace.
# Combined viz with bundle switcher (cross-bundle edges preserved)
okf visualize ./okf_bundle
# Standalone viz per sub-bundle (smaller, faster)
okf visualize ./okf_bundle/AgentBox agentbox.html
okf visualize ./okf_bundle/StockAI stockai.html
The bundled viz is ideal for exploring relationships across projects; per-bundle viz files are better for focused navigation on a single project.
For AI Agents
Every concept in the bundle is deterministic, typed, and cross-referenced — agents get surgical precision without burning context:
| Capability | How |
|---|---|
| Zero-LLM lookups | okf lookup <Name> returns full concept detail in milliseconds |
| Type filters | `okf lookup --type Function |
| Ecosystem queries | okf lookup --tag ecosystem:pip |
| Source file queries | okf lookup --file path/to/file.py |
| JSON output | okf lookup --json <Name> for programmatic agent use |
| MCP protocol | okf mcp ./okf_bundle exposes via Model Context Protocol |
| Summary map | cat ./okf_bundle/SUMMARY.md primes full context |
Quick setup for any agent:
Add to your agent instructions or custom rules:
This project has an OKF knowledge bundle at ./okf_bundle/.
- Use `okf lookup <Name>` for full concept context.
- Use `okf lookup --type <Type>` to filter by type.
- Read `SUMMARY.md` for the full knowledge map.
Token efficiency
| Optimization | Agent impact |
|---|---|
| Incremental access — one concept, not whole files | Saves 80-95% token cost vs reading source |
| Structured metadata in YAML frontmatter | Agent extracts info without parsing code |
| Cross-reference edges (calls/called-by) | Multi-hop reasoning without grep |
| Deterministic types | Agent filters by type precisely |
Full agent integration guide — OpenCode commands, Cursor rules, Copilot instructions, MCP setup: docs/agent-integration.md
Automated agent setup —
okf install claude,okf install opencode,okf install cursor, etc: see Agent Installation.
For Local AI & SLMs
Cloud models have massive context windows. Local SLMs (Gemma 3 4B, Llama 3.2, Phi-3) running on a MacBook Pro or Air do not — they run out of memory if you try to feed an entire repository.
okf lookup solves this with exact-symbol retrieval: the agent sends a 50-token query and gets back a 200-token concept card. No embeddings, no vector DB, no RAG pipeline. This makes local coding assistants viable for enterprise-scale codebases.
# Enrichment with a local llama.cpp server (MacBook-friendly)
OKF_ENRICH=1 \
OKF_BASE_URL="http://localhost:8080/v1" \
OKF_API_KEY="llamabarn" \
OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
OKF_MAX_WORKERS=2 \
okf generate ./my_project ./okf_bundle
Enrichment works with any OpenAI-compatible endpoint — Ollama, llama.cpp, vLLM, or cloud APIs (Claude, GPT). It is resumable: interrupt and rerun freely, already-enriched concepts are skipped.
For CI/CD Pipelines
Deterministic + fully offline = ideal for automated pipelines:
# .github/workflows/okf-bundle.yml
name: Generate OKF Bundle
on:
push:
branches: [main]
jobs:
generate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install okf-generator
- run: okf generate ./src ./okf_bundle
- uses: actions/upload-artifact@v4
with:
name: okf-bundle
path: ./okf_bundle
Push bundles to S3/GCS/Azure for centralized multi-tenant access. Serve them as static websites for zero-infrastructure browsing.
Full CI/CD guide — GitLab, pre-commit hooks, S3 static hosting, monorepo strategies: docs/ci-cd.md
Language & Manifest Coverage
Code Languages (10)
| Language | Parser | Extracts |
|---|---|---|
| Python | stdlib ast |
Functions, classes, methods, params, return types, docstrings, decorators, inheritance, type params |
| JavaScript / TypeScript | tree-sitter | Functions, arrow fns, classes, methods, JSDoc, generics, heritage (extends/implements) |
| Go | tree-sitter | Funcs, methods, structs, interfaces, GoDoc, type params (Go 1.18+) |
| Java | tree-sitter | Classes, methods, constructors, Javadoc, generics, inheritance (extends/implements), annotations |
| Rust | tree-sitter | Fns, structs, enums, traits, impl blocks, ///, generics, attributes |
| Ruby | tree-sitter | Defs, classes, modules, # comments, superclass |
| C | tree-sitter | Functions, structs with /** doc comments |
| C++ | tree-sitter | Functions, classes, structs, methods, templates, base classes |
| C# | tree-sitter | Classes, methods, generics, attributes, base types |
| SQL | tree-sitter | Tables, views, functions, indexes, types, triggers |
Manifest / Build Formats (17)
requirements.txt · pyproject.toml · package.json · Cargo.toml · Cargo.lock · yarn.lock · pnpm-lock.yaml · go.mod · go.sum · poetry.lock · composer.json · pom.xml · Gemfile · build.gradle / .kts · Package.swift · project.clj · mix.exs
Full table with parser details + architectural query examples: docs/languages-and-manifests.md
Architectural query example — find every microservice depending on a deprecated Rust crate:
okf lookup --type Dependency --tag ecosystem:cargo --compact
okf lookup --type Dependency openssl
Same logic works for pip, npm, go, maven — any of the 17 supported formats. Pin a vulnerable package version across every service in seconds.
CLI Reference
okf --help Show available commands
okf <command> --help Show options for a specific command
okf --version Show version
| Command | Usage |
|---|---|
generate |
okf generate <source_dir> [output_dir] |
lookup |
okf lookup <query> |
diff |
okf diff <old_bundle> <new_bundle> |
pairs |
okf pairs <bundle_dir> [output_file] |
summarize |
okf summarize <bundle_dir> |
install |
okf install [claude | opencode | copilot | cursor | windsurf | cline] |
init |
okf init [dir] |
visualize |
okf visualize <bundle_dir> [output.html] |
mcp |
okf mcp <bundle_dir> |
serve |
okf serve [dir] [--port] [--open] |
Full options, environment variables, and examples: docs/cli-reference.md
Training Data
Convert your OKF bundle into JSONL training pairs for fine-tuning:
# 5 pair types: codegen, qa, doc, summarize, crosslink
okf pairs ./okf_bundle ./train.jsonl
Each pair is in chat format compatible with most fine-tuning pipelines.
- Static pairs (no LLM):
SKIP_SYNTH=1 okf pairs ... - LLM-synthesized pairs: set
SYNTH_MODEL,QA_PER_CONCEPT,PAIR_TYPES
Python API
from okf.generator import scan_codebase, write_bundle, write_summary
from okf.lookup import load_bundle, search
concepts = scan_codebase("./my_project")
write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
write_summary("my_project", concepts, "./okf_bundle", {})
bundle = load_bundle("./okf_bundle")
results = search(bundle, tokens=["WorldBankConnector"])
Full API reference with
Conceptdataclass: docs/python-api.md
Agent Installation
Install integration for any AI agent in one command:
# Install for all detected agents
okf install all
# Or pick specific agents
okf install claude # Claude Code skill
okf install opencode # OpenCode /lookup command
okf install copilot # GitHub Copilot instructions
okf install cursor # Cursor rules
okf install windsurf # Windsurf rules
okf install cline # Cline rules
What each install does:
| Agent | Files created | Effect |
|---|---|---|
| Claude Code | ~/.config/opencode/skills/okf-generator/SKILL.md |
Auto-triggers on phrases like "index my codebase" |
| OpenCode | .opencode/commands/lookup.md |
/lookup NAME=<ConceptName> |
| Copilot | .github/copilot-instructions.md |
Auto-loaded in VS Code |
| Cursor | .cursorrules |
Auto-loaded by Cursor |
| Windsurf | .windsurfrules |
Auto-loaded by Windsurf |
| Cline | .clinerules |
Auto-loaded by Cline |
How It Compares
| okf-generator | Other OKF producers | |
|---|---|---|
| Language coverage | 10 languages (Python, JS/TS, Go, Java, Rust, Ruby, SQL, C, C++, C#) | Usually 1 language or doc-only |
| Cross-reference linking | Imports → dependencies, function calls → caller/callee across all languages | Not typically supported |
| Dependency/manifest parsing | 17 formats (pip, npm, cargo, go, maven, gradle, composer, rubygems, swiftpm, clojars, hex, +7) | Not typically supported |
| Extraction | Zero-LLM, deterministic, offline | Often LLM-required for every concept |
| Optional enrichment | Any OpenAI-compatible endpoint (Claude, local llama.cpp, Ollama) | Often locked to one vendor |
| Training data export | Built-in JSONL pair generator (5 pair types) | Not typically included |
| Agent compatibility | Any agent that can run a CLI (Claude Code, Cursor, Windsurf, Copilot, OpenCode, Cline) | Often single-agent focused |
If you are choosing between OKF producers: pick okf-generator when you want broad language + dependency coverage with zero mandatory LLM cost, and you want the bundle to double as a fine-tuning data source.
FAQ
Does this require an API key or internet connection?
No. Core extraction (okf generate) is fully offline and deterministic — no LLM call is made unless you explicitly enable OKF_ENRICH=1.
How is this different from RAG / vector search?
RAG retrieves chunks by semantic similarity, which is approximate and can miss exact symbols. okf lookup is exact: it indexes real functions, classes, modules, and dependencies by name and resolves to the precise concept, with zero embedding/vector infrastructure required.
What happens if my language is not supported?
Unsupported files are skipped, not dropped silently — log.md records what was scanned. Adding a new language is a self-contained tree-sitter grammar mapping; see CONTRIBUTING.md — it is a listed good-first-issue.
Does this work on monorepos / very large codebases?
Yes — the bundle mirrors your source tree, so scanning is linear in file count. For very large repos, scope okf generate to a subdirectory if you only need part of the codebase indexed.
Can I use this without any LLM at all, ever?
Yes. okf generate + okf lookup together form a complete, zero-LLM workflow. LLM enrichment and okf pairs synthesis are optional layers on top.
Is the bundle safe to commit to git? Yes, and that is the intended workflow — bundles are plain markdown, diff cleanly, and version alongside the code they describe.
Contributing
git clone https://github.com/UmairBaig8/okf-generator
cd okf-generator
pip install -e ".[dev]"
pytest tests/
Good first issues: adding a new language parser, improving fuzzy search scoring, adding incremental/diff-based regeneration.
See CONTRIBUTING.md for full guidelines.
Acknowledgments
okf-generator is an independent, third-party implementation of the Open Knowledge Format (OKF) v0.1, a knowledge-representation spec introduced by Google Cloud in June 2026. See the full v0.1 specification for the conformance rules this generator targets.
This project is not built, maintained, or endorsed by Google.
License
MIT — Copyright © 2026 Umair Baig
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file okf_generator-0.1.29.tar.gz.
File metadata
- Download URL: okf_generator-0.1.29.tar.gz
- Upload date:
- Size: 149.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d10dade5a7b6df055f741d0ad024092908ec09b937c8030990e90e435a0204e8
|
|
| MD5 |
60543cd2520d454e73c1ec88b45b3309
|
|
| BLAKE2b-256 |
e9e62cc840ec1cba4eb4db1386742074307b6e0ad6efa65d1b42052ec7c6fa2d
|
Provenance
The following attestation bundles were made for okf_generator-0.1.29.tar.gz:
Publisher:
publish.yml on UmairBaig8/okf-generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
okf_generator-0.1.29.tar.gz -
Subject digest:
d10dade5a7b6df055f741d0ad024092908ec09b937c8030990e90e435a0204e8 - Sigstore transparency entry: 2048555341
- Sigstore integration time:
-
Permalink:
UmairBaig8/okf-generator@33292775305053ec6451ee3684298ee7a148bbb1 -
Branch / Tag:
refs/tags/v0.1.29 - Owner: https://github.com/UmairBaig8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@33292775305053ec6451ee3684298ee7a148bbb1 -
Trigger Event:
push
-
Statement type:
File details
Details for the file okf_generator-0.1.29-py3-none-any.whl.
File metadata
- Download URL: okf_generator-0.1.29-py3-none-any.whl
- Upload date:
- Size: 96.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7fcf6279576f952c1919b9861c6773719541b7c1a12986e5a2f099e5bfa54586
|
|
| MD5 |
5d6672d3f39bd352fa360f58a1d203e4
|
|
| BLAKE2b-256 |
3e4ef90e5cee3b258e327f103c4c986082c763d60862eb69f7855d455e5e3a00
|
Provenance
The following attestation bundles were made for okf_generator-0.1.29-py3-none-any.whl:
Publisher:
publish.yml on UmairBaig8/okf-generator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
okf_generator-0.1.29-py3-none-any.whl -
Subject digest:
7fcf6279576f952c1919b9861c6773719541b7c1a12986e5a2f099e5bfa54586 - Sigstore transparency entry: 2048555352
- Sigstore integration time:
-
Permalink:
UmairBaig8/okf-generator@33292775305053ec6451ee3684298ee7a148bbb1 -
Branch / Tag:
refs/tags/v0.1.29 - Owner: https://github.com/UmairBaig8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@33292775305053ec6451ee3684298ee7a148bbb1 -
Trigger Event:
push
-
Statement type: