Turn any repo, database, docs site, or open-data portal into an AI-ready Open Knowledge Format (OKF) knowledge graph — deterministically, no LLM required.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

bushans

These details have not been verified by PyPI

Project links

Specification

Project description

okfgen

Point it at a repo, database, docs site, or open-data portal.
Get a portable, agent-ready knowledge graph in seconds — no LLM, no API key, no lock-in.

Python 3.9+ · License: Apache-2.0 · No LLM required · Zero core dependencies

▶ Explore the live interactive knowledge graphs →

okfgen is a deterministic reference implementation of both the producer and consumer sides of the Open Knowledge Format (OKF), Google's vendor-neutral standard for representing the knowledge around your data and systems as plain markdown files with YAML frontmatter (announcement).

Producers turn a source system, database, docs site, or live open-data portal into a bundle.
Consumers read a bundle back out — a viewer, a search index, a reasoning agent.

It extracts structured facts straight from the source (schemas, file structure, READMEs, dependency manifests, page headings). No LLM and no API key are required; an optional --llm flag adds Claude-powered enrichment where you want it.

              PRODUCERS                              CONSUMERS
   git repo  ─┐                          ┌─  visualize  → interactive HTML graph
   database  ─┤                          ├─  search     → full-text index
  open data  ─┼─►  generate ─► BUNDLE ─► ┼─  ask        → reasoning agent
   local dir ─┤        │       (.md +    └─  validate   → conformance check
   web docs  ─┘        ▼      frontmatter)
                    enrich  (pass 2: join paths, backlinks, citations)

Quickstart (30 seconds)

Zero-install with uv — turn this directory into a knowledge graph and open it:

uvx --from git+https://github.com/bushans/okfgen okfgen generate . -o my-okf
uvx --from git+https://github.com/bushans/okfgen okfgen visualize my-okf -o my-okf/graph.html
# then open my-okf/graph.html

Or with pip:

pip install git+https://github.com/bushans/okfgen.git   # or from PyPI: pip install okfgen
okfgen generate . -o my-okf && okfgen visualize my-okf -o my-okf/graph.html

🎥 Demo GIF coming soon (see docs/RECORD_DEMO.md) — meanwhile the live gallery is fully interactive.

Why okfgen

One command, any source → a knowledge graph. Code, databases, docs, and live open-data portals — the same tool, the same output.
Deterministic, offline, no API key. Reproducible facts, not LLM hallucinations. Runs in air-gapped environments. The LLM is strictly opt-in.
Open format, zero lock-in. Output is plain markdown + YAML you can read, diff in git, and grep — not a proprietary database.
Agent-ready. Search, a citation-backed reasoning agent, and a portable JSON index make bundles first-class context for RAG and AI agents.
A viewer you can email. The visualizer is a single self-contained HTML file — no backend, no CDN, data never leaves the page.
Reference implementation of an open standard. Tracks the OKF v0.1 spec; every bundle it emits passes its own conformance validator.

How it compares

Capability	okfgen	Data catalogs (DataHub / Amundsen)	LLM auto-doc tools	Hand-written wiki
Runs with no server/DB to deploy	✅	❌	⚠️	✅
No API key / no LLM required	✅	✅	❌	✅
Deterministic & reproducible	✅	✅	❌	✅
Open, plain-markdown output (no lock-in)	✅	❌	⚠️	✅
Covers code, databases, docs, and live open data	✅	⚠️ data only	⚠️	manual
Self-contained interactive graph viewer	✅	⚠️ needs server	❌	❌
Agent-ready (search + reasoning over bundle)	✅	⚠️	✅	❌
Time to first result	seconds	hours–days	minutes	∞

Install

pip install git+https://github.com/bushans/okfgen.git   # core: git, local, web, schema (zero deps)
# from a clone, for development:
pip install -e '.[all]'     # add BigQuery, Firebase, PyYAML

Optional extras: .[bigquery], .[firebase], .[yaml], .[mcp], .[dev]; .[all] installs the BigQuery, Firebase, and PyYAML extras together.

Producers — make a bundle from your data

okfgen generate https://github.com/psf/requests.git   # a source system (git)
okfgen generate ./my-project                          # a source system (local)
okfgen generate schema:./warehouse.schema.json        # a database (offline)
okfgen generate schema:./ddl.sql                      # a database (SQL DDL)
okfgen generate bq:my-gcp-project                     # BigQuery datasets/tables
okfgen generate firebase:my-firebase-project          # Firestore collections
okfgen generate https://docs.mytool.dev/              # a documentation site
okfgen generate ckan:https://portal/dataset/some-set  # a live CKAN open-data portal
okfgen generate socrata:https://data.cityofnewyork.us/d/erm2-nwe9  # a live Socrata dataset

Input	Detected as	What it extracts
`git@…` / `*.git` / GitHub URL	`git`	shallow-clones the repository, then scans it like a local directory
a directory path	`local`	README overview, per-directory code modules (functions/classes/types), doc files, dependency inventory
`schema:FILE.json` / `.sql`	`schema`	dataset + table concepts with full column schemas — no cloud creds
`bq:PROJECT`	`bigquery`	one concept per dataset and per table, with column schemas
`firebase:PROJECT`	`firebase`	one concept per Firestore collection, fields/types inferred from sampled docs
`ckan:PORTAL/dataset/SLUG`	`ckan`	a live CKAN open-data dataset → one concept per resource, with live column schemas + example rows from the DataStore. No auth; works against data.gov, data.gov.au, the EU portal, city portals, etc.
`socrata:DOMAIN/d/4x4-ID`	`socrata`	a live Socrata dataset (NYC Open Data, Seattle, Chicago, many state portals) → Dataset + Table concepts with live column schema + descriptions + example rows. No auth.
`http(s)://…`	`web`	crawls same-host pages (depth/page budget) into one concept per page

Cloud sources (bq: and firebase:) authenticate with Application Default Credentials (run gcloud auth application-default login first). Output goes to ./<name>-okf/ unless you pass -o.
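
The detection rules in the table above can be approximated in a few lines. This is a minimal sketch, not okfgen's actual implementation: the function name `detect_source_type`, the `PREFIXES` mapping, and the order in which the rules are checked are assumptions made for illustration.

```python
import re

# Prefix-based schemes from the table, checked before the URL and path rules.
# (Hypothetical mapping; okfgen's internal names may differ.)
PREFIXES = {
    "schema:": "schema",
    "bq:": "bigquery",
    "firebase:": "firebase",
    "ckan:": "ckan",
    "socrata:": "socrata",
}


def detect_source_type(source: str) -> str:
    """Classify an input string using the rules listed in the table."""
    for prefix, kind in PREFIXES.items():
        if source.startswith(prefix):
            return kind
    # SSH remotes, anything ending in .git, and GitHub URLs count as git.
    if source.startswith("git@") or source.endswith(".git"):
        return "git"
    if re.match(r"https?://(www\.)?github\.com/", source):
        return "git"
    # Any other http(s) URL is crawled as a documentation site.
    if re.match(r"https?://", source):
        return "web"
    # Everything else is assumed to be a directory path.
    return "local"


if __name__ == "__main__":
    for example in ("bq:my-gcp-project", "https://docs.mytool.dev/", "./my-project"):
        print(example, "->", detect_source_type(example))
```

Checking the explicit prefixes first keeps `ckan:https://…` and `socrata:https://…` from being mistaken for plain web URLs.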

The enrichment agent (pass 2)

Producers draft concepts; the enrichment agent refines them, following the two-pass pattern from the OKF blog. It deterministically infers join paths between tables from foreign-key naming (customer_id → customers) and wires backlinks so the graph is navigable in both directions:

okfgen enrich ./my-okf                 # in place
okfgen enrich ./my-okf -o ./enriched   # to a new directory
okfgen enrich ./my-okf --llm           # also rewrite descriptions via Claude
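
To make the foreign-key heuristic concrete, here is a small sketch of name-based join inference plus backlink wiring. The function names `infer_joins` and `backlinks` are hypothetical, and only the `customer_id` → `customers` rule comes from the text above; the extra `-es` and singular fallbacks are assumptions, and okfgen's real matcher may behave differently.

```python
def infer_joins(tables):
    """Return (source_table, column, target_table) for columns named <stem>_id.

    `tables` maps a table name to its list of column names.
    """
    joins = []
    for table, columns in sorted(tables.items()):
        for column in columns:
            if not column.endswith("_id"):
                continue
            stem = column[: -len("_id")]
            # Try the plural forms first, then the bare stem (assumed fallbacks).
            for candidate in (stem + "s", stem + "es", stem):
                if candidate in tables and candidate != table:
                    joins.append((table, column, candidate))
                    break
    return joins


def backlinks(joins):
    """Invert the joins so each target table lists the tables pointing at it."""
    incoming = {}
    for source, _column, target in joins:
        incoming.setdefault(target, []).append(source)
    return incoming


if __name__ == "__main__":
    schema = {
        "orders": ["id", "customer_id", "total"],
        "customers": ["id", "name"],
    }
    found = infer_joins(schema)
    print(found)
    print(backlinks(found))
```

Because the rule depends only on names, the same input always yields the same joins, which is what makes this pass reproducible without an LLM.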

Consumers — read a bundle back out

The OKF value proposition is producer/consumer independence: any consumer works on any bundle, regardless of who produced it.

# Viewer: a self-contained interactive graph (no backend, no CDN, data stays local)
okfgen visualize ./my-okf -o graph.html

# Search index: full-text, TF-IDF ranked
okfgen search ./my-okf "weekly active users"
okfgen search ./my-okf --export index.json      # portable JSON index

# Reasoning agent: retrieves concepts, follows join links, answers with citations
okfgen ask ./my-okf "how do orders relate to customers?"
okfgen ask ./my-okf "..." --llm                 # phrase answer via Claude

# Conformance validation
okfgen validate ./my-okf --strict

okfgen ask shows its work — the retrieved concepts, the links it traversed, and the citations behind the answer — so the reasoning is auditable.
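
For readers unfamiliar with TF-IDF ranking, the sketch below scores a handful of in-memory documents against a query. It uses one common smoothed TF-IDF variant; the exact weighting okfgen applies is not stated here, and `tokenize`, `rank`, and the sample file names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    """Return (name, score) pairs for documents matching the query, best first.

    `docs` maps a document name to its text.
    """
    counts_by_doc = {name: Counter(tokenize(body)) for name, body in docs.items()}
    total_docs = len(docs)
    scores = {}
    for term in set(tokenize(query)):
        doc_freq = sum(1 for counts in counts_by_doc.values() if term in counts)
        if doc_freq == 0:
            continue
        # Smoothed inverse document frequency: rarer terms weigh more.
        idf = math.log((1 + total_docs) / (1 + doc_freq)) + 1.0
        for name, counts in counts_by_doc.items():
            if counts[term]:
                tf = counts[term] / sum(counts.values())
                scores[name] = scores.get(name, 0.0) + tf * idf
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = {
        "metrics/wau.md": "weekly active users counted per week",
        "tables/orders.md": "orders placed by customers",
        "tables/users.md": "users table with signup dates",
    }
    for name, score in rank(sample, "weekly active users"):
        print(f"{score:.3f}  {name}")
```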

Use it inside your AI agent (MCP)

okfgen ships an MCP server, so Claude Desktop, Claude Code, Cursor, and any Model Context Protocol client can produce and reason over OKF bundles without leaving the agent.

pip install "okfgen[mcp]"
okfgen-mcp            # stdio MCP server

Then register the server in your MCP client's configuration:

{
  "mcpServers": {
    "okfgen": { "command": "okfgen-mcp" }
  }
}

Exposed tools: okfgen_generate, okfgen_search, okfgen_ask, okfgen_validate, okfgen_visualize, okfgen_list_source_types. Now an agent can say "catalog this database and tell me how orders join to customers" and get grounded, cited answers.

Sample bundles

Browse the sample knowledge graphs online: https://bushans.github.io/okfgen/

Ready-to-browse bundles live in samples/bundles/. Open any graph.html in a browser, or point the consumers at them. The same visualizers are published to GitHub Pages from docs/ (regenerate with python samples/build_pages.py).

Three offline, reproducible bundles (database, source system, docs site): python samples/build_samples.py
One live public-data bundle — Toronto Beaches Water Quality from the Toronto Open Data CKAN portal: python samples/build_live_samples.py

See samples/README.md for details.

Output layout

<name>-okf/
├── index.md            # root listing + okf_version: "0.1"
├── log.md              # generation / enrichment log (ISO-dated)
├── overview.md         # the root "Project" / "Data Project" concept
├── dependencies.md     # parsed manifests (git/local)
├── docs/…              # documentation concepts
├── modules/…           # per-directory code concepts (git/local)
├── datasets/… tables/… # database / BigQuery concepts
├── collections/…       # Firestore concepts
├── pages/…             # web page concepts
└── graph.html          # (after `visualize`) the interactive viewer

Every concept carries the required type frontmatter field plus recommended title/description/resource/tags/timestamp, and bodies use the conventional OKF # Schema, # Examples, # Citations, # Joins headings.
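
Because a concept is just markdown with frontmatter, a consumer can read one with the standard library alone. The sketch below is a simplified illustration, not okfgen's validator: it handles only flat `key: value` frontmatter, and the names `split_frontmatter` and `check_concept` and the sample `type: Table` value are assumptions.

```python
RECOMMENDED = ("title", "description", "resource", "tags", "timestamp")


def split_frontmatter(text):
    """Split a concept file into (metadata dict, markdown body).

    Only flat `key: value` lines are understood; nested YAML is ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[index + 1:])
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("\"'")
    # No closing delimiter: treat the whole file as body.
    return {}, text


def check_concept(text):
    """Report a missing required `type` and any missing recommended fields."""
    meta, _body = split_frontmatter(text)
    errors = [] if meta.get("type") else ["missing required field: type"]
    warnings = [f"missing recommended field: {name}"
                for name in RECOMMENDED if name not in meta]
    return {"errors": errors, "warnings": warnings}


if __name__ == "__main__":
    concept = "---\ntype: Table\ntitle: orders\n---\n# Schema\n"
    print(split_frontmatter(concept))
    print(check_concept(concept))
```

For real bundles a full YAML parser (the optional PyYAML extra) is the safer choice, since frontmatter values such as tags may be lists.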

Design notes

Deterministic by default. git/local/web/schema run on the standard library alone (zero third-party deps). Cloud SDKs and the LLM are optional extras, loaded lazily and off unless you ask.
Producer/consumer split. Consumers depend only on markdown + frontmatter (okfgen/consumer.py), never on producer internals.
Scriptable. Every command prints its primary output path to stdout and logs to stderr: BUNDLE=$(okfgen generate ./repo).
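
The same stdout/stderr convention makes the commands easy to drive from Python. The following sketch assumes only what is stated above (the output path goes to stdout, logs go to stderr); `capture_output_path` is a hypothetical helper, and `STAND_IN` is a placeholder command so the example runs without okfgen installed.

```python
import subprocess
import sys


def capture_output_path(command):
    """Run a command and return what it printed to stdout, stripped."""
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-in for an okfgen call: prints a path to stdout and a log line to stderr.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; print('repo-okf'); print('scanning...', file=sys.stderr)",
]

if __name__ == "__main__":
    print(capture_output_path(STAND_IN))
    # With okfgen installed, the equivalent of the shell one-liner would be:
    # bundle = capture_output_path(["okfgen", "generate", "./repo"])
```

Keeping logs on stderr is what lets the captured value be used directly as a path in the next step of a pipeline.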

Development

pip install -e '.[dev]'
pytest

New to the project? TESTING.md is a step-by-step VS Code walkthrough: environment setup, running the test suite, and driving every producer/consumer command locally.

License

Apache-2.0.

Project details

Release history

This version

0.1.0

Jul 1, 2026

Download files

Source Distribution

okfgen-0.1.0.tar.gz (58.6 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

okfgen-0.1.0-py3-none-any.whl (58.0 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file okfgen-0.1.0.tar.gz.

File metadata

Download URL: okfgen-0.1.0.tar.gz
Upload date: Jul 1, 2026
Size: 58.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okfgen-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`13153bc8d09bd36d6c22b88f307f16d2bbf5debfb294dcdaa4c760c2460fbe15`
MD5	`1f0ecbecfa271df56919da404652f2e6`
BLAKE2b-256	`cd635b980aee0a6eb14f5f49499fd746d35e60c88409d12197e51cd1d52438b4`
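
One way to use these digests is to hash the downloaded file locally and compare. The sketch below uses Python's standard `hashlib`; the helper names `sha256_of` and `matches` and the local file path in the usage comment are assumptions, while `EXPECTED` is the SHA256 value listed in the table above.

```python
import hashlib

# SHA256 digest published above for okfgen-0.1.0.tar.gz.
EXPECTED = "13153bc8d09bd36d6c22b88f307f16d2bbf5debfb294dcdaa4c760c2460fbe15"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the sdist was downloaded to the current directory:
# print(matches("okfgen-0.1.0.tar.gz"))
```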

Provenance

The following attestation bundles were made for okfgen-0.1.0.tar.gz:

Publisher: publish.yml on bushans/okfgen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: okfgen-0.1.0.tar.gz
- Subject digest: 13153bc8d09bd36d6c22b88f307f16d2bbf5debfb294dcdaa4c760c2460fbe15
- Sigstore transparency entry: 2036020700
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: bushans/okfgen@c4fdfd49f78c13a7bdb6211d1e5e1ee15552e2bd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/bushans
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c4fdfd49f78c13a7bdb6211d1e5e1ee15552e2bd
- Trigger Event: release

File details

Details for the file okfgen-0.1.0-py3-none-any.whl.

File metadata

Download URL: okfgen-0.1.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 58.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for okfgen-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4850d9f1fad9e9308ec47c6ce46058a9aa50977b74a4b76137412a2b231b99cf`
MD5	`6d6fadbb9249a0b8e45a23cf318a02c4`
BLAKE2b-256	`49ba2fd1ec83a47a4761bcf2d45dc42519704869fdd91e6474ee1534c3c02117`

Provenance

The following attestation bundles were made for okfgen-0.1.0-py3-none-any.whl:

Publisher: publish.yml on bushans/okfgen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: okfgen-0.1.0-py3-none-any.whl
- Subject digest: 4850d9f1fad9e9308ec47c6ce46058a9aa50977b74a4b76137412a2b231b99cf
- Sigstore transparency entry: 2036020955
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: bushans/okfgen@c4fdfd49f78c13a7bdb6211d1e5e1ee15552e2bd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/bushans
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c4fdfd49f78c13a7bdb6211d1e5e1ee15552e2bd
- Trigger Event: release
