
A Python library to generate Google Open Knowledge Format (OKF) bundles from various data sources.


google-okf


google-okf is an open-source Python library, built for production use, that connects to enterprise data sources and converts their contents into the standard Google Open Knowledge Format (OKF).

By standardizing database schemas, collections, documentation, playbooks, and APIs into clean Markdown files with structured YAML frontmatter, google-okf acts as the critical intermediate context-assembly layer for Retrieval-Augmented Generation (RAG) pipelines and agentic AI systems.


What is Google OKF?

The Open Knowledge Format (OKF v0.1) is an open, vendor-neutral standard introduced by Google Cloud to formalize the "LLM Wiki" pattern. Instead of feeding raw, fragmented, and inconsistent documentation or database formats directly into vector databases, OKF structures knowledge into a unified, version-controlled (Git-friendly) directory tree:

  1. YAML Frontmatter: Self-describing metadata for routing, filtering, and identification (requires a type field; recommends title, description, resource URI, tags, and timestamp).
  2. Markdown Body: Structurally clean, human- and LLM-readable text content.
  3. Semantic Links: Standard Markdown links that map relationships between concepts, so AI agents can traverse the bundle as a semantic knowledge graph.
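
To make the three parts concrete, here is a minimal sketch of how a single OKF-style document could be rendered. The helper `render_concept`, the exact frontmatter key names, and the example values (such as `type: "table"`) are assumptions for illustration; only the requirement of a `type` field comes from the description above.

```python
import json
from datetime import datetime, timezone


def render_concept(body: str, **frontmatter) -> str:
    """Render a Markdown document with a YAML frontmatter header (hypothetical helper)."""
    if "type" not in frontmatter:
        raise ValueError("OKF frontmatter requires a 'type' field")
    # JSON scalars and lists are also valid YAML flow syntax, so json.dumps
    # gives safe quoting without needing a YAML dependency.
    lines = [f"{key}: {json.dumps(value)}" for key, value in frontmatter.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + body.strip() + "\n"


doc = render_concept(
    # The relative link in the body is the "semantic link" to another concept.
    "# orders\n\nEach order belongs to a [customer](./customers.md).",
    type="table",  # assumed example value
    title="orders",
    description="Customer orders placed through the web shop.",
    tags=["sales", "database"],
    timestamp=datetime(2025, 1, 1, tzinfo=timezone.utc).isoformat(),
)
print(doc)
```

The frontmatter carries the routing metadata, the body stays plain Markdown, and the relative link is what an agent would follow to reach the related concept.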

Features

  • BaseProducer Interface: Simple, abstract class to write custom connectors for any internal system.
  • Flat File Connector: Imports a directory of documents (PDFs, DOCX, Markdown, TXT), extracts text, and maps them to OKF concepts.
  • SQL Database Connector: Connects to relational databases (MySQL, PostgreSQL, SQLite, and others, via SQLAlchemy) and auto-generates Markdown schema documents in which foreign key relationships become cross-referenced links.
  • MongoDB NoSQL Connector: Connects to MongoDB, samples collection documents, dynamically infers schema structures, and generates collection concepts.
  • CLI Tool (google-okf): Initialize bundles, run producers, and validate/lint OKF link consistency.
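
As a rough illustration of the custom-connector idea, the sketch below defines local stand-ins rather than importing the library. The `Concept` dataclass, the stand-in `BaseProducer`, and `RunbookProducer` are all hypothetical; the real `BaseProducer` signature is not shown in this document, and only the `produce()` method name is taken from the usage examples further down.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Concept:
    """Hypothetical stand-in for one OKF document (not the library's class)."""
    path: str
    frontmatter: dict
    body: str


class BaseProducer(ABC):
    """Hypothetical stand-in for the library's abstract producer interface."""

    @abstractmethod
    def produce(self) -> list[Concept]:
        ...


class RunbookProducer(BaseProducer):
    """Example connector that turns an in-memory dict of runbooks into concepts."""

    def __init__(self, runbooks: dict[str, str], output_prefix: str = "playbooks"):
        self.runbooks = runbooks
        self.output_prefix = output_prefix

    def produce(self) -> list[Concept]:
        # One concept per runbook, sorted by name for stable output.
        return [
            Concept(
                path=f"{self.output_prefix}/{name}.md",
                frontmatter={"type": "playbook", "title": name},
                body=text,
            )
            for name, text in sorted(self.runbooks.items())
        ]


concepts = RunbookProducer({"restart-db": "1. Drain traffic.\n2. Restart."}).produce()
print(concepts[0].path)
```

A real connector would subclass the library's own `BaseProducer` and return whatever concept type it expects; check the package source for the actual contract.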

Installation

Install the package via pip:

pip install google-okf

Or using uv (recommended):

uv add google-okf

Command Line Interface (CLI)

The library provides a CLI tool named google-okf.

1. Initialize a Bundle

Create a blank OKF folder structure with default subdirectories:

google-okf init my_knowledge_bundle

2. Run a Producer

Run a connector to extract metadata from a source and write it into a bundle folder:

Flat Files:

google-okf produce --type files --src-dir ./raw_documents --out-dir my_knowledge_bundle

MySQL / Relational DB:

google-okf produce --type mysql --uri "mysql+pymysql://user:pass@host:port/dbname?ssl-mode=REQUIRED" --out-dir my_knowledge_bundle

MongoDB:

google-okf produce --type mongodb --uri "mongodb+srv://user:pass@cluster.mongodb.net/" --db-name "my_database" --out-dir my_knowledge_bundle

3. Using a Configuration File

You can also store connection details in a YAML config file (config.yaml):

producer: mysql
uri: "mysql+pymysql://user:pass@host:port/dbname?ssl-mode=REQUIRED"
output_dir: "my_knowledge_bundle"

And run it with:

google-okf produce --config config.yaml

4. Lint and Validate Compliance

Verify that all YAML frontmatter parses, that required fields are present, and that all internal relative Markdown links resolve correctly:

google-okf lint my_knowledge_bundle
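
The link-resolution part of such a check can be sketched with the standard library alone. This is not the CLI's implementation: `find_broken_links` is a hypothetical helper that covers only inline relative links, and it ignores the frontmatter checks.

```python
import re
import tempfile
from pathlib import Path

# Matches inline Markdown links such as [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme (https:, mailto:, ...).
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(bundle_dir: str) -> list[tuple[str, str]]:
    """Return (file, target) pairs for relative links that do not resolve."""
    root = Path(bundle_dir)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
            # Skip absolute URLs and same-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any #fragment before checking the file on disk.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken


with tempfile.TemporaryDirectory() as tmp:
    tables = Path(tmp) / "tables"
    tables.mkdir()
    (tables / "orders.md").write_text(
        "See [customers](./customers.md) and [items](./items.md).", encoding="utf-8"
    )
    (tables / "customers.md").write_text("# customers", encoding="utf-8")
    result = find_broken_links(tmp)
    print(result)
```

In this toy bundle only the link to `./items.md` is reported, because `customers.md` exists next to `orders.md`.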

Programmatic Usage

1. Flat Files Import

from google_okf import DocumentProducer, write_bundle

# Initialize producer for a folder of documents
producer = DocumentProducer(
    source_dir="./financial_statements",
    output_prefix="documents/financials",
    tags=["finance", "annual-report"]
)

# Extract and write to bundle
concepts = producer.produce()
write_bundle("my_okf_bundle", concepts)

2. MySQL / SQL Database Import

from google_okf import MySQLProducer, write_bundle

# Connection URI (uses SQLAlchemy syntax)
connection_uri = "mysql+pymysql://user:pass@host:16512/defaultdb?ssl-mode=REQUIRED"

producer = MySQLProducer(
    connection_uri=connection_uri,
    output_prefix="database/tables",
    schema="default"  # Optional schema filter
)

# Extract tables, column types, keys, and links
concepts = producer.produce()
write_bundle("my_okf_bundle", concepts)
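
To show the idea of turning foreign keys into Markdown links, the sketch below uses the standard-library `sqlite3` module in place of MySQL so that it runs without a server. The function `table_to_markdown` and the table layout are assumptions for illustration, not the output format that `MySQLProducer` actually emits.

```python
import sqlite3


def table_to_markdown(conn: sqlite3.Connection, table: str) -> str:
    """Describe one SQLite table as Markdown, linking foreign keys to sibling files."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fks = {
        row[3]: (row[2], row[4])
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
    }
    lines = [f"# {table}", "", "| Column | Type | References |", "| --- | --- | --- |"]
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
        ref = ""
        if name in fks:
            ref_table, ref_col = fks[name]
            # Relative link to the Markdown file of the referenced table.
            ref = f"[{ref_table}.{ref_col}](./{ref_table}.md)"
        lines.append(f"| {name} | {col_type} | {ref} |")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    """
)
markdown = table_to_markdown(conn, "orders")
print(markdown)
conn.close()
```

The `customer_id` row ends up with a link to `./customers.md`, which is the kind of cross-reference an agent can follow from one table concept to another.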

3. MongoDB Collection Schema Inference

from google_okf import MongoDBProducer, write_bundle

producer = MongoDBProducer(
    connection_uri="mongodb+srv://user:pass@cluster.mongodb.net/?appName=Cluster",
    database_name="upi_db",
    output_prefix="database/collections",
    sample_size=15  # Number of documents to sample for type inference
)

# Sample documents, infer schemas, and write collections
concepts = producer.produce()
write_bundle("my_okf_bundle", concepts)
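
Schema inference from sampled documents can be pictured as collecting the value types seen for each field. The sketch below works on plain dictionaries and does not connect to MongoDB; `infer_schema` and the sample data are hypothetical, and the library's own inference rules may differ.

```python
from collections import defaultdict


def infer_schema(documents: list[dict], prefix: str = "") -> dict[str, set[str]]:
    """Map each dotted field path to the set of Python type names seen in the sample."""
    schema: dict[str, set[str]] = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                schema[path].add("object")
                # Recurse into embedded documents, extending the dotted path.
                for sub_path, types in infer_schema([value], f"{path}.").items():
                    schema[sub_path] |= types
            else:
                schema[path].add(type(value).__name__)
    return dict(schema)


sample = [
    {"_id": 1, "amount": 250.0, "payer": {"vpa": "a@upi"}},
    {"_id": 2, "amount": 99, "note": "lunch"},
]
schema = infer_schema(sample)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Fields that appear in only some documents (`note`) or with mixed types (`amount`) show why the sample size matters: a larger sample is more likely to surface optional fields and type variations.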

Troubleshooting

MongoDB SSL Handshake Failures (TLSV1_ALERT_INTERNAL_ERROR)

When connecting to a MongoDB Atlas cluster, you may encounter the following error:

SSL handshake failed: ac-xxx-shard.mongodb.net:27017: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1010)

Root Cause

The MongoDB Atlas cluster is rejecting the connection. This happens when the IP address your script runs from is not on the IP access list in your Atlas security settings. Passing tlsAllowInvalidCertificates=True or modifying certificates in code will not bypass this, because the server-side firewall terminates the connection during TLS negotiation.

Solution

  1. Log in to your MongoDB Atlas Console.
  2. Under the Security header in the left sidebar, click Network Access.
  3. Click + Add IP Address.
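  4. Click Add Current IP Address to authorize your local machine, or enter 0.0.0.0/0 (Allow Access from Anywhere) to permit connections from any network. The latter is convenient for transient cloud deployments, but it opens the cluster to every IP address, so access then depends entirely on your database credentials.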
  5. Click Confirm and wait 1–2 minutes for the access list to deploy.

MySQL / PyMySQL SSL Connection Parameter Error

When passing connection URLs that require SSL (e.g. from providers like Aiven or AWS RDS), using ssl-mode=REQUIRED in query parameters can cause the PyMySQL driver to throw an exception: Connection.__init__() got an unexpected keyword argument 'ssl-mode'.

Solution

google-okf automatically intercepts query parameters containing ssl-mode, ssl_mode, or ssl=true. It strips them from the raw connection string so the driver does not crash, and configures the engine to pass the required SSL context parameters (connect_args={"ssl": {}}) under the hood to ensure an encrypted channel. No manual dictionary setup is necessary.
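
The interception described above can be approximated with the standard library. This is a sketch under stated assumptions, not the library's code: `split_ssl_params` is a hypothetical helper, and treating values such as `false`, `0`, or `DISABLED` as an opt-out is an assumption of this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SSL_KEYS = {"ssl-mode", "ssl_mode", "ssl"}


def split_ssl_params(uri: str) -> tuple[str, dict]:
    """Return the URI without SSL query parameters, plus driver connect_args."""
    parts = urlsplit(uri)
    kept, wants_ssl = [], False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in SSL_KEYS:
            # Assumption: false, 0, and DISABLED mean "no TLS"; anything else requests it.
            wants_ssl = wants_ssl or value.lower() not in {"false", "0", "disabled"}
        else:
            kept.append((key, value))
    # Rebuild the URI with only the non-SSL query parameters.
    clean_uri = urlunsplit(parts._replace(query=urlencode(kept)))
    connect_args = {"ssl": {}} if wants_ssl else {}
    return clean_uri, connect_args


clean_uri, connect_args = split_ssl_params(
    "mysql+pymysql://user:pass@host:3306/dbname?ssl-mode=REQUIRED&charset=utf8mb4"
)
print(clean_uri)
print(connect_args)
```

The two results could then be handed to SQLAlchemy as `create_engine(clean_uri, connect_args=connect_args)`. Whether an empty `ssl` dictionary is enough to enable TLS may depend on the driver version, so it is worth confirming on a live connection, for example with `SHOW STATUS LIKE 'Ssl_cipher'`.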


Contributing & Testing

Development is managed using uv or pip. To run unit tests locally:

# Clone the repository
git clone https://github.com/SachinMishra-ux/Open_Knowledge_Format.git
cd Open_Knowledge_Format

# Sync dependencies and run tests
uv sync
uv run python -m unittest discover tests

License

This project is licensed under the MIT License; see the LICENSE file for details.
