Khora — knowledge graph + vector + SQL storage library

Khora

"Khora is the receptacle, the space, the matrix in which all things come to be." — Plato, Timaeus

Khora is a knowledge graph + vector + SQL storage library for Python 3.13+. It stores knowledge as a mix of documents, vectors, and graph relationships, and retrieves it through hybrid search (vector + graph + keyword), reranking, and temporal context.

Khora is a library, not an application. CLI tooling lives in sibling packages:

  • khora-cli: extract / search commands for ingesting files and querying namespaces.
  • khora-explorer: ontology construction (construct / validate / preview).

Install

pip install khora                 # core (PostgreSQL + pgvector)
pip install khora[sqlite-lance]   # [experimental] embedded SQLite + LanceDB
pip install khora[surrealdb]      # [experimental] unified SurrealDB (single store)
pip install khora[all-backends]   # everything: Neo4j, SurrealDB, SQLite+LanceDB, Weaviate, AGE

See docs/configuration.md for the full extras list. The kuzu extra is deprecated in 0.9.0 and scheduled for removal in 0.10.

Production stack

The production-ready combination in v0.9.0 is PostgreSQL + pgvector + Neo4j:

  • VectorCypher (default engine) — runs on PostgreSQL + pgvector + Neo4j.
  • Chronicle — runs on PostgreSQL + pgvector (no graph DB required).
  • Skeleton — available; PostgreSQL + pgvector (no graph DB required).

Set KHORA_DATABASE_URL and KHORA_NEO4J_URL, run uv run alembic upgrade head, then instantiate Khora() with no arguments:

import asyncio
from khora import Khora

async def main() -> None:
    async with Khora() as lake:  # reads KHORA_DATABASE_URL / KHORA_NEO4J_URL
        ns = await lake.create_namespace("demo")
        await lake.remember(
            "Marie Curie won the Nobel Prize in Physics in 1903.",
            namespace=ns.namespace_id,
        )
        result = await lake.recall("What did Curie win?", namespace=ns.namespace_id)
        print(result.context_text)

asyncio.run(main())

Batch processing

submit_batch() stages documents as PENDING and returns a BatchHandle immediately. A background processor picks them up and calls on_result per document as each completes.

The processor is opt-in. Call lake.start_pending_processor() after connect() on services that write documents. Read-only services do not need it. The processor can be stopped with await lake.stop_pending_processor() and restarted at any time.

async with Khora() as lake:
    ns = await lake.create_namespace("demo")
    lake.start_pending_processor()   # opt-in; write-path services only
    handle = await lake.submit_batch(
        [{"content": "doc 1"}, {"content": "doc 2"}],
        on_result=lambda completed, total, result: print(result),
        namespace=ns.namespace_id,
    )
    await handle.wait()
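A callback that only prints is hard to inspect afterwards. The sketch below shows one way to collect per-document results instead. `BatchProgress` is a hypothetical helper, not a Khora class. It assumes on_result is invoked as a plain synchronous call with the positional arguments (completed, total, result) seen in the lambda above; the shape of `result` is not specified here, so it is stored as-is.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BatchProgress:
    """Collects the results delivered to an on_result callback."""

    completed: int = 0
    total: int = 0
    results: list[Any] = field(default_factory=list)

    def __call__(self, completed: int, total: int, result: Any) -> None:
        # Argument order mirrors the lambda in the example above.
        self.completed = completed
        self.total = total
        self.results.append(result)

    @property
    def done(self) -> bool:
        """True once every document in the batch has reported back."""
        return self.total > 0 and self.completed >= self.total


# Simulated callbacks, standing in for the background processor.
progress = BatchProgress()
progress(1, 2, "result for doc 1")
progress(2, 2, "result for doc 2")
print(f"{progress.completed}/{progress.total} done={progress.done}")
```

Under the same assumption, an instance could be passed as `on_result=progress` to submit_batch() and inspected after `await handle.wait()` returns.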

Embedded options (experimental)

Khora ships two zero-infrastructure paths. Both are marked experimental in v0.9.0: they are fine for demos, evaluation, tests, and small single-user CLIs, but are not yet supported as a production deployment path.

  • SQLite + LanceDB (pip install khora[sqlite-lance], set KHORA_STORAGE_BACKEND=sqlite_lance) — recommended embedded stack. Covers VectorCypher, Skeleton, and Chronicle via dialect-aware Alembic migrations and LanceDB-backed vector search. Documented scale ceiling: ~1M chunks, ~100k entities, ~500k edges, traversal depth ≤3. Known gaps: no point-in-time queries (DYT-3550), partial atomicity in coordinator.transaction(), FTS on chunks only. See configuration.md.
  • SurrealDB (pip install khora[surrealdb]) — unified relational + vector + graph in one store. Python SDK is on the alpha track (>=2.0.0a1), and KNN (<|K|>) is unreliable in embedded mode (uses brute-force cosine + HNSW fallback). Suitable for experimentation; not recommended for production.

Quickstart caveat. A literal Khora("memory://") call passes "memory://" as the PostgreSQL URL, not as a backend selector — there is no memory:// URL scheme parsed by the lake itself today. To use the embedded path, set KHORA_STORAGE_BACKEND=sqlite_lance (or surrealdb) and the corresponding db_path / connection settings. Routing a true memory:// URI to the SQLite+LanceDB stack is tracked for v0.10.

Observability

khora emits OpenTelemetry spans and metrics via Logfire and records structured LLMEvent / StorageEvent / PipelineEvent rows to PostgreSQL when a collector is configured. Both integrations are opt-in — without them, all instrumentation is a zero-cost no-op.

  • Public surface is documented in docs/telemetry-contract.json (with explainer at docs/telemetry-contract.md). It lists every public span, metric, pipeline stage, event-type field, and khora.telemetry.__all__ export. Items tagged stability: public are part of khora's API surface and follow standard semver — breaking changes require a major version bump. Drift is enforced in CI via tests/unit/telemetry/test_contract.py. See ADR-026.

  • OTel semantic conventions apply to attributes: gen_ai.* for LLM calls, db.* for storage, code.* for stack info. Telemetry stays vendor-neutral because it flows through the OTel exporter chain.

  • Logfire integration is opt-in via the [logfire] extra:

    pip install khora[logfire]
    
    import logfire
    from khora import Khora
    
    logfire.configure(service_name="my-service")
    # khora's @trace decorators and trace_span() context managers
    # now emit spans automatically; metrics like khora.memory.recall.duration,
    # khora.llm.tokens, khora.llm.cost_usd, khora.chronicle.abstention_signal
    # are exported on the standard OTel cadence.
    

    Without the logfire extra installed, trace_span() yields a no-op and metric_* registrations short-circuit.

  • Structured event recording is opt-in via KHORA_TELEMETRY_DATABASE_URL (PostgreSQL). When set, TelemetryCollector writes LLMEvent / StorageEvent / PipelineEvent rows for downstream cost tracking and incident reconstruction. Without it, NoOpCollector is used (zero cost).

  • Async logging caveat. Library consumers that import khora without configuring loguru sinks inherit the default sync stderr sink, which blocks the event loop on every log call inside async def. Either call khora.logging_config.setup_logging() (which configures sinks with enqueue=True and registers an atexit drain) or configure your own loguru sinks with enqueue=True explicitly.

Documentation

Start at docs/README.md.

Development

make dev         # start PostgreSQL + Neo4j (Docker)
make test        # pytest with coverage
make format      # ruff format + isort
make lint        # ruff + ty typecheck

See CHANGELOG.md for release history.

License

Copyright 2026 AllTheData Inc.

Licensed under the Apache License, Version 2.0. See LICENSE and NOTICE.
