riff-kg-kit

Opinionated knowledge-graph ingest, staged proposals, and retrieval for Riff (FastAPI) apps

A PyPI-first Python library for staged knowledge-graph ingestion and agent-approved commits, designed to plug into Riff apps (FastAPI + Postgres + pgvector).

Status

0.7.1 — Extraction durability. run_extraction_for_signal commits staged proposals per segment (when not inside a caller transaction), marks failed runs with error_message, and keeps proposals from completed segments if a later segment errors or times out. 0.7.0 added hybrid reranking, graph-hop retrieval, and pack_context helpers.
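The per-segment durability described above can be illustrated with a small in-memory sketch (hypothetical names, a plain list standing in for Postgres; this is not the library's code): each segment's proposals are kept as soon as that segment succeeds, so a later failure records an error message without discarding earlier work.

```python
# Illustrative sketch of per-segment durability (hypothetical names, in-memory
# "store" instead of Postgres): each segment's proposals are persisted as soon
# as that segment succeeds, so a later failure records an error message without
# discarding earlier segments' work.

def extract(segment):
    """Stand-in for the LLM extraction step; fails on one segment."""
    if segment == "boom":
        raise ValueError("bad segment")
    return [f"proposal:{segment}"]

def run_extraction(segments, store):
    """Process segments in order; keep completed work, report the first error."""
    for i, segment in enumerate(segments):
        try:
            proposals = extract(segment)
        except Exception as exc:
            return f"segment {i}: {exc}"  # run is marked failed, prior work kept
        store.extend(proposals)  # "commit" this segment's proposals
    return None

store = []
error_message = run_extraction(["ok-1", "ok-2", "boom", "ok-3"], store)
```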

Install

pip install riff-kg-kit

Editable (local dev):

pip install -e ".[dev,yaml]"

Quick use

from riff_kg import KgConfig

cfg = KgConfig.model_validate_json('{"embedding_dimension": 768}')
assert cfg.embedding_dimension == 768

Chunking strategy

By default, chunking uses fixed character windows. You can opt into boundary-aware chunking that prefers paragraph/sentence/whitespace splits:

from riff_kg import KgConfig

cfg = KgConfig(
    chunking_strategy="semantic",  # "char" (default) or "semantic"
    chunk_if_longer_than_chars=12000,
    chunk_max_chars=8000,
    chunk_overlap_chars=400,
)
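The boundary-aware mode can be approximated with a sketch like the following (illustrative only; the library's actual splitter may differ): within each window, prefer the last paragraph break, then the last sentence end, then whitespace, and hard-cut only when no boundary is found.

```python
def chunk_semantic(text, max_chars=8000, overlap=400):
    """Split text into windows of at most max_chars, preferring natural boundaries.

    Illustrative sketch, not riff-kg-kit's implementation: within each window we
    look for a paragraph break, then a sentence end, then whitespace, restricted
    to the window's back half so chunks never become tiny.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):  # try to end the chunk on a natural boundary
            window = text[start:end]
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep, max_chars // 2)
                if cut != -1:
                    end = start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        # step back by the overlap, but always make forward progress
        start = end - overlap if end - overlap > start else end
    return chunks

parts = chunk_semantic("First para.\n\nSecond para. More text here.", max_chars=20, overlap=5)
```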

Roadmap (summary)

  1. Migrations + core tables (signal, signal_segment, staged_proposal, …)
  2. Normalize → segment → embed
  3. Extract → stage (LLM proposals only)
  4. Approve → commit (validated canonical graph)
  5. Search / pack-context / retrieval (implemented: riff_kg.search)
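The stage/approve/commit split in steps 3-4 can be sketched in miniature (hypothetical names and in-memory state; the real library stages rows in Postgres and validates against a schema):

```python
# Miniature sketch of stage -> approve -> commit (hypothetical names; the
# library stages rows in Postgres, not Python dicts and sets).

staged = []  # LLM-extracted proposals awaiting review
graph = {"entities": set(), "edges": set()}  # the committed canonical graph

def stage(proposal):
    """Step 3: record an LLM proposal without touching the canonical graph."""
    proposal = dict(proposal, approved=False)
    staged.append(proposal)
    return proposal

def approve_and_commit(proposal):
    """Step 4: validate, then merge an approved proposal into the graph."""
    src, rel, dst = proposal["src"], proposal["rel"], proposal["dst"]
    if not (src and rel and dst):
        raise ValueError("incomplete proposal")
    proposal["approved"] = True
    graph["entities"].update({src, dst})
    graph["edges"].add((src, rel, dst))

p = stage({"src": "Ada Lovelace", "rel": "wrote", "dst": "Note G"})
approve_and_commit(p)
```

The point of the split is that nothing an LLM proposes reaches the canonical graph until an explicit approval step has validated it.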

Retrieval (Phase 5)

from riff_kg import KgConfig
from riff_kg.search import pack_context, search_segments_vector

# After migrations and ingest with embeddings, pass a query embedding
# (same dimension as KgConfig.embedding_dimension, e.g. 768):
# hits = await search_segments_vector(conn, cfg, query_vec, scope_id="my_scope")
# text = pack_context(hits)
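The general idea behind pack_context — turning ranked hits into a bounded prompt context — can be sketched like this (illustrative stand-in; the real helper's signature and formatting may differ):

```python
def pack_hits(hits, max_chars=2000):
    """Concatenate hit texts, best-first, until a character budget is exhausted.

    Illustrative stand-in for the idea behind pack_context; riff-kg-kit's
    actual helper may format, dedupe, and truncate differently.
    """
    parts, used = [], 0
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        block = hit["text"].strip()
        if used + len(block) > max_chars:
            continue  # skip hits that no longer fit the budget
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)

hits = [
    {"text": "pgvector stores embeddings.", "score": 0.61},
    {"text": "Segments are embedded at ingest time.", "score": 0.83},
]
ctx = pack_hits(hits, max_chars=80)
```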

Hybrid rerank and graph hops

from riff_kg.search import graph_hop_subgraph, search_segments_hybrid

# Hybrid search with Reciprocal Rank Fusion (default)
# hits = await search_segments_hybrid(conn, cfg, query_embedding=qvec, fts_query="topic")

# Graph neighborhood around a committed entity
# nodes, edges = await graph_hop_subgraph(conn, root_entity_id=entity_id, max_hops=2, scope_id="my_scope")
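Reciprocal Rank Fusion, the default fusion strategy, is simple to state: each result's fused score is the sum of 1 / (k + rank) over every ranking it appears in, with k a damping constant (commonly 60). A generic sketch of the standard algorithm, separate from the library's internals:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Generic sketch of the standard RRF algorithm, not riff-kg-kit's internals:
    score(id) = sum over rankings of 1 / (k + rank), with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["seg-2", "seg-1", "seg-9"]  # ranked by embedding similarity
fts_hits = ["seg-1", "seg-7", "seg-2"]     # ranked by full-text search
fused = rrf_fuse([vector_hits, fts_hits])
```

Items that place well in both rankings (here seg-1 and seg-2) rise to the top, while items seen by only one retriever still survive with a lower score.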

Publishing checklist

  1. Bump version in pyproject.toml and src/riff_kg/__init__.py.
  2. Update CHANGELOG.md and README status/examples.
  3. Run python -m ruff check src tests and python -m pytest tests -q.
  4. Build distributions: python -m build.
  5. Upload to TestPyPI first, verify install, then publish to PyPI.

License

Apache-2.0 — see LICENSE.
