riff-kg-kit

Opinionated knowledge-graph ingest, staged proposals, and retrieval for Riff (FastAPI) apps.

PyPI-first Python library for staged knowledge-graph ingestion and agent-approved commits, designed to plug into Riff apps (FastAPI + Postgres + pgvector).

Status

0.7.1 — Extraction durability. run_extraction_for_signal commits staged proposals per segment (when not inside a caller transaction), marks failed runs with error_message, and keeps proposals from completed segments if a later segment errors or times out. 0.7.0 added hybrid reranking, graph-hop retrieval, and pack_context helpers.

Install

pip install riff-kg-kit

Editable (local dev):

pip install -e ".[dev,yaml]"

Quick use

from riff_kg import KgConfig

cfg = KgConfig.model_validate_json('{"embedding_dimension": 768}')
assert cfg.embedding_dimension == 768

Chunking strategy

By default, chunking uses fixed character windows. You can opt into boundary-aware chunking that prefers paragraph/sentence/whitespace splits:

from riff_kg import KgConfig

cfg = KgConfig(
    chunking_strategy="semantic",  # "char" (default) or "semantic"
    chunk_if_longer_than_chars=12000,
    chunk_max_chars=8000,
    chunk_overlap_chars=400,
)
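To make the boundary-aware strategy concrete, here is a minimal sketch of what "prefer paragraph/sentence/whitespace splits" means. This is not riff-kg-kit's internal implementation; the function name and separator order are illustrative assumptions.

```python
# Hypothetical sketch of boundary-aware ("semantic") chunking: cut each
# window at the last paragraph break, sentence end, or space before the
# max-size limit, then restart the next window with some overlap.
def chunk_semantic(text: str, max_chars: int = 8000, overlap: int = 400) -> list[str]:
    separators = ["\n\n", ". ", " "]  # paragraph, then sentence, then whitespace
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            window = text[start:end]
            for sep in separators:
                cut = window.rfind(sep)
                if cut > 0:
                    end = start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # overlap, but always make progress
    return chunks
```

With `chunking_strategy="char"`, the same loop would simply cut at `start + max_chars` without looking for separators.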

Roadmap (summary)

  1. Migrations + core tables (signal, signal_segment, staged_proposal, …)
  2. Normalize → segment → embed
  3. Extract → stage (LLM proposals only)
  4. Approve → commit (validated canonical graph)
  5. Search / pack-context / retrieval (implemented: riff_kg.search)
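The stage-then-approve split in steps 3–4 is the core pattern: LLM output never lands in the canonical graph directly, only as proposals that an agent must approve. A minimal in-memory sketch of that state machine (class and field names are illustrative, not the library's schema):

```python
from dataclasses import dataclass
from enum import Enum


class ProposalStatus(str, Enum):
    STAGED = "staged"        # written by extraction, not yet trusted
    APPROVED = "approved"    # agent signed off
    COMMITTED = "committed"  # merged into the canonical graph
    REJECTED = "rejected"


@dataclass
class StagedProposal:
    subject: str
    predicate: str
    obj: str
    status: ProposalStatus = ProposalStatus.STAGED

    def approve(self) -> None:
        # Only freshly staged proposals may be approved; anything else
        # (already committed, rejected) is a state-machine violation.
        if self.status is not ProposalStatus.STAGED:
            raise ValueError(f"cannot approve from {self.status.value}")
        self.status = ProposalStatus.APPROVED
```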

Retrieval (Phase 5)

from riff_kg import KgConfig
from riff_kg.search import pack_context, search_segments_vector

# After migrations and ingest with embeddings, pass a query embedding
# (same dimension as KgConfig.embedding_dimension, e.g. 768):
# hits = await search_segments_vector(conn, cfg, query_vec, scope_id="my_scope")
# text = pack_context(hits)
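The packing step can be pictured with a generic sketch: concatenate the top-ranked hit texts until a budget is exhausted. This is not `riff_kg.search.pack_context`; the function name, parameters, and separator are illustrative assumptions.

```python
# Hypothetical context packer: greedily take hits in rank order until
# the character budget would be exceeded, then stop.
def pack_hits(texts: list[str], max_chars: int = 4000, sep: str = "\n---\n") -> str:
    out: list[str] = []
    used = 0
    for t in texts:
        cost = len(t) + (len(sep) if out else 0)  # separator only between items
        if used + cost > max_chars:
            break
        out.append(t)
        used += cost
    return sep.join(out)
```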

Hybrid rerank and graph hops

from riff_kg.search import graph_hop_subgraph, search_segments_hybrid

# Hybrid search with Reciprocal Rank Fusion (default)
# hits = await search_segments_hybrid(conn, cfg, query_embedding=qvec, fts_query="topic")

# Graph neighborhood around a committed entity
# nodes, edges = await graph_hop_subgraph(conn, root_entity_id=entity_id, max_hops=2, scope_id="my_scope")
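Reciprocal Rank Fusion, the default fusion used by the hybrid search above, is simple enough to sketch generically. This is not the library's internal code; the function name and the conventional constant k=60 are assumptions.

```python
# Generic Reciprocal Rank Fusion: each input ranking is an ordered list of
# document ids (best first); a document's fused score is the sum of
# 1 / (k + rank) across the rankings it appears in.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

A document ranked moderately well by both the vector and the full-text ranking beats one ranked first by only a single ranking, which is why RRF is a robust default when the two score scales are incomparable.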

Publishing checklist

  1. Bump version in pyproject.toml and src/riff_kg/__init__.py.
  2. Update CHANGELOG.md and README status/examples.
  3. Run python -m ruff check src tests and python -m pytest tests -q.
  4. Build distributions: python -m build.
  5. Upload to TestPyPI first, verify install, then publish to PyPI.

License

Apache-2.0 — see LICENSE.
