Backend-agnostic one-shot GraphRAG: fuse vector + graph (label/class/relation) seeds with MMR diversity into a single synthesis. Zero-infra default.
OmniFuse
Backend-agnostic, one-shot GraphRAG. Fire several retrieval strategies at once — vector/lexical passages + graph label-linking + class enumeration + relation expansion — and fuse them with MMR diversity into a single LLM synthesis. No iterative ReAct tool loop. Zero infra, zero lock-in: the full algorithm runs on a pure-Python in-memory backend (dict + BM25), and swaps to Fuseki / Qdrant / any LLM by passing objects that match three small protocols.
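from omnifuse import from_triples
# Korean sample data: 담보 = collateral, 규정 = regulation, 한도 = limit, 5억 = 500 million
of = from_triples(  # nodes are inferred; no DB, no API key
    [("담보", "instanceOf", "규정"), ("담보", "한도", "5억")],
    chunks=[("c1", "담보 한도는 5억원이다", ["담보"])],  # "The collateral limit is 500 million won"
)
print(of.search("담보 한도").answer)  # query: "collateral limit"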
Load however you have the data — all zero-dep, same search():
from omnifuse import from_jsonl, from_csv, from_fuseki, build_inmemory
of = from_jsonl(triples="t.jsonl", chunks="c.jsonl")
of = from_csv(triples="triples.csv", chunks="chunks.csv")
of = from_fuseki("http://localhost:3030/ds/query", graph_uri="urn:g", user="admin", password="…")
of = build_inmemory(nodes, triples, chunks) # explicit Node/Triple/Chunk
Why graph fusion (not just vectors)
Pure vector RAG answers from the top-k passages it happens to embed near the query. A graph store also gives you operations cosine similarity can't:
- Complete enumeration — all instances of a class ("list every regulation"), exact counts.
- Relations / multi-hop — what an entity is connected to, 1-hop neighbors, paths.
- Minority evidence survives — MMR diversity keeps the decisive exception/warning that near-duplicate passages would otherwise crowd out of a fixed top-k.
OmniFuse fuses both: the vector seed for content, the graph seeds for structure.
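To make the contrast concrete, here is a minimal sketch of the two structural operations over plain tuples. It is not OmniFuse's implementation: the function bodies, the `isa` parameter, and the toy triples are assumptions made up for illustration.

```python
# Toy (subject, predicate, object) triples; illustrative data only.
TRIPLES = [
    ("collateral", "instanceOf", "regulation"),
    ("guarantee", "instanceOf", "regulation"),
    ("disclosure", "instanceOf", "regulation"),
    ("collateral", "limit", "500M KRW"),
]


def class_instances(triples, class_id, isa="instanceOf"):
    """Complete enumeration: every subject declared an instance of class_id."""
    return sorted(s for s, p, o in triples if p == isa and o == class_id)


def neighbors(triples, node_id):
    """1-hop relations that touch node_id, in either direction."""
    return [t for t in triples if node_id in (t[0], t[2])]


instances = class_instances(TRIPLES, "regulation")
print(instances)        # the full list, not a top-k sample
print(len(instances))   # an exact count
print(neighbors(TRIPLES, "collateral"))
```

A similarity search over passages has no equivalent of the exact count: it can only return the passages that score well, however many instances exist.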
Design — algorithm as a library
The algorithm only talks to three typing.Protocols, never to a database:
class GraphStore(Protocol):
    def search_labels(self, query, *, limit=30) -> list[tuple[Node, float]]: ...  # full-text label search
    def class_instances(self, class_id, *, limit=1000) -> list[Node]: ...  # enumeration
    def neighbors(self, node_id, *, hops=1, limit=100) -> list[tuple[str, str, str]]: ...  # traversal
    def count_class(self, class_id) -> int: ...
    def get_node(self, node_id) -> Node | None: ...

class VectorStore(Protocol):
    def search(self, query, *, limit=20) -> list[tuple[Chunk, float]]: ...
    def fetch(self, ids) -> list[Chunk]: ...

class LLM(Protocol):
    def generate(self, prompt, *, system="", timeout=None) -> str: ...
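Because these are structural protocols, a backend only needs matching method signatures, not a base class. The sketch below redeclares the LLM protocol locally so it runs without the package installed. `ShoutingLLM` and `synthesize` are hypothetical names, and marking the protocol `runtime_checkable` is an assumption made here only so that `isinstance` can be demonstrated.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLM(Protocol):
    """Local copy of the LLM protocol above, for a standalone demo."""

    def generate(self, prompt, *, system="", timeout=None) -> str: ...


class ShoutingLLM:
    """Hypothetical stand-in model: no inheritance, only a matching method."""

    def generate(self, prompt, *, system="", timeout=None) -> str:
        return prompt.upper()


def synthesize(llm: LLM, evidence: list[str], question: str) -> str:
    """Join the evidence into one prompt and make a single LLM call."""
    prompt = "\n".join(evidence) + "\nQ: " + question
    return llm.generate(prompt, system="Answer from the evidence only.")


llm = ShoutingLLM()
print(isinstance(llm, LLM))
print(synthesize(llm, ["limit is 500M"], "what is the limit?"))
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the method exists; a static type checker is what compares the signatures.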
- Zero-infra default: InMemoryGraph indexes node labels with BM25 (CJK character n-grams, so Korean/CJK search works with no morphological analyzer), and InMemoryVector uses cosine when embeddings are present, else BM25 lexical.
- dependencies = []: the core needs nothing but the standard library. Real backends are optional extras (pip install "xgen-omnifuse[fuseki,qdrant]").
- Bring your own LLM: pass anything with generate(...); the bundled EchoLLM returns the fused evidence so the pipeline runs end-to-end with no API key.
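The idea of indexing CJK text by character n-grams and scoring with BM25 can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the code in text.py: the bigram size, the Unicode ranges, the Okapi BM25 variant, and the `k1`/`b` defaults are all choices made for this example.

```python
import math
import re
from collections import Counter

_CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3"  # kana, Han ideographs, Hangul syllables
_RUNS = re.compile(f"[{_CJK}]+|[a-z0-9]+")
_IS_CJK = re.compile(f"[{_CJK}]")


def tokenize(text, n=2):
    """Whole tokens for Latin/digit runs, character n-grams for CJK runs."""
    tokens = []
    for run in _RUNS.findall(text.lower()):
        if _IS_CJK.match(run) and len(run) > n:
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
        else:
            tokens.append(run)
    return tokens


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in docs against query."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / max(len(tokenized), 1)
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# 담보 한도 = "collateral limit", 대출 금리 = "loan interest rate"
labels = ["담보 한도", "대출 금리", "collateral limit"]
scores = bm25_scores("담보", labels)
print(tokenize("담보한도"))
print(scores)
```

Overlapping character bigrams let a query match inside an unsegmented CJK string, which is why no morphological analyzer is needed, at the cost of some spurious cross-word bigrams.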
The pipeline (OmniFuse.search)
- vector/lexical seed → adaptive top-k (score-distribution cut, not fixed k)
- graph label-linking → 1-hop relations
- class enumeration (complete list/count)
- HippoRAG-style expansion: entities of the retrieved chunks → 1-hop expansion
- evidence assembled with MMR diversity (Jaccard, no embeddings needed)
- one LLM synthesis over the fused evidence
- honest evidence_nodes: only the nodes the answer actually cites
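Two of the steps above, the adaptive top-k cut and the Jaccard-based MMR assembly, can be sketched as follows. These are assumptions about how such steps might work, not the code in fusion.py: the largest-gap cut rule, whitespace tokenization, the `lam` trade-off weight, and every function name here are illustrative.

```python
def adaptive_top_k(scored, min_k=1, max_k=10):
    """Cut a ranked list at its largest score gap instead of at a fixed k."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)[:max_k]
    if len(ranked) <= min_k:
        return ranked
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    # Only consider cut points that keep at least min_k items.
    cut = max(range(min_k - 1, len(gaps)), key=lambda i: gaps[i])
    return ranked[:cut + 1]


def jaccard(a, b):
    """Token-set overlap between two strings; needs no embeddings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr(candidates, k, lam=0.5):
    """Greedy MMR: relevance minus similarity to what is already picked."""
    pool = list(candidates)  # (text, relevance) pairs
    picked = []
    while pool and len(picked) < k:
        def gain(item):
            text, relevance = item
            redundancy = max((jaccard(text, p[0]) for p in picked), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(pool, key=gain)
        picked.append(best)
        pool.remove(best)
    return picked


hits = [("a", 0.90), ("b", 0.85), ("c", 0.80), ("d", 0.30), ("e", 0.20)]
kept = adaptive_top_k(hits)
print([name for name, _ in kept])

passages = [
    ("collateral limit is 500M", 0.90),
    ("the collateral limit is 500M", 0.88),
    ("exception: limit waived for public entities", 0.60),
]
chosen = mmr(passages, k=2)
print([text for text, _ in chosen])
```

In the second example a plain top-2 by relevance would return the two near-duplicate passages; the redundancy penalty is what lets the lower-scored exception through.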
Install
pip install xgen-omnifuse # core (zero deps)
pip install "xgen-omnifuse[dev]" # + pytest, ruff
Run the demo with no install:
python examples/quickstart.py
Layout
src/omnifuse/
    protocols.py         # GraphStore / VectorStore / LLM (the swap points)
    models.py            # Node, Triple, Chunk, SearchResult
    text.py              # tokenizer + BM25 (CJK n-grams)
    fusion.py            # MMR, adaptive top-k, relation ranking
    oneshot.py           # OmniFuse.search, the fusion algorithm
    backends/memory.py   # InMemoryGraph + InMemoryVector (zero infra)
    llm.py               # EchoLLM, CallableLLM
    facade.py            # build_inmemory(...)
examples/
tests/
Two interchangeable modes (same algorithm)
# (a) self-contained — zero infra
from omnifuse import build_inmemory
of = build_inmemory(nodes, triples, chunks)
# (b) backed by Apache Jena Fuseki (or any SPARQL endpoint) — graph-only or with a vector store
from omnifuse import OmniFuse, InMemoryVector
from omnifuse.backends.fuseki import FusekiGraph
graph = FusekiGraph("http://localhost:3030/ds/query", graph_uri="urn:my-graph", user="admin", password="…")
of = OmniFuse(graph, InMemoryVector([])) # search() unchanged
FusekiGraph is stdlib-only (urllib) and uses the portable FILTER(CONTAINS(...)) pattern, so it
works on any SPARQL 1.1 store rather than only on Fuseki with the jena-text index.
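A portable label search of this kind might look like the sketch below, which builds the query and the HTTP request but does not send it. The query shape, the use of rdfs:label as the label predicate, and the helper names `label_query` and `build_request` are assumptions for illustration, not FusekiGraph's actual code; authentication is left out.

```python
import urllib.parse
import urllib.request


def label_query(text, graph_uri, limit=30):
    """Build a SPARQL 1.1 label search that relies only on FILTER(CONTAINS(...))."""
    # Escape backslashes, quotes and line breaks so text is a safe string literal.
    needle = (
        text.lower()
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?node ?label WHERE {\n"
        f"  GRAPH <{graph_uri}> {{\n"
        "    ?node rdfs:label ?label .\n"
        f'    FILTER(CONTAINS(LCASE(STR(?label)), "{needle}"))\n'
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(endpoint, query):
    """Prepare (but do not send) a SPARQL Protocol POST using only the stdlib."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


query = label_query('담보 "한도"', "urn:my-graph")
request = build_request("http://localhost:3030/ds/query", query)
print(query)
print(request.get_method())
```

CONTAINS, LCASE and STR are standard SPARQL 1.1 functions, so the query does not depend on a store-specific text index. The trade-off is that the store has to scan labels instead of using an index, which is presumably why a jena-text fast path is a separate item.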
Roadmap
- backends/qdrant.py vector adapter; jena-text fast path for FusekiGraph
- async pipeline (parallel seeds via asyncio.gather)
- reranker / cross-encoder hook, query expansion
- configurable ISA predicates and prompt templates (per domain/language)
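As a rough picture of the async roadmap item, independent seed strategies could be awaited together with asyncio.gather. Everything in this sketch is hypothetical: the coroutine names, the sleeps standing in for backend calls, and the flattening step do not describe an existing OmniFuse API.

```python
import asyncio


async def vector_seed(query):
    await asyncio.sleep(0.01)  # stands in for a vector/lexical search call
    return [f"passage about {query}"]


async def label_seed(query):
    await asyncio.sleep(0.01)  # stands in for a graph label search
    return [f"node labelled {query}"]


async def class_seed(query):
    await asyncio.sleep(0.01)  # stands in for a class enumeration
    return [f"instances of the class matching {query}"]


async def gather_seeds(query):
    """Run all seed strategies concurrently; gather keeps call order."""
    results = await asyncio.gather(
        vector_seed(query), label_seed(query), class_seed(query)
    )
    return [item for seed in results for item in seed]


evidence = asyncio.run(gather_seeds("collateral limit"))
print(evidence)
```

Since the seeds are independent of each other, the total latency approaches that of the slowest seed rather than the sum of all three.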
Vault — fuse / surface (omnifuse-native memory)
A growing knowledge store with two omnifuse-specific dynamics, not a generic remember/recall: fuse-on-write (facts deduped & merged by entity) and salience (frequently fused/surfaced nodes rank higher). Zero infra; notes auto-link to known entities; persists to JSONL.
from omnifuse import Vault
v = Vault()
v.fuse(facts=[("담보", "instanceOf", "규정")])
v.fuse("담보 한도는 5억원이다", facts=[("담보", "한도", "5억")])
print(v.surface("담보 한도").answer) # fusion search over everything fused, salience-ranked
v.save("vault.jsonl")
v2 = Vault.load("vault.jsonl")
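The two dynamics, fuse-on-write and salience, can be pictured with a toy class. `TinyVault` is a hypothetical miniature written for this explanation, not the real Vault: it matches entities by substring, counts every fuse and surface as one salience point, and skips notes and persistence entirely.

```python
from collections import defaultdict


class TinyVault:
    """Hypothetical miniature of fuse-on-write plus salience ranking."""

    def __init__(self):
        self.facts = defaultdict(set)     # entity -> {(predicate, object)}
        self.salience = defaultdict(int)  # entity -> times fused or surfaced

    def fuse(self, facts):
        for subject, predicate, obj in facts:
            self.facts[subject].add((predicate, obj))  # the set dedupes repeats
            self.salience[subject] += 1

    def surface(self, query):
        """Entities named in the query, most salient first."""
        hits = [entity for entity in self.facts if entity in query]
        hits.sort(key=lambda entity: self.salience[entity], reverse=True)
        for entity in hits:
            self.salience[entity] += 1  # surfacing also raises salience
        return [(entity, sorted(self.facts[entity])) for entity in hits]


vault = TinyVault()
vault.fuse([("collateral", "instanceOf", "regulation")])
vault.fuse([
    ("collateral", "instanceOf", "regulation"),  # duplicate, merged away
    ("collateral", "limit", "500M KRW"),
])
vault.fuse([("limit", "unit", "KRW")])
result = vault.surface("collateral limit")
print(result)
```

The duplicate fact is stored once but still counts toward salience, so an entity that keeps being written about rises in the ranking even when no new information arrives.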
CI / Releasing
- ci.yml: runs pytest (3.10–3.12) + python -m build + twine check on every push/PR.
- publish.yml: on a GitHub Release, builds and uploads to PyPI via Trusted Publishing (no token in the repo). One-time PyPI setup: project → Publishing → add pending publisher PlateerLab / xgen-omnifuse / publish.yml / pypi. (Token mode: add secrets.PYPI_API_TOKEN.)
Build locally:
pip install build && python -m build # dist/*.tar.gz + *.whl
License
TBD.
File details
Details for the file xgen_omnifuse-0.4.0.tar.gz.
File metadata
- Download URL: xgen_omnifuse-0.4.0.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1cf77d2783a374bf0a038ec5b0436e9a1c08daa46fd3ed3de2651032873836d0 |
| MD5 | eeeb20b6f5ea46a3183f6e763d3464ff |
| BLAKE2b-256 | 483e1d38536f3ce8a44d026060f747ac9cf72558321c1e299f824a6d10b298e2 |
File details
Details for the file xgen_omnifuse-0.4.0-py3-none-any.whl.
File metadata
- Download URL: xgen_omnifuse-0.4.0-py3-none-any.whl
- Upload date:
- Size: 21.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8c92d7849263c887bbfcab34eb0e230ff7822fc0ee64ba494eab84c753034d07 |
| MD5 | 4198c40c46e67db0057afabb264c698e |
| BLAKE2b-256 | e7bbcf5c224e8e8c343172118b00370fded27d4f77ba8574ee733a465d2579b0 |