Transactional Graph + Vector retrieval system for InterSystems IRIS with hybrid search, openCypher, and GraphQL APIs
iris-vector-graph
Knowledge graph engine for InterSystems IRIS — temporal property graph, vector search, openCypher, graph analytics, and pre-aggregated analytics.
Install
pip install iris-vector-graph # Core: just intersystems-irispython
pip install iris-vector-graph[full] # Full: + FastAPI, GraphQL, numpy, networkx
pip install iris-vector-graph[plaid] # + sklearn for PLAID K-means build
ObjectScript Only (IPM)
zpm "install iris-vector-graph-core"
Pure ObjectScript — VecIndex, PLAIDSearch, PageRank, Subgraph, GraphIndex, TemporalIndex. No Python. Works on any IRIS 2024.1+, all license tiers.
What It Does
| Capability | Description |
|---|---|
| Temporal Graph | Bidirectional time-indexed edges — ^KG("tout"/"tin"/"bucket"). O(results) window queries via B-tree traversal. 134K+ edges/sec ingest (RE2-TT benchmark). |
| Pre-aggregated Analytics | ^KG("tagg") per-bucket COUNT/SUM/AVG/MIN/MAX and HLL COUNT DISTINCT. O(1) aggregation queries — 0.085ms for 1-bucket, 0.24ms for 24-hour window. |
| BM25Index | Pure ObjectScript Okapi BM25 lexical search — ^BM25Idx globals, zero SQL tables. Automatic kg_TXT upgrade when "default" index exists. Cypher CALL ivg.bm25.search(name, query, k). 0.3ms median search. |
| VecIndex | RP-tree ANN vector search — pure ObjectScript + $vectorop SIMD. Annoy-style two-means splitting. |
| IVFFlat | Inverted File flat vector index — Python k-means build (sklearn), pure ObjectScript query. Tunable nprobe recall/speed tradeoff. nprobe=nlist → exact search. Cypher CALL ivg.ivf.search(name, vec, k, nprobe). |
| PLAID | Multi-vector retrieval (ColBERT-style) — centroid scoring → candidate gen → exact MaxSim. Single server-side call. |
| HNSW | Native IRIS VECTOR index via kg_KNN_VEC. Sub-2ms search. |
| Cypher | openCypher parser/translator — MATCH, WHERE, RETURN, CREATE, UNION, CASE WHEN, variable-length paths, shortestPath() / allShortestPaths(), CALL subqueries. Bolt 5.4 protocol (TCP + WebSocket) for standard driver connectivity. |
| Graph Analytics | PageRank, WCC, CDLP, PPR-guided subgraph — pure ObjectScript over ^KG globals. |
| FHIR Bridge | ICD-10→MeSH mapping via UMLS for clinical-to-KG integration. |
| GraphQL | Auto-generated schema from knowledge graph labels. |
| Embedded Python | EmbeddedConnection — zero-boilerplate dbapi2 adapter for IRIS Language=python methods. |
Quick Start
Python
import iris
from iris_vector_graph.engine import IRISGraphEngine
conn = iris.connect(hostname='localhost', port=1972, namespace='USER', username='_SYSTEM', password='SYS')
engine = IRISGraphEngine(conn)
engine.initialize_schema()
Inside IRIS (Language=python, no connection needed)
from iris_vector_graph.embedded import EmbeddedConnection
from iris_vector_graph.engine import IRISGraphEngine
engine = IRISGraphEngine(EmbeddedConnection())
engine.initialize_schema()
Graph Browser + Bolt Connectivity
A built-in Cypher server speaks the Bolt protocol, so standard graph tooling (drivers, visualization, LangChain) works out of the box:
IRIS_HOST=localhost IRIS_PORT=1972 IRIS_NAMESPACE=USER \
IRIS_USERNAME=_SYSTEM IRIS_PASSWORD=SYS \
python3 -m uvicorn iris_vector_graph.cypher_api:app --port 8000
- Browser: http://localhost:8000/browser/ (force-directed graph visualization)
- Bolt TCP: bolt://localhost:7687 (Python/Java/Go/.NET drivers, LangChain, cypher-shell)
- HTTP API: http://localhost:8000/api/cypher (curl, httpie, REST clients)
Temporal Property Graph
Store and query time-stamped edges — service calls, events, metrics, log entries — with sub-millisecond window queries and O(1) aggregation.
Two edge APIs: structural vs. temporal
IVG has two distinct edge APIs that write to different storage and support different query patterns:
| | create_edge / bulk_create_edges | create_edge_temporal / bulk_create_edges_temporal |
|---|---|---|
| Writes to | Graph_KG.rdf_edges SQL (durability) + ^KG("out",0,...) globals (query, synchronous) | ^KG("tout"/"tin") (time-ordered) + ^KG("out",0,...) (adjacency) |
| Query via | MATCH (a)-[:R]->(b); immediately visible, no BuildKG() needed | get_edges_in_window(), get_temporal_aggregate(), temporal Cypher WHERE r.ts >= $start; also visible in MATCH (a)-[:R]->(b) |
| Models | Structural relationship: "A is connected to B" | Event log: "A called B at time T with weight W" |
| Example | (service:auth)-[:DEPENDS_ON]->(service:payment) | (service:auth)-[:CALLS_AT {ts: 1705000042, weight: 38ms}]->(service:payment) |
Use create_edge when the relationship is a permanent structural fact: schema dependencies, ontology hierarchies, entity co-occurrences, foreign key relationships.
Use create_edge_temporal when the relationship is a time-series event: service calls, metric emissions, log events, cost observations, anything you'll query by time window or aggregate over time.
The same node pair can have both: a structural DEPENDS_ON edge (created once) and thousands of temporal CALLS_AT events (one per call). Both are immediately visible in MATCH (a)-[r]->(b) — no rebuild required.
Deleting an edge:
engine.delete_edge("service:auth", "DEPENDS_ON", "service:payment")
# removes from rdf_edges SQL and kills ^KG("out",0,...) immediately
Note (bulk ingest): bulk_create_edges is optimized for high-volume ingest (535M edges validated) and intentionally skips the per-edge ^KG write for performance. Edges inserted in bulk are visible to MATCH/BFS only after calling BuildKG() at the end of the ingest session. bulk_create_edges_temporal does write ^KG immediately. create_edge (single) always writes immediately.
Ingest
import time
# Single edge
engine.create_edge_temporal(
source="service:auth",
predicate="CALLS_AT",
target="service:payment",
timestamp=int(time.time()),
weight=42.7, # latency_ms, metric value, or 1.0
)
# Bulk ingest — 134K+ edges/sec (RE2-TT benchmark, 535M edges validated)
edges = [
{"s": "service:auth", "p": "CALLS_AT", "o": "service:payment", "ts": 1712000000, "w": 42.7},
{"s": "service:payment", "p": "CALLS_AT", "o": "db:postgres", "ts": 1712000001, "w": 8.1},
{"s": "service:auth", "p": "EMITS_METRIC_AT","o": "metric:cpu", "ts": 1712000000, "w": 73.2},
]
engine.bulk_create_edges_temporal(edges)
Window Queries
now = int(time.time())
# All calls from auth in the last 5 minutes
edges = engine.get_edges_in_window(
source="service:auth",
predicate="CALLS_AT",
start=now - 300,
end=now,
)
# [{"s": "service:auth", "p": "CALLS_AT", "o": "service:payment", "ts": 1712000042, "w": 38.2}, ...]
# Edge velocity — call count in last N seconds (reads pre-aggregated bucket, O(1))
velocity = engine.get_edge_velocity("service:auth", window_seconds=300)
# 847
# Burst detection — which nodes exceeded threshold in last N seconds
bursts = engine.find_burst_nodes(predicate="CALLS_AT", window_seconds=60, threshold=500)
# [{"id": "service:auth", "velocity": 1243}, {"id": "service:checkout", "velocity": 731}]
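The O(results) behaviour of window queries comes from keeping edges ordered by timestamp, so a range scan touches only the matching entries. The following is a minimal pure-Python sketch of that idea, not the library's implementation: ToyTemporalIndex and its methods are hypothetical names, and a sorted list with bisect stands in for the ^KG("tout") B-tree.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ToyTemporalIndex:
    """Hypothetical in-memory stand-in for a time-ordered edge index."""

    def __init__(self):
        # (source, predicate) -> list of (ts, target, weight), kept sorted by ts
        self._rows = defaultdict(list)

    def insert(self, source, predicate, target, ts, weight=1.0):
        insort(self._rows[(source, predicate)], (ts, target, weight))

    def window(self, source, predicate, start, end):
        """Return edges with start <= ts <= end (integer timestamps assumed)."""
        rows = self._rows.get((source, predicate), [])
        lo = bisect_left(rows, (start,))    # first row with ts >= start
        hi = bisect_left(rows, (end + 1,))  # first row with ts > end
        return [
            {"s": source, "p": predicate, "o": o, "ts": ts, "w": w}
            for ts, o, w in rows[lo:hi]
        ]


index = ToyTemporalIndex()
for ts, latency in [(100, 12.0), (200, 38.2), (300, 41.0), (400, 9.5)]:
    index.insert("service:auth", "CALLS_AT", "service:payment", ts, latency)

hits = index.window("service:auth", "CALLS_AT", start=150, end=300)
print([edge["ts"] for edge in hits])  # [200, 300]
```

The two binary searches locate the window boundaries, and the slice between them is the result set, so the cost grows with the number of matching edges rather than the total stored.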
Pre-aggregated Analytics (O(1) per bucket)
now = int(time.time())
# Average latency for auth→payment calls in the last 5 minutes
avg_latency = engine.get_temporal_aggregate(
source="service:auth",
predicate="CALLS_AT",
metric="avg", # "count" | "sum" | "avg" | "min" | "max"
ts_start=now - 300,
ts_end=now,
)
# 41.3 (float, milliseconds)
# Count and extremes (min/max) over the same window
count = engine.get_temporal_aggregate("service:auth", "CALLS_AT", "count", now-300, now)
p_min = engine.get_temporal_aggregate("service:auth", "CALLS_AT", "min", now-300, now)
p_max = engine.get_temporal_aggregate("service:auth", "CALLS_AT", "max", now-300, now)
# GROUP BY source — all services, CALLS_AT, last 5 minutes
groups = engine.get_bucket_groups(predicate="CALLS_AT", ts_start=now-300, ts_end=now)
# [
# {"source": "service:auth", "predicate": "CALLS_AT", "count": 847, "avg": 41.3, "min": 2.1, "max": 312.0},
# {"source": "service:checkout", "predicate": "CALLS_AT", "count": 312, "avg": 28.7, "min": 1.4, "max": 189.0},
# ...
# ]
# COUNT DISTINCT targets — fanout detection (16-register HLL, ~26% error, good for threshold detection)
distinct_targets = engine.get_distinct_count("service:auth", "CALLS_AT", now-3600, now)
# 14 (distinct services called by auth in last hour)
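Pre-aggregation means each write folds the edge into a small per-bucket summary, so a query combines a handful of summaries instead of scanning edges. The sketch below models that under stated assumptions: ToyBucketAggregates is a hypothetical class, the 5-minute bucket size is taken from the global structure table, AVG is derived as sum/count, and partially covered buckets are counted whole. The library's actual boundary handling is not documented here.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute buckets (assumed from the ^KG("bucket") description)


class ToyBucketAggregates:
    """Hypothetical in-memory model of per-bucket pre-aggregation."""

    def __init__(self):
        # (bucket, source, predicate) -> running count/sum/min/max
        self._cells = defaultdict(
            lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
        )

    def record(self, source, predicate, ts, weight):
        """Fold one edge into its bucket at write time."""
        cell = self._cells[(ts // BUCKET_SECONDS, source, predicate)]
        cell["count"] += 1
        cell["sum"] += weight
        cell["min"] = min(cell["min"], weight)
        cell["max"] = max(cell["max"], weight)

    def aggregate(self, source, predicate, metric, ts_start, ts_end):
        """Combine whole buckets overlapping [ts_start, ts_end]; cost is O(buckets)."""
        count, total = 0, 0.0
        low, high = float("inf"), float("-inf")
        first, last = ts_start // BUCKET_SECONDS, ts_end // BUCKET_SECONDS
        for bucket in range(first, last + 1):
            cell = self._cells.get((bucket, source, predicate))
            if cell is None:
                continue
            count += cell["count"]
            total += cell["sum"]
            low, high = min(low, cell["min"]), max(high, cell["max"])
        if count == 0:
            return None
        values = {"count": count, "sum": total, "avg": total / count, "min": low, "max": high}
        return values[metric]


agg = ToyBucketAggregates()
agg.record("service:auth", "CALLS_AT", ts=0, weight=10.0)
agg.record("service:auth", "CALLS_AT", ts=100, weight=30.0)
agg.record("service:auth", "CALLS_AT", ts=400, weight=50.0)

avg_first_bucket = agg.aggregate("service:auth", "CALLS_AT", "avg", 0, 299)     # 20.0
count_two_buckets = agg.aggregate("service:auth", "CALLS_AT", "count", 0, 599)  # 3
```

A single-bucket query reads one cell, and a 24-hour window reads 288 cells regardless of how many edges were ingested, which is the behaviour the "O(buckets), not O(edges)" note in the performance table describes.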
Rich Edge Properties
# Attach arbitrary attributes to any temporal edge
engine.create_edge_temporal(
source="service:auth",
predicate="CALLS_AT",
target="service:payment",
timestamp=1712000000,
weight=42.7,
attrs={"trace_id": "abc123", "status": 200, "region": "us-east-1"},
)
# Retrieve attributes
attrs = engine.get_edge_attrs(
ts=1712000000,
source="service:auth",
predicate="CALLS_AT",
target="service:payment",
)
# {"trace_id": "abc123", "status": 200, "region": "us-east-1"}
NDJSON Import / Export
# Export temporal edges for a time window
engine.export_temporal_edges_ndjson(
path="traces_2026-04-01.ndjson",
start=1775001600,  # 2026-04-01 00:00:00 UTC
end=1775088000,    # 2026-04-02 00:00:00 UTC
)
# Import — resume an ingest from a file
engine.import_graph_ndjson("traces_2026-04-01.ndjson")
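NDJSON stores one JSON object per line, which is what makes streaming export and resumable import practical. The sketch below shows the format with the standard library only. The record shape (s/p/o/ts/w keys) is an assumption borrowed from the bulk ingest example; the exact fields written by export_temporal_edges_ndjson are not specified in this README, and write_ndjson / read_ndjson are hypothetical helpers.

```python
import io
import json


def write_ndjson(edges, fh):
    """Write one compact JSON object per line."""
    for edge in edges:
        fh.write(json.dumps(edge, separators=(",", ":")) + "\n")


def read_ndjson(fh):
    """Yield one dict per non-empty line, so large files can be streamed."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


edges = [
    {"s": "service:auth", "p": "CALLS_AT", "o": "service:payment", "ts": 1712000000, "w": 42.7},
    {"s": "service:payment", "p": "CALLS_AT", "o": "db:postgres", "ts": 1712000001, "w": 8.1},
]

buffer = io.StringIO()  # stands in for a file opened on disk
write_ndjson(edges, buffer)
buffer.seek(0)
restored = list(read_ndjson(buffer))
```

Because each line parses independently, a reader can stop and resume at any line boundary without re-reading the file.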
ObjectScript Direct
// Ingest
Do ##class(Graph.KG.TemporalIndex).InsertEdge("svc:auth","CALLS_AT","svc:pay",ts,42.7,"")
// Bulk ingest (JSON array)
Set n = ##class(Graph.KG.TemporalIndex).BulkInsert(edgesJSON)
// Query window — returns JSON array
Set result = ##class(Graph.KG.TemporalIndex).QueryWindow("svc:auth","CALLS_AT",tsStart,tsEnd)
// Pre-aggregated average latency
Set avg = ##class(Graph.KG.TemporalIndex).GetAggregate("svc:auth","CALLS_AT","avg",tsStart,tsEnd)
// GROUP BY source
Set groups = ##class(Graph.KG.TemporalIndex).GetBucketGroups("CALLS_AT",tsStart,tsEnd)
// COUNT DISTINCT targets (HLL)
Set n = ##class(Graph.KG.TemporalIndex).GetDistinctCount("svc:auth","CALLS_AT",tsStart,tsEnd)
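The ~26% error quoted for the 16-register HLL is consistent with the usual HyperLogLog rule of thumb, where the relative standard error is roughly 1.04 / sqrt(m) for m registers. The short sketch below works through that arithmetic; the helper names are hypothetical and the formula is the general estimate, not something read from the library's code.

```python
import math


def hll_relative_error(registers: int) -> float:
    """Approximate relative standard error of HyperLogLog with `registers` registers."""
    return 1.04 / math.sqrt(registers)


def registers_for_error(target_error: float) -> int:
    """Smallest power-of-two register count whose estimated error is <= target_error."""
    needed = (1.04 / target_error) ** 2
    return 2 ** math.ceil(math.log2(needed))


err16 = hll_relative_error(16)              # 1.04 / 4 = 0.26
registers_2pct = registers_for_error(0.02)  # 4096
```

Sixteen registers keep the per-bucket sketch tiny, which suits threshold-style questions such as "did fanout jump from about 10 to about 100"; exact distinct counts would need far more registers or a different structure.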
Vector Search (VecIndex)
engine.vec_create_index("drugs", 384, "cosine")
engine.vec_insert("drugs", "metformin", embedding_vector)
engine.vec_build("drugs")
results = engine.vec_search("drugs", query_vector, k=5)
# [{"id": "metformin", "score": 0.95}, ...]
IVFFlat Vector Index
Inverted File with Flat quantization — Python k-means build, pure ObjectScript query. Tunable nprobe recall/speed tradeoff; nprobe=nlist gives exact results.
# Build: reads kg_NodeEmbeddings, runs MiniBatchKMeans, stores ^IVF globals
result = engine.ivf_build("kg_idx", nlist=256, metric="cosine")
# {"nlist": 256, "indexed": 10000, "dim": 768}
# Search: finds nprobe nearest centroids, scores their cells
results = engine.ivf_search("kg_idx", query_vector, k=10, nprobe=32)
# [("NCIT:C12345", 0.97), ("NCIT:C67890", 0.94), ...]
# Lifecycle
info = engine.ivf_info("kg_idx") # {"nlist":256,"dim":768,"indexed":10000,...}
engine.ivf_drop("kg_idx")
Cypher:
CALL ivg.ivf.search('kg_idx', $query_vec, 10, 32) YIELD node, score
RETURN node, score ORDER BY score DESC
Global storage: ^IVF(name, "cfg"|"centroid"|"list") — independent of ^KG, ^VecIdx, ^PLAID, ^BM25Idx.
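To make the nprobe tradeoff concrete, here is a small pure-Python sketch of inverted-file search. It is illustrative only: the centroids are fixed by hand rather than learned with k-means, and build_ivf / ivf_search are hypothetical functions, not the engine.ivf_* API.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its most similar centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in vectors.items():
        best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
        lists[best].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Score only the vectors stored in the nprobe cells closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: cosine(query, centroids[c]), reverse=True)
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    scored = [(vid, cosine(query, vectors[vid])) for vid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


vectors = {
    "a": (1.0, 0.0), "b": (0.9, 0.1),
    "c": (0.0, 1.0), "d": (0.1, 0.9),
    "e": (0.75, 0.65),  # sits near the cell boundary, assigned to the first cell
}
centroids = [(1.0, 0.0), (0.0, 1.0)]
lists = build_ivf(vectors, centroids)

query = (0.6, 0.8)
approx_top = ivf_search(query, vectors, centroids, lists, k=1, nprobe=1)[0][0]
exact_top = ivf_search(query, vectors, centroids, lists, k=1, nprobe=len(centroids))[0][0]
print(approx_top, exact_top)  # d e
```

With nprobe=1 the search never visits the cell holding "e" and returns the best vector of the probed cell instead; probing every cell (nprobe=nlist) scores all vectors and recovers the exact nearest neighbour at higher cost.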
PLAID Multi-Vector Search
# Build: Python K-means + ObjectScript inverted index
engine.plaid_build("colbert_idx", docs) # docs = [{"id": "x", "tokens": [[f1,...], ...]}, ...]
# Search: single server-side call, pure $vectorop
results = engine.plaid_search("colbert_idx", query_tokens, k=10)
# [{"id": "doc_3", "score": 0.94}, ...]
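The final stage of ColBERT-style retrieval, exact MaxSim, scores a document by matching each query token to its best document token and summing those maxima. The sketch below covers only that stage (no centroid scoring or candidate generation) and assumes unit-normalized token embeddings, so a dot product acts as cosine similarity; maxsim is a hypothetical helper, not the plaid_search implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


docs = [
    {"id": "doc_a", "tokens": [[1.0, 0.0], [0.0, 1.0]]},  # covers both query tokens
    {"id": "doc_b", "tokens": [[1.0, 0.0], [1.0, 0.0]]},  # covers only the first
]
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

ranked = sorted(
    ({"id": doc["id"], "score": maxsim(query_tokens, doc["tokens"])} for doc in docs),
    key=lambda row: row["score"],
    reverse=True,
)
print(ranked)  # [{'id': 'doc_a', 'score': 2.0}, {'id': 'doc_b', 'score': 1.0}]
```

MaxSim is quadratic in token counts per document, which is why a centroid-based candidate stage runs first to keep the number of documents scored exactly small.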
Cypher
Temporal edge filtering (v1.42.0+)
-- Filter edges by timestamp — routes to ^KG("tout") B-tree, O(results)
MATCH (a)-[r:CALLS_AT]->(b)
WHERE r.ts >= $start AND r.ts <= $end
RETURN r.ts, r.weight
ORDER BY r.ts DESC
-- Temporal + property filter
MATCH (a:Service)-[r:CALLS_AT]->(b)
WHERE r.ts >= $start AND r.ts <= $end
AND r.weight > 1000
RETURN a.id, b.id, r.ts, r.weight
ORDER BY r.weight DESC
-- Inbound direction — routes to ^KG("tin")
MATCH (b:Service)<-[r:CALLS_AT]-(a)
WHERE r.ts >= $start AND r.ts <= $end
RETURN a.id, b.id, r.ts
Sweet spot: Temporal Cypher is designed for trajectory-style queries (≤~50 edges, ordered output). For aggregation over large windows, use get_temporal_aggregate() / get_bucket_groups(); these are O(1) pre-aggregated and 400× faster.
-- Named paths
MATCH p = (a:Service)-[r:CALLS]->(b:Service)
WHERE a.id = 'auth'
RETURN p, length(p), nodes(p), relationships(p)
-- Variable-length paths
MATCH (a:Service)-[:CALLS*1..3]->(b:Service)
WHERE a.id = 'auth'
RETURN b.id
-- Shortest path between two nodes (v1.49.0+)
MATCH p = shortestPath((a {id: $from})-[*..8]-(b {id: $to}))
RETURN p, length(p), nodes(p), relationships(p)
-- All shortest paths — returns every minimum-length path
MATCH p = allShortestPaths((a {id: $from})-[*..8]-(b {id: $to}))
RETURN p
-- CASE WHEN
MATCH (n:Service)
RETURN n.id,
CASE WHEN n.calls > 1000 THEN 'high' WHEN n.calls > 100 THEN 'medium' ELSE 'low' END AS load
-- UNION
MATCH (n:ServiceA) RETURN n.id
UNION
MATCH (n:ServiceB) RETURN n.id
-- Vector search in Cypher
CALL ivg.vector.search('Service', 'embedding', [0.1, 0.2, ...], 5) YIELD node, score
RETURN node, score
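The changelog describes allShortestPaths() as a BFS with multi-parent backtracking. A minimal Python sketch of that technique follows, assuming an undirected adjacency dict with unique neighbour lists; all_shortest_paths is a hypothetical function written for illustration, not a port of Graph.KG.Traversal.ShortestPathJson.

```python
from collections import deque


def all_shortest_paths(adj, start, goal, max_hops=8):
    """Return every minimum-length path from start to goal within max_hops."""
    if start == goal:
        return [[start]]
    dist = {start: 0}
    parents = {start: []}  # node -> every predecessor lying on some shortest path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if goal in dist and dist[node] >= dist[goal]:
            break  # everything at the goal's depth is already recorded
        if dist[node] >= max_hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # second route of equal length
    if goal not in dist:
        return []

    paths = []

    def walk(node, suffix):
        # Backtrack from goal to start through every recorded parent.
        if node == start:
            paths.append([start] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(goal, [])
    return paths


# Diamond graph: a-b-d and a-c-d are both minimum-length routes.
diamond = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
paths = all_shortest_paths(diamond, "a", "d")
print(sorted(paths))  # [['a', 'b', 'd'], ['a', 'c', 'd']]
```

Recording every equal-depth parent instead of only the first is what distinguishes all-paths search from a plain shortestPath(), which can stop at the first route found.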
Graph Analytics
from iris_vector_graph.operators import IRISGraphOperators
ops = IRISGraphOperators(conn)
# Personalized PageRank
scores = ops.kg_PAGERANK(seed_entities=["service:auth"], damping=0.85)
# K-hop subgraph
subgraph = ops.kg_SUBGRAPH(seed_ids=["service:auth"], k_hops=3)
# PPR-guided subgraph (prevents k^n blowup)
guided = ops.kg_PPR_GUIDED_SUBGRAPH(seed_ids=["service:auth"], top_k=50, max_hops=5)
# Community detection
communities = ops.kg_CDLP()
components = ops.kg_WCC()
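Personalized PageRank differs from global PageRank in that the random walk restarts at the seed nodes, so scores measure proximity to those seeds. The sketch below is a textbook power iteration in plain Python, offered as an illustration rather than the kg_PAGERANK implementation; the function name, the fixed iteration count, and the choice to return dangling-node mass to the seeds are all assumptions.

```python
def personalized_pagerank(adj, seeds, damping=0.85, iterations=50):
    """Power-iteration PPR over a directed adjacency dict {node: [targets]}."""
    nodes = set(adj) | {t for targets in adj.values() for t in targets} | set(seeds)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            targets = adj.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its mass back to the seeds (assumed policy).
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank


graph = {
    "service:auth": ["service:payment", "service:cache"],
    "service:payment": ["db:postgres"],
    "service:cache": [],
    "db:postgres": [],
    "service:billing": ["db:postgres"],  # not reachable from the seed
}
scores = personalized_pagerank(graph, seeds=["service:auth"], damping=0.85)
top = max(scores, key=scores.get)
```

Nodes that cannot be reached from the seed, such as service:billing here, keep a score of zero, which is what lets a PPR-guided subgraph prune irrelevant neighbourhoods instead of expanding every hop.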
FHIR Bridge
# Load ICD-10→MeSH mappings from UMLS MRCONSO
# python scripts/ingest/load_umls_bridges.py --mrconso /path/to/MRCONSO.RRF
anchors = engine.get_kg_anchors(icd_codes=["J18.0", "E11.9"])
# → ["MeSH:D001996", "MeSH:D003924"] (filtered to nodes in KG)
Architecture
Global Structure
| Global | Purpose |
|---|---|
| ^KG("out", s, p, o) | Knowledge graph: outbound edges |
| ^KG("in", o, p, s) | Knowledge graph: inbound edges |
| ^KG("tout", ts, s, p, o) | Temporal index: outbound, ordered by timestamp |
| ^KG("tin", ts, o, p, s) | Temporal index: inbound, ordered by timestamp |
| ^KG("bucket", bucket, s) | Pre-aggregated edge count per 5-minute bucket |
| ^KG("tagg", bucket, s, p, key) | Pre-aggregated COUNT/SUM/MIN/MAX/HLL per bucket |
| ^KG("edgeprop", ts, s, p, o, key) | Rich edge attributes |
| ^NKG | Integer-encoded ^KG for Arno acceleration |
| ^VecIdx | VecIndex RP-tree ANN |
| ^PLAID | PLAID multi-vector |
| ^BM25Idx | BM25 lexical search index |
Schema (Graph_KG)
| Table | Purpose |
|---|---|
| nodes | Node registry (node_id PK) |
| rdf_edges | Edges (s, p, o_id) |
| rdf_labels | Node labels (s, label) |
| rdf_props | Node properties (s, key, val) |
| kg_NodeEmbeddings | HNSW vector index (id, emb VECTOR) |
| fhir_bridges | ICD-10→MeSH clinical code mappings |
ObjectScript Classes
| Class | Key Methods |
|---|---|
| Graph.KG.TemporalIndex | InsertEdge, BulkInsert, QueryWindow, GetVelocity, FindBursts, GetAggregate, GetBucketGroups, GetDistinctCount, Purge |
| Graph.KG.VecIndex | Create, InsertJSON, Build, SearchJSON, SearchMultiJSON, InsertBatchJSON |
| Graph.KG.PLAIDSearch | StoreCentroids, BuildInvertedIndex, Search |
| Graph.KG.PageRank | RunJson, PageRankGlobalJson |
| Graph.KG.Algorithms | WCCJson, CDLPJson |
| Graph.KG.Subgraph | SubgraphJson, PPRGuidedJson |
| Graph.KG.Traversal | BuildKG, BuildNKG, BFSFastJson, ShortestPathJson |
| Graph.KG.BulkLoader | BulkLoad (INSERT %NOINDEX %NOCHECK + %BuildIndices) |
| Graph.KG.BM25Index | Build, Search, Insert, Drop, Info, SearchProc (kg_BM25 stored procedure) |
| Graph.KG.IVFIndex | Build, Search, Drop, Info, SearchProc (kg_IVF stored procedure) |
| Graph.KG.EdgeScan | MatchEdges (Graph_KG.MatchEdges stored procedure), WriteAdjacency, DeleteAdjacency |
Performance
| Operation | Latency / throughput | Dataset / notes |
|---|---|---|
| Temporal edge ingest | 134K edges/sec | RE2-TT 535M edges, Enterprise IRIS |
| Window query (selective) | 0.1ms | O(results), B-tree traversal |
| GetAggregate (1 bucket, 5min) | 0.085ms | 50K-edge dataset |
| GetAggregate (288 buckets, 24hr) | 0.160ms | O(buckets), not O(edges) |
| GetBucketGroups (3 sources, 1hr) | 0.193ms | |
| GetDistinctCount (1 bucket) | 0.101ms | 16-register HLL |
| VecIndex search (1K vecs, 128-dim) | 4ms | RP-tree + $vectorop SIMD |
| HNSW search (143K vecs, 768-dim) | 1.7ms | Native IRIS VECTOR index |
| PLAID search (500 docs, 4 tokens) | ~14ms | Centroid scoring + MaxSim |
| BM25Index search (174 nodes, 3-term) | 0.3ms | Pure ObjectScript $Order posting-list |
| PPR (10K nodes) | 62ms | Pure ObjectScript |
| 1-hop neighbors | 0.3ms | $Order on ^KG |
Documentation
- Python SDK Reference
- Architecture
- Schema Reference
- Temporal Graph Full Spec
- Setup Guide
- Testing Policy
Changelog
v1.55.2 (2026-04-19)
- fix: Bug 6 (final) — SQLCODE -400 on rdf_edges index creation now falls back to ALTER TABLE ADD INDEX; all standard indexes created even when Graph.KG.Edge class was never compiled
v1.55.1 (2026-04-19)
- fix: Graph.KG.Edge/TestEdge persistent classes excluded from ObjectScript deploy (fix DDL table ownership conflict — Bug 6)
- fix: conftest removes conflicting .cls before LoadDir
- fix: apoc.meta.data() samples all nodes per label via JOIN on rdf_labels (no longer skips labels with no first-node properties)
v1.55.0 (2026-04-19)
- feat: import_rdf/bulk_create_edges/create_edge_temporal/bulk_create_edges_temporal all accept graph= parameter
- feat: USE GRAPH filtering now strict (exact graph_id match, no NULL leakage)
- feat: UNIQUE constraint updated to (s,p,o_id,graph_id) allowing same triple in multiple named graphs
- feat: db.schema.relTypeProperties() returns actual relationship property names
- fix: import_rdf _ensure_node uses WHERE NOT EXISTS (no duplicate key errors)
- fix: import_rdf edge INSERT scoped to graph_id in WHERE NOT EXISTS check
- fix: graph_id column uses %EXACT for case-sensitive storage
- test: 8 E2E tests proving fail-before/pass-after for all 5 FRs (spec 061)
v1.54.1 (2026-04-18)
- fix: initialize_schema() idempotent — "already has index" suppressed (Bug 1)
- fix: idx_props_val_ifind (iFind) and idx_edges_confidence (JSON_VALUE) now optional — graceful skip on Community (Bugs 2+3)
- test: 6 new E2E schema init tests covering idempotency, required tables, optional indexes, core procedures (spec 060)
v1.54.0 (2026-04-18)
- fix: materialize_inference respects named graphs — inferred triples use correct graph_id (spec 055)
- fix: materialize_inference/retract_inference accept graph= parameter
- feat: Cypher % (modulo → MOD) and ^ (power → POWER) operators (spec 056)
- feat: FOREACH clause: FOREACH (x IN list | update_clause) (spec 057)
- fix: EXISTS { (n)-[r]->(m) } with edge patterns now works; MATCH keyword optional inside EXISTS (spec 058)
- feat: Pattern comprehension [(a)-[r]->(b) | proj] collecting edge projections (spec 059)
v1.53.1 (2026-04-18)
- feat: engine.materialize_inference(rules="rdfs"|"owl"): transitive subClassOf/subPropertyOf closure, rdf:type inheritance, domain/range, OWL equivalentClass/inverseOf/TransitiveProperty/SymmetricProperty
- feat: engine.retract_inference(): removes all inferred triples, restoring asserted-only graph
- feat: import_rdf(path, infer="rdfs"): runs inference automatically after load
- Inferred triples tagged qualifiers={"inferred":true} for easy exclusion
v1.53.0 (2026-04-18)
- feat: Named graphs: create_edge(graph='name'), list_graphs(), drop_graph(name)
- feat: USE GRAPH 'name' MATCH (a)-[r]->(b) Cypher syntax adds graph_id filter
- feat: Schema migration: graph_id column added to rdf_edges (idempotent, run on initialize_schema)
v1.52.1 (2026-04-18)
- feat: engine.import_rdf(path): load Turtle (.ttl), N-Triples (.nt), N-Quads (.nq) into the graph
- Format auto-detected from extension; streaming batch ingest; blank node synthetic IDs; language tags preserved
v1.52.0 (2026-04-18)
- feat: ALL/ANY/NONE/SINGLE(x IN list WHERE ...) list predicate expressions
- feat: [x IN list WHERE pred | proj] list comprehensions
- feat: reduce(acc = init, x IN list | body) reduce expressions
- feat: filter() / extract() legacy list functions as aliases
- feat: Arithmetic operators +, -, *, / in Cypher expressions
v1.51.1 (2026-04-18)
- feat: apoc.meta.data() returns proper schema columns; LangChain Neo4jGraph() connects without error
- feat: apoc.meta.schema() returns schema summary
v1.51.0 (2026-04-18)
- feat: keys(n) returns node property keys via rdf_props subquery
- feat: range(start, end) and range(start, end, step) generate integer lists
- feat: size(list) uses JSON_ARRAYLENGTH; head(), last(), tail(), isEmpty() implemented
v1.50.3 (2026-04-18)
- Fix: initialize_schema() creates SQLUser.* views automatically; no more manual DEFAULT_SCHEMA workaround
- Fix: initialize_schema() detects pre-compiled ObjectScript classes via %Dictionary; fast 0.2ms PPR path activates correctly instead of falling back to 1800ms Python path
v1.50.2 (2026-04-18)
- Fix: MATCH (a)-[r]->(b) with unbound source falls back to rdf_edges SQL (avoids IRIS SqlProc 32KB string limit for large graphs with 88K+ edges)
- MatchEdges is now only used when source node ID is bound (safe path for single-node traversal)
v1.50.1 (2026-04-18)
- Fix: bulk_create_edges now calls BuildKG() after batch SQL; bulk-inserted static edges immediately visible to MATCH/BFS
- Fix: BuildKG() already uses shard-0 ^KG("out",0,...) layout (confirmed, no code change needed)
v1.50.0 (2026-04-18)
- Unified edge store PR-A: MATCH (a)-[r]->(b) now returns both static and temporal edges (spec 048)
- Graph.KG.EdgeScan: MatchEdges(sourceId, predicate, shard) SqlProc scans ^KG("out",0,...) globals
- create_edge writes ^KG synchronously; delete_edge (new) kills ^KG entry synchronously
- Cypher MATCH (a)-[r]->(b) routes to MatchEdges CTE; no SQL JOIN on rdf_edges
- TemporalIndex and all traversal code updated to shard-0 layout
- IVF index fixes: $vector("double"), JSON float arrays, leading-zero scores, VECTOR(DOUBLE) schema
- Parser: negative float literals in list expressions now work
v1.49.0 (2026-04-18)
- shortestPath() / allShortestPaths() openCypher syntax: fixes parse error reported by mindwalk (spec 047)
- MATCH p = shortestPath((a {id:$from})-[*..8]-(b {id:$to})) RETURN p now works end-to-end
- RETURN p → JSON {"nodes":[...],"rels":[...],"length":N}; RETURN length(p), nodes(p), relationships(p) all supported
- allShortestPaths(...) returns all minimum-length paths (diamond graphs return both paths)
- Graph.KG.Traversal.ShortestPathJson: pure ObjectScript BFS with multi-parent backtracking for all-paths support
- Parser fix: [*..N] (dot-dot without leading integer) now parses correctly
- Parser fix: bare -- undirected relationship pattern now parses correctly
- Translator/engine fix: CREATE without RETURN clause no longer throws UnboundLocalError
v1.48.0 (2026-04-18)
- IVFFlat vector index: Graph.KG.IVFIndex ObjectScript class + ^IVF globals (spec 046)
- ivf_build(name, nlist, metric, batch_size): Python MiniBatchKMeans build from kg_NodeEmbeddings; stores centroids + inverted lists as $vector in ^IVF globals
- ivf_search(name, query, k, nprobe): pure ObjectScript centroid scoring → cell scan → top-k; nprobe=nlist gives exact search
- ivf_drop(name) / ivf_info(name): lifecycle management
- Graph_KG.kg_IVF SQL stored procedure: enables JSON_TABLE CTE pattern
- Cypher CALL ivg.ivf.search(name, query_vec, k, nprobe) YIELD node, score
- Translator fix: ORDER BY <alias> DESC now resolves SELECT-level aliases (e.g. count(r) AS deg) without Undefined error
- cypher_api.py: Bolt TCP/WS sessions use dedicated IRIS connections (_make_engine) to prevent connection contention with HTTP handlers; threading.Lock on shared engine cache
- test_bolt_server.py: fixed 2 TestBoltSessionHello tests using deprecated asyncio.get_event_loop().run_until_complete() → asyncio.run()
v1.47.0 (2026-04-10)
- Bolt 5.4 protocol server: TCP (port 7687) + WebSocket (port 8000). Standard graph drivers (Python, Java, Go, .NET), LangChain, and visualization tools connect via bolt://
- Graph browser: bundled at /browser/ with force-directed visualization, schema sidebar, :sysinfo
- Cypher HTTP API: /api/cypher + Bolt-compatible transactional endpoints. API key auth via X-API-Key
- System procedures: db.labels(), db.relationshipTypes(), db.schema.visualization(), dbms.queryJmx(), SHOW DATABASES/PROCEDURES/FUNCTIONS
- Graph object encoding: RETURN n, r, m produces typed Node/Relationship structures for visualization
- SQL audit: FETCH FIRST → TOP, DISTINCT TOP order, IN clause chunking at 499
- Translator fixes: anonymous nodes, BM25 CTE literals, var-length min-hop, UNION ALL with LIMIT
- Embedding fixes: probe false negative, string model loading
- scripts/load_demo_data.py: canonical dataset loader (NCIT + HLA immunology + embeddings + BM25)
- 456 tests, 0 skipped
v1.46.0 (2026-04-07)
- BM25Index: pure ObjectScript Okapi BM25 lexical search over ^BM25Idx globals. Zero SQL tables, no Enterprise license required.
- Graph.KG.BM25Index.Build(name, propsCSV): indexes all graph nodes by specified text properties; returns {"indexed":N,"avgdl":F,"vocab_size":V}
- Graph.KG.BM25Index.Search(name, query, k): Robertson BM25 scoring via $Order posting-list traversal; returns JSON [{"id":nodeId,"score":S},...]
- Graph.KG.BM25Index.Insert(name, docId, text): incremental document add/replace; updates IDF only for new document's terms (O(doc_length))
- Graph.KG.BM25Index.Drop(name): O(1) Kill of full index
- Graph.KG.BM25Index.Info(name): returns {"N":N,"avgdl":F,"vocab_size":V} or {} if not found
- Python wrappers: engine.bm25_build(), bm25_search(), bm25_insert(), bm25_drop(), bm25_info()
- kg_TXT automatic upgrade: _kg_TXT_fallback detects a "default" BM25 index and routes through BM25 instead of LIKE-based fallback
- Cypher CALL ivg.bm25.search(name, $query, k) YIELD node, score: Stage CTE using Graph_KG.kg_BM25 SQL stored procedure
- Translator fix: BM25 and PPR CTEs now use own column names in RETURN clause (BM25.node not BM25.node_id)
- SC-002 benchmark: 0.3ms median search on 174-node community IRIS instance
v1.45.3 (2026-04-04)
- translate_relationship_pattern: inline property filters on relationship nodes were silently dropped; MATCH (t)-[:R]->(c {id: 'x'}) returned all nodes instead of filtering. Fixed by applying source_node.properties and target_node.properties after JOIN construction.
- vector_search: TO_VECTOR(?, DOUBLE, {dim}) now includes explicit dimension in query cast, resolving type mismatch on IRIS 2025.1 when column dimension is known
- 2 regression tests added (375 unit tests total)
v1.45.2 (2026-04-03)
- embedded.py: auto-fixes sys.path shadowing; ensures /usr/irissys/lib/python is first so the embedded iris module takes priority over pip-installed intersystems_irispython
- embedded.py: clear error message when shadowed iris (no iris.sql) is detected, naming the root cause
- Documented the XD timeout constraint and embed_daemon pattern for long-running ML operations in embedded context
- 3 new tests covering path-fix and shadowing detection
v1.45.1 (2026-04-03)
- embed_nodes: FK-safe delete. DELETE failure on kg_NodeEmbeddings (spurious FK error in embedded Python context) is silently ignored; INSERT proceeds correctly
- vector_search: uses VECTOR_COSINE(TO_VECTOR(col), ...) so it works on both native VECTOR columns AND VARCHAR-stored vectors (e.g. DocChunk.VectorChunk from fhir-017)
v1.45.0 (2026-04-03)
- embed_nodes(model, where, text_fn, batch_size, force, progress_callback): incremental node embedding over Graph_KG.nodes with SQL WHERE filter, custom text builder, and per-call model override. Unblocks mixed-ontology graphs (embed only KG8 nodes without re-embedding NCIT's 200K nodes).
- vector_search(table, vector_col, query_embedding, top_k, id_col, return_cols, score_threshold): search any IRIS VECTOR column, not just kg_NodeEmbeddings. Works on DocChunk tables, RAG corpora, custom HNSW indexes.
- multi_vector_search(sources, query_embedding, top_k, fusion='rrf'): unified search across multiple IRIS VECTOR tables with RRF fusion. Returns source_table per result. Powers hybrid KG+FHIR document search.
- validate_vector_table(table, vector_col): returns {dimension, row_count} for any IRIS VECTOR column.
v1.44.0 (2026-04-03)
- SQL Table Bridge — map existing IRIS SQL tables as virtual graph nodes/edges with zero data copy
- engine.map_sql_table(table, id_column, label): register any IRIS table as a Cypher-queryable node set; no ETL, no data movement
- engine.map_sql_relationship(source, predicate, target, target_fk=None, via_table=None): FK and M:M join relationships traversable via Cypher
- engine.attach_embeddings_to_table(label, text_columns, force=False): overlay HNSW vector search on existing table rows
- engine.list_table_mappings(), remove_table_mapping(), reload_table_mappings(): mapping lifecycle management
- Cypher MATCH (n:MappedLabel) routes to registered SQL table with WHERE pushdown: O(SQL query), not O(copy)
- Mixed queries: MATCH (p:MappedPatient)-[:HAS_DOC]->(d:NativeDocument) spans both mapped and native nodes seamlessly
- SQL mapping wins over native Graph_KG.nodes rows for the same label (FR-016)
- TableNotMappedError raised with helpful message when attach_embeddings_to_table is called on unregistered label
v1.43.0 (2026-04-03)
- EmbeddedConnection and EmbeddedCursor now importable directly from iris_vector_graph (top-level)
- IRISGraphEngine(iris.sql): accepts iris.sql module directly; auto-wraps in EmbeddedConnection (no manual wrapper needed inside IRIS Language=python methods)
- load_obo(encoding=, encoding_errors='replace'): handles UTF-8 BOM and Latin-1 bytes from IRIS-written files; fixes NCIT.obo loading edge case
- load_obo / load_networkx accept progress_callback=lambda n_nodes, n_edges: ... (called every 10K items); enables progress reporting for large ontologies (NCIT.obo: 200K+ concepts)
- Verified: temporal Cypher (WHERE r.ts >= $start AND r.ts <= $end) works end-to-end via EmbeddedConnection path
v1.42.0 (2026-04-03)
- Cypher temporal edge filtering: WHERE r.ts >= $start AND r.ts <= $end routes MATCH patterns to ^KG("tout") B-tree; O(results), not O(total edges)
- r.ts and r.weight accessible in RETURN and ORDER BY on temporal edges
- Inbound direction (b)<-[r:P]-(a) WHERE r.ts >= $start routes to ^KG("tin")
- r.ts without WHERE filter → NULL + query-level warning (prevents accidental full scans)
- r.weight > expr in WHERE applies as post-filter on temporal result set
- Uses IRIS-compatible derived table subquery (not WITH CTE); works on protocol 65 xDBC
- w → weight canonical field name in temporal CTE (consistent with v1.41.0 API aliases)
- Sweet spot: trajectory queries ≤50 edges. For aggregation, use get_temporal_aggregate().
v1.41.0 (2026-04-03)
- get_edges_in_window() now returns source/target/predicate/timestamp/weight aliases alongside s/o/p/ts/w (backward compatible)
- get_edges_in_window(direction="in"): query inbound edges by target node (uses ^KG("tin"))
- create_edge_temporal(..., upsert=True) and bulk_create_edges_temporal(..., upsert=True): skip write if edge already exists at that timestamp
- purge_before(ts): delete all temporal edges older than ts, with ^KG("tagg") and ^KG("bucket") cleanup
- Graph.KG.TemporalIndex.PurgeBefore(ts) and QueryWindowInbound(target, predicate, ts_start, ts_end) ObjectScript methods
v1.40.0 (2026-04-02)
- iris_vector_graph.embedded.EmbeddedConnection: dbapi2 adapter for IRIS Language=python methods
- Zero-boilerplate: IRISGraphEngine(EmbeddedConnection()) works inside IRIS identically to external iris.connect()
- commit() / rollback() are intentional no-ops (IRIS manages transactions in embedded context)
- START TRANSACTION / COMMIT / ROLLBACK via cursor.execute() silently dropped (avoids <COMMAND> in wgproto jobs)
- fetchmany(), rowcount, description fully implemented
v1.39.0 (2026-04-01)
- Pre-aggregated temporal analytics: ^KG("tagg") COUNT/SUM/AVG/MIN/MAX at O(1)
- GetAggregate, GetBucketGroups, GetDistinctCount ObjectScript methods
- get_temporal_aggregate(), get_bucket_groups(), get_distinct_count() Python wrappers
- 16-register HyperLogLog COUNT DISTINCT (SHA1, ~26% error; suitable for fanout threshold detection)
- Benchmark: 134K–157K edges/sec sustained across RE2-TT/RE2-OB/RE1-TT (535M edges total)
v1.38.0
- Rich edge properties: ^KG("edgeprop", ts, s, p, o, key), arbitrary typed attributes per temporal edge
- get_edge_attrs(), create_edge_temporal(attrs={...})
- NDJSON import/export: import_graph_ndjson(), export_graph_ndjson(), export_temporal_edges_ndjson()
v1.37.0
- Temporal property graph: create_edge_temporal(), bulk_create_edges_temporal()
- get_edges_in_window(), get_edge_velocity(), find_burst_nodes()
- ^KG("tout"/"tin"/"bucket") globals: bidirectional time-indexed edge store
- Graph.KG.TemporalIndex ObjectScript class
v1.35.0
- UNION / UNION ALL in Cypher
- EXISTS {} subquery predicates
v1.34.0
- Variable-length paths: MATCH (a)-[:REL*1..5]->(b) via BFSFastJson bridge
v1.33.0
- CASE WHEN / THEN / ELSE / END in Cypher RETURN and WHERE
v1.32.0
- CAST functions: toInteger(), toFloat(), toString(), toBoolean()
v1.31.0
- RDF 1.2 reification API: reify_edge(), get_reifications(), delete_reification()
v1.30.0
- BulkLoader: INSERT %NOINDEX %NOCHECK + %BuildIndices (46K rows/sec SQL ingest)
- RDF 1.2 reification schema DDL
v1.29.0
- OBO ontology ingest: load_obo(), load_networkx()
v1.28.0
- Lightweight install: base requires only intersystems-irispython
- Optional extras: [full], [plaid], [dev], [ml], [visualization], [biodata]
v1.26.0–v1.27.0
- PLAID multi-vector retrieval: PLAIDSearch.cls pure ObjectScript + $vectorop
- PLAID packed token storage: 53 $Order → 1 $Get
v1.24.0–v1.25.1
- VecIndex nprobe recall fix (counts leaf visits, not branch points)
- Annoy-style two-means tree splitting (fixes degenerate trees)
- Batch APIs: SearchMultiJSON, InsertBatchJSON
v1.21.0–v1.22.1
- VecIndex RP-tree ANN
- SearchJSON / InsertJSON: eliminated xecute path (250ms → 4ms)
v1.20.0
- Arno acceleration wrappers: khop(), ppr(), random_walk()
v1.19.0
- ^NKG integer index for Arno acceleration
v1.18.0
- FHIR-to-KG bridge: fhir_bridges table, get_kg_anchors(), UMLS MRCONSO ingest
v1.17.0
- Cypher named path bindings, CALL subqueries, PPR-guided subgraph
Earlier versions →
License: MIT | Author: Thomas Dyar (thomas.dyar@intersystems.com)