sandx-graph

Graph intelligence engine — knowledge graph construction, neighborhood consensus, semantic linkage.

Python 3.10+ | License: Apache 2.0

Part of the SandX Lab computational infrastructure ecosystem.


What It Does

sandx-graph is the graph reasoning layer that operates downstream of sandx-er. It constructs knowledge graphs from resolved entity clusters and computes neighborhood consensus — a measure of how strongly each node's local neighborhood agrees.

sandx-er clusters  →  GraphBuilder  →  KnowledgeGraph  →  ConsensusEngine  →  consensus scores

Status

v0.1 — Working

| Component | Status |
| --- | --- |
| GraphBuilder: construct graphs from clusters, DataFrames, similarity matrices | Working |
| KnowledgeGraph: undirected weighted graph with adjacency traversal | Working |
| ConsensusEngine: BFS neighborhood consensus computation | Working |
| NetworkX export | Working (optional dep) |
| PyPI package | Published (0.1.0) |

Installation

pip install sandx-graph

Or from source:

git clone https://github.com/sandxlab/sandx-graph
cd sandx-graph
pip install -e ".[dev]"

For NetworkX export:

pip install "sandx-graph[networkx]"

Quick Start

From sandx-er resolution output

import numpy as np
import pandas as pd
from sandx_er import EntityResolver
from sandx_graph import GraphBuilder

# Resolve records into entity clusters
records = pd.DataFrame({
    "name": ["Acme Corp", "Acme Corp.", "GlobalTech Inc", "Global Tech"],
    "city": ["Boston", "Boston", "New York", "New York"],
})
er = EntityResolver(blocking="lsh", similarity="jaccard", threshold=0.4)
result = er.resolve(records)

# Build knowledge graph from resolved clusters (one node per cluster, no edges)
builder = GraphBuilder()
graph = builder.from_clusters(result.clusters)
print(graph)  # KnowledgeGraph(n_nodes=2, n_edges=0)

# Rebuild the graph with relationship edges from a pairwise similarity matrix.
# ids and sim must share the same order: sim[i][j] compares ids[i] with ids[j].
ids = [c.canonical_id for c in result.clusters]
sim = np.array([[1.0, 0.3], [0.3, 1.0]])
graph = builder.from_similarity_matrix(ids, sim, threshold=0.5)
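
To make the thresholding step concrete, here is a minimal standalone sketch of how a symmetric similarity matrix can be reduced to an undirected edge list. It does not call sandx-graph: the helper name `edges_above_threshold` is hypothetical, and keeping pairs with similarity at or above the threshold is an assumption, so the library's own comparison may differ.

```python
from typing import Sequence


def edges_above_threshold(
    ids: Sequence[str],
    sim: Sequence[Sequence[float]],
    threshold: float,
) -> list[tuple[str, str, float]]:
    """Return (source, target, weight) triples for pairs at or above threshold.

    Hypothetical helper, not part of sandx-graph. Only the upper triangle is
    scanned, so each undirected pair appears once and self-pairs are skipped.
    """
    if len(sim) != len(ids) or any(len(row) != len(ids) for row in sim):
        raise ValueError("similarity matrix must be square and match ids")
    edges = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            weight = float(sim[i][j])
            if weight >= threshold:  # assumed inclusive comparison
                edges.append((ids[i], ids[j], weight))
    return edges


ids = ["c0", "c1"]  # placeholder cluster IDs
sim = [[1.0, 0.3], [0.3, 1.0]]
low = edges_above_threshold(ids, sim, threshold=0.5)   # 0.3 is below 0.5
high = edges_above_threshold(ids, sim, threshold=0.2)  # 0.3 clears 0.2
print(low, high)
```

Under these assumptions, the Quick Start matrix with threshold=0.5 yields no edges, since the only off-diagonal similarity is 0.3. Nested lists are used here to keep the sketch dependency-free; a NumPy array can be indexed the same way.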

From DataFrames

import pandas as pd
from sandx_graph import GraphBuilder, ConsensusEngine

nodes_df = pd.DataFrame({"node_id": ["e1", "e2", "e3"], "label": ["Acme", "GlobalTech", "Initech"]})
edges_df = pd.DataFrame({"source": ["e1", "e2"], "target": ["e2", "e3"], "weight": [0.85, 0.62]})

builder = GraphBuilder()
graph = builder.from_dataframe(nodes_df, edges_df)

# Compute neighborhood consensus
engine = ConsensusEngine(graph)
score = engine.compute("e1", depth=2)
print(score)
# ConsensusScore(node='e1', score=0.735, support=2, conflict=0)

# Batch over all nodes
all_scores = engine.compute_all(depth=1)
stats = engine.summary(depth=1)
print(stats)
# {'mean': 0.735, 'median': 0.735, 'std': 0.115, 'min': 0.620, 'max': 0.850}
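
The summary figures above can be checked by hand. The sketch below uses only the standard library and assumes each depth=1 score is the plain mean of the edge weights incident to the node; that assumption is inferred from the printed output, not from the library's source. It also suggests that the reported std is the sample standard deviation (n - 1 in the denominator), since the population form gives a smaller value.

```python
import statistics

# Per-node scores at depth=1 for the e1-e2-e3 chain, assuming each score is
# the plain mean of the edge weights incident to that node.
scores = {
    "e1": 0.85,               # only edge e1-e2
    "e2": (0.85 + 0.62) / 2,  # edges e1-e2 and e2-e3
    "e3": 0.62,               # only edge e2-e3
}
values = list(scores.values())

stats = {
    "mean": round(statistics.mean(values), 3),
    "median": round(statistics.median(values), 3),
    "std": round(statistics.stdev(values), 3),  # sample std (n - 1)
    "min": min(values),
    "max": max(values),
}
# Population std (n in the denominator), shown only for comparison.
population_std = round(statistics.pstdev(values), 3)

print(stats)
print(population_std)
```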

Consensus Score

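ConsensusEngine runs a breadth-first search (BFS) from a node up to a given depth and collects the weight of every edge it encounters. The consensus score is the mean of those edge weights: in the Quick Start graph, compute("e1", depth=2) reaches the edges e1-e2 (0.85) and e2-e3 (0.62), giving (0.85 + 0.62) / 2 = 0.735.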

| Score | Interpretation |
| --- | --- |
| → 1.0 | Node connected to high-confidence, strongly agreeing neighbors |
| → 0.5 | Mixed neighborhood: some support, some conflict |
| → 0.0 | Weak or conflicting edges throughout the neighborhood |

Isolated nodes (degree 0) return score 1.0 by convention.
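
The computation described in this section can be sketched in plain Python. This is a stand-in written from the description above, not the library's implementation: the function name `consensus`, the dict-of-dicts adjacency format, and the choice to count each undirected edge once are all assumptions.

```python
from collections import deque


def consensus(adjacency, start, depth=2):
    """Mean weight of the edges reached by BFS within `depth` hops of `start`.

    Hypothetical stand-in for ConsensusEngine.compute; the real method may
    differ. `adjacency` maps node_id -> {neighbor_id: weight} for an
    undirected graph, with every edge stored in both directions.
    """
    seen_nodes = {start}
    seen_edges = {}  # frozenset({a, b}) -> weight, one entry per edge
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand past the depth limit
        for neighbor, weight in adjacency.get(node, {}).items():
            seen_edges[frozenset((node, neighbor))] = weight
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                queue.append((neighbor, dist + 1))
    if not seen_edges:
        return 1.0  # isolated node: score 1.0 by convention
    return sum(seen_edges.values()) / len(seen_edges)


# The e1-e2-e3 chain from the Quick Start, plus an isolated node e4.
adjacency = {
    "e1": {"e2": 0.85},
    "e2": {"e1": 0.85, "e3": 0.62},
    "e3": {"e2": 0.62},
    "e4": {},
}
for node in adjacency:
    print(node, round(consensus(adjacency, node, depth=2), 3))
```

Keying the collected edges by an unordered pair prevents an edge from being counted twice when BFS visits both of its endpoints, which would otherwise skew the mean toward edges near the start node.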

API Reference

GraphBuilder

| Method | Description |
| --- | --- |
| from_clusters(clusters) | One node per sandx-er EntityCluster; no edges |
| from_dataframe(nodes_df, edges_df, ...) | Build from node/edge DataFrames |
| from_similarity_matrix(ids, similarity, threshold) | Build from pairwise similarity matrix |

KnowledgeGraph

| Attribute / Method | Description |
| --- | --- |
| n_nodes, n_edges | Graph size |
| nodes | Dict of node_id → attribute dict |
| edges | List of (source, target, weight) triples |
| neighbors(node_id) | Adjacent node IDs |
| neighbors_weighted(node_id) | (neighbor_id, weight) pairs |
| degree(node_id) | Number of incident edges |
| has_node(node_id), has_edge(a, b) | Membership checks |
| to_dataframe() | Edge list as pandas DataFrame |
| to_networkx() | Export to NetworkX Graph |
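
For readers who want a mental model of the traversal methods listed above, the following is a minimal sketch of an undirected weighted graph built from (source, target, weight) triples. `TinyGraph` is a hypothetical illustration whose method names mirror this reference; it is not the sandx-graph class, and the real return types may differ.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical undirected weighted graph; not the sandx-graph class."""

    def __init__(self, edges):
        self.edges = list(edges)  # (source, target, weight) triples
        self._adj = defaultdict(dict)
        for source, target, weight in self.edges:
            # Store each edge in both directions so traversal is symmetric.
            self._adj[source][target] = weight
            self._adj[target][source] = weight

    def neighbors(self, node_id):
        return list(self._adj.get(node_id, {}))

    def neighbors_weighted(self, node_id):
        return list(self._adj.get(node_id, {}).items())

    def degree(self, node_id):
        return len(self._adj.get(node_id, {}))

    def has_edge(self, a, b):
        return b in self._adj.get(a, {})


g = TinyGraph([("e1", "e2", 0.85), ("e2", "e3", 0.62)])
print(g.neighbors("e2"), g.degree("e2"), g.has_edge("e1", "e3"))
```

Lookups go through `dict.get` rather than indexing the defaultdict directly, so querying an unknown node does not silently add it to the adjacency map.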

ConsensusEngine

| Method | Description |
| --- | --- |
| compute(node_id, depth=2) | Consensus score for one node |
| compute_all(depth=2) | Scores for all nodes |
| summary(depth=1) | Mean/median/std/min/max over all nodes |

Related

License

Apache 2.0 — see LICENSE
