6D structural intelligence for directed graphs. Six numbers per node. Sub-millisecond.

Maintainers

jmurray10

SemanticEmbed SDK

Structural intelligence for directed graphs. Six numbers per node. Sub-millisecond.

SemanticEmbed computes a 6-dimensional structural encoding for every node in a directed graph. From a bare edge list -- no runtime telemetry, no historical data, no tuning -- it produces six independent measurements that together describe each node's structural role.

Validated against production incidents. In a blind test against a live production environment (100+ services, 2,500+ incidents over 30 days), the majority of topology-relevant incidents occurred on nodes that 6D structural analysis had flagged as risky -- from the call graph alone, before any incident occurred.

Why 6D?

Observability tools tell you what broke. SemanticEmbed tells you what will break -- from topology alone.

No agents, no instrumentation -- just an edge list
Sub-millisecond -- encodes graphs of 100+ nodes in under 1 ms
Works on any directed graph -- microservices, AI agent pipelines, data workflows, CI/CD
Complementary structural axes -- six dimensions, each captures risk signals the others cannot

Try It Now

Open the Interactive Demo in Google Colab -- runs in your browser, nothing to install locally.

Install

pip install semanticembed

Free tier: Up to 50 nodes per graph. No signup required.
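
To check a graph against the free-tier limit before encoding, count its distinct node names. The sketch below is a stand-alone helper, not part of the SDK; `count_nodes` and `FREE_TIER_NODE_LIMIT` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper (not an SDK function): count distinct nodes in an edge list.
FREE_TIER_NODE_LIMIT = 50  # the free-tier limit stated above


def count_nodes(edges):
    """Return the number of distinct node names that appear in the edges."""
    nodes = set()
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)
    return len(nodes)


edges = [("frontend", "api-gateway"), ("api-gateway", "order-service")]
node_count = count_nodes(edges)
within_free_tier = node_count <= FREE_TIER_NODE_LIMIT
print(node_count, within_free_tier)  # 3 True
```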

Quick Start

from semanticembed import encode, report

# Any directed graph as an edge list
edges = [
    ("frontend", "api-gateway"),
    ("api-gateway", "order-service"),
    ("api-gateway", "user-service"),
    ("order-service", "payment-service"),
    ("order-service", "inventory-service"),
    ("payment-service", "database"),
]

# Compute the 6D encoding (sub-millisecond)
result = encode(edges)

# Six structural measurements per node
for node, vector in result.vectors.items():
    print(f"{node}: {vector}")

# Structural risk report
print(report(result))

Output:

STRUCTURAL RISK REPORT
======================

AMPLIFICATION RISKS (high fanout, high criticality):
  - api-gateway    | fanout=0.667 | criticality=0.556

CONVERGENCE SINKS (low independence, many upstream callers):
  - database       | independence=0.000

STRUCTURAL SPOF (low independence, high upstream dependency):
  - api-gateway    | independence=0.000 | every request flows through this node

What It Finds That Other Tools Miss

Your current tools	SemanticEmbed
This service has high latency	This service is on 89% of all paths (structural SPOF)
This service had 5 errors	This service fans out to 12 downstream services (amplification risk)
This service is healthy	This service has zero lateral redundancy (convergence sink)

Runtime monitoring tells you what is slow now. Structural analysis tells you what will cause cascading failures regardless of current load.
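
To make "on N% of all paths" concrete, here is a minimal sketch that enumerates every entry-to-sink path in a small acyclic graph and reports the fraction passing through each node. It is an illustrative stand-in written with the standard library only. It is not the SDK's criticality formula, its numbers need not match the risk report, and `path_fractions` is a hypothetical name.

```python
def path_fractions(edges):
    """Fraction of entry-to-sink paths that pass through each node (DAG only)."""
    successors = {}
    nodes, has_parent = set(), set()
    for source, target in set(edges):  # ignore duplicate edges
        successors.setdefault(source, []).append(target)
        nodes.update((source, target))
        has_parent.add(target)

    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        children = successors.get(node, [])
        if not children:  # sink reached: record the complete path
            paths.append(path)
        for child in children:
            walk(child, path)

    for entry in nodes - has_parent:  # entry nodes have no incoming edges
        walk(entry, [])

    if not paths:
        return {}
    return {n: sum(n in p for p in paths) / len(paths) for n in nodes}


edges = [
    ("frontend", "api-gateway"),
    ("api-gateway", "order-service"),
    ("api-gateway", "user-service"),
    ("order-service", "payment-service"),
    ("order-service", "inventory-service"),
    ("payment-service", "database"),
]
fractions = path_fractions(edges)
print(round(fractions["api-gateway"], 3))    # 1.0
print(round(fractions["order-service"], 3))  # 0.667
```

Explicit enumeration grows exponentially with graph size, so this is only suitable for building intuition on small graphs.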

The Six Dimensions

Every node gets six independent structural measurements:

Dimension	What It Measures	Risk Signal
Depth	Position in the execution pipeline (0.0 = entry, 1.0 = deepest)	Deep nodes accumulate upstream latency
Independence	Lateral redundancy at the same pipeline stage	Low independence = structural chokepoint
Hierarchy	Module or group membership	Cross-module dependencies = blast radius
Throughput	Fraction of total traffic flowing through the node	High throughput + low independence = hidden bottleneck
Criticality	Fraction of end-to-end paths depending on this node	High criticality = SPOF
Fanout	Broadcaster (1.0) vs aggregator (0.0)	High fanout = amplification risk

These six properties capture complementary structural information. Each dimension provides risk signals the others cannot.

See docs/dimensions.md for the full reference.
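
For intuition about two of these dimensions, the sketch below computes naive textbook analogues from an edge list: depth as the normalized longest distance from an entry node, and fanout as out-degree divided by total degree. These definitions are assumptions made for illustration, not the SDK's (proprietary) formulas. The fanout ratio does reproduce the 0.667 shown for api-gateway in the Quick Start report, but whether the SDK computes it this way is not confirmed. `naive_depth_and_fanout` is a hypothetical name.

```python
from collections import defaultdict


def naive_depth_and_fanout(edges):
    """Naive depth (0.0 = entry, 1.0 = deepest) and fanout ratio per node (DAG only)."""
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for source, target in edges:
        succ[source].add(target)
        pred[target].add(source)
        nodes.update((source, target))

    # Longest distance from any entry node, processed in topological order (Kahn).
    indegree = {n: len(pred[n]) for n in nodes}
    level = {n: 0 for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        node = ready.pop()
        for child in succ[node]:
            level[child] = max(level[child], level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    deepest = max(level.values(), default=0) or 1  # avoid division by zero
    result = {}
    for n in nodes:
        out_deg, in_deg = len(succ[n]), len(pred[n])
        total = out_deg + in_deg
        result[n] = {
            "depth": level[n] / deepest,
            "fanout": out_deg / total if total else 0.0,
        }
    return result


edges = [
    ("frontend", "api-gateway"),
    ("api-gateway", "order-service"),
    ("api-gateway", "user-service"),
    ("order-service", "payment-service"),
    ("order-service", "inventory-service"),
    ("payment-service", "database"),
]
profile = naive_depth_and_fanout(edges)
print(profile["database"]["depth"])                 # 1.0
print(round(profile["api-gateway"]["fanout"], 3))   # 0.667
```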

Use Cases

Microservice architectures -- Find SPOFs, amplification cascades, and convergence bottlenecks in any service mesh. Works with Kubernetes, Istio, OTel traces, or static architecture diagrams.

AI agent pipelines -- Identify vendor concentration risk, gateway bottlenecks, and guardrail single points of failure in LLM orchestration graphs.

CI/CD and data pipelines -- Detect structural fragility in build graphs, ETL workflows, and deployment pipelines before they cause cascading failures.

Architecture drift monitoring -- Compare structural fingerprints across releases. Know exactly which services changed structural role and by how much.

Notebooks

Step-by-step Colab notebooks. Click to open, run in your browser.

Notebook	Use Case	What You Learn
01 - Quickstart	Getting started	Install, encode a graph, read the risk report
02 - Dimensions Deep Dive	Understanding 6D	What each dimension means, with worked examples
03 - Drift Detection	Architecture drift	Compare graph versions, detect structural changes
04 - Bring Your Own Graph	Any graph	Load from JSON, OTel traces, or Kubernetes
05 - AI Agent Pipelines	AI/LLM agents	Vendor concentration, gateway bottlenecks, guardrail SPOFs
06 - CI/CD & Data Pipelines	CI/CD & ETL	Build graph fragility, pipeline bottlenecks, drift gates
07 - OpenTelemetry	OTel traces	Extract edges from traces, structural analysis, CI/CD gates
08 - Qwen Compression	LLM compression	Structural pruning of Qwen2.5-7B, 10% speedup at Grade A

Extract Edges from Infrastructure

Don't have an edge list? The extract module parses common infrastructure files automatically.

import semanticembed as se

# From Docker Compose
edges = se.extract.from_docker_compose("docker-compose.yml")

# From Kubernetes manifests
edges = se.extract.from_kubernetes("k8s/")

# From GitHub Actions workflows
edges = se.extract.from_github_actions(".github/workflows")

# From Terraform
edges = se.extract.from_terraform("infra/")

# From CloudFormation (YAML or JSON)
edges = se.extract.from_cloudformation("template.yaml")

# From AWS CDK (Python)
edges = se.extract.from_aws_cdk("app.py")

# From Pulumi (Python)
edges = se.extract.from_pulumi("__main__.py")

# From Python imports (module dependency graph)
edges = se.extract.from_python_imports("src/")

# From Node.js monorepo (inter-package dependencies)
edges = se.extract.from_package_json_workspaces(".")

# From OpenTelemetry traces (OTLP / Jaeger / Zipkin JSON)
edges = se.extract.from_otel_traces("traces.json")

# From AI agent frameworks (AST-only — no need to install the framework)
edges = se.extract.from_langgraph("workflow.py")   # StateGraph.add_edge / add_conditional_edges / set_entry_point
edges = se.extract.from_crewai("crew.py")          # Task(agent=...) / Task(context=...) / Crew(manager_agent=...)
edges = se.extract.from_autogen("agents.py")       # GroupChat(agents=...) / initiate_chat(...)

# Auto-detect everything in a directory
edges, sources = se.extract.from_directory(".")
print(f"Found {len(edges)} edges from {sources}")

# Then encode as usual
result = se.encode(edges)
print(result.table)

Requires pyyaml for YAML parsing: pip install 'semanticembed[extract]'

Trace ingestion (highest-fidelity edges)

Compose / k8s / Terraform describe deployment, not actual call edges. Real runtime traces are the only source with the actual call graph. v0.3.0 ships a deterministic parser for the three common JSON formats:

OTLP (OpenTelemetry Collector / SDK exports): {"resourceSpans": [...]}
Jaeger (jaeger-query API, jaeger-cli): {"data": [{"spans": [...]}]}
Zipkin (Zipkin v2 API): top-level array with localEndpoint.serviceName

Edges are emitted at the service level — same-service spans roll up. Place a traces.json (or otel.json / jaeger.json / zipkin.json) at your repo root and from_directory() will pick it up.
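
The service-level roll-up can be sketched for the Zipkin v2 shape as follows. This is a simplified illustration, not the SDK's parser: it links each span to its parent through `parentId`, reads the service from `localEndpoint.serviceName`, and drops parent/child pairs within the same service. It ignores Zipkin's shared client/server span IDs, and `edges_from_zipkin` is a hypothetical name.

```python
import json


def edges_from_zipkin(spans):
    """Derive sorted service-level edges from a list of Zipkin v2 span dicts."""
    # Span IDs are only unique within a trace, so key on (traceId, id).
    service_of = {
        (s["traceId"], s["id"]): s.get("localEndpoint", {}).get("serviceName")
        for s in spans
    }
    edges = set()
    for span in spans:
        child = service_of.get((span["traceId"], span["id"]))
        parent = service_of.get((span["traceId"], span.get("parentId")))
        if parent and child and parent != child:  # same-service spans roll up
            edges.add((parent, child))
    return sorted(edges)


sample = json.loads("""
[
  {"traceId": "t1", "id": "a", "localEndpoint": {"serviceName": "frontend"}},
  {"traceId": "t1", "id": "b", "parentId": "a", "localEndpoint": {"serviceName": "frontend"}},
  {"traceId": "t1", "id": "c", "parentId": "b", "localEndpoint": {"serviceName": "api-gateway"}},
  {"traceId": "t1", "id": "d", "parentId": "c", "localEndpoint": {"serviceName": "order-service"}}
]
""")
print(edges_from_zipkin(sample))
# [('api-gateway', 'order-service'), ('frontend', 'api-gateway')]
```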

Live observability connectors

Static analysis is great for repos. For running infra, pull traces directly:

import os

from semanticembed import live

# Dynatrace — Smartscape services + call relationships
edges = live.from_dynatrace(
    env_url="https://abc12345.live.dynatrace.com",
    api_token=os.environ["DYNATRACE_API_TOKEN"],
)

# Honeycomb — Query API over a dataset
edges = live.from_honeycomb(
    dataset="my-app-prod",
    api_key=os.environ["HONEYCOMB_API_KEY"],
    lookback_seconds=900,
)

# Datadog — Spans Search API
edges = live.from_datadog(
    api_key=os.environ["DD_API_KEY"],
    app_key=os.environ["DD_APP_KEY"],
    env="prod",
    lookback="now-30m",
)

AI agent frameworks

Three popular Python agent frameworks (LangGraph, CrewAI, AutoGen) each have an explicit graph-building API. Static AST parsing extracts the call graph the framework will run. The SDK does not import or run the framework, so you don't need pip install langgraph to extract from a LangGraph script.

LangGraph — g.add_edge, g.add_conditional_edges (with explicit path_map), g.set_entry_point, g.set_finish_point. The sentinels START and END are emitted as literal node names.

CrewAI — Task(agent=researcher) produces researcher -> task_var; Task(context=[t1, t2]) produces t1 -> task_var / t2 -> task_var; Crew(manager_agent=mgr) adds a mgr -> agent fan-out.

AutoGen — GroupChat(agents=[a, b, c]) with an explicit GroupChatManager produces a star (manager -> a, -> b, -> c). Without a manager, it's fully connected. x.initiate_chat(y) always produces x -> y.

from_directory() auto-detects these by scanning Python files for the relevant import statements and only running the matching parser on those files (cheap and accurate vs. walking the whole tree).
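
The AST-only approach can be illustrated with Python's standard `ast` module. The sketch below finds `add_edge(a, b)` calls in source text without importing the framework. It is a reduced illustration of the idea rather than the SDK's parser: it handles only two-argument `add_edge` calls, and `extract_add_edge_calls` is a hypothetical name.

```python
import ast

SOURCE = '''
from langgraph.graph import StateGraph, START
g = StateGraph(dict)
g.add_edge(START, "plan")
g.add_edge("plan", "act")
g.add_edge("act", "review")
'''


def _node_name(arg):
    """Return a node name for a string literal or a bare identifier, else None."""
    if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
        return arg.value
    if isinstance(arg, ast.Name):  # e.g. the START / END sentinels
        return arg.id
    return None


def extract_add_edge_calls(source):
    """Collect (source, target) pairs from x.add_edge(a, b) calls, in source order."""
    calls = [
        node
        for node in ast.walk(ast.parse(source))  # parses only; nothing is imported
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "add_edge"
        and len(node.args) == 2
    ]
    edges = []
    for call in sorted(calls, key=lambda c: c.lineno):
        names = [_node_name(arg) for arg in call.args]
        if all(names):
            edges.append(tuple(names))
    return edges


print(extract_add_edge_calls(SOURCE))
# [('START', 'plan'), ('plan', 'act'), ('act', 'review')]
```

Because the source is parsed rather than executed, this runs even when langgraph is not installed.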

Blending sources cleanly

Combining traces + compose + Python imports usually produces the same logical service under several names (auth-svc, auth_svc, AuthService). Use dedupe_edges to canonicalize:

import semanticembed as se

compose_edges, _ = se.extract.from_directory(".")
trace_edges = se.extract.from_otel_traces("traces.json")

edges = se.dedupe_edges(
    list(compose_edges) + trace_edges,
    normalize="snake",                          # AuthService -> auth_service
    aliases={"auth_svc": "auth_service"},       # explicit overrides
)
result = se.encode(edges)

Values for normalize: "none" (default), "snake", "lower", "kebab". Self-loops are dropped by default.
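
To show what canonicalization does to an edge list, here is a stand-alone approximation of snake-case normalization, alias substitution, and self-loop removal. It is a sketch under assumptions, not the SDK's `dedupe_edges`: the exact normalization rules and the order in which aliases are applied (here, after normalization) are guesses, and `to_snake` and `dedupe` are hypothetical names.

```python
import re


def to_snake(name):
    """AuthService -> auth_service, auth-svc -> auth_svc (assumed rules)."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase humps
    name = re.sub(r"[-\s.]+", "_", name)                 # unify separators
    return name.lower()


def dedupe(edges, aliases=None, drop_self_loops=True):
    """Normalize names, apply aliases, and drop duplicate edges (order preserved)."""
    aliases = aliases or {}
    seen, unique = set(), []
    for source, target in edges:
        source, target = to_snake(source), to_snake(target)
        source, target = aliases.get(source, source), aliases.get(target, target)
        if drop_self_loops and source == target:
            continue
        if (source, target) not in seen:
            seen.add((source, target))
            unique.append((source, target))
    return unique


raw = [
    ("frontend", "AuthService"),
    ("frontend", "auth-svc"),
    ("Frontend", "auth_svc"),
    ("AuthService", "auth_svc"),  # collapses to a self-loop and is dropped
]
print(dedupe(raw, aliases={"auth_svc": "auth_service"}))
# [('frontend', 'auth_service')]
```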

LLM-Powered Analysis

Get plain-language explanations and actionable recommendations using your own LLM key.

import semanticembed as se

result = se.encode(edges)

# One-shot analysis (OpenAI, Anthropic, or local Ollama)
print(se.explain(result, model="gpt-4o-mini", api_key="sk-..."))
print(se.explain(result, model="claude-sonnet-4-5", api_key="sk-ant-..."))
print(se.explain(result, model="ollama/llama3"))  # local, no key needed

# Follow-up questions
answer = se.ask(result, "What happens if the database goes down?",
                model="gpt-4o-mini", api_key="sk-...")

The LLM sees only the encoding output (6D vectors, risk report) -- never the algorithm.

Structural Diff

Compare two graph versions in one call:

changes = se.encode_diff(edges_v1, edges_v2)
for node, deltas in changes.items():
    for dim, info in deltas.items():
        print(f"{node}.{dim}: {info['before']:.3f} -> {info['after']:.3f}")
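
The before/after shape printed above can be mimicked with plain dictionaries, which is handy for prototyping a drift gate in CI. The sketch below is an assumption-laden illustration, not the SDK's `encode_diff`: the per-node vectors are hand-written example values, and `diff_vectors`, `largest_change`, and the 0.2 threshold are hypothetical.

```python
def diff_vectors(before, after, tolerance=1e-9):
    """Return {node: {dim: {"before": x, "after": y}}} for dimensions that changed."""
    changes = {}
    for node in before.keys() & after.keys():  # nodes present in both versions
        deltas = {
            dim: {"before": before[node][dim], "after": after[node][dim]}
            for dim in before[node]
            if abs(before[node][dim] - after[node][dim]) > tolerance
        }
        if deltas:
            changes[node] = deltas
    return changes


def largest_change(changes):
    """Largest absolute per-dimension movement across all nodes (0.0 if none)."""
    return max(
        (abs(d["after"] - d["before"]) for deltas in changes.values() for d in deltas.values()),
        default=0.0,
    )


# Illustrative values only, not real SDK output.
v1 = {"api-gateway": {"fanout": 0.5, "criticality": 0.5}, "database": {"fanout": 0.0, "criticality": 0.25}}
v2 = {"api-gateway": {"fanout": 0.75, "criticality": 0.5}, "database": {"fanout": 0.0, "criticality": 0.25}}

changes = diff_vectors(v1, v2)
print(changes)  # {'api-gateway': {'fanout': {'before': 0.5, 'after': 0.75}}}
gate_passed = largest_change(changes) <= 0.2  # hypothetical drift threshold
print(gate_passed)  # False
```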

Agent

An autonomous agent that scans your repo, extracts edges, encodes, and explains results interactively. Choose your LLM backend:

# Claude agent (installs the agent code + the Anthropic agent SDK)
pip install 'semanticembed[agent-claude]'
export ANTHROPIC_API_KEY=sk-ant-...
semanticembed-agent              # interactive
semanticembed-agent --ask "What is my biggest SPOF?"

# Gemini agent
pip install 'semanticembed[agent-gemini]'
export GOOGLE_API_KEY=...
semanticembed-gemini-agent

Both binaries are also reachable as python -m semanticembed.agent / python -m semanticembed.agent.gemini_agent.

The agent has 7 tools: scan, extract (docker-compose, k8s, Python imports), encode, diff, and simulate architecture changes. See src/semanticembed/agent/README.md for full docs.

What gets sent where

Be explicit about data egress before pointing the agent at private architecture:

Claude agent (python -m semanticembed.agent): the LLM reads tool outputs as conversation context, so the contents of docker-compose.yml, Kubernetes manifests, Terraform .tf files, Python source, and package.json files in your project go to Anthropic's API along with your prompts. Conversation history is governed by Anthropic's data-use policies.
Gemini agent (python -m semanticembed.agent.gemini_agent): same data flow, sent to Google's API instead.
Skill (skill/analyze.py): runs Ollama on your machine. Raw input never leaves localhost unless you set SEMBED_OLLAMA_URL to a remote host.
Cloud encode() call: only the edge list (node names, e.g. ["frontend", "auth"]) goes to the SemanticEmbed Railway endpoint. File contents are never sent.

If your topology is sensitive, prefer the skill (local Ollama) or pre-extract edges deterministically with se.extract.from_directory() and call se.encode() directly — that path sends only the edge list.

Example Graphs

The examples/ directory contains edge lists for well-known architectures:

File	Application	Nodes	Edges
google_online_boutique.json	Google Online Boutique (microservices)	11	15
weaveworks_sock_shop.json	Weaveworks Sock Shop (microservices)	15	15
ai_agent_pipeline.json	Multi-agent LLM orchestration	12	15
cicd_pipeline.json	CI/CD build pipeline	13	17

React Components

Drop-in React components for rendering SDK results. See examples/react/ for the full source.

Component	What it renders
`useSemanticEmbed.ts`	React hook — call `encode()` from your frontend
`RiskTable.tsx`	Sortable risk table with severity badges
`RadarChart.tsx`	6D radar chart comparing node profiles
`TopologySummary.tsx`	KPI cards + risk breakdown

import { useSemanticEmbed } from './useSemanticEmbed';
import { RiskTable } from './RiskTable';

function App() {
  const { result, loading, encode } = useSemanticEmbed();
  return (
    <>
      <button onClick={() => encode([["A","B"],["B","C"],["C","D"]])}>Analyze</button>
      {result && <RiskTable risks={result.risks} />}
    </>
  );
}

Input Format

SemanticEmbed accepts any directed graph as an edge list.

# Python tuples
edges = [("A", "B"), ("B", "C")]
result = encode(edges)

# JSON file
result = encode_file("my_graph.json")

JSON format:

{
  "edges": [
    {"source": "A", "target": "B"},
    {"source": "B", "target": "C"}
  ]
}

See docs/input_format.md for the full spec.
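
Converting between the tuple form and the JSON form shown above needs only the standard library. The helper names below (`edges_to_json`, `edges_from_json`) are hypothetical and not part of the SDK; the sketch covers only the `source`/`target` keys shown in this section, not any additional fields the full spec may allow.

```python
import json


def edges_to_json(edges):
    """Serialize (source, target) tuples to the {"edges": [...]} JSON layout."""
    payload = {"edges": [{"source": s, "target": t} for s, t in edges]}
    return json.dumps(payload, indent=2)


def edges_from_json(text):
    """Parse the {"edges": [...]} JSON layout back into (source, target) tuples."""
    return [(e["source"], e["target"]) for e in json.loads(text)["edges"]]


edges = [("A", "B"), ("B", "C")]
text = edges_to_json(edges)
print(edges_from_json(text))  # [('A', 'B'), ('B', 'C')]
```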

Documentation

Document	Description
docs/getting_started.md	Install, encode, read results, export -- one page
docs/api_reference.md	Every function, class, parameter, and return type
docs/dimensions.md	The six structural dimensions -- full reference
docs/input_format.md	Edge list input specification
docs/output_format.md	Encoding output and risk report format

License

SemanticEmbed SDK is proprietary software distributed as a compiled package. Free tier available for graphs up to 50 nodes. See LICENSE for terms.

Patent pending. Application #63/994,075.

Contact

Email jeffmurr@seas.upenn.edu

Release history

Version	Released
0.7.3	May 2, 2026
0.7.2	Apr 30, 2026
0.7.1 (this version)	Apr 30, 2026
0.7.0	Apr 30, 2026
0.6.0	Apr 30, 2026
0.2.1	Apr 17, 2026
0.2.0	Apr 11, 2026
0.1.0	Mar 23, 2026

Download files

Source Distribution

semanticembed-0.7.1.tar.gz (113.5 kB)

Uploaded Apr 30, 2026 Source

Built Distribution

semanticembed-0.7.1-py3-none-any.whl (54.5 kB)

Uploaded Apr 30, 2026 Python 3

File details

Details for the file semanticembed-0.7.1.tar.gz.

File metadata

Download URL: semanticembed-0.7.1.tar.gz
Upload date: Apr 30, 2026
Size: 113.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semanticembed-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`e1c5c8e696c23b7b4d7872894f10f0aa0030d7c9505f6ee0da670a7ca9b00ada`
MD5	`2663b02519b5f78ff2d595bb44f6d518`
BLAKE2b-256	`ccd55f5b4d415fd51343f3f13fef4038f2cda42e32d47b3670878cb8d732e3ed`

Provenance

The following attestation bundles were made for semanticembed-0.7.1.tar.gz:

Publisher: publish.yml on jmurray10/semanticembed-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semanticembed-0.7.1.tar.gz
- Subject digest: e1c5c8e696c23b7b4d7872894f10f0aa0030d7c9505f6ee0da670a7ca9b00ada
- Sigstore transparency entry: 1408295286
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: jmurray10/semanticembed-sdk@460734b41101652060c3b0291b508796e3e7e642
- Branch / Tag: refs/tags/v0.7.1
- Owner: https://github.com/jmurray10
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@460734b41101652060c3b0291b508796e3e7e642
- Trigger Event: push

File details

Details for the file semanticembed-0.7.1-py3-none-any.whl.

File metadata

Download URL: semanticembed-0.7.1-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 54.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for semanticembed-0.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a74c74b3a2e595383616f339197a6a9cf4eb30b966649abcdf91595308aee8d1`
MD5	`7d7e6d515bf3a360e18848655c6017b9`
BLAKE2b-256	`8997edf393b9032df171e5feab1e8e1d76220faadbd12190f990edea4ad6e299`

Provenance

The following attestation bundles were made for semanticembed-0.7.1-py3-none-any.whl:

Publisher: publish.yml on jmurray10/semanticembed-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semanticembed-0.7.1-py3-none-any.whl
- Subject digest: a74c74b3a2e595383616f339197a6a9cf4eb30b966649abcdf91595308aee8d1
- Sigstore transparency entry: 1408295358
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: jmurray10/semanticembed-sdk@460734b41101652060c3b0291b508796e3e7e642
- Branch / Tag: refs/tags/v0.7.1
- Owner: https://github.com/jmurray10
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@460734b41101652060c3b0291b508796e3e7e642
- Trigger Event: push
