
Architecture inspection toolkit for slice topology, throughput, and bottleneck analysis


Tensorearch

Tensorearch is a topology-aware model-architecture inspection tool for analyzing how slice structure, coupling topology, and execution geometry shape throughput in large language models.

The name should be read as:

  • Tensor
  • Research
  • Search

It is not a training framework. It is an architecture debugger.

What It Combines

Tensorearch combines four ideas:

  • an inspection workflow in the style of OpenAI's Transformer Debugger
  • OTAL-inspired propagation topology
  • weighted-chain coupling analysis
  • local-vector-space structural modeling

Core Question

Given a model execution graph, Tensorearch asks:

  • which slices dominate latency?
  • which couplings amplify congestion?
  • which structural regimes create poor throughput despite similar parameter count?
  • where is the bottleneck actually localized?
  • is the model structurally obedient, overly free, or strategically adaptive?

Conceptual Position

Transformer Debugger focuses on:

  • neurons
  • heads
  • SAE latents
  • behavioral circuits

Tensorearch shifts the inspection unit upward to:

  • layers
  • sub-layers
  • routing blocks
  • communication edges
  • memory-transfer boundaries
  • execution slices

The intended result is an architecture debugger rather than a neuron debugger.

Relation To Similar Tools

Tensorearch is not trying to replace existing tooling. It sits at a different layer.

Compared with Transformer Debugger

Transformer Debugger is strongest when the question is:

  • why did the model produce this token?
  • which head / neuron / latent drove this behavior?
  • what circuit implements this behavior?

Tensorearch is strongest when the question is:

  • why is this architecture slower than another one?
  • which slice is the true bottleneck?
  • where does coupling topology fail to become useful structure?
  • why does a new architecture underperform a simpler baseline?

Short version:

  • Transformer Debugger = behavior / circuit microscope
  • Tensorearch = architecture / topology microscope

Compared with Profilers

Classic profilers answer:

  • where time is spent
  • where memory is spent
  • which kernels are hot

Tensorearch uses profiler-like inputs, but pushes one level higher:

  • how slices couple
  • how bottlenecks propagate
  • which structural regimes create congestion
  • whether a path is merely expensive or strategically meaningful

So a profiler is an input source; Tensorearch is an interpretation layer over model structure.
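The "how bottlenecks propagate" idea can be made concrete with a toy weighted-chain model: each slice's effective cost is its own cost plus a coupling-weighted share of its predecessor's propagated cost. This is a minimal sketch of that reading, not the formula Tensorearch actually implements.

```python
def propagate(chain):
    """Propagate costs along a single weighted chain of slices.

    chain: list of (cost, incoming_weight) pairs in execution order.
    Each slice's propagated cost = own cost + weight * predecessor's
    propagated cost. Illustrative only.
    """
    propagated = []
    prev = 0.0
    for cost, weight in chain:
        prev = cost + weight * prev
        propagated.append(prev)
    return propagated

# A heavy early slice (cost 3.0) inflates everything downstream
# of a strongly coupled (weight 0.9) chain: ~[3.0, 3.7, 4.33]
costs = propagate([(3.0, 0.0), (1.0, 0.9), (1.0, 0.9)])
```

Under this toy model, lowering either the early slice's cost or the coupling weights shrinks every downstream propagated cost, which is exactly the kind of effect an interpretation layer over profiler data needs to surface.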

Compared with Benchmark Dashboards

Benchmark dashboards tell you:

  • final loss
  • final quality
  • throughput
  • memory footprint

Tensorearch tries to explain why those numbers happen.

It is intended for cases where benchmark outcomes alone are too coarse, especially for:

  • new architectures
  • routing-heavy models
  • non-standard attention replacements
  • models that may be paying extra compute cost without getting matching capability gains

Compared with Generic Interpretability Tools

Many interpretability tools stay close to:

  • token attribution
  • feature activation
  • neuron or head saliency

Tensorearch instead reasons over:

  • execution slices
  • transport edges
  • local vector spaces
  • weighted chains
  • structural obedience / freedom / intelligence

That makes it more suitable for architecture diagnosis than for token-level behavioral explanation.

Practical Positioning

In practice, Tensorearch is best used together with other tools:

  • profiler: collect timing / memory / kernel data
  • benchmark suite: establish empirical performance
  • interpretability tooling: inspect token-level behavior
  • Tensorearch: connect those observations back to architecture-level failure modes

Its purpose is not merely to report that a model is worse than a baseline. Its purpose is to localize why it is worse, and to determine whether the failure comes from:

  • routing
  • coupling
  • propagation
  • communication
  • structural inefficiency
  • or a mismatch between extra structure and actual downstream gain

Implemented Scope

Tensorearch already includes:

  • JSON trace schema for model/system/slice/edge data
  • feature enrichment for:
    • local vector spaces
    • transport-scale estimation
    • obedience-target inference
  • weighted-chain propagation
  • bottleneck and congestion metrics
  • freedom / compliance / intelligence metrics
  • intervention engine
  • cross-architecture comparison engine
  • agent-friendly JSON CLI
  • source-level modular flow analysis for logic distribution uniformity
  • Windows exe packaging
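The JSON trace schema above can be sketched as a minimal document built in Python. The field names used here (model, system, slices, edges, cost, weight) are illustrative assumptions based on the model/system/slice/edge description, not the exact published schema.

```python
import json

# Hypothetical minimal trace; field names are assumptions for
# illustration, not the exact Tensorearch schema.
trace = {
    "model": {"name": "toy-transformer", "params": 1_000_000},
    "system": {"device": "cpu"},
    "slices": [
        {"id": "attn_0", "cost": 3.2},  # per-slice execution cost
        {"id": "mlp_0", "cost": 1.1},
    ],
    "edges": [
        # coupling edge between slices, with a weight
        {"src": "attn_0", "dst": "mlp_0", "weight": 0.8},
    ],
}

with open("trace.json", "w") as f:
    json.dump(trace, f, indent=2)
```

A file like this would then be passed to the CLI, e.g. `python -m tensorearch inspect trace.json --json`.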

Current Metrics

Current reports can emit:

  • predicted bottleneck
  • global coupling efficiency
  • global obedience score
  • global intelligence score
  • per-slice:
    • propagated cost
    • bottleneck index
    • congestion
    • freedom index
    • compliance index
    • route entropy
    • effect entropy
    • intelligence index
  • edge attribution rankings
  • source-diagnose flow metrics:
    • modular shrinking number
    • modular uniformity
    • topological uniformity
    • modular flow hotspots
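As one concrete reading of the per-slice metrics, a route entropy can be computed as the Shannon entropy of a slice's normalized outgoing edge weights: even routing scores high, single-edge domination scores low. This is an illustrative interpretation of the metric name, not Tensorearch's internal formula.

```python
import math

def route_entropy(weights):
    """Shannon entropy (nats) of normalized outgoing edge weights.

    Illustrative only: Tensorearch's exact definition may differ.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

# A slice routing evenly over two edges has higher entropy
# than one dominated by a single edge.
even = route_entropy([1.0, 1.0])       # log(2) ~ 0.693
skewed = route_entropy([0.99, 0.01])
```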

CLI

Eight commands:

  • inspect: analyze a trace; report bottlenecks and compliance metrics
  • compare: compare two architectures side by side
  • ablate: simulate an intervention (remove slice, scale edge, etc.) and show the delta
  • adapt: convert source code or an architecture description into a trace (16 families)
  • space: classify source code into one of 15 model families (18-dim projection)
  • diagnose: audit source-level logic: entropy clusters, modular flow, mutation tracking
  • export: write inspect or compare results to a file
  • help: show the complete usage guide with all commands, families, and examples

# Quick reference
python -m tensorearch help
python -m tensorearch inspect <trace.json> --json
python -m tensorearch compare <left.json> <right.json> --json
python -m tensorearch ablate <trace.json> --kind <kind> --target <target> --json
python -m tensorearch adapt --adapter source --input model.py --output trace.json
python -m tensorearch adapt --adapter transformer --input desc.json --output trace.json
python -m tensorearch adapt --adapter family --family diffusion_unet --input desc.json --output trace.json
python -m tensorearch space --source-file path/to/model.py --json
python -m tensorearch diagnose --source-file path/to/script.py --json
python -m tensorearch export --mode inspect --left <trace.json> --output out.json --json
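To make the ablate idea concrete, here is a toy sketch of what "remove a slice" could mean on a trace: drop the slice and every edge touching it, then compare totals. The trace shape and the delta computation are illustrative assumptions; the real intervention engine also re-propagates costs.

```python
import copy

def ablate_slice(trace, slice_id):
    """Simulate removing a slice: drop it and every incident edge.

    Toy sketch of an `ablate`-style intervention; illustrative only.
    """
    out = copy.deepcopy(trace)
    out["slices"] = [s for s in out["slices"] if s["id"] != slice_id]
    out["edges"] = [e for e in out["edges"]
                    if slice_id not in (e["src"], e["dst"])]
    return out

trace = {
    "slices": [{"id": "attn_0", "cost": 3.2}, {"id": "mlp_0", "cost": 1.1}],
    "edges": [{"src": "attn_0", "dst": "mlp_0", "weight": 0.8}],
}
ablated = ablate_slice(trace, "attn_0")

# Delta in total slice cost after the intervention (3.2 here).
delta = (sum(s["cost"] for s in trace["slices"])
         - sum(s["cost"] for s in ablated["slices"]))
```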

Packaged Windows CLI:

dist\tensorearch.exe help
dist\tensorearch.exe inspect examples\sample_trace.json --json
dist\tensorearch.exe space --source-file model.py --json
dist\tensorearch.exe diagnose --source-file script.py --json
dist\tensorearch.exe adapt --adapter source --input model.py --output trace.json

For source-level diagnose, each module/function/script scope now includes a modular_flow profile:

  • assessment
    • uniform_flow
    • mixed_flow
    • concentrated_flow
    • insufficient_signal
  • modular_shrinking_number
    • higher means logic events collapse into fewer modular / topological regions
  • modular_uniformity
    • how evenly logic events spread under modulo bucketing
  • topological_uniformity
    • how evenly logic events spread across source span bins
  • hotspots
    • modular residues or topological bins with unusually dense activity
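The modular_uniformity idea can be sketched as the normalized entropy of logic-event line numbers bucketed by a modulus: 1.0 means events spread evenly across residue classes, 0.0 means they collapse into one. This is an illustrative reading of the metric name (including the choice of modulus), not the implementation shipped in Tensorearch.

```python
import math
from collections import Counter

def modular_uniformity(event_lines, modulus=7):
    """Normalized entropy of event line numbers under modulo bucketing.

    1.0 = perfectly even spread, 0.0 = one residue class dominates.
    Illustrative only; not Tensorearch's actual formula.
    """
    buckets = Counter(line % modulus for line in event_lines)
    total = sum(buckets.values())
    probs = [count / total for count in buckets.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(modulus)

uniform = modular_uniformity(range(1, 71))          # evenly spread events
concentrated = modular_uniformity([7, 14, 21, 28])  # single residue class
```

Hotspots would then correspond to residue classes (or source-span bins, for topological uniformity) whose bucket counts sit far above the mean.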

This is especially useful when a system is "stuck but not obviously broken". In that situation, Tensorearch should be used as an instrumentation layer first:

  • do not keep tuning weights blindly
  • diagnose which fine-grained scopes have concentrated or mixed flow
  • separate training-loop bottlenecks from representation-layer bottlenecks
  • only then decide whether to change optimization, features, or core logic

Global CLI behavior:

  • --help
  • -v / --verbose
  • --json

Real Trace Workflow

Tensorearch is designed to consume any of:

  • hand-authored trace JSON
  • adapted JSON from another source
  • real profiling traces exported from training scripts

Current integration work connects Tensorearch to APT-Transformer quickcook profiling, so that short profiling runs can emit:

  • tensorearch_trace.json

This enables direct comparison of real short-profile traces from:

  • Transformer
  • Oscillator

Space Projection: Quadrupole + Multi-Family

Tensorearch projects source code into a structural coordinate system for architecture classification.

Legacy Quadrupole Axes

The original 4-axis projection for LLM-centric architectures:

  • X: residual
  • Y: latent-attention
  • Z: kv-transport
  • W: propagation

Extended Family Axes (14 dimensions)

For architectures beyond LLMs, Tensorearch now supports 14 additional family axes:

Axis: family (example models):

  • D: diffusion-denoising (Stable Diffusion, DDPM)
  • T: timestep-conditioning (diffusion schedulers)
  • U: multiscale-unet (U-Net encoder-decoder)
  • A: adapterization (LoRA, hypernetworks, PEFT)
  • R: runtime-wrapper (Triton kernels, quantization bridges)
  • V: temporal-video (AnimateDiff, UNet3D)
  • O: audio-spectral (mel spectrograms, vocoders)
  • G: 3d-generative (NeRF, Gaussian splatting, point clouds)
  • S: speech-language (Whisper, ASR/TTS)
  • M: world-model (MDRNN, environment simulators)
  • L: multimodal-alignment (BLIP-2, Q-Former, VLMs)
  • H: graph-message-passing (GCN, GAT, PyG)
  • I: vision-detection (Detectron2, Mask R-CNN, FPN)
  • B: bio-sequence (ESM, Evoformer, protein folding)

Family Classification

The space command classifies source files into one of 15 families:

python -m tensorearch space --source-file path/to/model.py --json

Output includes:

  • quadrupole_projection: legacy 4-axis (X/Y/Z/W)
  • space_family_projection: 15-family scores + classification
  • classification: the dominant family label
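Since classification is reported as the dominant family label, it can be read as the argmax over the family scores. A minimal sketch, assuming a payload shape with the field names listed above (the score values are made up for illustration):

```python
# Assumed shape of a `space --json` payload; family names come from
# the axis table above, score values are invented for illustration.
space_result = {
    "quadrupole_projection": {"X": 0.2, "Y": 0.7, "Z": 0.4, "W": 0.1},
    "space_family_projection": {
        "diffusion-denoising": 0.12,
        "adapterization": 0.03,
        "audio-spectral": 0.81,
    },
}

# The dominant family label is the highest-scoring family.
scores = space_result["space_family_projection"]
classification = max(scores, key=scores.get)
```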

Verified Results

Source and dominant family:

  • Stable Diffusion U-Net: diffusion-unet
  • SD attention.py: latent-attention
  • Microsoft LoRA layers.py: adapterization
  • APT-Transformer wrapper: runtime-wrapper
  • diffusers UNet3D: video-temporal
  • audio-diffusion mel.py: audio-spectral
  • torch-splatting train.py: 3d-generative
  • GPT-SoVITS models.py: audio-spectral
  • Coqui TTS VITS: audio-spectral
  • OpenAI Whisper model.py: speech-language
  • world-models MDRNN: world-model
  • GameGAN trainer: world-model
  • BLIP-2 modeling: multimodal-alignment
  • PyG GCNConv: graph-message-passing
  • Detectron2 RCNN: vision-detection
  • ESM-2: bio-sequence

Current Architectural Findings

On the current QuickCook + HLBD line:

  • Transformer remains the stronger practical baseline in long-run quickcook quality / throughput.
  • Oscillator currently pays extra cost for its phase/propagation path.
  • After finer-grained profiling, the main local hotspots are:
    • Transformer: score
    • Oscillator: adj

This is exactly the kind of localization Tensorearch is meant to provide.

Repository Layout

  • docs/MATH_MODEL.md
    • core mathematical model
  • docs/TDB_MAPPING.md
    • Transformer Debugger to Tensorearch mapping
  • docs/CLI_USAGE.md
    • full CLI usage and JSON behavior
  • docs/TRANSFORMER_TRACE_CHECKLIST.md
    • transformer trace checklist
  • docs/OSCILLATOR_TRACE_CHECKLIST.md
    • oscillator trace checklist
  • docs/AGENT_JSON_USAGE.md
    • agent / pipeline JSON consumption guide
  • docs/CLAUDE_DEVELOPMENT_GUIDE.md
    • collaboration guide for Claude
  • src/tensorearch/
    • package source
  • tests/
    • schema, fixtures, CLI, and comparison tests
  • examples/
    • sample traces and adapter inputs
  • CLAUDE_READY_INDEX.md
    • quick handoff index for agents

Testing Status

The current repository test suite passes locally with:

  • 22 passed (as of 2026-04-14)

This includes:

  • fixture traces
  • CLI subcommands (inspect, compare, ablate, export, adapt, space, diagnose)
  • comparison flows
  • intervention flows
  • export / adapt behavior
  • space family classification
  • diagnose modular flow profiles
  • short boolean helper entropy correction

Intended Outputs

Tensorearch emits:

  • slice bottleneck rankings
  • coupling-congestion maps
  • architecture-level compliance / intelligence summaries
  • intervention deltas
  • cross-architecture comparison reports
  • agent-friendly JSON summaries for downstream tooling
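The agent-friendly JSON summaries are meant to be machine-consumed. A sketch of ranking slices from an inspect-style report; the key names here (predicted_bottleneck, bottleneck_index, congestion) are assumptions modeled on the metric list above, not a documented payload contract.

```python
import json

# Assumed shape of an `inspect --json` report; key names are
# illustrative guesses based on the metrics listed earlier.
sample_report = json.loads("""
{
  "predicted_bottleneck": "attn_0",
  "global_coupling_efficiency": 0.71,
  "slices": {
    "attn_0": {"bottleneck_index": 0.92, "congestion": 0.40},
    "mlp_0":  {"bottleneck_index": 0.18, "congestion": 0.05}
  }
}
""")

# Rank slices by bottleneck index, worst first.
ranking = sorted(
    sample_report["slices"].items(),
    key=lambda item: item[1]["bottleneck_index"],
    reverse=True,
)
worst_slice, _ = ranking[0]
```

A downstream agent could use such a ranking to decide which slice to target with an ablate intervention next.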

Download Files

Source distribution:

  • tensorearch-1.0.0.tar.gz (43.4 kB)
    • Uploaded via: twine/6.2.0 CPython/3.13.12 (Trusted Publishing: no)
    • SHA256: adcec92c98a27a18d3aac19b6a69239ff09c777aab631c59aa28692f3fabfcea
    • MD5: 1c15a7cdc1cb822f40de4bc43637b90f
    • BLAKE2b-256: 41e705cd03f8e3c8aed07b31afe562e00bfe2f7456c78887c26805696f871d5b

Built distribution:

  • tensorearch-1.0.0-py3-none-any.whl (36.9 kB, Python 3)
    • Uploaded via: twine/6.2.0 CPython/3.13.12 (Trusted Publishing: no)
    • SHA256: cd1c3d40426fa983a07d5041e23aee5811fe8569374327933b347d798aee18d9
    • MD5: 4fb5ce827091f5f38f1b1a430a1b7ac1
    • BLAKE2b-256: bc93d9a955ca71772615d94f9821e58e8c90f3133a8643ea7e9f26aaa64111f0
