
M-POLY-VTD AI Engine (Loom v0.79.0) — 21 Numerical Types, WebGPU, Transformer Inference, and 100% Determinism


welvet — Loom Python Bindings


M-POLY-VTD AI Engine (Loom v0.79.0) — Python bindings: 21 numerical types, volumetric grids, CPU/GPU training, DNA/NEAT, native save/reload.

welvet wraps the Loom C-ABI with zero runtime Python dependencies. The PyPI wheel ships prebuilt native libraries for every supported OS/arch; at import time only the matching binary is loaded (linux_amd64/welvet.so, windows_amd64/welvet.dll, etc.).

Bedrock validation (v0.79): seven-layer CPU suite (10 layer types × 21 dtypes × train × serialize). See docs/bedrock_validation.md.


Install

pip install welvet
import welvet
print(welvet.__version__)  # 0.79.0

Supported platforms (64-bit): Linux (x86-64, ARM64), macOS (x86-64, ARM64, universal fallback), Windows (x86-64, ARM64), Android (ARM64, x86-64), iOS (device / simulator / XCFramework when built into the wheel).

Build from source (monorepo)

PyPI wheels ship prebuilt .so / .dylib / .dll. To run latest main against your checkout:

Option A — build + copy in one step

cd welvet/cabi/internal/build
./build_unix.sh linux amd64    # native Linux x86_64 (+ --test for CABI smoke)
# or: ./build_unix.sh all       # every platform you have cross-toolchains for

build_unix.sh already copies dist/* → welvet/python/src/welvet/.

Option B — you already compiled into dist/ (or copied builds there)

cd welvet/cabi/internal/build
./copy_to_python.sh              # default: ./dist → python/src/welvet/
# or if your tree is elsewhere:
./copy_to_python.sh ../../dist   # e.g. welvet/cabi/dist from build_macos.sh

One-liner from the Python folder:

cd welvet/python
./copy_from_cabi.sh              # copy + pip install -e .

Then install / verify

cd welvet/python
pip install -e .                 # editable install picks up src/welvet/*.so
python3 -m welvet.cabi_verify    # C-ABI symbols + smoke
python3 examples/run_all.py      # README examples
python3 benchmark_seven_layer.py --layer Dense

Artifacts land as python/src/welvet/linux_amd64/welvet.so (etc.). Without that, import welvet fails with “native library not found”.

Publish to PyPI (maintainers)

pip install build twine
cd welvet/cabi/internal/build && ./copy_to_python.sh   # all platforms → src/welvet/
cd ../../../python
pip install -e . && python3 examples/run_all.py        # smoke before release
./publish.sh                                           # python3 -m build + twine upload

The wheel is multi-platform: it contains every */welvet.{so,dylib,dll} you copied; each machine only loads its own folder.
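
Before uploading, it can help to confirm which native libraries actually ended up in the built wheel. The sketch below relies only on the fact that a wheel is a zip archive; the helper name `native_libs_in_wheel` and the demo archive are illustrative and are not part of welvet or publish.sh.

```python
import io
import zipfile

NATIVE_SUFFIXES = (".so", ".dylib", ".dll")


def native_libs_in_wheel(wheel):
    """Return sorted archive paths of bundled welvet native libraries.

    `wheel` is a filesystem path or a binary file object.
    """
    with zipfile.ZipFile(wheel) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.rsplit("/", 1)[-1].startswith("welvet.")
            and name.endswith(NATIVE_SUFFIXES)
        )


# Demo on an in-memory archive that imitates the layout described above.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("welvet/__init__.py", "")
    zf.writestr("welvet/linux_amd64/welvet.so", b"")
    zf.writestr("welvet/windows_amd64/welvet.dll", b"")
demo.seek(0)
print(native_libs_in_wheel(demo))
# ['welvet/linux_amd64/welvet.so', 'welvet/windows_amd64/welvet.dll']
```

For a real build, pass the path of the .whl file produced by `python3 -m build` instead of the in-memory demo.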


Examples (runnable)

Scripts in examples/ mirror the snippets below. Run one file or verify all:

cd welvet/python
pip install -e .
python3 examples/01_dense_forward.py
python3 examples/run_all.py          # runs 01–05

| Script | What it shows |
| --- | --- |
| 01_dense_forward.py | Volumetric JSON → forward_polymorphic + forward |
| 02_morph_and_train.py | morph(INT8), CPU MC train() with shapes |
| 03_save_reload.py | serialize() / Network.deserialize() |
| 04_mha_forward.py | MHA with [batch, seq, d_model] |
| 05_dna_compare.py | dna() + compare_dna() |

Quick start

Layers are addressed by four coordinates: (z, y, x) selects a cell on the 3D grid and l selects the layer within that cell. See docs/overview.md.

1. Build and forward

from welvet import Network

net = Network({
    "id": "demo",
    "depth": 1, "rows": 1, "cols": 1, "layers_per_cell": 2,
    "layers": [
        {"z": 0, "y": 0, "x": 0, "l": 0, "type": "dense",
         "dtype": "float32", "input_height": 16, "output_height": 8, "activation": "relu"},
        {"z": 0, "y": 0, "x": 0, "l": 1, "type": "dense",
         "dtype": "float32", "input_height": 8, "output_height": 4, "activation": "linear"},
    ],
})

inp = [0.1] * 16
out = net.forward_polymorphic(inp, [1, 16])   # preferred: explicit shape
print(out)                                    # length 4

net.free()
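
Writing the layer dictionaries by hand gets repetitive for deeper stacks. The following is a minimal sketch of a helper that emits the same spec shape as the example above for a chain of dense layers in a single cell. `dense_stack_config` is a hypothetical name, not a welvet function, and the choice of relu for hidden layers and linear for the last one simply mirrors the example.

```python
def dense_stack_config(net_id, sizes, dtype="float32"):
    """Build a one-cell volumetric spec with len(sizes) - 1 dense layers."""
    if len(sizes) < 2:
        raise ValueError("need at least an input size and an output size")
    layers = []
    last = len(sizes) - 2
    for l, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        layers.append({
            "z": 0, "y": 0, "x": 0, "l": l, "type": "dense", "dtype": dtype,
            "input_height": n_in, "output_height": n_out,
            # Assumption: hidden layers use relu, the final layer is linear.
            "activation": "linear" if l == last else "relu",
        })
    return {"id": net_id, "depth": 1, "rows": 1, "cols": 1,
            "layers_per_cell": len(layers), "layers": layers}


spec = dense_stack_config("demo", [16, 8, 4])
print(spec["layers_per_cell"],
      [(d["input_height"], d["output_height"]) for d in spec["layers"]])
# 2 [(16, 8), (8, 4)]
```

The resulting dictionary has the same keys as the literal passed to Network above, so it should be usable in its place.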

2. Morph precision (21 types)

from welvet import DType, Network

# ... same net as above ...
net.morph(0, DType.INT8)          # layer index 0 only
out_q = net.forward_polymorphic(inp, [1, 16])

Use net.morph_all(DType.INT8) to morph every layer that has weights (skips Residual / Softmax).
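
To get a feel for what an INT8 morph costs in accuracy, here is a pure-Python sketch of symmetric per-tensor quantization, a common generic scheme. This README does not describe how Loom quantizes internally, so treat the scaling rule and the helper names below as assumptions for illustration only.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (integer codes, scale)."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    # Clamp to [-127, 127] so the code range stays symmetric around zero.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.82, -0.31, 0.05, -1.27]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, f"max round-trip error {worst:.2e}")
```

With this scheme the round-trip error per weight is bounded by half the scale, which is why comparing out and out_q on the same input is a reasonable sanity check after morphing.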

3. Training (CPU, shape-aware)

from welvet import Network, train

net = Network({...})  # single dense 16→8
inp, tgt = [0.1] * 16, [0.5] * 8
in_shape, out_shape = [1, 16], [1, 8]

net.set_training_mode(2)   # 1 = CPU SC, 2 = CPU MC (multicore on native C-ABI)
losses = train(
    net, [[inp]], [[tgt]],   # [[batch of input rows]], [[batch of target rows]]
    epochs=10, learning_rate=0.05, mode=2,
    input_shape=in_shape, target_shape=out_shape,
)
print(losses[-1])
net.free()
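
The nested lists passed to train() follow the layout noted in the comment above: a list of batches, each batch a list of rows. A minimal sketch for building that layout from flat row lists is shown below. `make_batches` is a hypothetical helper, and whether input_shape and target_shape must change when a batch holds more than one row is not stated here, so check examples/02_morph_and_train.py before relying on it.

```python
def make_batches(rows, batch_size, row_width):
    """Group rows into batches, checking each row against row_width."""
    for i, row in enumerate(rows):
        if len(row) != row_width:
            raise ValueError(
                f"row {i} has {len(row)} values, expected {row_width}")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


inputs = [[0.1] * 16 for _ in range(5)]
targets = [[0.5] * 8 for _ in range(5)]
in_batches = make_batches(inputs, batch_size=2, row_width=16)
tgt_batches = make_batches(targets, batch_size=2, row_width=8)
print([len(b) for b in in_batches])   # [2, 2, 1]
```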

4. Save / reload

wire = net.serialize()
copy = Network.deserialize(wire)
# ... forward on copy, then copy.free()

Full scripts: examples/03_save_reload.py.


Core Concepts

Supported Layer Types

Type Description
dense Fully connected / linear layer
mha Multi-Head Attention (with RoPE, GQA/MQA, Causal Masking)
swiglu SwiGLU gated MLP (LLaMA-style)
rmsnorm Root Mean Square Normalization
layernorm Layer Normalization
cnn1 / cnn2 / cnn3 1D / 2D / 3D Convolution
convtransposed1d / 2d / 3d Transposed Convolution
rnn / lstm Recurrent layers
embedding Token embedding lookup
kmeans Differentiable K-Means clustering
softmax 10 softmax variants (Standard, Gumbel, Masked, Entmax, ...)
parallel MoE / ensemble branching
sequential Nested sequential sub-graph
residual Residual / skip connection

21 Numerical Types

float64, float32, float16, bfloat16, int64/32/16/8, uint64/32/16/8, fp8_e4m3, fp8_e5m2, int4, uint4, fp4_e2m1, int2, uint2, ternary, binary

Morph a layer's precision at runtime (zero realloc when cached):

from welvet import DType, morph_layer

morph_layer(net.handle, layer_index=0, target_dtype=DType.INT8)
# or: net.morph(0, DType.INT8)
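
As a rough guide to what each type saves, the sketch below estimates the ideal packed size of a dense layer's weight matrix per dtype. The bit widths follow from the type names, except ternary, which is assumed here to occupy 2 bits; Loom's actual in-memory layout may differ, and `dense_weight_bytes` is a hypothetical helper.

```python
# Nominal bits per weight for the 21 types listed above.
BITS = {
    "float64": 64, "float32": 32, "float16": 16, "bfloat16": 16,
    "int64": 64, "int32": 32, "int16": 16, "int8": 8,
    "uint64": 64, "uint32": 32, "uint16": 16, "uint8": 8,
    "fp8_e4m3": 8, "fp8_e5m2": 8,
    "int4": 4, "uint4": 4, "fp4_e2m1": 4,
    "int2": 2, "uint2": 2,
    "ternary": 2,   # assumption: stored as 2 bits per weight
    "binary": 1,
}


def dense_weight_bytes(n_in, n_out, dtype):
    """Ideal packed weight size in bytes (biases and padding ignored)."""
    total_bits = n_in * n_out * BITS[dtype]
    return (total_bits + 7) // 8   # round up to whole bytes


for name in ("float32", "int8", "int4", "binary"):
    print(name, dense_weight_bytes(16, 8, name))
# float32 512
# int8 128
# int4 64
# binary 16
```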

WebGPU Acceleration

import welvet
from welvet import Network

net = Network({...})

# Upload weights to GPU
welvet.init_wgpu(net._handle)
welvet.sync_to_gpu(net._handle)

# GPU forward pass
output = welvet.forward_wgpu(net._handle, inputs)

net.free()

Numerical tiling (SC vs MC)

v0.79+ uses specialized tiling profiles to maximize throughput:

  • SC (Single-Core): Optimized for Edge/WASM/Small NPUs.
  • MC (Multi-Core): Optimized for high-bandwidth L1/L2 caches (Ryzen, RTX, M4).
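
For readers unfamiliar with tiling, the idea is to process a matrix in blocks small enough that the active slice of the input stays in fast memory while it is reused. The pure-Python sketch below shows the loop structure only; it is not Loom's kernel, and the tile width of 4 is an arbitrary illustrative value.

```python
def matvec_tiled(matrix, vec, tile=4):
    """Compute matrix @ vec by accumulating over column tiles."""
    out = [0.0] * len(matrix)
    for start in range(0, len(vec), tile):
        chunk = vec[start:start + tile]   # reused by every row below
        for r, row in enumerate(matrix):
            out[r] += sum(w * x for w, x in zip(row[start:start + tile], chunk))
    return out


matrix = [[float(r + c) for c in range(10)] for r in range(3)]
vec = [1.0] * 10
print(matvec_tiled(matrix, vec, tile=4))   # [45.0, 55.0, 65.0]
```

The result does not depend on the tile width; only the memory access pattern changes, which is what a tiling profile tunes for a given cache hierarchy.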

GPU backward training is live for Dense, RMSNorm, CNN 1D/2D/3D — 17x–65x speedup over CPU on real workloads.


DNA & Network Comparison

Extract a topological fingerprint and compare networks:

from welvet import Network, compare_dna

dna_a = net_a.dna()
dna_b = net_b.dna()
result = compare_dna(dna_a, dna_b)
print(f"Overlap: {result['OverallOverlap']:.4f}")
print(f"Logic shifts: {len(result.get('LogicShifts', []))}")
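
The README does not spell out how the overlap score is computed. As one plausible mental model only, the sketch below averages the cosine similarity of per-layer signature vectors keyed by (z, y, x, l). The signature format and both function names are hypothetical and should not be read as Loom's actual algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap(signatures_a, signatures_b):
    """Mean cosine similarity over layer positions present in both networks."""
    shared = signatures_a.keys() & signatures_b.keys()
    if not shared:
        return 0.0
    return sum(cosine(signatures_a[k], signatures_b[k]) for k in shared) / len(shared)


a = {(0, 0, 0, 0): [1.0, 0.0], (0, 0, 0, 1): [0.5, 0.5]}
b = {(0, 0, 0, 0): [1.0, 0.0], (0, 0, 0, 1): [0.5, -0.5]}
print(f"{overlap(a, b):.4f}")   # 0.5000
```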

NEAT Evolution

Genetically evolve a population of networks:

from welvet import (
    default_neat_config, neat_mutate,
    new_neat_population, neat_population_size,
    neat_population_get_network, neat_population_evolve,
    neat_population_best, neat_population_best_fitness,
    neat_population_summary, free_neat_population,
    build_network, free_network, Network,
)

# Create a seed network
seed = Network({
    "id": "seed", "depth": 1, "rows": 1, "cols": 1, "layers_per_cell": 2,
    "layers": [
        {"z": 0, "y": 0, "x": 0, "l": 0, "type": "dense", 
         "input_height": 32, "output_height": 32},
        {"z": 0, "y": 0, "x": 0, "l": 1, "type": "dense", 
         "input_height": 32, "output_height": 1},
    ]
})

cfg = default_neat_config(32)
pop = seed.create_population(size=16, config=cfg)

for gen in range(5):
    fitnesses = [0.5 + 0.1 * i for i in range(pop.size())]
    pop.evolve(fitnesses)
    print(pop.summary(gen))

best = pop.best()
print(f"Best fitness: {pop.best_fitness():.6f}")
best.free()
pop.free()
seed.free()
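
The loop above feeds placeholder fitness values. In practice each fitness would come from evaluating a network on a task. A minimal sketch of one common mapping from loss to fitness follows; the helper names and the candidate outputs are hypothetical stand-ins for real forward passes.

```python
def mse(outputs, targets):
    """Mean squared error between two equal-length lists."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def fitness_from_loss(loss):
    """Map a non-negative loss to (0, 1]; lower loss gives higher fitness."""
    return 1.0 / (1.0 + loss)


# Hypothetical outputs of three population members for the same input.
candidates = [[0.5], [0.9], [0.1]]
target = [0.5]
fitnesses = [fitness_from_loss(mse(out, target)) for out in candidates]
print([round(f, 4) for f in fitnesses])
```

A list built this way has one entry per population member, which matches the shape that pop.evolve(fitnesses) receives in the loop above.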

DNA Splice / Genetic Crossover

Combine two parent networks into a child:

from welvet import default_splice_config, splice_dna, splice_dna_with_report

cfg = default_splice_config()
cfg["CrossoverMode"] = "blend"   # "blend" | "point" | "uniform"
cfg["FitnessA"] = 0.8
cfg["FitnessB"] = 0.5

child_handle = splice_dna(parent_a._handle, parent_b._handle, cfg)

# Or get a full diagnostic report
report = splice_dna_with_report(parent_a._handle, parent_b._handle, cfg)
print(f"Layers blended: {report['blended_count']}")
child_handle = report["child_handle"]
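
The three crossover mode names match standard genetic-algorithm operators. The sketch below shows their textbook definitions on flat weight lists, with the blend weighted by parent fitness. How Loom applies them per layer is not described here, so this is an illustration of the general technique under that assumption, and `crossover` is a hypothetical function.

```python
import random


def crossover(a, b, mode, fitness_a=1.0, fitness_b=1.0, rng=None):
    """Combine two equal-length weight lists (at least 2 genes) into a child."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    rng = rng or random.Random()
    if mode == "blend":
        # Fitness-weighted average: the fitter parent contributes more.
        wa = fitness_a / (fitness_a + fitness_b)
        return [wa * x + (1.0 - wa) * y for x, y in zip(a, b)]
    if mode == "point":
        # Single cut: genes before it come from a, the rest from b.
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:]
    if mode == "uniform":
        # Each gene is taken from either parent with equal probability.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
    raise ValueError(f"unknown mode: {mode!r}")


parent_a = [1.0, 1.0, 1.0, 1.0]
parent_b = [0.0, 0.0, 0.0, 0.0]
rng = random.Random(0)
print(crossover(parent_a, parent_b, "blend", fitness_a=0.8, fitness_b=0.5))
print(crossover(parent_a, parent_b, "point", rng=rng))
print(crossover(parent_a, parent_b, "uniform", rng=rng))
```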

Step mesh (online learning)

The volumetric 3D grid supports clock-cycle accurate propagation with spatial feedback loops:

import welvet

state = welvet.create_step_state(net._handle)
welvet.set_input(state, inputs)
welvet.mesh_step(net._handle, state)
output = welvet.get_output(state, layer_idx=-1)
welvet.free_step_state(state)
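
Clock-cycle propagation means every layer updates once per tick from the values its upstream neighbor held on the previous tick, so an input needs several ticks to reach the output. The toy simulation below shows that timing for a three-stage chain without feedback loops. It is a conceptual sketch with hypothetical names and does not call the welvet API.

```python
def advance_one_tick(stages, state, external_input):
    """Advance one tick: each stage reads the PREVIOUS tick's upstream value."""
    upstream = [external_input] + state[:-1]
    return [fn(v) for fn, v in zip(stages, upstream)]


stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
state = [0, 0, 0]
for tick in range(3):
    state = advance_one_tick(stages, state, external_input=5)
    print(tick, state)
# 0 [6, 0, -3]
# 1 [6, 12, -3]
# 2 [6, 12, 9]
```

Only after the third tick does the last stage hold (5 + 1) * 2 - 3 = 9, the value a plain sequential forward pass would produce immediately.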

Training

  • train(net, batches, …) — poly LoomTrain, shape-aware (used in seven-layer suite). See examples/02_morph_and_train.py.
  • train_network(net, inputs, targets, …) — step-mesh clock-cycle path.

GPU backward dispatch and benchmarks: benchmark_training.py, benchmark_seven_layer.py.


Tween (neural target propagation)

An alternative to backpropagation using localized Hebbian gap-based learning. We call this tween in APIs; papers often say target propagation.

from welvet import (
    create_tween_state,
    get_default_tween_config,
    tween_forward,
    tween_backward,
)

_ = get_default_tween_config()
handle = create_tween_state(net.handle)
tween_forward(net.handle, handle, inputs)
tween_backward(net.handle, handle, targets)
net.free()
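
To illustrate what a localized, gap-based update can look like, the sketch below trains a single linear layer with a delta-rule style step: the gap between the layer's target and its output scales a Hebbian-like product with the input. This is a generic textbook rule chosen for illustration; the README does not specify the tween update, and all names here are hypothetical.

```python
def forward(weights, x):
    """Linear layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def local_gap_update(weights, x, target, lr):
    """One local step; returns (updated weights, gap before the update)."""
    y = forward(weights, x)
    gap = [t - yi for t, yi in zip(target, y)]
    updated = [[w + lr * g * xi for w, xi in zip(row, x)]
               for row, g in zip(weights, gap)]
    return updated, gap


weights = [[0.0, 0.0], [0.0, 0.0]]
x, target = [1.0, 0.5], [1.0, -1.0]
for step in range(3):
    weights, gap = local_gap_update(weights, x, target, lr=0.1)
    print(step, [round(g, 4) for g in gap])
```

Each step uses only the layer's own input, output, and target, with no gradient passed in from other layers, which is the property that distinguishes this family of rules from backpropagation.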

LLM Inference

Load a SafeTensors model and a tokenizer, then run a forward pass on the encoded prompt:

from welvet import Network, Tokenizer, sequential_forward

net = Network.from_file("path/to/model.safetensors")
tok = Tokenizer("path/to/tokenizer.json")

ids = tok.encode("Hello, world!")
output = net.forward([float(i) for i in ids])
tok.free()
net.free()
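
The snippet above performs a single forward pass. Token generation wraps such a pass in a loop that appends the chosen token and feeds the sequence back in. The sketch below shows greedy decoding against a stand-in model function; `greedy_generate` and `toy_model` are hypothetical, and how to obtain per-token logits from a welvet network is not covered here.

```python
def greedy_generate(next_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the highest-scoring next token id."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_model(ids, vocab_size=5):
    """Stand-in for a real model: always favors (last id + 1) mod vocab."""
    logits = [0.0] * vocab_size
    logits[(ids[-1] + 1) % vocab_size] = 1.0
    return logits


print(greedy_generate(toy_model, [0], max_new_tokens=4))   # [0, 1, 2, 3, 4]
```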

Seven-layer validation (Python → CABI)

Same bedrock gate as Lucy and @openfluke/welvet (WASM). Logic in seven_layer_spec.py; engine work stays in the .so.

cd welvet/python
pip install -e .
python3 benchmark_seven_layer.py --layer Dense
python3 benchmark_seven_layer.py --layer Embedding
python3 benchmark_seven_layer.py --layer Residual
# full suite (slow): python3 benchmark_seven_layer.py

Platform Support

| Platform | Architecture | Binary |
| --- | --- | --- |
| Windows | x86-64 | welvet.dll |
| Windows | ARM64 | welvet.dll |
| Linux | x86-64 | welvet.so |
| Linux | ARM64 | welvet.so |
| Linux | ARM (v7) | welvet.so |
| macOS | ARM64 (M-series) | welvet.dylib |
| macOS | x86-64 | welvet.dylib |
| macOS | Universal | welvet.dylib |
| Android | ARM64 | welvet.so |
| Android | x86-64 | welvet.so |
| iOS | ARM64 (device) | welvet.dylib |
| iOS | Simulator (x86-64) | welvet.dylib |
| iOS | Simulator (ARM64) | welvet.dylib |
| iOS | XCFramework (all slices) | .xcframework |

At runtime, import welvet resolves welvet/<platform>_<arch>/welvet.{so,dylib,dll} for the current machine (see src/welvet/utils.py).
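
The lookup can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in src/welvet/utils.py: the folder names linux_amd64 and windows_amd64 come from this README, while the macOS folder prefix and the architecture aliases are guesses, and `native_library_path` is a hypothetical name.

```python
import platform
from pathlib import Path

# (folder prefix, file extension) per OS; the "darwin" prefix is an assumption.
_OS = {"linux": ("linux", "so"), "windows": ("windows", "dll"),
       "darwin": ("darwin", "dylib")}
# Common spellings reported by platform.machine(), mapped to folder suffixes.
_ARCH = {"x86_64": "amd64", "amd64": "amd64",
         "aarch64": "arm64", "arm64": "arm64"}


def native_library_path(package_dir, system=None, machine=None):
    """Return the expected location of the native library for one machine."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if system not in _OS or machine not in _ARCH:
        raise RuntimeError(f"unsupported platform: {system}/{machine}")
    prefix, ext = _OS[system]
    return Path(package_dir) / f"{prefix}_{_ARCH[machine]}" / f"welvet.{ext}"


print(native_library_path("welvet", "Linux", "x86_64").as_posix())
# welvet/linux_amd64/welvet.so
print(native_library_path("welvet", "Windows", "AMD64").as_posix())
# welvet/windows_amd64/welvet.dll
```

Checking that the returned path exists is a quick way to diagnose a "native library not found" error after a source build.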


Version alignment

| Component | Version |
| --- | --- |
| Loom engine (C-ABI / poly) | 0.79.0 — Bedrock Validation |
| PyPI welvet | 0.79.0 |
| npm @openfluke/welvet | 0.79.0 |


License

Apache 2.0 — see LICENSE.

Loom: Universal precision. Volumetric freedom. Bedrock performance.
