Universal Deep Learning Inference Engine — execute any AI model without model-specific code

NeuroBrix

Universal Deep Learning Inference Engine
One engine. Any model. Any modality. Zero model-specific code.

Hub  |  Docs  |  PyPI  |  GitLab  |  Roadmap  |  Contributing


The Problem

The AI inference landscape is fragmented. Every model family requires its own stack, its own pipeline code, its own deployment tooling. Want to run a diffusion model? Learn ComfyUI or write custom diffusers pipelines. Need an LLM? Pick between Ollama, vLLM, llama.cpp — each with its own limitations. Audio? Video? Start from scratch.

NeuroBrix eliminates this fragmentation entirely.

One engine. One CLI. One container format. Import a model, run it. The runtime doesn't know or care whether it's executing a diffusion transformer, a mixture-of-experts LLM, a speech recognizer, or a video generator. It sees tensors, graphs, and execution plans — nothing else.


Why NeuroBrix?

| Capability | Ollama | llama.cpp | vLLM | ComfyUI | NeuroBrix |
| --- | --- | --- | --- | --- | --- |
| LLMs | Yes | Yes | Yes | -- | Yes |
| Image generation | -- | -- | -- | Yes | Yes |
| Video generation | -- | -- | -- | -- | Yes |
| Audio (STT + TTS) | -- | -- | -- | -- | Yes |
| Multimodal (understand + generate) | -- | -- | -- | -- | Yes |
| Mixture-of-Experts | -- | -- | Yes | -- | Yes |
| Multi-GPU auto-allocation | -- | -- | Yes | -- | Yes |
| Cross-platform (Linux, Windows, macOS) | Yes | Yes | -- | -- | Yes |
| Universal model format | -- | GGUF (LLM only) | -- | -- | NBX (any model) |
| No model-specific code | -- | -- | -- | -- | Yes |

Other tools solve one piece of the puzzle. NeuroBrix solves the whole puzzle.


Installation

Step 1: Install PyTorch with CUDA

# For CUDA 12.4 (RTX 30xx, 40xx, A100, H100)
pip install torch --index-url https://download.pytorch.org/whl/cu124

# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8 (older GPUs like V100)
pip install torch --index-url https://download.pytorch.org/whl/cu118

Verify CUDA is available:

python -c "import torch; print(torch.cuda.is_available())"  # Should print: True
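For a slightly fuller check than the one-liner above, the sketch below reports the PyTorch version, the CUDA build, and the GPUs PyTorch can see. It relies only on standard `torch.cuda` calls; the function name `describe_torch` is illustrative and not part of NeuroBrix. If PyTorch is not installed, it returns the report with its default (empty) values.

```python
def describe_torch() -> dict:
    """Collect basic PyTorch/CUDA facts; defaults are kept if torch is missing."""
    report = {"torch": None, "cuda_build": None, "cuda_available": False, "gpus": []}
    try:
        import torch
    except ImportError:
        return report  # PyTorch not installed yet (see Step 1)

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_build` is set but `cuda_available` is False, the installed wheel was built for CUDA but no usable GPU or driver was found.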

Step 2: Install NeuroBrix

pip install neurobrix

Platform Support

| Platform | GPU Support | Notes |
| --- | --- | --- |
| Linux | CUDA, Triton kernels | Full support, recommended for production |
| Windows | CUDA | Fully supported. Triton not available on Windows |
| macOS | CPU only | MPS/Metal support planned |

Requirements: Python 3.10+ / PyTorch 2.1+ with CUDA / NVIDIA GPU


Quick Start

# Import a model from the hub
neurobrix import Vendor/Model_Name --no-keep

# Generate an image (hardware auto-detected)
neurobrix run --model Model_Name \
    --prompt "A sunset over mountains" --steps 20

# Or serve for instant repeat inference
neurobrix serve --model Model_Name
neurobrix run --prompt "A robot painting on canvas" --output robot.png
neurobrix stop

Serve Mode (Hot Run Mode, Recommended)

Loads weights into VRAM once and keeps the model warm. Every subsequent request runs with zero startup overhead.

neurobrix serve --model Model_Name

# Image generation (instant — model already loaded)
neurobrix run --prompt "A cat in a hat" --output cat.png

# LLM interactive chat
neurobrix chat --temperature 0.7

# Stop and free VRAM
neurobrix stop
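The serve, run, stop sequence above can also be scripted. The following is a minimal sketch that uses only the CLI commands and flags shown in this section. The helper names are hypothetical, and it assumes `neurobrix serve` returns once the daemon is running, as the sequence above suggests. By default it only prints the commands.

```python
import shlex
import subprocess


def serve_session_commands(model: str, prompts: dict[str, str]) -> list[list[str]]:
    """Return the CLI calls for one warm session: serve, N runs, stop.

    `prompts` maps an output file name to its prompt text.
    """
    commands = [["neurobrix", "serve", "--model", model]]
    for output, prompt in prompts.items():
        commands.append(["neurobrix", "run", "--prompt", prompt, "--output", output])
    commands.append(["neurobrix", "stop"])
    return commands


def run_session(model: str, prompts: dict[str, str], dry_run: bool = True) -> None:
    """Print each command; also execute it when dry_run is False."""

    def execute(cmd: list[str]) -> None:
        print(shlex.join(cmd))
        if not dry_run:
            # Assumption: each CLI call returns when its work is done.
            subprocess.run(cmd, check=True)

    *work, stop = serve_session_commands(model, prompts)
    try:
        for cmd in work:
            execute(cmd)
    finally:
        execute(stop)  # always try to free VRAM, even if a run failed
```

Calling `run_session("Model_Name", {"cat.png": "A cat in a hat"})` prints the three commands without running them; pass `dry_run=False` to execute.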

Usage by Model Family

Each model family uses different CLI flags and defaults. Hardware is always auto-detected.

Image Generation

neurobrix run --model Sana_1600M_4Kpx_BF16 \
    --prompt "A sunset over mountains" \
    --steps 20 --cfg 5.0 --seed 42 \
    --height 1024 --width 1024 \
    --output sunset.png

| Flag | Description | Default |
| --- | --- | --- |
| --prompt | Text description of the image to generate | Required |
| --steps | Number of diffusion steps (more = higher quality, slower) | Model-dependent (20-50) |
| --cfg | Classifier-free guidance scale (higher = closer to prompt) | Model-dependent (4.5-7.5) |
| --height / --width | Output resolution in pixels | Model-dependent (1024-4096) |
| --seed | Random seed for reproducible results | Random |
| --output | Output file path | output.png |
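Because `--seed` makes a run reproducible, a common workflow is to hold the prompt and settings fixed and vary only the seed. The sketch below builds one `neurobrix run` command per seed using the flags documented above. The function name and the `seed_<n>.png` naming scheme are illustrative assumptions.

```python
def seed_sweep_commands(model: str, prompt: str, seeds: list[int],
                        steps: int = 20, cfg: float = 5.0) -> list[list[str]]:
    """One `neurobrix run` call per seed; everything else is held fixed."""
    commands = []
    for seed in seeds:
        commands.append([
            "neurobrix", "run", "--model", model,
            "--prompt", prompt,
            "--steps", str(steps), "--cfg", str(cfg),
            "--seed", str(seed),
            "--output", f"seed_{seed}.png",  # hypothetical naming scheme
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Rerunning a command with the same seed should give the same image.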

Large Language Models

# Single-shot
neurobrix run --model deepseek-moe-16b-chat \
    --prompt "Explain quantum computing in simple terms" \
    --temperature 0.7 --max-tokens 512 \
    --output response.txt

# Interactive chat (requires serve mode)
neurobrix serve --model deepseek-moe-16b-chat
neurobrix chat --temperature 0.7
neurobrix stop

| Flag | Description | Default |
| --- | --- | --- |
| --prompt | Input text or question | Required |
| --temperature | Sampling randomness (0 = deterministic, 1 = creative) | Model-dependent (0.6-1.0) |
| --max-tokens | Maximum tokens to generate | Model-dependent (512-32768) |
| --repetition-penalty | Penalize repeated tokens (1.0 = off) | 1.0 |
| --output | Save response to file | stdout |

Audio — Speech-to-Text (STT)

neurobrix run --model whisper-large --audio recording.wav

| Flag | Description | Default |
| --- | --- | --- |
| --audio | Path to audio file (WAV, FLAC, MP3) | Required |
| --output | Save transcription to file | stdout |

Note: STT models use --audio, not --prompt. Temperature defaults to 0.0 (greedy decoding) for accurate transcription.
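To transcribe a whole folder, the single command above can be generated once per audio file. This is a minimal sketch using only the `--model`, `--audio`, and `--output` flags documented here. The function name and the choice to write each transcript next to its audio file are assumptions.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".flac", ".mp3"}  # formats listed for --audio


def transcription_commands(audio_dir: Path, model: str = "whisper-large") -> list[list[str]]:
    """Build one STT command per audio file found directly in audio_dir."""
    commands = []
    for audio in sorted(audio_dir.iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip transcripts and other non-audio files
        transcript = audio.with_suffix(".txt")
        commands.append([
            "neurobrix", "run", "--model", model,
            "--audio", str(audio),
            "--output", str(transcript),
        ])
    return commands
```

For many files, starting `neurobrix serve` first avoids reloading the weights for each recording.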

Audio — Text-to-Speech (TTS)

neurobrix run --model Kokoro-82M \
    --prompt "Hello, welcome to NeuroBrix!" \
    --output speech.wav

| Flag | Description | Default |
| --- | --- | --- |
| --prompt | Text to synthesize into speech | Required |
| --temperature | Sampling variation (lower = more consistent) | Model-dependent (0.6) |
| --output | Output audio file path | output.wav |

Audio — Speech Understanding (audio_llm)

Audio-conditioned LLMs take both an audio file and a text instruction. They answer questions about the audio or transcribe it on request, rather than transcribing blindly.

neurobrix run --model Voxtral-Mini-3B-2507 \
    --audio meeting.wav \
    --prompt "Transcribe this audio." \
    --output transcript.txt

| Flag | Description | Default |
| --- | --- | --- |
| --audio | Path to audio file | Required |
| --prompt | Instruction (e.g. "Transcribe this audio.", "Summarize the call.") | Required |
| --output | Save the text answer to file | stdout |

STT vs audio_llm: plain STT models (Whisper, Parakeet) need only --audio. audio_llm models (Voxtral, Canary-Qwen, Granite-Speech) need --audio and --prompt.

Image Upscaling (Super-Resolution)

Upscalers take an input image and emit a higher-resolution one (the scale factor is per-model). Use the dedicated upscale command, which also exposes --mode directly:

neurobrix upscale --model hat-l-x4 \
    --input photo.png --output photo_4x.png \
    --mode compiled

| Flag | Description | Default |
| --- | --- | --- |
| --model | Upscaler model name (e.g. hat-l-x4, real-esrgan-x4, swinir-classical-x4) | Required |
| --input | Input image path (PNG/JPEG) | Required |
| --output | Output image path (PNG) | Required |
| --mode | Execution mode: compiled / sequential / triton / triton-sequential | compiled |

Via the universal run command, the same models take --input-image instead of --input.

Video Generation

neurobrix run --model SANA-Video_2B_720p \
    --prompt "A cat playing piano" \
    --steps 30 --cfg 5.0 --seed 42 \
    --output video.mp4

| Flag | Description | Default |
| --- | --- | --- |
| --prompt | Text description of the video to generate | Required |
| --steps | Number of diffusion steps | Model-dependent (20-50) |
| --cfg | Guidance scale | Model-dependent (5.0) |
| --seed | Random seed | Random |
| --output | Output file path | output.mp4 |

Execution Modes — Two Branches, Four Modes

Every model in NeuroBrix can run through two fully independent compute branches, each in a sequential (op-by-op) and a compiled (fused hot-loop) variant — four modes in total. The branches share the same .nbx container and the same Prism placement plan, but they do not share compute code. They are deliberately kept as parallel paths.

| Flag | Branch | Variant | Compute substrate |
| --- | --- | --- | --- |
| --compiled (default) | PyTorch | compiled hot-loop | torch + cuDNN / cuBLAS / cuFFT |
| --sequential | PyTorch | op-by-op | torch ATen, one op at a time |
| --triton | Triton | compiled hot-loop | NeuroBrix @triton.jit kernels + NBXTensor |
| --triton-sequential | Triton | op-by-op | NeuroBrix @triton.jit kernels, one op at a time |

If you pass no mode flag, NeuroBrix runs --compiled.

The PyTorch branch is the pragmatic bridge to the mature PyTorch + NVIDIA-library ecosystem. --sequential is the oracle: it dispatches each ATen op to native PyTorch one at a time, so it is the reference that proves a model's graph and trace are sound. --compiled fuses that same graph into a zero-overhead execution sequence (pre-resolved tensor slots, direct SDPA, integer-indexed memory arena) — this is the production path.

The Triton branch is the NeuroBrix value-add: 100% NeuroBrix Triton kernels through NBXTensor, with no torch.*, no cuDNN/cuBLAS on the compute path — hardware-universal and vendor-agnostic. --triton-sequential is the kernel oracle (op-by-op, diffed against the PyTorch oracle to validate each kernel in isolation); --triton is its fused, zero-overhead form.

Which to use:

  • Just run a model → the default (--compiled). Fastest PyTorch path.
  • Vendor-agnostic / no NVIDIA-library lock-in → --triton.
  • Debugging a numerical discrepancy → compare --sequential (PyTorch oracle) against --triton-sequential (kernel oracle) op-by-op.

# Same model, four ways:
neurobrix run --model Sana_1600M_1024px_MultiLing --prompt "a red fox" --output fox.png                      # compiled (default)
neurobrix run --model Sana_1600M_1024px_MultiLing --prompt "a red fox" --sequential        --output fox.png  # PyTorch oracle
neurobrix run --model Sana_1600M_1024px_MultiLing --prompt "a red fox" --triton            --output fox.png  # Triton compiled
neurobrix run --model Sana_1600M_1024px_MultiLing --prompt "a red fox" --triton-sequential --output fox.png  # Triton oracle

All four modes are validated to produce the same output (modulo numerics) for the models in the matrix below. The four mode flags are also available on neurobrix serve and neurobrix upscale.
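"Same output modulo numerics" means the modes agree within floating-point tolerance, not bit for bit. As a rough illustration of such a check, the sketch below compares two flat lists of values, for example decoded pixels or logits from two modes. The function names and tolerance values are assumptions, not NeuroBrix's actual validation thresholds.

```python
import math


def max_abs_diff(a: list[float], b: list[float]) -> float:
    """Largest element-wise absolute difference between two outputs."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def same_modulo_numerics(a: list[float], b: list[float],
                         rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> bool:
    """True when every element matches within the given tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
```

When the check fails, `max_abs_diff` gives a first indication of how far apart the two modes are before drilling down op by op.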


NeuroBrix Hub & Model Management

Models are hosted on the NeuroBrix Hub and managed locally through a two-tier storage system:

  • Store (~/.neurobrix/store/) — downloaded .nbx archives (compressed)
  • Cache (~/.neurobrix/cache/) — extracted models ready for inference
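To see how much disk space each tier uses, the two directories named above can be measured with the standard library. This is a small sketch; the helper names are hypothetical, and a missing directory is reported as zero bytes.

```python
from pathlib import Path

NEUROBRIX_HOME = Path.home() / ".neurobrix"


def dir_size(path: Path) -> int:
    """Total size in bytes of all files under path (0 if it does not exist)."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def storage_report(home: Path = NEUROBRIX_HOME) -> dict[str, int]:
    """Bytes used by the store (archives) and the cache (extracted models)."""
    return {tier: dir_size(home / tier) for tier in ("store", "cache")}


if __name__ == "__main__":
    for tier, size in storage_report().items():
        print(f"{tier}: {size / 1e9:.2f} GB")
```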

Browse & Import

# Browse the full hub catalog
neurobrix hub

# Filter by family
neurobrix hub --category IMAGE
neurobrix hub --category LLM
neurobrix hub --category AUDIO
neurobrix hub --category VIDEO

# Search by name
neurobrix hub --search sana

# Import a model (downloads .nbx → extracts to cache)
neurobrix import vendor/model_name

# Import and delete the .nbx archive to save disk space
neurobrix import pixart/sigma-xl-1024 --no-keep

# Force re-import (overwrites existing)
neurobrix import Vendor/Model_Name --force

List & Manage

# List installed models in cache (ready to run)
neurobrix list

# List downloaded .nbx archives in store
neurobrix list --store

# Show system info: installed models, hardware, disk usage
neurobrix info --models

# Remove a model from cache
neurobrix remove Model_Name

# Remove from both store and cache
neurobrix remove Model_Name --all

# Clean everything — free all disk space
neurobrix clean --all -y

How It Works

neurobrix import Vendor/Model_Name --no-keep
  │
  ├─ 1. Download .nbx from neurobrix.es → ~/.neurobrix/store/
  ├─ 2. Extract to ~/.neurobrix/cache/Model_Name/
  ├─ 3. Validate manifest, components, weights
  └─ 4. Delete .nbx from store (--no-keep)

neurobrix run --model Model_Name --prompt "..."
  │
  └─ Reads directly from cache — zero extraction overhead

Supported Models

NeuroBrix is a runtime engine — it executes models but does not train or create them. All models listed below are the work of their respective authors and are subject to their original licenses. Users must review and accept each model's license before use.

Image Generation

| Model | Author | License | Size |
| --- | --- | --- | --- |
| Sana 1600M 4K | NVIDIA / MIT | Apache 2.0 | 12 GB |
| PixArt-Sigma-XL-2-1024-MS | PixArt | OpenRAIL++ | 20 GB |
| PixArt-XL-2-1024-MS | PixArt | OpenRAIL++ | 20 GB |
| Flex.1-alpha | Ostris | Apache 2.0 | 24 GB |
| Janus-Pro-7B | DeepSeek | MIT | 14 GB |

Video Generation

| Model | Author | License | Size |
| --- | --- | --- | --- |
| SANA-Video 2B 720p | NVIDIA / MIT | Apache 2.0 | 17 GB |

Audio (Speech-to-Text + Text-to-Speech)

| Model | Author | License | Size | Type |
| --- | --- | --- | --- | --- |
| Whisper Large | OpenAI | MIT | 6 GB | STT |
| Whisper Large V3 Turbo | OpenAI | MIT | 3 GB | STT |
| Parakeet TDT 1.1B | NVIDIA | CC-BY-4.0 | 4 GB | STT |
| Voxtral Mini 3B | Mistral AI | Apache 2.0 | 7 GB | audio_llm |
| Canary-Qwen 2.5B | NVIDIA | CC-BY-4.0 | 10 GB | audio_llm |
| Granite-Speech 3.3 8B | IBM | Apache 2.0 | 16 GB | audio_llm |
| Orpheus 3B (SNAC) | Canopy Labs | Apache 2.0 | 7 GB | TTS |
| Kokoro 82M | Hexgrad | Apache 2.0 | 0.3 GB | TTS |
| VibeVoice 1.5B | Microsoft | MIT | 6 GB | TTS |
| OpenAudio S1 Mini | Fish Audio | CC-BY-NC-SA-4.0 | 2 GB | TTS |
| Chatterbox | Resemble AI | MIT | 1 GB | TTS |

Image Upscalers (Super-Resolution)

| Model | Author | License | Scale |
| --- | --- | --- | --- |
| HAT-L x4 | XPixelGroup | Apache 2.0 | 4x |
| Real-ESRGAN x4 | Xintao Wang et al. | BSD-3-Clause | 4x |
| SwinIR Classical x4 | Jingyun Liang et al. | Apache 2.0 | 4x |
| Swin2SR Classical x4 | Marcos V. Conde et al. | Apache 2.0 | 4x |

Large Language Models

| Model | Author | License | Size |
| --- | --- | --- | --- |
| DeepSeek-MoE-16B | DeepSeek | MIT | 31 GB |
| Qwen3-30B-A3B-Thinking | Alibaba / Qwen | Apache 2.0 | 57 GB |
| TinyLlama 1.1B | TinyLlama | Apache 2.0 | 4 GB |

Non-commercial: OpenAudio S1 Mini uses CC-BY-NC-SA-4.0 — non-commercial use only. Check each model's license on the NeuroBrix Hub before commercial deployment.

Browse the full catalog and license details: neurobrix.es/models


The NBX Format

NeuroBrix introduces .nbx — a universal container format for AI models. Where GGUF is limited to LLMs and ONNX struggles with dynamic architectures, NBX captures any computation graph with full fidelity.

model.nbx (self-contained archive)
  ├── manifest.json              Model metadata and component list
  ├── topology.json              Execution flow and component connections
  ├── runtime/
  │   ├── defaults.json          Generation parameters, model config
  │   └── variables.json         Runtime tensor allocation rules
  ├── components/
  │   ├── text_encoder/          Text conditioning (CLIP, T5, etc.)
  │   │   ├── graph.json         Computation graph (TensorDAG)
  │   │   ├── profile.json       Component config
  │   │   └── weights/           Safetensors shards
  │   ├── transformer/           Core model (DiT, UNet, decoder, etc.)
  │   │   ├── graph.json
  │   │   ├── profile.json
  │   │   └── weights/
  │   ├── vae/                   Image/video decoder (diffusion models)
  │   │   ├── graph.json
  │   │   ├── profile.json
  │   │   └── weights/
  │   └── ...                    Any number of components per model
  └── modules/
      └── tokenizer/             Tokenizer files

The component structure adapts to each model: diffusion models have text_encoder + transformer + vae, LLMs have model + lm_head, audio models have encoder + decoder, etc.
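Because the component set varies per model, it can be useful to list what a given model actually contains. The sketch below walks an extracted model directory and reports, for each component, whether `graph.json` and `profile.json` are present and how many weight files it holds. It assumes the cache directory mirrors the archive layout shown above; the function name is hypothetical and it does not parse any NBX file contents.

```python
from pathlib import Path


def list_components(model_dir: Path) -> dict[str, dict]:
    """Summarize each sub-directory of <model_dir>/components."""
    summary: dict[str, dict] = {}
    components = model_dir / "components"
    if not components.is_dir():
        return summary

    for comp in sorted(p for p in components.iterdir() if p.is_dir()):
        weights = comp / "weights"
        n_weights = (
            sum(1 for f in weights.iterdir() if f.is_file())
            if weights.is_dir() else 0
        )
        summary[comp.name] = {
            "has_graph": (comp / "graph.json").is_file(),
            "has_profile": (comp / "profile.json").is_file(),
            "weight_files": n_weights,
        }
    return summary
```

For a diffusion model this would typically list text_encoder, transformer, and vae; for an LLM, model and lm_head.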

What makes NBX different:

  • Framework-independent — interpreting the container requires no PyTorch, TensorFlow, or other framework
  • Self-describing — the container carries everything needed to execute
  • Modality-agnostic — the same format works for diffusion, LLMs, MoE, audio, video, and any future architecture
  • Deterministic — the execution graph is fully resolved at build time

Prism: Automatic Hardware Allocation

Hardware is auto-detected, so the --hardware flag is optional. NeuroBrix then selects an allocation strategy for the detected devices:

| Strategy | Description |
| --- | --- |
| single_gpu | Model fits entirely in one GPU |
| single_gpu_lifecycle | Components loaded/unloaded sequentially |
| pipeline_parallel | Per-layer sequential fill across GPUs |
| block_scatter | Block-level distribution across GPUs |
| weight_sharding | Weight-file distribution across GPUs |
| lazy_sequential | Stream components through limited VRAM |
| zero3 | CPU offload with GPU compute |
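To give a feel for how a choice among these strategies could be made, here is a deliberately simplified toy heuristic. It is not Prism's actual solver: the function name, the decision order, and the use of raw weight sizes (ignoring activations and headroom) are all assumptions made for illustration, and it covers only four of the seven strategies.

```python
def pick_strategy(component_gb: list[float], gpu_vram_gb: list[float]) -> str:
    """Toy placement heuristic; NOT the real Prism algorithm."""
    if not gpu_vram_gb:
        raise ValueError("this sketch needs at least one GPU")

    total = sum(component_gb)
    largest = max(component_gb, default=0.0)
    biggest_gpu = max(gpu_vram_gb)

    if total <= biggest_gpu:
        return "single_gpu"            # whole model fits on one device
    if len(gpu_vram_gb) == 1 and largest <= biggest_gpu:
        return "single_gpu_lifecycle"  # load/unload one component at a time
    if total <= sum(gpu_vram_gb):
        return "pipeline_parallel"     # spread the model across GPUs
    return "zero3"                     # fall back to CPU offload
```

For instance, components of 10, 20, and 2 GB on a single 24 GB GPU do not fit together, but each fits alone, so the toy picks single_gpu_lifecycle.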

GPU support: NVIDIA, AMD, Intel, Apple (planned), plus Tenstorrent, Moore Threads, Biren, Iluvatar, Hygon DCU, Cambricon detection.


Architecture

.nbx Container ──> Prism Solver ──> Execution Plan ──> CompiledSequence ──> Output
                   (hardware)       (strategy)         (zero-overhead)

The runtime compiles the entire execution graph at load time into a CompiledSequence — a zero-overhead execution path with pre-resolved tensor slots, automatic mixed precision, direct SDPA calls, and integer-indexed memory arena. No dict lookups per step. No interpretation overhead.
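The idea of pre-resolved tensor slots and an integer-indexed arena can be shown with a toy graph. The sketch below is purely illustrative and unrelated to the real CompiledSequence code: it contrasts an interpreter that looks tensors up by name on every step with a compiled form that resolves names to integer slots once and then only indexes a preallocated list.

```python
import operator

# A tiny graph: each node is (output_name, op, input_names).
GRAPH = [
    ("sum", operator.add, ("x", "y")),
    ("out", operator.mul, ("sum", "x")),
]


def run_interpreted(graph, inputs: dict):
    """Reference path: resolve every tensor by name on every step."""
    env = dict(inputs)
    for name, op, args in graph:
        env[name] = op(*(env[a] for a in args))
    return env[graph[-1][0]]


def compile_graph(graph, input_names):
    """Resolve names to integer slots once, at load time."""
    slot = {name: i for i, name in enumerate(input_names)}
    steps = []
    for name, op, args in graph:
        arg_slots = tuple(slot[a] for a in args)
        slot[name] = len(slot)  # next free slot in the arena
        steps.append((op, arg_slots, slot[name]))
    return steps, len(slot)


def run_compiled(steps, n_slots, input_values):
    """Hot loop: only integer indexing into a preallocated arena."""
    arena = [None] * n_slots
    arena[: len(input_values)] = input_values
    for op, arg_slots, out_slot in steps:
        arena[out_slot] = op(*(arena[i] for i in arg_slots))
    return arena[steps[-1][2]]
```

Both paths compute (x + y) * x; the compiled one pays the name-resolution cost once in `compile_graph` rather than on every execution.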


Roadmap

Done

  • CompiledSequence — zero-overhead graph execution engine
  • Prism solver — automatic multi-GPU hardware allocation (7 strategies)
  • Image family — 6 diffusion models (PixArt, Sana, Flex, Janus)
  • LLM family — MoE (DeepSeek, Qwen3), dense (TinyLlama)
  • Audio family — 11 models across STT, audio_llm, and TTS (5 flow handlers)
  • Upscalers — 4 super-resolution families (HAT, Real-ESRGAN, SwinIR, Swin2SR)
  • Video family — SANA-Video 720p (first of 10 planned)
  • Cross-platform — Linux, Windows, macOS support
  • Hardware auto-detection — 10 GPU vendors, CPU-only fallback
  • Persistent serving — warm daemon with chat interface
  • DtypeEngine — automatic mixed precision (AMP)
  • TilingEngine — universal spatial tiling for large inputs
  • NBX Hub — model registry at neurobrix.es

Next

  • Video family expansion — remaining 9 models (Wan2.1, CogVideoX, Allegro, Mochi, Open-Sora)
  • Vision-Language Models — multimodal understanding at scale
  • Quantization — INT8/INT4 with NBX-native support
  • Apple Silicon — Metal/MPS backend
  • 3D generation — mesh and NeRF models
  • Embeddings — text and image embedding models
  • NeuroBrix Studio — desktop GUI for model management

CLI Reference

# Serving (recommended) — hardware auto-detected
neurobrix serve --model <name>
neurobrix chat [--temperature T] [--max-tokens N]
neurobrix run --prompt <text> [--output file] [--steps N] [--cfg F] [--seed N]
neurobrix stop

# Single-shot — hardware auto-detected
neurobrix run --model <name> --prompt <text> [options]

# Execution mode (default: --compiled). See "Execution Modes" above.
neurobrix run --model <name> --prompt <text> [--compiled | --sequential | --triton | --triton-sequential]

# Image upscaling (super-resolution)
neurobrix upscale --model <name> --input <img> --output <img> [--mode compiled|sequential|triton|triton-sequential]

# Per-family inputs:
#   llm        --prompt
#   image      --prompt [--steps --cfg --height --width --input-image --mask-image]
#   tts        --prompt --output out.wav
#   stt        --audio
#   audio_llm  --audio --prompt
#   vlm        --prompt --input-image
#   multimodal --prompt --mode text|image
#   upscaler   --input-image   (or: upscale --input)
#   video      --prompt [--num-frames --fps]

# Model management
neurobrix hub [--category IMAGE|LLM|AUDIO|VIDEO]
neurobrix import <org/name> [--no-keep] [--force]
neurobrix list [--store]
neurobrix remove <name> [--store|--all]
neurobrix clean [--store|--cache|--all] [-y]

# Inspection
neurobrix info [--models] [--hardware] [--system]
neurobrix inspect <model.nbx> [--topology] [--weights]
neurobrix validate <model.nbx> [--level deep] [--strict]

Contributing

NeuroBrix is open source under the Apache 2.0 license. Contributions are welcome.

See CONTRIBUTING.md for guidelines.


Model Licenses & Responsible Use

NeuroBrix is an inference engine — it does not create, train, or own any AI model.

All models listed in this repository are the intellectual property of their respective authors. NeuroBrix converts published model weights into the .nbx container format for efficient execution. The original model licenses remain in full effect.

User responsibilities:

  • Review the license of each model before downloading or using it
  • Non-commercial models (e.g., CC-BY-NC-SA-4.0) may not be used for commercial purposes
  • Gated models on Hugging Face require explicit license acceptance before access
  • Redistribution of model weights is governed by each model's license, not by NeuroBrix's license
  • You are solely responsible for ensuring your use complies with the applicable model license

NeuroBrix Hub (neurobrix.es):

The NeuroBrix Hub hosts pre-built .nbx packages for convenience. These packages contain model weights in their original precision, repackaged in the NBX container format. All models on the hub are sourced from publicly available releases with permissive or open licenses. If you are a model author and believe your work is hosted in violation of your license terms, please contact us at legal@neurobrix.es for immediate removal.


License

NeuroBrix Engine — Apache License 2.0

Copyright 2025-2026 Hocine Benkelaya

NeuroBrix is managed by WizWorks OÜ, a property of Neural Networks Holding LTD.

The Apache 2.0 license covers the NeuroBrix engine, CLI, runtime, and NBX format tooling. It does not cover the model weights executed by the engine — those are governed by their respective licenses as listed in the Supported Models section.

See LICENSE for the full text.
