NeuroBrix
Universal Deep Learning Inference Engine
One engine. Any model. Any modality. Zero model-specific code.
Hub | Docs | PyPI | GitLab | Roadmap | Contributing
The Problem
The AI inference landscape is fragmented. Every model family requires its own stack, its own pipeline code, its own deployment tooling. Want to run a diffusion model? Learn ComfyUI or write custom diffusers pipelines. Need an LLM? Pick between Ollama, vLLM, llama.cpp — each with its own limitations. Audio? Video? Start from scratch.
NeuroBrix eliminates this fragmentation entirely.
One engine. One CLI. One container format. Import a model, run it. The runtime doesn't know or care whether it's executing a diffusion transformer, a mixture-of-experts LLM, a speech recognizer, or a video generator. It sees tensors, graphs, and execution plans — nothing else.
Why NeuroBrix?
| Capability | Ollama | llama.cpp | vLLM | ComfyUI | NeuroBrix |
|---|---|---|---|---|---|
| LLMs | Yes | Yes | Yes | -- | Yes |
| Image generation | -- | -- | -- | Yes | Yes |
| Video generation | -- | -- | -- | -- | Yes |
| Audio (STT + TTS) | -- | -- | -- | -- | Yes |
| Multimodal (understand + generate) | -- | -- | -- | -- | Yes |
| Mixture-of-Experts | -- | -- | Yes | -- | Yes |
| Multi-GPU auto-allocation | -- | -- | Yes | -- | Yes |
| Cross-platform (Linux, Windows, macOS) | Yes | Yes | -- | -- | Yes |
| Universal model format | -- | GGUF (LLM only) | -- | -- | NBX (any model) |
| No model-specific code | -- | -- | -- | -- | Yes |
Other tools solve one piece of the puzzle. NeuroBrix solves the whole puzzle.
Installation
Step 1: Install PyTorch with CUDA
# For CUDA 12.4 (RTX 30xx, 40xx, A100, H100)
pip install torch --index-url https://download.pytorch.org/whl/cu124
# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8 (older GPUs like V100)
pip install torch --index-url https://download.pytorch.org/whl/cu118
Verify CUDA is available:
python -c "import torch; print(torch.cuda.is_available())" # Should print: True
Step 2: Install NeuroBrix
pip install neurobrix
Platform Support
| Platform | GPU Support | Notes |
|---|---|---|
| Linux | CUDA, Triton kernels | Full support, recommended for production |
| Windows | CUDA | Fully supported. Triton not available on Windows |
| macOS | CPU only | MPS/Metal support planned |
Requirements: Python 3.10+ / PyTorch 2.1+ with CUDA / NVIDIA GPU
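The requirements above can be checked with a short preflight script before installing. This is an illustrative sketch, not part of the NeuroBrix CLI; the PyTorch check degrades gracefully if `torch` is not installed.

```python
import sys

def preflight() -> list[str]:
    """Report which NeuroBrix requirements this environment misses.
    Illustrative only -- not shipped with NeuroBrix."""
    problems = []
    # Python 3.10+ is required.
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    # PyTorch 2.1+ with CUDA is required; skip gracefully if not installed.
    try:
        import torch
        major, minor = (int(x) for x in torch.__version__.split(".")[:2])
        if (major, minor) < (2, 1):
            problems.append(f"PyTorch 2.1+ required, found {torch.__version__}")
        elif not torch.cuda.is_available():
            problems.append("CUDA not available (CPU-only fallback)")
    except ImportError:
        problems.append("PyTorch not installed")
    return problems

if __name__ == "__main__":
    for p in preflight():
        print("WARNING:", p)
```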
Quick Start
# Import a model from the hub
neurobrix import Vendor/Model_Name --no-keep
# Generate an image (hardware auto-detected)
neurobrix run --model Model_Name \
--prompt "A sunset over mountains" --steps 20
# Or serve for instant repeat inference
neurobrix serve --model Model_Name
neurobrix run --prompt "A robot painting on canvas" --output robot.png
neurobrix stop
Serve Mode (Hot Run Mode, Recommended)
Loads weights into VRAM once and keeps the model warm. Every subsequent request runs with zero startup overhead.
neurobrix serve --model Model_Name
# Image generation (instant — model already loaded)
neurobrix run --prompt "A cat in a hat" --output cat.png
# LLM interactive chat
neurobrix chat --temperature 0.7
# Stop and free VRAM
neurobrix stop
Usage by Model Family
Each model family uses different CLI flags and defaults. Hardware is always auto-detected.
Image Generation
neurobrix run --model Sana_1600M_4Kpx_BF16 \
--prompt "A sunset over mountains" \
--steps 20 --cfg 5.0 --seed 42 \
--height 1024 --width 1024 \
--output sunset.png
| Flag | Description | Default |
|---|---|---|
| `--prompt` | Text description of the image to generate | Required |
| `--steps` | Number of diffusion steps (more = higher quality, slower) | Model-dependent (20-50) |
| `--cfg` | Classifier-free guidance scale (higher = closer to prompt) | Model-dependent (4.5-7.5) |
| `--height` / `--width` | Output resolution in pixels | Model-dependent (1024-4096) |
| `--seed` | Random seed for reproducible results | Random |
| `--output` | Output file path | `output.png` |
Large Language Models
# Single-shot
neurobrix run --model deepseek-moe-16b-chat \
--prompt "Explain quantum computing in simple terms" \
--temperature 0.7 --max-tokens 512 \
--output response.txt
# Interactive chat (requires serve mode)
neurobrix serve --model deepseek-moe-16b-chat
neurobrix chat --temperature 0.7
neurobrix stop
| Flag | Description | Default |
|---|---|---|
| `--prompt` | Input text or question | Required |
| `--temperature` | Sampling randomness (0 = deterministic, 1 = creative) | Model-dependent (0.6-1.0) |
| `--max-tokens` | Maximum tokens to generate | Model-dependent (512-32768) |
| `--repetition-penalty` | Penalize repeated tokens (1.0 = off) | 1.0 |
| `--output` | Save response to file | stdout |
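The `--temperature` flag follows the usual logits-scaling convention: at 0 the most likely token is always picked, and higher values flatten the distribution. A minimal pure-Python sketch of that convention (conceptual illustration, not NeuroBrix internals):

```python
import math
import random

def sample(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index. Temperature 0 means greedy (argmax);
    higher temperatures flatten the softmax. Illustrative only."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

rng = random.Random(42)
logits = [2.0, 1.0, 0.5]
print(sample(logits, 0.0, rng))  # greedy: index 0
```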
Audio — Speech-to-Text (STT)
neurobrix run --model whisper-large --audio recording.wav
| Flag | Description | Default |
|---|---|---|
| `--audio` | Path to audio file (WAV, FLAC, MP3) | Required |
| `--output` | Save transcription to file | stdout |
Note: STT models use `--audio`, not `--prompt`. Temperature defaults to 0.0 (greedy decoding) for accurate transcription.
Audio — Text-to-Speech (TTS)
neurobrix run --model Kokoro-82M \
--prompt "Hello, welcome to NeuroBrix!" \
--output speech.wav
| Flag | Description | Default |
|---|---|---|
| `--prompt` | Text to synthesize into speech | Required |
| `--temperature` | Sampling variation (lower = more consistent) | Model-dependent (0.6) |
| `--output` | Output audio file path | `output.wav` |
Video Generation
neurobrix run --model SANA-Video_2B_720p \
--prompt "A cat playing piano" \
--steps 30 --cfg 5.0 --seed 42 \
--output video.mp4
| Flag | Description | Default |
|---|---|---|
| `--prompt` | Text description of the video to generate | Required |
| `--steps` | Number of diffusion steps | Model-dependent (20-50) |
| `--cfg` | Guidance scale | Model-dependent (5.0) |
| `--seed` | Random seed | Random |
| `--output` | Output file path | `output.mp4` |
NeuroBrix Hub & Model Management
Models are hosted on the NeuroBrix Hub and managed locally through a two-tier storage system:
- Store (`~/.neurobrix/store/`) — downloaded `.nbx` archives (compressed)
- Cache (`~/.neurobrix/cache/`) — extracted models ready for inference
Browse & Import
# Browse the full hub catalog
neurobrix hub
# Filter by family
neurobrix hub --category IMAGE
neurobrix hub --category LLM
neurobrix hub --category AUDIO
neurobrix hub --category VIDEO
# Search by name
neurobrix hub --search sana
# Import a model (downloads .nbx → extracts to cache)
neurobrix import vendor/model_name
# Import and delete the .nbx archive to save disk space
neurobrix import pixart/sigma-xl-1024 --no-keep
# Force re-import (overwrites existing)
neurobrix import Vendor/Model_Name --force
List & Manage
# List installed models in cache (ready to run)
neurobrix list
# List downloaded .nbx archives in store
neurobrix list --store
# Show system info: installed models, hardware, disk usage
neurobrix info --models
# Remove a model from cache
neurobrix remove Model_Name
# Remove from both store and cache
neurobrix remove Model_Name --all
# Clean everything — free all disk space
neurobrix clean --all -y
How It Works
neurobrix import Vendor/Model_Name --no-keep
│
├─ 1. Download .nbx from neurobrix.es → ~/.neurobrix/store/
├─ 2. Extract to ~/.neurobrix/cache/Model_Name/
├─ 3. Validate manifest, components, weights
└─ 4. Delete .nbx from store (--no-keep)
neurobrix run --model Model_Name --prompt "..."
│
└─ Reads directly from cache — zero extraction overhead
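The import flow above can be sketched end to end. The snippet below builds a toy archive, extracts it to a cache directory, checks for a manifest, and deletes the archive (the `--no-keep` behavior). The archive here is a plain tarball and the layout is a simplification, not the real NBX spec.

```python
import json
import tarfile
import tempfile
from pathlib import Path

def import_model(archive: Path, cache: Path, keep: bool = False) -> Path:
    """Toy version of the import flow: extract, validate, optionally delete.
    Illustrative only -- the real NBX format is more than a tarball."""
    dest = cache / archive.name.removesuffix(".nbx")
    with tarfile.open(archive) as tar:
        tar.extractall(dest)                      # 2. extract to cache
    if not (dest / "manifest.json").exists():     # 3. validate
        raise ValueError("invalid archive: manifest.json missing")
    if not keep:
        archive.unlink()                          # 4. --no-keep
    return dest

# Build a toy .nbx (really just a gzipped tar) and run the flow.
root = Path(tempfile.mkdtemp())
store, cache = root / "store", root / "cache"
store.mkdir(); cache.mkdir()
src = root / "src"; src.mkdir()
(src / "manifest.json").write_text(json.dumps({"name": "Toy_Model"}))
archive = store / "Toy_Model.nbx"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src / "manifest.json", arcname="manifest.json")

model_dir = import_model(archive, cache)
print(model_dir.name, archive.exists())  # Toy_Model False
```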
Supported Models
NeuroBrix is a runtime engine — it executes models but does not train or create them. All models listed below are the work of their respective authors and are subject to their original licenses. Users must review and accept each model's license before use.
Image Generation
| Model | Author | License | Size |
|---|---|---|---|
| Sana 1600M 4K | NVIDIA / MIT | Apache 2.0 | 12 GB |
| PixArt-Sigma-XL-2-1024-MS | PixArt | OpenRAIL++ | 20 GB |
| PixArt-XL-2-1024-MS | PixArt | OpenRAIL++ | 20 GB |
| Flex.1-alpha | Ostris | Apache 2.0 | 24 GB |
| Janus-Pro-7B | DeepSeek | MIT | 14 GB |
Video Generation
| Model | Author | License | Size |
|---|---|---|---|
| SANA-Video 2B 720p | NVIDIA / MIT | Apache 2.0 | 17 GB |
Audio (Speech-to-Text + Text-to-Speech)
| Model | Author | License | Size | Type |
|---|---|---|---|---|
| Whisper Large | OpenAI | MIT | 6 GB | STT |
| Whisper Large V3 Turbo | OpenAI | MIT | 3 GB | STT |
| Parakeet TDT 1.1B | NVIDIA | CC-BY-4.0 | 4 GB | STT |
| Canary-Qwen 2.5B | NVIDIA | CC-BY-4.0 | 10 GB | STT |
| Voxtral Mini 3B | Mistral AI | Apache 2.0 | 7 GB | STT |
| Orpheus 3B | Canopy Labs | Apache 2.0 | 7 GB | TTS |
| Kokoro 82M | Hexgrad | Apache 2.0 | 0.3 GB | TTS |
| VibeVoice 1.5B | Will Held | Apache 2.0 | 6 GB | TTS |
| OpenAudio S1 Mini | Fish Audio | CC-BY-NC-SA-4.0 | 2 GB | TTS |
| Chatterbox | Resemble AI | MIT | 1 GB | TTS |
Large Language Models
| Model | Author | License | Size |
|---|---|---|---|
| DeepSeek-MoE-16B | DeepSeek | MIT | 31 GB |
| Qwen3-30B-A3B-Thinking | Alibaba / Qwen | Apache 2.0 | 57 GB |
| TinyLlama 1.1B | TinyLlama | Apache 2.0 | 4 GB |
Non-commercial: OpenAudio S1 Mini uses CC-BY-NC-SA-4.0 — non-commercial use only. Check each model's license on the NeuroBrix Hub before commercial deployment.
Browse the full catalog and license details: neurobrix.es/models
The NBX Format
NeuroBrix introduces .nbx — a universal container format for AI models. Where GGUF is limited to LLMs and ONNX struggles with dynamic architectures, NBX captures any computation graph with full fidelity.
model.nbx (self-contained archive)
├── manifest.json Model metadata and component list
├── topology.json Execution flow and component connections
├── runtime/
│ ├── defaults.json Generation parameters, model config
│ └── variables.json Runtime tensor allocation rules
├── components/
│ ├── text_encoder/ Text conditioning (CLIP, T5, etc.)
│ │ ├── graph.json Computation graph (TensorDAG)
│ │ ├── profile.json Component config
│ │ └── weights/ Safetensors shards
│ ├── transformer/ Core model (DiT, UNet, decoder, etc.)
│ │ ├── graph.json
│ │ ├── profile.json
│ │ └── weights/
│ ├── vae/ Image/video decoder (diffusion models)
│ │ ├── graph.json
│ │ ├── profile.json
│ │ └── weights/
│ └── ... Any number of components per model
└── modules/
└── tokenizer/ Tokenizer files
The component structure adapts to each model: diffusion models have text_encoder + transformer + vae, LLMs have model + lm_head, audio models have encoder + decoder, etc.
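A reader for this layout can be small. The sketch below assumes a minimal `manifest.json` shape with a `components` list; the real manifest schema is not documented here, so treat the field names as placeholders.

```python
import json
import tempfile
from pathlib import Path

def list_components(model_dir: Path) -> list[str]:
    """Return component names declared in manifest.json, verifying each
    component directory carries a graph.json and a weights/ folder.
    The 'components' field name is a placeholder, not the real schema."""
    manifest = json.loads((model_dir / "manifest.json").read_text())
    names = []
    for name in manifest["components"]:
        comp = model_dir / "components" / name
        assert (comp / "graph.json").exists(), f"{name}: graph.json missing"
        assert (comp / "weights").is_dir(), f"{name}: weights/ missing"
        names.append(name)
    return names

# Build a toy extracted model matching the diffusion layout above.
model = Path(tempfile.mkdtemp()) / "Toy_Model"
for name in ("text_encoder", "transformer", "vae"):
    comp = model / "components" / name
    (comp / "weights").mkdir(parents=True)
    (comp / "graph.json").write_text("{}")
(model / "manifest.json").write_text(
    json.dumps({"components": ["text_encoder", "transformer", "vae"]}))

print(list_components(model))  # ['text_encoder', 'transformer', 'vae']
```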
What makes NBX different:
- Framework-independent — no dependency on PyTorch, TensorFlow, or any framework at runtime interpretation level
- Self-describing — the container carries everything needed to execute
- Modality-agnostic — the same format works for diffusion, LLMs, MoE, audio, video, and any future architecture
- Deterministic — the execution graph is fully resolved at build time
Prism: Automatic Hardware Allocation
You describe your hardware. NeuroBrix figures out the rest. Hardware is auto-detected — the --hardware flag is optional.
| Strategy | Description |
|---|---|
| `single_gpu` | Model fits entirely in one GPU |
| `single_gpu_lifecycle` | Components loaded/unloaded sequentially |
| `pipeline_parallel` | Per-layer sequential fill across GPUs |
| `block_scatter` | Block-level distribution across GPUs |
| `weight_sharding` | Weight-file distribution across GPUs |
| `lazy_sequential` | Stream components through limited VRAM |
| `zero3` | CPU offload with GPU compute |
GPU support: NVIDIA, AMD, Intel, Apple (planned), plus detection for Tenstorrent, Moore Threads, Biren, Iluvatar, Hygon DCU, and Cambricon.
Architecture
.nbx Container ──> Prism Solver ──> Execution Plan ──> CompiledSequence ──> Output
                    (hardware)        (strategy)       (zero-overhead)
The runtime compiles the entire execution graph at load time into a CompiledSequence — a zero-overhead execution path with pre-resolved tensor slots, automatic mixed precision, direct SDPA calls, and integer-indexed memory arena. No dict lookups per step. No interpretation overhead.
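The "integer-indexed memory arena, no dict lookups per step" idea can be illustrated with a toy interpreter: each compiled op references tensor slots by position in a flat list rather than by name, so the hot loop touches only list indices. This is a conceptual sketch, not the NeuroBrix implementation.

```python
from typing import Callable

# Each compiled op is (function, input slot indices, output slot index).
Op = tuple[Callable, tuple[int, ...], int]

def run(ops: list[Op], arena: list[float]) -> list[float]:
    """Execute a pre-resolved op sequence over an integer-indexed arena.
    No per-step name lookups: slots were resolved at 'compile' time."""
    for fn, inputs, out in ops:
        arena[out] = fn(*(arena[i] for i in inputs))
    return arena

# "Compile" y = (x + 1) * 2 into slots: 0 = x, 1 = tmp, 2 = y.
ops: list[Op] = [
    (lambda a: a + 1.0, (0,), 1),
    (lambda a: a * 2.0, (1,), 2),
]
print(run(ops, [3.0, 0.0, 0.0])[2])  # 8.0
```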
Roadmap
Done
- CompiledSequence — zero-overhead graph execution engine
- Prism solver — automatic multi-GPU hardware allocation (7 strategies)
- Image family — 6 diffusion models (PixArt, Sana, Flex, Janus)
- LLM family — MoE (DeepSeek), dense (TinyLlama, Qwen3)
- Audio family — 11 models, 5 flow handlers (STT + TTS)
- Video family — SANA-Video 720p (first of 10 planned)
- Cross-platform — Linux, Windows, macOS support
- Hardware auto-detection — 10 GPU vendors, CPU-only fallback
- Persistent serving — warm daemon with chat interface
- DtypeEngine — automatic mixed precision (AMP)
- TilingEngine — universal spatial tiling for large inputs
- NBX Hub — model registry at neurobrix.es
Next
- Video family expansion — remaining 9 models (Wan2.1, CogVideoX, Allegro, Mochi, Open-Sora)
- Vision-Language Models — multimodal understanding at scale
- Quantization — INT8/INT4 with NBX-native support
- Apple Silicon — Metal/MPS backend
- Upscalers — super-resolution models
- 3D generation — mesh and NeRF models
- Embeddings — text and image embedding models
- NeuroBrix Studio — desktop GUI for model management
CLI Reference
# Serving (recommended) — hardware auto-detected
neurobrix serve --model <name>
neurobrix chat [--temperature T] [--max-tokens N]
neurobrix run --prompt <text> [--output file] [--steps N] [--cfg F] [--seed N]
neurobrix stop
# Single-shot — hardware auto-detected
neurobrix run --model <name> --prompt <text> [options]
# Model management
neurobrix hub [--category IMAGE|LLM|AUDIO|VIDEO]
neurobrix import <org/name> [--no-keep] [--force]
neurobrix list [--store]
neurobrix remove <name> [--store|--all]
neurobrix clean [--store|--cache|--all] [-y]
# Inspection
neurobrix info [--models] [--hardware] [--system]
neurobrix inspect <model.nbx> [--topology] [--weights]
neurobrix validate <model.nbx> [--level deep] [--strict]
Contributing
NeuroBrix is open source under the Apache 2.0 license. Contributions are welcome.
See CONTRIBUTING.md for guidelines.
Model Licenses & Responsible Use
NeuroBrix is an inference engine — it does not create, train, or own any AI model.
All models listed in this repository are the intellectual property of their respective authors. NeuroBrix converts published model weights into the .nbx container format for efficient execution. The original model licenses remain in full effect.
User responsibilities:
- Review the license of each model before downloading or using it
- Non-commercial models (e.g., CC-BY-NC-SA-4.0) may not be used for commercial purposes
- Gated models on Hugging Face require explicit license acceptance before access
- Redistribution of model weights is governed by each model's license, not by NeuroBrix's license
- You are solely responsible for ensuring your use complies with the applicable model license
NeuroBrix Hub (neurobrix.es):
The NeuroBrix Hub hosts pre-built .nbx packages for convenience. These packages contain model weights in their original precision, repackaged in the NBX container format. All models on the hub are sourced from publicly available releases with permissive or open licenses. If you are a model author and believe your work is hosted in violation of your license terms, please contact us at legal@neurobrix.es for immediate removal.
License
NeuroBrix Engine — Apache License 2.0
Copyright 2025-2026 Hocine Benkelaya
NeuroBrix is managed by WizWorks OÜ, a property of Neural Networks Holding LTD.
The Apache 2.0 license covers the NeuroBrix engine, CLI, runtime, and NBX format tooling. It does not cover the model weights executed by the engine — those are governed by their respective licenses as listed in the Supported Models section.
See LICENSE for the full text.