
NeuroBrix

Universal Deep Learning Inference Engine
One engine. Any model. Any modality. Zero model-specific code.


Hub  |  Docs  |  PyPI  |  GitLab  |  Roadmap  |  Contributing


The Problem

The AI inference landscape is fragmented. Every model family requires its own stack, its own pipeline code, its own deployment tooling. Want to run a diffusion model? Learn ComfyUI or write custom diffusers pipelines. Need an LLM? Pick between Ollama, vLLM, llama.cpp — each with its own limitations. Audio? Video? Start from scratch.

NeuroBrix eliminates this fragmentation entirely.

One engine. One CLI. One container format. Import a model, run it. The runtime doesn't know or care whether it's executing a diffusion transformer, a mixture-of-experts LLM, a speech recognizer, or a video generator. It sees tensors, graphs, and execution plans — nothing else.


Why NeuroBrix?

| Capability | Ollama | llama.cpp | vLLM | ComfyUI | NeuroBrix |
|---|---|---|---|---|---|
| LLMs | Yes | Yes | Yes | -- | Yes |
| Image generation | -- | -- | -- | Yes | Yes |
| Video generation | -- | -- | -- | -- | Yes |
| Audio (STT + TTS) | -- | -- | -- | -- | Yes |
| Multimodal (understand + generate) | -- | -- | -- | -- | Yes |
| Mixture-of-Experts | -- | -- | Yes | -- | Yes |
| Multi-GPU auto-allocation | -- | -- | Yes | -- | Yes |
| Cross-platform (Linux, Windows, macOS) | Yes | Yes | -- | -- | Yes |
| Universal model format | -- | GGUF (LLM only) | -- | -- | NBX (any model) |
| No model-specific code | -- | -- | -- | -- | Yes |

Other tools solve one piece of the puzzle. NeuroBrix solves the whole puzzle.


Installation

Step 1: Install PyTorch with CUDA

# For CUDA 12.4 (RTX 30xx, 40xx, A100, H100)
pip install torch --index-url https://download.pytorch.org/whl/cu124

# For CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8 (older GPUs like V100)
pip install torch --index-url https://download.pytorch.org/whl/cu118

Verify CUDA is available:

python -c "import torch; print(torch.cuda.is_available())"  # Should print: True

Step 2: Install NeuroBrix

pip install neurobrix

Platform Support

| Platform | GPU Support | Notes |
|---|---|---|
| Linux | CUDA, Triton kernels | Full support; recommended for production |
| Windows | CUDA | Fully supported; Triton not available on Windows |
| macOS | CPU only | MPS/Metal support planned |

Requirements: Python 3.10+ / PyTorch 2.1+ with CUDA / NVIDIA GPU


Quick Start

# Import a model from the hub
neurobrix import Vendor/Model_Name --no-keep

# Generate an image (hardware auto-detected)
neurobrix run --model Model_Name \
    --prompt "A sunset over mountains" --steps 20

# Or serve for instant repeat inference
neurobrix serve --model Model_Name
neurobrix run --prompt "A robot painting on canvas" --output robot.png
neurobrix stop

Serve Mode (Hot Run Mode, Recommended)

Loads weights into VRAM once and keeps the model warm. Every subsequent request runs with zero startup overhead.

neurobrix serve --model Model_Name

# Image generation (instant — model already loaded)
neurobrix run --prompt "A cat in a hat" --output cat.png

# LLM interactive chat
neurobrix chat --temperature 0.7

# Stop and free VRAM
neurobrix stop

Usage by Model Family

Each model family uses different CLI flags and defaults. Hardware is always auto-detected.

Image Generation

neurobrix run --model Sana_1600M_4Kpx_BF16 \
    --prompt "A sunset over mountains" \
    --steps 20 --cfg 5.0 --seed 42 \
    --height 1024 --width 1024 \
    --output sunset.png

| Flag | Description | Default |
|---|---|---|
| --prompt | Text description of the image to generate | Required |
| --steps | Number of diffusion steps (more = higher quality, slower) | Model-dependent (20-50) |
| --cfg | Classifier-free guidance scale (higher = closer to prompt) | Model-dependent (4.5-7.5) |
| --height / --width | Output resolution in pixels | Model-dependent (1024-4096) |
| --seed | Random seed for reproducible results | Random |
| --output | Output file path | output.png |
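The flags above compose well in scripts; a common pattern is a reproducibility sweep that varies only --seed. The sketch below only builds the command lines rather than invoking the engine, so it stays side-effect free; the model name is taken from the catalog and the output filenames are arbitrary choices for illustration:

```python
# Build a batch of reproducible image-generation commands by varying --seed.
# Uses only flags documented in the table above; nothing is executed here.

def seed_sweep(model: str, prompt: str, seeds: list[int]) -> list[list[str]]:
    """Return one `neurobrix run` argument list per seed."""
    commands = []
    for seed in seeds:
        commands.append([
            "neurobrix", "run",
            "--model", model,
            "--prompt", prompt,
            "--steps", "20",
            "--cfg", "5.0",
            "--seed", str(seed),
            "--output", f"out_seed{seed}.png",
        ])
    return commands

if __name__ == "__main__":
    for cmd in seed_sweep("Sana_1600M_4Kpx_BF16", "A sunset over mountains", [1, 2, 3]):
        print(" ".join(cmd))
```

Each command list can be handed to subprocess.run as-is, which avoids shell-quoting issues with prompts that contain spaces.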

Large Language Models

# Single-shot
neurobrix run --model deepseek-moe-16b-chat \
    --prompt "Explain quantum computing in simple terms" \
    --temperature 0.7 --max-tokens 512 \
    --output response.txt

# Interactive chat (requires serve mode)
neurobrix serve --model deepseek-moe-16b-chat
neurobrix chat --temperature 0.7
neurobrix stop

| Flag | Description | Default |
|---|---|---|
| --prompt | Input text or question | Required |
| --temperature | Sampling randomness (0 = deterministic, 1 = creative) | Model-dependent (0.6-1.0) |
| --max-tokens | Maximum tokens to generate | Model-dependent (512-32768) |
| --repetition-penalty | Penalize repeated tokens (1.0 = off) | 1.0 |
| --output | Save response to file | stdout |

Audio — Speech-to-Text (STT)

neurobrix run --model whisper-large --audio recording.wav

| Flag | Description | Default |
|---|---|---|
| --audio | Path to audio file (WAV, FLAC, MP3) | Required |
| --output | Save transcription to file | stdout |

Note: STT models use --audio, not --prompt. Temperature defaults to 0.0 (greedy decoding) for accurate transcription.
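The temperature behavior the note describes can be made concrete with a toy decoder step. This is a generic sampling sketch, not NeuroBrix internals: at temperature 0 the sampler reduces to greedy argmax (always the highest-scoring token), which is what makes transcriptions stable and repeatable.

```python
import math
import random

def sample_token(logits: list[float], temperature: float, rng=random) -> int:
    """Pick a token index from raw logits.

    temperature == 0.0 is greedy decoding: deterministic argmax.
    Higher temperatures flatten the distribution and add variation.
    """
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits, then draw one sample.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(logits) - 1
```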

Audio — Text-to-Speech (TTS)

neurobrix run --model Kokoro-82M \
    --prompt "Hello, welcome to NeuroBrix!" \
    --output speech.wav

| Flag | Description | Default |
|---|---|---|
| --prompt | Text to synthesize into speech | Required |
| --temperature | Sampling variation (lower = more consistent) | Model-dependent (0.6) |
| --output | Output audio file path | output.wav |

Video Generation

neurobrix run --model SANA-Video_2B_720p \
    --prompt "A cat playing piano" \
    --steps 30 --cfg 5.0 --seed 42 \
    --output video.mp4

| Flag | Description | Default |
|---|---|---|
| --prompt | Text description of the video to generate | Required |
| --steps | Number of diffusion steps | Model-dependent (20-50) |
| --cfg | Guidance scale | Model-dependent (5.0) |
| --seed | Random seed | Random |
| --output | Output file path | output.mp4 |

NeuroBrix Hub & Model Management

Models are hosted on the NeuroBrix Hub and managed locally through a two-tier storage system:

  • Store (~/.neurobrix/store/) — downloaded .nbx archives (compressed)
  • Cache (~/.neurobrix/cache/) — extracted models ready for inference
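As a rough illustration of the two tiers, the sketch below mimics what "neurobrix list" and "neurobrix list --store" report by scanning the two directories. Only the store/cache paths and their contents (.nbx archives vs. extracted model directories) come from the description above; the scanning logic itself is an assumption for illustration.

```python
import tempfile
from pathlib import Path

def list_models(root: Path) -> dict[str, list[str]]:
    """Report compressed archives in the store and extracted models in the cache.

    `root` stands in for ~/.neurobrix.
    """
    store = root / "store"
    cache = root / "cache"
    return {
        "store": sorted(p.name for p in store.glob("*.nbx")) if store.is_dir() else [],
        "cache": sorted(p.name for p in cache.iterdir() if p.is_dir()) if cache.is_dir() else [],
    }

# Demo against a throwaway layout (stand-in for ~/.neurobrix).
demo = Path(tempfile.mkdtemp())
(demo / "store").mkdir()
(demo / "store" / "Model_Name.nbx").touch()
(demo / "cache" / "Model_Name").mkdir(parents=True)
result = list_models(demo)
```

After an import with --no-keep, the "store" list would be empty while the "cache" entry remains, which is exactly the disk-saving trade-off described above.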

Browse & Import

# Browse the full hub catalog
neurobrix hub

# Filter by family
neurobrix hub --category IMAGE
neurobrix hub --category LLM
neurobrix hub --category AUDIO
neurobrix hub --category VIDEO

# Search by name
neurobrix hub --search sana

# Import a model (downloads .nbx → extracts to cache)
neurobrix import vendor/model_name

# Import and delete the .nbx archive to save disk space
neurobrix import pixart/sigma-xl-1024 --no-keep

# Force re-import (overwrites existing)
neurobrix import Vendor/Model_Name --force

List & Manage

# List installed models in cache (ready to run)
neurobrix list

# List downloaded .nbx archives in store
neurobrix list --store

# Show system info: installed models, hardware, disk usage
neurobrix info --models

# Remove a model from cache
neurobrix remove Model_Name

# Remove from both store and cache
neurobrix remove Model_Name --all

# Clean everything — free all disk space
neurobrix clean --all -y

How It Works

neurobrix import Vendor/Model_Name --no-keep
  │
  ├─ 1. Download .nbx from neurobrix.es → ~/.neurobrix/store/
  ├─ 2. Extract to ~/.neurobrix/cache/Model_Name/
  ├─ 3. Validate manifest, components, weights
  └─ 4. Delete .nbx from store (--no-keep)

neurobrix run --model Model_Name --prompt "..."
  │
  └─ Reads directly from cache — zero extraction overhead

Supported Models

NeuroBrix is a runtime engine — it executes models but does not train or create them. All models listed below are the work of their respective authors and are subject to their original licenses. Users must review and accept each model's license before use.

Image Generation

| Model | Author | License | Size |
|---|---|---|---|
| Sana 1600M 4K | NVIDIA / MIT | Apache 2.0 | 12 GB |
| PixArt-Sigma-XL-2-1024-MS | PixArt | OpenRAIL++ | 20 GB |
| PixArt-XL-2-1024-MS | PixArt | OpenRAIL++ | 20 GB |
| Flex.1-alpha | Ostris | Apache 2.0 | 24 GB |
| Janus-Pro-7B | DeepSeek | MIT | 14 GB |

Video Generation

| Model | Author | License | Size |
|---|---|---|---|
| SANA-Video 2B 720p | NVIDIA / MIT | Apache 2.0 | 17 GB |

Audio (Speech-to-Text + Text-to-Speech)

| Model | Author | License | Size | Type |
|---|---|---|---|---|
| Whisper Large | OpenAI | MIT | 6 GB | STT |
| Whisper Large V3 Turbo | OpenAI | MIT | 3 GB | STT |
| Parakeet TDT 1.1B | NVIDIA | CC-BY-4.0 | 4 GB | STT |
| Canary-Qwen 2.5B | NVIDIA | CC-BY-4.0 | 10 GB | STT |
| Voxtral Mini 3B | Mistral AI | Apache 2.0 | 7 GB | STT |
| Orpheus 3B | Canopy Labs | Apache 2.0 | 7 GB | TTS |
| Kokoro 82M | Hexgrad | Apache 2.0 | 0.3 GB | TTS |
| VibeVoice 1.5B | Will Held | Apache 2.0 | 6 GB | TTS |
| OpenAudio S1 Mini | Fish Audio | CC-BY-NC-SA-4.0 | 2 GB | TTS |
| Chatterbox | Resemble AI | MIT | 1 GB | TTS |

Large Language Models

| Model | Author | License | Size |
|---|---|---|---|
| DeepSeek-MoE-16B | DeepSeek | MIT | 31 GB |
| Qwen3-30B-A3B-Thinking | Alibaba / Qwen | Apache 2.0 | 57 GB |
| TinyLlama 1.1B | TinyLlama | Apache 2.0 | 4 GB |

Non-commercial: OpenAudio S1 Mini uses CC-BY-NC-SA-4.0 — non-commercial use only. Check each model's license on the NeuroBrix Hub before commercial deployment.

Browse the full catalog and license details: neurobrix.es/models


The NBX Format

NeuroBrix introduces .nbx — a universal container format for AI models. Where GGUF is limited to LLMs and ONNX struggles with dynamic architectures, NBX captures any computation graph with full fidelity.

model.nbx (self-contained archive)
  ├── manifest.json              Model metadata and component list
  ├── topology.json              Execution flow and component connections
  ├── runtime/
  │   ├── defaults.json          Generation parameters, model config
  │   └── variables.json         Runtime tensor allocation rules
  ├── components/
  │   ├── text_encoder/          Text conditioning (CLIP, T5, etc.)
  │   │   ├── graph.json         Computation graph (TensorDAG)
  │   │   ├── profile.json       Component config
  │   │   └── weights/           Safetensors shards
  │   ├── transformer/           Core model (DiT, UNet, decoder, etc.)
  │   │   ├── graph.json
  │   │   ├── profile.json
  │   │   └── weights/
  │   ├── vae/                   Image/video decoder (diffusion models)
  │   │   ├── graph.json
  │   │   ├── profile.json
  │   │   └── weights/
  │   └── ...                    Any number of components per model
  └── modules/
      └── tokenizer/             Tokenizer files

The component structure adapts to each model: diffusion models have text_encoder + transformer + vae, LLMs have model + lm_head, audio models have encoder + decoder, etc.

What makes NBX different:

  • Framework-independent — runtime graph interpretation has no dependency on PyTorch, TensorFlow, or any other framework
  • Self-describing — the container carries everything needed to execute
  • Modality-agnostic — the same format works for diffusion, LLMs, MoE, audio, video, and any future architecture
  • Deterministic — the execution graph is fully resolved at build time
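To make the layout above concrete, here is an illustrative validation pass over an extracted container, in the spirit of step 3 of the import flow. Only the file names (manifest.json, graph.json, profile.json, weights/) come from the tree above; the manifest key name ("components") and the set-based presence check are assumptions for illustration, not the published NBX schema.

```python
# Check that every component the manifest promises actually ships its
# graph, profile, and weights. Paths are relative to the archive root.

REQUIRED_FILES = ("graph.json", "profile.json", "weights")

def missing_parts(manifest: dict, present: set[str]) -> list[str]:
    """Return the component files the manifest promises but the archive lacks."""
    missing = []
    for component in manifest.get("components", []):
        for f in REQUIRED_FILES:
            path = f"components/{component}/{f}"
            if path not in present:
                missing.append(path)
    return missing

# Hypothetical diffusion-model container with a gap in the vae component.
manifest = {"components": ["text_encoder", "transformer", "vae"]}
present = {
    "components/text_encoder/graph.json",
    "components/text_encoder/profile.json",
    "components/text_encoder/weights",
    "components/transformer/graph.json",
    "components/transformer/profile.json",
    "components/transformer/weights",
    "components/vae/graph.json",
    "components/vae/profile.json",
    # vae weights deliberately absent
}
gaps = missing_parts(manifest, present)
```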

Prism: Automatic Hardware Allocation

You describe your hardware. NeuroBrix figures out the rest. Hardware is auto-detected — the --hardware flag is optional.

| Strategy | Description |
|---|---|
| single_gpu | Model fits entirely in one GPU |
| single_gpu_lifecycle | Components loaded/unloaded sequentially |
| pipeline_parallel | Per-layer sequential fill across GPUs |
| block_scatter | Block-level distribution across GPUs |
| weight_sharding | Weight-file distribution across GPUs |
| lazy_sequential | Stream components through limited VRAM |
| zero3 | CPU offload with GPU compute |

GPU support: NVIDIA, AMD, Intel, Apple (planned), plus detection of Tenstorrent, Moore Threads, Biren, Iluvatar, Hygon DCU, and Cambricon hardware.
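A toy sketch of the kind of decision the solver makes: compare the model footprint against per-GPU and total VRAM, then fall back through progressively more aggressive strategies. The strategy names come from the table above, but the thresholds and selection logic here are purely illustrative, not the actual Prism solver.

```python
# Illustrative strategy selection: not the real Prism solver, just the
# shape of the decision it automates.

def pick_strategy(model_gb: float, gpus_gb: list[float]) -> str:
    total = sum(gpus_gb)
    if gpus_gb and model_gb <= max(gpus_gb):
        return "single_gpu"         # whole model fits on one device
    if model_gb <= total:
        return "pipeline_parallel"  # per-layer fill across several GPUs
    return "zero3"                  # exceeds total VRAM: CPU offload
```

For instance, a 57 GB MoE checkpoint on three 24 GB cards would land on pipeline_parallel, while the same model on a single card would need CPU offload.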


Architecture

.nbx Container ──> Prism Solver ──> Execution Plan ──> CompiledSequence ──> Output
                   (hardware)       (strategy)         (zero-overhead)

The runtime compiles the entire execution graph at load time into a CompiledSequence — a zero-overhead execution path with pre-resolved tensor slots, automatic mixed precision, direct SDPA calls, and integer-indexed memory arena. No dict lookups per step. No interpretation overhead.
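The idea of a pre-compiled, integer-indexed execution path can be sketched in a few lines of Python. This is a conceptual illustration of the technique, not the actual CompiledSequence implementation: each op is reduced at "compile" time to a function plus integer slot indices into a flat arena, so the hot loop does only list indexing, with no per-step dict lookups or graph interpretation.

```python
# Minimal sketch of compiled graph execution over an integer-indexed arena.

def compile_sequence(ops):
    """ops: list of (fn, input_slot_indices, output_slot_index) tuples.

    Returns a runner whose inner loop touches only integer slots.
    """
    def run(arena):
        for fn, in_slots, out_slot in ops:
            arena[out_slot] = fn(*(arena[s] for s in in_slots))
        return arena
    return run

# Toy graph: slot2 = slot0 + slot1, then slot3 = slot2 * slot0.
seq = compile_sequence([
    (lambda a, b: a + b, (0, 1), 2),
    (lambda a, b: a * b, (2, 0), 3),
])
arena = [3.0, 4.0, None, None]  # inputs pre-placed, outputs pre-allocated
seq(arena)
```

In a real engine the arena slots would hold pre-allocated tensors and the functions would be fused kernels or SDPA calls, but the dispatch structure is the same.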


Roadmap

Done

  • CompiledSequence — zero-overhead graph execution engine
  • Prism solver — automatic multi-GPU hardware allocation (7 strategies)
  • Image family — 6 diffusion models (PixArt, Sana, Flex, Janus)
  • LLM family — MoE (DeepSeek), dense (TinyLlama, Qwen3)
  • Audio family — 11 models, 5 flow handlers (STT + TTS)
  • Video family — SANA-Video 720p (first of 10 planned)
  • Cross-platform — Linux, Windows, macOS support
  • Hardware auto-detection — 10 GPU vendors, CPU-only fallback
  • Persistent serving — warm daemon with chat interface
  • DtypeEngine — automatic mixed precision (AMP)
  • TilingEngine — universal spatial tiling for large inputs
  • NBX Hub — model registry at neurobrix.es

Next

  • Video family expansion — remaining 9 models (Wan2.1, CogVideoX, Allegro, Mochi, Open-Sora)
  • Vision-Language Models — multimodal understanding at scale
  • Quantization — INT8/INT4 with NBX-native support
  • Apple Silicon — Metal/MPS backend
  • Upscalers — super-resolution models
  • 3D generation — mesh and NeRF models
  • Embeddings — text and image embedding models
  • NeuroBrix Studio — desktop GUI for model management

CLI Reference

# Serving (recommended) — hardware auto-detected
neurobrix serve --model <name>
neurobrix chat [--temperature T] [--max-tokens N]
neurobrix run --prompt <text> [--output file] [--steps N] [--cfg F] [--seed N]
neurobrix stop

# Single-shot — hardware auto-detected
neurobrix run --model <name> --prompt <text> [options]

# Model management
neurobrix hub [--category IMAGE|LLM|AUDIO|VIDEO]
neurobrix import <org/name> [--no-keep] [--force]
neurobrix list [--store]
neurobrix remove <name> [--store|--all]
neurobrix clean [--store|--cache|--all] [-y]

# Inspection
neurobrix info [--models] [--hardware] [--system]
neurobrix inspect <model.nbx> [--topology] [--weights]
neurobrix validate <model.nbx> [--level deep] [--strict]

Contributing

NeuroBrix is open source under the Apache 2.0 license. Contributions are welcome.

See CONTRIBUTING.md for guidelines.


Model Licenses & Responsible Use

NeuroBrix is an inference engine — it does not create, train, or own any AI model.

All models listed in this repository are the intellectual property of their respective authors. NeuroBrix converts published model weights into the .nbx container format for efficient execution. The original model licenses remain in full effect.

User responsibilities:

  • Review the license of each model before downloading or using it
  • Non-commercial models (e.g., CC-BY-NC-SA-4.0) may not be used for commercial purposes
  • Gated models on Hugging Face require explicit license acceptance before access
  • Redistribution of model weights is governed by each model's license, not by NeuroBrix's license
  • You are solely responsible for ensuring your use complies with the applicable model license

NeuroBrix Hub (neurobrix.es):

The NeuroBrix Hub hosts pre-built .nbx packages for convenience. These packages contain model weights in their original precision, repackaged in the NBX container format. All models on the hub are sourced from publicly available releases with permissive or open licenses. If you are a model author and believe your work is hosted in violation of your license terms, please contact us at legal@neurobrix.es for immediate removal.


License

NeuroBrix Engine — Apache License 2.0

Copyright 2025-2026 Hocine Benkelaya

NeuroBrix is managed by WizWorks OÜ, a property of Neural Networks Holding LTD.

The Apache 2.0 license covers the NeuroBrix engine, CLI, runtime, and NBX format tooling. It does not cover the model weights executed by the engine — those are governed by their respective licenses as listed in the Supported Models section.

See LICENSE for the full text.
