Production-ready audio AI platform — ASR, TTS, Translation, Speaker Verification, Multi-GPU, VRAM Optimization

Omni-VRAM: Zero-Copy CUDA Audio-to-LLM Bridge

A zero-copy, cross-hardware, low-level bridge from speech to large language models

CUDA: 11.0+ | Platform: Windows/Linux | Python: 3.8+

📖 Overview

Omni-VRAM is a high-performance, lightweight CUDA extension designed to eliminate VRAM fragmentation and memory transfer bottlenecks in real-time LLM (Large Language Model) audio applications.

Traditional Python-based audio pipelines and native PyTorch operations (such as torch.cat for KV-Cache updates) introduce significant overhead and non-deterministic latency, because each torch.cat call allocates a new tensor and copies the existing cache into it. Omni-VRAM addresses this with operator fusion and zero-copy memory injection implemented directly in CUDA kernels. In the benchmarks below, measured on an RTX 3060, the individual audio kernels run in roughly 0.3 to 1 ms, which brings real-time voice agents within reach of consumer-grade GPUs (e.g., RTX 30/40 series).

✨ Core Features

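Zero-Copy KV-Cache Appender: Bypasses PyTorch's dynamic memory reallocation (torch.cat) by pre-allocating a contiguous VRAM block and writing new token embeddings directly at the current offset. Existing entries are never moved, so the cost of an append is independent of the cache length ($O(1)$ in the number of cached tokens).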
Fused Audio Frontend: Performs Voice Activity Detection (VAD), Pre-emphasis, and Windowing (Hann) in a single CUDA kernel execution, eliminating intermediate VRAM allocations.
Hardware-Aware Radar: Scans the GPU architecture (sm_XX) and SM count at runtime to dispatch the best-suited computation strategy.
Whisper Multi-Backend: Supports faster-whisper (CTranslate2, recommended), whisper.cpp CLI, OpenAI API, and legacy Python whisper with automatic fallback chain.
Real-Time Streaming ASR: Sliding-window VAD-based speech recognition with concurrent worker threads, partial/final result callbacks, and GPU batch acceleration.
Web API Server: FastAPI-based REST + WebSocket server for transcription — file upload, base64 input, and real-time streaming endpoint.
Stream Processing: Chunk-based audio stream processor with built-in VAD, segment extraction, and callback-driven architecture.
Audio Format Utilities: Automatic format detection, sample rate conversion, stereo-to-mono, normalization, and WAV encoding.
Noise Reduction: STFT-based spectral subtraction with adaptive noise estimation — three presets (light / medium / aggressive), automatically applied in the stream processing pipeline.
Emotion Recognition: Real-time speech emotion detection (happy / sad / angry / neutral / surprised) based on acoustic features — energy, zero-crossing rate, pitch (F0), and temporal dynamics.
Speaker Diarization: MFCC-based speaker clustering with cosine similarity — identifies "who spoke when" without external speaker embedding models.
Multi-GPU Support: Pipeline, data, and tensor parallelism with round-robin load balancing, NVLink peer detection, and collective operations.
VRAM Optimizer: KV-Cache memory management with LRU eviction, auto-recovery on OOM, memory pressure monitoring, and dynamic batch sizing.
TTS Engine: Multi-backend text-to-speech synthesis — edge-tts (online, 300+ voices) and pyttsx3 (offline, cross-platform).
Voice Translator: Speech-to-speech translation pipeline — ASR → text translation (MarianMT / Google Translate) → TTS, with 50+ language pairs.
Audio Event Detection: Real-time classification of ambient sounds (speech, music, alarm, silence, noise) using YAMNet or energy-based analysis.
gRPC Server: High-performance gRPC + HTTP REST dual-protocol API server with streaming transcription support.
Plugin System: Extensible plugin architecture with discovery, loading, lifecycle management, and hook-based event system.
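
The LRU eviction listed under VRAM Optimizer above can be illustrated with a minimal sketch in plain Python. This is not Omni-VRAM's implementation: the class name `LRUCacheBudget`, the byte budget, and the session keys are hypothetical, and real KV-Cache blocks would live in GPU memory rather than in a dictionary.

```python
from collections import OrderedDict


class LRUCacheBudget:
    """Track cache entries by size and evict the least recently used ones.

    Hypothetical illustration only; not part of the vram_core API.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> size in bytes, oldest first

    def touch(self, key: str) -> None:
        """Mark an entry as recently used."""
        self._entries.move_to_end(key)

    def put(self, key: str, size_bytes: int) -> list:
        """Insert an entry and return the keys evicted to make room."""
        if size_bytes > self.capacity_bytes:
            raise ValueError("entry is larger than the whole budget")
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)
        evicted = []
        while self.used_bytes + size_bytes > self.capacity_bytes:
            old_key, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_key)
        self._entries[key] = size_bytes
        self.used_bytes += size_bytes
        return evicted


budget = LRUCacheBudget(capacity_bytes=100)
budget.put("session-a", 40)
budget.put("session-b", 40)
budget.touch("session-a")            # session-b is now the oldest entry
evicted = budget.put("session-c", 40)
print(evicted)                       # ['session-b']
```

Keeping the entries in insertion order and moving touched keys to the end makes both the lookup and the eviction step constant-time, which is why an ordered dictionary is a common choice for this policy.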

📁 Project Structure

Omni-VRAM/
├── vram_hacker.cu              # CUDA kernel source (KV-Cache injection)
├── setup.py                    # Build & install script
├── test_run.py                 # Quick integration test
├── .env.example                # Configuration template
│
├── vram_core/                  # Python core library
│   ├── __init__.py             # Package exports (v0.6.0)
│   ├── config.py               # Configuration management (.env loader)
│   ├── audio_utils.py          # Audio format detection & conversion
│   ├── whisper_bridge.py       # Whisper multi-backend integration
│   ├── stream_processor.py     # Real-time stream processor + VAD
│   ├── streaming_asr.py        # Real-time streaming ASR engine
│   ├── api_server.py           # FastAPI REST + WebSocket API
│   ├── noise_reduction.py      # STFT spectral subtraction noise reduction
│   ├── emotion_recognition.py  # Acoustic feature-based emotion recognition
│   ├── speaker_diarization.py  # MFCC speaker diarization & clustering
│   ├── multi_gpu.py            # Multi-GPU management & parallelism
│   ├── vram_optimizer.py       # KV-Cache VRAM optimization & OOM recovery
│   ├── tts_engine.py           # Multi-backend text-to-speech (edge-tts / pyttsx3)
│   ├── voice_translator.py     # Speech-to-speech translation pipeline
│   ├── audio_event_detection.py # Audio event detection (YAMNet / energy-based)
│   ├── grpc_server.py          # gRPC + HTTP REST dual-protocol server
│   └── plugin_manager.py       # Plugin discovery, loading & lifecycle
│
├── examples/                   # Example applications
│   ├── realtime_voice_assistant.py  # Real-time voice assistant
│   ├── meeting_transcriber.py       # Meeting transcription & summary
│   ├── voice_chat_bot.py            # Multi-turn voice chat bot
│   ├── benchmark_suite.py           # Performance benchmark suite
│   ├── api_demo.py                  # API server demo client
│   ├── test_whisper_local.py        # Whisper local test script
│   ├── test_emotion.py              # Emotion recognition test
│   └── test_tts_translator.py       # TTS & translator test
│
├── tests/                      # Unit tests
│   ├── test_audio_utils.py
│   ├── test_whisper_bridge.py
│   ├── test_stream_processor.py
│   ├── test_noise_reduction.py
│   ├── test_emotion_recognition.py
│   └── test_speaker_diarization.py
│
└── docs/                       # Documentation
    ├── installation.md
    ├── quickstart.md
    ├── api_reference.md
    ├── examples.md
    └── faq.md

🧩 Examples

Example	Description	Command
Real-time Voice Assistant	Microphone → VAD → Whisper → Display, with file recording	`python examples/realtime_voice_assistant.py`
Meeting Transcriber	Long-form recording with silence auto-segmentation and export	`python examples/meeting_transcriber.py --output meeting.txt`
Voice Chat Bot	Multi-turn dialogue with history tracking and LLM-ready architecture	`python examples/voice_chat_bot.py`
Benchmark Suite	Performance testing for all modules with Markdown report	`python examples/benchmark_suite.py --skip-whisper`
TTS & Translation	Text-to-speech and speech-to-speech translation test	`python examples/test_tts_translator.py`
Emotion Recognition	Speech emotion analysis demo	`python examples/test_emotion.py`

📊 Performance Benchmarks

Hardware: NVIDIA RTX 3060 (12GB) | Platform: Windows WDDM | CUDA: 12.1

1. KV-Cache Memory Injection

Task: Appending 100 updates (50 tokens each) to a 100,000-capacity KV-Cache tensor (Dimension: 4096).

Engine / Method	Latency	Complexity	OOM Risk
PyTorch Native (`torch.cat`)	90.32 ms	$O(N)$ (Reallocation)	High (VRAM Fragmentation)
Omni-VRAM (Zero-Copy)	8.07 ms	$O(1)$ (Pointer Offset)	None
Improvement	11.19x	-	-
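
The figures in this table can be sanity-checked with a few lines of arithmetic. The sketch below recomputes the speedup from the two reported latencies and estimates the size of the pre-allocated buffer, assuming float32 (4-byte) elements as in the Quick Start example; the variable names are illustrative.

```python
# Latencies reported in the table above (milliseconds).
torch_cat_ms = 90.32
omni_vram_ms = 8.07
speedup = torch_cat_ms / omni_vram_ms
print(f"speedup: {speedup:.2f}x")                      # speedup: 11.19x

# Size of the pre-allocated cache: capacity x dimension x bytes per element.
capacity = 100_000
hidden_dim = 4096
bytes_per_float32 = 4                                  # assumption: float32 storage
cache_bytes = capacity * hidden_dim * bytes_per_float32
print(f"cache size: {cache_bytes / 2**30:.2f} GiB")    # cache size: 1.53 GiB

# Rows actually written by the benchmark: 100 updates of 50 tokens each.
rows_written = 100 * 50
print(f"fill ratio: {rows_written / capacity:.0%}")    # fill ratio: 5%
```

The pre-allocation trades a fixed reservation of about 1.5 GiB for never having to reallocate or recopy the cache during appends.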

2. Audio Processing Pipeline

Pipeline Stage	Input Size	PyTorch / CPU Baseline	Omni-VRAM C++ Kernel	Speedup
Concurrent VAD	10 Minutes (16kHz)	9.45 ms (CPU `unfold`)	0.33 ms	~28x
Fused Frontend	60 Seconds (16kHz)	20.33 ms (VRAM Stacking)	1.05 ms	~19x
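
To show what "fused" means for the frontend, here is a CPU-side sketch in plain Python that applies pre-emphasis, a Hann window, and an energy-based VAD decision in a single pass over one frame, without building intermediate arrays for each stage. The pre-emphasis coefficient (0.97), the 25 ms frame length, and the energy threshold are assumptions chosen for illustration; the constants used by the actual CUDA kernel are not documented here.

```python
import math


def fused_frontend(frame, threshold, pre_emphasis=0.97):
    """Pre-emphasize, Hann-window, and energy-gate one frame in a single pass.

    Returns (is_speech, windowed_frame). All parameters are illustrative
    assumptions, not the values used by the Omni-VRAM kernel.
    """
    n = len(frame)
    windowed = [0.0] * n
    energy = 0.0
    previous = 0.0
    for i, sample in enumerate(frame):
        # Pre-emphasis: y[i] = x[i] - a * x[i - 1]
        emphasized = sample - pre_emphasis * previous
        previous = sample
        # Hann window: w[i] = 0.5 - 0.5 * cos(2*pi*i / (n - 1))
        weight = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) if n > 1 else 1.0
        value = emphasized * weight
        windowed[i] = value
        energy += value * value
    mean_energy = energy / n if n else 0.0
    return mean_energy > threshold, windowed


# A 25 ms frame at 16 kHz holds 400 samples.
frame_len = 16000 * 25 // 1000
tone = [0.5 * math.sin(2.0 * math.pi * 440.0 * i / 16000) for i in range(frame_len)]
silence = [0.0] * frame_len

speech_flag, features = fused_frontend(tone, threshold=1e-4)
silence_flag, _ = fused_frontend(silence, threshold=1e-4)
print(speech_flag, silence_flag, len(features))   # True False 400
```

For scale, the inputs in the table correspond to 9,600,000 samples (10 minutes at 16 kHz) and 960,000 samples (60 seconds at 16 kHz).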

3. Whisper Transcription (CPU)

Model	1s Audio	5s Audio	10s Audio
tiny	~200ms	~500ms	~900ms
base	~400ms	~1200ms	~2200ms

Run `python examples/benchmark_suite.py` for automated benchmarks on your hardware.
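
A convenient way to read the Whisper timings above is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. The sketch below computes it from the approximate figures in the table; the helper name `real_time_factor` is illustrative.

```python
# Approximate CPU timings from the table above: model -> {audio seconds: latency in ms}.
timings_ms = {
    "tiny": {1: 200, 5: 500, 10: 900},
    "base": {1: 400, 5: 1200, 10: 2200},
}


def real_time_factor(latency_ms: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return (latency_ms / 1000.0) / audio_seconds


for model, rows in timings_ms.items():
    for seconds, latency in rows.items():
        rtf = real_time_factor(latency, seconds)
        print(f"{model:>4} {seconds:>2}s audio: RTF {rtf:.2f}")
```

With these numbers the RTF falls as clips get longer (for example, from 0.40 to 0.22 for the base model), which suggests a fixed per-call overhead that is amortized over longer audio.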

🛠️ Installation

# Clone the repository
git clone https://github.com/Liangchenxu/Omni-VRAM.git
cd Omni-VRAM

# Install all dependencies (core + audio + faster-whisper)
pip install -r requirements.txt

# Build and install the CUDA extension
# Note: Ensure NVCC and a host C++ compiler (Visual Studio C++ Build Tools on Windows, GCC on Linux) are properly configured.
python setup.py install

# (Optional) Install Web API server dependencies
pip install fastapi uvicorn python-multipart

# (Optional) Install whisper.cpp for local transcription
# See docs/installation.md for detailed instructions

Configuration

# Copy the configuration template
cp .env.example .env

# Edit .env with your settings
# At minimum, set WHISPER_CPP_PATH and WHISPER_MODEL_PATH for local transcription

See docs/installation.md for a detailed installation guide.
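
As a rough picture of what a .env loader does with the two keys named above, here is a minimal parser sketch. It is not the code in vram_core/config.py, and the paths in the example are placeholders rather than defaults shipped with the project.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


example = """
# Placeholder paths for illustration only.
WHISPER_CPP_PATH=/opt/whisper.cpp/main
WHISPER_MODEL_PATH="/opt/whisper.cpp/models/ggml-base.bin"
"""

settings = parse_env(example)
required = ("WHISPER_CPP_PATH", "WHISPER_MODEL_PATH")
missing = [key for key in required if key not in settings]
print(settings["WHISPER_MODEL_PATH"], missing)
```

Checking for missing keys at startup gives a clear error before any transcription is attempted.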

💻 Quick Start

Basic CUDA Operations

import torch
import vram_core

# 1. Hardware Initialization
print(vram_core.scan_hardware_dna())

# 2. Fused Audio Processing
# 960,000 samples = 60 s of audio at 16 kHz (random data as a stand-in)
audio_stream = torch.randn(960000, device='cuda', dtype=torch.float32)
# Performs VAD, pre-emphasis, and windowing in ~1 ms
is_speaking, features = vram_core.smart_audio_listen(audio_stream, threshold=0.5)

# 3. Zero-Copy LLM KV-Cache Update
hidden_dim = 4096
max_seq_len = 100000
# Pre-allocate VRAM once
kv_cache = torch.zeros((max_seq_len, hidden_dim), device='cuda', dtype=torch.float32)
current_pos = torch.tensor([0], device='cuda', dtype=torch.int32)

if is_speaking.item():
    # Direct memory injection (0 reallocation overhead)
    new_tokens = torch.randn((50, hidden_dim), device='cuda', dtype=torch.float32)
    vram_core.append_to_kv_cache(kv_cache, new_tokens, current_pos)
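
The idea behind the pre-allocated append can be shown without a GPU. The sketch below is a hypothetical CPU-side analogue, not the vram_core implementation: it allocates a flat buffer once and writes new rows at a running offset, so existing rows are never copied. The class name `PreallocatedCache` and its methods are made up for illustration, and how the real `append_to_kv_cache` advances `current_pos` is not shown in this README.

```python
from array import array


class PreallocatedCache:
    """Fixed-capacity row buffer that appends by writing at an offset.

    Hypothetical CPU-side illustration; not the vram_core implementation.
    """

    def __init__(self, max_rows: int, dim: int) -> None:
        self.max_rows = max_rows
        self.dim = dim
        self.data = array("f", [0.0]) * (max_rows * dim)  # allocated once
        self.position = 0  # number of rows written so far

    def append(self, rows) -> int:
        """Copy new rows into the buffer and return the new write position."""
        count = len(rows)
        if self.position + count > self.max_rows:
            raise OverflowError("cache capacity exceeded")
        start = self.position * self.dim
        for offset, row in enumerate(rows):
            if len(row) != self.dim:
                raise ValueError("row has the wrong dimension")
            begin = start + offset * self.dim
            # Same-length slice assignment overwrites in place.
            self.data[begin:begin + self.dim] = array("f", row)
        self.position += count
        return self.position


cache = PreallocatedCache(max_rows=8, dim=4)
buffer_address_before, _ = cache.data.buffer_info()
cache.append([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
cache.append([[9.0, 10.0, 11.0, 12.0]])
buffer_address_after, _ = cache.data.buffer_info()

print(cache.position)                                   # 3
print(buffer_address_before == buffer_address_after)    # True: no reallocation
```

Note that the new rows are still copied once into the buffer, so the work per append grows with the number of new tokens; what is avoided is recopying everything already in the cache. The capacity check matters because a fixed buffer cannot grow.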

Whisper Transcription

from vram_core import WhisperBridge, WhisperBackend

# Initialize with automatic backend detection
whisper = WhisperBridge(
    backend=WhisperBackend.AUTO,
    whisper_model="base",
    language="zh",
)

# Transcribe an audio file
result = whisper.transcribe("audio.wav")
print(f"Text: {result.text}")
print(f"Confidence: {result.confidence}")
print(f"Duration: {result.audio_duration}s")
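
The automatic fallback behind `WhisperBackend.AUTO` is described in the feature list but not shown. One common way to build such a chain is to try each backend in priority order and move on when one is unavailable. The sketch below illustrates that pattern with stand-in functions; `BackendUnavailable`, `transcribe_with_fallback`, and the fake backends are hypothetical names, not part of the vram_core API.

```python
from typing import Callable, Sequence, Tuple


class BackendUnavailable(RuntimeError):
    """Raised by a backend that cannot run in the current environment."""


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, text) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(audio_path)
        except BackendUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no backend available: " + "; ".join(errors))


# Stand-in backends for the demonstration; real ones would call the actual libraries.
def fake_faster_whisper(path: str) -> str:
    raise BackendUnavailable("faster-whisper is not installed")


def fake_whisper_cpp(path: str) -> str:
    return f"transcript of {path}"


used, text = transcribe_with_fallback(
    "audio.wav",
    [("faster-whisper", fake_faster_whisper), ("whisper.cpp", fake_whisper_cpp)],
)
print(used, "->", text)   # whisper.cpp -> transcript of audio.wav
```

Collecting the per-backend error messages makes the final failure actionable when nothing in the chain is installed.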

Real-Time Stream Processing

import numpy as np
from vram_core import StreamProcessor, StreamConfig, WhisperBridge, WhisperBackend

# Initialize components
whisper = WhisperBridge(backend=WhisperBackend.AUTO, whisper_model="base")
config = StreamConfig(sample_rate=16000, chunk_duration_ms=100, vad_threshold=0.02)
processor = StreamProcessor(config=config, whisper_bridge=whisper)

# Set up callbacks
processor.on_transcription = lambda result: print(f"Transcribed: {result.text}")

# Feed audio chunks (e.g., from microphone)
audio_chunk = np.random.randn(1600).astype(np.float32)
processor.feed(audio_chunk)
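
The 1600-sample chunk in the example above follows from the configuration: 100 ms at 16 kHz. The sketch below makes that arithmetic explicit and shows one way to split a longer buffer into chunks of that size. The helper names are illustrative, and a plain list stands in for the captured audio.

```python
def chunk_size(sample_rate: int, chunk_duration_ms: int) -> int:
    """Number of samples in one chunk."""
    return sample_rate * chunk_duration_ms // 1000


def iter_chunks(samples, size: int):
    """Yield consecutive chunks of `size` samples; the last one may be shorter."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


size = chunk_size(sample_rate=16000, chunk_duration_ms=100)
print(size)                                   # 1600

one_second = [0.0] * 16000                    # placeholder for captured audio
chunks = list(iter_chunks(one_second, size))
print(len(chunks), len(chunks[-1]))           # 10 1600
```

In a real capture loop, each chunk would be converted to a float32 array and passed to `processor.feed(...)` as shown above.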

Streaming ASR (Real-time Microphone Transcription)

import numpy as np
from vram_core import WhisperBridge, WhisperBackend, StreamASR, StreamASRConfig

# Initialize whisper
whisper = WhisperBridge(backend=WhisperBackend.AUTO, whisper_model="base")

# Configure streaming ASR
config = StreamASRConfig(
    sample_rate=16000,
    vad_threshold=0.015,
    language="zh",
)
asr = StreamASR(config=config, whisper_bridge=whisper)

# Set up callbacks
asr.on_partial_result = lambda text: print(f"[Partial] {text}")
asr.on_final_result = lambda result: print(f"[Final] {result.text}")

# Start and feed audio
asr.start()
audio_chunk = np.random.randn(3200).astype(np.float32)  # from microphone
asr.feed(audio_chunk)
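
A VAD-driven recognizer has to decide where an utterance starts and ends before it can emit a final result. The sketch below shows one simple approach, an energy threshold with a short "hangover" so that brief pauses do not split a segment. It is an illustration under assumptions, not the StreamASR algorithm: the function name, the hangover length, and the interpretation of the threshold as per-frame energy are all hypothetical.

```python
def segment_speech(frame_energies, threshold, hangover_frames=2):
    """Group frames into speech segments using an energy threshold.

    A segment starts at the first frame above the threshold and ends once
    `hangover_frames` consecutive frames fall at or below it.
    Returns a list of (start_frame, end_frame_exclusive) pairs.
    """
    segments = []
    start = None
    quiet_run = 0
    for index, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = index
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                # End the segment at the first quiet frame of the run.
                segments.append((start, index - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended mid-speech: close the segment, dropping trailing quiet frames.
        segments.append((start, len(frame_energies) - quiet_run))
    return segments


energies = [0.0, 0.03, 0.04, 0.0, 0.05, 0.0, 0.0, 0.0, 0.02, 0.02]
print(segment_speech(energies, threshold=0.015))   # [(1, 5), (8, 10)]
```

The single quiet frame at index 3 is bridged by the hangover, while the longer pause starting at index 5 closes the first segment.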

Web API Server

# Start the API server
python vram_core/api_server.py --model base --language zh --port 8000

# Client: File upload transcription
import requests
with open("audio.wav", "rb") as f:
    resp = requests.post("http://localhost:8000/transcribe", files={"file": f})
    print(resp.json()["text"])

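# Client: WebSocket streaming
import asyncio
import websockets

async def stream(audio_bytes):
    async with websockets.connect("ws://localhost:8000/stream") as ws:
        await ws.send(audio_bytes)  # 16-bit PCM, 16 kHz mono
        result = await ws.recv()
        print(result)

# "audio.pcm" is a placeholder for a file holding raw 16-bit PCM samples
with open("audio.pcm", "rb") as f:
    asyncio.run(stream(f.read()))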
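
The WebSocket client above sends `audio_bytes` as 16-bit PCM, while the other examples work with float32 samples. The sketch below shows one way to convert between the two using only the standard library. Little-endian byte order and the 32767 scale factor are assumptions; the byte order expected by the server is not stated in this README.

```python
import struct


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


audio_bytes = float_to_pcm16([0.0, 0.5, -1.0, 2.0])
print(len(audio_bytes))                         # 8 (4 samples x 2 bytes)
print(struct.unpack("<4h", audio_bytes))        # (0, 16384, -32767, 32767)
```

Clipping before scaling keeps out-of-range samples (such as the 2.0 in the example) from overflowing the 16-bit range.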

See docs/quickstart.md for more examples.

⚠️ Disclaimer & Liability Waiver

Hardware Interaction Warning: Omni-VRAM interfaces directly with physical GPU hardware at the CUDA C++ level, employing aggressive zero-copy pointer manipulation to maximize throughput. While extensively tested, this software is provided "as is", without warranty of any kind. The authors shall NOT be held liable for any kernel panics, system freezes, data loss, or hardware instability resulting from the use of this engine. Use in production environments at your own risk.

📜 License

Released under the MIT License. You are free to use, modify, and distribute this software in both commercial and non-commercial projects, provided that the original copyright notice and this permission notice are included.

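🤝 Contributing

Contributions of any kind are welcome!

Fork the repository
Create your feature branch: git checkout -b feature/amazing-feature
Commit your changes: git commit -m 'feat: add amazing feature'
Push to the branch: git push origin feature/amazing-feature
Open a Pull Request

Please make sure that:

All unit tests pass: pytest tests/ -v
New features come with corresponding test cases
Your changes follow the project's code style

See CHANGELOG.md for the version history and docs/faq.md for frequently asked questions.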

Made with ❤️ by Liangchenxu

Release history

Version	Release date
2.5.0	Jun 18, 2026
2.2.1	Jun 16, 2026
2.1.1	Jun 16, 2026
2.1.0	Jun 15, 2026
2.0.0	Jun 15, 2026
1.1.0	Jun 14, 2026
1.0.0 (this version)	Jun 14, 2026
0.4.0	Jun 14, 2026

Download files

Distribution	File	Size	Uploaded	Tags
Source	omni_vram-1.0.0.tar.gz	110.7 kB	Jun 14, 2026	Source
Built (wheel)	omni_vram-1.0.0-cp310-cp310-win_amd64.whl	196.2 kB	Jun 14, 2026	CPython 3.10, Windows x86-64

File details

Details for the file omni_vram-1.0.0.tar.gz.

File metadata

Download URL: omni_vram-1.0.0.tar.gz
Upload date: Jun 14, 2026
Size: 110.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for omni_vram-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`366b028194da35c7ff598ffe33b4b6730f29511ef1cb0772e5720ed21abd3164`
MD5	`a15dabc48f7f6fb5d9fb3e8c9b640fda`
BLAKE2b-256	`75add48e9cf382e92575dd809e758337c30b5dc84f23b345b75e81d64a1abcaf`

File details

Details for the file omni_vram-1.0.0-cp310-cp310-win_amd64.whl.

File metadata

Download URL: omni_vram-1.0.0-cp310-cp310-win_amd64.whl
Upload date: Jun 14, 2026
Size: 196.2 kB
Tags: CPython 3.10, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for omni_vram-1.0.0-cp310-cp310-win_amd64.whl
Algorithm	Hash digest
SHA256	`720703fa3d41e0190ee14bcb9980698047cc14ff0211d732eb2485ca002ced39`
MD5	`e618fee25084a4ce0723c63fc4537a0e`
BLAKE2b-256	`14e5a387df64069f8923a9dae2c4e564d13e43ef6c04a84a3db48b7e1c6509cf`
