# GLM-ASR

All-in-One Speech Recognition Service based on GLM-ASR-Nano

Web UI • REST API • SSE Streaming • Swagger Docs
## ✨ Features
- 🎯 High Accuracy - Based on GLM-ASR-Nano-2512 (1.5B), outperforms Whisper V3
- 🌍 17 Languages - Chinese, English, Cantonese, Japanese, Korean, and more
- 🎤 Long Audio - VAD smart segmentation for unlimited audio length
- 🚀 SSE Streaming - Real-time progress and results for long audio
- 🖥️ Web UI - Modern dark-mode interface with support for 4 UI languages
- 🔌 REST API - Full API with Swagger documentation
- 💾 GPU Management - Manual load/unload for memory control
- 🐳 Docker Ready - One-command deployment with pre-loaded model
## 🚀 Quick Start

### Docker (Recommended)

```bash
docker run -d --gpus all -p 7860:7860 neosun/glm-asr:v2.0.1
```
Access:
- Web UI: http://localhost:7860
- Swagger Docs: http://localhost:7860/docs
- ReDoc: http://localhost:7860/redoc
### Docker Compose

```bash
git clone https://github.com/neosun100/glm-asr.git
cd glm-asr
docker compose up -d
```
## 📖 API Reference

### Base URL

```
http://localhost:7860
```

### Endpoints
#### Health Check

```http
GET /health
```

Response:

```json
{"status": "ok", "model_loaded": true}
```
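Because the model can be unloaded at runtime (see GPU Management below), a readiness probe should check both fields, not just the HTTP status. A minimal stdlib-only sketch — the helper names `is_ready` and `check_health` are ours, not part of the service:

```python
import json
import urllib.request


def is_ready(payload: dict) -> bool:
    """Interpret the /health response: ready only when the service
    is up AND the model is actually loaded."""
    return payload.get("status") == "ok" and payload.get("model_loaded") is True


def check_health(base_url: str = "http://localhost:7860") -> bool:
    """Probe the health endpoint once; returns False if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return is_ready(json.load(resp))
    except OSError:
        return False
```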
#### Transcribe (Sync) - For short audio

```http
POST /api/transcribe
Content-Type: multipart/form-data
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | File | required | Audio file (wav/mp3/flac/m4a/ogg/webm) |
| max_new_tokens | int | 512 | Max output tokens (1-2048) |
```bash
curl -X POST http://localhost:7860/api/transcribe \
  -F "file=@audio.mp3" \
  -F "max_new_tokens=512"
```

Response:

```json
{"status": "success", "text": "Transcribed text here..."}
```
#### Transcribe (SSE Stream) - For long audio

```http
POST /api/transcribe/stream
Content-Type: multipart/form-data
```

Returns Server-Sent Events with real-time progress:
| Event Type | Description | Example |
|---|---|---|
| `start` | Processing started | `{"type": "start"}` |
| `progress` | Segment progress | `{"type": "progress", "current": 3, "total": 10, "duration": 22.5}` |
| `partial` | Segment result | `{"type": "partial", "text": "Segment text..."}` |
| `done` | Complete | `{"type": "done", "text": "Full transcription..."}` |
| `error` | Error occurred | `{"type": "error", "message": "Error details"}` |
```bash
curl -X POST http://localhost:7860/api/transcribe/stream \
  -F "file=@long_audio.mp3"
```
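On the client side, each event above arrives as an SSE `data:` line carrying one JSON payload. A minimal consumer can be sketched as follows — it assumes one `data: <json>` line per event (the server's exact framing, e.g. extra `event:` fields, may differ), and the function names are ours:

```python
import json


def parse_sse(stream_lines):
    """Yield decoded event dicts from an iterable of SSE lines."""
    for line in stream_lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def collect_transcript(events):
    """Accumulate partial segments; prefer the final 'done' payload."""
    partials = []
    for ev in events:
        if ev["type"] == "partial":
            partials.append(ev["text"])
        elif ev["type"] == "done":
            return ev["text"]
        elif ev["type"] == "error":
            raise RuntimeError(ev["message"])
    # Stream ended without 'done': fall back to the joined partials.
    return "".join(partials)
```

In practice you would feed `parse_sse` the response lines of the streaming POST as they arrive, and update a progress bar from the `progress` events.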
#### GPU Status

```http
GET /gpu/status
```

```json
{
  "model_loaded": true,
  "device": "cuda",
  "gpu_memory_used_mb": 4320.5,
  "gpu_memory_total_mb": 24576.0
}
```
#### Load/Unload Model

```http
POST /gpu/load
POST /gpu/unload
```
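For batch jobs that should not hold GPU memory between runs, the load/unload pair fits a context-manager pattern: load on entry, always unload on exit. A sketch under that assumption (`model_loaded` and `_post` are our names; `post` is injectable so the flow can be tested without a server):

```python
import contextlib
import urllib.request


def _post(url: str) -> None:
    """Empty-body POST to a control endpoint."""
    urllib.request.urlopen(urllib.request.Request(url, data=b"", method="POST"))


@contextlib.contextmanager
def model_loaded(base_url: str = "http://localhost:7860", post=_post):
    """Load the model on entry, unload on exit to free GPU memory."""
    post(f"{base_url}/gpu/load")
    try:
        yield
    finally:
        post(f"{base_url}/gpu/unload")
```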
### Interactive Documentation
- Swagger UI: http://localhost:7860/docs
- ReDoc: http://localhost:7860/redoc
## ⚙️ Configuration

### Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MODEL_CHECKPOINT` | `zai-org/GLM-ASR-Nano-2512` | HuggingFace model path |
| `PORT` | `7860` | Service port |
| `HF_HOME` | `/app/cache` | Model cache directory |
### docker-compose.yml

```yaml
services:
  glm-asr:
    image: neosun/glm-asr:v2.0.1
    container_name: glm-asr
    ports:
      - "7860:7860"
    volumes:
      - ./cache:/app/cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
## 🏗️ Tech Stack
| Component | Technology |
|---|---|
| Model | GLM-ASR-Nano-2512 (1.5B) |
| Backend | FastAPI + Uvicorn |
| Streaming | Server-Sent Events (SSE) |
| Frontend | HTML5 + Vanilla JS |
| Container | Docker + NVIDIA CUDA |
| API Docs | Swagger / ReDoc |
## 📊 Benchmark

GLM-ASR-Nano achieves the lowest average error rate (4.10) among comparable models.
## 📝 Changelog

### v2.0.1 (2025-12-28)
- ✅ Migrated to FastAPI async framework
- ✅ SSE streaming for real-time progress
- ✅ Complete Swagger API documentation
- ✅ Dual API mode: sync + streaming
- ✅ Fixed browser timeout for long audio
- ✅ Modern dark UI with progress display
### v1.1.0 (2025-12-15)
- ✅ VAD smart segmentation (silero-vad)
- ✅ Support unlimited audio length
### v1.0.0 (2025-12-14)
- ✅ Initial release
- ✅ Web UI with 4 language support
- ✅ REST API with Swagger docs
- ✅ Docker all-in-one image
## 📄 License
## File details

Details for the file `iflow_mcp_neosun100_glm_asr-1.0.0.tar.gz`.

### File metadata
- Download URL: iflow_mcp_neosun100_glm_asr-1.0.0.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.28
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2c247f4567594cc0c21fc9b3703bef1ac712fc93665cf6f80bd191d914075852` |
| MD5 | `4ff4578c323c62d77bdee9c9b253228c` |
| BLAKE2b-256 | `5186b974f58db1ff4a470a7a849de9f1bcbf33049606bd4d1f1324d7c9b9f3e1` |
## File details

Details for the file `iflow_mcp_neosun100_glm_asr-1.0.0-py3-none-any.whl`.

### File metadata
- Download URL: iflow_mcp_neosun100_glm_asr-1.0.0-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.28
### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `76dcd03013bdf00d9d88e477b882bb6bf1ca0f8126edb4a13ca5af44e13dfe84` |
| MD5 | `2bdba77d26b6cff1a1deb58dbadfaa5a` |
| BLAKE2b-256 | `f093bdb0452511bb4406edb400e1943dd6a82638c04b11094c182ea7f9d00e3b` |