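An audio download and speech-to-text CLI tool. Supports Bilibili, YouTube, and other video platforms as well as direct audio links, with a choice of Simplified or Traditional Chinese output.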
audio-to-text-cli
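A command-line tool that downloads audio and transcribes speech to text.
Give it a link and it downloads the audio and transcribes it to text. The spoken language is detected automatically, and Chinese output can be switched between Simplified and Traditional.
Supported audio sources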
| Source | Example |
|---|---|
| Bilibili | https://www.bilibili.com/video/BVxxxxx |
| YouTube | https://www.youtube.com/watch?v=xxxxx |
| Douyin | https://www.douyin.com/video/xxxxx |
| TikTok | https://www.tiktok.com/@user/video/xxxxx |
| Twitter/X | https://x.com/user/status/xxxxx |
| Vimeo | https://vimeo.com/xxxxx |
| Direct audio link | https://example.com/audio.mp3 |
| Other | Any platform supported by yt-dlp |
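The table separates direct audio links from platform pages. A minimal sketch of how that distinction could be made is shown here. The function name `is_direct_audio_url` and the extension list are assumptions for illustration, not taken from the tool's source.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of extensions treated as direct audio links (illustrative only).
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".opus"}


def is_direct_audio_url(url: str) -> bool:
    """Return True when the URL path ends in a known audio file extension."""
    path = urlparse(url).path  # drops the query string and fragment
    return PurePosixPath(path).suffix.lower() in AUDIO_EXTENSIONS


if __name__ == "__main__":
    for url in (
        "https://example.com/audio.mp3",
        "https://www.bilibili.com/video/BVxxxxx",
    ):
        kind = "direct link" if is_direct_audio_url(url) else "platform page"
        print(f"{url} -> {kind}")
```

Under this sketch, a direct link would go to a plain HTTP download and everything else would be handed to yt-dlp.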
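System requirements
- Python 3.10+ (the only prerequisite)
- No separate FFmpeg installation: a dependency bundles it
- No C/C++ compiler: every dependency ships prebuilt packages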
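As a rough illustration of why no system FFmpeg is needed, the sketch below passes the binary bundled with imageio-ffmpeg to yt-dlp. The helper names `build_ydl_options` and `download_audio`, the MP3 target codec, and the output template are assumptions; the tool's real options may differ.

```python
def build_ydl_options(ffmpeg_path: str, download_dir: str) -> dict:
    """Build yt-dlp options that extract audio using the given ffmpeg binary."""
    return {
        "format": "bestaudio/best",                    # prefer an audio-only stream
        "outtmpl": f"{download_dir}/%(id)s.%(ext)s",   # assumed naming scheme
        "ffmpeg_location": ffmpeg_path,                # use the bundled binary
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"},
        ],
        "quiet": True,
    }


def download_audio(url: str, download_dir: str) -> None:
    """Download and extract audio; needs yt-dlp and imageio-ffmpeg installed."""
    import imageio_ffmpeg
    import yt_dlp

    options = build_ydl_options(imageio_ffmpeg.get_ffmpeg_exe(), download_dir)
    with yt_dlp.YoutubeDL(options) as ydl:
        ydl.download([url])
```

The third-party imports are kept inside `download_audio` so the option builder can be inspected without the packages installed.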
Supported operating systems
| Platform | Architecture | Supported |
|---|---|---|
| Windows | x86-64 | ✅ |
| Linux | x86-64 | ✅ |
| Linux | ARM64 | ✅ |
| macOS | Intel (x86-64) | ✅ |
| macOS | Apple Silicon (M series) | ✅ |
GPU acceleration (optional)
The tool runs on the CPU by default and needs no extra configuration. GPU acceleration requires installing:
- NVIDIA CUDA 12
- cuDNN 9 for CUDA 12
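A minimal sketch of how a CPU/GPU choice could look with faster-whisper follows. The helper names and the compute types (`float16` on GPU, `int8` on CPU) are assumptions for illustration; the tool may select its device differently.

```python
def pick_device(cuda_device_count: int) -> tuple[str, str]:
    """Map the number of visible CUDA devices to (device, compute_type)."""
    if cuda_device_count > 0:
        return "cuda", "float16"  # assumed GPU precision
    return "cpu", "int8"          # assumed CPU quantization


def load_model(model_size: str = "large-v3"):
    """Load a Whisper model on the GPU when one is visible, otherwise on the CPU."""
    import ctranslate2
    from faster_whisper import WhisperModel

    device, compute_type = pick_device(ctranslate2.get_cuda_device_count())
    return WhisperModel(model_size, device=device, compute_type=compute_type)
```

Without the CUDA and cuDNN libraries listed above, loading on `cuda` is expected to fail, which is why a CPU fallback is the safer default.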
Installation
Install from PyPI:
pip install audio-to-text-cli
If the audio-to-text command is not found after installation, use python -m audio_to_text.cli instead, or add Python's Scripts directory to your system PATH.
Or install from source:
git clone https://github.com/afuaide/audio-to-text-cli.git
cd audio-to-text-cli
pip install -e .
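Usage
Installation registers the audio-to-text command:
# Basic usage: detect the language automatically; Chinese output defaults to Simplified
audio-to-text "https://www.bilibili.com/video/BV1wqLV6JEga/"
# Use a smaller model (faster, slightly less accurate)
audio-to-text -m small "https://www.youtube.com/watch?v=xxxxx"
# Specify the language (skips auto-detection, faster)
audio-to-text -l zh "URL"
# Write the result to a file
audio-to-text -o result.txt "URL"
# Output Chinese in Traditional characters
audio-to-text -c traditional "URL"
# Skip Simplified/Traditional conversion and keep the model's raw output
audio-to-text -c raw "URL"
# Keep the downloaded audio file
audio-to-text --keep-audio "URL"
# Set the audio download directory
audio-to-text -d ./downloads "URL"
You can also run it through python -m:
python -m audio_to_text.cli "URL"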
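To call the tool from a script, one option is to drive the CLI through `subprocess`. The sketch below uses only the flags shown above; the helper names `build_command` and `transcribe_to_file` are hypothetical.

```python
import subprocess
import sys


def build_command(url, model=None, language=None, output=None):
    """Assemble the argument list for `python -m audio_to_text.cli`."""
    cmd = [sys.executable, "-m", "audio_to_text.cli"]
    if model:
        cmd += ["-m", model]
    if language:
        cmd += ["-l", language]
    if output:
        cmd += ["-o", output]
    cmd.append(url)
    return cmd


def transcribe_to_file(url, output, **options):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(url, output=output, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.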
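Parameters
| Parameter | Description | Default |
|---|---|---|
| url | Audio URL or video platform link | Required |
| -m, --model | Whisper model size: tiny / base / small / medium / large-v3 | large-v3 |
| -l, --language | Language code (zh Chinese, en English, ja Japanese, ko Korean, etc.) | Auto-detect |
| -c, --chinese | Chinese output format: simplified / traditional / raw (no conversion) | simplified |
| -o, --output | Path to save the transcription | Print to terminal |
| -d, --download-dir | Audio download directory | Temporary directory |
| --keep-audio | Keep the downloaded audio file | No |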
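The parameter table maps directly onto an `argparse` definition. The following is a reconstruction from the table for illustration only, not the tool's actual source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented parameters and defaults."""
    parser = argparse.ArgumentParser(
        prog="audio-to-text",
        description="Download audio and transcribe it to text.",
    )
    parser.add_argument("url", help="audio URL or video platform link")
    parser.add_argument(
        "-m", "--model", default="large-v3",
        choices=["tiny", "base", "small", "medium", "large-v3"],
        help="Whisper model size",
    )
    parser.add_argument("-l", "--language", default=None,
                        help="language code; auto-detected when omitted")
    parser.add_argument(
        "-c", "--chinese", default="simplified",
        choices=["simplified", "traditional", "raw"],
        help="Chinese output format",
    )
    parser.add_argument("-o", "--output", default=None,
                        help="output file; prints to the terminal when omitted")
    parser.add_argument("-d", "--download-dir", default=None,
                        help="download directory; a temporary one when omitted")
    parser.add_argument("--keep-audio", action="store_true",
                        help="keep the downloaded audio file")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-m", "small", "--keep-audio", "URL"]))
```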
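Model selection
| Model | Size | Speed | Accuracy | Suggested use |
|---|---|---|---|---|
| tiny | ~75MB | Extremely fast | Fair | Quick preview |
| base | ~150MB | Very fast | Acceptable | Everyday use |
| small | ~500MB | Fast | Good | Recommended starting point |
| medium | ~1.5GB | Medium | Very good | Accuracy first |
| large-v3 | ~3GB | Slower | Best | Highest accuracy |
The first time you use a model, it is downloaded automatically from HuggingFace and then cached locally.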
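For orientation, this is roughly what loading a model and transcribing looks like with faster-whisper directly. It is a sketch under assumptions: the helper names `format_segment` and `transcribe` and the timestamped output format are illustrative and not the tool's own behavior.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcribed segment with its start and end timestamps."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(audio_path: str, model_size: str = "small", language=None):
    """Transcribe a local audio file; needs faster-whisper installed."""
    from faster_whisper import WhisperModel

    # The model is fetched on first use and read from the local cache afterwards.
    model = WhisperModel(model_size)
    # language=None lets the model detect the spoken language.
    segments, info = model.transcribe(audio_path, language=language)
    print(f"Language: {info.language} (p={info.language_probability:.2f})")
    # `segments` is a generator; decoding happens while it is iterated.
    return [format_segment(s.start, s.end, s.text) for s in segments]
```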
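Dependencies
| Dependency | Purpose |
|---|---|
| yt-dlp | Extracts audio from video platforms |
| faster-whisper | Speech-to-text engine (runs locally and offline) |
| requests | Downloads direct audio links |
| imageio-ffmpeg | Bundles ffmpeg for audio format conversion |
| opencc-python-reimplemented | Converts between Simplified and Traditional Chinese |
All dependencies are installed through pip install; no additional system software is required.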
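The Simplified/Traditional switch can be pictured as a thin layer over OpenCC. In this sketch the mapping from the `-c` values to the OpenCC configurations `t2s` and `s2t` is an assumption, and the helper names are hypothetical.

```python
def opencc_config_for(mode: str):
    """Map a -c/--chinese value to an OpenCC config name, or None for raw."""
    configs = {"simplified": "t2s", "traditional": "s2t", "raw": None}
    return configs[mode]  # raises KeyError for an unknown mode


def convert_chinese(text: str, mode: str = "simplified") -> str:
    """Convert text to the requested script; 'raw' returns it unchanged."""
    config = opencc_config_for(mode)
    if config is None:
        return text
    from opencc import OpenCC  # provided by opencc-python-reimplemented

    return OpenCC(config).convert(text)
```

Converting to Simplified by default makes sense here because Whisper's Chinese output can mix both scripts.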