Muggle Speech
Project description
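MUGGLE-SPEECH - Muggle Chinese Speech Recognition
Note: MUGGLE-OCR was originally the SDK companion to the Captcha-Trainer project. After several rounds of refactoring and iteration, the new framework is ready and will return in the near future.
MUGGLE-SPEECH is an end-to-end speech recognition model based on the Transformer architecture, deployed with ONNX. This is an early experiment in speech recognition, and training data is limited. The freely available public datasets used so far are AISHELL-1, AISHELL-3, and MAGICDATA. A request for AISHELL-2 was turned down, sadly, because it is not made available to individuals. Anyone willing to contribute datasets is very welcome; the hope is to give a little back to the community.
The training code will be open-sourced once it has been integrated into the MUGGLE-DL framework. This is my first attempt at speech recognition, so there are still many shortcomings, and they will be improved gradually.
Python SDK usage: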
import time

from muggle_speech import MuggleSpeech

sdk = MuggleSpeech(mode='wave')

# The input file must be in WAV format.
if __name__ == '__main__':
    # Run the same prediction repeatedly to observe per-call latency.
    for _ in range(1000):
        st = time.time()
        # Option 1: load from bytes
        # wav_bytes = open(r"test.wav", "rb").read()
        # inputs = sdk.from_bytes(wav_bytes)
        # Option 2: load from a file
        inputs = sdk.from_file(r"test.wav")
        predict_text = sdk.predict_file(inputs)
        # Print the recognized text and the elapsed time in seconds.
        print(predict_text, time.time() - st)
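Since the SDK only accepts WAV input, it can help to inspect a file before passing it on. The following is a minimal sketch that uses only Python's standard `wave` module; the helper names `describe_wav` and `make_silence` are hypothetical and not part of muggle-speech, and the 16000 Hz rate is an arbitrary demo value, not a documented requirement of the model.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Parse WAV bytes with the standard library and return basic properties.

    Raises wave.Error (or EOFError for truncated data) if the bytes are not
    a readable PCM WAV file.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, used here only as demo input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silence()))
```

To check a real recording, read it with `open("test.wav", "rb").read()` and pass the bytes to `describe_wav`.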
Test results:
On CPU, recognizing 12 characters takes roughly 80-100 ms.
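A latency figure like this one can be reproduced with a small timing harness. The sketch below is one possible way to do it and is not the author's benchmark: `measure_latency_ms` and `fake_recognize` are hypothetical names, and `fake_recognize` merely sleeps so that the example runs without the SDK or a model.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(fn: Callable[[], object], runs: int = 20, warmup: int = 2) -> List[float]:
    """Call fn repeatedly and return the per-call wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_recognize() -> str:
    """Hypothetical stand-in for a recognition call; it only sleeps briefly."""
    time.sleep(0.005)
    return "placeholder"


if __name__ == "__main__":
    samples = measure_latency_ms(fake_recognize)
    print(f"mean {statistics.mean(samples):.1f} ms, "
          f"median {statistics.median(samples):.1f} ms")
```

To time the SDK itself, replace `fake_recognize` with a callable that wraps the calls from the usage example, for instance `lambda: sdk.predict_file(sdk.from_file("test.wav"))`. Excluding a few warm-up calls keeps one-time initialization cost out of the reported numbers.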
Installation
pip install muggle-speech
Discussion groups
857149419 (group 1, full)
934889548 (group 2)
Download files
Source Distribution
muggle_speech-0.1.1.tar.gz (65.4 MB)
File details
Details for the file muggle_speech-0.1.1.tar.gz.
File metadata
- Download URL: muggle_speech-0.1.1.tar.gz
- Upload date:
- Size: 65.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 587ea4e57d81a3f280c86becd4373766076e3181c55df97e60e293885835c656
MD5 | efbf19935a90be16d41d2d7766229b3c
BLAKE2b-256 | e9f37d5d3862afff11e1e1b4f58cb8cfebc47c2d30faf484ced6a686d32a5f43
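A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard `hashlib` module; the helper name `sha256_of_file` is hypothetical, and the script assumes the archive was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the file hashes listed for muggle_speech-0.1.1.tar.gz.
EXPECTED_SHA256 = "587ea4e57d81a3f280c86becd4373766076e3181c55df97e60e293885835c656"


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was downloaded.
    archive = Path("muggle_speech-0.1.1.tar.gz")
    if archive.exists():
        actual = sha256_of_file(archive)
        print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
    else:
        print(f"{archive} not found; download it first")
```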