Automatic speech recognition toolkit built on PyTorch

These details have not been verified by PyPI

Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- Chinese (Simplified)
Operating System
- OS Independent
Programming Language
Topic
- Utilities

Project description

MASR: Streaming and Non-Streaming Speech Recognition

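MASR is an automatic speech recognition framework built on PyTorch. Its full name is Magical Automatic Speech Recognition, and it aims to be a simple, practical speech recognition project. It can be deployed on servers and Nvidia Jetson devices, and support for mobile devices such as Android is planned.

You are welcome to join the QQ group for discussion by searching for group number 1169600237. The answer to the join question is the author's GitHub ID, yeyupiaoling.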

Environment used by this project:

Anaconda 3
Python 3.7
PyTorch 1.10.0
Windows 10 or Ubuntu 18.04

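Changelog

2022.08.27: Switched the fbank and MFCC preprocessing to a Kaldi-based implementation.
2022.08.22: Added the non-streaming models deepspeech2_no_stream and deepspeech2_big_no_stream.
2022.08.04: Released version 1.0 and optimized the real-time recognition pipeline.
2022.07.12: Completed real-time recognition of recordings in the GUI.
2022.06.14: Added support for the deepspeech2_big model, suited to training on the large WenetSpeech dataset.
2022.01.16: Added support for multiple preprocessing methods.
2022.01.15: Added support for English speech recognition.
2022.01.13: Added punctuation restoration for recognition results.
2021.12.26: Added support for installation via pip.
2021.12.25: Completed the initial version of the basic program.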

Model Downloads

This project supports the streaming models deepspeech2 and deepspeech2_big, and the non-streaming models deepspeech2_no_stream and deepspeech2_big_no_stream.

Model	Dataset	Preprocessing	Language	Test-set CER (WER for English)	Download
deepspeech2_big	WenetSpeech (10000 hours)	fbank	Chinese		Download
deepspeech2	aishell (179 hours)	fbank	Chinese	0.07321	Download
deepspeech2_big	aishell (179 hours)	fbank	Chinese	0.04879*	Download
deepspeech2_no_stream	aishell (179 hours)	fbank	Chinese	0.06518	Download
deepspeech2_big_no_stream	aishell (179 hours)	fbank	Chinese		Download
deepspeech2	aishell (179 hours)	linear	Chinese	0.07991	Download
deepspeech2_big	aishell (179 hours)	linear	Chinese	0.09148	Download
deepspeech2_no_stream	aishell (179 hours)	linear	Chinese	0.06865	Download
deepspeech2_big_no_stream	aishell (179 hours)	linear	Chinese		Download
deepspeech2	Librispeech (960 hours)	fbank	English		Download
deepspeech2_big	Librispeech (960 hours)	fbank	English		Download
deepspeech2_no_stream	Librispeech (960 hours)	fbank	English		Download
deepspeech2_big_no_stream	Librispeech (960 hours)	fbank	English		Download
deepspeech2	Very large dataset (1600+ hours of real data + 1300+ hours of synthetic data)	linear	Chinese	0.06215	Download (the model must be re-exported)
deepspeech2_big	Very large dataset (1600+ hours of real data + 1300+ hours of synthetic data)	linear	Chinese	0.05517	`star` the project first, then download
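The error rates in the table are edit-distance metrics: character error rate (CER) for Chinese and word error rate (WER) for English. The following minimal sketch computes both as the Levenshtein distance divided by the reference length. It illustrates the metric only and is not the project's eval.py. Ignoring spaces when computing CER is an assumption about text normalization.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; spaces are ignored (an assumed normalization)."""
    ref_chars = ref.replace(" ", "")
    hyp_chars = hyp.replace(" ", "")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)
```

For example, `cer("今天天气很好", "今天天汽好")` counts one substitution and one deletion against six reference characters, giving 2/6. Both functions assume a non-empty reference.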

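Notes:

The error rates above were computed with the eval.py program using the ctc_beam_search (beam search) decoding method.
Chinese decoding parameters: alpha=2.2, beta=4.3, beam_size=300, cutoff_prob=0.99, cutoff_top_n=40.
English decoding parameters: alpha=1.9, beta=0.3, beam_size=500, cutoff_prob=1.0, cutoff_top_n=40.
The aishell dataset uses its own official train/test split. All other datasets are split into training and test data at a fixed ratio set by the project.
Each downloaded archive already contains mean_std.npz and vocabulary.txt. Copy all of the extracted files into the project root directory.
Models whose names contain no_stream are non-streaming models and cannot be used for streaming recognition.
Entries marked with * used a model pre-trained on WenetSpeech as the starting point.
Because of limited compute, most models have not been trained for enough epochs. Contributions of better-trained models from anyone with spare compute are welcome.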
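DeepSpeech2-style CTC beam search decoders commonly interpret these parameters as follows:

- alpha weights the language-model log-probability.
- beta is a bonus added per emitted word, which offsets the language model's preference for short outputs.
- beam_size caps the number of hypotheses kept at each step.
- cutoff_prob and cutoff_top_n prune unlikely tokens from each frame before hypotheses are expanded.

The following sketch shows only the per-frame pruning rule and the hypothesis score under that interpretation. It is not the project's decoder, and the function names are hypothetical.

```python
from typing import List, Tuple


def prune_step(probs: List[float], cutoff_prob: float,
               cutoff_top_n: int) -> List[Tuple[int, float]]:
    """Keep the most likely tokens of one frame, in descending order, until their
    cumulative probability reaches cutoff_prob or cutoff_top_n tokens are kept."""
    ranked = sorted(enumerate(probs), key=lambda p: p[1], reverse=True)
    kept: List[Tuple[int, float]] = []
    cumulative = 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cumulative += p
        if cumulative >= cutoff_prob or len(kept) >= cutoff_top_n:
            break
    return kept


def hypothesis_score(log_p_ctc: float, log_p_lm: float, num_words: int,
                     alpha: float, beta: float) -> float:
    """Score of one beam hypothesis: acoustic (CTC) log-probability
    plus alpha-weighted LM log-probability plus a beta bonus per word."""
    return log_p_ctc + alpha * log_p_lm + beta * num_words


# Hypothetical frame over a 4-token vocabulary.
frame = [0.1, 0.6, 0.25, 0.05]
print(prune_step(frame, cutoff_prob=0.8, cutoff_top_n=40))  # tokens 1 and 2 survive
print(hypothesis_score(-3.0, -10.0, 4, alpha=2.2, beta=4.3))
```

A larger alpha trusts the language model more. A larger beta pushes the decoder toward longer transcripts. With cutoff_prob=1.0, as in the English settings, pruning is governed by cutoff_top_n alone.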

Questions are welcome. Please open an issue.

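Documentation and Tutorials

Quick Prediction

Download a model provided by the author or train your own, then export the model. Run infer_path.py to recognize an audio file, passing its path with the --wav_path parameter. For details, see Model Deployment.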

python infer_path.py --wav_path=./dataset/test.wav

Output:

-----------  Configuration Arguments -----------
alpha: 1.2
beam_size: 10
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_greedy
enable_mkldnn: False
is_long_audio: False
lang_model_path: ./lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: ./dataset/mean_std.npz
model_dir: ./models/infer/
to_an: True
use_gpu: True
use_tensorrt: False
vocab_path: ./dataset/zh_vocab.txt
wav_path: ./dataset/test.wav
------------------------------------------------
消耗时间：132, 识别结果: 近几年不但我用书给女儿儿压岁也劝说亲朋不要给女儿压岁钱而改送压岁书, 得分: 94
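The run above used decoding_method: ctc_greedy. Greedy (best-path) CTC decoding works in three steps:

1. Pick the most likely token in each frame.
2. Merge consecutive repeats.
3. Drop the blank token.

ctc_beam_search instead keeps several candidate paths and scores them with the language model given by lang_model_path. The following minimal sketch takes frame-wise probabilities as nested lists and assumes the blank token sits at index 0. The project's actual blank index and vocabulary layout may differ.

```python
from typing import List, Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      vocabulary: List[str], blank: int = 0) -> str:
    """Best-path CTC decoding over a (frames x vocabulary) probability matrix."""
    # 1. Most likely token index in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    tokens = []
    prev = None
    for idx in best_path:
        # 2 and 3. Emit a token only when it differs from the previous frame
        # (merging repeats) and is not the blank.
        if idx != prev and idx != blank:
            tokens.append(vocabulary[idx])
        prev = idx
    return "".join(tokens)
```

A blank between two identical tokens keeps them separate. For example, the frame-wise best path 你 你 blank 好 好 blank 好 decodes to 你好好.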

Long Audio Prediction

python infer_path.py --wav_path=./dataset/test_vad.wav --is_long_audio=True
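Long-audio prediction is typically done by splitting the recording into shorter voiced segments and recognizing each one in turn. The test_vad.wav file name hints at voice activity detection (VAD), but the method is not described here. The following sketch illustrates the idea with a simple per-frame RMS-energy threshold, not the project's VAD. The function name, threshold and frame sizes are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def split_on_silence(samples: Sequence[float], sample_rate: int = 16000,
                     frame_ms: int = 30, threshold: float = 0.01,
                     min_silence_frames: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of voiced segments.

    A frame is voiced when its RMS energy reaches `threshold`; a segment is
    closed after `min_silence_frames` consecutive quiet frames and ends at
    the last voiced frame."""
    frame_len = sample_rate * frame_ms // 1000
    segments: List[Tuple[int, int]] = []
    start = None          # first sample of the open segment, if any
    last_voiced_end = 0   # end sample of the most recent voiced frame
    silence_run = 0       # consecutive quiet frames inside the open segment
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = offset
            last_voiced_end = offset + len(frame)
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                segments.append((start, last_voiced_end))
                start = None
                silence_run = 0
    if start is not None:  # close a segment still open at the end
        segments.append((start, last_voiced_end))
    return segments
```

Each returned segment could then be passed to the recognizer separately and the partial transcripts joined. A real VAD is usually more robust to background noise than a fixed energy threshold.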

Web Deployment

Recording test page

GUI Deployment

GUI interface

Release history

2.3.8: May 1, 2024
2.3.7: Apr 27, 2024
2.3.6: Sep 7, 2023
2.3.5: Mar 22, 2023
2.3.4: Mar 17, 2023
2.3.3: Feb 21, 2023
2.3.2: Feb 19, 2023
2.3.1: Feb 15, 2023
2.3.0: Jan 28, 2023
2.2.0: Jan 16, 2023
2.1.1: Jan 3, 2023
2.1.0: Dec 29, 2022
2.0.1: Dec 3, 2022
2.0.0: Dec 3, 2022
1.2.1: Oct 26, 2022
1.1.9: Oct 12, 2022
1.1.8: Oct 11, 2022
1.1.7: Oct 9, 2022
1.1.6: Oct 2, 2022
1.1.5: Sep 25, 2022
1.1.4: Sep 18, 2022
1.1.3: Sep 11, 2022
1.1.2: Sep 3, 2022 (this version)
1.1.0: Aug 23, 2022
1.0.0: Aug 4, 2022
0.1.7: Aug 3, 2022
0.1.6: Jun 14, 2022
0.1.5: Mar 2, 2022
0.1.4: Jan 20, 2022
0.1.3: Jan 13, 2022
0.1.2: Jan 10, 2022
0.1.1: Jan 9, 2022
0.1.0: Dec 26, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

masr-1.1.2-py3-none-any.whl (70.0 kB)

Uploaded Sep 3, 2022, Python 3

Hashes for masr-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`67978d8920bd18e668579e8a3a9010421cd01e0561cf186f076c51664629ff1b`
MD5	`60fbf4a1e2ffcce86f8ecbcf4ac6631c`
BLAKE2b-256	`5d547ef57aaede0117729ce0d496fd536b5d934b98787cd3a5ef8b88a033b804`