Package to utilize the speech to text API powered by AILabs.tw
AILabs ASR Python software development kit
Development Environment
- Python 3.9
# install portaudio first if you develop on macOS
brew install portaudio
pip install --global-option='build_ext' --global-option='-I/usr/local/include' --global-option='-L/usr/local/lib' -r requirements_dev.txt
# if you encounter issues while installing PyAudio,
# please check the PyAudio site: https://people.csail.mit.edu/hubert/pyaudio/
Installation
pip install ailabs-asr
Samples
# init the streaming client
asr_client = StreamingClient('api-key-applied-from-devconsole')
# start streaming with wav file
asr_client.start_streaming_wav(
pipeline='asr-zh-en-std',
file='voice.wav',
verbose=False, # enable verbose to show detailed recognition result
on_processing_sentence=on_processing_sentence,
on_final_sentence=on_final_sentence)
# omit the file argument to start streaming from the computer's microphone
asr_client.start_streaming_wav(
pipeline='asr-zh-en-std',
on_processing_sentence=on_processing_sentence,
on_final_sentence=on_final_sentence)
:bulb: The start_streaming_wav() method lets users provide callback functions to handle the recognition results; see the Message Format section for the result format
:bulb: Look up the available pipelines in the next section
:bulb: See more samples in the sample repository
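The samples above pass `on_processing_sentence` and `on_final_sentence` without defining them. A minimal sketch of what such callbacks might look like follows. It assumes each callback receives one `message` dict shaped like the results under Message Format; the `transcript` list is a hypothetical helper, not part of the SDK.

```python
# Hypothetical callbacks for start_streaming_wav().
# Assumption: the SDK calls each callback with a single dict argument
# that contains at least the "asr_sentence" key.

transcript = []  # hypothetical: completed sentences collected in order


def on_processing_sentence(message):
    """Show the partial text of the sentence still being recognized."""
    print(f"(partial) {message['asr_sentence']}")


def on_final_sentence(message):
    """Store and show a completed sentence."""
    transcript.append(message["asr_sentence"])
    print(f"(final)   {message['asr_sentence']}")
```

Define these before creating the client so they can be passed as the `on_processing_sentence` and `on_final_sentence` arguments.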
Supported Languages (pipeline)
| pipeline | Info | language |
|---|---|---|
| asr-zh-en-std | Use it when speakers speak Chinese more than English | Mandarin and English |
| asr-zh-tw-std | Use it when speakers speak Chinese and Taiwanese. | Mandarin and Taiwanese |
| asr-en-std | English | English |
| asr-jp-std | Japanese | Japanese |
Message Format
There are two kinds of recognition results:
The Processing Sentence (Segment)
{
"asr_sentence": "範例句子"
}
The Final Sentence (Complete Sentence)
{
"asr_final": true,
"asr_begin_time": 9.314,
"asr_end_time": 11.314,
"asr_sentence": "完整的範例句子",
"asr_confidence": 0.5263263653207881,
"asr_word_time_stamp": [
{
"word": "完整的",
"begin_time": 9.74021875,
"end_time": 10.100875
},
{
"word": "範例句子",
"begin_time": 10.100875,
"end_time": 10.1664375
}
],
"text_segmented": "完整的 範例句子"
}
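To illustrate how the two result kinds can be told apart, here is a small sketch that parses both sample messages and extracts the word timings. It assumes the results are available as JSON text; if the SDK already hands over dicts, skip the `json.loads` step. The helper names `is_final` and `word_spans` are hypothetical.

```python
import json

# Sample messages copied from the formats above.
PROCESSING_JSON = '{"asr_sentence": "範例句子"}'
FINAL_JSON = """
{
  "asr_final": true,
  "asr_begin_time": 9.314,
  "asr_end_time": 11.314,
  "asr_sentence": "完整的範例句子",
  "asr_word_time_stamp": [
    {"word": "完整的", "begin_time": 9.74021875, "end_time": 10.100875},
    {"word": "範例句子", "begin_time": 10.100875, "end_time": 10.1664375}
  ]
}
"""


def is_final(message):
    """A result is final only when it carries "asr_final": true."""
    return message.get("asr_final", False) is True


def word_spans(message):
    """Return (word, begin_time, end_time) tuples; empty for partial results."""
    return [
        (w["word"], w["begin_time"], w["end_time"])
        for w in message.get("asr_word_time_stamp", [])
    ]


processing = json.loads(PROCESSING_JSON)
final = json.loads(FINAL_JSON)

for word, begin, end in word_spans(final):
    print(f"{begin:.3f}s to {end:.3f}s: {word}")
```

Using `dict.get` with defaults keeps the same helpers usable on processing sentences, which carry only `asr_sentence`.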
Limitation
Audio Data
:warning: Send audio data as binary frames with the following spec:
- Audio data format
- 16kHz, mono
- 16 bits per sample
- PCM
- Sample rate: 16 kHz (16000 samples per second)
- Data rate: 16000 samples/sec x 16/8 (2 bytes per sample) = 32000 bytes/sec (~32 KB/sec)
- Each chunk size: 2000 bytes (1/16 sec of audio)
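The arithmetic in the spec above can be checked, and a WAV file cut into conforming chunks, with the standard-library `wave` module. This is only a sketch of the chunking step: `iter_pcm_chunks` is a hypothetical helper, and how the SDK actually sends the frames is not shown here.

```python
import wave

SAMPLE_RATE = 16000   # samples per second
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16 bits)
CHUNK_BYTES = 2000    # bytes per binary frame

BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH  # 32000
CHUNK_SECONDS = CHUNK_BYTES / BYTES_PER_SECOND            # 1/16 sec


def iter_pcm_chunks(wav_file, chunk_bytes=CHUNK_BYTES):
    """Yield raw PCM chunks from a WAV path or file object.

    Raises ValueError if the audio is not 16 kHz, mono, 16-bit.
    The last chunk may be shorter than chunk_bytes.
    """
    with wave.open(wav_file, "rb") as wav:
        actual = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if actual != (SAMPLE_RATE, CHANNELS, SAMPLE_WIDTH):
            raise ValueError("expected 16 kHz, mono, 16-bit PCM audio")
        frames_per_chunk = chunk_bytes // (CHANNELS * SAMPLE_WIDTH)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
```

Validating the format before sending avoids streaming audio that the service would not be able to decode as specified.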