ailia AI Speech
ailia AI Speech Python API
!! CAUTION !! “ailia” IS NOT OPEN SOURCE SOFTWARE (OSS). As long as the user complies with the conditions stated in the License Document, the user may use the Software free of charge, but the Software is basically paid software.
About ailia AI Speech
ailia AI Speech is a library for performing speech recognition using AI. It provides a C API for native applications, as well as a C# API well suited for Unity applications. Using ailia AI Speech, you can easily integrate AI-powered speech recognition into your applications.
Install from pip
You can install the ailia AI Speech free evaluation package with the following command.
pip3 install ailia_speech
Install from package
You can install ailia AI Speech from the package with the following commands.
python3 bootstrap.py
pip3 install .
Usage
Batch mode
In batch mode, the entire audio is transcribed at once.
import ailia_speech
import librosa
import os
import urllib.request
# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
    urllib.request.urlretrieve(
        "https://github.com/ailia-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
        "demo.wav"
    )
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)
# Model Initialize
speech = ailia_speech.Whisper()
model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO
# When using SenseVoice
#speech = ailia_speech.SenseVoice()
#model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL
# Infer
speech.initialize_model(model_path = "./models/", model_type = model_type)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
    print(text)
Step mode
In step mode, the audio is input in chunks and transcribed sequentially.
import ailia_speech
import librosa
import os
import urllib.request
# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
    urllib.request.urlretrieve(
        "https://github.com/ailia-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
        "demo.wav"
    )
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)
# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
speech.set_silent_threshold(silent_threshold = 0.5, speech_sec = 1.0, no_speech_sec = 0.5)
for i in range(0, audio_waveform.shape[0], sampling_rate):
    complete = False
    if i + sampling_rate >= audio_waveform.shape[0]:
        complete = True
    recognized_text = speech.transcribe_step(audio_waveform[i:min(audio_waveform.shape[0], i + sampling_rate)], sampling_rate, complete)
    for text in recognized_text:
        print(text)
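The chunking logic in the loop above can be factored into a small generator. The following is a minimal sketch, assuming only that the waveform supports len() and slicing (as a NumPy array or a list does). The name iter_chunks is a hypothetical helper and is not part of ailia_speech.

```python
def iter_chunks(waveform, chunk_size):
    """Yield (chunk, complete) pairs; complete is True only for the last chunk.

    iter_chunks is a hypothetical helper, not an ailia_speech function.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(waveform)
    for start in range(0, total, chunk_size):
        end = min(total, start + chunk_size)
        # Same condition as the step-mode loop: the chunk reaches the end of the audio.
        yield waveform[start:end], end >= total


# Stand-in data: 5 samples split into chunks of 2 samples each.
chunks = list(iter_chunks([0.1, 0.2, 0.3, 0.4, 0.5], 2))
```

With the step-mode example, each pair would be passed on as speech.transcribe_step(chunk, sampling_rate, complete), using chunk_size = sampling_rate to feed one second of audio per step.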
Diarization mode
By specifying diarization_type, speaker diarization can be performed. When speaker diarization is enabled, speaker_id becomes valid.
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO, diarization_type = ailia_speech.AILIA_SPEECH_DIARIZATION_TYPE_PYANNOTE_AUDIO)
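Once speaker_id is valid, the recognized segments can be grouped per speaker. The sketch below is illustrative only: it assumes each result is a dict with "speaker_id" and "text" keys, which this document does not confirm, and it runs on stand-in data rather than real transcription output. The name group_by_speaker is a hypothetical helper.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Collect segment texts per speaker, preserving order within each speaker."""
    grouped = defaultdict(list)
    for segment in segments:
        # "speaker_id" and "text" are assumed key names, not confirmed by the source.
        grouped[segment["speaker_id"]].append(segment["text"])
    return dict(grouped)


# Stand-in results; real results would come from speech.transcribe(...).
sample_segments = [
    {"speaker_id": 0, "text": "Hello."},
    {"speaker_id": 1, "text": "Hi there."},
    {"speaker_id": 0, "text": "How are you?"},
]
by_speaker = group_by_speaker(sample_segments)
```

If the actual result objects use different field names, only the two dictionary lookups inside the loop need to change.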
Available model types
You can choose among several models that trade off accuracy and speed. LARGE_V3_TURBO is the recommended choice.
Whisper
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO
SenseVoice
ailia_speech.AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL
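When the model size comes from a configuration file or a command-line option, the constant name can be built from a short size string. This is a minimal sketch based only on the constant names listed above; whisper_model_constant_name and WHISPER_SIZES are hypothetical names, not part of ailia_speech.

```python
# Size suffixes taken from the Whisper constants listed above.
WHISPER_SIZES = ("TINY", "BASE", "SMALL", "MEDIUM", "LARGE", "LARGE_V3", "LARGE_V3_TURBO")


def whisper_model_constant_name(size="LARGE_V3_TURBO"):
    """Return the name of the ailia_speech constant for a Whisper model size.

    Hypothetical helper: it only builds and validates the name as a string.
    """
    key = size.upper()
    if key not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size {size!r}; expected one of {WHISPER_SIZES}")
    return "AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_" + key


name = whisper_model_constant_name("small")
```

With the package installed, the value itself could then be looked up with getattr(ailia_speech, name) and passed as model_type to initialize_model.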
Available VAD versions
By default, version "4" of SileroVAD is used. Supported versions are "4", "5", "6", and "6_2".
speech.initialize_model(model_path = "./models/", vad_type = ailia_speech.AILIA_SPEECH_VAD_TYPE_SILERO, vad_version = "6_2")
API specification
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file ailia_speech-1.5.2.tar.gz.
File metadata
- Download URL: ailia_speech-1.5.2.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40500ade8ddaee23eb7be7164a03eb9b3403bba86ddc8dad0ab3688cc0e47e18 |
| MD5 | 143d80fafe52829aea1af4a57014975c |
| BLAKE2b-256 | 097b06b29bc2cc225ff98a45b08b8ef32efa5978aa206361ec995a9582e6144b |
File details
Details for the file ailia_speech-1.5.2-py3-none-any.whl.
File metadata
- Download URL: ailia_speech-1.5.2-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fa3afe929c27014787940dc0c62a0a36c8a6dddd514a0997d1f2740d1ea642a |
| MD5 | 9b2f0abc325a74a286fe9769d452f61a |
| BLAKE2b-256 | 5a892de10f8d11392c315b035a1b2af46c7cfc4d4fa8f5987b6408122c964c09 |