ailia AI Speech

Project description

ailia AI Speech Python API

!! CAUTION !! “ailia” IS NOT OPEN SOURCE SOFTWARE (OSS). As long as the user complies with the conditions stated in the License Document, the user may use the Software free of charge, but the Software is basically paid software.

About ailia AI Speech

ailia AI Speech is a library for AI-based speech recognition. It provides a C API for native applications and a C# API well suited to Unity applications; this package provides the Python API. With ailia AI Speech, you can easily integrate AI-powered speech recognition into your applications.

Install from pip

You can install the ailia AI Speech free evaluation package with the following command.

pip3 install ailia_speech
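To confirm that the install succeeded, one option is to query the installed distribution version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of ailia AI Speech.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("ailia_speech"))
```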

Install from package

You can install ailia AI Speech from the package with the following commands.

python3 bootstrap.py
pip3 install .

Usage

Batch mode

In batch mode, the entire audio is transcribed at once.

import os
import urllib.request

import librosa

import ailia_speech

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/ailia-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Model Initialize
speech = ailia_speech.Whisper()
model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO

# When using sensevoice
#speech = ailia_speech.SenseVoice()
#model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL

# Infer
speech.initialize_model(model_path = "./models/", model_type = model_type)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
	print(text)

Step mode

In step mode, the audio is input in chunks and transcribed sequentially.

import os
import urllib.request

import librosa

import ailia_speech

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/ailia-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
speech.set_silent_threshold(silent_threshold = 0.5, speech_sec = 1.0, no_speech_sec = 0.5)
for i in range(0, audio_waveform.shape[0], sampling_rate):
	complete = False
	if i + sampling_rate >= audio_waveform.shape[0]:
		complete = True
	recognized_text = speech.transcribe_step(audio_waveform[i:min(audio_waveform.shape[0], i + sampling_rate)], sampling_rate, complete)
	for text in recognized_text:
		print(text)
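The chunking logic in the loop above can be isolated into a small generator, which makes the "last chunk" condition easier to check on its own. This is a sketch, not part of the ailia AI Speech API: the name `iter_chunks` is hypothetical, and it works on any sequence that supports `len()` and slicing (a NumPy array or a plain list).

```python
def iter_chunks(waveform, chunk_size):
    """Yield (chunk, complete) pairs; complete is True only for the final chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(waveform)
    for start in range(0, total, chunk_size):
        end = min(total, start + chunk_size)
        # Same condition as the step mode loop: the chunk that reaches
        # the end of the audio is flagged as complete.
        yield waveform[start:end], end >= total


# Five samples split into chunks of two; only the last pair is flagged.
for chunk, complete in iter_chunks([0.0, 0.1, 0.2, 0.3, 0.4], 2):
    print(chunk, complete)
```

With a chunk size equal to sampling_rate, each pair corresponds to one second of audio and can be passed to transcribe_step as chunk, sampling_rate, and complete.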

Diarization mode

By specifying diarization_type, speaker diarization can be performed. When speaker diarization is enabled, speaker_id becomes valid.

speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO, diarization_type = ailia_speech.AILIA_SPEECH_DIARIZATION_TYPE_PYANNOTE_AUDIO)

Available model types

Several models are available, trading off accuracy against speed. LARGE_V3_TURBO is the recommended choice.

Whisper

ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO

SenseVoice

ailia_speech.AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL
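When the model is chosen from a command-line flag or config file, a short name such as "whisper_multilingual_small" can be mapped to the constants listed above by attribute lookup. The sketch below is an assumption about how one might do this, not a library feature: `resolve_model_type` is a hypothetical helper, and a stand-in namespace with a placeholder value replaces the real module so the example runs without ailia_speech installed.

```python
from types import SimpleNamespace

MODEL_TYPE_PREFIX = "AILIA_SPEECH_MODEL_TYPE_"


def resolve_model_type(module, short_name):
    """Look up a model-type constant on `module` by its short name."""
    attribute = MODEL_TYPE_PREFIX + short_name.strip().upper()
    try:
        return getattr(module, attribute)
    except AttributeError:
        raise ValueError(f"unknown model type: {short_name!r}") from None


# Stand-in for the real module; the value 0 is a placeholder,
# not the library's actual constant.
fake_module = SimpleNamespace(AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL=0)
print(resolve_model_type(fake_module, "sensevoice_small"))
```

In real use, the first argument would be the imported ailia_speech module.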

Available VAD versions

By default, version "4" of SileroVAD is used. You can select version "4", "5", "6", or "6_2" with vad_version.

speech.initialize_model(model_path = "./models/", vad_type = ailia_speech.AILIA_SPEECH_VAD_TYPE_SILERO, vad_version = "6_2")
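Because the version is passed as a string, a typo or an integer such as 6 is easy to pass by mistake. A small guard like the following can catch that before the model is initialized. This is a sketch under the assumption that only the four versions listed above are accepted; `normalize_vad_version` is a hypothetical helper, not part of the library.

```python
SUPPORTED_VAD_VERSIONS = ("4", "5", "6", "6_2")


def normalize_vad_version(version="4"):
    """Return the version as a string, or raise if it is not a listed version."""
    text = str(version)
    if text not in SUPPORTED_VAD_VERSIONS:
        raise ValueError(
            f"unsupported VAD version {version!r}; "
            f"expected one of {SUPPORTED_VAD_VERSIONS}"
        )
    return text


# An integer is converted to the string form used in the example above.
print(normalize_vad_version(6))
```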

API specification

https://github.com/ailia-ai/ailia-sdk

Project details

Release history

Version	Release date
1.5.3 (this version)	Jul 10, 2026
1.5.2	Jan 20, 2026
1.5.1	Jan 7, 2026
1.5.0.1	Jan 1, 2026
1.5.0	Dec 25, 2025
1.4.0.1	Sep 18, 2025
1.4.0	Sep 15, 2025
1.3.2.3	Jan 5, 2025
1.3.2.2	Dec 2, 2024
1.3.2.1	Nov 1, 2024
1.3.2.0	Nov 1, 2024
1.3.1.1	Oct 12, 2024
1.3.1.0	Oct 12, 2024
1.3.0.5	Sep 25, 2024
1.3.0.4	Sep 25, 2024
1.3.0.3	Sep 25, 2024
1.3.0.2	Sep 25, 2024
1.3.0.1	Sep 24, 2024
1.3.0.0	Sep 24, 2024

Download files

Download the file for your platform.

Source Distribution

ailia_speech-1.5.3.tar.gz (1.5 MB view details)

Uploaded Jul 10, 2026 Source

Built Distribution

ailia_speech-1.5.3-py3-none-any.whl (1.5 MB view details)

Uploaded Jul 10, 2026 Python 3

File details

Details for the file ailia_speech-1.5.3.tar.gz.

File metadata

Download URL: ailia_speech-1.5.3.tar.gz
Upload date: Jul 10, 2026
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.13

File hashes

Hashes for ailia_speech-1.5.3.tar.gz
Algorithm	Hash digest
SHA256	`b8f2ecd7e660211c3a17e4483d16bdfc69981869628c0f2b29d8377f51a4c786`
MD5	`cdeb661a28f382052cb75f173a717472`
BLAKE2b-256	`fbd38c414a706d13277f8a0c519b814614df11cf3f55b8278713483e1d94e19e`
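A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_sha256` are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (assumes the file exists locally):
# matches_sha256(
#     "ailia_speech-1.5.3.tar.gz",
#     "b8f2ecd7e660211c3a17e4483d16bdfc69981869628c0f2b29d8377f51a4c786",
# )
```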

File details

Details for the file ailia_speech-1.5.3-py3-none-any.whl.

File metadata

Download URL: ailia_speech-1.5.3-py3-none-any.whl
Upload date: Jul 10, 2026
Size: 1.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.13

File hashes

Hashes for ailia_speech-1.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f9f03626457033abf4b99ea382a5bcdfbe355eb4d055630bbfbdcc7d9f213c8`
MD5	`369dfc1a67b95a656d22c481463ffe7c`
BLAKE2b-256	`17a361ee2e9cef64587733108b93ce53f8700080bedbe28fc54447c80f5597d3`
