ailia AI Speech

ailia AI Speech Python API

!! CAUTION !! “ailia” IS NOT OPEN SOURCE SOFTWARE (OSS). As long as the user complies with the conditions stated in the License Document, the user may use the Software free of charge, but the Software is basically paid software.

About ailia AI Speech

ailia AI Speech is a library for AI-based speech recognition. It provides a C API for native applications and a C# API well suited to Unity applications. With ailia AI Speech, you can easily integrate AI-powered speech recognition into your applications.

Install from pip

You can install the ailia AI Speech free evaluation package with the following command.

pip3 install ailia_speech
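
After installing, it can be handy to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch that relies only on the standard library; the helper name is_installed is hypothetical and not part of ailia AI Speech.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether the ailia_speech package is visible to this interpreter
print("ailia_speech installed:", is_installed("ailia_speech"))
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the script.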

Install from package

You can install ailia AI Speech from the package with the following commands.

python3 bootstrap.py
pip3 install ./

Usage

Batch mode

In batch mode, the entire audio is transcribed at once.

import os
import urllib.request

import librosa

import ailia_speech

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
	print(text)

Step mode

In step mode, the audio is input in chunks and transcribed sequentially.

import os
import urllib.request

import librosa

import ailia_speech

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
speech.set_silent_threshold(silent_threshold = 0.5, speech_sec = 1.0, no_speech_sec = 0.5)
for i in range(0, audio_waveform.shape[0], sampling_rate):
	# Feed one second of audio per step and mark the final chunk as complete
	end = min(audio_waveform.shape[0], i + sampling_rate)
	complete = end >= audio_waveform.shape[0]
	recognized_text = speech.transcribe_step(audio_waveform[i:end], sampling_rate, complete)
	for text in recognized_text:
		print(text)
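
The chunking logic in the loop above can be separated from the recognizer, which makes it easier to check on its own. The following is a rough sketch, not part of the library: iter_chunks is a hypothetical helper, and the 10-sample list with a chunk size of 4 is made-up data chosen only to show the final short chunk.

```python
def iter_chunks(waveform, chunk_size):
    """Yield (chunk, complete) pairs; complete is True only for the last chunk."""
    total = len(waveform)
    for start in range(0, total, chunk_size):
        end = min(total, start + chunk_size)
        yield waveform[start:end], end >= total


# Made-up input: 10 samples split into chunks of 4 samples
demo = [0.0] * 10
for chunk, complete in iter_chunks(demo, 4):
    print(len(chunk), complete)
# prints:
# 4 False
# 4 False
# 2 True
```

Because the helper uses only len() and slicing, it should also accept the NumPy array returned by librosa.load. Each pair could then be passed to speech.transcribe_step(chunk, sampling_rate, complete) as in the example above, with chunk_size set to sampling_rate for one-second steps.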

Available model types

You can choose among several models, trading off accuracy against speed. LARGE_V3_TURBO is the recommended choice.

ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO
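
Since the constants listed above share a common prefix, one way to pick a model from a configuration value is to look it up by its short size name. The following is a sketch under assumptions: resolve_model_type is a hypothetical helper, and the SimpleNamespace stand-in carries placeholder integers rather than the real constant values, so the example runs without the package installed.

```python
from types import SimpleNamespace

PREFIX = "AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_"
SIZES = ("TINY", "BASE", "SMALL", "MEDIUM", "LARGE", "LARGE_V3", "LARGE_V3_TURBO")


def resolve_model_type(module, size="LARGE_V3_TURBO"):
    """Look up the model type constant for a short size name on the given module."""
    key = size.upper()
    if key not in SIZES:
        raise ValueError(f"unknown model size {size!r}; expected one of {SIZES}")
    return getattr(module, PREFIX + key)


# Stand-in for the ailia_speech module; the values are placeholders, not real constants
fake = SimpleNamespace(**{PREFIX + s: i for i, s in enumerate(SIZES)})
print(resolve_model_type(fake, "small"))
```

In real use you would pass the imported ailia_speech module instead of the stand-in and hand the result to initialize_model as model_type.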

API specification

https://github.com/axinc-ai/ailia-sdk
