ailia AI Speech

ailia AI Speech Python API

!! CAUTION !! “ailia” IS NOT OPEN SOURCE SOFTWARE (OSS). As long as the user complies with the conditions stated in the License Document, the user may use the Software free of charge, but the Software is fundamentally paid software.

About ailia AI Speech

ailia AI Speech is a library for AI-based speech recognition. It provides a C API for native applications and a C# API well suited to Unity applications. With ailia AI Speech, you can easily integrate AI-powered speech recognition into your applications.

Install from pip

You can install the ailia AI Speech free evaluation package with the following command.

pip3 install ailia_speech

Install from package

To install ailia AI Speech from a downloaded package, run the following commands in the package directory.

python3 bootstrap.py
pip3 install ./

Usage

Batch mode

In batch mode, the entire audio is transcribed at once.

import ailia_speech

import librosa

import os
import urllib.request

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
	print(text)
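If you want to avoid the librosa dependency for plain WAV input, the file can also be read with Python's standard `wave` module. The sketch below assumes a 16-bit PCM WAV file and that `transcribe` accepts any float waveform together with its sampling rate, as the example above suggests by passing `sampling_rate` explicitly. `load_wav_mono` is a hypothetical helper, not part of ailia_speech. Unlike `librosa.load`, which resamples to 22,050 Hz by default, it returns the file's native sampling rate.

```python
import array
import sys
import wave


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (samples, sampling_rate).

    Samples are floats in [-1.0, 1.0); multi-channel audio is averaged to mono.
    Only 16-bit PCM is handled (an assumption of this sketch).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only supports 16-bit PCM WAV files")
        channels = wav.getnchannels()
        sampling_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Interpret the bytes as signed 16-bit integers. WAV data is little-endian,
    # so swap byte order on big-endian hosts.
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()

    # Average the interleaved channels of each frame into one mono sample,
    # scaled from the int16 range into [-1.0, 1.0).
    samples = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return samples, sampling_rate
```

To use it in the batch example, assuming `transcribe` expects a float32 NumPy array like the one `librosa.load` returns, convert the result first with `audio_waveform = numpy.asarray(samples, dtype=numpy.float32)` and then call `speech.transcribe(audio_waveform, sampling_rate)`.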

Step mode

In step mode, the audio is input in chunks and transcribed sequentially.

import ailia_speech

import librosa

import os
import urllib.request

# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
	urllib.request.urlretrieve(
		"https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
		"demo.wav"
	)
audio_waveform, sampling_rate = librosa.load(input_file_path, mono = True)

# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO)
speech.set_silent_threshold(silent_threshold = 0.5, speech_sec = 1.0, no_speech_sec = 0.5)
for i in range(0, audio_waveform.shape[0], sampling_rate):
	complete = False
	if i + sampling_rate >= audio_waveform.shape[0]:
		complete = True
	recognized_text = speech.transcribe_step(audio_waveform[i:min(audio_waveform.shape[0], i + sampling_rate)], sampling_rate, complete)
	for text in recognized_text:
		print(text)
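The slicing in the step-mode loop can be factored into a small generator, which makes the chunk boundaries and the final `complete` flag easier to see. This is a refactoring sketch, not part of the ailia_speech API, and `iter_chunks` is a hypothetical name. With `chunk_size = sampling_rate`, each chunk holds one second of audio, exactly as in the loop above.

```python
def iter_chunks(waveform, chunk_size):
    """Yield (chunk, complete) pairs, slicing the waveform like the step-mode loop.

    `complete` is True only for the final chunk, which may be shorter than
    chunk_size. An empty waveform yields nothing, matching the original loop.
    """
    total = len(waveform)
    for start in range(0, total, chunk_size):
        end = min(total, start + chunk_size)
        # end >= total is equivalent to the original check start + chunk_size >= total
        yield waveform[start:end], end >= total


# Hypothetical use with the objects from the step-mode example:
# for chunk, complete in iter_chunks(audio_waveform, sampling_rate):
#     for text in speech.transcribe_step(chunk, sampling_rate, complete):
#         print(text)
```

Keeping the flag computation in one place avoids the off-by-one mistakes that are easy to make when the final, shorter chunk has to be marked separately.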

Available model types

You can choose among several models depending on the accuracy and speed you need. LARGE_V3_TURBO is the recommended choice.

ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3
ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO
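When the model is chosen from a configuration file or a command-line option, it can help to map a short size label to one of the constant names listed above and look it up with `getattr`. The helper below is a hypothetical sketch: `model_type_name` and `MODEL_SIZES` are illustrative names, and it only builds and validates the constant name without importing ailia_speech.

```python
# Suffixes of the model type constants listed above, in the same order.
MODEL_SIZES = (
    "TINY", "BASE", "SMALL", "MEDIUM", "LARGE", "LARGE_V3", "LARGE_V3_TURBO",
)
PREFIX = "AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_"


def model_type_name(size):
    """Turn a label such as "large-v3-turbo" into the matching constant name."""
    key = size.strip().upper().replace("-", "_")
    if key not in MODEL_SIZES:
        raise ValueError(f"unknown model size {size!r}; expected one of {MODEL_SIZES}")
    return PREFIX + key
```

The resulting name can then be resolved against the module, for example `model_type = getattr(ailia_speech, model_type_name("large-v3-turbo"))`, and passed to `initialize_model` as `model_type`.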

API specification

https://github.com/axinc-ai/ailia-sdk


Download files

Download the file for your platform.

Source Distribution

ailia_speech-1.3.2.1.tar.gz (555.0 kB)

Uploaded Source

Built Distribution

ailia_speech-1.3.2.1-py3-none-any.whl (559.5 kB)

Uploaded Python 3

File details

Details for the file ailia_speech-1.3.2.1.tar.gz.

File metadata

  • Download URL: ailia_speech-1.3.2.1.tar.gz
  • Upload date:
  • Size: 555.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for ailia_speech-1.3.2.1.tar.gz
Algorithm Hash digest
SHA256 c5d470ebaed6d252d88e8483ce1839f6b0c6338d9c3c2616027468913d4944bd
MD5 a506425fd34a722438bb69987eb64aae
BLAKE2b-256 15ce9731abec97d03b539a0490a23b91c1ea3e36a097c345bba3e7f724cf704c


File details

Details for the file ailia_speech-1.3.2.1-py3-none-any.whl.

File hashes

Hashes for ailia_speech-1.3.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7f4f05fba1a5d9ef283f345c71712ecc92fb48f4a656008bde1d89591d673a31
MD5 5bcdd60b82843c2898f3af010871f0be
BLAKE2b-256 bdcc3ebc8331c3d2d7093c6281400e5cb4cdfd97ebcae930f84439a7097974b0

