SDK for dolphinvoice

DolphinVoice Python SDK

The DolphinVoice SDK provides speech recognition and speech synthesis through three main modules:

  • Real-time Speech Recognition (ASR)
  • Audio File Transcription (FileAsr)
  • Text to Speech (TTS)

Documentation

Find more detailed documentation and guides about the DolphinVoice SDK in the following resources:

For technical support or any questions, please contact our developer support team: voice.contact@dolphin-ai.jp

Installation

Install the DolphinVoice Python SDK from PyPI using pip:

pip install dolphinvoice
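
To confirm that the package is available in the current environment, the installed version can be queried with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of the SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "dolphinvoice") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "dolphinvoice is not installed")
```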

Usage

Real-time Speech Recognition

from dolphinvoice.speech_rec.callbacks import SpeechTranscriberCallback
from dolphinvoice import speech_rec
import time

class Callback(SpeechTranscriberCallback):
    def started(self, message):
        print('TranscriptionStarted: %s' % message)

    def result_changed(self, message):
        print('TranscriptionResultChanged: %s' % message)

    def sentence_begin(self, message):
        print('SentenceBegin: %s' % message)

    def sentence_end(self, message):
        print('SentenceEnd: %s' % message)

    def completed(self, message):
        print('TranscriptionCompleted: %s' % message)

    def task_failed(self, message):
        print('TaskFailed: %s' % message)

    def warning_info(self, message):
        print('Warning: %s' % message)

    def channel_closed(self):
        print('TranscriptionChannelClosed')

audio_path = 'demo.mp3'
client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')

with client.create_transcriber(Callback()) as transcriber:
    transcriber.set_parameter({
        "lang_type": "en-US",
        "format": "mp3",
        "sample_rate": 16000,
    })
    transcriber.start()
    with open(audio_path, 'rb') as f:
        audio = f.read(7680)
        while audio:
            transcriber.send(audio)
            time.sleep(0.24)
            audio = f.read(7680)
    transcriber.stop()
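
The loop above reads 7680 bytes and sleeps 0.24 seconds between sends. Those numbers match real-time pacing for 16 kHz, 16-bit mono PCM (16000 samples/s × 2 bytes × 0.24 s = 7680 bytes). That reading is an inference rather than something stated by the SDK, and for compressed formats such as mp3 the byte rate will differ. The sketch below shows the arithmetic and a simple chunk reader; the helper names `pcm_chunk_size` and `iter_chunks` are hypothetical.

```python
import io
from typing import BinaryIO, Iterator


def pcm_chunk_size(sample_rate: int, seconds: float,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of uncompressed PCM that cover `seconds` of audio."""
    return int(round(sample_rate * bytes_per_sample * channels * seconds))


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield chunks of at most `chunk_size` bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    size = pcm_chunk_size(16000, 0.24)
    print(size)
    # Stand-in for an audio file: 20000 bytes of silence.
    fake_audio = io.BytesIO(b"\x00" * 20000)
    print([len(chunk) for chunk in iter_chunks(fake_audio, size)])
```

Each chunk yielded by `iter_chunks` could be passed to `transcriber.send(...)` and followed by `time.sleep(...)`, as in the loop above.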

Audio File Transcription

from dolphinvoice import speech_rec

client = speech_rec.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')

asrfile = client.create_asrfile()

audio = 'demo.mp3'
data = {
    "lang_type": "en-US",
    "format": "mp3",
    "sample_rate": 16000
}
result = asrfile.transcribe_file(audio, data)
print(result)
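
In the example above the `format` value matches the file extension. The sketch below derives the parameter dictionary from the file path, using only the keys shown above. The helper `build_params` is hypothetical, and which formats the service actually accepts is not covered here.

```python
from pathlib import Path
from typing import Any, Dict


def build_params(audio_path: str, lang_type: str = "en-US",
                 sample_rate: int = 16000) -> Dict[str, Any]:
    """Build a parameter dict whose "format" is taken from the file extension."""
    suffix = Path(audio_path).suffix.lstrip(".").lower()
    if not suffix:
        raise ValueError(f"cannot infer audio format from {audio_path!r}")
    return {"lang_type": lang_type, "format": suffix, "sample_rate": sample_rate}


if __name__ == "__main__":
    print(build_params("demo.mp3"))
```

The returned dictionary could then be passed as the second argument to `asrfile.transcribe_file(...)`.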

Text to Speech

from dolphinvoice.speech_syn.callbacks import SpeechSynthesizerCallback
from dolphinvoice import speech_syn

class MyCallback(SpeechSynthesizerCallback):
    def __init__(self, name):
        self._name = name
        self._fout = open(name, 'wb')

    def binary_data_received(self, raw):
        self._fout.write(raw)

    def on_message(self, message):
        print('Received : %s' % message)

    def started(self, message):
        print('MyCallback.OnSynthesizerStarted: %s' % message)

    def get_Timestamp(self, message):
        print('MyCallback.OnSynthesizerGetTimestamp: %s' % message)

    def get_Duration(self, message):
        print('MyCallback.OnSynthesizerGetDuration: %s' % message)

    def completed(self, message):
        print('MyCallback.OnSynthesizerCompleted: %s' % message)
        self._fout.close()

    def channel_closed(self):
        print('MyCallback.OnSynthesizerChannelClosed')

audio_name = 'syAudio.mp3'
client = speech_syn.SpeechClient(app_id='YOUR_APP_ID', app_secret='YOUR_APP_SECRET')
callback = MyCallback(audio_name)

with client.create_synthesizer(callback) as synthesizer:
    synthesizer.set_parameter({
        "text": "The weather is nice, let's go for a walk.",
        "lang_type": "en-US",
        "format": "mp3"
    })
    synthesizer.start()
    synthesizer.wait_completed()
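
In the callback above the output file is opened in `__init__` and closed only in `completed`, so the handle may stay open if synthesis never completes. One alternative is to buffer the audio in memory and write it out in a single step, at the cost of holding the whole clip in memory. The sketch below is a standalone class that mirrors the method names `binary_data_received` and `completed` from the example. It deliberately does not import the SDK so that it runs on its own; in real use it would presumably subclass `SpeechSynthesizerCallback`. The class name `BufferedAudioSink` is hypothetical.

```python
import os
import tempfile


class BufferedAudioSink:
    """Collects audio chunks in memory and writes them to disk on completion."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._buffer = bytearray()
        self.finished = False

    def binary_data_received(self, raw: bytes) -> None:
        # Append each audio chunk as it arrives.
        self._buffer.extend(raw)

    def completed(self, message) -> None:
        # Write everything at once; the file is closed when the block exits.
        with open(self._name, "wb") as fout:
            fout.write(self._buffer)
        self.finished = True


if __name__ == "__main__":
    # Simulated calls standing in for the SDK invoking the callback.
    path = os.path.join(tempfile.mkdtemp(), "syAudio.mp3")
    sink = BufferedAudioSink(path)
    sink.binary_data_received(b"abc")
    sink.binary_data_received(b"def")
    sink.completed("done")
    print(os.path.getsize(path))
```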

API Reference

Real-time Speech Recognition

The real-time speech recognition module is for processing real-time audio streams.

Methods

  • create_transcriber(callback: SpeechTranscriberCallback) - Creates a transcriber and registers event handlers for recognition events
  • set_parameter(params: Json) - Specifies parameters
  • start() - Starts a new recognition session
  • send(stream: Bytes) - Sends audio stream to the recognition service
  • stop() - Stops the current recognition session and releases resources

Events

  • TranscriptionStarted - Triggered when recognition session starts
  • SentenceBegin - Triggered when a new sentence is detected
  • TranscriptionResultChanged - Triggered when intermediate results are updated
  • SentenceEnd - Triggered when a sentence is completed
  • TranscriptionCompleted - Triggered when the entire recognition session is completed
  • Warning - Triggered when a non-fatal warning occurs
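
To see how these events map to the callback methods used in the usage example, a recorder that stores each event in arrival order can help during debugging. The sketch below is a standalone class that does not import the SDK; in real use it would presumably subclass `SpeechTranscriberCallback`. The class name `EventRecorder` is hypothetical, and the call order in the simulation is for illustration only, since the order in which the service emits events is not specified here.

```python
from typing import Any, List, Tuple


class EventRecorder:
    """Records (event name, message) pairs in the order the methods are called."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    def started(self, message) -> None:
        self.events.append(("TranscriptionStarted", message))

    def sentence_begin(self, message) -> None:
        self.events.append(("SentenceBegin", message))

    def result_changed(self, message) -> None:
        self.events.append(("TranscriptionResultChanged", message))

    def sentence_end(self, message) -> None:
        self.events.append(("SentenceEnd", message))

    def completed(self, message) -> None:
        self.events.append(("TranscriptionCompleted", message))

    def warning_info(self, message) -> None:
        self.events.append(("Warning", message))

    def names(self) -> List[str]:
        """Return only the event names, in arrival order."""
        return [name for name, _ in self.events]


if __name__ == "__main__":
    # Simulated calls standing in for the SDK invoking the callback.
    recorder = EventRecorder()
    recorder.started("m1")
    recorder.sentence_begin("m2")
    recorder.result_changed("m3")
    recorder.sentence_end("m4")
    recorder.completed("m5")
    print(recorder.names())
```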

Audio File Transcription

The audio file transcription module is for processing pre-recorded audio files.

Methods

  • transcribe_file(audio: String, params: Json) - Uploads and transcribes the audio file

Text to Speech

The text-to-speech synthesis module is used to convert text into natural speech.

Methods

  • create_synthesizer(callback: SpeechSynthesizerCallback) - Registers event handlers for synthesis events
  • set_parameter(params: Json) - Specifies parameters
  • start() - Starts a new synthesis session

Events

  • OnSynthesizerStarted - Triggered when synthesis process starts
  • OnSynthesizerGetDuration - Provides the total duration of the synthesized audio
  • OnSynthesizerGetTimestamp - Provides timestamp information for the synthesized text
  • OnSynthesizerCompleted - Triggered when synthesis process is completed

Keywords

DolphinVoice DolphinAI ASR TTS Text-to-Speech Speech-to-Text Speech-Recognition Speech-Synthesis

License

MIT
