A package to make it easy to interact with LLMs using voice

Voice Agent

A Python package that makes it easy to interact with LLMs using voice

Setup

Install PortAudio on your OS:

brew install portaudio #(MacOS)
apt install portaudio19-dev #(Debian/Ubuntu)

Then set up your project and install the Python package:

pip install voiceagent

Description

Voice Agent currently uses AssemblyAI for Speech to Text and ElevenLabs for Text to Speech.

You can configure it to work with any LLM of your choice, for example OpenAI, or set it up to work with LangChain. You are free to make the LLM setup as complex as you like, e.g. by adding agents, RAG, memory, or any other features.

Voice Agent aims to make it easy to set up your LLM application the way you like, while also making it easy to get set up with voice.

The current flow of requests looks like:

  1. Microphone -> Speech To Text (using AssemblyAI)
  2. LLM -> You can set up any code you like here to work with the text from the user query
  3. Speaker -> Text To Speech (using ElevenLabs)
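
To make the three steps concrete, here is a minimal sketch of a single conversational turn. It does not use the voiceagent package: run_turn, fake_llm and the spoken list are hypothetical names that stand in for the real speech-to-text, LLM and text-to-speech stages, so the flow can be run without a microphone or API keys.

```python
from typing import Callable, List


def run_turn(
    transcript: str,
    handle_message: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one turn: transcribed text -> application/LLM code -> speech output."""
    # Step 1 (speech-to-text) is assumed to have produced `transcript` already.
    reply = handle_message(transcript)  # Step 2: your LLM / application code
    speak(reply)                        # Step 3: stand-in for text-to-speech
    return reply


# Stand-ins so the sketch runs offline (assumptions, not part of the package).
spoken: List[str] = []


def fake_llm(text: str) -> str:
    return f"You said: {text}"


reply = run_turn("hello there", fake_llm, spoken.append)
print(reply)
```

In the real package, steps 1 and 3 are handled for you; only the middle function is code you write.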

Getting set up should be as easy as:

from os import getenv
from voiceagent.voice_agent import VoiceAgent

voice_agent = VoiceAgent(
    assemblyai_api_key=getenv('ASSEMBLYAI_API_KEY'),
    elevenlabs_api_key=getenv('ELEVENLABS_API_KEY')
)

def on_message_callback(message):
    print(f"Your message from the microphone: {message}", end="\r\n")
    # add any application code you want here to handle the user request
    # e.g. send the message to the OpenAI Chat API
    return "{response from the LLM}"

voice_agent.on_message(on_message_callback)

print("------------------------------------")
print("Voice Agent started. Start chatting!")
print("------------------------------------")
voice_agent.start()

And that's it: the message returned from on_message_callback is sent to the speakers. That means you can focus on writing your application code without worrying about the speech-to-text and text-to-speech conversions between the microphone and the speakers.
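
Because the callback is ordinary Python, features such as conversation memory can live entirely in your own code. The sketch below is one possible way to do that, not something the package provides: make_on_message, echo_llm and max_turns are hypothetical names, and echo_llm stands in for a real LLM call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def make_on_message(llm: Callable[[List[Message]], str], max_turns: int = 10):
    """Build an on_message-style callback that remembers earlier turns."""
    history: List[Message] = []

    def on_message_callback(message: str) -> str:
        history.append({"role": "user", "content": message})
        reply = llm(history)  # the LLM sees the whole conversation so far
        history.append({"role": "assistant", "content": reply})
        # Keep only the most recent turns (one turn = user + assistant message).
        overflow = len(history) - 2 * max_turns
        if overflow > 0:
            del history[:overflow]
        return reply

    return on_message_callback


def echo_llm(messages: List[Message]) -> str:
    """Stand-in for a real LLM: reports how many messages it was given."""
    return f"Reply {len(messages)}: {messages[-1]['content']}"


callback = make_on_message(echo_llm, max_turns=2)
first = callback("hi")
second = callback("how are you?")
print(first)
print(second)
```

The returned function could then be registered with voice_agent.on_message(callback) in place of the stateless callback above.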

Example

The example below shows how to set up OpenAI to work with Voice Agent, so that you can talk to ChatGPT through your microphone and hear its response from the speakers.

For convenience, the example script also streams the chat and responses to the command line.

from os import getenv
from voiceagent.voice_agent import VoiceAgent
from openai import OpenAI

voice_agent = VoiceAgent(
    assemblyai_api_key=getenv('ASSEMBLYAI_API_KEY'),
    elevenlabs_api_key=getenv('ELEVENLABS_API_KEY')
)

openai_client = OpenAI(
    api_key=getenv('OPENAI_API_KEY'),
)

green = "\033[0;32m"
white = "\033[0;39m"


def on_partial_message_callback(message):
    print(f"{green}You: {message}", end="\r")


def on_message_callback(message):
    print(f"{green}You: {message}", end="\r\n")
    chat_completion = openai_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"{message}. Keep responses no longer than 20 words",
            }
        ],
        model="gpt-3.5-turbo",
        stream=True
    )
    return chat_completion


def on_response_callback(response):
    print(f"{white}Assistant: {response}", end="\r\n")


voice_agent.on_partial_message(on_partial_message_callback)
voice_agent.on_message(on_message_callback)
voice_agent.on_response(on_response_callback)

print("------------------------------------")
print("Voice Agent started. Start chatting!")
print("------------------------------------")
voice_agent.start()

Event hooks

There are a few event hooks you can use between the moment the user speaks into the microphone and the moment the audio goes out on the speakers.

In order of the request flow, these event hooks are:

on_partial_message - Called while the user is still speaking into the microphone and the speech-to-text conversion is streaming in real time but not yet finalised. Once there is a pause in the user's speech, the message is considered complete and the on_message event is called.

on_message - Called when the user has finished speaking into the microphone (the user has paused for more than 1.8 seconds) and the speech-to-text conversion is complete. The message is now ready to be sent to an LLM for processing.

on_response - Called when the LLM has processed the user's message. Use it to access the response returned from the LLM.
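
The order in which the hooks fire can be illustrated with a toy stand-in. HookDemo and simulate_utterance below are hypothetical and are not the real VoiceAgent class; they only mirror the three registration methods so the sequence of calls can be observed without audio hardware.

```python
class HookDemo:
    """Toy stand-in (not the real VoiceAgent) that fires hooks in the documented order."""

    def __init__(self):
        self._hooks = {}

    def on_partial_message(self, callback):
        self._hooks["partial"] = callback

    def on_message(self, callback):
        self._hooks["message"] = callback

    def on_response(self, callback):
        self._hooks["response"] = callback

    def simulate_utterance(self, partial_transcripts, final_transcript):
        # 1. Partial transcripts arrive while the user is still speaking.
        for partial in partial_transcripts:
            if "partial" in self._hooks:
                self._hooks["partial"](partial)
        # 2. After the pause, the final transcript goes to on_message.
        reply = None
        if "message" in self._hooks:
            reply = self._hooks["message"](final_transcript)
        # 3. The value returned from on_message is passed to on_response.
        if reply is not None and "response" in self._hooks:
            self._hooks["response"](reply)
        return reply


events = []


def record_partial(text):
    events.append(("partial", text))


def record_message(text):
    events.append(("message", text))
    return f"Echo: {text}"


def record_response(text):
    events.append(("response", text))


demo = HookDemo()
demo.on_partial_message(record_partial)
demo.on_message(record_message)
demo.on_response(record_response)
demo.simulate_utterance(["what is", "what is the time"], "what is the time")

for event in events:
    print(event)
```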

Future Work

Support more Speech-to-Text and Text-to-Speech libraries

Currently the setup is hardcoded to work with AssemblyAI and ElevenLabs. It would be nice to make it easy to swap these out for other libraries.

Web application examples

It would be nice to set up some web application examples that show how to use Voice Agent from a browser-based application.

Current Issues

Echo Issues

When I tested with certain headsets, or with the computer's built-in speakers and microphone, the microphone picked up the speaker audio, resulting in an ongoing loop of the LLM essentially chatting with itself. Using a headset where the microphone was a good distance away from the speaker resolved this.
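
If changing hardware is not an option, one possible software workaround is to ignore transcripts that closely match the assistant's last reply. This is only a sketch of that idea and is not part of the package: looks_like_echo and the 0.8 threshold are assumptions, and it will not catch echoes that the speech-to-text step transcribes only partially or inaccurately.

```python
from difflib import SequenceMatcher


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def looks_like_echo(message: str, last_reply: str, threshold: float = 0.8) -> bool:
    """Return True if `message` is probably the speaker audio fed back into the mic."""
    if not message or not last_reply:
        return False
    ratio = SequenceMatcher(None, _normalise(message), _normalise(last_reply)).ratio()
    return ratio >= threshold


last_reply = "The weather is sunny today."
print(looks_like_echo("the weather is sunny today", last_reply))
print(looks_like_echo("what time is it", last_reply))
```

Inside an on_message callback, a message flagged this way could simply be skipped instead of being forwarded to the LLM.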

Latency Issues

Latency could definitely be better for a smoother conversation experience.
