corava
CORA Virtual Assistant
Description:
Python project for development of a Conversation Optimized Robot Assistant (CORA). CORA is a voice assistant powered by OpenAI's ChatGPT for both user intent detection and general LLM responses.
This project also uses Amazon's AWS Polly service for voice synthesis, and the SpeechRecognition library (utilising Google's speech recognition) to convert user speech to text. We also use pydub and simpleaudio to play the audio coming back from the AWS Polly service without having to write any audio files to disk.
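The in-memory playback works roughly like the sketch below. This is a minimal illustration assuming Polly's raw PCM output and default boto3 credential resolution; the speak function and voice id are made up for the example and are not corava's actual internals.
# Illustrative sketch (not corava's actual internals): synthesize speech with
# AWS Polly and play the returned stream entirely in memory.
import boto3
import simpleaudio
from pydub import AudioSegment

def speak(text):
    # assumes AWS credentials and region are already configured in your environment
    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="pcm",   # raw 16-bit mono PCM, so no ffmpeg is needed
        SampleRate="16000",
        VoiceId="Joanna"      # any Polly voice id
    )
    # wrap the raw bytes in a pydub AudioSegment, then hand the PCM to simpleaudio
    audio = AudioSegment(
        data=response["AudioStream"].read(),
        sample_width=2,
        frame_rate=16000,
        channels=1
    )
    playback = simpleaudio.play_buffer(audio.raw_data, 1, 2, 16000)
    playback.wait_done()

speak("Hello, I am CORA.")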
Getting Started:
- Install the corava library from pip:
pip install corava
- Get all your API keys and set up a .env file, or just feed them into the config directly if you want. Here is an example using .env:
from corava import cora
from dotenv import load_dotenv
import os

load_dotenv() # take environment variables from .env

def main():
    config = {
        "AWS_ACCESS_KEY" : os.getenv('AWS_ACCESS_KEY'),
        "AWS_SECRET_KEY" : os.getenv('AWS_SECRET_KEY'),
        "AWS_REGION" : os.getenv('AWS_REGION'),
        "OPENAI_KEY" : os.getenv('OPENAI_KEY'),
        "CHATGPT_MODEL" : os.getenv('CHATGPT_MODEL')
    }
    conversation_history = cora.start(config)
    print(conversation_history)

if __name__ == "__main__":
    main()
How to use CORA:
- The wake word for cora is 'cora'. At start up, cora won't do anything except listen for the wake word.
- If the wake word is detected, cora will respond.
- You can say 'cora' and your query in a single sentence and cora will both wake up and respond.
- After cora has awoken, you can continue your conversation until you specifically ask cora to either go to 'sleep' or 'shut down' (a rough sketch of this loop is shown after this list).
- In 'sleep' mode, cora will stop responding until you say the wake word again.
- If you ask cora to 'shut down' at any point, cora's loops will end gracefully, your most recent messages will be summarised and saved locally, and the program will exit.
- At the moment cora has not been set up with any real functions (these will come soon); however, if you ask it for the weather or to turn on a light it will run some dummy functions. These will be updated or removed as the project progresses.
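For the curious, the wake/sleep/shutdown flow amounts to a loop along these lines. This is a rough sketch using the SpeechRecognition library with the Google recognizer; the respond function and the exact keyword checks are illustrative assumptions, not corava's source.
# Rough sketch of the wake/sleep/shut-down loop; illustrative only, not corava's source.
import speech_recognition as sr

def respond(query):
    # placeholder for the part that sends the query to the LLM and speaks the reply
    print("CORA heard:", query)

recognizer = sr.Recognizer()
awake = False

with sr.Microphone() as source:          # requires pyaudio
    while True:
        audio = recognizer.listen(source)
        try:
            heard = recognizer.recognize_google(audio).lower()
        except sr.UnknownValueError:
            continue                     # nothing intelligible, keep listening

        if "shut down" in heard:
            break                        # end gracefully, then summarise and save
        if "sleep" in heard:
            awake = False                # stop responding until the wake word
            continue
        if "cora" in heard:
            awake = True                 # wake word can arrive alone or with a query

        if awake:
            respond(heard)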
Project Dependencies:
- Python 3.11.6
- OpenAI API Key
- AWS Polly Key
- Microsoft Visual C++ 14.0 or greater
- SpeechRecognition
- simpleaudio
- pydub
- boto3
- python-dotenv
- openai
- pyaudio
- whisper-mic
- soundfile
Setting up your dev environment:
- Install Python 3.11.6 from: https://www.python.org/downloads/release/python-3116/
  - 3.11.6 is required at the moment because this is the latest version supported by pyaudio
- Clone this repo:
git clone https://github.com/Nixxs/corava.git
- Set up your local .env file in the project root:
AWS_ACCESS_KEY = "[YOUR OWN AWS ACCESS KEY]"
AWS_SECRET_KEY = "[THE CORRESPONDING SECRET KEY]"
AWS_REGION = "[AWS REGION YOU WANT TO USE]"
OPENAI_KEY = "[OPENAI API KEY]"
CHATGPT_MODEL = "gpt-3.5-turbo-1106"
cora uses the Amazon AWS Polly service for its voice synthesis. To access this service, you will need to generate a key and secret on your AWS account with access to the Polly service. You'll also want to define the AWS region you want to use here, as well as your OpenAI key and the ChatGPT model you want to use. Make sure the model supports parallel function calling, otherwise cora's skill functions might not work (at the time of writing, either gpt-3.5-turbo-1106 or gpt-4-1106-preview).
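If you want to confirm your keys and model are picked up correctly, something like the following should run end to end. This is a hedged example of typical boto3 and openai client setup using the same .env values; it is not corava's internal code.
# Quick end-to-end check of the .env values; typical client setup, not corava's internals.
import os
import boto3
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

polly = boto3.client(
    "polly",
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY"),
    aws_secret_access_key=os.getenv("AWS_SECRET_KEY"),
    region_name=os.getenv("AWS_REGION")
)
print(len(polly.describe_voices()["Voices"]), "Polly voices available")

client = OpenAI(api_key=os.getenv("OPENAI_KEY"))
reply = client.chat.completions.create(
    model=os.getenv("CHATGPT_MODEL"),    # e.g. gpt-3.5-turbo-1106
    messages=[{"role": "user", "content": "Say hello in five words."}]
)
print(reply.choices[0].message.content)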
- Install dependencies (using poetry is easiest):
poetry install
OPTIONAL: pydub generally also needs ffmpeg installed if you want to do anything with audio file formats or edit the audio at all. This project doesn't require any of that (at least not yet), as we just use simpleaudio to play the stream; however, you will get a warning from pydub on import if you don't have ffmpeg installed.
To cover all bases you can download and install ffmpeg; you will also need to add it to your PATH.
- Then just run the entry script using
poetry run cora
Road Map (Core):
- [x] Initial text and speech recognition
- [x] Synthesize voice from AWS Polly
- [x] Integration with OpenAI ChatGPT
- [x] Upgrade the OpenAI service to use function calling
- [x] Simple utility functions for logging to the screen
- [x] Simple activation on wake-up words
- [x] Update skills to support parallel function calling
- [x] Simple speech visualiser using pygame
- [x] Change visualisation depending on sleeping or not sleeping
- [x] Display logging output in the visualiser
- [x] Make it easier to set up the project from scratch (use poetry)
- [x] Set up the project so it can be used from PyPI
- [x] Manage the conversation history better to work more efficiently with the token limit
- [ ] Allow CORA to monitor things and report back/notify as events occur (third thread)
- [ ] Refactor cora to better manage state: have cora decide if the user wants her to shut down or go into sleep mode, rather than just looking for words in speech recognition
- [x] Remember message history between sessions
- [ ] Build and implement an ML model for wake-up word detection
- [x] Use a local model for speech recognition instead of sending it to Google
- [ ] Improve memory to store things into a long-term memory file that will correct itself as CORA learns more about its user
- [ ] Support for a local LLM instead of sending everything to OpenAI
  - needs an open-source model that supports function calling well
Road Map (Active Skills):
- Report daily outlook calendar schedule
- Make the weather function call actually work
- Report latest most relevant news for a given location
- Play YouTube music (have a look at what's available in the YouTube APIs)
- Open YouTube videos (have a look at what's available in the YouTube APIs)
- Look up information using Google Maps (directions, distance to)
- Generate an image and open it (OpenAI DALL-E image API)
Road Map (Monitoring Skills):
- Monitor calendar and notify of next meeting
Additional Notes:
- Conversations are logged locally in the corava/logs folder and organised by date
- Summarised recent memory is stored in the corava/memory folder
- CORA will remember the most recent thing you talked about from your previous conversation.
- CORA uses a local model for speech recognition: when you send speech to CORA for the first time, the Whisper base model will be downloaded to your computer and used from there.
- When you are in a conversation with CORA, all of your queries are being sent to the OpenAI ChatGPT model that you set, so be aware of that.
- Take a look at cora's skills in the cora_skills.py file and make your own skills that are relevant to you. Skills are activated when ChatGPT decides the user wants to use one of them, which gives cora access to anything you'd want it to do (you just have to write the skill); a hypothetical example of the wiring is shown below.
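As a rough idea of what a skill looks like once it is wired into function calling, here is a hypothetical get_weather example in the OpenAI tools format; the names and wiring are assumptions for illustration, not a copy of cora_skills.py.
# Hypothetical skill wired up via OpenAI function calling; not copied from cora_skills.py.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(location):
    # dummy implementation, similar in spirit to cora's current placeholder skills
    return f"It is sunny in {location}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",   # a model that supports parallel function calling
    messages=[{"role": "user", "content": "What's the weather in Perth?"}],
    tools=tools
)

# if ChatGPT decided the user wants the skill, call the matching Python function
for tool_call in (response.choices[0].message.tool_calls or []):
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        print(get_weather(**args))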
Local Voices:
In an earlier version of the project we were using local voices; at some stage this might still be useful if we don't want to pay for AWS Polly anymore.