Streamlit Mic Recorder

A Streamlit component that records mono audio from the user's microphone and/or performs speech recognition directly.

Installation instructions

pip install streamlit-mic-recorder

Usage instructions

Two functions are provided (with the same front-end):

1. Mic Recorder

from streamlit_mic_recorder import mic_recorder
audio = mic_recorder(
    start_prompt="Start recording",
    stop_prompt="Stop recording",
    just_once=False,
    use_container_width=False,
    callback=None,
    args=(),
    kwargs={},
    key=None
)

Renders a button. Click to start recording, click to stop. Returns None or a dictionary with the following structure:

{
    "bytes": audio_bytes,  # wav audio bytes mono signal, can be processed directly by st.audio
    "sample_rate": sample_rate,  # depends on your browser's audio configuration
    "sample_width": sample_width,  # 2
    "id": id  # A unique timestamp identifier of the audio
}

sample_rate and sample_width are provided in case you need them for further audio processing.
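
As a minimal sketch of such further processing, the standard-library `wave` module can read the returned bytes, assuming they form a standard PCM WAV file. The helper names `describe_recording` and `make_fake_recording` are hypothetical, and the synthetic dictionary only stands in for a real `mic_recorder` result.

```python
import io
import struct
import wave


def describe_recording(audio):
    """Return (channels, sample_rate, sample_width, duration_s) for a mic_recorder-style dict."""
    with wave.open(io.BytesIO(audio["bytes"]), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getsampwidth(), frames / rate


def make_fake_recording(sample_rate=8000, n_samples=4000):
    """Build a silent mono 16-bit WAV shaped like the dictionary described above (test data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample, matching sample_width
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % n_samples, *([0] * n_samples)))
    return {"bytes": buffer.getvalue(), "sample_rate": sample_rate, "sample_width": 2, "id": 1}


if __name__ == "__main__":
    print(describe_recording(make_fake_recording()))
```

In an app, the dictionary returned by mic_recorder would be passed to `describe_recording` instead of the synthetic one.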

Arguments:

'start_prompt' / 'stop_prompt': the labels shown on the button, depending on its recording state.
'just_once': determines whether the widget returns the audio only once, right after it has been recorded (and None afterwards), or on every rerun of the app. Useful to avoid processing the same audio twice.
'use_container_width': as for st.button, determines whether the button fills the width of its container.
'callback': an optional function called when new audio is received.
'args' / 'kwargs': optional positional and keyword arguments passed to the callback when it is triggered.
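
With just_once=False the same dictionary comes back on every rerun, so another way to avoid reprocessing is to compare its 'id' with the last one handled. The sketch below assumes that ids increase over time (they are described as timestamps) and uses a plain dict where an app would use st.session_state; `is_new_audio` and the `last_audio_id` key are hypothetical names.

```python
def is_new_audio(audio, state, state_key="last_audio_id"):
    """Return True only the first time a given recording id is seen."""
    if audio is None:
        return False
    if audio["id"] <= state.get(state_key, 0):
        return False  # already handled on a previous rerun
    state[state_key] = audio["id"]  # remember the newest id
    return True


if __name__ == "__main__":
    state = {}
    recording = {"bytes": b"", "sample_rate": 44100, "sample_width": 2, "id": 1700000000000}
    print(is_new_audio(recording, state), is_new_audio(recording, state))
```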

Remark: When you give the widget a key, the associated state variable only contains the raw, unprocessed output from the React frontend, because of how Streamlit's component API works. That is not very practical, so for convenience I added a special state variable that exposes the output in the expected format (the dictionary described above). If key is the key you gave to the widget, you can access the properly formatted output via key+'_output' in the session state. Here is an example of how it can be used within a callback:

from streamlit_mic_recorder import mic_recorder
import streamlit as st

def callback():
    if st.session_state.my_recorder_output:
        audio_bytes = st.session_state.my_recorder_output['bytes']
        st.audio(audio_bytes)


mic_recorder(key='my_recorder', callback=callback)

2. Speech recognition with Google API

from streamlit_mic_recorder import speech_to_text
text = speech_to_text(
    language='en',
    start_prompt="Start recording",
    stop_prompt="Stop recording",
    just_once=False,
    use_container_width=False,
    callback=None,
    args=(),
    kwargs={},
    key=None
)

Renders a button. Click to start recording, click to stop. Returns None or a text transcription of the recorded speech in the chosen language. Similarly to the mic_recorder function, you can pass a callback that will trigger when a new text transcription is received, and access this transcription directly in the session state by adding an '_output' suffix to the key you chose for the widget.

import streamlit as st
from streamlit_mic_recorder import speech_to_text
def callback():
    if st.session_state.my_stt_output:
        st.write(st.session_state.my_stt_output)


speech_to_text(key='my_stt', callback=callback)

Example

import streamlit as st
from streamlit_mic_recorder import mic_recorder, speech_to_text

state = st.session_state

if 'text_received' not in state:
    state.text_received = []

c1, c2 = st.columns(2)
with c1:
    st.write("Convert speech to text:")
with c2:
    text = speech_to_text(language='en', use_container_width=True, just_once=True, key='STT')

if text:
    state.text_received.append(text)

for text in state.text_received:
    st.text(text)

st.write("Record your voice, and play the recorded audio:")
audio = mic_recorder(start_prompt="⏺️", stop_prompt="⏹️", key='recorder')

if audio:
    st.audio(audio['bytes'])
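
If the recording should also be kept on disk, the bytes can be written out as they are. This is a minimal sketch: `save_recording` and the file-name pattern are hypothetical, and the dictionary in the demo only imitates a mic_recorder result.

```python
import tempfile
from pathlib import Path


def save_recording(audio, directory):
    """Write the WAV bytes to <directory>/recording_<id>.wav and return the path."""
    path = Path(directory) / f"recording_{audio['id']}.wav"
    path.write_bytes(audio["bytes"])
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake_audio = {"bytes": b"RIFF", "id": 42}  # placeholder bytes, not a valid WAV file
        print(save_recording(fake_audio, tmp).name)
```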

Using it with OpenAI Whisper API

For those interested in using the mic recorder component with Whisper, here is the script I'm using. It works just fine for me.

# whisper.py

from streamlit_mic_recorder import mic_recorder
import streamlit as st
import io
from openai import OpenAI
import dotenv
import os


def whisper_stt(openai_api_key=None, start_prompt="Start recording", stop_prompt="Stop recording", just_once=False,
               use_container_width=False, language=None, callback=None, args=(), kwargs=None, key=None):
    if 'openai_client' not in st.session_state:
        dotenv.load_dotenv()
        st.session_state.openai_client = OpenAI(api_key=openai_api_key or os.getenv('OPENAI_API_KEY'))
    if '_last_speech_to_text_transcript_id' not in st.session_state:
        st.session_state._last_speech_to_text_transcript_id = 0
    if '_last_speech_to_text_transcript' not in st.session_state:
        st.session_state._last_speech_to_text_transcript = None
    if key and key + '_output' not in st.session_state:
        st.session_state[key + '_output'] = None
    audio = mic_recorder(start_prompt=start_prompt, stop_prompt=stop_prompt, just_once=just_once,
                         use_container_width=use_container_width, key=key)
    new_output = False
    if audio is None:
        output = None
    else:
        audio_id = audio['id']  # renamed to avoid shadowing the built-in id()
        new_output = (audio_id > st.session_state._last_speech_to_text_transcript_id)
        if new_output:
            output = None
            st.session_state._last_speech_to_text_transcript_id = audio_id
            audio_bio = io.BytesIO(audio['bytes'])
            audio_bio.name = 'audio.wav'  # mic_recorder returns WAV bytes; the file name hints the format
            success = False
            err = 0
            while not success and err < 3:  # Retry up to 3 times in case of OpenAI server error.
                try:
                    audio_bio.seek(0)  # rewind in case a failed attempt already consumed the buffer
                    transcript = st.session_state.openai_client.audio.transcriptions.create(
                        model="whisper-1",
                        file=audio_bio,
                        language=language
                    )
                except Exception as e:
                    print(str(e))  # log the exception in the terminal
                    err += 1
                else:
                    success = True
                    output = transcript.text
                    st.session_state._last_speech_to_text_transcript = output
        elif not just_once:
            output = st.session_state._last_speech_to_text_transcript
        else:
            output = None

    if key:
        st.session_state[key + '_output'] = output
    if new_output and callback:
        callback(*args, **(kwargs or {}))
    return output
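
The retry loop in the script can also be written as a small reusable helper. This is only a sketch: `call_with_retries` is a hypothetical name, and the optional delay between attempts is an addition that the script above does not have.

```python
import time


def call_with_retries(func, attempts=3, delay_s=0.0):
    """Call func() until it succeeds or `attempts` tries have failed; return None on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}")  # log to the terminal, as the script does
            if attempt < attempts and delay_s:
                time.sleep(delay_s)
    return None


if __name__ == "__main__":
    print(call_with_retries(lambda: "transcript text"))
```

Inside whisper_stt, the transcription request would be wrapped in a function passed as `func`. That function should rewind the audio buffer before each attempt.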

Usage:

import streamlit as st
from whisper import whisper_stt

# If you don't pass an API key, the function will attempt to load a .env file in the current
# directory and retrieve the key from the 'OPENAI_API_KEY' environment variable.
text = whisper_stt(openai_api_key="<your_api_key>", language='en')
if text:
    st.write(text)
