Skip to main content

Source lang speech to machine translation to target lang speech

Project description

Speech2Speech

image of main screen

The Speech2Speech Python package is a Streamlit Web application that models all phases of speech-to-speech translation, including:

  • recording speech in the source language,
  • converting the source language speech to source language text,
  • translating the source language text to target language text, and
  • converting the translated text to speech in the target language.

As a web application, it can be accessed through any web browser and is compatible with Linux, Mac, and Windows operating systems.

Speech2Speech is currently configured to translate to and from 13 different languages. Although the quality of translation may vary depending on the target language, it is pretty good for popular languages such as English, French, Portuguese, Spanish, German, Dutch and Italian. Speech2Speech can be configured for many more than just these languages (specified in the config. ini file), as long as they are supported by Whisper AI, Chat-GPT and gtts, the packages on which it depends.

Speech2Speech is designed to be accessible to a broad audience. One of the key advantages of Speech2Speech is that it's incredibly easy to use:

  • The package automatically detects the source language used in speech. The user therefore is not asked to specify it.
  • There is no need to train the software or the user before actually using the product. It works well straight out of the box with no further tuning or configuration required. This makes it a highly accessible tool that anyone can use, regardless of their technical expertise or experience with speech recognition and machine translation technology.

It is also hoped that this technology could be leveraged to develop products specifically designed for persons with visual impairments. It can empower them to have texts read aloud or dictate their texts and listen to them being read out loud before forwarding them to their intended recipients.

Each phase of the workflow creates a file, whose name is defined in the config.ini file. Advanced users can start and/or interrupt the workflow wherever they need by inserting their own files in the speech2speech/data subdirectory and adapting the config.ini file to refer to them.

Prerequisites

You need to get an OpenAI API key in order to use this app.

Speech2Speech local installation

Run the following command:

pip install speech2speech

In order to launch it locally follow these steps:

  1. Make sure the microphone and speakers of your device are on.

  2. Navigate to the directory where your Speech2Speech program is located using the cd command.

  3. Type the following command in the terminal to launch Speech2Speech:

    streamlit run speech2speech.py

Workflow

Here's a step-by-step guide on how to use the full workflow of Speech2Speech:

  1. Copy your OpenAI API key and paste it into the text box below the label "OpenAI API Key". The API key you enter will not be visible on the screen by default.
  2. Click the "Record Audio" button to start recording.
  3. Begin speaking or reading aloud. When your dictation is finished, press CTRL+E to stop recording it. Chat-GPT can automatically detect the language you're speaking (as long as it also supports it), so there's no need to specify it.
  4. Click the "Transcribe" button to convert your dictation into text.
  5. Select your desired target language from the dropdown menu under "Target Language".
  6. Click the "Translate" button to translate the transcription into your chosen target language. The translated text will appear on a blue background after a few seconds.
  7. Click the "Read Translation" button to listen to the translated text.
  8. If you want to repeat the process with a new dictation, click the "Refresh Page" button to reset the page.

As indicated above, you can also use just parts of this full workflow by specifying the name(s) of the file(s) you want to use in the config.ini file and by clicking the relevant button of the user interface.

What to do if you encounter issues

If Chat-GPT or Speech2Speech get stuck or you encounter any issues, simply refresh the browser page. ChatGPT may, however, have lots of users at certain times of the day and be poorly responsive for a while.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speech2speech-0.3.0.tar.gz (6.2 kB view details)

Uploaded Source

Built Distribution

speech2speech-0.3.0-py3-none-any.whl (3.3 kB view details)

Uploaded Python 3

File details

Details for the file speech2speech-0.3.0.tar.gz.

File metadata

  • Download URL: speech2speech-0.3.0.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for speech2speech-0.3.0.tar.gz
Algorithm Hash digest
SHA256 ca2717f94b2b82c1dd0cc7d4bea666e5a5b1fa384dbe4d3a0ed55bf407bd6fc2
MD5 f0ac05a8f6df1db062b0f92b7489bf7f
BLAKE2b-256 2838abe1df4464c5dbea7dc06bcb721c15a8d31681d1db8013389b16438af552

See more details on using hashes here.

File details

Details for the file speech2speech-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for speech2speech-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fa620f032c01d1c9e7ea1b9e9ea708c9dae9924310d5d888ff018cfe31589c9c
MD5 4d75ffca18fba9036fa78047ceff7ea9
BLAKE2b-256 d7b74a247f5b0ad41fbf74efe5c58c20a6d22008338e2154d9029cfde70b5ee3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page