Document-to-podcast: a Blueprint by Mozilla.ai for generating podcasts from documents using local AI

This blueprint demonstrates how you can use open-source models and tools to convert input documents into a podcast featuring two speakers. It is designed to work on most local setups or with GitHub Codespaces, so no external API calls or GPU access are required. Keeping everything local makes it more accessible and privacy-friendly.

👉 📖 For more detailed guidance on using this project, please visit our Docs here.

Built with

- Python 3.10+ (use Python 3.12 for Apple M1/2/3 chips)
- Llama-cpp (text-to-text, i.e., script generation)
- OuteAI / Parler_tts (text-to-speech, i.e., audio generation)
- Streamlit (UI demo)

Quick-start

Get started with Document-to-Podcast using one of the two options below: GitHub Codespaces for a hassle-free setup or Local Installation for running on your own machine.

Option 1: GitHub Codespaces

The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:

Once the Codespaces environment launches, inside the terminal, start the Streamlit demo by running:

python -m streamlit run demo/app.py

Option 2: Local Installation

Clone the Repository: In your local terminal, run:

git clone https://github.com/mozilla-ai/document-to-podcast.git
cd document-to-podcast

Install Dependencies: In the terminal, run:
```
pip install -e .
```
Run the Demo: In the terminal, start the Streamlit demo by running:
```
python -m streamlit run demo/app.py
```

NOTE: The first time you run the demo app, generating the script or the audio might take a while because the models, which are a few GB in size, are downloaded to your machine.

How it Works

Document Upload: Start by uploading a document in a supported format (e.g., PDF, .txt, or .docx).
Document Pre-Processing: The uploaded document is processed to extract and clean the text. This involves:
- Extracting readable text from the document.
- Removing noise such as URLs, email addresses, and special characters to ensure the text is clean and structured.
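As a rough illustration of the cleaning step described above, the following sketch strips URLs, email addresses, and stray symbols with regular expressions. The patterns and the `clean_text` name are assumptions made for this example, not the project's actual implementation, which may apply different rules.

```python
import re

# Assumed patterns for illustration only; the project's own rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Keep word characters, whitespace, and common punctuation; drop everything else.
SPECIAL_RE = re.compile(r"[^\w\s.,;:!?'\"()-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Remove URLs, email addresses, and stray symbols, then tidy whitespace."""
    text = URL_RE.sub(" ", raw)
    text = EMAIL_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()
```

Removing URLs before emails keeps the two patterns from interfering with each other, and collapsing whitespace last cleans up the gaps left by the earlier substitutions.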
Script Generation: The cleaned text is passed to a language model to generate a podcast transcript in the form of a conversation between two speakers.
- Model Loading: The system selects and loads a pre-trained LLM optimized for running locally, using the llama_cpp library. This lets the model run efficiently on CPUs, making it more accessible and suitable for local setups.
- Customizable Prompt: A user-defined "system prompt" guides the LLM in shaping the conversation, specifying tone, content, speaker interaction, and format.
- Output Transcript: The model generates a podcast script in a structured format, with each speaker's dialogue clearly labeled. Example output:
```
{
    "Speaker 1": "Welcome to the podcast on AI advancements.",
    "Speaker 2": "Thank you! So what's new this week for the latest AI trends?",
    "Speaker 1": "Where should I start.. Lots has been happening!",
    ...
}
```
This step ensures that the podcast script is engaging, relevant, and ready for audio conversion.
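Because speaker labels repeat in the script, loading it into a plain Python dict would keep only the last line for each speaker. The sketch below shows one way to read such a script while preserving every turn in order, using the standard `json` module. It assumes the model returned a complete, valid JSON object; the `parse_transcript` name is hypothetical and not part of the project.

```python
import json


def parse_transcript(raw: str) -> list[tuple[str, str]]:
    """Return (speaker, line) pairs in the order they appear in the script."""
    # object_pairs_hook receives every key/value pair, so repeated speaker
    # keys are preserved instead of being overwritten as in a plain dict.
    pairs = json.loads(raw, object_pairs_hook=list)
    return [(speaker, line) for speaker, line in pairs]
```

Each resulting pair can then be passed to the text-to-speech step one turn at a time.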
Audio Generation: The generated transcript is converted into audio using a Text-to-Speech (TTS) model.
- Each speaker is assigned a distinct voice.
- The final output is saved as an audio file in formats like MP3 or WAV.
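To make the audio step concrete, here is a minimal sketch of how per-speaker segments might be joined and written to a WAV file with the standard `wave` module. Everything in it is an assumption for illustration: `synthesize` is a placeholder that emits a plain tone where a real TTS model would produce speech, and the `VOICES` table, sample rate, and function names are hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # assumed sample rate for this sketch

# Hypothetical voice table: each speaker gets a distinct tone instead of a real voice.
VOICES = {"Speaker 1": 220.0, "Speaker 2": 330.0}


def synthesize(text: str, pitch_hz: float) -> bytes:
    """Placeholder for a TTS call: a 16-bit mono tone whose length scales with the text."""
    n_samples = SAMPLE_RATE * len(text) // 20  # 0.05 seconds per character
    samples = (
        int(12_000 * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def render_podcast(turns, path):
    """Synthesize each turn with its speaker's voice and write one WAV file."""
    audio = b"".join(synthesize(text, VOICES[speaker]) for speaker, text in turns)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(audio)
    return len(audio) // 2  # number of samples written
```

Swapping the placeholder for a real model only requires that it return audio in the same sample format; the concatenation and file-writing logic stay the same.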

Models

The architecture of this codebase focuses on modularity and adaptability, so it shouldn't be too difficult to swap frameworks and use your own suite of models. We have selected fully open source models that are very memory efficient and can run on a laptop CPU with less than 10 GB of RAM.

text-to-text

We are using the llama.cpp library, which supports open source models optimized for local inference and minimal hardware requirements. The default text-to-text model in this repo is the open source OLMoE-7B-Instruct from AllenAI.

For the complete list of models supported out-of-the-box, visit this link.

text-to-speech

We support models from the OuteAI and Parler_tts packages. The default text-to-speech model in this repo is OuteTTS-0.1-350M-GGUF. Note that the 0.1-350M version has a CC-BY-4.0 (permissive) license, whereas the newer, better 0.2-500M version has a CC-BY-NC-4.0 (non-commercial) license. For a complete list of models visit Oute HF (only the GGUF versions) and Parler HF.

Important note: In order to keep the package dependencies as lightweight as possible, only the Oute interface is installed by default. If you want to use the parler models, please also run:

pip install -e '.[parler]'

Pre-requisites

System requirements:
- OS: Windows, macOS, or Linux
- Python: >=3.10, <3.12
- Minimum RAM: 10 GB
- Disk space: 32 GB minimum
Dependencies:
- Dependencies listed in pyproject.toml
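Before installing, it can help to confirm that a machine meets the stated requirements. The following is a small sketch of such a check using only the standard library. It covers the minimum Python version and free disk space; the function names are hypothetical, and RAM is left out because the standard library has no portable way to query it.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # lower bound stated in the requirements
MIN_DISK_GB = 32      # disk space stated in the requirements


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def free_disk_gb(path="."):
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Enough disk space:", free_disk_gb() >= MIN_DISK_GB)
```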

Troubleshooting

When starting up the codespace, I get the message "Oh no, it looks like you are offline!"

If you are on Firefox and have Enhanced Tracking Protection on, try turning it off for the codespace webpage.

During the installation of the package, it fails with "ERROR: Failed building wheel for llama-cpp-python"

You are probably missing the GNU Make package. A quick fix is to run `sudo apt install build-essential` in your terminal.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Project details

These details have been verified by PyPI

Maintainers

daavoo

Release history

- 1.4.5: Feb 12, 2025
- 1.4.4: Feb 12, 2025
- 1.4.3: Jan 28, 2025
- 1.4.2: Jan 17, 2025
- 1.4.1: Jan 17, 2025
- 1.4.0: Jan 17, 2025
- 1.3.1: Jan 16, 2025
- 1.3.0: Jan 13, 2025
- 1.2.0: Jan 10, 2025
- 1.1.2: Jan 9, 2025
- 1.1.1: Jan 9, 2025
- 1.1.0: Jan 9, 2025
- 1.0.2: Dec 18, 2024
- 1.0.1: Dec 16, 2024
- 1.0.0: Dec 16, 2024 (this version)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

document_to_podcast-1.0.0.tar.gz (2.8 MB)

Uploaded Dec 16, 2024 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

document_to_podcast-1.0.0-py3-none-any.whl (17.9 kB)

Uploaded Dec 16, 2024 Python 3

File details

Details for the file document_to_podcast-1.0.0.tar.gz.

File metadata

Download URL: document_to_podcast-1.0.0.tar.gz
Upload date: Dec 16, 2024
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for document_to_podcast-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f64a5dcc66b564c1b5d0c1ac29bf6130665aa863108471b350662bda783d8cf2`
MD5	`ead0501134a4750246f4140ef55a81ef`
BLAKE2b-256	`efd70a7e5ed451d76e3faa6dc5b0a198ce5cdf4eb863ad79d799bf228c00d8f2`

See more details on using hashes here.
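A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses the standard `hashlib` module; the helper names are hypothetical, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches("document_to_podcast-1.0.0.tar.gz", "f64a5dcc66b564c1b5d0c1ac29bf6130665aa863108471b350662bda783d8cf2")`, assuming the archive sits in the current directory.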

Provenance

The following attestation bundles were made for document_to_podcast-1.0.0.tar.gz:

Publisher: release.yaml on mozilla-ai/document-to-podcast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: document_to_podcast-1.0.0.tar.gz
- Subject digest: f64a5dcc66b564c1b5d0c1ac29bf6130665aa863108471b350662bda783d8cf2
- Sigstore transparency entry: 155623217
- Sigstore integration time: Dec 16, 2024
Source repository:
- Permalink: mozilla-ai/document-to-podcast@59e8b46d35c469e030c13320dfe657c2e61ca62e
- Branch / Tag: refs/tags/1.0.0
- Owner: https://github.com/mozilla-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@59e8b46d35c469e030c13320dfe657c2e61ca62e
- Trigger Event: release

File details

Details for the file document_to_podcast-1.0.0-py3-none-any.whl.

File metadata

Download URL: document_to_podcast-1.0.0-py3-none-any.whl
Upload date: Dec 16, 2024
Size: 17.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for document_to_podcast-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`33e0326fb5830b522aec2d1cdd421892b9491ca1645577243771d66335cdbf00`
MD5	`12667e4a5ddf4ea38f16c64eb43c20f2`
BLAKE2b-256	`6e69a44094f23ec76394f81f5411de1ed29faff3367d5642962a9bf6009c1ef3`

See more details on using hashes here.

Provenance

The following attestation bundles were made for document_to_podcast-1.0.0-py3-none-any.whl:

Publisher: release.yaml on mozilla-ai/document-to-podcast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: document_to_podcast-1.0.0-py3-none-any.whl
- Subject digest: 33e0326fb5830b522aec2d1cdd421892b9491ca1645577243771d66335cdbf00
- Sigstore transparency entry: 155623219
- Sigstore integration time: Dec 16, 2024
Source repository:
- Permalink: mozilla-ai/document-to-podcast@59e8b46d35c469e030c13320dfe657c2e61ca62e
- Branch / Tag: refs/tags/1.0.0
- Owner: https://github.com/mozilla-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@59e8b46d35c469e030c13320dfe657c2e61ca62e
- Trigger Event: release
