StoryTeller
A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS).
Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals.
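The write → draw → narrate → stitch flow described above can be sketched with placeholder components. Everything below is an illustrative stand-in (stub functions, made-up file names), not the package's internals:

```python
# Illustrative sketch of the StoryTeller pipeline: GPT writes sentences,
# Stable Diffusion paints one frame per sentence, TTS narrates each line,
# and the triples are stitched into a video. The three "models" here are
# trivial stubs; the real package wires up actual neural networks.

def write_story(prompt: str) -> list[str]:
    # Stub for GPT: continue the opening line into a short plot.
    return [prompt, "The hero set out at dawn.", "The journey ended well."]

def paint(sentence: str) -> str:
    # Stub for Stable Diffusion: pretend to render an image file.
    return f"frame_{abs(hash(sentence)) % 1000}.png"

def narrate(sentence: str) -> str:
    # Stub for TTS: pretend to synthesize an audio clip.
    return f"clip_{abs(hash(sentence)) % 1000}.wav"

def make_segments(prompt: str) -> list[tuple[str, str, str]]:
    # One (sentence, image, audio) triple per line, ready to stitch.
    return [(s, paint(s), narrate(s)) for s in write_story(prompt)]

segments = make_segments("Once upon a time, there was a fox.")
for sentence, image, audio in segments:
    print(sentence, image, audio)
```

The real pipeline additionally burns the sentences in as subtitles and concatenates the per-sentence clips into the final `out.mp4`.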
Quickstart
- Clone the repository.
$ git clone https://github.com/jaketae/storyteller.git
- Install package requirements.
$ pip install --upgrade pip wheel
$ pip install -e .
# for dev requirements, do:
# pip install -e .[developer]
- Run the demo. The final video will be saved as /out/out.mp4, alongside other intermediate images, audio files, and subtitles.
$ storyteller
# alternatively with make, do:
# make run
Usage
- Load the model with defaults.
from storyteller import StoryTeller
story_teller = StoryTeller.from_defaults()
story_teller.generate(...)
- Alternatively, configure the model with custom settings.
from storyteller import StoryTeller, StoryTellerConfig
config = StoryTellerConfig(
writer="gpt2-large",
painter="CompVis/stable-diffusion-v1-4",
max_new_tokens=100,
diffusion_prompt_prefix="Van Gogh style",
)
story_teller = StoryTeller(config)
story_teller.generate(...)
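A config object like this is typically a plain dataclass of defaults that you override field by field. The re-creation below is illustrative only: the field names are copied from the snippet above, but the default values are assumptions, not the package's actual defaults.

```python
from dataclasses import dataclass

@dataclass
class StoryTellerConfig:
    # Field names mirror the usage snippet above; the default values
    # here are assumptions for illustration, not the package's own.
    writer: str = "gpt2"
    painter: str = "CompVis/stable-diffusion-v1-4"
    max_new_tokens: int = 50
    diffusion_prompt_prefix: str = ""

# Override only the fields you care about, as in the usage example;
# untouched fields keep their defaults.
config = StoryTellerConfig(
    writer="gpt2-large",
    max_new_tokens=100,
    diffusion_prompt_prefix="Van Gogh style",
)
print(config.writer, config.painter, config.max_new_tokens)
```

This pattern keeps every knob (model names, token budget, prompt prefix) in one typed, inspectable object instead of a long argument list.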
License
Released under the MIT License.