Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech

Project description

StoryTeller

Code style: black License: MIT

A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS).

Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals.
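
The flow described above can be sketched in plain Python. This is only an illustration of the data flow, not the package's actual implementation: `Scene`, `split_sentences`, `tell_story`, and the stand-in `write`/`paint`/`speak` callables are all hypothetical names, and the regex-based sentence splitter is an assumption made to keep the example dependency-free.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One story sentence paired with its generated image and narration."""

    sentence: str
    image: object
    audio: object


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tell_story(
    prompt: str,
    write: Callable[[str], str],
    paint: Callable[[str], object],
    speak: Callable[[str], object],
) -> List[Scene]:
    """Continue the prompt into a story, then illustrate and narrate each sentence."""
    story = write(prompt)
    return [Scene(s, paint(s), speak(s)) for s in split_sentences(story)]


# Stand-in "models" so the sketch runs without any ML dependencies.
def fake_write(prompt: str) -> str:
    return prompt + " A fox appeared. It ran away!"


scenes = tell_story(
    "Once upon a time.",
    write=fake_write,
    paint=lambda s: f"<image for: {s}>",
    speak=lambda s: f"<audio for: {s}>",
)
for scene in scenes:
    print(scene.sentence)
```

In a real pipeline the three callables would wrap a language model, an image model, and a TTS model, and the resulting scenes would then be stitched into a video.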

Installation

PyPI

Story Teller is available on PyPI.

$ pip install storyteller-core

Source

  1. Clone the repository.
$ git clone https://github.com/jaketae/storyteller.git
$ cd storyteller
  2. Install dependencies.
$ pip install .

Note: For Apple M1/M2 users, mecab-python3 is not available. You need to install mecab before running pip install. You can do this with Homebrew via brew install mecab. For more information, refer to this issue.

  3. (Optional) To develop locally, install the dev dependencies and the pre-commit hooks. The hooks automatically run linting and code quality checks before each commit.
$ pip install -e ".[dev]"
$ pre-commit install

Quickstart

The quickest way to run a demo is through the CLI. Simply type

$ storyteller

The final video will be saved as /out/out.mp4, alongside other intermediate images, audio files, and subtitles.

To override the defaults with custom parameters, set the CLI flags as needed.

$ storyteller --help
usage: storyteller [-h] [--writer_prompt WRITER_PROMPT]
                   [--painter_prompt_prefix PAINTER_PROMPT_PREFIX] [--num_images NUM_IMAGES]
                   [--output_dir OUTPUT_DIR] [--seed SEED] [--max_new_tokens MAX_NEW_TOKENS]
                   [--writer WRITER] [--painter PAINTER] [--speaker SPEAKER]
                   [--writer_device WRITER_DEVICE] [--painter_device PAINTER_DEVICE]

optional arguments:
  -h, --help            show this help message and exit
  --writer_prompt WRITER_PROMPT
  --painter_prompt_prefix PAINTER_PROMPT_PREFIX
  --num_images NUM_IMAGES
  --output_dir OUTPUT_DIR
  --seed SEED
  --max_new_tokens MAX_NEW_TOKENS
  --writer WRITER
  --painter PAINTER
  --speaker SPEAKER
  --writer_device WRITER_DEVICE
  --painter_device PAINTER_DEVICE
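
When scripting several runs, it can be handy to assemble the command line programmatically. The sketch below builds an argument list from the flag names shown in the help output above; `build_command` and `KNOWN_FLAGS` are hypothetical helpers, and the prompt, image count, and seed are made-up example values rather than the tool's defaults.

```python
import shlex
from typing import List

# Flag names copied from the `storyteller --help` output.
KNOWN_FLAGS = {
    "writer_prompt",
    "painter_prompt_prefix",
    "num_images",
    "output_dir",
    "seed",
    "max_new_tokens",
    "writer",
    "painter",
    "speaker",
    "writer_device",
    "painter_device",
}


def build_command(**options) -> List[str]:
    """Turn keyword options into a `storyteller` argument list."""
    unknown = set(options) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    cmd = ["storyteller"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    writer_prompt="Once upon a time, unicorns roamed the Earth.",
    num_images=5,
    seed=42,
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal

# To actually run it (requires storyteller-core to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting issues with prompts that contain spaces or punctuation.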

Usage

For more advanced use cases, you can also directly interface with Story Teller in Python code.

  1. Load the model with defaults.
from storyteller import StoryTeller

story_teller = StoryTeller.from_default()
story_teller.generate(...)
  2. Alternatively, configure the model with custom settings.
from storyteller import StoryTeller, StoryTellerConfig

config = StoryTellerConfig(
    writer="gpt2-large",
    painter="CompVis/stable-diffusion-v1-4",
    max_new_tokens=100,
)

story_teller = StoryTeller(config)
story_teller.generate(...)

License

Released under the MIT License.

Download files

Source Distribution

storyteller_core-0.0.2.tar.gz (6.5 kB)

Uploaded Source

Built Distribution

storyteller_core-0.0.2-py3-none-any.whl (7.4 kB)

Uploaded Python 3
