Live-ish illustration for your role-playing campaign

TTRPG live_illustrate

ASR + LLM + Diffusion = ???

This project:

  • Uses Whisper to transcribe live audio of a tabletop RPG session
  • Uses GPT-3.5 to extract a description of the current setting from the transcript
  • Uses DALL-E to draw the setting
  • Uses Flask & HTMX to display a new image every few minutes

And like most AI projects, it simultaneously works better and worse than one might expect. The images generated are usually an amusingly flawed rendition of what's going on, but are almost too good to be just ambient background flavor.

Demo Reel

Some scenes from our party's first trial session:

Image: The party enjoys dinner together on the deck of the Daydream. No one's quite sure where the other ship came from, but it looks nice.

Image: The party sails the Daydream through a narrow canal in a swamp, searching for the hidden pirate city of Siren's Cove. Perhaps they should ask the barrel people for directions.

Image: The party eavesdrops on a red-haired gnome and a halfling in a Siren's Cove tavern who are plotting to steal a competitor's shipping manifest. Pay no attention to the faces of the other patrons.

Image: The party seeks further gossip at a luxe brothel called The Rich Dagger, guarded by a Goliath bouncer and famed for its perplexing architecture.

Installation

I recommend installing in a virtual environment.

# From PyPI:
pip install live-illustrate

# Or for hacking:
git clone git@github.com:ehennenfent/live_illustrate.git
cd live_illustrate
pip install -e ".[dev]"

Whisper will be much faster if you use a CUDA-enabled PyTorch build. I recommend installing this manually afterwards.

pip install --index-url https://download.pytorch.org/whl/cu118 torch torchvision torchaudio  # https://pytorch.org/get-started/locally/

You'll need an OpenAI API key, exposed via an environment variable or in a .env file, like so: OPENAI_API_KEY=<my_secret_api_key>.
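To check that the key is visible before a session starts, something like the following sketch may help. It is not how the tool itself loads the key (the source doesn't say); read_env_file and find_api_key are hypothetical helper names, and the parser only handles plain KEY=VALUE lines.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(path=".env", environ=None):
    """Prefer the process environment, then fall back to the .env file."""
    environ = os.environ if environ is None else environ
    return environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")


if __name__ == "__main__":
    print("key found" if find_api_key() else "OPENAI_API_KEY is not set")
```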

With the default settings, it costs about $1/hour to run. You can lower the cost by reducing the size of the generated images, or increasing the interval between them.
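As a rough way to reason about the interval side of that trade-off, the arithmetic can be sketched as below. The per-image price is a placeholder you would look up yourself, not a figure from this project, and the estimate ignores the cost of the summarization calls.

```python
def images_per_hour(wait_minutes):
    """How many images are drawn per hour at a given interval."""
    return 60.0 / wait_minutes


def image_cost_per_hour(wait_minutes, price_per_image):
    """Hourly image spend; price_per_image is a value you supply."""
    return images_per_hour(wait_minutes) * price_per_image


if __name__ == "__main__":
    # The default interval of 7.5 minutes works out to 8 images per hour.
    print(images_per_hour(7.5))
    # Doubling the interval halves the image spend, whatever the price is.
    print(image_cost_per_hour(15, price_per_image=0.10))  # 0.10 is a made-up price
```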

Running

Once installed, run the illustrate command line tool, which will automatically start recording with your default microphone. A data\ directory will be created containing the generated images and transcripts, and a web server will start on localhost:8080 to display the generated images.

A few words about the most important command line options:

  • --wait_minutes: This controls how frequently the tool draws an image, which directly translates into how expensive it is to run. The default of 7.5 minutes seems to work well for our campaign.
  • --max_context: Each interval, the tool looks back at the transcript and collects up to max_context tokens to send to GPT-3.5. It gets as close to that limit as possible, so some of these tokens may come from before the previous image was generated. GPT can be a bit slow about summarizing large amounts of text, so be careful about making this too large. The default of 2000 tokens seems to correspond very roughly to about ten minutes of conversation from one of our sessions, but YMMV.
  • --persistence_of_memory: When summarizing long conversations, the LLM can seem to get "stuck" on the first setting described. This argument controls what fraction of the previous context is retained each time an image is generated. The default setting of 0.2 may lead to some discontinuity if your party is in one place for a long time.
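To make the interaction between max_context and persistence_of_memory more concrete, here is a minimal sketch of the idea, not the tool's actual implementation. It assumes the transcript is a list of text chunks (oldest first), counts whitespace-separated words instead of real model tokens, and trims by chunk rather than by token; all function names are hypothetical.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words stand in for real model tokens.
    return len(text.split())


def collect_context(chunks, max_context):
    """Walk backwards from the newest chunk, keeping chunks while they fit."""
    selected = []
    used = 0
    for chunk in reversed(chunks):
        cost = count_tokens(chunk)
        if used + cost > max_context:
            break
        selected.append(chunk)
        used += cost
    selected.reverse()  # restore chronological order
    return selected


def retain_after_image(context, persistence_of_memory):
    """Keep only the newest fraction of the context once an image is drawn."""
    keep = int(len(context) * persistence_of_memory)
    return context[len(context) - keep:]


if __name__ == "__main__":
    transcript = ["we board the ship", "the swamp narrows", "a tavern appears ahead"]
    context = collect_context(transcript, max_context=8)
    print(context)
    print(retain_after_image(context, 0.5))
```

A lower persistence_of_memory drops more of the old context after each image, which is what lets the summary move on to a new setting.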

Optionally, it's possible to upload generated images to a Discord server automatically by configuring a Discord webhook and supplying the URL in the DISCORD_WEBHOOK environment variable.
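If you want to verify a webhook URL independently of the tool, a standard-library sketch like the one below posts a single image file. It is not the tool's own upload code; the helper names, the User-Agent string, and the use of the files[0] form field are assumptions based on Discord's webhook documentation, so check them against the current docs.

```python
import os
import urllib.request
import uuid


def build_multipart(filename, data, content_type="image/png"):
    """Return (body, Content-Type header) for a one-file multipart upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files[0]"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_image(image_path, webhook_url=None):
    """Upload one image; returns False when no webhook URL is configured."""
    webhook_url = webhook_url or os.environ.get("DISCORD_WEBHOOK")
    if not webhook_url:
        return False
    with open(image_path, "rb") as fh:
        body, ctype = build_multipart(os.path.basename(image_path), fh.read())
    request = urllib.request.Request(
        webhook_url,
        data=body,
        # Assumption: an explicit User-Agent avoids rejection of the default one.
        headers={"Content-Type": ctype, "User-Agent": "live-illustrate-example"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return 200 <= response.status < 300
```

Calling post_image with the path to one of the generated images should make it appear in the channel the webhook is bound to.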
