Purple-Gold Gourd (紫金葫芦): absorb a creator's public voice and refine it into a chat persona.

紫金葫芦

Named after the Purple-Gold Gourd from Journey to the West: "Dare you answer me when I call your name?" Once a creator answers the call, this library rapidly draws in their public voice, transcribes it, distills it, and refines it into a chat persona.

What it does

Resolves a creator by name, ID, handle, or URL on Bilibili or YouTube.
Downloads only the audio track from selected videos.
Transcribes with FunASR SenseVoice, including timestamps and language detection, and can also ingest local audio or video files through the CLI.
Exports each transcript as JSON and .srt.
Lets you drop custom .md files into each character's documents/ folder; those files also participate in skill generation and RAG.
Distills a persona skill.md with a local OpenAI-compatible LLM.
Builds BM25 retrieval over transcript chunks and custom documents.
Falls back to web search when transcript evidence is too thin, and injects the results as external background info (背景信息) rather than as persona memory.
Automatically refreshes the skill when transcripts or custom documents in the character data folder change.
Optionally synthesizes replies with a TTS plugin using an automatically selected voice prompt.
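The retrieval and fallback steps above can be pictured with a small, self-contained Okapi BM25 ranker. This is a sketch, not the library's chat/retrieval.py. The tokenizer (lowercase Latin word runs plus single CJK characters), the k1=1.5 / b=0.75 defaults, and the is_weak threshold that decides when evidence is "too thin" are all assumptions.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase word runs plus single CJK characters, so Chinese
# transcripts still yield terms without a word segmenter.
TOKEN_RE = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class BM25:
    """Okapi BM25 over a fixed list of text chunks."""

    def __init__(self, chunks: list[str], k1: float = 1.5, b: float = 0.75):
        self.chunks = chunks
        self.docs = [Counter(tokenize(c)) for c in chunks]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        self.k1, self.b = k1, b
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in d)
        # The +1 inside the log keeps every idf weight non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        scored = [(self.score(query, i), c) for i, c in enumerate(self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [pair for pair in scored[:k] if pair[0] > 0]


def is_weak(hits: list[tuple[float, str]], min_score: float = 0.5) -> bool:
    # Assumed rule: evidence is weak when nothing matched or the best hit
    # scores below a fixed threshold; a weak result would trigger web search.
    return not hits or hits[0][0] < min_score
```

Single-character CJK tokens are a crude stand-in for a real segmenter, but they keep the example dependency-free.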

Install

Prerequisites:

Python 3.11 or newer
ffmpeg available on your PATH or configured through FFMPEG_PATH
An OpenAI-compatible chat endpoint for skill distillation and persona chat; the defaults target LM Studio at http://127.0.0.1:1234/v1

# Recommended: full local experience
pip install "purple-gold-gourd[full]"

# Minimal package install
pip install purple-gold-gourd

# Add the bundled FunASR speech-transcription plugin
pip install "purple-gold-gourd[speech]"

# Add Qwen3-TTS synthesis and audio playback helpers
pip install "purple-gold-gourd[tts]"

# Add Bilibili scraping support
pip install "purple-gold-gourd[bilibili]"

The main transcript-backed build flow needs the speech extra because the bundled STT plugin is FunASR. The tts extra is only needed when you want spoken replies or audio discussion output.
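Because the build flow depends on optional extras, it can help to see which ones are importable before running. A hedged sketch: the mapping from extra name to import name is an assumption (funasr and playwright are the import names of those projects; the tts extra is left out because its import name is not stated here), and missing_extras is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to modules that extra should make importable.
EXTRA_MODULES = {
    "speech": ["funasr"],
    "bilibili": ["playwright"],
}


def missing_extras(extra_modules: dict[str, list[str]] = EXTRA_MODULES) -> dict[str, list[str]]:
    """Map each extra to those of its modules that cannot be found."""
    missing = {}
    for extra, modules in extra_modules.items():
        absent = [m for m in modules if find_spec(m) is None]
        if absent:
            missing[extra] = absent
    return missing


for extra, modules in missing_extras().items():
    print(f'missing {modules}; try: pip install "purple-gold-gourd[{extra}]"')
```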

If you install the Bilibili extra, install a Playwright browser once:

playwright install chromium

For local development, editable installs still work:

pip install -e .
pip install -e ".[full]"

# Or use convenience requirements files
pip install -r requirements.txt
pip install -r requirements-full.txt

requirements.txt installs the local package with the core dependency set, and requirements-full.txt installs .[full]. After installation, you can run the CLI through either entry point: purple-gold-gourd or zijin-hulu.

Package layout

purple_gold_gourd/
  cli.py            entry point
  config.py         AppConfig
  schema.py         data classes
  utils.py          shared helpers
  language.py       language detection and normalization
  pipeline.py       build orchestrator

  plugins/
    stt/
      base.py       STT plugin interfaces
      registry.py   internal STT plugin loader/registry
      shared.py     subtitle helpers
      funasr/
        plugin.py   FunASR STT plugin
        transcriber.py
    tts/
      base.py       TTS plugin interfaces
      registry.py   internal TTS plugin loader/registry
      shared.py     voice prompt selection, playback, text prep, validation
      qwen3/
        plugin.py   Qwen3-TTS plugin
        voice.py

  media/
    platforms.py    creator resolution
    downloader.py   audio-only media download
    transcribe.py   compatibility shim to the STT plugin

  synthesis/
    voice.py        compatibility shim to TTS helpers

  chat/
    llm.py          OpenAI-compatible completion helper
    retrieval.py    BM25 retrieval + weak-RAG assessment
    web_search.py   web search fallback
    skillgen.py     persona distillation
    persona.py      chat loop
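The plugins/ layout pairs each interface module with a registry, and the active plugin is chosen by name through PURPLE_GOLD_GOURD_STT_PLUGIN or PURPLE_GOLD_GOURD_TTS_PLUGIN. A minimal sketch of that name-based registry pattern for STT; STTPlugin, register, get_stt_plugin, and FakeFunASR are hypothetical names and do not reproduce plugins/stt/registry.py.

```python
import os
from collections.abc import Mapping
from typing import Callable, Protocol


class STTPlugin(Protocol):
    """Hypothetical STT plugin interface."""

    name: str

    def transcribe(self, audio_path: str) -> list[dict]: ...


_REGISTRY: dict[str, Callable[[], STTPlugin]] = {}


def register(name: str):
    """Decorator that records a zero-argument factory under a plugin name."""
    def wrap(factory: Callable[[], STTPlugin]) -> Callable[[], STTPlugin]:
        _REGISTRY[name] = factory
        return factory
    return wrap


def get_stt_plugin(env: Mapping[str, str] = os.environ) -> STTPlugin:
    name = env.get("PURPLE_GOLD_GOURD_STT_PLUGIN", "funasr")
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown STT plugin {name!r}; known: {sorted(_REGISTRY)}") from None
    # Instantiate lazily so heavy model imports only happen when a plugin is used.
    return factory()


@register("funasr")
class FakeFunASR:
    name = "funasr"

    def transcribe(self, audio_path: str) -> list[dict]:
        # Placeholder: a real plugin would run the speech model here.
        return [{"start": 0.0, "end": 1.0, "text": f"<{audio_path}>"}]
```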

Quick start

Bilibili note: Bilibili does not expose a public search API, so the first build of a Bilibili character requires the creator's numeric UID (visible in the profile URL, e.g. space.bilibili.com/208259). After that first build, you can reopen the character by name from the local cache.

# Bilibili — first build: must use the numeric UID
purple-gold-gourd "208259" --platform bilibili

# Bilibili — subsequent runs: name lookup works from local cache
purple-gold-gourd "敬汉卿"
purple-gold-gourd discuss "敬汉卿" "马督工" --topic "Should creators rely on AI tools?" --rounds 3

# YouTube — handle or URL works directly
purple-gold-gourd "@LinusTechTips" --platform youtube

# Use specific ranked videos only
purple-gold-gourd "@LinusTechTips" --series 1 3 8
purple-gold-gourd "@LinusTechTips" --series 2,5,9

# Import local audio/video into an existing character
purple-gold-gourd "敬汉卿" --media D:\clips\interview.mp3 D:\clips\livestream.mp4

# Build without opening chat
purple-gold-gourd "208259" --platform bilibili --build-only

# Start with speech synthesis enabled
purple-gold-gourd "敬汉卿" --speak

You can also invoke the CLI module directly:

python -m purple_gold_gourd.cli "208259" --platform bilibili
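Creator arguments come in several shapes: a bare Bilibili UID, a space.bilibili.com profile URL, a YouTube @handle or URL, or the name of an already-built character. A sketch of how such input might be classified before any network call; resolve_input and CreatorRef are hypothetical, and treating bare digits as a UID only under --platform bilibili follows the examples above rather than confirmed library behavior.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CreatorRef:
    platform: str  # "bilibili", "youtube", or the --platform value when unknown
    kind: str      # "uid", "handle", "url", or "name"
    value: str


def resolve_input(text: str, platform: str = "auto") -> CreatorRef:
    """Classify a creator argument before any network lookup (sketch)."""
    text = text.strip()
    if text.startswith(("http://", "https://")):
        url = urlparse(text)
        host = url.netloc.lower()
        if host == "space.bilibili.com":
            # Profile URLs look like https://space.bilibili.com/208259
            uid = url.path.strip("/").split("/")[0]
            if uid.isdigit():
                return CreatorRef("bilibili", "uid", uid)
        if host == "youtube.com" or host.endswith(".youtube.com"):
            return CreatorRef("youtube", "url", text)
        raise ValueError(f"unsupported creator URL: {text}")
    if text.startswith("@"):
        # YouTube handles such as @LinusTechTips
        return CreatorRef("youtube", "handle", text)
    if text.isdigit() and platform == "bilibili":
        return CreatorRef("bilibili", "uid", text)
    # Anything else is treated as a name to look up among locally built characters.
    return CreatorRef(platform, "name", text)
```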

Selection rules

If you pass --series, only those 1-based ranked video numbers are used for RAG.
If a requested video is missing locally, the library downloads and processes it immediately.
If you pass --media, each local audio/video file is converted to audio first and then transcribed into the same character.
If you do not pass --series, the library uses all cached transcripts for that creator.
Any .md files you place under that character's documents/ folder are also used for skill generation and RAG.
During character initialization, if files under transcripts/ or documents/ changed, the library refreshes skill.md automatically.
On a creator's first build, when no transcripts are cached yet, the library bootstraps from the top-ranked videos; --top sets how many (default 10).

Chat commands

Command	Effect
`/help`	Show available commands
`/speak on`	Enable reply synthesis
`/speak off`	Disable reply synthesis
`/rebuild`	Re-download, re-transcribe, and re-distill
`/calibrate <path> <start-end>`	Set a new voice reference from a time slice of any audio/video file, e.g. `/calibrate rec.mp4 00:10-00:20`
`/exit`	Quit
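/calibrate takes a file path and a start-end slice such as 00:10-00:20. The sketch below parses that command and builds an ffmpeg argument list that cuts the slice to audio. The helper names and the output format (mono, 16 kHz WAV) are assumptions; -ss, -t, -i, -vn, -ac, and -ar are standard ffmpeg options.

```python
def parse_timestamp(value: str) -> float:
    """Parse "SS", "MM:SS", or "HH:MM:SS"; seconds may be fractional."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"bad timestamp: {value!r}")
    seconds = 0.0
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


def parse_calibrate(line: str) -> tuple[str, float, float]:
    """Split "/calibrate <path> <start-end>" into (path, start_s, end_s)."""
    # Splitting from the right keeps Windows paths and paths with spaces intact.
    head, _, time_range = line.strip().rpartition(" ")
    command, _, path = head.partition(" ")
    path = path.strip()
    if command != "/calibrate" or not path:
        raise ValueError("usage: /calibrate <path> <start-end>")
    start_text, sep, end_text = time_range.partition("-")
    if not sep:
        raise ValueError("time range must look like 00:10-00:20")
    start, end = parse_timestamp(start_text), parse_timestamp(end_text)
    if end <= start:
        raise ValueError("end must be after start")
    return path, start, end


def ffmpeg_slice_args(path: str, start: float, end: float, out: str) -> list[str]:
    """ffmpeg argv that cuts [start, end) to a mono 16 kHz WAV (assumed format)."""
    # -ss and -t before -i: seek in the input and read only the requested duration.
    return ["ffmpeg", "-y", "-ss", f"{start:g}", "-t", f"{end - start:g}", "-i", path,
            "-vn", "-ac", "1", "-ar", "16000", out]
```

The resulting list can be passed to subprocess.run(..., check=True).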

Discussion controls

When you use discuss, the CLI does not enter a normal one-character chat loop. Instead, it runs the requested rounds directly, while still letting you use these control commands before the start and between rounds:

Command	Effect
`/help`	Show available discussion controls
`/speak on`	Enable discussion speech playback and audio saving for later turns
`/speak off`	Disable discussion speech playback for later turns
`/exit`	Stop early and keep the partial record
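Discussion mode is a round-robin: in each round every character speaks once, and control commands are read before the start and between rounds. A minimal sketch of that loop, with the LLM call abstracted as a hypothetical reply callable; run_discussion and read_control are illustrative names, and only /exit and /speak on|off are handled.

```python
from typing import Callable

# Hypothetical reply function: (speaker, turns so far) -> reply text.
Reply = Callable[[str, list[tuple[str, str]]], str]


def run_discussion(characters: list[str], topic: str, rounds: int, reply: Reply,
                   read_control: Callable[[], str] = lambda: "") -> dict:
    """Round-robin discussion in which each character speaks once per round.

    read_control() is polled before the first round and between rounds.
    "/exit" stops early; the partial record is still returned.
    """
    turns: list[tuple[str, str]] = []
    speak = False
    completed = 0
    for _ in range(rounds):
        command = read_control().strip()
        if command == "/exit":
            break
        if command in ("/speak on", "/speak off"):
            # Affects later turns only; a real implementation would synthesize
            # and save audio for each turn while this flag is on.
            speak = command == "/speak on"
        for name in characters:
            turns.append((name, reply(name, turns)))
        completed += 1
    return {"topic": topic, "rounds_completed": completed, "speak": speak, "turns": turns}
```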

CLI flags

Flag	Default	Description
`--platform`	`auto`	`auto`, `youtube`, or `bilibili`
`--top`	`10`	Videos to process during bootstrap ranking
`--scan-limit`	`30`	Candidate videos to inspect before ranking
`--series`	unset	1-based ranked video numbers to use for RAG
`--media`	unset	Local audio/video files to import and transcribe for this character
`--rebuild`	off	Ignore cache and rebuild
`--build-only`	off	Stop after profile build
`--speak`	off	Enable voice synthesis from start
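These flags map naturally onto argparse. The parser below is hypothetical: it mirrors the documented names and defaults but is not the project's cli.py, and the discuss subcommand is not modeled. Combining nargs="+" with a comma-splitting type lets --series accept both 1 3 8 and 2,5,9.

```python
import argparse


def comma_ints(value: str) -> list[int]:
    """argparse type: "2,5,9" -> [2, 5, 9]; a plain "3" -> [3]."""
    return [int(p) for p in value.split(",") if p.strip()]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="purple-gold-gourd")
    p.add_argument("creator", help="name, UID, handle, or URL")
    p.add_argument("--platform", choices=["auto", "youtube", "bilibili"], default="auto")
    p.add_argument("--top", type=int, default=10, help="videos to process during bootstrap")
    p.add_argument("--scan-limit", type=int, default=30, help="candidates inspected before ranking")
    # nargs="+" collects space-separated values; comma_ints splits comma lists.
    p.add_argument("--series", nargs="+", type=comma_ints)
    p.add_argument("--media", nargs="+", help="local audio/video files to import")
    p.add_argument("--rebuild", action="store_true")
    p.add_argument("--build-only", action="store_true")
    p.add_argument("--speak", action="store_true")
    return p


def series_numbers(args: argparse.Namespace) -> list[int]:
    """Flatten [[1], [3], [8]] or [[2, 5, 9]] into one list."""
    return [n for group in (args.series or []) for n in group]
```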

Discussion mode flags:

Flag	Default	Description
`discuss`	n/a	Subcommand that starts multi-character discussion mode
`--topic`	required	Discussion topic
`--rounds`	`3`	Full discussion rounds; each character speaks once per round
`--speak`	off	Start the discussion with speech playback enabled

Environment overrides

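Project-specific overrides use the PURPLE_GOLD_GOURD_* prefix; the OPENAI_*, FUNASR_*, QWEN3_TTS_*, and FFMPEG_PATH variables configure the LLM endpoint, the bundled plugins, and ffmpeg.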

Variable	Purpose
`OPENAI_BASE_URL`	LLM endpoint
`OPENAI_API_KEY`	LLM API key
`OPENAI_MODEL`	Preferred model name
`OPENAI_MAX_CONTEXT_TOKENS`	Default prompt-context budget
`OPENAI_MAX_TOKENS`	Default completion budget
`OPENAI_MODEL_CONTEXT_TOKENS`	Per-model context limits
`OPENAI_MODEL_MAX_TOKENS`	Per-model completion limits
`PURPLE_GOLD_GOURD_STT_PLUGIN`	Active STT plugin, default `funasr`
`FUNASR_DEVICE`	FunASR inference device, e.g. `cuda:0` or `cpu`
`FUNASR_MODEL`	FunASR model ID
`PURPLE_GOLD_GOURD_TTS_PLUGIN`	Active TTS plugin, default `qwen3`
`PURPLE_GOLD_GOURD_WEB_SEARCH`	Enable guarded web search fallback
`PURPLE_GOLD_GOURD_WEB_SEARCH_MAX_RESULTS`	Max web results injected into prompts
`PURPLE_GOLD_GOURD_WEB_SEARCH_TIMEOUT_S`	Web-search timeout in seconds
`PURPLE_GOLD_GOURD_VALIDATE_TTS`	Validate synthesized speech by re-transcribing it when set
`QWEN3_TTS_MODEL`	Qwen3-TTS model ID or local path
`QWEN3_TTS_DEVICE_MAP`	Qwen3-TTS device map
`QWEN3_TTS_DTYPE`	Qwen3-TTS dtype
`QWEN3_TTS_ATTN_IMPLEMENTATION`	Optional attention backend
`QWEN3_TTS_CHUNK_CHARS`	Approximate characters per TTS chunk
`QWEN3_TTS_DO_SAMPLE`	Enable or disable Qwen3-TTS sampling
`QWEN3_TTS_MAX_NEW_TOKENS`	Optional generation cap for TTS
`FFMPEG_PATH`	ffmpeg binary path

Data layout

data/creators/<platform>-<id>-<name>/
  manifest.json
  videos.json
  downloads/
  transcripts/
  documents/
  skill/
    skill.md
    notes/
  voice/
  outputs/

Put any custom markdown files you want the persona to use into documents/. No extra command is needed; the next character initialization will pick them up automatically and refresh the skill when needed.
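Automatic skill refresh needs a way to tell whether transcripts/ or documents/ changed since the last distillation. One approach, sketched here under assumptions: hash each file's relative path and bytes, and compare the result with a fingerprint saved after each build. The state file skill/.fingerprint.json and all function names are hypothetical; the library may track changes differently, for example by modification times.

```python
import hashlib
import json
from pathlib import Path

WATCHED = ("transcripts", "documents")
STATE_FILE = "skill/.fingerprint.json"  # hypothetical location


def content_fingerprint(character_dir: Path) -> str:
    """SHA-256 over the relative path and bytes of every watched file."""
    digest = hashlib.sha256()
    for folder in WATCHED:
        root = character_dir / folder
        if not root.is_dir():
            continue
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            digest.update(path.relative_to(character_dir).as_posix().encode("utf-8"))
            digest.update(b"\0")
            digest.update(path.read_bytes())
            digest.update(b"\0")
    return digest.hexdigest()


def skill_is_stale(character_dir: Path) -> bool:
    """True when watched content differs from what the last build recorded."""
    state = character_dir / STATE_FILE
    if not state.exists():
        return True
    saved = json.loads(state.read_text("utf-8")).get("fingerprint")
    return saved != content_fingerprint(character_dir)


def mark_skill_built(character_dir: Path) -> None:
    state = character_dir / STATE_FILE
    state.parent.mkdir(parents=True, exist_ok=True)
    state.write_text(json.dumps({"fingerprint": content_fingerprint(character_dir)}), "utf-8")
```

Hashing contents rather than modification times avoids spurious rebuilds when files are touched without changing, at the cost of reading every file.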

Discussion records are saved separately under data/discussions/<timestamp>-<topic>/, including discussion.json, discussion.md, discussion.txt, and an audio/ folder when discussion speech is enabled.
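A sketch of writing one discussion record into that layout. The timestamp format and the topic slug rule are assumptions (the page only specifies <timestamp>-<topic>), save_discussion is a hypothetical helper, and the audio/ folder is not handled here.

```python
import json
import re
from datetime import datetime
from pathlib import Path


def discussion_dir(base: Path, topic: str, now: datetime) -> Path:
    # Assumed naming: compact timestamp plus a filesystem-safe topic slug.
    slug = re.sub(r"[^\w]+", "-", topic)[:60].strip("-") or "topic"
    return base / "discussions" / f"{now:%Y%m%d-%H%M%S}-{slug}"


def save_discussion(base: Path, record: dict, now: datetime) -> Path:
    """Write discussion.json, discussion.md, and discussion.txt for one record."""
    out = discussion_dir(base, record["topic"], now)
    out.mkdir(parents=True, exist_ok=True)
    (out / "discussion.json").write_text(
        json.dumps(record, ensure_ascii=False, indent=2), "utf-8")
    md = [f"# {record['topic']}", ""]
    txt = []
    for name, text in record["turns"]:
        md += [f"**{name}**: {text}", ""]
        txt.append(f"{name}: {text}")
    (out / "discussion.md").write_text("\n".join(md), "utf-8")
    (out / "discussion.txt").write_text("\n".join(txt), "utf-8")
    return out
```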

Release history

Version	Date
0.1.3	Apr 19, 2026
0.1.2	Apr 18, 2026
0.1.1 (this version)	Apr 18, 2026
0.1.0	Apr 18, 2026

Download files

Source distribution: purple_gold_gourd-0.1.1.tar.gz (63.4 kB), uploaded Apr 18, 2026

Built distribution: purple_gold_gourd-0.1.1-py3-none-any.whl (69.0 kB), uploaded Apr 18, 2026, Python 3

File details

Details for the file purple_gold_gourd-0.1.1.tar.gz.

File metadata

Download URL: purple_gold_gourd-0.1.1.tar.gz
Upload date: Apr 18, 2026
Size: 63.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for purple_gold_gourd-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`f94a1c00afc32f897df70c2185580b1b62316696924a7d158cb6c5abcd0dad6c`
MD5	`2cd7ac84a415179c35d1970b81812344`
BLAKE2b-256	`44e7ba5958a84c0f9b1558a5ac127b5bcd7eb1915b9065ef651fe34312163c9e`

File details

Details for the file purple_gold_gourd-0.1.1-py3-none-any.whl.

File metadata

Download URL: purple_gold_gourd-0.1.1-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 69.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for purple_gold_gourd-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`378e0b64fc673b511b31e9c756c1a9492835bffcefbb33f275e8cfdbeb131898`
MD5	`9dccab7783aaa34ed17c8995181c98dc`
BLAKE2b-256	`d4c0550be36496edf5d288df768e9b02027d8ba260cf6bef9f324c577a27c41a`
