Skip to main content

Live2D Mouth-sync artifact

Project description

PyPI - Python Version PyPI - License PyPI - Version PyPI Downloads

English | 中文

pymouth

pymouth 是基于Python的Live2D口型同步库. 你可以用音频文件, 甚至是AI模型输出的ndarray, 就能轻松的让你的Live2D形象开口.
效果演示视频. Demo video

  • 以Python API的形式提供能力,用作和其他项目的集成,把宝贵的计算资源留给皮套的大脑,而不是给音频捕获软件和虚拟声卡。
  • 采用动态时间规划算法(DTW)匹配音频中的元音,并以元音置信度(softmax)的方式输出,而不是使用AI模型,即使是移动端CPU也绰绰有余。
  • VTubeStudio对pymouth来说只是可选项,只是一个Adapter,你可以使用Low Level API和你想要皮套引擎结合,只使用音频播放和音频分析能力。

Update

Quick Start

Environment

  • Python>=3.10
  • VTubeStudio>=1.28.0 (可选)

Installation

pip install pymouth

Get Started

  1. 在开始前你需要打开 VTubeStudio 的 Server 开关. 端口一般默认是8001.
    server_start.png

  2. 你需要确定自己Live2D口型同步的支持参数.
    请注意:下面提供一种简单的判断方式,但这种方式会修改(重置)Live2D模型口型部分参数,使用前请备份好自己的模型。
    如果你对自己的模型了如指掌,可以跳过这步。
    setup.png

    • 确认重置参数后,如果出现以下信息,则说明你的模型仅支持 基于分贝的口型同步 db.png
    • 确认重置参数后,如果出现以下信息,则说明你的模型仅支持 基于元音的口型同步 vowel.png
    • 如果VTubeStudio找到了所有参数,并且重置成功,说明两种方式都支持。只需要在接下来的代码中选择一种方式即可.
  3. 下面是两种基于不同方式的Demo.
    你可以找一个音频文件替换some.wav.
    samplerate:音频数据的采样率.
    output_device:输出设备Index. 这里很重要,如果不告诉插件播放设备是哪个,那么插件不会正常工作。 可以参考audio_devices_utils.py

    • 基于分贝的口型同步

      import time
      from pymouth import VTSAdapter, DBAnalyser
      
      def main():
        with VTSAdapter(DBAnalyser()) as a:
            a.action(audio='some.wav', samplerate=44100, output_device=2)
            time.sleep(100000)  # do something
      
      
      if __name__ == "__main__":
        main()
      
    • 基于元音的口型同步

      import time
      from pymouth import VTSAdapter, VowelAnalyser
      
      def main():
        with VTSAdapter(VowelAnalyser()) as a:
            a.action(audio='some.wav', samplerate=44100, output_device=2)
            time.sleep(100000)  # do something
      
      
      if __name__ == "__main__":
        main()
      

      第一次运行程序时, VTubeStudio会弹出插件授权界面, 通过授权后, 插件会在runtime路径下生成pymouth_vts_token.txt文件, 之后运行不会重复授权, 除非token文件丢失或在VTubeStudio移除授权.

About AI

下面是一个比较完整的使用pymouth作为AI TTS消费者的例子。

import queue
import threading
import time
from fish_speech import tts
from pymouth import VTSAdapter, DBAnalyser, VTSPluginInfo


class SpeakMsg:
    def __init__(self, msg: str, required: bool):
        self.msg = msg
        self.required = required
        self.create_timestamp = time.time()
        self.create_datetime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(self.create_timestamp))


class Speaker:
    def __init__(self):
        self.queue = queue.Queue(1)

    def start(self):
        plugin_info = VTSPluginInfo(plugin_name='kanojyo2',
                                    developer='organics',
                                    authentication_token_path='./pymouth_vts_token.txt',
                                    plugin_icon=None)

        with VTSAdapter(DBAnalyser(temperature=10), plugin_info=plugin_info) as a:
            while True:
                msg: SpeakMsg = self.queue.get()
                t0 = time.time()
                audio, rate = tts.tts_ndarray(msg.msg)
                print(f'speak time:{time.time() - t0:.02f}')

                a.action(audio=audio, samplerate=rate, output_device=2)

    def speak(self, msg: str, required=True):
        if required:
            self.queue.put(SpeakMsg(msg, required))
        else:
            try:
                self.queue.put_nowait(SpeakMsg(msg, required))
            except queue.Full:
                print("Queue Full")


if __name__ == "__main__":
    speakers = Speaker()
    # 这里的实现只作为参考而不是建议。对于AI等CPU密集型场景,使用线程而不是协程可能会更好。
    threading.Thread(target=speakers.start).start()

More Details

High Level

关键的代码只有两行:

with VTSAdapter(DBAnalyser(temperature=10)) as a:
    a.action(audio='some.wav', samplerate=44100, output_device=2)  # no-block
    # a.action_block(audio='aiueo.wav', samplerate=44100, output_device=2) # block

temperature=10温度(softmax) ,有别于LLM中对下一个token的概率分布,这里的温度指的是:音频的每个窗帧FFT,与元音相似度的概率分布。值越大,概率越平均,口型会变得平滑。反之亦然。默认为10,不可 <=0,可以随意调整这个值观察同步效果,并确定理想值。
a.action()非阻塞,会立即返回,由程序内部维护线程池和队列。
a.action_block()阻塞,直到音频播放和处理完毕才会返回,纯同步代码无线程,线程由调用者维护。

VTSAdapter以下是详细的参数说明:

param required default describe
analyser Y 分析仪,必须是 Analyser 的子类,目前支持DBAnalyserVowelAnalyser
db_vts_mouth_param 'MouthOpen' 仅作用于DBAnalyser, VTS中控制mouth_input的参数, 如果不是默认值请自行修改.
vowel_vts_mouth_param dict[str,str] 仅作用于VowelAnalyser, VTS中控制mouth_input的参数, 如果不是默认值请自行修改.
ws_uri str websocket uri 默认:ws://localhost:8001
plugin_info VTSPluginInfo 插件信息,可以自定义

a.action() 会开始处理音频数据. 以下是详细的参数说明:

param required default describe
audio Y 音频数据, 可以是文件path, 可以是SoundFile对象, 也可以是ndarray
samplerate Y 采样率, 这取决与音频数据的采样率, 如果你无法获取到音频数据的采样率, 可以尝试输出设备的采样率.
output_device Y 输出设备Index, 这取决与硬件或虚拟设备. 可用 audio_devices_utils.py 打印当前系统音频设备信息.
finished_callback None 音频处理完成会回调这个方法.
auto_play True 是否自动播放音频,默认为True,会播放音频(自动将audio写入指定output_device)

Low Level

Get Started 演示了一种High Level API 如果你不使用 VTubeStudio 或者想更加灵活的使用, 可以尝试Low Level API. 下面是一个Demo.

import time

from pymouth import DBAnalyser


def callback(y: float, data):
    # Y is the Y coordinate of the model's mouth.
    # Like is 0.4212883452
    print(y)  # do something


with DBAnalyser() as a:
    a.action_noblock('zh.wav', 44100, output_device=2, callback=callback)  # no block
    # a.action_block()  # block
    print("end")
    time.sleep(1000000)
import time

from pymouth import VowelAnalyser


def callback(md: dict[str, float], data):
    """
    md like is:
    {
        'VoiceSilence': 0,
        'VoiceA': 0.6547555255,
        'VoiceI': 0.2872873444,
        'VoiceU': 0.1034789232,
        'VoiceE': 0.3927834533,
        'VoiceO': 0.1927834548,
    }
    """
    print(md)  # do something


with VowelAnalyser() as a:
    a.action_noblock('zh.wav', 44100, output_device=2, callback=callback)  # no block
    # a.action_block() # block
    print("end")
    time.sleep(1000000)

Interruption During Playback

TTS的音频有时会很长,用户需要一个中断信号让Agent闭嘴。为此添加了一个 interrupt_listening 参数。
这个参数接收一个函数,这个函数的返回值必须是bool
插件会在每处理一个block_size时检查这个函数的返回值,如果返回True,异步线程会立刻结束(a.action_noblock()) ,或同步函数立刻返回(a.action_block())。

import threading
import time

from pymouth import VowelAnalyser


class Demo:
    def __init__(self):
        self.interrupt = False

    def __callback(self, md: dict[str, float], data):
        pass

    def __listening(self) -> bool:
        # Time-consuming operations are not recommended here.
        return self.interrupt

    def interrupt_handler(self):
        time.sleep(5)  # do something
        print("interrupt")
        self.interrupt = True

    def boot(self):
        threading.Thread(target=self.interrupt_handler).start()

        with VowelAnalyser() as a:
            a.action_block('zh.wav', 44100,
                           output_device=3,
                           callback=self.__callback,
                           interrupt_listening=self.__listening)
            print("end")


Demo().boot()

TODO

  • Test case

Special Thanks

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymouth-1.3.3.tar.gz (67.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pymouth-1.3.3-py3-none-any.whl (14.5 kB view details)

Uploaded Python 3

File details

Details for the file pymouth-1.3.3.tar.gz.

File metadata

  • Download URL: pymouth-1.3.3.tar.gz
  • Upload date:
  • Size: 67.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for pymouth-1.3.3.tar.gz
Algorithm Hash digest
SHA256 bbb50323c7d898163d5184e40ac0414ed9bda94624e0f0653e9b42de5b3b826a
MD5 c41cd5cd0fee6b12b8100223709182b4
BLAKE2b-256 342d839b97a580426fd1c9eeed2fd7bc5c738bd53c2b3181ef9554d3a3e66fbf

See more details on using hashes here.

File details

Details for the file pymouth-1.3.3-py3-none-any.whl.

File metadata

  • Download URL: pymouth-1.3.3-py3-none-any.whl
  • Upload date:
  • Size: 14.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for pymouth-1.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 701887e5968266d25f2c727a83c3b5e48273fc4ec00310f5406c11466bd8f606
MD5 e75b04091caa95969d94f84bca16dc36
BLAKE2b-256 98a66ea1d2b70b9326ed10f4e307cc2396aeef17c8375ee840c9ee74f1045ea8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page