ML-Pipelines architecture and implementation for paraphrasing and voice cloning

These details have not been verified by PyPI

Project links

repository

Project description

`chara`

Chara is a collection of research scripts that are primarily used to create characters.

Currently, two branches are covered:

Paraphrasing. This branch paraphrases the outputs of Kaia so that they match the personality of the character. The pipeline takes the replies from the assistant, fetches the previously generated paraphrases from the Avatar server, determines which replies need more coverage, paraphrases them and uploads the results back to the Avatar server. All you need to do is define the characters and their personalities. Under the hood, this system generates paraphrases for arbitrary grammatron Templates, which can be used elsewhere: for example, we also use it in natural-language understanding research.
Voice cloning. This branch helps you create characters' voices. It takes a voice sample of your character, then the GPU-requiring zero-shot voice cloner generates an hours-long dataset of phrases, and finally the lightweight Piper model is trained on this dataset.

Both systems support multiple languages. German, English and Russian are currently implemented; others can be added with some additional effort.

For a detailed description of these projects, please consult the README.md files in the respective directories.

General concepts

The chara infrastructure offers a way to write reproducible, maintainable and deliverable research. Traditionally, research is often done in notebooks, and while they are very handy for research, they are terrible in production. It's hard to pass parameters to a notebook, to write unit tests for it, to call one notebook from another, and so on. Hence, the code is often rewritten as pure Python, but then it loses the benefits of notebooks: visualizations that may still be useful in production for quality control. Also, after the conversion, the intermediate values are no longer cached in memory, so if your several-hours-long pipeline gives an incorrect result at the end, it's not possible to understand why. It's also impossible to fix the bug and restart the pipeline from the partially computed state.

I've been suffering from these problems for quite a while, and chara became a simple and efficient solution. First, foundation_kaia.logger provides a logger that can consume plots, dataframes and IPython notebook widgets and output them to HTML files. This way you can still enjoy your intermediate visualizations, copied straight from notebooks.

Second, chara offers a caching infrastructure, seamlessly built on top of function calls.

Caching

If you need to cache the result of function(*args, **kwargs), just write Chara.start(folder), and then Chara.call(function)(*args, **kwargs). This caches the returned value of function in the folder, so if the function is called again, the value is restored and function is not executed again. If function calls other functions inside with Chara.call, subfolders with interpretable names are created to cache the results of the internal calls. In case something goes wrong, you can invalidate the cache: invalidate_self(path) invalidates the result of the function's call but not the results of the inner calls, while invalidate_down resets the associated cache completely. Both functions also remove the caches of all functions called after the desired path, as well as the cached results of all functions up the stack, so this basically corresponds to "repeat the pipeline from the selected place".
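The caching idea described above can be illustrated with a minimal sketch. This is not chara's implementation: `cached_call` and `invalidate` are hypothetical helpers that only mimic the behaviour of storing a function's result in a folder and restoring it on the next call.

```python
import pickle
import tempfile
from pathlib import Path


def cached_call(folder: Path, function, *args, **kwargs):
    """Hypothetical stand-in for the caching idea, not chara's code."""
    folder.mkdir(parents=True, exist_ok=True)
    cache_file = folder / f"{function.__name__}.pkl"
    if cache_file.exists():
        # Restore the stored value instead of calling the function again.
        with cache_file.open("rb") as stream:
            return pickle.load(stream)
    result = function(*args, **kwargs)
    with cache_file.open("wb") as stream:
        pickle.dump(result, stream)
    return result


def invalidate(folder: Path, function) -> None:
    """Drop the cached value so that the next call recomputes it."""
    (folder / f"{function.__name__}.pkl").unlink(missing_ok=True)


calls = []


def slow_square(x):
    calls.append(x)  # records every real execution
    return x * x


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "cache"
    first = cached_call(folder, slow_square, 4)
    second = cached_call(folder, slow_square, 4)  # served from disk
    invalidate(folder, slow_square)
    third = cached_call(folder, slow_square, 4)   # recomputed
```

Note that the cache key in this sketch ignores the arguments, so a call with different parameters would return the stale value until the folder is changed or the cache is invalidated.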

Obviously, caching requires a bit of architectural redesign of the code, e.g.:

If the code is called with different parameters, the folder needs to be changed or reset.
The code shouldn't branch on Chara.call calls. If you need to run a function conditionally, call it anyway and return None.
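The "call it anyway and return None" rule could look roughly like the sketch below. The functions `normalize` and `pipeline` are hypothetical, and the Chara.call wrapper is left out so that the example runs on its own; in a real pipeline the call to `normalize` would presumably go through Chara.call.

```python
from typing import List, Optional


def normalize(values: List[float], enabled: bool) -> Optional[List[float]]:
    """Always invoked; decides internally whether there is work to do."""
    if not enabled:
        return None  # the "skipped" outcome is still an ordinary return value
    total = sum(values)
    return [v / total for v in values]


def pipeline(values: List[float], enabled: bool) -> List[float]:
    # No `if enabled:` around the call: the step is invoked on every run,
    # and the condition is handled inside the step itself.
    normalized = normalize(values, enabled)
    return values if normalized is None else normalized


with_step = pipeline([1.0, 3.0], True)
without_step = pipeline([1.0, 3.0], False)
```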

In addition, you may use the Chara.phase decorator to subdivide a function without extracting separate functions from it:

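```python
# Chara is assumed to be imported already; the import path is not shown in this description.
def function(argument):

    @Chara.phase
    def first_formula():
        return argument + 1

    # Result of the phase that has just run
    first_result = Chara.last.result

    @Chara.phase
    def second_formula():
        return first_result * 2

    second_result = Chara.last.result
    return 1 / second_result
```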

This creates two caches, one for first_formula and one for second_formula, and stores the intermediate values there.
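To make the control flow of such a decorator easier to follow, here is a toy version. `ToyChara` is a hypothetical stand-in, not the library's class: it only shows how a decorator can run the decorated function immediately and expose its result through `last.result`, and it deliberately omits the caching that the real Chara.phase adds.

```python
class _Last:
    """Holds the outcome of the most recently executed phase."""
    result = None
    name = None


class ToyChara:
    last = _Last()

    @staticmethod
    def phase(function):
        # Run the decorated function right away and remember its result.
        ToyChara.last.name = function.__name__
        ToyChara.last.result = function()
        return function


def function(argument):

    @ToyChara.phase
    def first_formula():
        return argument + 1

    first_result = ToyChara.last.result

    @ToyChara.phase
    def second_formula():
        return first_result * 2

    second_result = ToyChara.last.result
    return 1 / second_result


value = function(1)
```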

Finally, every function can access the current folder via Chara.current, which allows it to store additional files there. Since chara pipelines work with media, they are quite heavy on files, and this functionality is very handy.

Integrations

There are some important integrations that offer you the basic building blocks for your pipelines. The most important ones are the BrainBox integrations. brainbox_training_pipeline runs a training process (e.g. PiperTraining.train), monitors the progress and interprets the results. brainbox_pipeline accepts an Iterable of tasks, adds them to the server, and then stores the pickled results in a tar file without holding them all in memory at once. If the result of a task is a file or several files, the pipeline downloads these files from the server, and in this case the paths of the downloaded files are stored in the tar file. This tar file can then be read iteratively, again without loading its whole content into memory.
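The technique of storing pickled results in a tar file and reading them back one at a time can be sketched with the standard library alone. The helpers `write_results` and `read_results`, the member naming scheme and the fake task results are all assumptions made for illustration; they do not reflect how brainbox_pipeline is actually written.

```python
import io
import pickle
import tarfile
import tempfile
from pathlib import Path
from typing import Any, Iterable, Iterator


def write_results(path: Path, results: Iterable[Any]) -> int:
    """Pickle each result into its own tar member, one at a time."""
    count = 0
    with tarfile.open(path, "w") as archive:
        for index, result in enumerate(results):
            payload = pickle.dumps(result)
            info = tarfile.TarInfo(name=f"{index:08d}.pkl")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
            count += 1
    return count


def read_results(path: Path) -> Iterator[Any]:
    """Yield results lazily; only one member is unpickled at a time."""
    with tarfile.open(path, "r") as archive:
        for member in archive:
            stream = archive.extractfile(member)
            if stream is not None:
                yield pickle.load(stream)


def fake_task_results():
    # Hypothetical stand-in for results arriving from a server.
    for i in range(3):
        yield {"task": i, "output": i * 10}


with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "results.tar"
    written = write_results(archive_path, fake_task_results())
    restored = list(read_results(archive_path))
```

Because both functions work on iterables, neither the producer nor the consumer needs the full collection of results in memory; `list(...)` is used here only to inspect the outcome.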

Cases

Most of the integrations implement the ICasePipeline pattern: they accept a list of cases (essentially, arbitrary dataclasses), modify the cases (or replace them with other cases), and return the updated collection or the same one. This is extremely convenient when you need, for example, to run several BrainBox tasks that are each associated with a particular file and then bring the results together. BrainBoxCasePipeline does just this: it builds a task for each case and then places the result of the task into a field of the case.
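The case pattern can be pictured with a small sketch. `FileCase`, `ToyCasePipeline` and `fake_task` are hypothetical names chosen for this illustration; the real ICasePipeline and BrainBoxCasePipeline interfaces may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FileCase:
    # Hypothetical case: one input file plus a slot for the task's result.
    file_name: str
    result: Optional[str] = None


class ToyCasePipeline:
    """Runs one 'task' per case and writes the result into a case field."""

    def __init__(self, task: Callable[[FileCase], str]):
        self.task = task

    def __call__(self, cases: List[FileCase]) -> List[FileCase]:
        for case in cases:
            case.result = self.task(case)
        return cases


def fake_task(case: FileCase) -> str:
    # Stand-in for a remote task; a real pipeline would query a server here.
    return case.file_name.rsplit(".", 1)[0].upper()


cases = [FileCase("hello.wav"), FileCase("bye.wav")]
processed = ToyCasePipeline(fake_task)(cases)
```

Keeping the input and the result on the same dataclass instance means that the association between a file and its outputs never has to be reconstructed after the tasks finish.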

AnnotationPipeline is also very useful. When working with GenAI, you often need to annotate data, e.g. to reject some of the data points because of their quality. AnnotationPipeline accepts a list of cases, displays each case and collects the feedback. Display is done via IAnnotator; right now, a Gradio-based GradioLabelAnnotator is available.

There are also collective actions on the cases. Such pipelines accept inner_pipeline: ICasePipeline, and:

RepeatUntilDonePipeline calls inner_pipeline several times on the array of cases, each time excluding the cases that have already received a non-erroneous answer.
ChooseBestAnswerPipeline calls inner_pipeline several times on all cases, then selects the most popular answer. This is handy if you need LLMs to vote on the result.
BatchingPipeline selects several cases from the bigger set and calls inner_pipeline on them, thus keeping the execution time of each batch manageable and giving control over when to stop the process.
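The three collective actions above can be approximated with plain functions. Everything in this sketch is an assumption made for illustration: the `Case` dataclass, the function names and the deterministic inner pipelines are hypothetical, and the batching helper simply walks through all batches rather than offering a stop control.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    question: int
    answer: Optional[int] = None
    error: Optional[str] = None


Pipeline = Callable[[List[Case]], List[Case]]


def repeat_until_done(inner: Pipeline, cases: List[Case], max_rounds: int) -> List[Case]:
    """Re-run `inner` only on cases that still lack a successful answer."""
    for _ in range(max_rounds):
        pending = [c for c in cases if c.answer is None]
        if not pending:
            break
        inner(pending)
    return cases


def choose_best_answer(inner: Pipeline, cases: List[Case], votes: int) -> List[Case]:
    """Run `inner` several times on all cases and keep the most common answer."""
    ballots = [[] for _ in cases]
    for _ in range(votes):
        for ballot, case in zip(ballots, inner(cases)):
            ballot.append(case.answer)
    for ballot, case in zip(ballots, cases):
        case.answer = Counter(ballot).most_common(1)[0][0]
    return cases


def batched(inner: Pipeline, cases: List[Case], batch_size: int) -> List[Case]:
    """Feed `inner` a manageable slice of the cases at a time."""
    for start in range(0, len(cases), batch_size):
        inner(cases[start:start + batch_size])
    return cases


attempts = Counter()


def flaky_double(batch):
    # Deterministic stand-in: every case fails on its first attempt only.
    for case in batch:
        attempts[case.question] += 1
        if attempts[case.question] == 1:
            case.error = "temporary failure"
        else:
            case.answer, case.error = case.question * 2, None
    return batch


rounds = iter([7, 3, 7])


def noisy(batch):
    # Deterministic stand-in for a model that disagrees with itself once.
    value = next(rounds)
    for case in batch:
        case.answer = value
    return batch


batch_sizes = []


def recording(batch):
    batch_sizes.append(len(batch))
    return batch


done = repeat_until_done(flaky_double, [Case(1), Case(2)], max_rounds=5)
voted = choose_best_answer(noisy, [Case(0)], votes=3)
batched(recording, [Case(i) for i in range(5)], batch_size=2)
```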

Project details

Release history

5.0.2: May 18, 2026
5.0.1: May 10, 2026
5.0.0: May 5, 2026
4.9.91: May 5, 2026
4.9.9: May 5, 2026 (this version)
4.0.9: Dec 27, 2025
0.0.0: Jan 2, 2025

Download files

Source Distribution

kaia_chara-4.9.9.tar.gz (895.4 kB)

Uploaded May 5, 2026 Source

Built Distribution

kaia_chara-4.9.9-py3-none-any.whl (935.8 kB)

Uploaded May 5, 2026 Python 3

File details

Details for the file kaia_chara-4.9.9.tar.gz.

File metadata

Download URL: kaia_chara-4.9.9.tar.gz
Upload date: May 5, 2026
Size: 895.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for kaia_chara-4.9.9.tar.gz
Algorithm	Hash digest
SHA256	`bc8d76d4a740389c8e551c2a725d41546746919850b28945faaa8c5c332f3c41`
MD5	`23327e41dced9cb308365c9cd5bdd4d9`
BLAKE2b-256	`2af167cc35d8e13b4b621f9c532027c7d84a18dfe33aee25917166a27b176e3a`

File details

Details for the file kaia_chara-4.9.9-py3-none-any.whl.

File metadata

Download URL: kaia_chara-4.9.9-py3-none-any.whl
Upload date: May 5, 2026
Size: 935.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for kaia_chara-4.9.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ad5c70ac3d7621f284175ffb21533d9e4a6f6e6add681c7a58b099a28bd2a4b2`
MD5	`df3d6637f957f0dcb3e71f7b31fd7d3a`
BLAKE2b-256	`51fe18e945acef9aa5463de445e24f9464eb93c8f26839b6e93e8edc50e3628b`

kaia-chara 4.9.9
