Visualization tool for various generation tasks on Language Models.

Conditional Language Model Generation Visualization

When evaluating language models, it is often a pain to see what is generated and why.
This little package is a Vue.js frontend together with a Flask backend, designed to easily show some interesting visualizations of conditional generation models.
It handles frontend-backend communication as well as frontend rendering.
Hence the developer can focus solely on the ML aspects of their work!

(Image: example workflow)

VERSION: `0.3.4` changelog

- added exception handling
- merged TextInputElement and ButtonElement (TextInputElement became a SubElement)

Installation

Install from PyPI:
- pip install visuallm

Example Usage

Alpaca Example

The first workflow uses the provided components as they are, without altering their implementation.

Run Instructions

The alpaca example code can be found in ./examples_py/alpaca_example. Start it by running flask --app examples_py.alpaca_example.app run.

We'll use the alpaca dataset and the gpt2 model, as both are small enough to run even on less performant computers.

# ./examples_py/alpaca_example/app.py lines 14-19
def create_text_to_tokenizer(loaded_sample, target: str | None = None) -> str:
    text_to_tokenizer = f"Instruction: {loaded_sample['instruction']} Answer:"
    if target is not None:
        text_to_tokenizer += " " + target
    return text_to_tokenizer

Every dataset is different, so we expect the user to provide three functions. They define how the text passed to the tokenizer is constructed, how the text for one-step prediction is constructed, and how the target text is retrieved.

# ./examples_py/alpaca_example/app.py lines 22-41
def create_text_to_tokenizer_one_step(loaded_sample, received_tokens: list[str]) -> str:
    # one step prediction means that the model is used to predict tokens one per one
    # received_tokens list contains already selected tokens

    text_to_tokenizer = (
        f"Instruction: {loaded_sample['instruction']} Answer:"
        + "".join(received_tokens)
    )
    return text_to_tokenizer


def retrieve_target_str(loaded_sample):
    return loaded_sample["output"]


Instantiate all the components from the library and run the server.

# ./examples_py/alpaca_example/app.py (excerpt; some arguments of this call are elided in the source listing)
generator = HuggingFaceGenerator(
    model=model,
    tokenizer=tokenizer,
    create_text_to_tokenizer=create_text_to_tokenizer,
    # ... remaining arguments are elided in the source listing
)

app = create_app(dataset=dataset, generator_choices={"gpt2": generator})
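
To get a feel for what the three user-provided functions return, they can be tried on a single sample without starting the server. The sketch below restates them in compact form; the `sample` dictionary is a made-up example that only assumes the `instruction` and `output` keys used in the code above.

```python
from __future__ import annotations


def create_text_to_tokenizer(loaded_sample: dict, target: str | None = None) -> str:
    # Full prompt, optionally followed by the target text.
    text = f"Instruction: {loaded_sample['instruction']} Answer:"
    if target is not None:
        text += " " + target
    return text


def create_text_to_tokenizer_one_step(loaded_sample: dict, received_tokens: list[str]) -> str:
    # Prompt followed by the tokens that were already selected.
    return f"Instruction: {loaded_sample['instruction']} Answer:" + "".join(received_tokens)


def retrieve_target_str(loaded_sample: dict) -> str:
    return loaded_sample["output"]


# Hypothetical sample; real samples come from the dataset.
sample = {"instruction": "Name a primary color.", "output": "Red."}

prompt = create_text_to_tokenizer(sample)
prompt_with_target = create_text_to_tokenizer(sample, retrieve_target_str(sample))
# The tokens here carry their own leading whitespace, so they are joined without a separator.
one_step_prompt = create_text_to_tokenizer_one_step(sample, [" Red", "."])

print(prompt)
print(prompt_with_target)
print(one_step_prompt)
```

In this toy case the last two strings are identical, because the selected tokens happen to spell out the target.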

Dataset Visualization (Screenshots)

In the screenshot, you can see the dataset browser. There you select a dataset split and a dataset sample, and the frontend shows the inputs to the model along with the expected output.

(Screenshot: dataset_visualization)

Generation (Screenshots)

In the screenshot, you can see the dataset browser and the model generations.

(Screenshot: generation)

Next Token Prediction (Screenshots)

In the screenshot, you can see that the library enables you to go through the generation step by step and explore why the generated sample (which can be seen e.g. on the Generation tab) looks the way it looks, how the distribution is skewed, etc.

(Screenshot: next_token_prediction)
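
The idea of stepping through a generation can be illustrated without any model. The following is a minimal sketch, not the library's implementation: the logits in `toy_steps` are invented numbers, and the most probable token is always picked, whereas an interactive frontend could let the user choose any candidate.

```python
from __future__ import annotations

import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical logits for two consecutive steps; a real model would compute
# them from the prompt plus the tokens selected so far.
toy_steps = [
    {" Red": 3.0, " Blue": 2.0, " Green": 1.0},
    {".": 4.0, " and": 1.0, ",": 0.5},
]

received_tokens: list[str] = []
for step, logits in enumerate(toy_steps):
    probs = softmax(logits)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for tok, p in ranked:
        print(f"step {step}: {tok!r} p={p:.3f}")
    # Select the top candidate and remember it for the next step.
    received_tokens.append(ranked[0][0])

generated = "".join(received_tokens)
print(repr(generated))
```

Printing the ranked candidates at every step is what makes a skewed distribution visible: in the second toy step almost all of the probability mass sits on a single token.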

PersonaChat Example

The second workflow alters the implementation of the components so that the dataset sample is shown in a different way.

If you want to use the app with the personachat dataset, you can play with the prepared example by running: flask --app examples_py.persona_chat_example.app run.

The code for the example can be found here: ./examples_py/persona_chat_example.

Customization

The personachat dataset contains two pieces of information for each dataset sample:

- The bot's persona
- The past dialogue history

So we will add a TableElement that displays two tables, one with the bot's persona and one with the past dialogue history. Since the visualization code is the same for all the components, we will extract it into a separate class.

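# ./examples_py/persona_chat_example/components/input_display.py lines 9-55
# (the imports on lines 1-8 are not part of this excerpt; the class relies on copy,
# typing.Any, and the visuallm classes ElementBase, HeadingElement and TableElement)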
class PersonaChatVisualization:
    def __init__(self) -> None:
        # just for the typechecker to not complain
        self.loaded_sample: Any = 1

    def init_dialogue_vis_elements(self) -> list[ElementBase]:
        """Init elements which display the personachat tables."""
        table_input_heading = HeadingElement(content="Structure of Dialogue")
        self.input_table_vis = TableElement()
        return [table_input_heading, self.input_table_vis]

    def update_dialogue_structure_display(self, add_target: bool = True):
        """Update elements which display the personachat tables."""
        sample = self.loaded_sample
        context = copy.deepcopy(sample["history"])
        if add_target:
            context.append(sample["candidates"][-1])
        persona = sample["personality"]

        self.set_sample_tables_element(persona, context)

    def set_sample_tables_element(
        self, persona: list[str], context: list[str], other_last: bool = False
    ):
        """Populate the tables with the information from the dataset sample."""
        self.input_table_vis.clear()

        self.input_table_vis.add_table(
            title="BOT Persona",
            headers=["Trait"],
            rows=[[t] for t in persona],
        )

        d_len = len(context)  # dialogue length
        bot_on_odd = int(d_len % 2 == (1 if not other_last else 0))
        whos = ["BOT" if i % 2 == bot_on_odd else "OTHER" for i in range(d_len)]

        if len(context) > 0:
            self.input_table_vis.add_table(
                "Turns",
                ["Who", "Turn"],
                [[w, u] for w, u in zip(whos, context, strict=True)],
            )
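
The row construction can be tried without the frontend. The sketch below reuses the expressions from the class on a made-up sample; the sample contents are hypothetical and only follow the `personality`, `history` and `candidates` keys read by the code above. It covers the case without the target appended (add_target=False) and with the default other_last=False, and it drops `strict=True` from `zip` so that it also runs on Python versions older than 3.10.

```python
from __future__ import annotations

# Hypothetical sample in the shape the class reads.
sample = {
    "personality": ["i like hiking .", "i have two dogs ."],
    "history": [
        "hi , how are you ?",
        "great , just back from a hike .",
        "nice ! where did you go ?",
    ],
    "candidates": ["a distractor reply", "up the hill behind my house ."],
}

# One single-column row per persona trait.
persona_rows = [[t] for t in sample["personality"]]

# Dialogue history without the target appended.
context = list(sample["history"])
other_last = False

d_len = len(context)
bot_on_odd = int(d_len % 2 == (1 if not other_last else 0))
whos = ["BOT" if i % 2 == bot_on_odd else "OTHER" for i in range(d_len)]

# One [who, turn] row per dialogue turn.
turn_rows = [[w, u] for w, u in zip(whos, context)]

for row in turn_rows:
    print(row)
```

For this three-turn history the labels come out as OTHER, BOT, OTHER, so the turn the model is asked to answer is attributed to the other speaker.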

Afterwards, we need to subclass each component that should use this visualization of the dataset sample. Here is an example for the Generation component.

# ./examples_py/persona_chat_example/components/generation.py lines 1-23
from visuallm.components import GenerationComponent
from visuallm.elements.element_base import ElementBase

from .input_display import PersonaChatVisualization


class Generation(GenerationComponent, PersonaChatVisualization):
    def __post_init__(self, *args, **kwargs):
        self.after_on_generator_change_callback()

    def init_model_input_display(self) -> list[ElementBase]:
        return [
            *PersonaChatVisualization.init_dialogue_vis_elements(self),
            *super().init_model_input_display(),
        ]

    def update_model_input_display(self):
        super().update_model_input_display()
        PersonaChatVisualization.update_dialogue_structure_display(
            self, add_target=False
        )
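
The composition pattern used here, a library component combined with a shared visualization class, can be seen in isolation in the sketch below. Both base classes are hypothetical stand-ins that return plain strings instead of elements; they are not visuallm classes.

```python
from __future__ import annotations


class GenerationComponentStub:
    """Hypothetical stand-in for a library component."""

    def init_model_input_display(self) -> list[str]:
        return ["default model input element"]


class DialogueVisualizationStub:
    """Hypothetical stand-in for the shared visualization class."""

    def init_dialogue_vis_elements(self) -> list[str]:
        return ["heading: Structure of Dialogue", "dialogue tables"]


class Generation(GenerationComponentStub, DialogueVisualizationStub):
    def init_model_input_display(self) -> list[str]:
        # Custom elements first, then whatever the base component provides.
        return [
            *DialogueVisualizationStub.init_dialogue_vis_elements(self),
            *super().init_model_input_display(),
        ]


elements = Generation().init_model_input_display()
print(elements)
```

Calling the visualization method through its class name keeps it explicit which parent contributes which elements, while `super()` still reaches the component's default display.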

Generation Playground

Select the parameters you want to use for generation, plug in a HuggingFace model or an OpenAI token, and have fun experimenting with various generation hyperparameters!

(Screenshots: gen_params, generation)
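
As an illustration of what such hyperparameters do, the sketch below applies temperature scaling and top-k filtering to a toy next-token distribution. It assumes that temperature and top-k are among the parameters of interest; the function `next_token_probs` and the logits are hypothetical and not part of the library.

```python
from __future__ import annotations

import math


def next_token_probs(
    logits: dict[str, float], temperature: float = 1.0, top_k: int | None = None
) -> dict[str, float]:
    """Probabilities after temperature scaling and optional top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k highest-scoring tokens
    scaled = [(tok, score / temperature) for tok, score in ranked]
    peak = max(score for _, score in scaled)
    exps = [(tok, math.exp(score - peak)) for tok, score in scaled]
    total = sum(e for _, e in exps)
    return {tok: e / total for tok, e in exps}


# Hypothetical logits for a single generation step.
toy_logits = {" Red": 3.0, " Blue": 2.0, " Green": 1.0, " Purple": 0.0}

for temperature in (0.5, 1.0, 2.0):
    probs = next_token_probs(toy_logits, temperature=temperature)
    print(temperature, {tok: round(p, 3) for tok, p in probs.items()})

top2 = next_token_probs(toy_logits, top_k=2)
print({tok: round(p, 3) for tok, p in top2.items()})
```

Lower temperatures concentrate the probability mass on the top token, higher temperatures flatten the distribution, and top-k removes the tail before renormalizing.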

Chat Playground

Select the parameters you want to use for generation, plug in a HuggingFace model or an OpenAI token, and have fun chatting with the model!

(Screenshot: chat)

Visualize Next Token Predictions

By using visuallm.components.NextTokenPredictionComponent.NextTokenPredictionComponent you can just plug the HuggingFace model in and go through the generation process step by step.

(Screenshot: next_token_prediction)

Other Examples

Further documentation:

- How does the communication and bootstrapping of the components work? (link)
- What is a minimal app that can be constructed? (link)
- How do the elements work, and how can I create custom components? (link)

Acknowledgement

This work was published at the INLG 2023 conference as a demo paper (link will be added later).
Supported by the project TL05000236 AI asistent pro žáky a učitele, co-financed by the Technological Agency of the Czech Republic, and by the ERC (No. 101039303 NG-NLG). Resources were provided by the LINDAT/CLARIAH-CZ Research Infrastructure.
