A package to visualize tokenization boundaries using HTML
# Tokenizer Viz
Tokenizer Viz is a Python package that generates HTML to visualize how text is tokenized. It highlights tokens in different colors with customizable styles, making it easier to see where a tokenizer splits a piece of text.
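The highlighting idea can be sketched in a few lines of plain Python. This is a standalone illustration, not the package's implementation: the function name `render_tokens`, the fixed hex palette (the package takes a colormap name via `cmap` instead), and the `<br>` handling for newlines are all assumptions.

```python
import html
from typing import Iterable, List

# Hypothetical fixed palette; tokenizer-viz itself is configured with a colormap name (cmap).
PALETTE: List[str] = ["#B3E2CD", "#FDCDAC", "#CBD5E8", "#F4CAE4"]


def render_tokens(tokens: Iterable[str], palette: List[str] = PALETTE) -> str:
    """Wrap each token in a <span> whose background cycles through `palette`.

    Adjacent tokens get different colors, so the boundary between them is visible.
    """
    spans = []
    for i, token in enumerate(tokens):
        color = palette[i % len(palette)]
        # Escape HTML special characters, then turn newlines into visible line breaks.
        text = html.escape(token).replace("\n", "<br>")
        spans.append(f'<span style="background-color:{color}">{text}</span>')
    return "".join(spans)


snippet = render_tokens(["Hello", ",", " world", "\n"])
```

Because the color index is taken modulo the palette length, any two neighboring tokens always differ in color as long as the palette has at least two entries.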
## Project Layout

```
tokenizer-viz/
│
├── tokenizer_viz/
│   ├── __init__.py
│   └── visualization.py
│
├── .gitignore
├── LICENSE
├── README.md
└── setup.py
```
## Installation
You can install the **`tokenizer-viz`** package using pip:
```bash
pip install tokenizer-viz
```
## Usage

Here's a quick example of how to use the package:

### Usage with a provided encoder and decoder
```python
from tokenizer_viz import TokenVisualization
from IPython.display import HTML


# Define sample encoder and decoder functions for demonstration purposes.
# Encoder: split the text into a list of tokens (here, one token per character).
def sample_encoder(text):
    return list(text)


# Decoder: map a single token back to the string that should be displayed.
def sample_decoder(token):
    return token


# Initialize the TokenVisualization class with the encoder and decoder functions
token_viz = TokenVisualization(
    encoder=sample_encoder,
    decoder=sample_decoder,
)

# Define a sample text to visualize tokenization boundaries
sample_text = "This is a sample text.\nIt has multiple lines."

# Visualize the tokenization boundaries; in a Jupyter notebook,
# HTML(...) renders the returned markup inline.
html = token_viz.visualize(sample_text)
HTML(html)
```
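The sample above splits text into characters. Real tokenizers usually map text to integer IDs and decode each ID back to a string. The sketch below builds a toy word-level vocabulary with a regex that keeps words, whitespace runs, and punctuation as separate tokens, so decoding every ID and joining the results reproduces the input. It assumes, based on the single-`token` signature of `sample_decoder`, that `TokenVisualization` calls the decoder once per token; the names `build_vocab`, `word_encoder`, and `word_decoder` are hypothetical.

```python
import re
from typing import Dict, List

# Match a run of word characters, a run of whitespace, or one punctuation mark.
# Every character falls into exactly one class, so no text is dropped.
TOKEN_PATTERN = re.compile(r"\w+|\s+|[^\w\s]")


def build_vocab(corpus: str) -> Dict[str, int]:
    """Assign an integer ID to every distinct token in `corpus` (hypothetical helper)."""
    vocab: Dict[str, int] = {}
    for piece in TOKEN_PATTERN.findall(corpus):
        vocab.setdefault(piece, len(vocab))
    return vocab


sample_text = "This is a sample text.\nIt has multiple lines."
vocab = build_vocab(sample_text)
id_to_token = {idx: tok for tok, idx in vocab.items()}


def word_encoder(text: str) -> List[int]:
    """Encode text as a list of token IDs; pieces missing from the vocabulary become -1."""
    return [vocab.get(piece, -1) for piece in TOKEN_PATTERN.findall(text)]


def word_decoder(token_id: int) -> str:
    """Decode one token ID; unknown IDs become a placeholder string."""
    return id_to_token.get(token_id, "???")
```

These functions would be passed the same way as the sample ones, e.g. `TokenVisualization(encoder=word_encoder, decoder=word_decoder)`.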
## Example Output

TBD: example image.
## Arguments

The `TokenVisualization` class accepts several optional parameters to customize the appearance and layout of the tokens:

| Parameter | Default |
|---|---|
| `cmap` | `'Pastel2'` |
| `font_family` | `'Courier New'` |
| `transparency` | `0.675` |
| `font_size` | `'1.1em'` |
| `unk_token` | `'???'` |
| `font_weight` | `300` |
| `padding` | `'0px'` |
| `margin_right` | `'0px'` |
| `border_radius` | `'0px'` |
| `background_color` | `'#F0F0F0'` |

Refer to the class and method docstrings for a detailed description of each parameter.
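As a rough illustration of what the layout parameters control, the sketch below turns the documented defaults into an inline CSS declaration for a single token. It assumes each parameter maps to the CSS property of the same name and that `transparency` acts as an alpha channel on the token color; the helpers `hex_to_rgba` and `token_css` are hypothetical, and the package's actual rendering may differ. `cmap`, `unk_token`, and `background_color` are left out because their exact use is not described here.

```python
from typing import Union


def hex_to_rgba(hex_color: str, alpha: float) -> str:
    """Convert '#RRGGBB' to a CSS rgba() string with the given alpha."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


def token_css(
    color: str,
    transparency: float = 0.675,
    font_family: str = "Courier New",
    font_size: str = "1.1em",
    font_weight: Union[int, str] = 300,
    padding: str = "0px",
    margin_right: str = "0px",
    border_radius: str = "0px",
) -> str:
    """Build an inline style string for one token span (defaults copied from the table)."""
    declarations = {
        # Assumption: transparency is applied as the alpha of the token's color.
        "background-color": hex_to_rgba(color, transparency),
        "font-family": f"'{font_family}'",
        "font-size": font_size,
        "font-weight": str(font_weight),
        "padding": padding,
        "margin-right": margin_right,
        "border-radius": border_radius,
    }
    return "; ".join(f"{prop}: {value}" for prop, value in declarations.items())


style = token_css("#B3E2CD", padding="2px", border_radius="4px")
```

Non-zero `padding`, `margin_right`, and `border_radius` values are what separate adjacent tokens visually beyond their color difference.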
## License
This project is licensed under the MIT License.
## Details for the file tokenizer-viz-0.2.2.tar.gz

### File metadata

- Download URL: tokenizer-viz-0.2.2.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | ccb885586fe6b3b5cfbc643886f2da8c6e7140413ee64ca794d8d6f3223abd3c |
| MD5 | 825b4d8966705cf58bd59db593ecfa2e |
| BLAKE2b-256 | 6d13ad2c9901731127951b814e95a5230010487fdbd1212d6e6543ea5f9ebbf3 |
## Details for the file tokenizer_viz-0.2.2-py3-none-any.whl

### File metadata

- Download URL: tokenizer_viz-0.2.2-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | e9fd0f61b25bcdb027055bbe54342ee9770e50028939c26ae81d8653e7c3460d |
| MD5 | c521b65e8502b6f77f5398cb0af815ef |
| BLAKE2b-256 | 7a09ff2b26e3be94b1916c0a138505ca3b41575c5bef0767c8795ee36ca6ded2 |
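The published digests can be used to confirm that a downloaded distribution file is intact. A minimal sketch with the standard-library `hashlib`, using the SHA256 digest listed for the wheel; the helper names `sha256_of` and `verify` and the local file path are assumptions.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for tokenizer_viz-0.2.2-py3-none-any.whl.
EXPECTED_SHA256 = "e9fd0f61b25bcdb027055bbe54342ee9770e50028939c26ae81d8653e7c3460d"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("tokenizer_viz-0.2.2-py3-none-any.whl"))` assumes the wheel was downloaded to the current directory. pip's hash-checking mode (`--require-hashes`) performs an equivalent check at install time.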