Multi-Modal Tokenizers
Multi-modal tokenizers for more than just text. This package provides tools for tokenizing and decoding images and mixed-modal inputs (text and images) using encoders like DALL-E's VAE.
Installation
To install the package, clone the repository and use pip to install it:
git clone https://github.com/anothy1/multi-modal-tokenizers
pip install ./multi-modal-tokenizers
Or from PyPI:
pip install multi-modal-tokenizers
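After installation, a quick sanity check (a minimal sketch; it only confirms that the two classes used in the examples below import cleanly):

from multi_modal_tokenizers import DalleTokenizer, MixedModalTokenizer

# If this prints without an ImportError, the package is installed correctly.
print(DalleTokenizer.__name__, MixedModalTokenizer.__name__)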
Usage
Example: Using DalleTokenizer
Below is an example script demonstrating how to use the DalleTokenizer to encode and decode images.
import io

import requests
from PIL import Image
from IPython.display import display

from multi_modal_tokenizers import DalleTokenizer, MixedModalTokenizer

def download_image(url):
    # Fetch the image over HTTP and open it as a PIL image.
    resp = requests.get(url)
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content))
# Download an image
img = download_image('https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iKIWgaiJUtss/v2/1000x-1.jpg')
# Load the DalleTokenizer from Hugging Face repository
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")
# Encode the image
tokens = image_tokenizer.encode(img)
print("Encoded tokens:", tokens)
# Decode the tokens back to an image
reconstructed = image_tokenizer.decode(tokens)
# Display the reconstructed image
display(reconstructed)
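Note that display() requires an IPython/Jupyter environment. In a plain Python script you can save the reconstructed image to disk instead (a small sketch, assuming decode returns a PIL image as the display() call above suggests; the output filename is arbitrary):

# Save the reconstructed image instead of displaying it inline.
reconstructed.save("reconstructed.png")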
Example: Using MixedModalTokenizer
The package also provides MixedModalTokenizer for tokenizing and decoding mixed-modal inputs (text and images).
from transformers import AutoTokenizer
from multi_modal_tokenizers import MixedModalTokenizer
from PIL import Image
# Load a pretrained text tokenizer from Hugging Face
text_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Create a MixedModalTokenizer
# (image_tokenizer is the DalleTokenizer loaded in the previous example)
mixed_tokenizer = MixedModalTokenizer(
    text_tokenizer=text_tokenizer,
    image_tokenizer=image_tokenizer,
    device="cpu"
)
# Example usage
text = "This is an example with <image> in the middle."
img_path = "path/to/your/image.jpg"
image = Image.open(img_path)
# Encode the text and image
encoded = mixed_tokenizer.encode(text=text, images=[image])
print("Encoded mixed-modal tokens:", encoded)
# Decode the sequence back to text and image
decoded_text, decoded_images = mixed_tokenizer.decode(encoded)
print("Decoded text:", decoded_text)
for idx, img in enumerate(decoded_images):
    img.save(f"decoded_image_{idx}.png")
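The encoded result printed above is the token sequence for the combined text and image. If you plan to feed it to a PyTorch model, a typical next step (an assumption about your downstream setup, not part of this package's API, and assuming encode returns a flat list of integer token IDs as the print above suggests) is to wrap it in a batched tensor:

import torch

# Hypothetical downstream step: add a batch dimension for a model forward pass.
input_ids = torch.tensor([encoded])
print(input_ids.shape)  # (1, sequence_length)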
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution: multi_modal_tokenizers-0.0.2.tar.gz (6.2 kB)
Built Distribution: multi_modal_tokenizers-0.0.2-py3-none-any.whl (9.9 kB)
File details
Details for the file multi_modal_tokenizers-0.0.2.tar.gz.
File metadata
- Download URL: multi_modal_tokenizers-0.0.2.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | d93f2b053af24f84b83f9883eaad6eafd237fef567a3ff62ec0f7611bca536b5
MD5 | 4418859e6ab87416439fd07634b217a1
BLAKE2b-256 | 5d50af0d67760762fa93c0b7b353d2b79cb3d64a019eb00b7c30a13c8a76bda7
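If you download the sdist by hand, you can check its SHA256 digest against the value in the table above (a minimal sketch using the standard library; the local path assumes the file was saved under its original name):

import hashlib

path = "multi_modal_tokenizers-0.0.2.tar.gz"  # local path to the downloaded file
expected = "d93f2b053af24f84b83f9883eaad6eafd237fef567a3ff62ec0f7611bca536b5"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "hash mismatch")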
File details
Details for the file multi_modal_tokenizers-0.0.2-py3-none-any.whl.
File metadata
- Download URL: multi_modal_tokenizers-0.0.2-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | ba903ef029e7b6f7bf4940ba93d5e0837acf9b058df7a157808f0b89aaf3ec12
MD5 | a89c91b821c38f4a2c27bdf90e9e5110
BLAKE2b-256 | 724f8265c4055c304e325ac20f64e722a19d0e2ce6ac92ac62e40476157f6b64