Multi-modal tokenizers for more than just text.

Project description

Multi-Modal Tokenizers

Multi-Modal Tokenizers provides tools for tokenizing and decoding images and mixed-modal inputs (text and images) using discrete image encoders such as DALL-E's dVAE.

Installation

To install the package, clone the repository and install it with pip:

git clone https://github.com/anothy1/multi-modal-tokenizers
pip install ./multi-modal-tokenizers

Or from PyPI:

pip install multi-modal-tokenizers

Usage

Example: Using DalleTokenizer

Below is an example script demonstrating how to use the DalleTokenizer to encode and decode images.

import io

import requests
from PIL import Image
from multi_modal_tokenizers import DalleTokenizer
from IPython.display import display

def download_image(url):
    """Fetch an image over HTTP and return it as a PIL Image."""
    resp = requests.get(url)
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content))

# Download an image
img = download_image('https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iKIWgaiJUtss/v2/1000x-1.jpg')

# Load the DalleTokenizer from the Hugging Face Hub
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")

# Encode the image
tokens = image_tokenizer.encode(img)
print("Encoded tokens:", tokens)

# Decode the tokens back to an image
reconstructed = image_tokenizer.decode(tokens)

# Display the reconstructed image
display(reconstructed)
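The encoded tokens are a flat sequence of discrete codebook indices, and their count is determined by the encoder's spatial downsampling. A rough sketch of the arithmetic (the 256-pixel input size and 8x downsampling factor are typical of the original DALL-E dVAE, not guaranteed by this package's checkpoint):

```python
# Illustrative token-count arithmetic for a dVAE-style image tokenizer.
# The numbers below are assumptions based on the original DALL-E dVAE;
# actual values depend on the checkpoint loaded by DalleTokenizer.
image_size = 256        # input resolution in pixels (per side)
downsample_factor = 8   # encoder shrinks each spatial dimension by 8x
grid_side = image_size // downsample_factor
num_tokens = grid_side * grid_side
print(grid_side, num_tokens)  # 32 1024
```

Each of those tokens is an index into the encoder's codebook, which is why the image can later be reconstructed by `decode`.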

Example: Using MixedModalTokenizer

The package also provides MixedModalTokenizer for tokenizing and decoding mixed-modal inputs (text and images).

from transformers import AutoTokenizer
from PIL import Image
from multi_modal_tokenizers import DalleTokenizer, MixedModalTokenizer

# Load a pretrained text tokenizer from Hugging Face
text_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the image tokenizer (same as in the previous example)
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")

# Combine them into a MixedModalTokenizer
mixed_tokenizer = MixedModalTokenizer(
    text_tokenizer=text_tokenizer,
    image_tokenizer=image_tokenizer,
    device="cpu"
)

# Example usage
text = "This is an example with <image> in the middle."
img_path = "path/to/your/image.jpg"
image = Image.open(img_path)

# Encode the text and image
encoded = mixed_tokenizer.encode(text=text, images=[image])
print("Encoded mixed-modal tokens:", encoded)

# Decode the sequence back to text and image
decoded_text, decoded_images = mixed_tokenizer.decode(encoded)
print("Decoded text:", decoded_text)
for idx, img in enumerate(decoded_images):
    img.save(f"decoded_image_{idx}.png")
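Conceptually, a mixed-modal tokenizer splices each image's discrete codes into the text token stream at the corresponding `<image>` placeholder. A minimal, self-contained sketch of that splicing idea (the `splice_image_tokens` helper and the token values below are hypothetical illustrations, not this package's API):

```python
def splice_image_tokens(text_tokens, image_token_seqs, placeholder_id):
    """Replace each placeholder token with the next image token sequence."""
    out = []
    images = iter(image_token_seqs)
    for tok in text_tokens:
        if tok == placeholder_id:
            out.extend(next(images))  # inline this image's discrete codes
        else:
            out.append(tok)
    return out

# Toy vocabulary: token id 100 stands for the <image> placeholder.
text_tokens = [5, 7, 100, 9]
image_tokens = [[201, 202, 203]]
print(splice_image_tokens(text_tokens, image_tokens, placeholder_id=100))
# [5, 7, 201, 202, 203, 9]
```

Decoding reverses this: image-token spans are cut back out of the sequence and handed to the image decoder, which is why `mixed_tokenizer.decode` returns text and images separately.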

Download files

Download the file for your platform.

Source Distribution

multi_modal_tokenizers-0.0.2.tar.gz (6.2 kB)

Uploaded: Source

Built Distribution

multi_modal_tokenizers-0.0.2-py3-none-any.whl (9.9 kB)

Uploaded: Python 3

File details

Details for the file multi_modal_tokenizers-0.0.2.tar.gz.

File hashes

Hashes for multi_modal_tokenizers-0.0.2.tar.gz:

SHA256: d93f2b053af24f84b83f9883eaad6eafd237fef567a3ff62ec0f7611bca536b5
MD5: 4418859e6ab87416439fd07634b217a1
BLAKE2b-256: 5d50af0d67760762fa93c0b7b353d2b79cb3d64a019eb00b7c30a13c8a76bda7

File details

Details for the file multi_modal_tokenizers-0.0.2-py3-none-any.whl.

File hashes

Hashes for multi_modal_tokenizers-0.0.2-py3-none-any.whl:

SHA256: ba903ef029e7b6f7bf4940ba93d5e0837acf9b058df7a157808f0b89aaf3ec12
MD5: a89c91b821c38f4a2c27bdf90e9e5110
BLAKE2b-256: 724f8265c4055c304e325ac20f64e722a19d0e2ce6ac92ac62e40476157f6b64
