Project description

Multi-Modal Tokenizers

Multi-modal tokenizers for more than just text. This package provides tools for tokenizing and decoding images and mixed-modal inputs (text and images) using DALL-E and other models.

Installation

To install the package, clone the repository and use pip to install it:

git clone https://github.com/anothy1/multi-modal-tokenizers
pip install ./multi-modal-tokenizers

Usage

Example: Using DalleTokenizer

Below is an example script demonstrating how to use the DalleTokenizer to encode and decode images.

import io

import requests
from PIL import Image
from IPython.display import display

from multi_modal_tokenizers import DalleTokenizer

def download_image(url):
    """Fetch an image over HTTP and return it as a PIL Image."""
    resp = requests.get(url)
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content))

# Download an image
img = download_image('https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iKIWgaiJUtss/v2/1000x-1.jpg')

# Load the DalleTokenizer from Hugging Face repository
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")

# Encode the image
tokens = image_tokenizer.encode(img)
print("Encoded tokens:", tokens)

# Decode the tokens back to an image
reconstructed = image_tokenizer.decode(tokens)

# Display the reconstructed image
display(reconstructed)
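Under the hood, DALL-E-style image tokenizers are discrete VAEs: the encoder maps image features to the nearest entry in a learned codebook, so each token is simply an integer index, and decoding is (approximately) a lookup followed by a learned decoder. The following is a minimal, library-free sketch of that quantization step; the codebook and vectors are made up for illustration and are not the real model's.

    # Minimal sketch of vector quantization, the core of a discrete-VAE
    # image tokenizer: each feature vector is replaced by the index of
    # its nearest codebook entry. The codebook here is tiny and invented.

    def quantize(vectors, codebook):
        """Map each vector to the index of its nearest codebook entry (squared L2)."""
        tokens = []
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
            tokens.append(dists.index(min(dists)))
        return tokens

    def dequantize(tokens, codebook):
        """Inverse direction: look each token index back up in the codebook."""
        return [codebook[t] for t in tokens]

    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    vecs = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8)]
    print(quantize(vecs, codebook))  # [0, 1, 2]

Because quantization snaps each vector to its nearest codebook entry, the encode/decode round trip is lossy, which is why the reconstructed image above is close to, but not identical to, the original.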

Example: Using MixedModalTokenizer

The package also provides MixedModalTokenizer, which wraps a text tokenizer and an image tokenizer to encode and decode interleaved text-and-image sequences. Image positions within the text are marked with the <new_image> placeholder.

from transformers import AutoTokenizer
from multi_modal_tokenizers import DalleTokenizer, MixedModalTokenizer
from PIL import Image

# Load a pretrained text tokenizer from Hugging Face
text_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the image tokenizer (same as in the previous example)
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")

# Create a MixedModalTokenizer combining both
mixed_tokenizer = MixedModalTokenizer(
    text_tokenizer=text_tokenizer,
    image_tokenizer=image_tokenizer,
    device="cpu"
)

# Example usage
text = "This is an example with <new_image> in the middle."
img_path = "path/to/your/image.jpg"
image = Image.open(img_path)

# Encode the text and image
encoded = mixed_tokenizer.encode(text=text, images=[image])
print("Encoded mixed-modal tokens:", encoded)

# Decode the sequence back to text and image
decoded_text, decoded_images = mixed_tokenizer.decode(encoded)
print("Decoded text:", decoded_text)
for idx, img in enumerate(decoded_images):
    img.save(f"decoded_image_{idx}.png")
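Conceptually, a mixed-modal encoder splices each image's token sequence into the text-token stream at the corresponding placeholder position, typically bracketed by special sentinel tokens so the decoder can find the image spans again. The sketch below illustrates that splicing logic in isolation; the sentinel IDs, the placeholder ID, and the function name are hypothetical, not the package's actual vocabulary or API.

    # Hypothetical illustration of splicing image tokens into a text-token
    # stream at placeholder positions. All IDs below are invented.

    IMAGE_START = 1000  # assumed sentinel marking the start of an image span
    IMAGE_END = 1001    # assumed sentinel marking the end of an image span
    PLACEHOLDER = -1    # assumed ID the text tokenizer assigns to "<new_image>"

    def splice_image_tokens(text_tokens, image_token_lists):
        """Replace each PLACEHOLDER with the next image's tokens,
        wrapped in IMAGE_START/IMAGE_END sentinels."""
        out = []
        images = iter(image_token_lists)
        for tok in text_tokens:
            if tok == PLACEHOLDER:
                out.append(IMAGE_START)
                out.extend(next(images))
                out.append(IMAGE_END)
            else:
                out.append(tok)
        return out

    # Text tokens [5, 6, PLACEHOLDER, 7] with one image tokenized as [42, 43]
    print(splice_image_tokens([5, 6, PLACEHOLDER, 7], [[42, 43]]))
    # [5, 6, 1000, 42, 43, 1001, 7]

Decoding reverses the process: scan for the sentinel pairs, hand the enclosed tokens to the image tokenizer's decoder, and detokenize the remaining IDs as text.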

Download files

Download the file for your platform.

Source Distribution

multi_modal_tokenizers-0.0.1.tar.gz (4.6 kB)

Built Distribution

multi_modal_tokenizers-0.0.1-py3-none-any.whl (4.7 kB)

File details

Hashes for multi_modal_tokenizers-0.0.1.tar.gz:

SHA256: 16cead165ae9e6a99f9eee26fd99e0fba00cf925ffeab0ed4134fac3a5d3d36e
MD5: 64ff40c57b5a54033211ef1d9fec963e
BLAKE2b-256: 8ccf0b962bd9fb9785ce310c8c1ad861ce5edf2eda4801088f24d5a90494fa0c

Hashes for multi_modal_tokenizers-0.0.1-py3-none-any.whl:

SHA256: 8636ca0ee73a33ce647c2756833bc70984bfd48d1f4b17adf4ec1ed9aed16f95
MD5: ce8358d7938624ac24c500c1d7141874
BLAKE2b-256: 6789718f91b147973ec3fd2af97768d5f05a6da58a1c0ccee267f460769004b1
