dtokenizer

discretize everything into tokens

Introduction

dtokenizer is a Python library that discretizes audio files into tokens. It currently supports two models for tokenization: Hubert and Encodec.

Installation

dtokenizer requires Python and pip. From the project root, install the dependencies with:

pip install -r requirements.txt

Usage

Hubert Tokenizer

The Hubert tokenizer encodes an audio file into discrete tokens and decodes those tokens back into a waveform:

import soundfile as sf
from dtokenizer.audio.model.hubert_model import HubertTokenizer

# Load the tokenizer by model name
ht = HubertTokenizer('hubert_layer6_code100')

# Encode the audio file into discrete tokens
code, stuff_for_decode = ht.encode_file('./sample2_22k.wav')

# Decode the tokens back into a waveform
wav_values = ht.decode(code)

# Write the decoded audio to a file at 16 kHz
sf.write('output.wav', wav_values, 16000)
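The example above hard-codes the output sample rate, so it can be useful to confirm what rate a WAV file actually carries in its header. The following is a minimal sketch using only the standard library `wave` module. The helper names `wav_sample_rate` and `matches_sample_rate` are hypothetical and not part of dtokenizer, and the sketch assumes the file is PCM-encoded (the `wave` module does not read floating-point WAV files).

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_sample_rate(path, expected_hz):
    """Return True if the WAV file at `path` was written at `expected_hz`."""
    return wav_sample_rate(path) == expected_hz
```

For instance, `matches_sample_rate('output.wav', 16000)` checks that the decoded file was written at the rate passed to `sf.write`, provided the file was saved in a PCM subtype.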

Encodec Tokenizer

The Encodec tokenizer follows the same encode/decode pattern. Note that here decode takes the second value returned by encode_file rather than the token codes:

import torch
import torchaudio
from dtokenizer.audio.model.encodec_model import EncodecTokenizer

# Load the tokenizer by model name
et = EncodecTokenizer('encodec_24k_6bps')

# Encode the audio file; the second return value holds what decode needs
code, stuff_for_decode = et.encode_file('./sample2_22k.wav')

# Decode back into a waveform
wav_values = et.decode(stuff_for_decode)

# Save the decoded audio to a file at 22.05 kHz
torchaudio.save('output.wav', torch.from_numpy(wav_values), 22050)

Contributing

We welcome contributions to the dtokenizer project. Please feel free to submit issues or pull requests.

License

This project is released under the MIT License. See the LICENSE file for more details.
