dtokenizer

discretize everything into tokens

Introduction

dtokenizer is a Python library for discretizing audio files into tokens. It provides tokenizers backed by the Hubert and Encodec models.
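To make "discretizing into tokens" concrete, here is a minimal sketch of nearest-centroid quantization, a technique commonly used to turn continuous audio features into discrete units. It is an illustration only, not dtokenizer's implementation: the functions `quantize` and `dequantize`, the toy codebook, and the 2-D frames are all hypothetical.

```python
import math


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""
    tokens = []
    for frame in frames:
        # Euclidean distance from this frame to every codebook entry
        distances = [math.dist(frame, centroid) for centroid in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens, codebook):
    """Replace each token with its codebook vector (a lossy reconstruction)."""
    return [codebook[t] for t in tokens]


# Hypothetical 3-entry codebook and four 2-D feature frames
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (1.1, 0.8)]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Each frame becomes a single integer, so the sequence is compact, but decoding recovers only the codebook vector rather than the original frame.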

Installation

dtokenizer requires Python and pip. From a checkout of the project, install the required dependencies by running:

pip install -r requirements.txt

Usage

Hubert Tokenizer

The Hubert tokenizer encodes an audio file into discrete tokens and decodes those tokens back into a waveform:

from dtokenizer.audio.model.hubert_model import HubertTokenizer
import soundfile as sf

# Load the tokenizer by model name
ht = HubertTokenizer('hubert_layer6_code100')

# Encode the file; the second return value is unused in this example
code, _ = ht.encode_file('./sample2_22k.wav')

# Decode the tokens back into a waveform
wav_values = ht.decode(code)

# Write the decoded audio to a file at 16 kHz
sf.write('output.wav', wav_values, 16000)

Encodec Tokenizer

The Encodec tokenizer follows the same encode-then-decode pattern. In this example, decode receives the second value returned by encode_file rather than the codes themselves:

import torch
import torchaudio
from dtokenizer.audio.model.encodec_model import EncodecTokenizer

# Load the tokenizer by model name
et = EncodecTokenizer('encodec_24k_6bps')

# Encode the file; keep the second return value, which decode() consumes
code, stuff_for_decode = et.encode_file('./sample2_22k.wav')
wav_values = et.decode(stuff_for_decode)

# Save the decoded audio to a file
torchaudio.save('output.wav', torch.from_numpy(wav_values), 22050)

Contributing

We welcome contributions to the dtokenizer project. Please feel free to submit issues or pull requests.

License

This project is released under the MIT License. See the LICENSE file for more details.
