dtokenizer

Discretize everything into tokens.

Introduction

dtokenizer is a Python library that converts audio files into discrete tokens and decodes those tokens back into audio. It currently provides tokenizers based on the HuBERT and EnCodec models.

Installation

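To use dtokenizer, first make sure Python and pip are installed. Then, from the project's source directory, install the required dependencies by running:

pip install -r requirements.txt

Released versions are also published on PyPI and can be installed with:

pip install dtokenizer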

Usage

Hubert Tokenizer

HubertTokenizer encodes an audio file into discrete tokens and decodes those tokens back into a waveform:

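from dtokenizer.audio.model.hubert_model import HubertTokenizer
import soundfile as sf

# Load the tokenizer configuration 'hubert_layer6_code100'
ht = HubertTokenizer('hubert_layer6_code100')

# encode_file returns the discrete codes plus a second value; decode() only needs the codes
code, stuff_for_decode = ht.encode_file('./sample2_22k.wav')
wav_values = ht.decode(code)

# Write the decoded audio to a file at 16 kHz, the sample rate HuBERT models work with
sf.write('output.wav', wav_values, 16000)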
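The configuration name 'hubert_layer6_code100' suggests the widely used HuBERT discrete-unit recipe: hidden states from one transformer layer (presumably layer 6) are mapped to the nearest of K k-means centroids (presumably K = 100), and each centroid index becomes a token. This is an inference from the name, not documented dtokenizer behavior. The NumPy sketch below shows only that nearest-centroid step on random stand-in data; `quantize_features`, the 768-dimensional feature size, and the frame count are hypothetical.

```python
import numpy as np


def quantize_features(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each feature frame to its nearest centroid (squared Euclidean distance).

    features:  (num_frames, dim) array, e.g. hidden states from one HuBERT layer
    centroids: (num_codes, dim) array, e.g. k-means cluster centers
    returns:   (num_frames,) array of integer token ids in [0, num_codes)
    """
    # Expand ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 for all frame/centroid pairs at once
    dists = (
        np.sum(features ** 2, axis=1, keepdims=True)  # (num_frames, 1)
        - 2.0 * features @ centroids.T                # (num_frames, num_codes)
        + np.sum(centroids ** 2, axis=1)              # (num_codes,), broadcast over rows
    )
    return np.argmin(dists, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins: 100 centroids ("code100") and 50 frames of 768-dim features
    centroids = rng.normal(size=(100, 768))
    features = rng.normal(size=(50, 768))
    tokens = quantize_features(features, centroids)
    print(tokens.shape)  # (50,): one token id per frame
```

The distance expansion avoids building a (frames × codes × dim) array, which keeps memory proportional to frames × codes.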

Encodec Tokenizer

EncodecTokenizer is used the same way, except that decode() takes the second value returned by encode_file() rather than the codes themselves:

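import torch
import torchaudio
from dtokenizer.audio.model.encodec_model import EncodecTokenizer

et = EncodecTokenizer('encodec_24k_6bps')
# encode_file returns the discrete codes plus the data that decode() expects
code, stuff_for_decode = et.encode_file('./sample2_22k.wav')
wav_values = et.decode(stuff_for_decode)

# torchaudio.save expects a 2-D (channels, frames) float tensor
wav = torch.as_tensor(wav_values, dtype=torch.float32)
wav = wav.reshape(-1, wav.shape[-1])

# Save the decoded audio at 24 kHz, the output rate of the 24 kHz EnCodec model
torchaudio.save('output.wav', wav, 24000)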
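Assuming 'encodec_24k_6bps' refers to the 24 kHz EnCodec model at a 6 kbps target bandwidth (an inference from the name), the size of the token stream follows from EnCodec's published settings: 75 frames per second and residual vector quantization with 1024-entry (10-bit) codebooks. The hypothetical helper below works through that arithmetic; it does not call dtokenizer, and whether the `code` returned by EncodecTokenizer is laid out as 8 codebooks per frame is not confirmed here.

```python
import math


def encodec_token_budget(bandwidth_bps: float,
                         frame_rate_hz: float = 75.0,
                         codebook_size: int = 1024) -> dict:
    """Derive codebook count and token rate for a residual-VQ codec at a target bitrate.

    Defaults are the published EnCodec 24 kHz settings (75 frames/s, 1024-entry codebooks).
    """
    bits_per_code = math.log2(codebook_size)       # 1024 entries -> 10 bits per code
    bits_per_frame = bandwidth_bps / frame_rate_hz  # bits available for each frame
    num_codebooks = bits_per_frame / bits_per_code
    if not num_codebooks.is_integer():
        raise ValueError("bandwidth does not correspond to a whole number of codebooks")
    num_codebooks = int(num_codebooks)
    return {
        "num_codebooks": num_codebooks,
        "tokens_per_second": int(frame_rate_hz * num_codebooks),
    }


if __name__ == "__main__":
    budget = encodec_token_budget(6000)
    print(budget)  # {'num_codebooks': 8, 'tokens_per_second': 600}
```

At 6 kbps this gives 6000 / (75 × 10) = 8 codebooks, i.e. 600 tokens per second, so a 10-second clip would come to roughly 6000 tokens.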

Contributing

We welcome contributions to the dtokenizer project. Please feel free to submit issues or pull requests.

License

This project is released under the MIT License. See the LICENSE file for more details.

Download files

Source Distribution

dtokenizer-0.0.4.tar.gz (15.3 kB)

Built Distribution

dtokenizer-0.0.4-py3-none-any.whl (20.4 kB)

File details

Details for the file dtokenizer-0.0.4.tar.gz.

File metadata

  • Download URL: dtokenizer-0.0.4.tar.gz
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for dtokenizer-0.0.4.tar.gz

  • SHA256: 6eb6475d4663a9bc7bb95dea6b0d97873d97db1f87b0b66207a103e8b11476fd
  • MD5: e608917eaf46e7c368314f1fb66a85ab
  • BLAKE2b-256: fcad397e7af6727a5d92ef162a09c27a555a4d462b5e3829fc7fb5eaabdc0d28

File details

Details for the file dtokenizer-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: dtokenizer-0.0.4-py3-none-any.whl
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for dtokenizer-0.0.4-py3-none-any.whl

  • SHA256: 5b2e3dc2f56d20327c5b80fe679e3f1ae4f7b906fca11cc3be77af51ce84307b
  • MD5: 9688d09d5b01882d25dec27b90d5d68e
  • BLAKE2b-256: 91616f775130b3e56aa1ba628496587a33c28fb198c654e1e2df127b07620117
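
The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below uses only Python's standard hashlib module; the helper names `sha256_of` and `matches` are hypothetical, and the example assumes the wheel has been downloaded into the current directory.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # f.read returns b"" at end of file, which stops the iteration
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # SHA256 published above for the wheel; the file location is an assumption
    wheel = "dtokenizer-0.0.4-py3-none-any.whl"
    expected = "5b2e3dc2f56d20327c5b80fe679e3f1ae4f7b906fca11cc3be77af51ce84307b"
    if os.path.exists(wheel):
        print("OK" if matches(wheel, expected) else "MISMATCH")
```

pip can also perform this check itself: in hash-checking mode (enabled by `--require-hashes`, or automatically when any requirement carries a `--hash` option), a requirements line such as `dtokenizer==0.0.4 --hash=sha256:<digest>` makes pip refuse files whose digest does not match.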
