dtokenizer
discretize everything into tokens
Introduction
dtokenizer is a Python library designed to discretize audio files into tokens using various models. It supports models such as Hubert and Encodec for tokenization.
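To make "discretizing into tokens" concrete, here is a minimal sketch of nearest-codebook quantization, the general idea behind unit-based audio tokenizers. It is not dtokenizer's implementation: the codebook, the frames, and the function names `nearest_code`, `discretize`, and `reconstruct` are hypothetical and chosen only for illustration.

```python
import math

def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame`."""
    distances = [math.dist(frame, center) for center in codebook]
    return distances.index(min(distances))

def discretize(frames, codebook):
    """Map each feature frame to one integer token."""
    return [nearest_code(frame, codebook) for frame in frames]

def reconstruct(tokens, codebook):
    """Look each token up again; the result only approximates the input."""
    return [codebook[t] for t in tokens]

# Hypothetical 2-D codebook with three entries, and four feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7), (0.4, 0.3)]

tokens = discretize(frames, codebook)
approx = reconstruct(tokens, codebook)
print(tokens)  # [0, 1, 2, 0]
```

Real models work on learned, high-dimensional features with much larger codebooks, but the encode and decode steps follow the same pattern: continuous frames in, integer indices out, and a lossy lookup on the way back.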
Installation
To use dtokenizer, first ensure you have Python and pip installed. Then install the required dependencies by running:
pip install -r requirements.txt
Usage
Hubert Tokenizer
The Hubert tokenizer can be used to tokenize audio files into discrete tokens and then decode them back. Here's how you can use it:
from dtokenizer.audio.model.hubert_model import HubertTokenizer
import soundfile as sf
ht = HubertTokenizer('hubert_layer6_code100')
code, decodec_stuff = ht.encode_file('./sample2_22k.wav')
wav_values = ht.decode(code)
# Write the decoded audio to a file
sf.write('output.wav', wav_values, 16000)
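The example above writes the decoded audio at 16000 Hz, while the input file name suggests a 22 kHz recording. Whether `encode_file` resamples internally is not stated here, so it can be useful to check the sample rate of input and output files yourself. The sketch below uses only the standard-library `wave` module (PCM WAV files only); `wav_info` is a hypothetical helper, not part of dtokenizer, and the demo runs on a generated file rather than on `sample2_22k.wav`.

```python
import os
import tempfile
import wave

def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration

# Demo on a generated file: one second of 16-bit mono silence at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

rate, channels, duration = wav_info(demo_path)
print(rate, channels, duration)  # 16000 1 1.0
```

Calling `wav_info` on both the input file and `output.wav` shows whether the sample rate and duration are what you expect after the round trip.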
Encodec Tokenizer
The Encodec tokenizer is used in a similar way. Note that, in this example, decode takes the second value returned by encode_file rather than the codes themselves:
import torch
from dtokenizer.audio.model.encodec_model import EncodecTokenizer
import torchaudio
et = EncodecTokenizer('encodec_24k_6bps')
code, stuff_for_decode = et.encode_file('./sample2_22k.wav')
wav_values = et.decode(stuff_for_decode)
# Save the decoded audio to a file
torchaudio.save('output.wav', torch.from_numpy(wav_values), 22050)
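To get a feel for how many tokens such an encoding produces, the arithmetic can be sketched as follows. This assumes that `encodec_24k_6bps` refers to the 24 kHz EnCodec model at a 6 kbps target bandwidth, and it uses that model's published properties (75 frames per second, 1024 entries per codebook). None of these numbers come from dtokenizer itself, and the function names are hypothetical.

```python
import math

# Assumed properties of the 24 kHz EnCodec model (not taken from dtokenizer).
FRAME_RATE_HZ = 75       # token frames per second of audio
CODEBOOK_SIZE = 1024     # entries per codebook
BITS_PER_TOKEN = (CODEBOOK_SIZE - 1).bit_length()  # 10 bits index 1024 entries

def num_codebooks(bandwidth_kbps):
    """Number of codebooks that fit into the target bandwidth."""
    bits_per_frame = bandwidth_kbps * 1000 / FRAME_RATE_HZ
    return int(bits_per_frame // BITS_PER_TOKEN)

def token_count(duration_s, bandwidth_kbps):
    """Total tokens for a clip: frames times codebooks per frame."""
    frames = math.ceil(duration_s * FRAME_RATE_HZ)
    return frames * num_codebooks(bandwidth_kbps)

print(num_codebooks(6))    # 8
print(token_count(10, 6))  # 6000
```

Under these assumptions a 10-second clip yields 750 frames with 8 codes each. The shape of the `code` value returned by `encode_file` may be laid out differently, so treat this as a rough size estimate only.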
Contributing
We welcome contributions to the dtokenizer project. Please feel free to submit issues or pull requests.
License
This project is released under the MIT License. See the LICENSE file for more details.
File details
Details for the file dtokenizer-0.0.4.tar.gz.
File metadata
- Download URL: dtokenizer-0.0.4.tar.gz
- Upload date:
- Size: 15.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6eb6475d4663a9bc7bb95dea6b0d97873d97db1f87b0b66207a103e8b11476fd
MD5 | e608917eaf46e7c368314f1fb66a85ab
BLAKE2b-256 | fcad397e7af6727a5d92ef162a09c27a555a4d462b5e3829fc7fb5eaabdc0d28
File details
Details for the file dtokenizer-0.0.4-py3-none-any.whl.
File metadata
- Download URL: dtokenizer-0.0.4-py3-none-any.whl
- Upload date:
- Size: 20.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5b2e3dc2f56d20327c5b80fe679e3f1ae4f7b906fca11cc3be77af51ce84307b
MD5 | 9688d09d5b01882d25dec27b90d5d68e
BLAKE2b-256 | 91616f775130b3e56aa1ba628496587a33c28fb198c654e1e2df127b07620117
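The digests listed above can be used to check a downloaded file. The following is a minimal sketch using the standard-library `hashlib`; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the demo hashes a small temporary file rather than a real download.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Demo on a temporary file so the sketch runs without a download.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"example payload")

expected = hashlib.sha256(b"example payload").hexdigest()
ok = matches_expected(demo_path, expected)
print(ok)  # True
```

For a real check, pass the path of the downloaded archive and the SHA256 value from the matching table, for example `matches_expected("dtokenizer-0.0.4.tar.gz", "6eb6475d4663a9bc7bb95dea6b0d97873d97db1f87b0b66207a103e8b11476fd")`.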