A package for creating audio tokens

Audiotoken

Tokenize audio to get acoustic and semantic tokens.

Installation

pip install audiotoken

Usage

Encoding

You can use either an acoustic or a semantic encoder to turn audio into tokens.

from pathlib import Path

from audiotoken import AudioToken, Tokenizers

# Load the acoustic tokenizer on the first GPU
encoder = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
# Encode an audio file into acoustic tokens
encoded_audio = encoder.encode(Path('path/to/audio.wav'))

One acoustic tokenizer and two semantic tokenizers are available:

  1. Tokenizers.acoustic
  2. Tokenizers.semantic_s (Small)
  3. Tokenizers.semantic_m (Medium)
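
The examples in this README hard-code device='cuda:0'. On a machine without a GPU, the device string could instead be chosen at runtime. The following is a minimal sketch under assumptions: pick_device is a hypothetical helper that is not part of audiotoken, and this README does not confirm that every tokenizer runs on 'cpu'.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` if CUDA is usable, otherwise fall back to 'cpu'.

    Hypothetical helper for illustration; not part of audiotoken.
    """
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
# Assumption: AudioToken accepts this string as its `device` argument, e.g.
# encoder = AudioToken(tokenizer=Tokenizers.acoustic, device=device)
```

If CPU execution turns out not to be supported for a given tokenizer, the fallback should raise an error instead of returning 'cpu'.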

Decoding

You can decode acoustic tokens like this:

from pathlib import Path
from audiotoken import AudioToken, Tokenizers

tokenizer = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = tokenizer.decode(encoded_audio)

# Save the decoded audio so it can be compared with the original
import torchaudio
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)

Semantic tokens decode to acoustic tokens, which the acoustic tokenizer then decodes to audio:

from pathlib import Path
from audiotoken import AudioToken, Tokenizers

semantic_tokenizer = AudioToken(tokenizer=Tokenizers.semantic_s, device='cuda:0')
semantic_toks = semantic_tokenizer.encode(Path('path/to/audio.wav'))
acoustic_toks = semantic_tokenizer.decode(semantic_toks)

acoustic_tokenizer = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
decoded_audio = acoustic_tokenizer.decode(acoustic_toks)

# Save the decoded audio so it can be compared with the original
import torchaudio
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)
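
The two-step semantic round trip above can be wrapped in a single function. This is only a sketch: reconstruct_from_semantic is a hypothetical name, and it relies on nothing beyond the encode and decode calls already shown in this README.

```python
from pathlib import Path


def reconstruct_from_semantic(semantic_tokenizer, acoustic_tokenizer, audio_path: Path):
    """Encode audio to semantic tokens, then decode back to a waveform in two steps.

    Hypothetical helper for illustration; not part of audiotoken.
    """
    semantic_toks = semantic_tokenizer.encode(audio_path)
    # Semantic tokens decode to acoustic tokens, not directly to audio
    acoustic_toks = semantic_tokenizer.decode(semantic_toks)
    # The acoustic tokenizer turns acoustic tokens into a waveform
    return acoustic_tokenizer.decode(acoustic_toks)
```

Here semantic_tokenizer would be an AudioToken built with Tokenizers.semantic_s or Tokenizers.semantic_m, and acoustic_tokenizer one built with Tokenizers.acoustic. Passing both in as arguments means each model is constructed only once when many files are processed.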

See examples/usage.ipynb for more usage examples.

APIs

Core class

from audiotoken import AudioToken, Tokenizers
tokenizer = AudioToken(tokenizer=Tokenizers.semantic_m, device='cuda:0')

See audiotoken/core.py for complete documentation of APIs.

Three APIs are provided:

  1. tokenizer.encode: Encode a single audio file or array at a time
  2. tokenizer.encode_batch_files: Encode multiple audio files in batches and save the results directly to disk
     NOTE: encode_batch_files is not safe to run more than once on the same list of files, as this can result in incorrect data. This will be fixed in a future release.
  3. tokenizer.decode: Decode acoustic or semantic tokens. Note that semantic tokens are decoded to acoustic tokens, not to audio
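
Until the encode_batch_files caveat above is fixed, one way to avoid passing the same files twice is to keep a small manifest of files that have already been encoded. The sketch below uses only the standard library. pending_files, mark_done, and the manifest file are all hypothetical names, not part of audiotoken, and the arguments to encode_batch_files are left out because they are documented in audiotoken/core.py, not here.

```python
import json
from pathlib import Path


def _load_manifest(manifest_path: Path) -> set:
    """Read the set of already-encoded file paths, or an empty set if none exist."""
    if manifest_path.exists():
        return set(json.loads(manifest_path.read_text()))
    return set()


def pending_files(audio_files, manifest_path):
    """Return the files that are not yet recorded in the manifest."""
    done = _load_manifest(Path(manifest_path))
    return [Path(f) for f in audio_files if str(Path(f)) not in done]


def mark_done(audio_files, manifest_path):
    """Record files in the manifest after a batch run has finished."""
    manifest_path = Path(manifest_path)
    done = _load_manifest(manifest_path)
    done.update(str(Path(f)) for f in audio_files)
    manifest_path.write_text(json.dumps(sorted(done)))


# Intended flow (the encode_batch_files call itself is omitted here):
#   todo = pending_files(all_files, "encoded_manifest.json")
#   ... pass `todo` to tokenizer.encode_batch_files ...
#   mark_done(todo, "encoded_manifest.json")
```

This only guards against re-submitting files from a completed run. If a run is interrupted partway through, any partial output would still need to be cleaned up by hand before retrying.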
