
A package for creating audio tokens


Audiotoken

Tokenize audio to get acoustic and semantic tokens.

Installation

pip install audiotoken

Usage

Encoding

You can use either the acoustic encoder or one of the semantic encoders to encode audio into tokens.

from pathlib import Path
from audiotoken import AudioToken, Tokenizers

# Load the acoustic tokenizer on the first GPU and encode a single file
encoder = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = encoder.encode(Path('path/to/audio.wav'))

One acoustic tokenizer and two semantic tokenizers are available:

  1. Tokenizers.acoustic
  2. Tokenizers.semantic_s (Small)
  3. Tokenizers.semantic_m (Medium)

Decoding

You can decode acoustic tokens like this:

from pathlib import Path
from audiotoken import AudioToken, Tokenizers

tokenizer = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = tokenizer.decode(encoded_audio)

# Save the decoded audio so it can be compared with the original audio
import torchaudio
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)
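One rough way to compare the reconstruction with the original is a signal-to-noise ratio over the raw samples. The function below is a sketch under stated assumptions, not something audiotoken provides: snr_db is a hypothetical name, both signals are assumed to be mono sample sequences at the same sample rate, and they are assumed to be time-aligned. Neural codecs do not always preserve exact waveform alignment, so treat the number as a coarse sanity check rather than a quality score.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB of `estimate` relative to `reference`.

    Hypothetical helper, not part of audiotoken. Assumes mono signals at the
    same sample rate that are already time-aligned.
    """
    # Compare only the overlapping part; a codec may pad or trim a few samples.
    n = min(len(reference), len(estimate))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")
    signal_power = sum(r * r for r in reference[:n])
    noise_power = sum((r - e) ** 2 for r, e in zip(reference[:n], estimate[:n]))
    if noise_power == 0.0:
        return math.inf  # identical signals
    if signal_power == 0.0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_power / noise_power)


# Toy signals: the estimate is the reference scaled by 0.9
original = [1.0, -1.0, 1.0, -1.0]
reconstructed = [0.9, -0.9, 0.9, -0.9]
print(f"SNR: {snr_db(original, reconstructed):.2f} dB")
```

For real files, the sample sequences would come from loading the original and reconstructed.wav and resampling them to a common rate first.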

You can decode semantic tokens like this:

from pathlib import Path
from audiotoken import AudioToken, Tokenizers

semantic_tokenizer = AudioToken(tokenizer=Tokenizers.semantic_s, device='cuda:0')
semantic_toks = semantic_tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = semantic_tokenizer.decode(semantic_toks)

# Save the decoded audio so it can be compared with the original audio
import torchaudio
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)

See examples/usage.ipynb for more usage examples.

APIs

Core class

from audiotoken import AudioToken, Tokenizers
tokenizer = AudioToken(tokenizer=Tokenizers.semantic_m, device='cuda:0')

See audiotoken/core.py for complete documentation of APIs.

Three methods are provided:

  1. tokenizer.encode: Encode a single audio file or array at a time
  2. tokenizer.encode_batch_files: Encode multiple audio files in batches and save the tokens directly to disk
    1. NOTE: encode_batch_files is not safe to run multiple times on the same list of files, as doing so can result in incorrect data. This will be fixed in a future release.
  3. tokenizer.decode: Decode acoustic/semantic tokens
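Until the encode_batch_files issue noted above is fixed, one way to avoid submitting the same files twice is to keep a small manifest of paths that have already been processed and filter each new list against it. The sketch below is not part of audiotoken: filter_unprocessed, mark_processed, and the JSON manifest are hypothetical names and choices. The README does not describe the cause of the problem, so whether this guard fully avoids it is an assumption.

```python
import json
from pathlib import Path
from typing import Iterable, List


def _load_manifest(manifest: Path) -> set:
    """Read the set of already-processed absolute paths (empty if no manifest yet)."""
    return set(json.loads(manifest.read_text())) if manifest.exists() else set()


def filter_unprocessed(files: Iterable[Path], manifest: Path) -> List[Path]:
    """Return files not recorded in `manifest`, de-duplicated, in input order."""
    done = _load_manifest(manifest)
    pending, seen = [], set()
    for f in files:
        key = str(Path(f).resolve())
        if key not in done and key not in seen:
            seen.add(key)
            pending.append(Path(f))
    return pending


def mark_processed(files: Iterable[Path], manifest: Path) -> None:
    """Record `files` in `manifest` once a batch run has finished successfully."""
    done = _load_manifest(manifest)
    done.update(str(Path(f).resolve()) for f in files)
    manifest.write_text(json.dumps(sorted(done), indent=2))
```

A typical flow would call filter_unprocessed before handing the list to tokenizer.encode_batch_files and mark_processed after the call returns. The arguments of encode_batch_files are documented in audiotoken/core.py and are intentionally not shown here.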

