A simple audio feature extraction tool

Audio Feature Extractor

A simple and extensible tool for extracting audio features, designed for speech and audio experiments.

Features

  • Extracts traditional audio features (pitch, timbre, loudness, etc.)
  • Supports English and Chinese G2P (grapheme-to-phoneme) conversion
  • Embedding and semantic feature extraction (e.g., speaker embeddings, ASR, etc.)
  • Modular design for easy extension

Installation

  1. Clone the repository:

    git clone <your-repo-url>
    cd audio_feature_extractor
    
  2. Install dependencies:

    pip install .
    

    Note: For GPU support, install the PyTorch build that matches your CUDA version before running the pip install command above. See the PyTorch Get Started page for details.
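After installing PyTorch, a quick check like the one below can confirm whether the installed build sees a GPU. This is a minimal sketch and not part of this package: the helper name `describe_torch_cuda` is hypothetical, and the only PyTorch calls it relies on are `torch.cuda.is_available()` and `torch.version.cuda`.

```python
import importlib.util


def describe_torch_cuda():
    """Report whether PyTorch is installed and whether it can use CUDA.

    Hypothetical helper for illustration; not part of audio_feature_extractor.
    """
    info = {"torch_installed": False, "cuda_available": False, "cuda_version": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment
    import torch

    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    info["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return info


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If `cuda_available` is false on a machine with a GPU, the installed PyTorch build is likely CPU-only or built for a different CUDA version.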

Usage

from audio_feature_extractor import FeatureExtractor

extractor = FeatureExtractor()
audio_path = "tmp/test.wav"
features = extractor.extract_features(audio_path)
print(features)
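
To process several files, the call above can be wrapped in a loop. The sketch below is only an illustration: `extract_many` is a hypothetical helper, not part of this package. It takes the extraction function as an argument, so it makes no assumption about what `extract_features` returns or which exceptions it raises.

```python
from pathlib import Path


def extract_many(extract, audio_dir, pattern="*.wav"):
    """Apply `extract` to every file in `audio_dir` matching `pattern`.

    Hypothetical helper, not part of audio_feature_extractor. `extract` is any
    callable that takes a path string, e.g. FeatureExtractor().extract_features.
    Returns (results, failures): two dicts keyed by file path.
    """
    results, failures = {}, {}
    for path in sorted(Path(audio_dir).glob(pattern)):
        try:
            results[str(path)] = extract(str(path))
        except Exception as exc:  # keep going; record which file failed and why
            failures[str(path)] = exc
    return results, failures


# Usage with this package (assumes it is installed and tmp/ holds .wav files):
# from audio_feature_extractor import FeatureExtractor
# results, failures = extract_many(FeatureExtractor().extract_features, "tmp")
```

Collecting failures separately keeps one unreadable file from stopping a long batch run.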

Project Structure

src/audio_feature_extractor/
    extractor.py
    features/
        traditional.py
        embedding.py
        semantic.py
    g2p/

Notes

  • This tool is intended for research and experimental use.
  • For large models or neural network weights, see the documentation for download instructions.
  • If you encounter CUDA/cuDNN or dependency issues, please refer to the Troubleshooting section below.

Troubleshooting

  • CUDA/cuDNN errors:
    Make sure environment variables such as LD_LIBRARY_PATH point to your CUDA/cuDNN libraries, and that the installed CUDA/cuDNN versions match the ones your PyTorch build expects.
  • PyTorch version:
    Install the correct PyTorch version for your hardware and CUDA version before installing this package.
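
When checking environment variables, it can help to list each LD_LIBRARY_PATH entry and see whether the directory exists. The snippet below is a minimal sketch using only the Python standard library; `inspect_library_path` is a hypothetical helper and not part of this package.

```python
import os


def inspect_library_path(environ=None, var="LD_LIBRARY_PATH"):
    """Return (directory, exists) pairs for each entry in a path-style variable.

    Hypothetical helper for illustration; not part of audio_feature_extractor.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(var, "")
    entries = [p for p in raw.split(os.pathsep) if p]  # skip empty entries
    return [(p, os.path.isdir(p)) for p in entries]


if __name__ == "__main__":
    for directory, exists in inspect_library_path():
        print(("ok      " if exists else "missing ") + directory)
```

An entry marked missing usually means a stale path left over from a previous CUDA installation.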

License

MIT

Changelog

v0.1.0

  • Initial release: basic usage for English speech feature extraction.

Feel free to contribute or open issues!
