Paralinguistic Event Classification from Diarized Speaker Segments

PEC-DSS 🎵🔊

PEC-DSS is an audio analysis system that attributes paralinguistic vocal events (such as laughter and sighs) to specific speakers. It encodes each event and each speaker's reference samples with the SNAC neural audio codec, then assigns every event to the most similar speaker, or to "unknown" when no speaker is similar enough.

✨ Features

🎙️ Speaker identification using neural audio encoder representations
😀 Attribution of paralinguistic events to specific speakers
🔍 SNAC (Scalable Neural Audio Codec) model integration
🔊 Voice embedding and similarity-based speaker matching
📊 Statistical analysis of audio codebooks
🔄 Modular architecture for easy customization
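The README does not specify the exact matching rule. Below is a minimal sketch of similarity-based speaker matching, assuming every clip has already been reduced to a fixed-length embedding vector. Each speaker is represented by the mean of its reference embeddings, and an event goes to the speaker with the highest cosine similarity, or to no speaker when the best score falls below a threshold. `cosine_similarity`, `match_speaker`, and the default `threshold` of 0.5 are hypothetical, not PEC-DSS API.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def match_speaker(event_vec, speaker_refs, threshold=0.5):
    """Return (best_speaker, score); best_speaker is None if score < threshold.

    speaker_refs maps a speaker ID to a list of reference embedding vectors.
    Each speaker is represented by the mean (centroid) of its references.
    The threshold value is an illustrative assumption.
    """
    best_speaker, best_score = None, -1.0
    for speaker, refs in speaker_refs.items():
        centroid = np.mean(np.stack(refs), axis=0)
        score = cosine_similarity(event_vec, centroid)
        if score > best_score:
            best_speaker, best_score = speaker, score
    if best_score < threshold:
        return None, best_score  # too dissimilar: treat as "unknown"
    return best_speaker, best_score
```

Averaging references into a centroid keeps the cost proportional to the number of speakers; comparing against every reference and taking the maximum is an equally plausible alternative.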

🚀 Installation

From PyPI

pip install pec-dss

From Source with pip

git clone https://github.com/hwk06023/PEC-DSS.git
cd PEC-DSS
pip install -e .

From Source with requirements.txt

git clone https://github.com/hwk06023/PEC-DSS.git
cd PEC-DSS
pip install -r requirements.txt

For development:

pip install -r requirements-dev.txt

📖 Quick Start

Basic Usage

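import librosa

from snac_model import load_snac_model
from speaker_identification import assign_speakers_to_laughs

# Sample rate used when loading audio.
# Assumption: the bundled SNAC model expects 24 kHz input; adjust if yours differs.
SAMPLE_RATE = 24000


def load(path):
    # librosa returns (waveform, sample_rate); keep only the mono waveform
    waveform, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    return waveform


# Load SNAC model
snac_model = load_snac_model(device="cpu")  # or "cuda" for GPU

# Reference voice samples per speaker, as NumPy waveforms (example paths)
speaker_samples = {
    "speaker_A": [load("speakers/speaker_A/audio1.wav"), load("speakers/speaker_A/audio2.wav")],
    "speaker_B": [load("speakers/speaker_B/audio1.wav"), load("speakers/speaker_B/audio2.wav")],
}

# Unidentified paralinguistic events, as NumPy waveforms (example paths)
unidentified_events = [load("events/laugh1.wav"), load("events/giggle1.wav")]

# Attribute each audio event to a speaker
results = assign_speakers_to_laughs(speaker_samples, unidentified_events, snac_model)

# Print how many events were attributed to each speaker
for speaker, events in results.items():
    print(f"Speaker {speaker} has {len(events)} attributed events")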

CLI Usage

pec-dss --speakers-dir ./speakers --unidentified-dir ./events --output-dir ./results

📁 Directory Structure

PEC-DSS expects a specific directory structure for processing audio files:

Speaker Reference Structure

speakers_directory/
   ├── speaker_A/       # Each speaker's name becomes their ID
   │   ├── audio1.wav   # Reference voice samples for this speaker
   │   ├── audio2.wav
   │   └── ...
   ├── speaker_B/
   │   ├── audio1.wav
   │   └── ...
   └── speaker_C/
       ├── audio1.wav
       └── ...
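A minimal sketch of how the speaker layout above maps to speaker IDs, assuming only `.wav` files are read and empty speaker folders are skipped. `collect_speaker_files` is a hypothetical helper, not part of the documented PEC-DSS API.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav"}  # assumption: only WAV files are scanned


def collect_speaker_files(speakers_dir):
    """Map each speaker subdirectory name (the speaker ID) to its audio files."""
    speakers = {}
    for sub in sorted(Path(speakers_dir).iterdir()):
        if not sub.is_dir():
            continue  # loose files at the top level are ignored
        files = sorted(p for p in sub.iterdir() if p.suffix.lower() in AUDIO_SUFFIXES)
        if files:
            speakers[sub.name] = files
    return speakers
```

Each returned path could then be loaded into a waveform to build the `speaker_samples` dictionary used in Basic Usage.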

Unidentified Audio Structure

unidentified_directory/
   ├── laugh1.wav      # Non-linguistic vocal events to be attributed to speakers
   ├── giggle1.wav
   └── ...

Output Structure (After Processing)

output_directory/
   ├── results.json           # JSON file with all results
   ├── speaker_A/             # Files assigned to each speaker
   │   ├── 0_laugh1.wav
   │   └── ...
   ├── speaker_B/
   │   ├── 0_giggle1.wav
   │   └── ...
   └── unknown/               # Files below similarity threshold (if any)
       └── ...
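A minimal sketch of producing the output layout above, assuming the matching step has already produced one `(event_path, speaker_or_None, score)` tuple per event, with `None` meaning the event fell below the similarity threshold. The per-folder numeric prefix and the `results.json` fields are assumptions inferred from the example tree; `write_results` is hypothetical and the real schema is not documented.

```python
import json
import shutil
from pathlib import Path


def write_results(assignments, output_dir):
    """Copy event files into per-speaker folders and write results.json.

    assignments: list of (event_path, speaker_id_or_None, score) tuples.
    A speaker of None sends the event to the "unknown" folder.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    counters = {}  # next numeric prefix per output folder (assumed naming scheme)
    records = []
    for event_path, speaker, score in assignments:
        event_path = Path(event_path)
        folder = speaker if speaker is not None else "unknown"
        index = counters.get(folder, 0)
        counters[folder] = index + 1
        dest = out / folder / f"{index}_{event_path.name}"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(event_path, dest)
        records.append({
            "file": event_path.name,
            "speaker": folder,
            "score": float(score),
            "output": dest.relative_to(out).as_posix(),
        })
    # Illustrative schema only; the real results.json fields are not documented
    (out / "results.json").write_text(json.dumps(records, indent=2))
    return records
```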

🧩 System Architecture

PEC-DSS consists of the following components:

snac_model.py: SNAC model initialization and management
audio_encoder.py: Audio encoding and vectorization
codebook_analysis.py: Statistical analysis of audio codebooks
speaker_identification.py: Speaker identification algorithms
main.py: Integration and execution framework
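The README does not say which statistics `codebook_analysis.py` computes. As a hypothetical illustration of turning discrete codec tokens into a fixed-length vector, the sketch below builds per-level code-usage histograms and a per-level usage entropy from integer code arrays, such as the multi-level token sequences a codec like SNAC produces. `codebook_histogram`, `usage_entropy`, and the 4096-entry codebook size are assumptions, not PEC-DSS API.

```python
import numpy as np

CODEBOOK_SIZE = 4096  # assumed number of entries per codebook level


def _code_counts(codes, codebook_size):
    """Count how often each code index occurs within one level."""
    codes = np.asarray(codes, dtype=np.int64).ravel()
    if codes.size and (codes.min() < 0 or codes.max() >= codebook_size):
        raise ValueError("code index outside [0, codebook_size)")
    return np.bincount(codes, minlength=codebook_size)


def codebook_histogram(codes_per_level, codebook_size=CODEBOOK_SIZE):
    """Concatenate one normalized code-usage histogram per codebook level."""
    features = []
    for codes in codes_per_level:
        counts = _code_counts(codes, codebook_size)
        total = counts.sum()
        features.append(counts / total if total > 0 else counts.astype(np.float64))
    return np.concatenate(features)


def usage_entropy(codes, codebook_size=CODEBOOK_SIZE):
    """Shannon entropy, in bits, of code usage within one level."""
    counts = _code_counts(codes, codebook_size)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())
```

A histogram like this has the same length for every clip, so it could serve as the vector a similarity-based matcher compares; the trade-off is that it discards the temporal order of the codes.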

🔊 Audio Event Types

The system can attribute various paralinguistic events to speakers, including:

Laughter
Sighs
Crying
Coughing
Other non-verbal vocal expressions

Note: PEC-DSS does not automatically classify these event types. It only determines which speaker produced the audio event.

🚀 Future Developments

🧠 Integration with more audio encoder models
😢 Expanded paralinguistic event recognition
🎵 Emotional tone classification
⚡ Performance optimization for real-time processing

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the GNU General Public License v3.0.

🙏 Acknowledgements

SNAC - Scalable Neural Audio Codec
HuggingFace Transformers - Machine learning tools
Llama - Language models for text processing

Release history

0.1.1 (this version): Apr 22, 2025

0.1.0: Apr 22, 2025

Download files

Source Distribution

pec_dss-0.1.1.tar.gz (34.1 kB)

Uploaded Apr 22, 2025 Source

Built Distribution

pec_dss-0.1.1-py3-none-any.whl (18.2 kB)

Uploaded Apr 22, 2025 Python 3

File details

Details for the file pec_dss-0.1.1.tar.gz.

File metadata

Download URL: pec_dss-0.1.1.tar.gz
Upload date: Apr 22, 2025
Size: 34.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for pec_dss-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`0de60fb9634825146613eed6199a90043369681bd997a38cef53a78fb61c6f08`
MD5	`de017287c3a9e3c3474d45b25c272e24`
BLAKE2b-256	`4cb8f4eb1ba2fc894efeebeb7c2f4c2fd6e64df657bfddf43e85c4fa3088ecfc`

File details

Details for the file pec_dss-0.1.1-py3-none-any.whl.

File metadata

Download URL: pec_dss-0.1.1-py3-none-any.whl
Upload date: Apr 22, 2025
Size: 18.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for pec_dss-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b35cb6c0c6f3d187591602a03e652c052c5983a12e167be3998239e849e7c49`
MD5	`09b52d2e2c483ca0a2254c6ca5468c17`
BLAKE2b-256	`cf0979f7fc8bd0b8b49d3903f3b8ea84ad1801eeb540262b36a72ccc70dff403`
