SONATA: SOund and Narrative Advanced Transcription Assistant

SONATA is an Automatic Speech Recognition (ASR) system that transcribes both spoken words and emotive, non-verbal sounds such as laughter, sighs, and coughs.

Features

  • Speech-to-text transcription powered by WhisperX
  • Recognition of emotive sounds and non-verbal cues
  • Support for tags like <laugh>, <sigh>, <yawn>, <surprise>, <inhale>, <groan>, <cough>, <sneeze>, <sniffle>
  • Open-source and extensible architecture
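The emotive tags appear inline in the tagged transcript. Assuming they are written exactly as in the list above (a bare `<laugh>`, with no attributes), a minimal sketch for pulling them out of, or stripping them from, a tagged string with Python's standard `re` module might look like this. The helper names and the sample sentence are hypothetical and are not part of SONATA's API or real SONATA output.

```python
import re

# Tag names taken from the feature list above.
EMOTIVE_TAGS = {"laugh", "sigh", "yawn", "surprise", "inhale",
                "groan", "cough", "sneeze", "sniffle"}

# Matches a bare tag such as <laugh>; assumes tags carry no attributes.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def extract_tags(tagged_text: str) -> list[str]:
    """Return the known emotive tags in order of appearance (hypothetical helper)."""
    return [name for name in TAG_PATTERN.findall(tagged_text) if name in EMOTIVE_TAGS]


def strip_tags(tagged_text: str) -> str:
    """Remove known emotive tags, leave any other <...> text untouched,
    and collapse the leftover whitespace (hypothetical helper)."""
    without_tags = TAG_PATTERN.sub(
        lambda m: "" if m.group(1) in EMOTIVE_TAGS else m.group(0), tagged_text
    )
    return " ".join(without_tags.split())


if __name__ == "__main__":
    # Invented transcript for illustration only.
    sample = "That was so funny <laugh> okay <sigh> let's get back to work."
    print(extract_tags(sample))
    print(strip_tags(sample))
```

Unknown tag names are deliberately ignored, so the same helpers do not mangle unrelated angle-bracket text in a transcript.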

Installation

Install the package from PyPI:

pip install sonata-asr

Or install from source:

git clone https://github.com/hwk06023/SONATA.git
cd SONATA
pip install -e .

Usage Examples

Basic Transcription

from sonata import Transcriber

# Initialize the transcriber
transcriber = Transcriber()

# Transcribe an audio file
result = transcriber.transcribe("path/to/audio.wav")
print(result)

Detecting Emotive Sounds

from sonata.core import EmotiveDetector

# Initialize the emotive detector
detector = EmotiveDetector(threshold=0.6)

# Detect emotive events in an audio file
events = detector.detect_events("path/to/audio.wav")

# Print the detected events
for event in events:
    print(f"{event.type}: {event.start_time:.2f}s - {event.end_time:.2f}s (confidence: {event.confidence:.2f})")
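The `threshold` argument suggests that detections below a confidence cutoff are discarded. The sketch below illustrates that idea with a hypothetical `EmotiveEvent` dataclass whose fields mirror the attributes printed above (`type`, `start_time`, `end_time`, `confidence`). It is not SONATA's implementation, and the inclusive cutoff (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EmotiveEvent:
    """Hypothetical stand-in for a detected event; fields mirror the example above."""
    type: str
    start_time: float  # seconds
    end_time: float    # seconds
    confidence: float  # assumed to lie in [0, 1]


def filter_events(events: list[EmotiveEvent], threshold: float = 0.6) -> list[EmotiveEvent]:
    """Keep events whose confidence reaches the threshold, ordered by start time.

    Assumes an inclusive cutoff (confidence >= threshold).
    """
    kept = [e for e in events if e.confidence >= threshold]
    return sorted(kept, key=lambda e: e.start_time)


if __name__ == "__main__":
    # Invented detections for illustration only.
    raw = [
        EmotiveEvent("cough", 4.10, 4.55, 0.58),
        EmotiveEvent("laugh", 1.20, 2.05, 0.91),
        EmotiveEvent("sigh", 6.00, 6.80, 0.60),
    ]
    for event in filter_events(raw, threshold=0.6):
        print(f"{event.type}: {event.start_time:.2f}s - {event.end_time:.2f}s "
              f"(confidence: {event.confidence:.2f})")
```

With a cutoff of 0.6, the 0.58 cough is dropped and the remaining laugh and sigh are printed in time order. Raising the threshold trades recall for fewer false detections.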

Full Pipeline

from sonata import Sonata

# Initialize SONATA with default settings
sonata = Sonata()

# Process an audio file: transcribe speech and detect emotive sounds
result = sonata.process("path/to/audio.wav")

# Print the text with emotive tags
print(result.text_with_tags)

# Save the result
sonata.save_output(result, "output.json")
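`text_with_tags` combines the transcript with the detected emotive events. One plausible way to build such a string from word-level timestamps (which WhisperX provides) and event start times is to sort words and tags together on a shared timeline. The sketch below uses hypothetical `(text, start_seconds)` and `(tag_name, start_seconds)` tuples; it illustrates the idea and is not SONATA's actual merging logic.

```python
def merge_words_and_events(words, events):
    """Interleave words and emotive tags by start time (hypothetical helper).

    words:  list of (text, start_seconds) tuples, e.g. from a word-level aligner.
    events: list of (tag_name, start_seconds) tuples.
    Both input formats are assumptions made for this sketch.
    """
    tokens = [(start, 1, text) for text, start in words]
    # Priority 0 places a tag before a word that starts at the same instant.
    tokens += [(start, 0, f"<{tag}>") for tag, start in events]
    tokens.sort(key=lambda t: (t[0], t[1]))
    return " ".join(text for _, _, text in tokens)


if __name__ == "__main__":
    # Invented timestamps for illustration only.
    words = [("That", 0.0), ("was", 0.3), ("funny", 0.6), ("okay", 2.4)]
    events = [("laugh", 1.1)]
    print(merge_words_and_events(words, events))
```

A tag that starts between two words lands before the next word, which keeps the output readable without splitting any word.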

Command Line Interface

SONATA also provides a CLI for quick transcription:

# Basic usage
sonata-asr path/to/audio.wav

# Save output to a specific file
sonata-asr path/to/audio.wav --output result.json

# Set threshold for emotive detection
sonata-asr path/to/audio.wav --threshold 0.7
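To run the CLI over a folder of recordings, one option is a small Python wrapper around `subprocess.run`. This sketch only uses the `--output` and `--threshold` flags shown above; the `.wav` glob, the per-file `<name>.json` output naming, and the helper names are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def build_command(audio_path, output=None, threshold=None):
    """Assemble a sonata-asr invocation using only the flags documented above."""
    cmd = ["sonata-asr", str(audio_path)]
    if output is not None:
        cmd += ["--output", str(output)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


def transcribe_directory(audio_dir, out_dir, threshold=None, dry_run=False):
    """Run sonata-asr on every .wav file in audio_dir (hypothetical batch helper).

    Each result is written to out_dir/<stem>.json. With dry_run=True the
    commands are returned without being executed.
    """
    audio_dir, out_dir = Path(audio_dir), Path(out_dir)
    if not audio_dir.is_dir():
        raise NotADirectoryError(f"{audio_dir} is not a directory")
    commands = [
        build_command(wav, out_dir / f"{wav.stem}.json", threshold)
        for wav in sorted(audio_dir.glob("*.wav"))
    ]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `transcribe_directory("recordings", "results", threshold=0.7, dry_run=True)` first is a cheap way to inspect the commands before running them.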

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details. Under GPLv3, derivative works that are distributed must also be released under the same license, with their source code made available.

Acknowledgements

This project leverages the following key open source components:

  • WhisperX - Fast speech recognition with word-level timestamps
  • Laughter-Detection - Automatic detection of laughter in audio files (MIT License)

We are grateful to the developers and contributors of these libraries for their valuable work.

Download files

Source Distribution

sonata_asr-0.0.2.tar.gz (30.2 kB)

Built Distribution

sonata_asr-0.0.2-py3-none-any.whl (31.8 kB)

File details

Details for the file sonata_asr-0.0.2.tar.gz.

File metadata

  • Download URL: sonata_asr-0.0.2.tar.gz
  • Upload date:
  • Size: 30.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for sonata_asr-0.0.2.tar.gz
Algorithm Hash digest
SHA256 14a2f9896d7ede427ac44565b8664e5c3e0c438568d6f9c667807d87936c1695
MD5 c293363fb25bc2a79c4a45c94d840f7d
BLAKE2b-256 1034d7e8c9e9bd1e100fae64d7fa760926c842ad5a230750887f6b3a3f9e13c0

File details

Details for the file sonata_asr-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: sonata_asr-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 31.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for sonata_asr-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 799faafcf218b6414902af2e70e4c77a52148674d9b86872b6e9853477b10117
MD5 aee96ac5db14f3960e22c72964b4a0dc
BLAKE2b-256 886f0fe13912c0b84b9083d26ac95ee24cc755af9c9c12ad9788e34b79fc3e5c
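To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` module is enough. The dictionary below copies the published digests; the helper names are illustrative, and the path passed to `verify` should point to wherever the file was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "sonata_asr-0.0.2.tar.gz": "14a2f9896d7ede427ac44565b8664e5c3e0c438568d6f9c667807d87936c1695",
    "sonata_asr-0.0.2-py3-none-any.whl": "799faafcf218b6414902af2e70e4c77a52148674d9b86872b6e9853477b10117",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify("sonata_asr-0.0.2-py3-none-any.whl")` returns `True` only for an unmodified wheel. pip can also enforce these digests at install time through its hash-checking mode in a requirements file.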
