📦 emota_loader — Python Dataloader for EmoTa Dataset

EmoTa: A Tamil Emotional Speech Dataset (Thevakumar et al., CHiPSAL 2025) is the first open-access emotional speech corpus in Tamil, designed to capture the dialectal diversity of Sri Lankan Tamil speakers[^1].

| Statistic | Value |
|---|---|
| Utterances | 936 (from 22 speakers, 19 sentences, 5 emotions) |
| Speakers | 22 native Sri Lankan Tamil speakers (11 male, 11 female) |
| Sentences | 19 semantically neutral sentences |
| Emotions | angry, happy, sad, fear, neutral |
| Inter-annotator agreement | Fleiss’ Kappa = 0.74 |
| Baseline F1 scores | XGBoost: 0.91, Random Forest: 0.90 |

🔧 Installation

You can install the package from PyPI using:

pip install emota_loader

Make sure to download the EmoTa dataset separately and point the loader to its root directory.
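
Before pointing the loader at the dataset, it can help to confirm that the root directory exists and contains audio files. The following is a minimal sketch using only the standard library. The helper name `count_wav_files` is hypothetical and not part of `emota_loader`, and it assumes the recordings are stored as `.wav` files somewhere under the root folder.

```python
from pathlib import Path


def count_wav_files(root_dir: str) -> int:
    """Return the number of .wav files found anywhere under root_dir.

    Hypothetical helper, not part of emota_loader.
    """
    root = Path(root_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset root not found: {root}")
    # rglob searches recursively, so nested folder layouts are counted too.
    return sum(1 for _ in root.rglob("*.wav"))
```

If the download is complete and each utterance is stored as one `.wav` file, the count should match the 936 utterances listed in the statistics above.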


🚀 Sample Usage

from emota_loader import EmoTaDataset

# Point to the extracted dataset root folder; .samples holds the loaded samples
dataset = EmoTaDataset(root_dir="path/to/EmoTa").samples

print(f"Loaded {len(dataset)} samples")

sample = dataset[0]
print(f"  Audio Path      : {sample.audio_path}")
print(f"  Speaker ID      : {sample.speaker_id}")
print(f"  Speaker Gender  : {sample.speaker_gender}")
print(f"  Speaker Age     : {sample.speaker_age}")
print(f"  Speaker Region  : {sample.speaker_region}")
print(f"  Sentence ID     : {sample.sentence_id}")
print(f"  Transcript      : {sample.transcript}")
print(f"  Emotion         : {sample.emotion}")

Example Output

Loaded 936 samples

  Audio Path      : EmoTa/19_18_ang.wav
  Speaker ID      : 19
  Speaker Gender  : male
  Speaker Age     : 25
  Speaker Region  : northern
  Sentence ID     : 18
  Transcript      : நான் உன்னை சந்திக்க வேண்டும்.
  Emotion         : angry
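
The fields shown above are enough to summarise the corpus or to build a speaker-independent train/test split. The sketch below is illustrative only. `Sample` is a hypothetical stand-in that mirrors two of the attributes printed above (`speaker_id`, `emotion`) so the example runs without the dataset, and `emotion_counts` and `split_by_speaker` are not part of `emota_loader`. Speaker IDs are compared as strings because the loader's exact type for `speaker_id` is not stated here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for a loader sample (only two fields)."""
    speaker_id: str
    emotion: str


def emotion_counts(samples):
    """Count how many samples carry each emotion label."""
    return Counter(s.emotion for s in samples)


def split_by_speaker(samples, test_speakers):
    """Split samples so that no speaker appears in both train and test."""
    held_out = {str(sp) for sp in test_speakers}
    train = [s for s in samples if str(s.speaker_id) not in held_out]
    test = [s for s in samples if str(s.speaker_id) in held_out]
    return train, test


# Tiny made-up list for demonstration; not real EmoTa data.
demo = [
    Sample("19", "angry"),
    Sample("19", "sad"),
    Sample("3", "angry"),
]
print(emotion_counts(demo))
train, test = split_by_speaker(demo, test_speakers=["19"])
print(len(train), len(test))
```

With the real loader, the same functions could be applied to the list returned by `EmoTaDataset(...).samples`, provided its items expose `speaker_id` and `emotion` as in the sample usage above.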

📄 Citation

Please cite the dataset as:

@inproceedings{thevakumar-etal-2025-emota,
  title = "{E}mo{T}a: A {T}amil Emotional Speech Dataset",
  author = "Thevakumar, Jubeerathan and Thavarasa, Luxshan and Sivatheepan, Thanikan and Kugarajah, Sajeev and Thayasivam, Uthayasanker",
  booktitle = "Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)",
  year = "2025",
  pages = "193--201",
  address = "Abu Dhabi, UAE",
  publisher = "International Committee on Computational Linguistics"
}

📘 License

Academic use only — see the EmoTa dataset license for details.


[^1]: Thevakumar, J., Thavarasa, L., et al. (2025). EmoTa: A Tamil Emotional Speech Dataset. Proceedings of CHiPSAL 2025.
