📦 emota_loader — Python Dataloader for EmoTa Dataset
EmoTa: A Tamil Emotional Speech Dataset (Thevakumar et al., CHiPSAL 2025) is the first open-access emotional speech corpus in Tamil, designed to capture the dialectal diversity of Sri Lankan Tamil speakers[^1].
| Statistic | Value |
|---|---|
| Utterances | 936 (from 22 speakers, 19 sentences, and 5 emotions) |
| Speakers | 22 native Sri Lankan Tamil speakers (11 male, 11 female) |
| Sentences | 19 semantically neutral sentences |
| Emotions | angry, happy, sad, fear, neutral |
| Inter-annotator Agreement | Fleiss’ Kappa = 0.74 |
| Baseline F1 Scores | XGBoost: 0.91, Random Forest: 0.90 |
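The utterance count is worth a quick sanity check. A full cross of the 22 speakers, 19 sentences, and 5 emotions listed in the table would give 2,090 recordings, so the 936 utterances cover a little under half of the possible combinations. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of `emota_loader`.

```python
# Dimensions and utterance count as reported in the statistics table.
speakers, sentences, emotions = 22, 19, 5
reported_utterances = 936

# Size of the full speaker x sentence x emotion grid.
full_grid = speakers * sentences * emotions

# Fraction of that grid actually covered by the reported utterances.
coverage = reported_utterances / full_grid

print(f"Full grid: {full_grid} combinations")
print(f"Reported : {reported_utterances} utterances ({coverage:.1%} of the grid)")
```

Downstream code should therefore not assume that every speaker has a recording for every sentence and emotion.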
🔧 Installation
You can install the package from PyPI using:
pip install emota_loader
The package does not include the audio files. Clone or download the EmoTa dataset separately and pass its root directory to the loader as `root_dir`.
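Before constructing the loader, it can help to confirm that the path really points at the downloaded dataset. The sketch below uses only the standard library. `count_wav_files` is a hypothetical helper, not part of `emota_loader`, and it assumes the recordings are `.wav` files somewhere under the root directory, as the `EmoTa/19_18_ang.wav` path in the example output suggests.

```python
from pathlib import Path


def count_wav_files(root_dir: str) -> int:
    """Count .wav files under root_dir (hypothetical helper, not part of emota_loader)."""
    root = Path(root_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset root not found: {root}")
    # Assumption: recordings are .wav files, possibly nested in subfolders.
    return sum(1 for _ in root.rglob("*.wav"))


if __name__ == "__main__":
    try:
        n = count_wav_files("path/to/EmoTa")
        print(f"Found {n} .wav files")
    except FileNotFoundError as err:
        print(err)
```

If each utterance is stored as one file, a complete download should yield 936 files.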
🚀 Sample Usage
from emota_loader import EmoTaDataset
dataset = EmoTaDataset(root_dir="path/to/EmoTa").samples
print(f"Loaded {len(dataset)} samples")
sample = dataset[0]
print(f" Audio Path : {sample.audio_path}")
print(f" Speaker ID : {sample.speaker_id}")
print(f" Speaker Gender : {sample.speaker_gender}")
print(f" Speaker Age : {sample.speaker_age}")
print(f" Speaker Region : {sample.speaker_region}")
print(f" Sentence ID : {sample.sentence_id}")
print(f" Transcript : {sample.transcript}")
print(f" Emotion : {sample.emotion}")
Example Output
Loaded 936 samples
Audio Path : EmoTa/19_18_ang.wav
Speaker ID : 19
Speaker Gender : male
Speaker Age : 25
Speaker Region : northern
Sentence ID : 18
Transcript : நான் உன்னை சந்திக்க வேண்டும்.
Emotion : angry
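The example output suggests that file names encode the metadata as `<speaker>_<sentence>_<emotion code>.wav`: `19_18_ang.wav` lines up with speaker 19, sentence 18, and the emotion `angry`. The sketch below parses names under that assumption. `parse_emota_filename`, `ParsedName`, and `KNOWN_EMOTION_CODES` are hypothetical names, not part of `emota_loader`. Only the `ang` code is confirmed by the output above, so any other code is returned unchanged rather than guessed, and IDs are parsed as integers purely for illustration.

```python
from pathlib import Path
from typing import NamedTuple

# Only "ang" -> "angry" is confirmed by the example output.
# Other emotion codes are unknown here and are passed through unchanged.
KNOWN_EMOTION_CODES = {"ang": "angry"}


class ParsedName(NamedTuple):
    speaker_id: int
    sentence_id: int
    emotion: str


def parse_emota_filename(path: str) -> ParsedName:
    """Split '<speaker>_<sentence>_<code>.wav' into its parts (assumed naming scheme)."""
    stem = Path(path).stem  # e.g. "19_18_ang"
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"Unexpected file name format: {path!r}")
    speaker, sentence, code = parts
    # int() raises ValueError if the ID fields are not numeric.
    return ParsedName(int(speaker), int(sentence), KNOWN_EMOTION_CODES.get(code, code))


if __name__ == "__main__":
    print(parse_emota_filename("EmoTa/19_18_ang.wav"))
```

When the loader is available, prefer the attributes it exposes (`speaker_id`, `sentence_id`, `emotion`). A parser like this is mainly useful for cross-checking files on disk.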
📄 Citation
Please cite the dataset as:
@inproceedings{thevakumar-etal-2025-emota,
title = "{E}mo{T}a: A {T}amil Emotional Speech Dataset",
author = "Thevakumar, Jubeerathan and Thavarasa, Luxshan and Sivatheepan, Thanikan and Kugarajah, Sajeev and Thayasivam, Uthayasanker",
booktitle = "Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)",
year = "2025",
pages = "193--201",
address = "Abu Dhabi, UAE",
publisher = "International Committee on Computational Linguistics"
}
📘 License
Academic use only — see the EmoTa dataset license for details.
[^1]: Thevakumar, J., Thavarasa, L., et al. (2025). EmoTa: A Tamil Emotional Speech Dataset. Proceedings of CHiPSAL 2025.