A package to evaluate pronunciation difficulty of words in multiple languages
PronDifficulty
A rule-based system for evaluating pronunciation difficulty of words across multiple languages.
Features
- Supports English (en), Norwegian Bokmål (nb), Spanish (es), and Italian (it)
- Provides difficulty scores from 0 (easiest) to 1 (hardest)
- Analyzes multiple linguistic aspects:
  - Phoneme complexity (40%)
  - Maximum phoneme difficulty (20%)
  - Syllable structure (20%)
  - Phonotactic constraints (10%)
  - Prosodic patterns (5%)
  - Complex phoneme ratio (5%)
Installation
pip install pron-difficulty
Quick Start
from pron_difficulty import PronDifficulty
evaluator = PronDifficulty()
# Single word evaluation
score = evaluator.evaluate("difficult", "en")
print(f"Difficulty score: {score}") # Example: 0.723
# Batch evaluation
words = ["cat", "difficult", "antidisestablishmentarianism"]
scores = evaluator.evaluate_batch(words, "en")
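A common follow-up is ranking a word list from hardest to easiest. The sketch below assumes that `evaluate_batch` returns one score per word, in input order; that return shape is not documented here, so treat it as an assumption. `rank_by_difficulty` is a hypothetical helper, not part of the package, and the scores are placeholders rather than real output.

```python
from typing import Sequence


def rank_by_difficulty(
    words: Sequence[str], scores: Sequence[float]
) -> list[tuple[str, float]]:
    """Pair each word with its score and sort hardest first."""
    if len(words) != len(scores):
        raise ValueError("expected exactly one score per word")
    return sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)


words = ["cat", "difficult", "antidisestablishmentarianism"]
# Placeholder values standing in for evaluator.evaluate_batch(words, "en")
placeholder_scores = [0.1, 0.7, 0.9]

ranking = rank_by_difficulty(words, placeholder_scores)
for word, score in ranking:
    print(f"{word}: {score}")
```

With the real package installed, `placeholder_scores` would be replaced by the result of `evaluator.evaluate_batch(words, "en")`, provided the assumption about its return value holds.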
How It Works
The system analyzes pronunciation difficulty through several components:
1. Phoneme Complexity (40%)
- Evaluates individual sound difficulty
- Considers articulatory features (place, manner, voicing)
- Rates rare or complex phonemes higher (e.g., /θ/, /ð/ in English)
2. Maximum Phoneme Difficulty (20%)
- Tracks the hardest sound in the word
- Helps identify words with even one challenging phoneme
- Example: "th" in "think" increases difficulty even if the rest of the word is simple
3. Syllable Structure (20%)
- Analyzes consonant clusters (e.g., "str" in "string")
- Evaluates syllable patterns (CV, CVC, CCVC, etc.)
- More complex structures = higher scores
4. Phonotactic Constraints (10%)
- Checks if sound combinations follow language rules
- Penalizes unusual or forbidden sequences
- Example: "ng" (/ŋ/) at the start of a word, which does not occur in native English words
5. Prosodic Structure (5%)
- Examines rhythm and stress patterns
- Analyzes sonority profiles
- Considers length and complexity of prosodic units
6. Complex Phoneme Ratio (5%)
- Percentage of difficult phonemes in word
- Helps differentiate consistently hard words
- Affects longer words more significantly
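The percentages above add up to 100%, which suggests the six components are combined as a weighted sum. A minimal sketch under that assumption follows; the component names and the `combine_components` function are illustrative and not taken from the package, and each component is assumed to be normalized to the 0 to 1 range already.

```python
# Weights as listed in this README; the key names are hypothetical.
WEIGHTS = {
    "phoneme_complexity": 0.40,
    "max_phoneme_difficulty": 0.20,
    "syllable_structure": 0.20,
    "phonotactics": 0.10,
    "prosody": 0.05,
    "complex_phoneme_ratio": 0.05,
}


def combine_components(components: dict[str, float]) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# A word that is hard only on phoneme complexity contributes 40% of the range.
only_phonemes = {name: 0.0 for name in WEIGHTS}
only_phonemes["phoneme_complexity"] = 1.0
print(combine_components(only_phonemes))
```

This covers only the base score. Length scaling and the final adjustment described below are applied separately.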
Length Scaling
Words get additional difficulty points based on length:
- ≤3 phonemes: no extra points
- 4-6 phonemes: +0.2 per extra phoneme
- >6 phonemes: +0.1 per extra phoneme (max 0.7)
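The tiers can be read in more than one way. The sketch below shows one reading, under these assumptions: "extra" means each phoneme beyond the third, the last tier applies to words longer than 6 phonemes, and the 0.7 cap applies to the total bonus. `length_bonus` is a hypothetical name, not a function from the package.

```python
def length_bonus(n_phonemes: int) -> float:
    """Extra difficulty from word length, under one reading of the tiers."""
    if n_phonemes <= 3:
        return 0.0
    # Phonemes 4 through 6 each add 0.2 (assumption: counted beyond the third).
    middle_tier = min(n_phonemes, 6) - 3
    # Every phoneme past the sixth adds 0.1.
    long_tier = max(n_phonemes - 6, 0)
    # Assumption: the 0.7 maximum caps the total bonus.
    return min(0.2 * middle_tier + 0.1 * long_tier, 0.7)


for n in (3, 4, 6, 7, 12):
    print(n, round(length_bonus(n), 2))
```

Under this reading the cap is reached at 7 phonemes, so every longer word gets the same bonus.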
Final Score Adjustment
- Simple words (score < 0.4): Gentle sigmoid curve
- Complex words (score ≥ 0.4): Steeper sigmoid + exponential boost for very complex words
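This README does not give the curve parameters, so the following is only a sketch of what a two-regime adjustment could look like. Every constant here (the steepness values, the boost threshold, the boost rate) and the function names are hypothetical, chosen only so the two sigmoid branches meet at the 0.4 boundary.

```python
import math


def sigmoid(x: float, steepness: float, midpoint: float) -> float:
    """Logistic curve centered on `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def adjust_score(
    raw: float,
    threshold: float = 0.4,    # boundary between simple and complex words
    gentle: float = 4.0,       # hypothetical steepness below the threshold
    steep: float = 8.0,        # hypothetical steepness at or above it
    boost_start: float = 0.8,  # hypothetical start of "very complex" words
    boost_rate: float = 2.0,   # hypothetical exponential growth rate
) -> float:
    """Map a raw score onto [0, 1] using two sigmoid regimes."""
    if raw < threshold:
        adjusted = sigmoid(raw, gentle, threshold)
    else:
        adjusted = sigmoid(raw, steep, threshold)
        if raw > boost_start:
            # Exponential boost for very complex words.
            adjusted *= math.exp(boost_rate * (raw - boost_start))
    # Clamp so the result stays in the documented 0 to 1 range.
    return min(max(adjusted, 0.0), 1.0)


for raw in (0.1, 0.4, 0.7, 1.0):
    print(raw, round(adjust_score(raw), 3))
```

Centering both curves on the threshold keeps the mapping continuous and non-decreasing, so a higher raw score never produces a lower final score. The package's actual curves may differ.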
Language Support
Each supported language has:
- Custom phoneme difficulty ratings
- Language-specific syllable patterns
- Tailored phonotactic rules
- Adjusted prosodic analysis
Examples with Explanations
# English examples with typical scores:
"cat" -> 0.123 # Simple CVC structure, common phonemes
"string" -> 0.567 # Complex onset cluster, but common in English
"rhythm" -> 0.789 # Unusual consonant patterns, no vowel between 'th' and 'm'
# Norwegian examples:
"hei" -> 0.234 # Simple structure, common sounds
"skjønnhet" -> 0.678 # Complex consonant cluster, front rounded vowel
# Spanish examples:
"casa" -> 0.123 # Simple CV-CV structure
"desarrollo" -> 0.456 # Longer but regular structure
# Italian examples:
"ciao" -> 0.234 # Simple structure despite diphthong
"struggere" -> 0.567 # Complex consonant cluster
Contributing
Contributions welcome! Areas for improvement:
- Additional language support
- Refined difficulty metrics
- Enhanced prosodic analysis
- Performance optimizations
License
This project is licensed under the European Union Public Licence (EUPL) v. 1.2. See the LICENSE file for details.