
A package to evaluate pronunciation difficulty of words in multiple languages


PronDifficulty

A rule-based system for evaluating pronunciation difficulty of words across multiple languages.

Features

  • Supports English (en), Norwegian Bokmål (nb), Spanish (es), and Italian (it)
  • Provides difficulty scores from 0 (easiest) to 1 (hardest)
  • Analyzes multiple linguistic aspects:
    • Phoneme complexity (40%)
    • Maximum phoneme difficulty (20%)
    • Syllable structure (20%)
    • Phonotactic constraints (10%)
    • Prosodic patterns (5%)
    • Complex phoneme ratio (5%)

Installation

pip install pron-difficulty

Quick Start

from pron_difficulty import PronDifficulty

evaluator = PronDifficulty()

# Single word evaluation
score = evaluator.evaluate("difficult", "en")
print(f"Difficulty score: {score}")  # Example: 0.723

# Batch evaluation
words = ["cat", "difficult", "antidisestablishmentarianism"]
scores = evaluator.evaluate_batch(words, "en")
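
As a follow-on, batch scores can be used to rank a word list from hardest to easiest. The sketch below is not part of the package: `rank_by_difficulty` is a hypothetical helper, and it assumes that `evaluate_batch` returns one score per word in the same order as the input, which the Quick Start does not state explicitly.

```python
from typing import Iterable, List, Tuple


def rank_by_difficulty(evaluator, words: Iterable[str], lang: str) -> List[Tuple[str, float]]:
    """Return (word, score) pairs sorted from hardest to easiest.

    Hypothetical helper. Assumes evaluator.evaluate_batch(words, lang)
    returns one score per word, in input order.
    """
    word_list = list(words)
    scores = evaluator.evaluate_batch(word_list, lang)
    # Pair each word with its score, then sort by score, highest first.
    return sorted(zip(word_list, scores), key=lambda pair: pair[1], reverse=True)


# Intended usage with the real package (not executed here):
#   from pron_difficulty import PronDifficulty
#   ranking = rank_by_difficulty(PronDifficulty(), ["cat", "string", "rhythm"], "en")
```

Because the helper only relies on an `evaluate_batch` method, it can be exercised with a stand-in object in tests.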

How It Works

The system analyzes pronunciation difficulty through several components:

1. Phoneme Complexity (40%)

  • Evaluates individual sound difficulty
  • Considers articulatory features (place, manner, voicing)
  • Rates rare or complex phonemes higher (e.g., /θ/, /ð/ in English)

2. Maximum Phoneme Difficulty (20%)

  • Tracks the hardest sound in the word
  • Helps identify words with even one challenging phoneme
  • Example: "th" in "think" increases difficulty even if the rest of the word is simple

3. Syllable Structure (20%)

  • Analyzes consonant clusters (e.g., "str" in "string")
  • Evaluates syllable patterns (CV, CVC, CCVC, etc.)
  • More complex structures = higher scores
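
To illustrate the syllable patterns listed above, here is a minimal sketch that maps a phoneme sequence to a C/V skeleton and measures its longest consonant cluster. The vowel inventory, the function names, and the input format (a list of phoneme symbols) are all assumptions made for illustration; the package's own implementation is not shown in this description.

```python
from typing import List

# Assumed, deliberately small vowel inventory (illustration only).
VOWELS = {"a", "e", "i", "o", "u", "ɪ", "ɛ", "æ", "ʌ", "ə", "ɔ", "ʊ"}


def cv_skeleton(phonemes: List[str]) -> str:
    """Map each phoneme to 'V' (vowel) or 'C' (anything else)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def longest_cluster(skeleton: str) -> int:
    """Length of the longest run of consecutive consonants."""
    longest = current = 0
    for symbol in skeleton:
        current = current + 1 if symbol == "C" else 0
        longest = max(longest, current)
    return longest


print(cv_skeleton(["s", "t", "r", "ɪ", "ŋ"]))  # CCCVC
print(longest_cluster("CCCVC"))                # 3
```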

4. Phonotactic Constraints (10%)

  • Checks if sound combinations follow language rules
  • Penalizes unusual or forbidden sequences
  • Example: /ŋ/ ("ng") at the start of a word, which does not occur in native English words
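
A phonotactic check of this kind might look like the sketch below, which flags words that begin with a sequence the language does not allow in onset position. The rule table, its contents, and the function name are hypothetical; the package's actual rules are not listed in this description.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical rule table: onset sequences assumed to be disallowed per language.
FORBIDDEN_ONSETS: Dict[str, Tuple[Tuple[str, ...], ...]] = {
    "en": (("ŋ",), ("t", "l"), ("d", "l")),
}


def has_forbidden_onset(phonemes: Sequence[str], lang: str) -> bool:
    """True if the word starts with a sequence listed for this language."""
    prefixes = FORBIDDEN_ONSETS.get(lang, ())
    return any(tuple(phonemes[: len(p)]) == p for p in prefixes)


print(has_forbidden_onset(["ŋ", "a"], "en"))       # True
print(has_forbidden_onset(["s", "ɪ", "ŋ"], "en"))  # False
```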

5. Prosodic Structure (5%)

  • Examines rhythm and stress patterns
  • Analyzes sonority profiles
  • Considers length and complexity of prosodic units

6. Complex Phoneme Ratio (5%)

  • Percentage of difficult phonemes in word
  • Helps differentiate consistently hard words
  • Affects longer words more significantly
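
The six weights above sum to 100%, so the base score can be read as a weighted average of the component scores. The sketch below shows that combination under the assumption that each component is already scaled to the range 0 to 1; the dictionary keys and the `combine` function are illustrative names, not the package's API.

```python
from typing import Dict

# Weights taken from the component list above; key names are assumed.
WEIGHTS: Dict[str, float] = {
    "phoneme_complexity": 0.40,
    "max_phoneme_difficulty": 0.20,
    "syllable_structure": 0.20,
    "phonotactics": 0.10,
    "prosody": 0.05,
    "complex_phoneme_ratio": 0.05,
}


def combine(components: Dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


example = {
    "phoneme_complexity": 0.6,
    "max_phoneme_difficulty": 0.9,
    "syllable_structure": 0.3,
    "phonotactics": 0.1,
    "prosody": 0.2,
    "complex_phoneme_ratio": 0.2,
}
print(round(combine(example), 3))  # 0.51
```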

Length Scaling

Words get additional difficulty points based on length:

  • ≤3 phonemes: no extra points
  • 4-6 phonemes: +0.2 per extra phoneme
  • >6 phonemes: +0.1 per extra phoneme (max 0.7)
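
One possible reading of these tiers, written as a function. It assumes that "extra" counts phonemes beyond the third in the middle tier, that the last tier applies to words longer than six phonemes and adds to the 0.6 reached at six, and that 0.7 caps the total bonus. `length_bonus` is a hypothetical name and the package may compute this differently.

```python
def length_bonus(n_phonemes: int) -> float:
    """Length-based bonus under the assumptions stated above."""
    if n_phonemes <= 3:
        return 0.0
    if n_phonemes <= 6:
        # +0.2 for each phoneme beyond the third.
        return 0.2 * (n_phonemes - 3)
    # +0.1 for each phoneme beyond the sixth, capped at 0.7 overall.
    return min(0.7, 0.6 + 0.1 * (n_phonemes - 6))


for n in (3, 5, 7, 12):
    print(n, round(length_bonus(n), 2))
# 3 0.0
# 5 0.4
# 7 0.7
# 12 0.7
```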

Final Score Adjustment

  • Simple words (score < 0.4): Gentle sigmoid curve
  • Complex words (score ≥ 0.4): Steeper sigmoid + exponential boost for very complex words
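
The shape of such a two-regime adjustment can be sketched as follows. Only the 0.4 threshold comes from the description above; the steepness values, the shared midpoint, the 0.8 cutoff for "very complex" words, and the boost factor are placeholder assumptions chosen so that the curve stays continuous and non-decreasing. The package's real constants are not documented here.

```python
import math


def sigmoid(x: float, steepness: float, midpoint: float) -> float:
    """Standard logistic curve centred on `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def adjust(raw: float) -> float:
    """Piecewise adjustment; all constants except 0.4 are assumed."""
    if raw < 0.4:
        # Gentle curve for simple words (assumed steepness).
        adjusted = sigmoid(raw, steepness=4.0, midpoint=0.4)
    else:
        # Steeper curve for complex words (assumed steepness).
        adjusted = sigmoid(raw, steepness=8.0, midpoint=0.4)
        if raw > 0.8:
            # Small exponential boost for very complex words (assumed cutoff).
            adjusted += 0.05 * (math.exp(raw - 0.8) - 1.0)
    # Keep the result inside the documented 0..1 range.
    return min(1.0, max(0.0, adjusted))


for raw in (0.1, 0.4, 0.9):
    print(raw, round(adjust(raw), 3))
```

Sharing one midpoint between the two branches keeps both curves equal at the threshold, so a word scoring just under 0.4 never ends up above a word scoring just over it.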

Language Support

Each supported language has:

  • Custom phoneme difficulty ratings
  • Language-specific syllable patterns
  • Tailored phonotactic rules
  • Adjusted prosodic analysis

Examples with Explanations

# English examples with typical scores:
"cat" -> 0.123         # Simple CV-C structure, common phonemes
"string" -> 0.567      # Complex onset cluster, but common in English
"rhythm" -> 0.789      # Unusual consonant patterns, no vowel between 'th' and 'm'

# Norwegian examples:
"hei" -> 0.234        # Simple structure, common sounds
"skjønnhet" -> 0.678  # Complex consonant cluster, front rounded vowel

# Spanish examples:
"casa" -> 0.123       # Simple CV-CV structure
"desarrollo" -> 0.456 # Longer but regular structure

# Italian examples:
"ciao" -> 0.234       # Simple structure despite diphthong
"struggere" -> 0.567  # Complex consonant cluster

Contributing

Contributions welcome! Areas for improvement:

  • Additional language support
  • Refined difficulty metrics
  • Enhanced prosodic analysis
  • Performance optimizations

License

This project is licensed under the European Union Public Licence (EUPL) v. 1.2. See the LICENSE file for details.
