
Computes the phonetic representation of French words



phonetic-fr

A Soundex-Like Phonetic Algorithm in Python for the French Language

For multilanguage phonetic comparison of words, see https://github.com/gaspardpetit/phonetic_distance-py

Purpose

phonetic-fr implements a Soundex-like phonetic algorithm that compares words by how they sound when pronounced in French. It is particularly useful for matching words that sound alike but are spelled differently, such as names with variant spellings.
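Because two words that sound alike share the same code, phonetic codes work well as dictionary keys for matching spelling variants. The sketch below groups a word list by any key function. `group_by_sound` is a hypothetical helper, not part of phonetic-fr; it takes the key as a parameter so it runs without the package installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_sound(words: Iterable[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group words that share the same key (hypothetical helper, not part of phonetic-fr)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        # Words whose keys are equal end up in the same bucket.
        groups[key(word)].append(word)
    return dict(groups)
```

With phonetic-fr installed, passing its `phonetic` function as the key, e.g. `group_by_sound(["Gilles", "Jill", "Paul"], phonetic)`, should place "Gilles" and "Jill" in the same group, since the usage example below shows that they share a phonetic code.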

How to install

pip install phonetic-fr

Usage in shell

echo "Le ver vert glisse vers le verre" | phonetic_fr

Prints:

L VER VER GLIS VER L VER

Usage in Python

from phonetic_fr import phonetic

# Obtain phonetic representation of a word
example = "python"
result = phonetic(example)
print(f"{example} -> {result}")

Prints:

python -> PITON

Phonetic results can be used to compare similar-sounding words:

from phonetic_fr import phonetic

# Check whether two names sound alike
are_alike = phonetic("Gilles") == phonetic("Jill")
print(f"Gilles sounds like Jill: {are_alike}")

Prints:

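Gilles sounds like Jill: True

Phonetic codes can also make edit distances more meaningful. This example uses the separate Levenshtein package (pip install Levenshtein):

from Levenshtein import distance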
from phonetic_fr import phonetic

# Compare the raw and phonetic Levenshtein distances
word_a = "drapeau"
word_b = "crapaud"
raw_distance = distance(word_a, word_b)
print(f"Levenshtein distance of '{word_a}' and '{word_b}': {raw_distance}")
phonetic_distance = distance(phonetic(word_a), phonetic(word_b))
print(f"Phonetic Levenshtein distance of '{word_a}' and '{word_b}': {phonetic_distance}")

Prints:

Levenshtein distance of 'drapeau' and 'crapaud': 3
Phonetic Levenshtein distance of 'drapeau' and 'crapaud': 1
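If you would rather not depend on the Levenshtein package, a plain-Python edit distance can be used instead. The function below is a standard dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1); the name `levenshtein` is chosen here for illustration and is not part of phonetic-fr.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    # Keep only the previous row of the DP table, so memory is O(len(b)).
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        previous = current
    return previous[-1]


print(levenshtein("drapeau", "crapaud"))  # 3
```

Swapping `distance` for `levenshtein` in the example above should give the same results, since both compute the standard Levenshtein distance.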

Description

phonetic-fr is a phonetic algorithm for the French language, similar to the Soundex algorithm used for English. Here is a summary of its functionality:

  • Accent and Case Normalization: The function starts by normalizing accented characters to their unaccented counterparts and converting lowercase letters to uppercase.

  • Letter Filtering: It removes any characters that are not alphabetic letters from A to Z.

  • Pre-processing: The function applies a series of specific pre-processing rules to handle particular letter combinations and sequences, such as converting 'OO' to 'OU', handling silent letters, and adjusting for certain phonetic sounds. These rules are implemented using regular expressions.

  • Special Cases: The function has hard-coded results for certain words, such as "TABAC", which returns "TABA", so that these words always receive a specific phonetic code.

  • Main Phonetic Transformation: The main body of the function uses a series of regular expressions to transform the input string into its phonetic equivalent. This includes handling nasal sounds, silent letters, and specific letter combinations that change their pronunciation in certain contexts.

  • Post-processing: After the main transformations, the function performs additional post-processing to refine the phonetic code. This includes removing certain terminal letter sequences, further reducing letter repetitions, and other adjustments to align with French phonetics.

  • Terminations: The function applies final rules to the end of the phonetic code, such as trimming certain letters from the end of the word.

  • Output: The function returns a phonetic code representing the input string, with a maximum length of 16 characters. If the resulting code is a single letter 'O', it is returned as is. For very short words that may have lost their distinctiveness during processing, the function may revert to earlier saved states of the input string to provide a more accurate phonetic code.
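The overall shape of this pipeline can be sketched as follows. This is a heavily simplified, hypothetical illustration, not phonetic-fr's implementation: it performs accent and case normalization, letter filtering, one pre-processing rule ('OO' to 'OU') and one special case ("TABAC" to "TABA") taken from the description above, then truncates to 16 characters. The main transformation, post-processing, termination rules and the fallback to earlier states are omitted, so its output will generally differ from `phonetic`.

```python
import re
import unicodedata

# Hypothetical sketch: these few rules come from the description above and are
# NOT phonetic-fr's actual rule set.
PRE_RULES = [
    (re.compile(r"OO"), "OU"),  # pre-processing example: 'OO' is treated like 'OU'
]
SPECIAL_CASES = {"TABAC": "TABA"}  # hard-coded example mentioned in the description
MAX_LENGTH = 16


def normalize(word: str) -> str:
    """Strip accents, convert to uppercase and keep only the letters A-Z."""
    # NFD splits accented letters into base letter + combining mark.
    # Ligatures such as 'œ' do not decompose and are simply dropped here.
    decomposed = unicodedata.normalize("NFD", word)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Z]", "", without_marks.upper())


def sketch_phonetic(word: str) -> str:
    code = normalize(word)
    for pattern, replacement in PRE_RULES:
        code = pattern.sub(replacement, code)
    if code in SPECIAL_CASES:
        return SPECIAL_CASES[code]
    # The main transformation, post-processing and termination rules would go here.
    return code[:MAX_LENGTH]
```

For example, `sketch_phonetic("Zoo")` returns `ZOU` and `sketch_phonetic("tabac")` returns the special-cased `TABA`.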

License

phonetic-fr is released under the MIT license. Feel free to use, modify, and distribute it according to the terms of the license.

Credits

Changelog

Changes relative to the original port are tracked in the Changelog.

