Computes phonetic representation of French words

phonetic-fr

A Soundex-Like Phonetic Algorithm in Python for the French Language

For multilanguage phonetic comparison of words, see https://github.com/gaspardpetit/phonetic_distance-py

Purpose

phonetic-fr implements a Soundex-like phonetic algorithm for comparing words by how they sound when pronounced in French. It is particularly useful for matching similar-sounding words whose spellings differ.

How to install

pip install phonetic-fr

Usage in shell

echo "Le ver vert glisse vers le verre" | phonetic_fr

Prints:

L VER VER GLIS VER L VER
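The same word-by-word encoding can be approximated from Python. The sketch below is only an illustration: `encode_sentence` is a hypothetical helper, not part of the package, and it assumes that splitting on whitespace is an acceptable way to tokenize the input. The encoder is passed in as a parameter, so any function mapping a word to a code can be used.

```python
from typing import Callable


def encode_sentence(sentence: str, encode: Callable[[str], str]) -> str:
    """Encode each whitespace-separated word and join the non-empty codes."""
    codes = (encode(word) for word in sentence.split())
    # Drop empty codes (for example, a token made only of punctuation).
    return " ".join(code for code in codes if code)
```

Passing `phonetic` from `phonetic_fr` as `encode` should produce output similar to the shell example, although the command-line tool's exact tokenization may differ from this sketch.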

Usage in Python

from phonetic_fr import phonetic

# Obtain phonetic representation of a word
example = "python"
result = phonetic(example)
print(f"{example} -> {result}")

Prints:

python -> PITON

Phonetic results can be used to compare similar-sounding words:

from phonetic_fr import phonetic

# Check whether two names sound alike
are_alike = phonetic("Gilles") == phonetic("Jill")
print(f"Gilles sounds like Jill: {are_alike}")

Prints:

Gilles sounds like Jill: True
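
Phonetic codes can also make an edit-distance comparison less sensitive to spelling:

from Levenshtein import distance
from phonetic_fr import phonetic

# Compare the Levenshtein distance on raw spellings and on phonetic codes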
word_a = "drapeau"
word_b = "crapaud"
raw_distance = distance(word_a, word_b)
print(f"Levenshtein distance of '{word_a}' and '{word_b}': {raw_distance}")
phonetic_distance = distance(phonetic(word_a), phonetic(word_b))
print(f"Phonetic Levenshtein distance of '{word_a}' and '{word_b}': {phonetic_distance}")

Prints:

Levenshtein distance of 'drapeau' and 'crapaud': 3
Phonetic Levenshtein distance of 'drapeau' and 'crapaud': 1
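Building on this idea, a query can be matched against a list of candidates by the edit distance between phonetic codes. The following is a minimal sketch under stated assumptions: `edit_distance` and `closest_match` are hypothetical helpers written for this example, the distance is a plain-Python Levenshtein implementation so the block has no third-party dependency, and the encoder is supplied by the caller.

```python
from typing import Callable, Iterable


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_match(query: str, candidates: Iterable[str],
                  encode: Callable[[str], str]) -> str:
    """Return the candidate whose encoded form is nearest to the encoded query.

    Raises ValueError if `candidates` is empty; ties go to the first candidate.
    """
    target = encode(query)
    return min(candidates, key=lambda word: edit_distance(target, encode(word)))
```

In practice `encode` would be `phonetic` from `phonetic_fr`, and `edit_distance` could be replaced with `Levenshtein.distance` as in the example above.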

Description

phonetic-fr is a phonetic algorithm for the French language, similar to the Soundex algorithm used for English. Here is a summary of its functionality:

  • Accent and Case Normalization: The function starts by normalizing accented characters to their unaccented counterparts and converting lowercase letters to uppercase.

  • Letter Filtering: It removes any characters that are not alphabetic letters from A to Z.

  • Pre-processing: The function applies a series of pre-processing rules to handle particular letter combinations and sequences, such as converting 'OO' to 'OU', handling silent letters, and adjusting for certain phonetic sounds. These rules are implemented using regular expressions.

  • Special Cases: The function returns hardcoded codes for certain words; for example, "TABAC" returns "TABA".

  • Main Phonetic Transformation: The main body of the function uses a series of regular expressions to transform the input string into its phonetic equivalent. This includes handling nasal sounds, silent letters, and specific letter combinations that change their pronunciation in certain contexts.

  • Post-processing: After the main transformations, the function performs additional post-processing to refine the phonetic code. This includes removing certain terminal letter sequences, further reducing letter repetitions, and other adjustments to align with French phonetics.

  • Terminations: The function applies final rules to the end of the phonetic code, such as trimming certain letters from the end of the word.

  • Output: The function returns a phonetic code representing the input string. If the resulting code is a single letter 'O', it is returned as is. For very short words that may have lost their distinctiveness during processing, the function may revert to earlier saved states of the input string to provide a more accurate phonetic code.
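The first stages of the pipeline described above can be sketched roughly as follows. This is an illustration only, not the library's code: `normalize`, `sketch_encode`, `PRE_RULES`, and `SPECIAL_CASES` are hypothetical names, and the tables contain just the two rules quoted in the description ('OO' to 'OU', and "TABAC" to "TABA"). The main transformation, post-processing, and termination rules are omitted.

```python
import re
import unicodedata

# Hypothetical rule tables for illustration. The real rule set is much larger
# and is not reproduced here; only the examples from the description appear.
PRE_RULES = [(re.compile(r"OO"), "OU")]
SPECIAL_CASES = {"TABAC": "TABA"}


def normalize(text: str) -> str:
    """Strip accents, convert to uppercase, and keep only the letters A-Z."""
    decomposed = unicodedata.normalize("NFD", text)
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^A-Z]", "", unaccented.upper())


def sketch_encode(word: str) -> str:
    """Apply normalization, the sample pre-processing rule, then special cases."""
    code = normalize(word)
    for pattern, replacement in PRE_RULES:
        code = pattern.sub(replacement, code)
    # Hardcoded answers take precedence over any further processing.
    return SPECIAL_CASES.get(code, code)
```

With only these two rules, `sketch_encode("zoo")` gives `ZOU` and `sketch_encode("tabac")` gives `TABA`. Ligatures such as œ have no Unicode decomposition, so this sketch simply drops them; a fuller version would need to map them explicitly.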

License

phonetic-fr is released under the MIT license. Feel free to use, modify, and distribute it according to the terms of the license.

Credits

Changelog

Changes relative to the original port are tracked in the Changelog.
