Computes phonetic representation of French words

phonetic-fr

A Soundex-Like Phonetic Algorithm in Python for the French Language

For multilanguage phonetic comparison of words, see https://github.com/gaspardpetit/phonetic_distance-py

Purpose

phonetic-fr implements a Soundex-like phonetic algorithm for comparing words by how they sound when pronounced in French. It is particularly useful for matching similar-sounding words whose spellings differ.

How to install

pip install phonetic-fr

Usage in shell

echo "Le ver vert glisse vers le verre" | phonetic_fr

Prints:

L VER VER GLIS VER L VER

Usage in Python

from phonetic_fr import phonetic

# Obtain phonetic representation of a word
example = "python"
result = phonetic(example)
print(f"{example} -> {result}")

Prints

python -> PITON

Phonetic results can be used to compare similar sounding words:

from phonetic_fr import phonetic

# Compare two names that sound alike
are_alike = phonetic("Gilles") == phonetic("Jill")
print(f"Gilles sounds like Jill: {are_alike}")

Prints

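Gilles sounds like Jill: True

Phonetic codes can also be combined with an edit distance such as Levenshtein's to compare words that sound similar but not identical:

from Levenshtein import distance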
from phonetic_fr import phonetic

# Improve Levenshtein's distance
word_a = "drapeau"
word_b = "crapaud"
raw_distance = distance(word_a, word_b)
print(f"Levenshtein distance of '{word_a}' and '{word_b}': {raw_distance}")
phonetic_distance = distance(phonetic(word_a), phonetic(word_b))
print(f"Phonetic Levenshtein distance of '{word_a}' and '{word_b}': {phonetic_distance}")

Prints

Levenshtein distance of 'drapeau' and 'crapaud': 3
Phonetic Levenshtein distance of 'drapeau' and 'crapaud': 1
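
The equality comparison shown earlier extends to grouping a whole list of words by their phonetic code. The following is a minimal sketch rather than part of the package: group_by_sound is a hypothetical helper name, and the encoder is passed in as a parameter so that any function mapping a word to a code (such as phonetic from phonetic_fr) can be used.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_sound(
    words: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group words whose phonetic codes are identical.

    `encode` is any callable that maps a word to its phonetic code,
    for example `phonetic` imported from `phonetic_fr`.
    This helper is hypothetical and not provided by the package.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        # Words sharing a code land in the same bucket, in input order
        groups[encode(word)].append(word)
    return dict(groups)
```

With the package installed, calling group_by_sound(["ver", "vert", "vers", "verre", "glisse"], phonetic) should, judging from the shell output above, place the first four words under VER and "glisse" under GLIS.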

Description

phonetic-fr is a phonetic algorithm for the French language, similar to the Soundex algorithm used for English. Here is a summary of its functionality:

  • Accent and Case Normalization: The function starts by normalizing accented characters to their unaccented counterparts and converting lowercase letters to uppercase.

  • Letter Filtering: It removes any characters that are not alphabetic letters from A to Z.

  • Pre-processing: The script applies a series of specific pre-processing rules to handle particular letter combinations and sequences, such as converting 'OO' to 'OU', handling silent letters, and adjusting for certain phonetic sounds. These rules are implemented using regular expressions.

  • Special Cases: The function returns hardcoded codes for certain words (for example, "TABAC" yields "TABA") so that they receive the intended phonetic code.

  • Main Phonetic Transformation: The main body of the function uses a series of regular expressions to transform the input string into its phonetic equivalent. This includes handling nasal sounds, silent letters, and specific letter combinations that change their pronunciation in certain contexts.

  • Post-processing: After the main transformations, the function performs additional post-processing to refine the phonetic code. This includes removing certain terminal letter sequences, further reducing letter repetitions, and other adjustments to align with French phonetics.

  • Terminations: The function applies final rules to the end of the phonetic code, such as trimming certain letters from the end of the word.

  • Output: The function returns a phonetic code representing the input string, with a maximum length of 16 characters. If the resulting code is a single letter 'O', it is returned as is. For very short words that may have lost their distinctiveness during processing, the function may revert to earlier saved states of the input string to provide a more accurate phonetic code.
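
To make the first few steps concrete, here is a rough sketch of accent and case normalization, letter filtering, the 'OO' to 'OU' rule, repetition reduction, and the 16-character limit. It uses only the Python standard library and is not the package's actual implementation: the function names normalize and toy_code are hypothetical, and the real algorithm applies many more rules, possibly in a different order.

```python
import re
import unicodedata

MAX_CODE_LENGTH = 16  # maximum code length stated in the description


def normalize(text: str) -> str:
    """Strip accents, uppercase, and keep only the letters A-Z."""
    # Decompose accented characters into base letter + combining mark
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, leaving the unaccented base letters
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Uppercase, then remove everything that is not A-Z.
    # Note: ligatures such as "œ" do not decompose and are dropped here.
    return re.sub(r"[^A-Z]", "", unaccented.upper())


def toy_code(text: str) -> str:
    """Illustrative subset of the pipeline (hypothetical, incomplete)."""
    code = normalize(text)
    # One of the pre-processing rules named above: 'OO' becomes 'OU'
    code = code.replace("OO", "OU")
    # Collapse runs of the same letter into a single letter
    code = re.sub(r"(.)\1+", r"\1", code)
    # Enforce the maximum code length
    return code[:MAX_CODE_LENGTH]
```

For instance, normalize("Élève") yields "ELEVE". The codes produced by toy_code will generally differ from those returned by phonetic, since the nasal-sound, silent-letter, and termination rules are omitted.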

License

phonetic-fr is released under the MIT license. Feel free to use, modify, and distribute it according to the terms of the license.

Credits

Changelog

Changes since the original port are tracked in the Changelog.
