This package replaces accented characters with their corresponding non-accented ASCII characters.

Project description

Data often includes characters with accents or other diacritical marks, collectively called diacritics. When working with such data, you often need to substitute these accented characters with their non-accented ASCII equivalents.
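To make the substitution concrete, here is a minimal sketch of one common way to do it by hand with Python's standard library. It is not this package's implementation, and the helper name strip_accents is an assumption made for illustration. The idea is to decompose each character into a base letter plus combining marks, then drop the marks.

```python
import unicodedata

def strip_accents(text):
    """Illustrative helper: remove combining marks after NFKD decomposition."""
    # NFKD splits "è" into "e" followed by a combining grave accent
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("crème de la crème"))  # creme de la creme
```

Letters that have no decomposed form, such as "ø" or "ß", pass through this sketch unchanged, so it does not guarantee pure ASCII output on its own.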

This Python package simplifies the process of replacing accented characters with their non-accented ASCII equivalents.

The package works in plain Python and can also be used from PySpark and Spark SQL by registering its function as a UDF.

The package can be installed from the PyPI repository with the following command:

pip install replace_accents

The examples below cover plain Python, PySpark, and Spark SQL.

1. Python example

# Import the replace accents function
from replace_accents import replace_accents_characters

# Use the function to replace accent characters
replace_accents_characters("crème de la crème")

2. PySpark example

# Import the replace accents function
from replace_accents import replace_accents_characters

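# Import the PySpark col function
from pyspark.sql.functions import col

# Register the Python function as both a PySpark UDF and a Spark SQL UDF
# (assumes `spark` is an existing SparkSession, as in a notebook environment)
replace_accents_characters_pyspark_udf = spark.udf.register("replace_accents_characters_sparksql_udf", replace_accents_characters)

# Create a PySpark DataFrame ("table_name" is a placeholder)
df = spark.table("table_name")

# Apply the PySpark UDF to the DataFrame
# (display() is a notebook helper, e.g. in Databricks; elsewhere, call .show() on the DataFrame instead)
display(df.select("col1", replace_accents_characters_pyspark_udf(col("col1")).alias("replaced_col1")))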
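Spark passes None to a Python UDF for NULL column values, and this page does not say how replace_accents_characters treats None. As a hedged sketch, a small wrapper can let NULLs pass through untouched. The names null_safe and shout are hypothetical, and shout only stands in for the package function so the sketch runs without the package or Spark installed.

```python
from functools import wraps

def null_safe(func):
    """Wrap a one-argument function so that None is returned unchanged."""
    @wraps(func)
    def wrapper(value):
        if value is None:
            return None
        return func(value)
    return wrapper

# Stand-in for replace_accents_characters (hypothetical, for illustration only)
def shout(text):
    return text.upper()

safe_shout = null_safe(shout)
print(safe_shout("abc"))  # ABC
print(safe_shout(None))   # None
```

Under the same assumption, null_safe(replace_accents_characters) could be passed to spark.udf.register in place of the unwrapped function.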

3. Spark SQL example

# Import the replace accents function
from replace_accents import replace_accents_characters

# Register the Python function as both a PySpark UDF and a Spark SQL UDF
# (assumes `spark` is an existing SparkSession)
replace_accents_characters_pyspark_udf = spark.udf.register("replace_accents_characters_sparksql_udf", replace_accents_characters)

# Use the Spark SQL UDF in a SQL query ("table_name" is a placeholder)
spark.sql("select col1, replace_accents_characters_sparksql_udf(col1) as replaced_col1 from table_name")

You can get more information about this package here

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

replace_accents-0.0.5.tar.gz (3.3 kB view details)

Uploaded Source

Built Distribution

replace_accents-0.0.5-py3-none-any.whl (3.8 kB view details)

Uploaded Python 3

File details

Details for the file replace_accents-0.0.5.tar.gz.

File metadata

  • Download URL: replace_accents-0.0.5.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for replace_accents-0.0.5.tar.gz
Algorithm Hash digest
SHA256 c96b1c8c5d996e7a7b28975407c17710b5070bc256a1bfc83a6226aa0fd86979
MD5 0d2847ec1ed5db28b4fbf54dc5596e83
BLAKE2b-256 b2e11ea7a63e0609b3127e1258f9d5b9a4a4045f22e2b4e2998beee1b7f4ab49

See more details on using hashes here.
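A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_expected are assumptions made for illustration, as is the local path where the file was saved.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# SHA256 digest listed above for replace_accents-0.0.5.tar.gz
EXPECTED_SHA256 = "c96b1c8c5d996e7a7b28975407c17710b5070bc256a1bfc83a6226aa0fd86979"

def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected one, ignoring letter case."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("replace_accents-0.0.5.tar.gz") should return True if the source distribution was saved under that name in the current directory and arrived intact.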

File details

Details for the file replace_accents-0.0.5-py3-none-any.whl.

File hashes

Hashes for replace_accents-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 59ad12519d964cdbfd517d0e91ccb1752692f853626ab5362e106724b210c5ca
MD5 f3f1189da0fff8aec992e689f529d648
BLAKE2b-256 08c424ac5d7594dc13b1e062cec4c83f1ca03f6bd3a5ecf7cfb42bf76d72ecde
