This package replaces accented characters with their corresponding non-accented ASCII characters.
Project description
Data often contains characters with accents or other diacritical marks, collectively called diacritics. When working with such data, you often need to substitute these accented characters with their non-accented ASCII equivalents.
This Python package simplifies that substitution.
It works in plain Python, and the same function can be registered as a UDF for use with PySpark and Spark SQL.
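To make the idea concrete, here is a minimal sketch of one common way to strip diacritics using only Python's standard library. It is not the package's implementation, which this page does not describe, and the name `strip_accents_sketch` is hypothetical. The sketch decomposes each character with Unicode NFKD normalization and then drops the combining marks.

```python
import unicodedata


def strip_accents_sketch(text: str) -> str:
    """Illustrative only: remove combining marks after NFKD decomposition."""
    # NFKD splits "è" into the base letter "e" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents_sketch("crème de la crème"))  # creme de la creme
```

Characters that have no Unicode decomposition, such as "ø" or "ß", pass through this sketch unchanged, which is one reason a dedicated mapping can be preferable to normalization alone.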
The package can be installed from the PyPI repository with the following command:
pip install replace_accents
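After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version with the standard library. This is a small sketch; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("replace_accents"))
```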
The following examples show how to use the package:
1. Python example
# Import the replace accents function
from replace_accents import replace_accents_characters
# Use the function to replace accented characters and print the result
print(replace_accents_characters("crème de la crème"))
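In practice the function is usually applied to many values rather than a single string. The sketch below cleans one field across a list of records. The helper `clean_field` and the stand-in `demo_replacer` are hypothetical; the stand-in covers only two characters so that the example runs without the package installed.

```python
from typing import Callable, Dict, List


def clean_field(
    rows: List[Dict[str, object]],
    field: str,
    replacer: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Return copies of the rows with `replacer` applied to one string field."""
    return [{**row, field: replacer(str(row[field]))} for row in rows]


# Stand-in replacer for illustration only; it maps just "é" and "è".
_DEMO_TABLE = str.maketrans({"é": "e", "è": "e"})


def demo_replacer(text: str) -> str:
    return text.translate(_DEMO_TABLE)


rows = [{"id": 1, "name": "José"}, {"id": 2, "name": "Irène"}]
cleaned = clean_field(rows, "name", demo_replacer)
print(cleaned)

# With the package installed, pass its function instead of the stand-in:
# from replace_accents import replace_accents_characters
# cleaned = clean_field(rows, "name", replace_accents_characters)
```

Passing the replacer as an argument keeps the helper independent of any one implementation, and the original rows are left unmodified.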
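2. PySpark example
# Import the replace accents function
from replace_accents import replace_accents_characters
# Import SparkSession and the PySpark col function
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
# Reuse the active SparkSession (notebooks usually predefine `spark`) or create one
spark = SparkSession.builder.getOrCreate()
# Register the Python function as both a PySpark UDF and a Spark SQL UDF
replace_accents_characters_pyspark_udf = spark.udf.register("replace_accents_characters_sparksql_udf", replace_accents_characters)
# Create a PySpark DataFrame from an existing table
df = spark.table("table_name")
# Apply the PySpark UDF to the DataFrame
# display() is provided by notebook environments such as Databricks; elsewhere, call .show() on the DataFrame
display(df.select("col1", replace_accents_characters_pyspark_udf(col("col1")).alias("replaced_col1")))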
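Spark passes SQL NULL values to Python UDFs as `None`. This page does not say how `replace_accents_characters` treats `None`, so a defensive option is to wrap the function before registering it. The decorator `null_safe` and the stand-in `demo_replacer` below are hypothetical and only illustrate the pattern.

```python
from functools import wraps
from typing import Callable, Optional


def null_safe(fn: Callable[[str], str]) -> Callable[[Optional[str]], Optional[str]]:
    """Wrap a str -> str function so that None is returned unchanged."""

    @wraps(fn)
    def wrapper(value: Optional[str]) -> Optional[str]:
        return None if value is None else fn(value)

    return wrapper


# Stand-in for illustration only, so the example runs without the package.
def demo_replacer(text: str) -> str:
    return text.replace("è", "e")


safe_replacer = null_safe(demo_replacer)
print(safe_replacer("crème"))
print(safe_replacer(None))

# With Spark and the package available, the wrapped function could be registered instead:
# spark.udf.register("replace_accents_characters_sparksql_udf", null_safe(replace_accents_characters))
```

Because the wrapper short-circuits on `None`, NULL rows stay NULL in the output column instead of raising an error inside the UDF.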
3. Spark SQL example
# Import the replace accents function
from replace_accents import replace_accents_characters
# Register the Python function as both a PySpark UDF and a Spark SQL UDF
replace_accents_characters_pyspark_udf = spark.udf.register("replace_accents_characters_sparksql_udf", replace_accents_characters)
# Use the Spark SQL UDF in a SQL query
spark.sql("select col1, replace_accents_characters_sparksql_udf(col1) as replaced_col1 from table_name")
You can get more information about this package here