package to clean and normalize text
clean_text_rhoni
The clean_text_rhoni package provides tools to clean and transform text data efficiently. It offers methods and functions for removing special characters, accents, and unnecessary spaces from text, and for converting text to lowercase and snake case.
This package is useful for preparing text data for natural language processing, data analysis, and other applications where clean, normalized text is needed.
Installation
$ pip install clean_text_rhoni
Usage
This package has two main functions for cleaning text:
- clean_text performs a complete cleaning pass on the input text. The operations include removing leading and trailing spaces, replacing multiple spaces with a single space, converting text to lowercase, removing accents, removing special characters, and removing the tilde from 'ñ'.
- clean_text_snake_case performs the same cleaning pass as clean_text and then converts the result to snake case by replacing spaces with underscores. This is useful for creating consistent, readable variable or column names.
from clean_text_rhoni import clean_text, clean_text_snake_case
sample_text = "%ábdc efghí %$ñ"
# clean_text()
# run a complete cleaning over a text
cleaned_text = clean_text(sample_text)
print(cleaned_text) # abdc efghi n
# clean_text_snake_case()
# run a complete cleaning over a text and return the result in snake_case style
snake_case_cleaned_text = clean_text_snake_case(sample_text)
print(snake_case_cleaned_text) # abdc_efghi_n
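To make the cleaning steps listed above concrete, here is a minimal standard-library sketch of a comparable pipeline. It is not the package's implementation: the names sketch_clean_text and sketch_clean_text_snake_case, the translation table, the order of the steps, and the ASCII-only regular expression are all assumptions made for illustration.

```python
import re

# Hypothetical mapping (an assumption, not taken from the package):
# accented vowels and 'ñ' are replaced by their plain ASCII letters.
_ACCENT_MAP = str.maketrans("áéíóúñ", "aeioun")


def sketch_clean_text(text: str) -> str:
    """Illustrative stand-in for a full cleaning pass."""
    text = text.lower()                       # convert to lowercase
    text = text.translate(_ACCENT_MAP)        # drop accents and the tilde on 'ñ'
    text = re.sub(r"[^a-z0-9\s]", "", text)   # remove special characters
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    return text.strip()                       # trim leading/trailing spaces


def sketch_clean_text_snake_case(text: str) -> str:
    """Same pass, then spaces become underscores."""
    return sketch_clean_text(text).replace(" ", "_")


sample_text = "%ábdc efghí %$ñ"
print(sketch_clean_text(sample_text))             # abdc efghi n
print(sketch_clean_text_snake_case(sample_text))  # abdc_efghi_n
```

In this sketch the accent and tilde replacement runs before the special-character filter on purpose: with an ASCII-only pattern, filtering first would delete 'á' or 'ñ' outright instead of converting them.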
You can also access the BaseCleanText class and use its methods separately:
from clean_text_rhoni import BaseCleanText
# create a class instance
instance_base_clean_text = BaseCleanText()
# call the chosen method
instance_base_clean_text.remove_accents("áéíóú") # 'aeiou'
instance_base_clean_text.replace_underscores_by_spaces("hello_world") # 'hello world'
The BaseCleanText class has the following methods:
- transform_to_lowercase(text): Converts the input text to lowercase.
- remove_leading_trailing_spaces(text): Removes leading and trailing white spaces from the input text.
- replace_multiple_spaces(text): Replaces runs of multiple spaces in the input text with a single space.
- remove_special_characters(text): Removes special characters from the input text. Special characters are defined as characters that are neither alphanumeric nor whitespace characters. A regular expression is used to match and remove these characters.
- remove_accents(text): Removes accents from vowels in the input text. It replaces accented vowel characters (e.g., á, é, í) with their non-accented counterparts (e.g., a, e, i).
- remove_n_tilde(text): Removes the tilde from the character 'ñ' in the input text, replacing it with a regular 'n'.
- replace_spaces_by_underscores(text): Replaces spaces with underscores in the input text.
- replace_underscores_by_spaces(text): Replaces underscores with spaces in the input text.
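The definition of special characters given for remove_special_characters ("neither alphanumeric nor whitespace") can be read in more than one way when the text contains accented letters. The sketch below contrasts two plausible regular expressions using only Python's re module. Both function names are hypothetical, and which pattern the package actually uses is not stated here, so treat this as an illustration of the trade-off rather than a description of the package.

```python
import re


def remove_special_ascii(text: str) -> str:
    """Hypothetical variant: keep only ASCII letters, digits, and whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def remove_special_unicode(text: str) -> str:
    """Hypothetical variant: keep any Unicode word character and whitespace.

    Note that \\w also matches the underscore, so underscores survive.
    """
    return re.sub(r"[^\w\s]", "", text)


print(remove_special_ascii("caf\u00e9!"))      # caf   (the accented letter is dropped)
print(remove_special_unicode("caf\u00e9!"))    # café  (the accented letter is kept)
print(remove_special_unicode("hello_world!"))  # hello_world
```

With the ASCII-only reading, accents should be removed before special characters are filtered; with the Unicode-aware reading, the order of those two steps does not affect accented letters.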
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
clean_text_rhoni was created by rhoni. It is licensed under the terms of the MIT license.
Credits
clean_text_rhoni was created with cookiecutter and the py-pkgs-cookiecutter template.