Lemmatizer for the Spanish language
# reTexto
Fast text processing for Python
### Run

    cd /[project_path]
    docker build -t retext .
    docker run -v $(pwd):/retext:rw -it retext bash

### Test

    invoke test

### Work in

    docker run -v $(pwd):/jiazz:rw -it jiazz bash

## Basic Use

    from retexto import ReTexto  # import path assumed from the package name

    if __name__ == '__main__':
        # Noisy sample input: mentions, URLs, HTML, hashtags, emoji, stretched words
        s = '@Edux87, i need this www.google.com | https://github.com <br> \
        <strong>UserName: çarlos </strong> \
        i\'m from Perú 😛 \
        #Friends #Text jajajajaja so fffunny \
        loooveee thiiis 😌😎 \
        @florenciaflor19 Si!!! sé vo… 🐷JUANA🐷 \
        smile! haha jejeje jojojo jujuju jijijijajaja 😂'
        text = ReTexto(s)
        # The cleaning steps are chained; split_words() ends the chain with a list
        s = text.remove_html() \
            .remove_mentions() \
            .remove_tags() \
            .remove_smiles(by='SMILING') \
            .convert_specials() \
            .convert_emoji() \
            .remove_nochars(preserve_tilde=True) \
            .remove_url() \
            .remove_duplicate(r='a-jp-z') \
            .remove_duplicate_vowels() \
            .remove_duplicate_consonants() \
            .remove_punctuation() \
            .remove_multispaces() \
            .lower() \
            .remove_stopwords() \
            .split_words(uniques=True)
        print(s)

Output:

    ['username', 'from', 'love', 'i', 'ned', 'funy', 'juana', 'vo', 'this', 'si', 'im', 'se', 'peru', 'smile', 'so', 'smiling', 'carlos']

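
To give a feel for what a few of the chained steps above do, here is a minimal sketch in plain Python using only `re` and `unicodedata`. It is not reTexto's implementation: the `MiniCleaner` class, its method bodies, and the `strip_accents` and `collapse_repeats` names are hypothetical stand-ins, and the regular expressions are simplified assumptions about what HTML removal, mention removal, URL removal, and duplicate-letter collapsing could look like.

```python
import re
import unicodedata


class MiniCleaner:
    """Hypothetical stand-in that mimics the chaining style; not reTexto itself."""

    def __init__(self, text):
        self.text = text

    def remove_html(self):
        # Replace anything that looks like an HTML tag with a space.
        self.text = re.sub(r'<[^>]+>', ' ', self.text)
        return self

    def remove_mentions(self):
        # Drop @username style mentions.
        self.text = re.sub(r'@\w+', ' ', self.text)
        return self

    def remove_url(self):
        # Drop tokens starting with http://, https:// or www.
        self.text = re.sub(r'(https?://|www\.)\S+', ' ', self.text)
        return self

    def strip_accents(self):
        # NFKD splits an accented letter into a base letter plus a
        # combining mark; the combining marks are then discarded.
        decomposed = unicodedata.normalize('NFKD', self.text)
        self.text = ''.join(c for c in decomposed
                            if not unicodedata.combining(c))
        return self

    def collapse_repeats(self):
        # Any run of the same character becomes a single character.
        self.text = re.sub(r'(.)\1+', r'\1', self.text)
        return self

    def lower(self):
        self.text = self.text.lower()
        return self

    def split_words(self, uniques=False):
        # Ends the chain: returns a list instead of the cleaner object.
        words = re.findall(r'[a-z]+', self.text)
        return sorted(set(words)) if uniques else words


words = (
    MiniCleaner('@Edux87 loooveee <b>Perú</b> https://github.com fffunny')
    .remove_html()
    .remove_mentions()
    .remove_url()
    .strip_accents()
    .collapse_repeats()
    .lower()
    .split_words(uniques=True)
)
print(words)  # ['funy', 'love', 'peru']
```

Every method except the last returns `self`, which is what makes the fluent chain possible. The sketch sorts the unique words so the result is deterministic; whether reTexto guarantees any ordering for `uniques=True` is not stated in this README.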
Source Distribution
retexto-1.3.tar.gz (24.3 kB)
File details
Details for the file retexto-1.3.tar.gz.
File metadata
- Download URL: retexto-1.3.tar.gz
- Upload date:
- Size: 24.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes

Algorithm | Hash digest
---|---
SHA256 | f6f32539fefda319949ba5d68bb7e0ff956309e63dc2331b9ed74c4e95335278
MD5 | b98718f95284c4960d4bfb67c37fbd4a
BLAKE2b-256 | bb518f0a3f48d9e6a11742b3c2b771449bfb79e93fc79995bb69750c954ad777