A simple tokenizer function for NLP
Project description
This tokenizer function turns text into lowercase word tokens, removes English stopwords, lemmatizes the tokens, and replaces URLs with a placeholder.
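The pipeline described above can be sketched roughly as follows. This is an illustrative reimplementation, not the package's actual code: the stopword list, the lemma map, and the `URL` placeholder token are assumptions chosen for the example (a real implementation would likely use NLTK's stopword corpus and `WordNetLemmatizer`).

```python
import re

# Illustrative resources; the real package's stopword list and
# lemmatizer (e.g. NLTK's) are assumptions, not confirmed here.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "to", "of", "in"}
LEMMAS = {"cats": "cat", "running": "run"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")

def tokenize(text):
    """Lowercase, replace URLs with a placeholder, split into word
    tokens, drop stopwords, and map tokens to lemmas."""
    text = URL_RE.sub("URL", text.lower())
    tokens = re.findall(r"[a-z]+|URL", text)
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]

print(tokenize("The cats are running to https://example.com"))
# → ['cat', 'run', 'URL']
```

Substituting the URL before word-splitting keeps the placeholder intact; otherwise the regex word tokenizer would shred the URL into fragments like `https` and `example`.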
Download files
Source Distribution
File details
Details for the file jf_tokenize_package-1.0.3.tar.gz.
File metadata
- Download URL: jf_tokenize_package-1.0.3.tar.gz
- Upload date:
- Size: 1.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 69d635bb16298b59d5b76d03c273a202b3e3440827e89657fa9dfc1c40858ced
MD5 | 017fd0628869a7bb29c762ccc4b1c430
BLAKE2b-256 | 4ef15b7ea57b0ab383a1208bf3c6a33825965990b1b32b48566a920e514f9b96