Supercharge text processing
Project description
Texy: A conservative text processing library
A utility library for quickly cleaning texts
Installation
Python version in the dev environment: 3.11.5
pip install -U texy
Usage
Pipelines with parallelization in Rust:
>>> from texy.pipelines import extreme_clean, strict_clean, relaxed_clean
>>> data = ["hello ;/ from the other side 😊 \t "]
>>> print(extreme_clean(data))
['hello from the other side']
>>> print(strict_clean(data))
['hello ;/ from the other side']
>>> print(relaxed_clean(data))
['hello ;/ from the other side 😊']
Parallelize custom functions with Python Multiprocessing:
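from texy.pipelines import parallelize

def dummy(x):
    # Return the first character of each string in the batch
    return [i[0] for i in x]

# Guard required on platforms where multiprocessing spawns fresh interpreters (Windows, macOS)
if __name__ == "__main__":
    data = ["a ", "b ", "c ", "d ", "e ", "f ", "g ", "h ?."] * 100
    print(parallelize(dummy, data, 2))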
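To illustrate the general idea behind chunked multiprocessing, here is a minimal sketch using only the standard library. It is not texy's implementation: `split_into_chunks` and `parallelize_sketch` are hypothetical names, and it assumes that the third argument to `parallelize` is a worker count and that the custom function takes a list and returns a list, as `dummy` does above.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when items are fewer than chunks
            chunks.append(items[start:end])
        start = end
    return chunks


def parallelize_sketch(func, data, n_workers):
    """Apply func (list in, list out) to chunks of data in worker processes.

    Hypothetical stand-in for illustration; not texy's implementation.
    """
    chunks = split_into_chunks(data, n_workers)
    if not chunks:
        return []
    with Pool(processes=n_workers) as pool:
        # pool.map returns results in the same order as the input chunks
        results = pool.map(func, chunks)
    # Flatten the per-chunk results back into a single list
    return [item for chunk in results for item in chunk]
```

As with any use of `multiprocessing`, the function passed in must be defined at module level so it can be pickled, and the call should sit under an `if __name__ == "__main__":` guard.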
Actions
Pipeline | Actions |
---|---|
relaxed_clean | remove_newlines, remove_html, remove_xml, merge_spaces |
strict_clean | remove_newlines, remove_urls, remove_emails, remove_html, remove_xml, remove_emoticons, remove_emojis, remove_infrequent_punctuations, merge_spaces |
extreme_clean | remove_newlines, remove_urls, remove_emails, remove_html, remove_xml, remove_emoticons, remove_emojis, remove_all_punctuations, merge_spaces |
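A pipeline in the table above is an ordered chain of actions applied to every string. The sketch below approximates the `relaxed_clean` chain in pure Python to show that composition. It is an assumption-laden illustration, not texy's Rust implementation: every function name ending in `_sketch` is hypothetical, and the naive tag regex stands in for both `remove_html` and `remove_xml`.

```python
import re

# Naive pattern for HTML/XML tags; an assumption, real markup needs a parser
_TAG_RE = re.compile(r"<[^>]+>")
_SPACE_RE = re.compile(r"\s+")


def remove_newlines_sketch(text):
    """Replace line breaks with spaces."""
    return text.replace("\r", " ").replace("\n", " ")


def remove_markup_sketch(text):
    """Replace anything that looks like a tag with a space."""
    return _TAG_RE.sub(" ", text)


def merge_spaces_sketch(text):
    """Collapse runs of whitespace and trim both ends."""
    return _SPACE_RE.sub(" ", text).strip()


def relaxed_clean_sketch(texts):
    """Apply the three steps, in order, to each string in texts."""
    steps = (remove_newlines_sketch, remove_markup_sketch, merge_spaces_sketch)
    cleaned = []
    for text in texts:
        for step in steps:
            text = step(text)
        cleaned.append(text)
    return cleaned
```

Running whitespace merging last matters here: the earlier steps substitute spaces for what they remove, and the final step collapses whatever runs of whitespace they leave behind.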
Download files
Source Distribution
texy-0.0.2.tar.gz (21.1 kB)
Built Distributions
Hashes for texy-0.0.2-cp311-abi3-musllinux_1_2_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 378e8a122ceb56097227cf39d72419729adf3b997cc2fd37ecf9eb6924b54690 |
MD5 | 169d277ad5520352f8605c7ae37f838a |
BLAKE2b-256 | f34f99abfef931c5a51bb650700f2452399792718047b1f30e6fc236556ca6db |
Hashes for texy-0.0.2-cp311-abi3-manylinux_2_28_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | c83501652645a81c8f0b29425495f54cbd64498117ce5d1ed464f67b6cdd3e64 |
MD5 | c98b9953c9fad4b94494118e01aea946 |
BLAKE2b-256 | c7f8b62e227a313536b6de535d9e725b417b06552035856e7957ef1e1a1e38f7 |
Hashes for texy-0.0.2-cp311-abi3-macosx_11_0_arm64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 124f857f55712d1be4c60cb511582f6a83df48a2c14f95fb8c445233b4e65eea |
MD5 | 9d61ce857d738c345aae102f877746d8 |
BLAKE2b-256 | cdefb1e7ee4a0c86541baec674e6ab36fec0bbd526daee3144b15c47d41019d7 |