Supercharge text processing

Texy: A conservative text processing library


A utility library for quickly cleaning texts

Installation

Python version in the dev environment: 3.11.5

pip install -U texy

Usage

Pipelines with parallelization in Rust:

>>> from texy.pipelines import extreme_clean, strict_clean, relaxed_clean
>>> data = ["hello ;/ from the other side 😊 \t "]
>>> print(extreme_clean(data))
['hello from the other side']
>>> print(strict_clean(data))
['hello ;/ from the other side']
>>> print(relaxed_clean(data))
['hello ;/ from the other side 😊']

Parallelize custom functions with Python multiprocessing:

from texy.pipelines import parallelize

def dummy(x):
    return [i[0] for i in x]

data = ["a ", "b ", "c ", "d ", "e ", "f ", "g ", "h ?."] * 100
print(parallelize(dummy, data, 2))
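
For readers curious what a helper of this shape typically does, here is a minimal sketch built on the standard library's multiprocessing.Pool. It is not texy's implementation: the names split_chunks, first_chars and chunked_parallel are hypothetical, and the assumption that the third argument is a worker count is ours, not something the README states.

```python
from multiprocessing import Pool


def first_chars(batch):
    # Same idea as dummy above: takes a list of strings and
    # returns the first character of each one.
    return [s[0] for s in batch]


def split_chunks(data, n_chunks):
    # Split data into at most n_chunks contiguous slices of near-equal size.
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunked_parallel(func, data, n_workers):
    # Hypothetical stand-in for a parallelize-style helper (not texy's code):
    # each worker process receives one chunk and returns a list.
    chunks = split_chunks(data, n_workers)
    with Pool(n_workers) as pool:
        results = pool.map(func, chunks)
    # Flatten the per-chunk lists back into one list, preserving order.
    return [item for chunk in results for item in chunk]


if __name__ == "__main__":
    data = ["a ", "b ", "c ", "d ", "e ", "f ", "g ", "h ?."] * 100
    out = chunked_parallel(first_chars, data, 2)
    print(len(out), out[:8])
```

The function passed in works on a whole batch rather than a single item, which matches the shape of dummy above. The __main__ guard matters on platforms where multiprocessing starts workers by re-importing the script.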

Actions

Pipeline | Actions
relaxed_clean | remove_newlines, remove_html, remove_xml, merge_spaces
strict_clean | remove_newlines, remove_urls, remove_emails, remove_html, remove_xml, remove_emoticons, remove_emojis, remove_infrequent_punctuations, merge_spaces
extreme_clean | remove_newlines, remove_urls, remove_emails, remove_html, remove_xml, remove_emoticons, remove_emojis, remove_all_punctuations, merge_spaces
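
To make the idea of a pipeline as an ordered list of actions concrete, the following pure-Python sketch approximates the relaxed_clean steps with regular expressions. The function bodies are our own rough guesses at what each action does, and relaxed_clean_sketch and remove_markup are hypothetical names; texy's actual rules may differ.

```python
import re


def remove_newlines(text):
    # Approximation: replace runs of line breaks with a single space.
    return re.sub(r"[\r\n]+", " ", text)


def remove_markup(text):
    # Crude stand-in for both remove_html and remove_xml:
    # drop anything that looks like a tag.
    return re.sub(r"<[^>]+>", " ", text)


def merge_spaces(text):
    # Collapse any run of whitespace to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def relaxed_clean_sketch(texts):
    # Apply each action in order to every input string.
    steps = (remove_newlines, remove_markup, merge_spaces)
    cleaned = []
    for text in texts:
        for step in steps:
            text = step(text)
        cleaned.append(text)
    return cleaned


print(relaxed_clean_sketch(["hello ;/ from the other side 😊 \t "]))
```

On the sample input from the Usage section, this sketch keeps the emoji and the ";/" and only trims the trailing whitespace, which agrees with the relaxed_clean output shown there. The stricter pipelines would add further steps to the same tuple.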

Download files

Source Distribution

texy-0.0.2.tar.gz (21.1 kB)

Uploaded Source

Built Distributions

texy-0.0.2-cp311-abi3-musllinux_1_2_x86_64.whl (1.9 MB)

Uploaded CPython 3.11+ musllinux: musl 1.2+ x86-64

texy-0.0.2-cp311-abi3-manylinux_2_28_x86_64.whl (823.0 kB)

Uploaded CPython 3.11+ manylinux: glibc 2.28+ x86-64

texy-0.0.2-cp311-abi3-macosx_11_0_arm64.whl (697.8 kB)

Uploaded CPython 3.11+ macOS 11.0+ ARM64
