
A comprehensive string utility library for Python.

py-text-toolkit

A lightweight, dependency-minimal Python library for everyday string operations — cleaning, validation, analysis, case conversion, and generation.


Installation

pip install py-text-toolkit

Requires: Python 3.8+
Optional dependency: emoji (required only for cleaning.remove_emojis)


Modules at a Glance

| Module | What it does |
| --- | --- |
| py_text_toolkit.cleaning | Strip, replace, and normalize raw text |
| py_text_toolkit.validation | Validate emails, URLs, passwords, and character sets |
| py_text_toolkit.analysis | Count, compare, and measure strings |
| py_text_toolkit.format_cases | Convert between naming conventions and formatting styles |
| py_text_toolkit.generation | Generate slugs, masks, ciphers, and reversed strings |

Quick Start

# Hyphens are not valid in Python identifiers; the import name is assumed
# to be py_text_toolkit (matching the distribution file name).
from py_text_toolkit.cleaning import remove_html_tags, remove_urls
from py_text_toolkit.validation import is_email, is_strong_password
from py_text_toolkit.analysis import word_count, is_palindrome
from py_text_toolkit.format_cases import to_snake_case, to_camel_case
from py_text_toolkit.generation import generate_slug, mask_range

# Clean
remove_html_tags("<p>Hello <b>world</b></p>")   # "Hello world"
remove_urls("Visit https://example.com today")  # "Visit today"

# Validate
is_email("user@example.com")        # True
is_strong_password("Passw0rd!")     # True

# Analyse
word_count("Hello, world!")         # 2
is_palindrome("A man a plan a canal Panama")  # True

# Convert case
to_snake_case("camelCaseText")      # "camel_case_text"
to_camel_case("hello_world")        # "helloWorld"

# Generate
generate_slug("Hello World!")       # "hello-world"
mask_range("1234-5678-9012", 5, 9, "*")  # "1234-****-9012"

Module Reference

py_text_toolkit.cleaning

Functions for sanitising and normalising raw text.

| Function | Signature | Description |
| --- | --- | --- |
| normalize_whitespace | (text) → str | Collapse all whitespace runs to a single space and strip ends |
| remove_punctuation | (text, replace="") → str | Remove or replace all punctuation characters |
| remove_digits | (text, replace="") → str | Remove or replace all digit characters |
| remove_html_tags | (text, replace="") → str | Strip or replace HTML tags |
| remove_urls | (text, replace="") → str | Remove or replace HTTP/HTTPS and www. URLs |
| remove_emojis | (text, replace="") → str | Remove or replace emoji characters (requires emoji) |
| collapse_spaces | (text) → str | Remove all whitespace entirely, rather than collapsing runs to a single space |

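All remove_* functions accept an optional replace argument: the string substituted in place of each removed element (defaults to ""). normalize_whitespace and collapse_spaces take only the text. After replacement, whitespace is always normalized, so the result has no doubled, leading, or trailing spaces.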

from py_text_toolkit.cleaning import remove_punctuation, remove_html_tags, remove_emojis

remove_punctuation("Hello, world!")              # "Hello world"
remove_punctuation("Hello, world!", replace=" ") # "Hello world"

remove_html_tags("<p>Hello <b>world</b></p>")    # "Hello world"
remove_html_tags("<br/>line1<br/>line2", replace=" ")  # "line1 line2"

remove_emojis("Great job! 🎉")                   # "Great job!"
remove_emojis("Hello 😊", replace="[emoji]")     # "Hello [emoji]"
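
The replace-then-normalize behaviour described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's actual implementation: the names remove_html_tags_sketch and normalize_whitespace_sketch and the tag pattern are hypothetical.

```python
import re

# Assumed pattern: anything between "<" and the next ">" counts as a tag.
_TAG_RE = re.compile(r"<[^>]+>")


def normalize_whitespace_sketch(text: str) -> str:
    """Collapse whitespace runs to one space and strip both ends."""
    return " ".join(text.split())


def remove_html_tags_sketch(text: str, replace: str = "") -> str:
    """Substitute each tag with `replace`, then normalize whitespace."""
    return normalize_whitespace_sketch(_TAG_RE.sub(replace, text))


print(remove_html_tags_sketch("<p>Hello <b>world</b></p>"))
print(remove_html_tags_sketch("<br/>line1<br/>line2", replace=" "))
```

Because normalization runs last, replacing a tag with a space does not leave a leading space or a double space in the result.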

py_text_toolkit.validation

Boolean predicates for common string formats.

| Function | Signature | Description |
| --- | --- | --- |
| is_email | (text) → bool | Check for a valid email address |
| is_url | (text) → bool | Check for a valid HTTP or HTTPS URL |
| contains_only | (text, allowed_chars) → bool | Check that every character is in the allowed set |
| is_strong_password | (text) → bool | Check that a password meets strength requirements |

Password requirements (is_strong_password):

  • Minimum 8 characters
  • At least one lowercase letter
  • At least one uppercase letter
  • At least one digit
  • At least one special character from @$!%*?&

from py_text_toolkit.validation import is_email, is_url, contains_only, is_strong_password

is_email("user@example.com")          # True
is_email("not-an-email")              # False

is_url("https://api.service.io/v1")   # True
is_url("ftp://files.example.com")     # False

contains_only("12345", "0123456789")  # True
contains_only("hello!", "a-z")        # False  (literal chars only, not a range)

is_strong_password("Passw0rd!")       # True
is_strong_password("weakpass")        # False
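
The password requirements listed above can be expressed as one check per rule. The sketch below is a hypothetical re-implementation (is_strong_password_sketch is not part of the library), and it assumes that characters outside the listed sets are allowed as long as every rule is met; the real function may be stricter.

```python
import string

# Special characters named in the requirements list.
SPECIAL_CHARS = "@$!%*?&"


def is_strong_password_sketch(text: str) -> bool:
    """Return True only if every listed requirement is satisfied."""
    return (
        len(text) >= 8
        and any(c in string.ascii_lowercase for c in text)
        and any(c in string.ascii_uppercase for c in text)
        and any(c in string.digits for c in text)
        and any(c in SPECIAL_CHARS for c in text)
    )


print(is_strong_password_sketch("Passw0rd!"))  # meets all five rules
print(is_strong_password_sketch("weakpass"))   # no uppercase, digit, or special
```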

Note on contains_only: allowed_chars is treated as a set of literal characters. Special regex characters are escaped automatically, so "a-z" matches only the three characters a, -, and z, not a range.
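
One way to get the literal-character behaviour described in this note is to escape allowed_chars before building a character class. The function name contains_only_sketch is hypothetical, and the handling of an empty allowed_chars is an assumption; the library may do this differently.

```python
import re


def contains_only_sketch(text: str, allowed_chars: str) -> bool:
    """Check that every character of `text` appears literally in `allowed_chars`."""
    if not allowed_chars:
        # An empty character class is not a valid regex; only "" can match.
        return text == ""
    # re.escape neutralises "-", "]", "^" and "\" so no ranges are formed.
    pattern = f"[{re.escape(allowed_chars)}]*"
    return re.fullmatch(pattern, text) is not None


print(contains_only_sketch("12345", "0123456789"))
print(contains_only_sketch("hello!", "a-z"))
print(contains_only_sketch("a-z-a", "a-z"))
```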


py_text_toolkit.analysis

Functions that measure and compare strings.

| Function | Signature | Description |
| --- | --- | --- |
| word_count | (text) → int | Count words using regex word-boundary matching |
| char_frequency | (text, char) → int | Count non-overlapping occurrences of a character or substring |
| count_vowels | (text) → int | Count English vowels (a e i o u), case-insensitive |
| longest_word | (text) → int | Return the length of the longest whitespace-delimited word |
| is_palindrome | (text, case_sensitive=False, ignore_formatting=True) → bool | Check if a string is a palindrome |
| is_anagram | (word1, word2) → bool | Check if two strings are anagrams (case-insensitive, ignores spaces) |

from py_text_toolkit.analysis import word_count, is_palindrome, is_anagram, char_frequency

word_count("Hello, world!")                   # 2
word_count("  spaces   everywhere  ")         # 2

char_frequency("banana", "an")                # 2

is_palindrome("racecar")                      # True
is_palindrome("A man a plan a canal Panama")  # True
is_palindrome("Racecar", case_sensitive=True) # False

is_anagram("listen", "silent")                # True
is_anagram("Astronomer", "Moon starer")       # True
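
The results above are consistent with the following standard-library sketch. The function names are hypothetical, and the meaning of ignore_formatting (dropping every non-alphanumeric character) is an assumption inferred from the examples rather than from the library's source.

```python
import re


def word_count_sketch(text: str) -> int:
    """Count runs of word characters delimited by word boundaries."""
    return len(re.findall(r"\b\w+\b", text))


def is_palindrome_sketch(text: str, case_sensitive: bool = False,
                         ignore_formatting: bool = True) -> bool:
    """Compare the (optionally cleaned) text with its reverse."""
    if ignore_formatting:
        # Assumption: "formatting" means spaces and punctuation.
        text = "".join(ch for ch in text if ch.isalnum())
    if not case_sensitive:
        text = text.lower()
    return text == text[::-1]


def is_anagram_sketch(word1: str, word2: str) -> bool:
    """Compare sorted letters, ignoring case and spaces."""
    def letters(word: str) -> list:
        return sorted(word.replace(" ", "").lower())
    return letters(word1) == letters(word2)


print(word_count_sketch("  spaces   everywhere  "))
print(is_palindrome_sketch("A man a plan a canal Panama"))
print(is_anagram_sketch("Astronomer", "Moon starer"))
```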

py_text_toolkit.format_cases

Convert strings between naming conventions and apply text formatting.

| Function | Signature | Description |
| --- | --- | --- |
| to_snake_case | (text) → str | Convert to snake_case |
| to_camel_case | (text) → str | Convert to camelCase |
| to_pascal_case | (text) → str | Convert to PascalCase |
| to_kebab_case | (text) → str | Convert to kebab-case |
| to_title_case | (text) → str | Convert to Title Case |
| truncate | (text, max_length, suffix="...") → str | Truncate to a maximum length; the suffix counts toward max_length |
| pad_center | (text, width, fillchar=" ") → str | Center-pad to a given width |

All case converters handle mixed input (camelCase, PascalCase, snake_case, kebab-case, spaces).

from py_text_toolkit.format_cases import (
    to_snake_case, to_camel_case, to_pascal_case,
    to_kebab_case, to_title_case, truncate, pad_center,
)

to_snake_case("camelCaseText")    # "camel_case_text"
to_snake_case("Hello World!")     # "hello_world"

to_camel_case("hello_world")      # "helloWorld"
to_camel_case("PascalCaseText")   # "pascalCaseText"

to_pascal_case("kebab-case-text") # "KebabCaseText"
to_kebab_case("camelCaseText")    # "camel-case-text"
to_title_case("hello_world")      # "Hello World"

truncate("Hello, World!", 8)      # "Hello..."
truncate("Hi", 10)                # "Hi"

pad_center("hello", 11)           # "   hello   "
pad_center("hi", 10, "-")         # "----hi----"
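
Handling mixed input usually comes down to splitting the text into words first and joining them in the target style afterwards. The sketch below shows one possible approach; the splitting pattern and the *_sketch names are assumptions, not the library's implementation.

```python
import re

# Assumed word pattern: an acronym run, an optionally capitalised
# lowercase word, or a run of digits. Separators are simply skipped.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words_sketch(text: str) -> list:
    """Split camelCase, PascalCase, snake_case, kebab-case or spaced text."""
    return _WORD_RE.findall(text)


def to_snake_case_sketch(text: str) -> str:
    return "_".join(word.lower() for word in split_words_sketch(text))


def to_camel_case_sketch(text: str) -> str:
    words = [word.lower() for word in split_words_sketch(text)]
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


print(split_words_sketch("kebab-case-text"))
print(to_snake_case_sketch("Hello World!"))
print(to_camel_case_sketch("PascalCaseText"))
```

Splitting once and joining per style keeps every converter consistent, since they all share the same definition of a word.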

py_text_toolkit.generation

Functions that produce new strings from existing ones.

| Function | Signature | Description |
| --- | --- | --- |
| generate_slug | (text) → str | Convert to a URL-friendly slug |
| reverse_word | (text) → str | Reverse all characters |
| mask_range | (text, start_index, end_index, placeholder="X") → str | Mask a character range with a placeholder (end_index is exclusive; negative indices count from the end) |
| ceasar_cipher | (text, shift) → str | Encrypt/decrypt with the Caesar cipher |

from py_text_toolkit.generation import generate_slug, mask_range, ceasar_cipher, reverse_word

generate_slug("Hello World!")               # "hello-world"
generate_slug("Python 3.11 -- Release Notes")  # "python-3-11-release-notes"

reverse_word("hello")                       # "olleh"

mask_range("1234-5678-9012", 5, 9, "*")     # "1234-****-9012"
mask_range("secret", -3, -1)               # "secXXt"

ceasar_cipher("Hello, World!", 3)           # "Khoor, Zruog!"
ceasar_cipher("Khoor, Zruog!", -3)          # "Hello, World!"  (decrypt)
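
The examples above are consistent with the sketch below, which uses slice semantics for the mask and shifts only ASCII letters for the cipher. All three *_sketch functions are hypothetical stand-ins, and the sketch assumes start_index does not come after end_index and that placeholder is a single character.

```python
import re
import string


def generate_slug_sketch(text: str) -> str:
    """Lowercase, turn each non-alphanumeric run into "-", trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mask_range_sketch(text: str, start_index: int, end_index: int,
                      placeholder: str = "X") -> str:
    """Mask text[start_index:end_index]; the end index is exclusive."""
    masked = text[start_index:end_index]
    return text[:start_index] + placeholder * len(masked) + text[end_index:]


def caesar_cipher_sketch(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift`, keeping case and other characters."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            base = ord("a")
        elif ch in string.ascii_uppercase:
            base = ord("A")
        else:
            result.append(ch)  # digits, spaces, punctuation pass through
            continue
        # Python's % always returns a non-negative value, so negative
        # shifts (decryption) wrap around correctly.
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)


print(generate_slug_sketch("Python 3.11 -- Release Notes"))
print(mask_range_sketch("secret", -3, -1))
print(caesar_cipher_sketch("Hello, World!", 3))
```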

Dependencies

| Package | Required | Used by |
| --- | --- | --- |
| re (stdlib) | Always | All modules |
| string (stdlib) | Always | cleaning |
| emoji | Optional | cleaning.remove_emojis only |

Install with the optional dependency:

pip install py-text-toolkit[emoji]
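
An optional dependency of this kind is commonly imported lazily, so the rest of the package works without it. The sketch below illustrates that pattern; it is an assumption about how remove_emojis could be written, not the library's code. It relies on emoji.replace_emoji from the third-party emoji package (version 2.x assumed), and remove_emojis_sketch is a hypothetical name.

```python
def remove_emojis_sketch(text: str, replace: str = "") -> str:
    """Remove or replace emoji, importing the optional package on demand."""
    try:
        import emoji  # optional third-party dependency
    except ImportError as exc:
        raise ImportError(
            "remove_emojis needs the optional 'emoji' package; "
            "install it with: pip install py-text-toolkit[emoji]"
        ) from exc
    cleaned = emoji.replace_emoji(text, replace=replace)
    # Normalize whitespace after replacement, as the cleaning module does.
    return " ".join(cleaned.split())
```

With this layout, only callers of the emoji function see an error when the extra is not installed.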

License

MIT License — see LICENSE for details.
