A comprehensive string utility library for Python.

py-text-toolkit

A lightweight, dependency-minimal Python library for everyday string operations — cleaning, validation, analysis, case conversion, and generation.


Installation

pip install py-text-toolkit

Requires: Python 3.8+
Optional dependency: emoji (required only for cleaning.remove_emojis)


Modules at a Glance

Module | What it does
strutils.cleaning | Strip, replace, and normalize raw text
strutils.validation | Validate emails, URLs, passwords, and character sets
strutils.analysis | Count, compare, and measure strings
strutils.format_cases | Convert between naming conventions and formatting styles
strutils.generation | Generate slugs, masks, ciphers, and reversed strings

Quick Start

from strutils.cleaning import remove_html_tags, remove_urls
from strutils.validation import is_email, is_strong_password
from strutils.analysis import word_count, is_palindrome
from strutils.format_cases import to_snake_case, to_camel_case
from strutils.generation import generate_slug, mask_range

# Clean
remove_html_tags("<p>Hello <b>world</b></p>")   # "Hello world"
remove_urls("Visit https://example.com today")  # "Visit today"

# Validate
is_email("user@example.com")        # True
is_strong_password("Passw0rd!")     # True

# Analyze
word_count("Hello, world!")         # 2
is_palindrome("A man a plan a canal Panama")  # True

# Convert case
to_snake_case("camelCaseText")      # "camel_case_text"
to_camel_case("hello_world")        # "helloWorld"

# Generate
generate_slug("Hello World!")       # "hello-world"
mask_range("1234-5678-9012", 5, 9, "*")  # "1234-****-9012"

Module Reference

strutils.cleaning

Functions for sanitizing and normalizing raw text.

Function | Signature | Description
normalize_whitespace | (text) → str | Collapse all whitespace runs to a single space and strip both ends
remove_punctuation | (text, replace="") → str | Remove or replace all punctuation characters
remove_digits | (text, replace="") → str | Remove or replace all digit characters
remove_html_tags | (text, replace="") → str | Strip or replace HTML tags
remove_urls | (text, replace="") → str | Remove or replace HTTP/HTTPS and www. URLs
remove_emojis | (text, replace="") → str | Remove or replace emoji characters (requires emoji)
collapse_spaces | (text) → str | Remove all whitespace characters (unlike normalize_whitespace, which collapses runs to a single space)

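Every remove_* function accepts an optional replace argument: the string substituted for each removed element (defaults to ""). After substitution the result is whitespace-normalized, which is why the examples below come back without doubled or trailing spaces. normalize_whitespace and collapse_spaces take only the text argument.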

from strutils.cleaning import remove_punctuation, remove_html_tags, remove_emojis

remove_punctuation("Hello, world!")              # "Hello world"
remove_punctuation("Hello, world!", replace=" ") # "Hello world"

remove_html_tags("<p>Hello <b>world</b></p>")    # "Hello world"
remove_html_tags("<br/>line1<br/>line2", replace=" ")  # "line1 line2"

remove_emojis("Great job! 🎉")                   # "Great job!"
remove_emojis("Hello 😊", replace="[emoji]")     # "Hello [emoji]"
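
The substitute-then-normalize behavior described above can be illustrated with a minimal standard-library sketch. The helper names (normalize_ws, strip_tags) and the tag pattern are assumptions made for illustration; the library's actual implementation may differ.

```python
import re

# Hypothetical pattern: anything between "<" and ">" is treated as a tag.
_TAG_RE = re.compile(r"<[^>]+>")


def normalize_ws(text: str) -> str:
    """Collapse whitespace runs to one space and strip both ends."""
    return " ".join(text.split())


def strip_tags(text: str, replace: str = "") -> str:
    """Substitute each tag with `replace`, then normalize whitespace."""
    return normalize_ws(_TAG_RE.sub(replace, text))


print(strip_tags("<p>Hello <b>world</b></p>"))          # Hello world
print(strip_tags("<br/>line1<br/>line2", replace=" "))  # line1 line2
```

Normalizing after the substitution keeps the output stable regardless of whether replace is empty, a space, or a longer token.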

strutils.validation

Boolean predicates for common string formats.

Function | Signature | Description
is_email | (text) → bool | Check for a valid email address
is_url | (text) → bool | Check for a valid HTTP or HTTPS URL
contains_only | (text, allowed_chars) → bool | Check that every character is in the allowed set
is_strong_password | (text) → bool | Check that a password meets strength requirements

Password requirements (is_strong_password):

  • Minimum 8 characters
  • At least one lowercase letter
  • At least one uppercase letter
  • At least one digit
  • At least one special character from @$!%*?&

from strutils.validation import is_email, is_url, contains_only, is_strong_password

is_email("user@example.com")          # True
is_email("not-an-email")              # False

is_url("https://api.service.io/v1")   # True
is_url("ftp://files.example.com")     # False

contains_only("12345", "0123456789")  # True
contains_only("hello!", "a-z")        # False  (literal chars only, not a range)

is_strong_password("Passw0rd!")       # True
is_strong_password("weakpass")        # False
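
The password requirements listed above can be expressed as a handful of independent checks. This is only a sketch under the stated rules; the function name meets_requirements is hypothetical, and how the library treats characters outside the listed sets is not documented here.

```python
SPECIALS = "@$!%*?&"  # the special characters named in the requirements


def meets_requirements(password: str) -> bool:
    """Return True when every documented strength rule is satisfied."""
    return (
        len(password) >= 8
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in SPECIALS for c in password)
    )


print(meets_requirements("Passw0rd!"))  # True
print(meets_requirements("weakpass"))   # False
```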

Note on contains_only: allowed_chars is treated as a set of literal characters. Special regex characters are escaped automatically, so "a-z" matches only the three characters a, -, and z, not a range.
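
To see why "a-z" is not a range here, consider a sketch that escapes allowed_chars before building a character class. The name only_allowed and the handling of an empty allowed set are assumptions, not the library's documented behavior.

```python
import re


def only_allowed(text: str, allowed_chars: str) -> bool:
    """Check that every character of `text` is literally in `allowed_chars`."""
    if not allowed_chars:
        # Assumption: with nothing allowed, only the empty string passes.
        return text == ""
    # re.escape turns "-" into "\-", so it cannot form a range inside [...].
    pattern = "[" + re.escape(allowed_chars) + "]*"
    return re.fullmatch(pattern, text) is not None


print(only_allowed("12345", "0123456789"))  # True
print(only_allowed("hello!", "a-z"))        # False
print(only_allowed("a-z-a", "a-z"))         # True
```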


strutils.analysis

Functions that measure and compare strings.

Function | Signature | Description
word_count | (text) → int | Count words using regex word-boundary matching
char_frequency | (text, char) → int | Count non-overlapping occurrences of a character or substring
count_vowels | (text) → int | Count English vowels (a e i o u), case-insensitive
longest_word | (text) → int | Return the length of the longest whitespace-delimited word
is_palindrome | (text, case_sensitive=False, ignore_formatting=True) → bool | Check if a string is a palindrome
is_anagram | (word1, word2) → bool | Check if two strings are anagrams (case-insensitive, ignores spaces)

from strutils.analysis import word_count, is_palindrome, is_anagram, char_frequency

word_count("Hello, world!")                   # 2
word_count("  spaces   everywhere  ")         # 2

char_frequency("banana", "an")                # 2

is_palindrome("racecar")                      # True
is_palindrome("A man a plan a canal Panama")  # True
is_palindrome("Racecar", case_sensitive=True) # False

is_anagram("listen", "silent")                # True
is_anagram("Astronomer", "Moon starer")       # True
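
The two flags on is_palindrome and the normalization used by is_anagram can be sketched as follows. This assumes that ignore_formatting drops every non-alphanumeric character, which the description above does not spell out; the function names are hypothetical.

```python
def palindrome(text: str, case_sensitive: bool = False,
               ignore_formatting: bool = True) -> bool:
    """Compare the (optionally normalized) text with its reverse."""
    if ignore_formatting:
        # Assumption: "formatting" means anything that is not a letter or digit.
        text = "".join(ch for ch in text if ch.isalnum())
    if not case_sensitive:
        text = text.casefold()
    return text == text[::-1]


def anagram(word1: str, word2: str) -> bool:
    """Case-insensitive comparison of sorted letters, ignoring spaces."""
    def key(word: str) -> list:
        return sorted(word.replace(" ", "").casefold())
    return key(word1) == key(word2)


print(palindrome("A man a plan a canal Panama"))   # True
print(palindrome("Racecar", case_sensitive=True))  # False
print(anagram("Astronomer", "Moon starer"))        # True
```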

strutils.format_cases

Convert strings between naming conventions and apply text formatting.

Function | Signature | Description
to_snake_case | (text) → str | Convert to snake_case
to_camel_case | (text) → str | Convert to camelCase
to_pascal_case | (text) → str | Convert to PascalCase
to_kebab_case | (text) → str | Convert to kebab-case
to_title_case | (text) → str | Convert to Title Case
truncate | (text, max_length, suffix="...") → str | Truncate to a maximum length with a suffix
pad_center | (text, width, fillchar=" ") → str | Center-pad to a given width

All case converters handle mixed input (camelCase, PascalCase, snake_case, kebab-case, spaces).

from strutils.format_cases import to_snake_case, to_camel_case, truncate, pad_center

to_snake_case("camelCaseText")    # "camel_case_text"
to_snake_case("Hello World!")     # "hello_world"

to_camel_case("hello_world")      # "helloWorld"
to_camel_case("PascalCaseText")   # "pascalCaseText"

to_pascal_case("kebab-case-text") # "KebabCaseText"
to_kebab_case("camelCaseText")    # "camel-case-text"
to_title_case("hello_world")      # "Hello World"

truncate("Hello, World!", 8)      # "Hello..."
truncate("Hi", 10)                # "Hi"

pad_center("hello", 11)           # "   hello   "
pad_center("hi", 10, "-")         # "----hi----"
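
One way to handle mixed input is to tokenize the text into lowercase words first and then join them per convention. The sketch below takes that approach; the regular expression and all function names are assumptions for illustration, not the library's code.

```python
import re

# Hypothetical tokenizer: acronym runs, capitalized or lowercase words, digit runs.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_words(text: str) -> list:
    """Split camelCase, PascalCase, snake_case, kebab-case, or spaced text."""
    return [word.lower() for word in _WORD_RE.findall(text)]


def snake(text: str) -> str:
    return "_".join(split_words(text))


def kebab(text: str) -> str:
    return "-".join(split_words(text))


def pascal(text: str) -> str:
    return "".join(word.capitalize() for word in split_words(text))


def camel(text: str) -> str:
    words = split_words(text)
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


def title(text: str) -> str:
    return " ".join(word.capitalize() for word in split_words(text))


print(snake("camelCaseText"))     # camel_case_text
print(camel("PascalCaseText"))    # pascalCaseText
print(pascal("kebab-case-text"))  # KebabCaseText
```

Because every converter shares the same tokenizer, supporting a new input style only requires changing split_words.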

strutils.generation

Functions that produce new strings from existing ones.

Function | Signature | Description
generate_slug | (text) → str | Convert to a URL-friendly slug
reverse_word | (text) → str | Reverse all characters
mask_range | (text, start_index, end_index, placeholder="X") → str | Mask a character range with a placeholder
ceasar_cipher | (text, shift) → str | Encrypt/decrypt with the Caesar cipher

from strutils.generation import generate_slug, mask_range, ceasar_cipher, reverse_word

generate_slug("Hello World!")               # "hello-world"
generate_slug("Python 3.11 -- Release Notes")  # "python-3-11-release-notes"

reverse_word("hello")                       # "olleh"

mask_range("1234-5678-9012", 5, 9, "*")     # "1234-****-9012"
mask_range("secret", -3, -1)               # "secXXt"

ceasar_cipher("Hello, World!", 3)           # "Khoor, Zruog!"
ceasar_cipher("Khoor, Zruog!", -3)          # "Hello, World!"  (decrypt)
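
The examples above suggest that mask_range treats its indices like a Python slice (start inclusive, end exclusive, negatives counted from the end) and that the cipher shifts only ASCII letters. The sketch below reproduces those outputs under these assumptions; the function names are hypothetical and the library's own logic may differ.

```python
import re


def slug(text: str) -> str:
    """Lowercase, turn every non-alphanumeric run into "-", trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mask(text: str, start: int, end: int, placeholder: str = "X") -> str:
    """Replace text[start:end] with one placeholder per masked character."""
    start, end, _ = slice(start, end).indices(len(text))
    if end <= start:
        return text  # assumption: an empty range leaves the text unchanged
    return text[:start] + placeholder * (end - start) + text[end:]


def caesar(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift`; leave every other character alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            out.append(ch)
            continue
        out.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(out)


print(slug("Python 3.11 -- Release Notes"))  # python-3-11-release-notes
print(mask("secret", -3, -1))                # secXXt
print(caesar("Hello, World!", 3))            # Khoor, Zruog!
```

Decryption is the same operation with the shift negated, since the modulo keeps the result inside the alphabet.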

Dependencies

Package | Required | Used by
re (stdlib) | Always | All modules
string (stdlib) | Always | cleaning
emoji | Optional | cleaning.remove_emojis only
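
Because emoji is optional, calling code may want to check for it before relying on emoji removal. The following is a hedged sketch of that guard pattern, not the library's own code: emoji_available and strip_emojis are hypothetical names, and the call to emoji.replace_emoji assumes version 2.0 or later of the emoji package.

```python
import importlib.util


def emoji_available() -> bool:
    """Report whether the optional emoji package can be imported."""
    return importlib.util.find_spec("emoji") is not None


def normalize(text: str) -> str:
    """Collapse whitespace runs to one space and strip both ends."""
    return " ".join(text.split())


def strip_emojis(text: str, replace: str = "") -> str:
    """Replace emoji characters, failing with a clear message if emoji is missing."""
    if not emoji_available():
        raise ImportError("emoji is not installed; run: pip install emoji")
    import emoji  # imported lazily so this module loads without the package

    # Assumption: emoji >= 2.0, which provides replace_emoji(string, replace).
    return normalize(emoji.replace_emoji(text, replace=replace))


print(emoji_available())
```

Importing lazily inside the function keeps the rest of a module usable when the optional package is absent.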

License

MIT License — see LICENSE for details.
