Skip to main content

A powerful regex utility library for validation, extraction, conversion, and formatting.

Project description

🎉 Rexa Usage Guide

Welcome to Rexa, your one-stop Python library for powerful regex operations and text preprocessing! Whether you’re building web scrapers, form validators, or NLP pipelines, this guide will help you harness Rexa’s full potential.


🔍 1. Validator (validation.py)

Validate and match common patterns with ease:

Method What it does Example
Is_Email(s) ✅ Returns True if s is a valid email Rex().Is_Email("user@example.com") # True
Match_Email(s) 🔎 Returns a Match object for valid email, else None Rex().Match_Email("bad@") # None
Is_URL(s) ✅ Validate HTTP/HTTPS URLs Rex().Is_URL("https://site.io") # True
Match_URL(s) 🔎 Match URL and capture path/query m = Rex().Match_URL("site.com/path")
Is_Date_ISO(s) ✅ Check YYYY-MM-DD date format Rex().Is_Date_ISO("2025-08-02") # True
Match_Date_ISO(s) 🔎 Capture ISO date, if present Rex().Match_Date_ISO("02/08/2025") # None

And many more Is_* / Match_* methods for phones, UUIDs, etc.


📥 2. Extractor (extraction.py)

Pull data out of messy text:

Method Extracts… Example
Extract_Emails(text) All email addresses Rex().Extract_Emails("a@a.com b@b.org") # [a@a.com,b@b.org]
Extract_URLs(text) All web links Rex().Extract_URLs("Go to http://x.com") # [http://x.com]
Extract_Dates(text) Dates in ISO/EU formats Rex().Extract_Dates("2021-01-01 or 01/01/2021")
Extract_Phones(text) Phone numbers (intl & local) Rex().Extract_Phones("+123456789, 09121234567")

…plus IPv4, UUIDs, and more.


🔄 3. Converter (conversion.py)

Normalize and reformat strings:

Method Transforms… Example
Convert_MultipleSpaces(text) Collapse extra spaces Rex().Convert_MultipleSpaces("A B")"A B"
Convert_ThousandSeparatedNumbers(text) Strip commas from large numbers Rex().Convert_ThousandSeparatedNumbers("1,000,000")"1000000"
Convert_DateFormat(s,from,to) Swap date separators Rex().Convert_DateFormat("01.01.2025",".","/")"01/01/2025"
Slugify(text) Generate SEO-friendly URL slugs Rex().Slugify("Hello World!")"hello-world"

✨ 4. Formatter (formatting.py)

Clean up and standardize:

Method Cleans… Example
Strip_HTMLTags(s) Remove HTML tags Rex().Strip_HTMLTags("<b>Hi</b>")"Hi"
Normalize_Spaces(s) Single-space normalization Rex().Normalize_Spaces("A B")"A B"
Remove_ThousandSeparators(s) Drop commas in numbers Rex().Remove_ThousandSeparators("1,234")"1234"
Normalize_DateSeparator(s,sep) Consistent date delimiter Rex().Normalize_DateSeparator("2021/01.01","-")"2021-01-01"

🧹 5. TextTools (texttools.py)

Advanced NLP & text cleaning utilities:

Method Description Example
to_lower(s) Lowercase entire string TextTools.to_lower("HELLO")"hello"
to_upper(s) Uppercase entire string TextTools.to_upper("hi")"HI"
remove_emojis(s) Strip Unicode emojis TextTools.remove_emojis("I ❤️ you")"I you"
remove_numbers(s) Remove all digits TextTools.remove_numbers("a1b2")"ab"
remove_usernames(s) Remove @username tokens TextTools.remove_usernames("@me hi")" hi"
remove_punctuation(s) Strip punctuation & symbols TextTools.remove_punctuation("Hey!?@")"Hey"
remove_urls_emails(s) Drop URLs & email addresses TextTools.remove_urls_emails("a@b.com http://x")" "
remove_stopwords(s) Filter common words (using NLTK) TextTools.remove_stopwords("the cat sits")"cat sits"
lemmatize_text(s) Lemmatize tokens TextTools.lemmatize_text("running")"running"
stem_text(s) Stem tokens TextTools.stem_text("running")"run"
normalize_whitespace(s) Collapse whitespace TextTools.normalize_whitespace(" A B\n")"A B"
normalize_arabic(s) Persian/Arabic char mapping & diacritics TextTools.normalize_arabic("كیف")"کیف"
count_tokens(s) Count word tokens TextTools.count_tokens("a b c")3
remove_short_long_words(s,min,max) Keep words len in range TextTools.remove_short_long_words("a bb ccc",2,3)"bb ccc"
detect_language(s) Auto-detect text language TextTools.detect_language("hello")"en"
clean_text(...kwargs) Pipeline for common cleaning options TextTools.clean_text("Hi @you 123 😊", lowercase=True, remove_emoji=True, remove_username=True, remove_urls_emails=True, remove_punct=True)"hi"

🚀 Quick Tips

  • Mix & Match: Call only the methods you need or use clean_text for a one-shot pipeline.
  • Extendable: Create subclasses to add domain-specific patterns.
  • Performance: For bulk text, parallelize tokenization and regex calls.

Happy coding with Rexa! Questions or feedback? Open an issue at https://github.com/arshia82sbn/rexa/issues

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rexa-0.2.2.tar.gz (13.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rexa-0.2.2-py3-none-any.whl (10.5 kB view details)

Uploaded Python 3

File details

Details for the file rexa-0.2.2.tar.gz.

File metadata

  • Download URL: rexa-0.2.2.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for rexa-0.2.2.tar.gz
Algorithm Hash digest
SHA256 0bbc1aecfeea7cd009d8874401c61ff6d087634da271e189b3ef8df8fa2157df
MD5 15af6a69f7730aa51d39193acde7f0be
BLAKE2b-256 e1e47bad25235ef35cd328ed1a7a26a1c63e1a03bada27c469239ce85809c3f4

See more details on using hashes here.

File details

Details for the file rexa-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: rexa-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 10.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for rexa-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3d31c09dbfe61ce53e45ed8327c4f266e999af11c29165fa33e0283aabfc90c5
MD5 f58422061768da7336118cecbf9a7a6e
BLAKE2b-256 9f30781507c07258f36ecd0a285f14312dc9845176cff028c3ac042f65bfb6bb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page