A powerful regex utility library for validation, extraction, conversion, and formatting.
🎉 Rexa Usage Guide
Welcome to Rexa, your one-stop Python library for powerful regex operations and text preprocessing! Whether you’re building web scrapers, form validators, or NLP pipelines, this guide will help you harness Rexa’s full potential.
🔍 1. Validator (validation.py)
Validate and match common patterns with ease:
| Method | What it does | Example |
|---|---|---|
| `Is_Email(s)` | ✅ Returns `True` if `s` is a valid email | `Rex().Is_Email("user@example.com")  # True` |
| `Match_Email(s)` | 🔎 Returns a Match object for a valid email, else `None` | `Rex().Match_Email("bad@")  # None` |
| `Is_URL(s)` | ✅ Validate HTTP/HTTPS URLs | `Rex().Is_URL("https://site.io")  # True` |
| `Match_URL(s)` | 🔎 Match URL and capture path/query | `m = Rex().Match_URL("site.com/path")` |
| `Is_Date_ISO(s)` | ✅ Check YYYY-MM-DD date format | `Rex().Is_Date_ISO("2025-08-02")  # True` |
| `Match_Date_ISO(s)` | 🔎 Capture ISO date, if present | `Rex().Match_Date_ISO("02/08/2025")  # None` |
And many more Is_* / Match_* methods for phones, UUIDs, etc.
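The difference between an `Is_*` check (boolean) and a `Match_*` call (Match object or `None`) can be illustrated with the standard `re` module. This is a minimal sketch, not Rexa's implementation: the patterns and the helper names `is_email` and `match_iso_date` are assumptions chosen for illustration, and the email pattern is deliberately simple.

```python
import re

# Simplified patterns for illustration only; Rexa's own patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ISO_DATE_RE = re.compile(
    r"(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])"
)


def is_email(s: str) -> bool:
    """Boolean check: the whole string must look like an email address."""
    return EMAIL_RE.fullmatch(s) is not None


def match_iso_date(s: str):
    """Return a Match object with named groups, or None if s is not YYYY-MM-DD."""
    return ISO_DATE_RE.fullmatch(s)


print(is_email("user@example.com"))  # True
print(is_email("bad@"))              # False

m = match_iso_date("2025-08-02")
print(m.group("year"), m.group("month"), m.group("day"))  # 2025 08 02
print(match_iso_date("02/08/2025"))  # None
```

Using `fullmatch` rather than `search` keeps the check strict: a string that merely contains a valid pattern somewhere inside it does not pass.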
📥 2. Extractor (extraction.py)
Pull data out of messy text:
| Method | Extracts… | Example |
|---|---|---|
| `Extract_Emails(text)` | All email addresses | `Rex().Extract_Emails("a@a.com b@b.org")  # [a@a.com,b@b.org]` |
| `Extract_URLs(text)` | All web links | `Rex().Extract_URLs("Go to http://x.com")  # [http://x.com]` |
| `Extract_Dates(text)` | Dates in ISO/EU formats | `Rex().Extract_Dates("2021-01-01 or 01/01/2021")` |
| `Extract_Phones(text)` | Phone numbers (intl & local) | `Rex().Extract_Phones("+123456789, 09121234567")` |
…plus IPv4, UUIDs, and more.
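Extraction of this kind usually comes down to `re.findall` over a compiled pattern. The sketch below shows the idea with the standard library only. The patterns and the names `extract_emails` and `extract_urls` are assumptions for illustration, not Rexa's actual code.

```python
import re

# Simplified patterns for illustration only; Rexa's own patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_emails(text: str) -> list[str]:
    """Return every substring that looks like an email address, in order."""
    return EMAIL_RE.findall(text)


def extract_urls(text: str) -> list[str]:
    """Return every http/https link, in order of appearance."""
    return URL_RE.findall(text)


print(extract_emails("a@a.com b@b.org"))   # ['a@a.com', 'b@b.org']
print(extract_urls("Go to http://x.com"))  # ['http://x.com']
```

Note that a URL pattern this loose keeps trailing punctuation (for example a sentence-ending period), so real-world use typically needs a post-processing step.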
🔄 3. Converter (conversion.py)
Normalize and reformat strings:
| Method | Transforms… | Example |
|---|---|---|
| `Convert_MultipleSpaces(text)` | Collapse extra spaces | `Rex().Convert_MultipleSpaces("A   B")` → `"A B"` |
| `Convert_ThousandSeparatedNumbers(text)` | Strip commas from large numbers | `Rex().Convert_ThousandSeparatedNumbers("1,000,000")` → `"1000000"` |
| `Convert_DateFormat(s,from,to)` | Swap date separators | `Rex().Convert_DateFormat("01.01.2025",".","/")` → `"01/01/2025"` |
| `Slugify(text)` | Generate SEO-friendly URL slugs | `Rex().Slugify("Hello World!")` → `"hello-world"` |
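
Each of these conversions is a single `re.sub` call at heart. The following sketch reproduces three of the table's examples with the standard library. The function names and patterns are assumptions for illustration and are not taken from Rexa's source.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace runs of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


def strip_thousand_separators(text: str) -> str:
    """Drop a comma only when it sits between a digit and a group of exactly three digits."""
    return re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)


def slugify(text: str) -> str:
    """Lowercase, turn every run of non-alphanumerics into '-', trim the ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


print(collapse_spaces("A   B"))                # A B
print(strip_thousand_separators("1,000,000"))  # 1000000
print(slugify("Hello World!"))                 # hello-world
```

The lookbehind and lookahead in `strip_thousand_separators` leave ordinary commas in prose untouched, so `"a, b"` passes through unchanged.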
✨ 4. Formatter (formatting.py)
Clean up and standardize:
| Method | Cleans… | Example |
|---|---|---|
| `Strip_HTMLTags(s)` | Remove HTML tags | `Rex().Strip_HTMLTags("<b>Hi</b>")` → `"Hi"` |
| `Normalize_Spaces(s)` | Single-space normalization | `Rex().Normalize_Spaces("A   B")` → `"A B"` |
| `Remove_ThousandSeparators(s)` | Drop commas in numbers | `Rex().Remove_ThousandSeparators("1,234")` → `"1234"` |
| `Normalize_DateSeparator(s,sep)` | Consistent date delimiter | `Rex().Normalize_DateSeparator("2021/01.01","-")` → `"2021-01-01"` |
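
As a rough illustration of how tag stripping and separator normalization can work, here is a standard-library sketch. The names `strip_html_tags` and `normalize_date_separator` and both patterns are assumptions, not Rexa's implementation. A regex like this is fine for simple markup but is not a substitute for an HTML parser on untrusted input.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
# Three numeric groups joined by '.', '/' or '-' (mixed separators allowed).
DATE_RE = re.compile(r"\b(\d{1,4})[./-](\d{1,2})[./-](\d{1,4})\b")


def strip_html_tags(s: str) -> str:
    """Remove anything that looks like an HTML tag, keeping the text between tags."""
    return TAG_RE.sub("", s)


def normalize_date_separator(s: str, sep: str = "-") -> str:
    """Rewrite date-like triples so that all of them use the same separator."""
    return DATE_RE.sub(lambda m: sep.join(m.groups()), s)


print(strip_html_tags("<b>Hi</b>"))                 # Hi
print(normalize_date_separator("2021/01.01", "-"))  # 2021-01-01
print(normalize_date_separator("pi is 3.14", "-"))  # pi is 3.14
```

Requiring three numeric groups is a design choice in this sketch: it keeps plain decimals such as `3.14` from being mistaken for dates.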
🧹 5. TextTools (texttools.py)
Advanced NLP & text cleaning utilities:
| Method | Description | Example |
|---|---|---|
| `to_lower(s)` | Lowercase entire string | `TextTools.to_lower("HELLO")` → `"hello"` |
| `to_upper(s)` | Uppercase entire string | `TextTools.to_upper("hi")` → `"HI"` |
| `remove_emojis(s)` | Strip Unicode emojis | `TextTools.remove_emojis("I ❤️ you")` → `"I you"` |
| `remove_numbers(s)` | Remove all digits | `TextTools.remove_numbers("a1b2")` → `"ab"` |
| `remove_usernames(s)` | Remove @username tokens | `TextTools.remove_usernames("@me hi")` → `" hi"` |
| `remove_punctuation(s)` | Strip punctuation & symbols | `TextTools.remove_punctuation("Hey!?@")` → `"Hey"` |
| `remove_urls_emails(s)` | Drop URLs & email addresses | `TextTools.remove_urls_emails("a@b.com http://x")` → `" "` |
| `remove_stopwords(s)` | Filter common words (using NLTK) | `TextTools.remove_stopwords("the cat sits")` → `"cat sits"` |
| `lemmatize_text(s)` | Lemmatize tokens | `TextTools.lemmatize_text("running")` → `"running"` |
| `stem_text(s)` | Stem tokens | `TextTools.stem_text("running")` → `"run"` |
| `normalize_whitespace(s)` | Collapse whitespace | `TextTools.normalize_whitespace(" A   B\n")` → `"A B"` |
| `normalize_arabic(s)` | Persian/Arabic char mapping & diacritics | `TextTools.normalize_arabic("كیف")` → `"کیف"` |
| `count_tokens(s)` | Count word tokens | `TextTools.count_tokens("a b c")` → `3` |
| `remove_short_long_words(s,min,max)` | Keep only words whose length is within the range | `TextTools.remove_short_long_words("a bb ccc",2,3)` → `"bb ccc"` |
| `detect_language(s)` | Auto-detect text language | `TextTools.detect_language("hello")` → `"en"` |
| `clean_text(...kwargs)` | Pipeline for common cleaning options | `TextTools.clean_text("Hi @you 123 😊", lowercase=True, remove_emoji=True, remove_username=True, remove_urls_emails=True, remove_punct=True)` → `"hi"` |
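
To make the idea of a flag-driven cleaning pipeline concrete, here is a standard-library sketch of how such a function could be structured. It is not Rexa's `clean_text`: the patterns, the step order, and the `remove_numbers` flag are assumptions for illustration, and the emoji range covers only common blocks. Its output may differ from the library's.

```python
import re
import string

# Illustrative patterns only; Rexa's own patterns may differ.
URL_EMAIL_RE = re.compile(r"https?://\S+|\S+@\S+\.\S+")
USERNAME_RE = re.compile(r"@\w+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(
    text: str,
    lowercase: bool = False,
    remove_emoji: bool = False,
    remove_username: bool = False,
    remove_urls_emails: bool = False,
    remove_punct: bool = False,
    remove_numbers: bool = False,  # hypothetical flag, not confirmed by the guide
) -> str:
    # URLs and emails go first so the username step cannot cut "a@b.com" in half.
    if remove_urls_emails:
        text = URL_EMAIL_RE.sub(" ", text)
    if remove_username:
        text = USERNAME_RE.sub(" ", text)
    if remove_emoji:
        text = EMOJI_RE.sub(" ", text)
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)
    if remove_punct:
        text = text.translate(PUNCT_TABLE)
    if lowercase:
        text = text.lower()
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


flags = dict(lowercase=True, remove_emoji=True, remove_username=True,
             remove_urls_emails=True, remove_punct=True)
print(clean_text("Hi @you 123 😊", **flags))                       # hi 123
print(clean_text("Hi @you 123 😊", remove_numbers=True, **flags))  # hi
```

In this sketch digits survive unless `remove_numbers` is set, and the order of steps matters: removing punctuation before usernames would strip the `@` and leave the name behind.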
🚀 Quick Tips
- Mix & Match: Call only the methods you need, or use `clean_text` for a one-shot pipeline.
- Extendable: Create subclasses to add domain-specific patterns.
- Performance: For bulk text, parallelize tokenization and regex calls.
Happy coding with Rexa! Questions or feedback? Open an issue at https://github.com/arshia82sbn/rexa/issues