A powerful regex and NLP utility library for text validation, extraction, conversion, formatting, and pattern generation.
Project description
Rexa Usage Guide
Rexa is a Python library for regex operations and text preprocessing, supporting validation, extraction, conversion, formatting, pattern generation, and NLP tasks.
Getting Started
Install via pip:
pip install rexa
Import:
import rexa
Dependencies: Python 3.10+, regex, dateparser, pydantic, nltk, langdetect.
Source: GitHub | Issues
Validator (validation.py)
| Method | Description | Example |
|---|---|---|
| Is_Email(s) | Validates email format | rexa.Is_Email("user@example.com") → True |
| Match_Email(s) | Returns email Match object | rexa.Match_Email("bad@") → None |
| Is_URL(s) | Validates HTTP/HTTPS URLs | rexa.Is_URL("https://site.io") → True |
| Match_URL(s) | Matches URL, captures path/query | rexa.Match_URL("site.com/path") |
| Is_Date_ISO(s) | Validates YYYY-MM-DD format | rexa.Is_Date_ISO("2025-08-02") → True |
| Match_Date_ISO(s) | Matches ISO date | rexa.Match_Date_ISO("02/08/2025") → None |
| Is_Date_EU(s) | Validates DD/MM/YYYY format | rexa.Is_Date_EU("02/08/2025") → True |
| Match_Date_EU(s) | Matches EU date | rexa.Match_Date_EU("2025-08-02") → None |
| Is_Time(s) | Validates HH:MM(:SS) format | rexa.Is_Time("14:30") → True |
| Match_Time(s) | Matches time | rexa.Match_Time("14:30:45") |
| Is_IranianPhone(s) | Validates Iranian mobile numbers | rexa.Is_IranianPhone("09121234567") → True |
| Match_IranianPhone(s) | Matches Iranian phone | rexa.Match_IranianPhone("+989121234567") |
| Is_InternationalPhone(s) | Validates international phone numbers | rexa.Is_InternationalPhone("+123456789") → True |
| Match_InternationalPhone(s) | Matches international phone | rexa.Match_InternationalPhone("123456789") |
| Is_UUID4(s) | Validates UUID v4 format | rexa.Is_UUID4("123e4567-e89b-12d3-a456-426614174000") → True |
| Match_UUID4(s) | Matches UUID v4 | rexa.Match_UUID4("invalid-uuid") → None |
| Is_IranNationalCode(s) | Validates 10-digit Iranian code | rexa.Is_IranNationalCode("1234567890") → True |
| Match_IranNationalCode(s) | Matches Iranian code | rexa.Match_IranNationalCode("1234567890") |
| Is_ISBN10(s) | Validates ISBN-10 format | rexa.Is_ISBN10("123456789X") → True |
| Match_ISBN10(s) | Matches ISBN-10 | rexa.Match_ISBN10("1234567890") |
| Is_ISBN13(s) | Validates ISBN-13 format | rexa.Is_ISBN13("1234567890123") → True |
| Match_ISBN13(s) | Matches ISBN-13 | rexa.Match_ISBN13("1234567890123") |
| Is_HexColor(s) | Validates hex color (#RGB or #RRGGBB) | rexa.Is_HexColor("#1a2b3c") → True |
| Match_HexColor(s) | Matches hex color | rexa.Match_HexColor("#fff") |
| Is_ScientificNumber(s) | Validates scientific notation | rexa.Is_ScientificNumber("1.23e-4") → True |
| Match_ScientificNumber(s) | Matches scientific number | rexa.Match_ScientificNumber("1.23e-4") |
| Is_ThousandSepNumber(s) | Validates comma-separated numbers | rexa.Is_ThousandSepNumber("1,234,567") → True |
| Match_ThousandSepNumber(s) | Matches thousand-separated number | rexa.Match_ThousandSepNumber("1,234.56") |
| Is_Mention(s) | Validates @username format | rexa.Is_Mention("@user123") → True |
| Match_Mention(s) | Matches mention | rexa.Match_Mention("@user123") |
| Is_MarkdownLink(s) | Validates markdown link format | rexa.Is_MarkdownLink("text") → True |
| Match_MarkdownLink(s) | Matches markdown link | rexa.Match_MarkdownLink("text") |
| Is_HTMLTag(s) | Validates HTML tag format | rexa.Is_HTMLTag(" ") → True |
| Match_HTMLTag(s) | Matches HTML tag | rexa.Match_HTMLTag(" Hello ") |
| Is_MultipleEmails(s) | Validates comma-separated emails | rexa.Is_MultipleEmails("a@a.com,b@b.org") → True |
| Match_MultipleEmails(s) | Matches multiple emails | rexa.Match_MultipleEmails("a@a.com,b@b.org") |
| Is_Base64(s) | Validates base64 string | rexa.Is_Base64("SGVsbG8=") → True |
| Match_Base64(s) | Matches base64 string | rexa.Match_Base64("SGVsbG8=") |
Extractor (extraction.py)
| Method | Description | Example |
|---|---|---|
| Extract_Emails(text) | Extracts all email addresses | rexa.Extract_Emails("a@a.com b@b.org") → ["a@a.com", "b@b.org"] |
| Extract_URLs(text) | Extracts web links (http/https/ftp) | rexa.Extract_URLs("Go to http://x.com") → ["http://x.com"] |
| Extract_Dates(text) | Extracts ISO/EU dates | rexa.Extract_Dates("2021-01-01 or 01/01/2021") → ["2021-01-01", "01/01/2021"] |
| Extract_Phones(text) | Extracts phone numbers | rexa.Extract_Phones("+123456789, 09121234567") → ["+123456789", "09121234567"] |
| Extract_IPv4(text) | Extracts IPv4 addresses | rexa.Extract_IPv4("192.168.1.1") → ["192.168.1.1"] |
| Extract_UUIDs(text) | Extracts UUID v4 strings | rexa.Extract_UUIDs("123e4567-e89b-12d3-a456-426614174000") → ["123e4567-e89b-12d3-a456-426614174000"] |
| Extract_HexColors(text) | Extracts hex colors | rexa.Extract_HexColors("#fff #1a2b3c") → ["#fff", "#1a2b3c"] |
| Extract_ScientificNumbers(text) | Extracts scientific notation | rexa.Extract_ScientificNumbers("1.23e-4") → ["1.23e-4"] |
| Extract_ThousandSepNumbers(text) | Extracts comma-separated numbers | rexa.Extract_ThousandSepNumbers("1,234,567") → ["1,234,567"] |
| Extract_Mentions(text) | Extracts @usernames | rexa.Extract_Mentions("@user1 @user2") → ["@user1", "@user2"] |
| Extract_MarkdownLinks(text) | Extracts markdown links | rexa.Extract_MarkdownLinks("text") → ["text"] |
| Extract_HTMLTags(text) | Extracts HTML tags | rexa.Extract_HTMLTags(" Hello ") → ["", " "] |
| Extract_Base64(text) | Extracts base64 strings | rexa.Extract_Base64("SGVsbG8=") → ["SGVsbG8="] |
Converter (conversion.py)
| Method | Description | Example |
|---|---|---|
| Convert_MultipleSpaces(text) | Collapses multiple spaces | rexa.Convert_MultipleSpaces("A B") → "A B" |
| Convert_ThousandSeparatedNumbers(text) | Removes commas from numbers | rexa.Convert_ThousandSeparatedNumbers("1,234,567") → "1234567" |
| Convert_DateFormat(s, from_sep, to_sep) | Changes date separators | rexa.Convert_DateFormat("01.01.2025", ".", "/") → "01/01/2025" |
| Slugify(text) | Creates SEO-friendly slugs | rexa.Slugify("Hello World!") → "hello-world" |
Formatter (formatting.py)
| Method | Description | Example |
|---|---|---|
| Strip_HTMLTags(s) | Removes HTML tags | rexa.Strip_HTMLTags("Hi") → "Hi" |
| Normalize_Spaces(s) | Collapses whitespace to single spaces | rexa.Normalize_Spaces("A B") → "A B" |
| Remove_ThousandSeparators(s) | Removes commas from numbers | rexa.Remove_ThousandSeparators("1,234") → "1234" |
| Normalize_DateSeparator(s, sep) | Standardizes date delimiters | rexa.Normalize_DateSeparator("2021/01.01", "-") → "2021-01-01" |
Regex Mapper (regex_mapper.py)
| Method | Description | Example |
|---|---|---|
| string_to_regex(text, use_anchors=False, use_char_map=True, detailed=False) | Generates fuzzy regex pattern | rexa.string_to_regex("i'm") → "\b[iIíìîïİ1! |
| string_to_breakdown(text, use_char_map=True, detailed=False) | Breaks text into regex components | rexa.string_to_breakdown("i'm") → [{"type": "LETTER", "pattern": "[iIíìîïİ1! |
| get_all_patterns() | Returns predefined patterns | rexa.get_all_patterns() → {"email": "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,}$", ...} |
TextTools (texttools.py)
| Method | Description | Example |
|---|---|---|
| to_lower(s) | Converts to lowercase | rexa.to_lower("HELLO") → "hello" |
| to_upper(s) | Converts to uppercase | rexa.to_upper("hi") → "HI" |
| remove_emojis(s) | Removes Unicode emojis | rexa.remove_emojis("I ❤️ you") → "I you" |
| remove_numbers(s) | Removes digits | rexa.remove_numbers("a1b2") → "ab" |
| remove_usernames(s) | Removes @usernames | rexa.remove_usernames("@me hi") → " hi" |
| remove_urls_emails(s) | Removes URLs and emails | rexa.remove_urls_emails("a@b.com http://x") → " " |
| remove_punctuation(s) | Removes punctuation | rexa.remove_punctuation("Hey!?@") → "Hey" |
| remove_stopwords(s) | Removes common words | rexa.remove_stopwords("the cat sits") → "cat sits" |
| lemmatize_text(s) | Lemmatizes tokens | rexa.lemmatize_text("running") → "run" |
| stem_text(s) | Stems tokens | rexa.stem_text("running") → "run" |
| normalize_whitespace(s) | Collapses whitespace | rexa.normalize_whitespace(" A B\n") → "A B" |
| normalize_arabic(s) | Normalizes Persian/Arabic chars | rexa.normalize_arabic("كیف") → "کیف" |
| count_tokens(s) | Counts word tokens | rexa.count_tokens("a b c") → 3 |
| remove_short_long_words(s, min, max) | Keeps words in length range | rexa.remove_short_long_words("a bb ccc", 2, 3) → "bb ccc" |
| detect_language(s) | Detects text language | rexa.detect_language("hello") → "en" |
| clean_text(s, **kwargs) | Applies cleaning pipeline | rexa.clean_text("Hi @you 123 😊", lowercase=True, remove_emoji=True) → "hi" |
Quick Tips
- Combine Functions: Chain methods (e.g.,
clean_text→Extract_Emails). - Fuzzy Matching: Use
string_to_regex(use_char_map=True)for robust patterns. - Performance: Batch large texts with
map()or multiprocessing. - Extensibility: Subclass Rexa for custom patterns.
- Debugging: Enable logging via
utils.Logger.
Example Workflow
import rexa
text = "Contact: alice@example.com, bob@TEST.org Date: 2025-08-02 😊"
cleaned = rexa.clean_text(text, lowercase=True, remove_emoji=True)
emails = rexa.Extract_Emails(cleaned)
is_valid_date = rexa.Is_Date_ISO("2025-08-02")
pattern = rexa.string_to_regex("alice", use_char_map=True)
print(cleaned) # contact: alice@example.com, bob@test.org date: 2025-08-02
print(emails) # ['alice@example.com', 'bob@test.org']
print(is_valid_date) # True
print(pattern) # \b[aAáàâäãåā@4][lL1|][iIíìîïİ1!|][cCçćč][eEéèêëē3]\b
Feedback? Open an issue.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file rexa-1.0.4.tar.gz.
File metadata
- Download URL: rexa-1.0.4.tar.gz
- Upload date:
- Size: 29.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
953a0e085d732fb09cb0f02fd7a8d5e8316b468bece064bc15ab1a9678eba0e6
|
|
| MD5 |
32385e42ace082bd0725e54019694dcc
|
|
| BLAKE2b-256 |
fb600de79109bda8cdb222c614884e5c63de01e12d16c36463e2fa415f79bbe4
|
File details
Details for the file rexa-1.0.4-py3-none-any.whl.
File metadata
- Download URL: rexa-1.0.4-py3-none-any.whl
- Upload date:
- Size: 17.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
154a5f324773603372c67b4635726ead586a450b4e841352e2928a060aad4e0f
|
|
| MD5 |
8cac75fd1907b807aeaf1ce4c1384132
|
|
| BLAKE2b-256 |
03de88e6745ddd1c3501c0eca067b576bb63df5f1830974ecefef563246612ee
|