A powerful regex and NLP utility library for text validation, extraction, conversion, formatting, and pattern generation.

🎉 Rexa Usage Guide

Welcome to Rexa, a Python library for regex operations and text preprocessing! Whether you're validating emails, extracting URLs, generating fuzzy regex patterns, or cleaning text for NLP pipelines, Rexa has you covered. This guide walks through each module with short examples.

🚀 Getting Started

Install Rexa via pip:

    pip install rexa

Import and use:

    from rexa import Rex

    rex = Rex()

    # Example: Validate an email
    print(rex.validator.Is_Email("user@example.com"))  # True

    # Example: Generate fuzzy regex
    print(rex.mapper.string_to_regex("i'm"))  # \b[iIíìîïİ1!|]['‘’][mM]\b

Dependencies: Requires Python 3.10+, regex, dateparser, pydantic, nltk, langdetect.

Source: https://github.com/arshia82sbn/rexa | Issues: https://github.com/arshia82sbn/rexa/issues

🔍 1. Validator (validation.py)

Validate and match common patterns with ease:

| Method | Description | Example |
| --- | --- | --- |
| `Is_Email(s)` | ✅ Checks whether `s` is a valid email address | `rex.validator.Is_Email("user@example.com")` → `True` |
| `Match_Email(s)` | 🔎 Returns a Match object for an email, else `None` | `rex.validator.Match_Email("bad@")` → `None` |
| `Is_URL(s)` | ✅ Validates HTTP/HTTPS URLs | `rex.validator.Is_URL("https://site.io")` → `True` |
| `Match_URL(s)` | 🔎 Matches a URL and captures its path/query | `m = rex.validator.Match_URL("site.com/path")` |
| `Is_Date_ISO(s)` | ✅ Checks the YYYY-MM-DD date format | `rex.validator.Is_Date_ISO("2025-08-02")` → `True` |
| `Match_Date_ISO(s)` | 🔎 Captures an ISO date, if present | `rex.validator.Match_Date_ISO("02/08/2025")` → `None` |
| `Is_IranianPhone(s)` | ✅ Validates Iranian mobile numbers | `rex.validator.Is_IranianPhone("09121234567")` → `True` |
| `Is_UUID4(s)` | ✅ Checks the UUID v4 format | `rex.validator.Is_UUID4("123e4567-e89b-42d3-a456-426614174000")` → `True` |
| And more | Methods for time, ISBN, hex colors, etc. | See `validation.py` for the full list |
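The `Is_*` / `Match_*` pairing above can be illustrated with the standard library alone. The sketch below is not Rexa's implementation; the function names `match_date_iso`, `is_date_iso`, and `is_uuid4` and the regex are assumptions made for illustration. It shows one plausible way a boolean check can be layered on top of a match function, and why a UUID's version field matters for a v4 check.

```python
import re
import uuid
from typing import Optional

# Hypothetical stand-ins for an Is_*/Match_* pair; not taken from Rexa's source.
# Format check only: it does not reject impossible dates such as 2025-02-31.
ISO_DATE = re.compile(
    r"(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])"
)


def match_date_iso(s: str) -> Optional[re.Match]:
    """Return a Match when the whole string is YYYY-MM-DD, else None."""
    return ISO_DATE.fullmatch(s)


def is_date_iso(s: str) -> bool:
    """Boolean wrapper around the match function."""
    return match_date_iso(s) is not None


def is_uuid4(s: str) -> bool:
    """True only when s parses as a UUID whose version field is 4."""
    try:
        return uuid.UUID(s).version == 4
    except ValueError:
        return False


m = match_date_iso("2025-08-02")
print(m.group("year"), m.group("month"), m.group("day"))
print(is_date_iso("02/08/2025"))
print(is_uuid4("123e4567-e89b-42d3-a456-426614174000"))
```

Keeping the match function as the single source of truth means the boolean variant can never disagree with it, and callers that need the captured groups do not have to run the pattern twice.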

📥 2. Extractor (extraction.py)

Extract specific patterns from text:

| Method | Extracts… | Example |
| --- | --- | --- |
| `Extract_Emails(text)` | All email addresses | `rex.extractor.Extract_Emails("a@a.com b@b.org")` → `["a@a.com", "b@b.org"]` |
| `Extract_URLs(text)` | All web links (http/https/ftp) | `rex.extractor.Extract_URLs("Go to http://x.com")` → `["http://x.com"]` |
| `Extract_Dates(text)` | Dates in ISO/EU formats | `rex.extractor.Extract_Dates("2021-01-01 or 01/01/2021")` → `["2021-01-01", "01/01/2021"]` |
| `Extract_Phones(text)` | Phone numbers (international and local) | `rex.extractor.Extract_Phones("+123456789, 09121234567")` → `["+123456789", "09121234567"]` |
| `Extract_IPv4(text)` | IPv4 addresses | `rex.extractor.Extract_IPv4("192.168.1.1")` → `["192.168.1.1"]` |
| `Extract_UUIDs(text)` | UUID v4 strings | `rex.extractor.Extract_UUIDs("123e4567-e89b-42d3-a456-426614174000")` → `["123e4567-e89b-42d3-a456-426614174000"]` |
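For readers curious how list-returning extractors like these can work, here is a minimal sketch built on `re.findall`. It is an illustration only: the patterns and the names `extract_emails` and `extract_ipv4` are assumptions, and Rexa's own patterns may be stricter or looser.

```python
import re

# Illustrative patterns; assumed for this sketch, not copied from Rexa.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# One octet is 0-255; non-capturing groups keep findall returning whole matches.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")


def extract_emails(text: str) -> list[str]:
    """Return every email-like substring, in order of appearance."""
    return EMAIL.findall(text)


def extract_ipv4(text: str) -> list[str]:
    """Return every dotted quad whose octets are all in 0-255."""
    return IPV4.findall(text)


print(extract_emails("a@a.com b@b.org"))
print(extract_ipv4("gateway at 192.168.1.1"))
```

Unlike the validators, extraction patterns are unanchored, so they find matches anywhere in the input; the word boundaries in the IPv4 pattern keep it from matching inside longer digit runs.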

🔄 3. Converter (conversion.py)

Normalize and reformat strings:

| Method | Description | Example |
| --- | --- | --- |
| `Convert_MultipleSpaces(text)` | Collapses runs of spaces into one | `rex.converter.Convert_MultipleSpaces("A    B")` → `"A B"` |
| `Convert_ThousandSeparatedNumbers(text)` | Strips commas from large numbers | `rex.converter.Convert_ThousandSeparatedNumbers("1,000,000")` → `"1000000"` |
| `Convert_DateFormat(s, from, to)` | Swaps date separators | `rex.converter.Convert_DateFormat("01.01.2025", ".", "/")` → `"01/01/2025"` |
| `Slugify(text)` | Generates SEO-friendly URL slugs | `rex.converter.Slugify("Hello World!")` → `"hello-world"` |
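The conversions above are small regex substitutions at heart. The following standard-library sketch shows one way slug generation and thousand-separator removal can be written; the function names and the exact rules (ASCII-only slugs, commas removed only before a group of exactly three digits) are assumptions for illustration, not Rexa's documented behavior.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase, drop accents, and join alphanumeric runs with hyphens."""
    # NFKD splits accented letters into base letter + combining mark,
    # and the ASCII round-trip then discards the marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def strip_thousand_separators(text: str) -> str:
    """Remove commas that sit between a digit and a group of exactly three digits."""
    return re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)


print(slugify("Hello World!"))
print(strip_thousand_separators("1,000,000"))
```

The lookarounds in `strip_thousand_separators` leave ordinary commas alone, so a sentence such as "in 2024, 15 users" is not altered.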

✨ 4. Formatter (formatting.py)

Clean and standardize text:

| Method | Description | Example |
| --- | --- | --- |
| `Strip_HTMLTags(s)` | Removes HTML tags | `rex.formatter.Strip_HTMLTags("<b>Hi</b>")` → `"Hi"` |
| `Normalize_Spaces(s)` | Normalizes to single spaces | `rex.formatter.Normalize_Spaces("A    B")` → `"A B"` |
| `Remove_ThousandSeparators(s)` | Drops commas in numbers | `rex.formatter.Remove_ThousandSeparators("1,234")` → `"1234"` |
| `Normalize_DateSeparator(s, sep)` | Standardizes date delimiters | `rex.formatter.Normalize_DateSeparator("2021/01.01", "-")` → `"2021-01-01"` |
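As a rough picture of what tag stripping and separator normalization involve, the sketch below uses `html.parser` and `re` from the standard library. It is an assumption-laden illustration: the helper names are hypothetical, and Rexa may well use a plain regex for tags rather than a parser.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html_tags(s: str) -> str:
    parser = _TextOnly()
    parser.feed(s)
    parser.close()  # flush any text still buffered
    return "".join(parser.parts)


# Three digit groups split by ".", "/" or "-"; a lone decimal like 3.14 is left alone.
DATE_LIKE = re.compile(r"\b(\d{1,4})[./-](\d{1,2})[./-](\d{1,4})\b")


def normalize_date_separator(s: str, sep: str) -> str:
    # A function replacement avoids backslash surprises if sep is unusual.
    return DATE_LIKE.sub(lambda m: sep.join(m.groups()), s)


print(strip_html_tags("<b>Hi</b>"))
print(normalize_date_separator("2021/01.01", "-"))
```

Using a parser instead of a `<[^>]+>` regex costs a few more lines but also decodes entities, which matters when the cleaned text feeds later NLP steps.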

🧠 5. Regex Mapper (regex_mapper.py)

Generate fuzzy regex patterns and analyze text structure:

  • `string_to_regex(text, use_anchors=False, use_char_map=True, detailed=False)`: Generates a fuzzy regex pattern for `text`. Set `use_anchors=True` to wrap the pattern in `^...$`, `use_char_map=True` for fuzzy matching (e.g., `i` → `[iIíìîïİ1!|]`), and `detailed=True` to also return validation info. Example: `rex.mapper.string_to_regex("i'm")` → `\b[iIíìîïİ1!|]['‘’][mM]\b`

  • `string_to_breakdown(text, use_char_map=True, detailed=False)`: Breaks `text` down into regex components with metadata (type, pattern, count). `detailed=True` adds Unicode categories and variant matches. Example: `rex.mapper.string_to_breakdown("i'm")` → `[{"type": "LETTER", "name": "fuzzy 'i'", "pattern": "[iIíìîïİ1!|]", ...}, ...]`

  • `get_all_patterns()`: Returns a dictionary of predefined patterns (email, URL, etc.). Example: `rex.mapper.get_all_patterns()` → `{"email": "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$", ...}`

Example with Detailed Output:

    # Detailed regex
    pattern, details = rex.mapper.string_to_regex("i'm", detailed=True)
    print(pattern)  # \b[iIíìîïİ1!|]['‘’][mM]\b
    print(details)  # {'valid': True, 'score': 0.75, 'sample_matches': ["i'm", "I'm", "1'm"], 'explanation': 'Pattern matches original: True...'}

    # Detailed breakdown
    breakdown = rex.mapper.string_to_breakdown("i'm", detailed=True)
    print(breakdown[0])  # {'type': 'LETTER', 'name': "fuzzy 'i'", 'pattern': '[iIíìîïİ1!|]', 'count': 1, 'example_chars': ['i'], 'unicode_category': 'Ll', 'variant_matches': ['i', 'I', '1']}
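To make the character-map idea concrete, here is a small sketch of fuzzy pattern generation. The `CHAR_MAP` table and the function `string_to_fuzzy_regex` are hypothetical and far smaller than whatever Rexa ships; the point is only to show how each input character can expand into a character class of look-alikes.

```python
import re

# Hypothetical, deliberately tiny character map; Rexa's real map is assumed to be larger.
CHAR_MAP = {
    "i": "iIíìîïİ1!|",
    "m": "mM",
    "'": "'‘’",
}


def string_to_fuzzy_regex(text: str, use_anchors: bool = False) -> str:
    """Expand each mapped character into a class of look-alikes."""
    parts = []
    for ch in text:
        variants = CHAR_MAP.get(ch.lower())
        if variants is None:
            # Unmapped characters match only themselves.
            parts.append(re.escape(ch))
        else:
            parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    body = "".join(parts)
    return f"^{body}$" if use_anchors else rf"\b{body}\b"


pattern = string_to_fuzzy_regex("i'm")
for candidate in ["i'm", "I'm", "1’m", "him"]:
    print(candidate, bool(re.search(pattern, candidate)))
```

Because `re.escape` is applied to every variant, the generated class may contain a few redundant backslashes (for example before `|`), which does not change what it matches.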

🧹 6. TextTools (texttools.py)

Advanced NLP and text cleaning utilities:

| Method | Description | Example |
| --- | --- | --- |
| `to_lower(s)` | Lowercases the entire string | `rex.texttools.to_lower("HELLO")` → `"hello"` |
| `to_upper(s)` | Uppercases the entire string | `rex.texttools.to_upper("hi")` → `"HI"` |
| `remove_emojis(s)` | Strips Unicode emojis | `rex.texttools.remove_emojis("I ❤️ you")` → `"I  you"` |
| `remove_numbers(s)` | Removes all digits | `rex.texttools.remove_numbers("a1b2")` → `"ab"` |
| `remove_usernames(s)` | Removes @username tokens | `rex.texttools.remove_usernames("@me hi")` → `" hi"` |
| `remove_punctuation(s)` | Strips punctuation and symbols | `rex.texttools.remove_punctuation("Hey!?@")` → `"Hey"` |
| `remove_urls_emails(s)` | Drops URLs and email addresses | `rex.texttools.remove_urls_emails("a@b.com http://x")` → `" "` |
| `remove_stopwords(s)` | Filters common words (using NLTK) | `rex.texttools.remove_stopwords("the cat sits")` → `"cat sits"` |
| `lemmatize_text(s)` | Lemmatizes tokens | `rex.texttools.lemmatize_text("running")` → `"running"` |
| `stem_text(s)` | Stems tokens | `rex.texttools.stem_text("running")` → `"run"` |
| `normalize_whitespace(s)` | Collapses whitespace | `rex.texttools.normalize_whitespace("  A   B\n")` → `"A B"` |
| `normalize_arabic(s)` | Persian/Arabic character mapping and diacritics | `rex.texttools.normalize_arabic("كیف")` → `"کیف"` |
| `count_tokens(s)` | Counts word tokens | `rex.texttools.count_tokens("a b c")` → `3` |
| `remove_short_long_words(s, min, max)` | Keeps words within a length range | `rex.texttools.remove_short_long_words("a bb ccc", 2, 3)` → `"bb ccc"` |
| `detect_language(s)` | Auto-detects the text language | `rex.texttools.detect_language("hello")` → `"en"` |
| `clean_text(...kwargs)` | Pipeline for common cleaning options | `rex.texttools.clean_text("Hi @you 123 😊", lowercase=True, remove_emoji=True, remove_username=True, remove_urls_emails=True, remove_punct=True)` → `"hi"` |
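A flag-driven pipeline like `clean_text` mostly comes down to applying substitutions in a sensible order. The sketch below is a standard-library approximation, not Rexa's code: the regexes are rough, the emoji ranges are not exhaustive, and the `remove_numbers` flag is an assumption added here to show how digits could be handled.

```python
import re

# Rough patterns assumed for this sketch.
URL_OR_EMAIL = re.compile(r"https?://\S+|\S+@\S+\.\S+")
USERNAME = re.compile(r"@\w+")
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")  # not exhaustive
DIGITS = re.compile(r"\d+")
PUNCT = re.compile(r"[^\w\s]")


def clean_text(
    text: str,
    *,
    lowercase: bool = False,
    remove_emoji: bool = False,
    remove_username: bool = False,
    remove_urls_emails: bool = False,
    remove_punct: bool = False,
    remove_numbers: bool = False,  # hypothetical flag, not confirmed by the guide
) -> str:
    # Order matters: emails go before usernames so "@example" inside an address
    # is not mistaken for a handle, and punctuation goes last so "@" and "://"
    # are still present when the earlier patterns run.
    if remove_urls_emails:
        text = URL_OR_EMAIL.sub(" ", text)
    if remove_username:
        text = USERNAME.sub(" ", text)
    if remove_emoji:
        text = EMOJI.sub(" ", text)
    if remove_numbers:
        text = DIGITS.sub(" ", text)
    if remove_punct:
        text = PUNCT.sub(" ", text)
    if lowercase:
        text = text.lower()
    return " ".join(text.split())  # collapse leftover whitespace


print(clean_text("Hi @you 123 😊", lowercase=True, remove_emoji=True,
                 remove_username=True, remove_punct=True, remove_numbers=True))
```

In this sketch digits survive unless `remove_numbers=True` is passed, so each flag controls exactly one kind of token.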

🚀 Quick Tips

  • Mix & Match: Combine methods for complex workflows, e.g., rex.texttools.clean_text() followed by rex.extractor.Extract_Emails().
  • Fuzzy Matching: Use string_to_regex with use_char_map=True for robust pattern generation (e.g., matches i'm, I'm, 1'm).
  • Performance: For large texts, batch process with map() or parallelize with multiprocessing.
  • Extensibility: Subclass Rex to add custom patterns or preprocessing steps.
  • Debugging: Enable logging (via utils.Logger) for detailed insights into regex generation and validation.
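The performance tip can be sketched as follows. To stay runnable without Rexa installed, the example uses a stand-in `extract_emails` function built on `re`; that function, the `extract_all` helper, and the `workers` and `chunksize` values are assumptions for illustration. The same structure would apply to any picklable, module-level function.

```python
import re
from multiprocessing import Pool

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Stand-in for a per-document extraction step (not Rexa's function)."""
    return EMAIL.findall(text)


def extract_all(texts: list[str], workers: int = 1) -> list[list[str]]:
    """Process documents serially with map(), or in parallel with a process pool."""
    if workers <= 1:
        return list(map(extract_emails, texts))
    # The worker function must be defined at module level so it can be pickled.
    with Pool(processes=workers) as pool:
        return pool.map(extract_emails, texts, chunksize=64)


docs = ["a@a.com", "nothing here", "x@y.org and z@w.net"]
print(extract_all(docs))
```

When using `workers > 1` in a script, call `extract_all` from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn new interpreter processes. Parallelism tends to pay off only when each document is large enough to outweigh the cost of sending it to a worker.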

📚 Example Workflow

Clean and validate user input:

    from rexa import Rex

    rex = Rex()

    text = "Contact: alice@example.com, bob@TEST.org Date: 2025-08-02 😊"
    cleaned = rex.texttools.clean_text(text, lowercase=True, remove_emoji=True)
    emails = rex.extractor.Extract_Emails(cleaned)
    is_valid_date = rex.validator.Is_Date_ISO("2025-08-02")
    pattern = rex.mapper.string_to_regex("alice", use_char_map=True)

    print(cleaned)  # contact: alice@example.com, bob@test.org date: 2025-08-02
    print(emails)  # ['alice@example.com', 'bob@test.org']
    print(is_valid_date)  # True
    print(pattern)  # \b[aAáàâäãåā@4][lL1|][iIíìîïİ1!|][cCçćč][eEéèêëē3]\b

Happy coding with Rexa! Questions or feedback? Open an issue at https://github.com/arshia82sbn/rexa/issues.
