RegXon is a powerful validator, sanitizer and content parser that you're searching for decades.
Project description
RegXon
RegXon is a powerful validator, sanitizer and content parser that you're searching for decades.
Installation
pip install regxon
Usage
from regxon.common import Regxon
regxon = Regxon()
General Validation
General validation includes email, domain, url and ipv4.
Validate Email
from regxon.common import Regxon
regxon = Regxon()
regxon.is_email('xyz@.com') # None
regxon.is_email('xyz@cpx.com') # returns a proper Match object; you can grab the match with `.string`
Validate Domain
from regxon.common import Regxon
regxon = Regxon()
regxon.is_domain('xyzcom') # None
regxon.is_domain('xyz.com') # returns a proper Match object; you can grab the match with `.string`
Validate URL
from regxon.common import Regxon
regxon = Regxon()
regxon.is_url('xyz.com') # None
regxon.is_url('https://xyz.com') # returns a proper Match object
Validate HTTP URL
from regxon.common import Regxon
regxon = Regxon()
regxon.is_http_url('xyz.com') # None; returns None if the url is not http
regxon.is_http_url('ftp://xyz.com') # None; returns None if the url is not http
regxon.is_http_url('http://django.c') # None; returns None because `.c` is not a valid domain
regxon.is_http_url('https://xyz.com') # returns a proper Match object; you can grab the match with `.string`
Validate IP
from regxon.common import Regxon
regxon = Regxon()
# 1, 2 both are same and return a proper Match, as default schema is ""
regxon.is_ipv4('127.0.0.1') # 1
regxon.is_ipv4('127.0.0.1', schema='') # 2; matches because 127.0.0.1 has no schema
regxon.is_ipv4('http://127.0.0.1') # returns None as schema is not matched; "http" != ""
regxon.is_ipv4('http://127.0.0.1', schema='') # returns None as schema is not matched; "http" != ""
regxon.is_ipv4('http://127.0.0.1', schema='http') # returns a proper Match
regxon.is_ipv4('https://127.0.0.1', schema='http') # returns None as schema is not matched; "https" != "http"
Validate Phone Number
from regxon.common import Regxon
regxon = Regxon()
regxon.is_phone('+91 1234567890') # returns a proper Match object; you can grab the match with `.string`
HTML Sanitization and Validation
RegXon provides a powerful HTML sanitizer and validator that you're searching for decades. It's a combination of html5lib and beautifulsoup4.
You "how to remove an attribute from HTML tag" problem is solved now. Or another problem of "how to remove a tag from HTML" is also solved.
from regxon.html import RegxonHTML
regxon_html = RegxonHTML()
html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""
html = regxon_html.get_sanitized_content(html_content)
print(html)
The above code will print the following output
<img onerror="hey"/>
Add custom excluded attributes for any tag you want
from regxon.html import RegxonHTML
regxon_html = RegxonHTML()
html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""
# Add custom excluded attributes for any tag you want
regxon_html.excluded_attributes.update({
'img': regxon_html.excluded_attributes['img'] + ['onerror'],
})
The above code will print the following output
<img/>
Purpose of RegXon
- Sanitize HTML; remove unwanted tags and attributes; XSS prevention
- Validate IP, URL, Domain; SSRF prevention
- Validate Email; Email spoofing prevention
- Validate Phone Number; Phone number spoofing prevention
License
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Authors
Acknowledgements
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file regxon-0.0.9-py3-none-any.whl
.
File metadata
- Download URL: regxon-0.0.9-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4bac3f9b20994cc383ec423696e18e3073a9eb48bc382fdd2a846361adf9de5a |
|
MD5 | 27355777f0b40ca1169556e458e243c2 |
|
BLAKE2b-256 | 966142d5d87ecf8a898c25895aa28534450f41645b47fb065e5a949a462d3da0 |