An enterprise-ready Python utility package for XSS sanitization, SQL injection (SQLi) detection, and null-byte sanitization.
Project description
safeinput
A robust, lightweight, and high-reliability Python utility package for input validation, contextual XSS sanitization, and SQL injection heuristic detection.
safeinput offers an engineered, multi-tiered approach to input cleansing and threat scoring, aligned with Chapter 5 of the OWASP Application Security Verification Standard (ASVS). It favors structural tokenization over destructive string manipulation, which helps defeat common evasion vectors such as nested-tag bypasses and Unicode lookalike-character tricks without breaking legitimate text.
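To illustrate the nested-tag bypass, the snippet below implements a deliberately flawed filter that removes `<script>` tags with a single regex pass. It is not part of safeinput, and `naive_strip` is a hypothetical name. Deleting the inner tag splices the surrounding fragments back into a working tag, which is the failure mode a parser-based approach is meant to avoid.

```python
import re

# Naive blacklist filter: delete literal <script> and </script> tags in ONE pass.
# This is the kind of destructive string manipulation the paragraph above argues against.
_SCRIPT_TAG = re.compile(r"</?script>", re.IGNORECASE)


def naive_strip(text: str) -> str:
    """Remove <script>/</script> tags in a single regex pass (intentionally flawed)."""
    return _SCRIPT_TAG.sub("", text)


# Nested-tag bypass: removing the inner tags glues the outer fragments together.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
print(naive_strip(payload))  # <script>alert(1)</script>
```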
Features
- Null Byte Cleansing (OWASP ASVS 5.1.2): Removes \x00 characters to mitigate file-system poisoning and database truncation risks.
- Unicode Normalization (OWASP ASVS 5.1.4): Automatically converts full-width and other compatibility characters to their standard representation (NFKC form) to block lookalike-character bypass tricks.
- Contextual HTML/XSS Sanitizer (OWASP XSS guidelines): Uses the standard library's token stream (html.parser) rather than fragile regular expressions to safely allow specific markup while filtering out dangerous attributes and protocols (javascript:, onclick, etc.).
- Heuristic SQLi Detection: A non-destructive, multi-signature evaluation engine that computes a risk score from structural query patterns and tautologies instead of blindly stripping text.
Installation
Install safeinput directly via pip:
pip install safeinput
For development environments (to run the tests):
pip install "safeinput[dev]"
Architecture Flow
To maximize security while maintaining high usability for legitimate user text, safeinput handles input processing systematically across independent specialized layers:
[ User Input String ]
│
▼
┌──────────────────────┐
│ utils.clean_bytes │ ──► Removes \x00 Null Bytes
└──────────────────────┘
│
▼
┌──────────────────────┐
│ utils.normalize │ ──► Normalizes Unicode Homoglyphs
└──────────────────────┘
│
┌──────┴──────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ XSS │ │ SQLi │
│ Sanitizer │ │ Detector │
└───────────┘ └───────────┘
│ │
▼ ▼
[Clean Text/ [Risk Score
HTML] & Indicators]
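The two shared layers at the top of the diagram can be approximated with the standard library alone. This is a minimal sketch that assumes null-byte removal is a plain character deletion and that normalization means NFKC (as the Features list states). The function names are hypothetical stand-ins, not safeinput's `utils.clean_bytes` or `utils.normalize`.

```python
import unicodedata


def clean_null_bytes_sketch(text: str) -> str:
    """Layer 1 (hypothetical stand-in): drop every NUL (\\x00) character."""
    return text.replace("\x00", "")


def normalize_sketch(text: str) -> str:
    """Layer 2 (hypothetical stand-in): fold compatibility characters via NFKC."""
    return unicodedata.normalize("NFKC", text)


def preprocess(text: str) -> str:
    """Run the shared layers in the order shown in the diagram."""
    return normalize_sketch(clean_null_bytes_sketch(text))


# Full-width "SELECT" with an embedded NUL; after both layers it is plain ASCII.
raw = "\uff33\uff25\uff2c\uff25\uff23\uff34\x00 * FROM users"
print(preprocess(raw))  # SELECT * FROM users
```

The result of `preprocess` would then be handed to the XSS and SQLi branches independently, so neither branch has to repeat the cleanup.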
Usage Guide
Basic String Utility & Normalization
Eliminate hidden characters and enforce consistency before passing strings to deeper parsing rules.

```python
from safeinput import clean_null_bytes, normalize_text

# Remove null bytes that can poison file systems or databases
poisoned_input = "profile_photo.jpg\x00.exe"
clean_path = clean_null_bytes(poisoned_input)
print(clean_path)  # Output: profile_photo.jpg.exe
# Note: the cleaned name still ends in .exe, so validate file extensions separately.

# Normalize compatibility-character bypasses (e.g., full-width uppercase letters)
bypass_attempt = "ＳＥＬＥＣＴ"
normalized = normalize_text(bypass_attempt)
print(normalized)  # Output: SELECT
```
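NFKC normalization (the form named in the Features list) folds compatibility characters such as full-width letters and ligatures, but it does not map cross-script lookalikes: Cyrillic "а" (U+0430) stays Cyrillic. The standard-library check below, independent of safeinput, shows that boundary. Catching cross-script homoglyphs needs a separate confusables check, such as the one described in Unicode Technical Standard #39.

```python
import unicodedata

fullwidth = "\uff33\uff25\uff2c\uff25\uff23\uff34"  # full-width Latin "SELECT"
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A, visually identical to Latin "a"

print(unicodedata.normalize("NFKC", fullwidth))          # SELECT
print(unicodedata.normalize("NFKC", ligature))           # fi
print(unicodedata.normalize("NFKC", cyrillic_a) == "a")  # False: NFKC leaves it alone
```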
XSS and HTML Sanitization
safeinput offers two modes: strip out all markup, or keep an explicit allowlist of tags and attributes.

```python
from safeinput import strip_xss, sanitize_html

# Aggressive approach: strip all tags to extract plain text
html_payload = "<script>alert('xss')</script>Welcome, <b>User</b>!"
plain_text = strip_xss(html_payload)
print(plain_text)  # Output: Welcome, User!

# Contextual approach: allow structural markup while removing active execution vectors
rich_text = "<p>Click <a href='javascript:alert(1)' onclick='bad()'>here</a> for your <b>profile</b></p>"
allowed_tags = {"p", "b", "a"}
allowed_attrs = {"href"}
safe_html = sanitize_html(rich_text, allowed_tags=allowed_tags, allowed_attrs=allowed_attrs)
print(safe_html)  # Output: <p>Click <a>here</a> for your <b>profile</b></p>
# The javascript: URL and the inline onclick event handler were both dropped.
```
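For readers curious how a token-stream sanitizer differs from regex stripping, here is a minimal allowlist sanitizer built on the standard library's `html.parser.HTMLParser`. It is a hypothetical sketch (`AllowlistSanitizer` and `sanitize` are illustrative names), not safeinput's source, though it produces the same two outputs documented in the example above. Assumptions: content inside `<script>`/`<style>` is discarded, and `javascript:`, `vbscript:`, and `data:` URLs are rejected after removing whitespace and control characters.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist sanitizer on html.parser's token stream (not safeinput's code).
_DROP_CONTENT = {"script", "style"}  # tags whose inner text is discarded entirely
_UNSAFE_SCHEMES = ("javascript:", "vbscript:", "data:")


class AllowlistSanitizer(HTMLParser):
    def __init__(self, allowed_tags, allowed_attrs):
        super().__init__(convert_charrefs=True)  # entities arrive decoded in handle_data
        self.allowed_tags = set(allowed_tags)
        self.allowed_attrs = set(allowed_attrs)
        self.out = []
        self._skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in _DROP_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in self.allowed_tags:
            return  # disallowed tag: drop the tag itself but keep its text
        kept = []
        for name, value in attrs:
            if name not in self.allowed_attrs or value is None:
                continue
            # Remove whitespace/control chars so "java\tscript:" cannot slip through.
            compact = "".join(ch for ch in value if ch.isprintable() and not ch.isspace())
            if compact.lower().startswith(_UNSAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in _DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag in self.allowed_tags:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))  # re-encode <, >, & in text


def sanitize(html_text, allowed_tags=(), allowed_attrs=()):
    parser = AllowlistSanitizer(allowed_tags, allowed_attrs)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(sanitize("<script>alert('xss')</script>Welcome, <b>User</b>!"))  # Welcome, User!
```

Because every piece of text passes through `escape`, anything the allowlist does not explicitly re-emit as a tag ends up as inert text rather than markup.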
Non-Destructive SQLi Risk Telemetry
Instead of altering the input (which would break ordinary prose that happens to contain SQL keywords), the detector scores how closely the input's structure matches known injection patterns, using separate matchers for numeric tautologies (1=1) and string tautologies ('1'='1'), and reports which indicators fired.

```python
from safeinput import SQLiDetector

detector = SQLiDetector(high_risk_threshold=0.60)

# Legitimate text that happens to contain SQL keywords
legit_text = "I need to select a plan and update my account billing profile."
result = detector.analyze(legit_text)
print(result["is_malicious"])  # Output: False
print(result["risk_score"])    # Output: 0.0

# Active exploit pattern
exploit_text = "admin' OR '1'='1' --"
result = detector.analyze(exploit_text)
print(result["is_malicious"])        # Output: True
print(result["risk_score"])          # Output: 0.7 (tautology + comment crosses the 0.60 threshold)
print(result["matched_indicators"])  # Output: ['boolean_tautology', 'sql_comment_syntax']
```
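To make the scoring model concrete, here is a toy additive scorer. The indicator names mirror those printed in the example above, but the regular expressions, the extra `union_select` and `stacked_query` rules, and the 0.40/0.30/0.50 weights are invented for illustration. `analyze` here is a hypothetical module-level function, not `SQLiDetector.analyze`.

```python
import re

# Hypothetical signatures and weights, for illustration only (not safeinput's rules).
SIGNATURES = {
    # OR x=x, with or without matching quotes: OR 1=1, OR '1'='1', OR 'a'='a'
    "boolean_tautology": (re.compile(r"\bor\s+('?)(\w+)\1\s*=\s*\1\2\1", re.IGNORECASE), 0.40),
    # Trailing SQL comment used to cut off the rest of the original query
    "sql_comment_syntax": (re.compile(r"--|/\*"), 0.30),
    "union_select": (re.compile(r"\bunion\s+(all\s+)?select\b", re.IGNORECASE), 0.50),
    "stacked_query": (re.compile(r";\s*(drop|delete|insert|update)\b", re.IGNORECASE), 0.50),
}


def analyze(text: str, high_risk_threshold: float = 0.60) -> dict:
    """Score text without modifying it; return verdict, score, and fired indicators."""
    matched = [name for name, (pattern, _) in SIGNATURES.items() if pattern.search(text)]
    total = sum((SIGNATURES[name][1] for name in matched), 0.0)
    score = round(min(total, 1.0), 2)  # cap at 1.0 and hide float noise
    return {
        "is_malicious": score >= high_risk_threshold,
        "risk_score": score,
        "matched_indicators": matched,
    }


print(analyze("I need to select a plan and update my account billing profile."))
print(analyze("admin' OR '1'='1' --"))
```

The design point of an additive score with a threshold is that one weak signal on its own (a stray `--` in prose scores 0.30 here) stays below the alarm level, while a combination of signals crosses it.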
Running Project Tests
The test suite lives in tests/test_sanitizer.py. To run it locally, make sure pytest is installed (for example, via the dev extra), then run:
pytest tests/
Security Disclaimer & OWASP Best Practices
safeinput acts as an analytical gatekeeper layer for your application. It is designed as a proactive defense-in-depth measure, not as a substitute for architecture-wide security foundations.
To stay secure and aligned with OWASP guidance:
- SQL Injection: This library is an intrusion detection and telemetry engine. It must not replace parameterized queries / prepared statements. In accordance with the OWASP SQL Injection Prevention Cheat Sheet, always enforce strict separation of code and data through your database driver or ORM.
- Cross-Site Scripting (XSS): While input scrubbing is valuable for rich-text fields, your primary defense against XSS must always be contextual output encoding. Make sure your application framework escapes data in the exact context where it is rendered into the page, following the OWASP Cross-Site Scripting Prevention Cheat Sheet.
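Both primary defenses named above are available in Python's standard library. The sketch below uses `sqlite3` placeholders for parameter binding and `html.escape` for encoding in an HTML body context; the table, column names, and values are made up for the example.

```python
import sqlite3
from html import escape

# Primary SQLi defense: the driver binds the value; it is never spliced into SQL text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("admin", "owner"))

user_input = "admin' OR '1'='1' --"
rows = conn.execute("SELECT role FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # []: the payload is compared as a literal name, so nothing matches

# Primary XSS defense: encode at the point of rendering (here, HTML element content).
comment = "<script>alert('xss')</script>"
print(f"<p>{escape(comment)}</p>")  # <p>&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
```

If you also want telemetry, `SQLiDetector.analyze` can be run on the same input for logging, but the query should stay parameterized regardless of its verdict.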
License
Distributed under the terms of the MIT License.
Download files
Source distribution: safeinput-0.1.0.tar.gz
Built distribution: safeinput-0.1.0-py3-none-any.whl
File details
Details for the file safeinput-0.1.0.tar.gz.
File metadata
- Download URL: safeinput-0.1.0.tar.gz
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d1de93d7f7037dc2c72c002b4428ec724b695e6e55e684286daa3dda670931a |
| MD5 | 50f234fe6f0eba4e69dcea354412844f |
| BLAKE2b-256 | a085f4093a12af89bd82005ff6a732ed76e40473bebc528a450babe5f26a1f45 |
Provenance
The following attestation bundles were made for safeinput-0.1.0.tar.gz:
Publisher: publish.yml on arsalananwar11/safeinput
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safeinput-0.1.0.tar.gz
- Subject digest: 3d1de93d7f7037dc2c72c002b4428ec724b695e6e55e684286daa3dda670931a
- Sigstore transparency entry: 1862247798
- Permalink: arsalananwar11/safeinput@67f6502d3f1035b729a4ce52e71249b34367811b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/arsalananwar11
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@67f6502d3f1035b729a4ce52e71249b34367811b
- Trigger Event: release
File details
Details for the file safeinput-0.1.0-py3-none-any.whl.
File metadata
- Download URL: safeinput-0.1.0-py3-none-any.whl
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 014b032373c1b613a0bd8d62b916bc135dc98f5ff3ad50029b4bace522075373 |
| MD5 | 7d1b961aa3666576e87eec725da6d39c |
| BLAKE2b-256 | 06d1138ff41cd0d484c2399d8d7f8567542b2409e6bafe8e97d0a126111bb2ff |
Provenance
The following attestation bundles were made for safeinput-0.1.0-py3-none-any.whl:
Publisher: publish.yml on arsalananwar11/safeinput
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safeinput-0.1.0-py3-none-any.whl
- Subject digest: 014b032373c1b613a0bd8d62b916bc135dc98f5ff3ad50029b4bace522075373
- Sigstore transparency entry: 1862248123
- Permalink: arsalananwar11/safeinput@67f6502d3f1035b729a4ce52e71249b34367811b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/arsalananwar11
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@67f6502d3f1035b729a4ce52e71249b34367811b
- Trigger Event: release