An enterprise-ready Python utility package for XSS sanitization, SQLi detection, and null-byte removal.

safeinput

A robust, lightweight, and high-reliability Python utility package for input validation, contextual XSS sanitization, and SQL injection heuristic detection.

License: MIT

safeinput takes a multi-tiered approach to input cleansing and threat scoring, aligned with Chapter 5 of the OWASP Application Security Verification Standard (ASVS). Instead of destructive string manipulation it relies on structural tokenization, which blocks common evasion vectors such as nested-tag bypasses and homoglyph substitution without mangling legitimate text.


Features

  • Null Byte Cleansing (OWASP ASVS 5.1.2): Eliminates file-system poisoning and database truncation risks.
  • Unicode Normalization (OWASP ASVS 5.1.4): Converts full-width and other compatibility characters to their NFKC-normalized form, blocking bypasses that rely on lookalike characters.
  • Contextual HTML/XSS Sanitizer (OWASP XSS Guidelines): Uses standard token streams (html.parser) rather than fragile regular expressions to safely allow specific markup while filtering out dangerous attributes and protocols (javascript:, onclick, etc.).
  • Heuristic SQLi Detection: A non-destructive, multi-signature evaluation engine that computes a risk score based on structural queries and tautologies instead of blindly stripping text.

Installation

Install safeinput directly via pip:

pip install safeinput

For development environments (running tests):

pip install safeinput[dev]

Architecture Flow

To maximize security while keeping legitimate user text intact, safeinput processes input through independent, specialized layers in a fixed order:

       [ User Input String ]
                 │
                 ▼
      ┌──────────────────────┐
      │  utils.clean_bytes   │ ──► Removes \x00 Null Bytes
      └──────────────────────┘
                 │
                 ▼
      ┌──────────────────────┐
      │   utils.normalize    │ ──► Normalizes Unicode Homoglyphs
      └──────────────────────┘
                 │
          ┌──────┴──────┐
          ▼             ▼
    ┌───────────┐ ┌───────────┐
    │    XSS    │ │   SQLi    │
    │ Sanitizer │ │ Detector  │
    └───────────┘ └───────────┘
          │             │
          ▼             ▼
    [Clean Text/   [Risk Score
     HTML]          & Indicators]
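
The flow above can be sketched as a small pipeline: two shared pre-processing stages, then a fan-out to a sanitizer branch and a detector branch. This is a stdlib-only illustration, not the package's code. `run_pipeline`, `strip_nulls`, `nfkc`, and `toy_detector` are hypothetical names, and `html.escape` stands in for the XSS sanitizer.

```python
import html
import unicodedata
from typing import Callable, Dict, List

Stage = Callable[[str], str]


def run_pipeline(
    raw: str,
    pre_stages: List[Stage],
    sanitize: Stage,
    detect: Callable[[str], Dict[str, object]],
) -> Dict[str, object]:
    """Apply the shared pre-processing stages, then fan out to both branches."""
    text = raw
    for stage in pre_stages:
        text = stage(text)
    # Both branches receive the same cleaned, normalized text.
    return {"clean_text": sanitize(text), "risk": detect(text)}


# Hypothetical stand-ins for the package's layers.
def strip_nulls(s: str) -> str:
    return s.replace("\x00", "")


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def toy_detector(s: str) -> Dict[str, object]:
    # Placeholder heuristic: flags any "--" sequence and nothing else.
    hit = "--" in s
    return {
        "risk_score": 1.0 if hit else 0.0,
        "matched_indicators": ["sql_comment_syntax"] if hit else [],
    }


# "\uff48\uff49" is "hi" written with full-width letters.
result = run_pipeline(
    "user\x00name <b>\uff48\uff49</b>",
    [strip_nulls, nfkc],
    html.escape,
    toy_detector,
)
print(result["clean_text"])  # username &lt;b&gt;hi&lt;/b&gt;
print(result["risk"])
```

Running the pre-processing stages once, before the fan-out, means the sanitizer and the detector cannot disagree about what the input text is.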

Usage Guide

  1. Basic String Utility & Normalization

     Strip hidden characters and normalize the text before passing it to the deeper parsing layers.

    from safeinput import clean_null_bytes, normalize_text
    
    # Remove dangerous null bytes that poison file systems or databases
    poisoned_input = "profile_photo.jpg\x00.exe"
    clean_path = clean_null_bytes(poisoned_input)
    print(clean_path)  # Output: "profile_photo.jpg.exe"
    
    # Normalize homoglyph bypasses (e.g., full-width uppercase characters)
    bypass_attempt = "ＳＥＬＥＣＴ"  # full-width letters (U+FF33, U+FF25, ...), not ASCII
    normalized = normalize_text(bypass_attempt)
    print(normalized)  # Output: "SELECT"
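
To show what these two steps amount to, here is a stdlib-only sketch using `str.replace` and `unicodedata.normalize`. It is an assumption about the general technique, not the package's implementation, and `strip_null_bytes` and `to_nfkc` are hypothetical names.

```python
import unicodedata


def strip_null_bytes(text: str) -> str:
    """Drop every NUL character; everything else is left untouched."""
    return text.replace("\x00", "")


def to_nfkc(text: str) -> str:
    """Fold compatibility characters (full-width forms, ligatures) to NFKC."""
    return unicodedata.normalize("NFKC", text)


print(strip_null_bytes("profile_photo.jpg\x00.exe"))    # profile_photo.jpg.exe
print(to_nfkc("\uff33\uff25\uff2c\uff25\uff23\uff34"))  # SELECT (from full-width letters)
print(to_nfkc("\ufb01le"))                              # file (from the "fi" ligature U+FB01)

# NFKC does not map look-alikes across scripts: Cyrillic "а" (U+0430) is unchanged.
print(to_nfkc("\u0430dmin") == "admin")                 # False
```

The last line marks the boundary of NFKC: it folds compatibility forms, while cross-script confusables need a separate check.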
    
  2. XSS and HTML Sanitization

     safeinput offers two modes: strip all markup to get plain text, or keep an explicit allow-list of tags and attributes.

    from safeinput import strip_xss, sanitize_html
    
    # Aggressive approach: Strip all tags to extract plain text
    html_payload = "<script>alert('xss')</script>Welcome, <b>User</b>!"
    plain_text = strip_xss(html_payload)
    print(plain_text)  # Output: "Welcome, User!"
    
    # Contextual approach: Allow structured design while isolating active execution vectors
    rich_text = "<p>Click <a href='javascript:alert(1)' onclick='bad()'>here</a> for your <b>profile</b></p>"
    allowed_tags = {"p", "b", "a"}
    allowed_attrs = {"href"}
    
    safe_html = sanitize_html(rich_text, allowed_tags=allowed_tags, allowed_attrs=allowed_attrs)
    print(safe_html)
    # Output: "<p>Click <a>here</a> for your <b>profile</b></p>"
    # Note: the javascript: URL and the inline onclick handler were both dropped.
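
The token-stream approach can be illustrated with the standard library's `html.parser.HTMLParser`. The sketch below is a minimal allow-list filter written for this explanation. It is not the source of `sanitize_html`, and `AllowListSanitizer`, `allow_list_sanitize`, and `BLOCKED_SCHEMES` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Assumed block-list of URL schemes that can execute script.
BLOCKED_SCHEMES = ("javascript:", "data:", "vbscript:")


class AllowListSanitizer(HTMLParser):
    def __init__(self, allowed_tags, allowed_attrs):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = allowed_tags
        self.allowed_attrs = allowed_attrs
        self.out = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in self.allowed_tags:
            return  # the tag is dropped, its text content is kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in self.allowed_attrs:
                continue  # drops on* handlers and anything else not listed
            # Remove whitespace and control characters before checking the scheme.
            compact = "".join(ch for ch in value if ch > " ").lower()
            if compact.startswith(BLOCKED_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag in self.allowed_tags:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def allow_list_sanitize(markup, allowed_tags, allowed_attrs):
    parser = AllowListSanitizer(allowed_tags, allowed_attrs)
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


rich_text = "<p>Click <a href='javascript:alert(1)' onclick='bad()'>here</a> for your <b>profile</b></p>"
print(allow_list_sanitize(rich_text, {"p", "b", "a"}, {"href"}))
# <p>Click <a>here</a> for your <b>profile</b></p>
```

Because the output is rebuilt from parsed tokens rather than edited in place, a nested payload such as `<scr<script>ipt>` has no surviving fragments to reassemble into a tag.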
    
  3. Non-Destructive SQLi Risk Telemetry

     Rewriting the input would break ordinary sentences that happen to contain SQL keywords, so the package leaves the text unchanged and scores it instead. The score comes from structural patterns, with numeric and string tautologies matched independently.

    from safeinput import SQLiDetector
    
    detector = SQLiDetector(high_risk_threshold=0.60)
    
    # Legitimate text with SQL keywords
    legit_text = "I need to select a plan and update my account billing profile."
    result = detector.analyze(legit_text)
    print(result["is_malicious"])  # Output: False
    print(result["risk_score"])     # Output: 0.0
    
    # Active exploit pattern
    exploit_text = "admin' OR '1'='1' --"
    result = detector.analyze(exploit_text)
    print(result["is_malicious"])       # Output: True
    print(result["risk_score"])          # Output: 0.70 (exceeds the 0.60 threshold: tautology + comment)
    print(result["matched_indicators"])  # Output: ['boolean_tautology', 'sql_comment_syntax']
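
A weighted-signature scorer of this kind might look like the sketch below. The patterns, the weights, and the name `score_sqli` are assumptions made for illustration. They were chosen so the two inputs above produce the same scores, and they do not come from the package.

```python
import re
from typing import Dict

# Hypothetical signatures: (indicator name, weight, pattern).
SIGNATURES = [
    (
        "boolean_tautology",
        0.4,
        # OR/AND followed by X = X, with optional matching quotes on each side.
        re.compile(r"""\b(?:or|and)\s+(['"]?)(\w+)\1\s*=\s*(['"]?)\2\3""", re.I),
    ),
    ("sql_comment_syntax", 0.3, re.compile(r"--|#|/\*")),
    ("union_select", 0.5, re.compile(r"\bunion\s+(?:all\s+)?select\b", re.I)),
    ("stacked_query", 0.4, re.compile(r";\s*(?:drop|insert|update|delete)\b", re.I)),
]


def score_sqli(text: str, high_risk_threshold: float = 0.60) -> Dict[str, object]:
    """Score the text without modifying it; each signature counts at most once."""
    matched = []
    score = 0.0
    for name, weight, pattern in SIGNATURES:
        if pattern.search(text):
            matched.append(name)
            score += weight
    score = round(min(score, 1.0), 2)  # cap at 1.0 and hide float noise
    return {
        "is_malicious": score >= high_risk_threshold,
        "risk_score": score,
        "matched_indicators": matched,
    }


print(score_sqli("I need to select a plan and update my account billing profile."))
print(score_sqli("admin' OR '1'='1' --"))
```

Bare keywords such as `select` or `update` carry no weight on their own, so ordinary prose scores 0.0. Only structural combinations contribute to the score.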
    

Running Project Tests

The tests live in tests/test_sanitizer.py. To run the suite locally, make sure pytest is installed, then run:

pytest tests/

Security Disclaimer & OWASP Best Practices

safeinput is a gatekeeper layer that inspects and cleans input at the application boundary. It supports defense in depth, but it is not a substitute for the primary controls below.

Pair it with the following:

  1. SQL Injection: This library is an intrusion detection and telemetry engine. It must not replace Parameterized Queries / Prepared Statements. In accordance with the OWASP SQL Injection Prevention Cheat Sheet, always enforce strict parameter separation inside your database drivers or native ORM architecture.
  2. Cross-Site Scripting (XSS): While input scrubbing is vital for rich text fields, your primary defense against XSS must always be Contextual Output Encoding. Ensure your application framework escapes data at the exact layer it is rendered to the DOM, following the rules laid out in the OWASP Cross-Site Scripting Prevention Cheat Sheet.
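
Both primary defenses can be shown with the standard library alone. The sketch below uses `sqlite3` placeholders for parameter separation and `html.escape` for output encoding in an HTML body context. The `users` table and its contents are made up for the example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, bio TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("admin", "<b>root</b>"))

# Parameterized query: the exploit string is bound as data,
# so it cannot change the structure of the statement.
exploit = "admin' OR '1'='1' --"
rows = conn.execute("SELECT name FROM users WHERE name = ?", (exploit,)).fetchall()
print(rows)  # []

# Output encoding: escape at render time, for the context being rendered into.
bio = conn.execute("SELECT bio FROM users WHERE name = ?", ("admin",)).fetchone()[0]
print(f"<p>{html.escape(bio)}</p>")  # <p>&lt;b&gt;root&lt;/b&gt;</p>
```

`html.escape` covers HTML body and quoted-attribute contexts only. Values placed inside scripts, URLs, or CSS need the encoder for that context.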

License

Distributed under the terms of the MIT License.

Download files

Source Distribution

safeinput-0.1.0.tar.gz (8.2 kB)

Uploaded Source

Built Distribution

safeinput-0.1.0-py3-none-any.whl (7.9 kB)

Uploaded Python 3

File details

Details for the file safeinput-0.1.0.tar.gz.

File metadata

  • Download URL: safeinput-0.1.0.tar.gz
  • Upload date:
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for safeinput-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3d1de93d7f7037dc2c72c002b4428ec724b695e6e55e684286daa3dda670931a
MD5 50f234fe6f0eba4e69dcea354412844f
BLAKE2b-256 a085f4093a12af89bd82005ff6a732ed76e40473bebc528a450babe5f26a1f45

Provenance

The following attestation bundles were made for safeinput-0.1.0.tar.gz:

Publisher: publish.yml on arsalananwar11/safeinput

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file safeinput-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: safeinput-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for safeinput-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 014b032373c1b613a0bd8d62b916bc135dc98f5ff3ad50029b4bace522075373
MD5 7d1b961aa3666576e87eec725da6d39c
BLAKE2b-256 06d1138ff41cd0d484c2399d8d7f8567542b2409e6bafe8e97d0a126111bb2ff

Provenance

The following attestation bundles were made for safeinput-0.1.0-py3-none-any.whl:

Publisher: publish.yml on arsalananwar11/safeinput

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
