An enterprise-ready Python utility package for XSS sanitization, SQLi detection, and null-byte input sanitization.

safeinput

A robust, lightweight, and high-reliability Python utility package for input validation, contextual XSS sanitization, and SQL injection heuristic detection.

safeinput offers an engineered, multi-tiered approach to input cleansing and threat scoring aligned with the OWASP Application Security Verification Standard (ASVS) Chapter 5. It shifts focus from destructive string manipulation to structural tokenization, preventing common evasion vectors like nested-tag bypasses and homoglyph normalization attacks without breaking legitimate textual communications.


Features

  • Null Byte Cleansing (OWASP ASVS 5.1.2): Eliminates file-system poisoning and database truncation risks.
  • Unicode Normalization (OWASP ASVS 5.1.4): Converts full-width and other compatibility characters to their standard representation (NFKC form), blocking bypasses that hide keywords behind such lookalike encodings.
  • Contextual HTML/XSS Sanitizer (OWASP XSS Guidelines): Uses standard token streams (html.parser) rather than fragile regular expressions to safely allow specific markup while filtering out dangerous attributes and protocols (javascript:, onclick, etc.).
  • Heuristic SQLi Detection & Redaction: A non-destructive, multi-signature evaluation engine that computes a risk score based on structural queries and tautologies, offering surgical redaction capabilities.
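
The first two features correspond to operations available in the Python standard library. The following is a minimal sketch of those operations, not safeinput's implementation; `strip_null_bytes` and `fold_compat` are hypothetical names used only for illustration.

```python
import unicodedata

def strip_null_bytes(text: str) -> str:
    """Drop every NUL character (hypothetical helper, not safeinput's API)."""
    return text.replace("\x00", "")

def fold_compat(text: str) -> str:
    """Apply NFKC so compatibility forms collapse to their standard equivalents."""
    return unicodedata.normalize("NFKC", text)

fullwidth = "\uff33\uff25\uff2c\uff25\uff23\uff34"  # "SELECT" in full-width letters
print(fold_compat(fullwidth) == "SELECT")   # True
print(strip_null_bytes("a.jpg\x00.exe"))    # a.jpg.exe

# NFKC does not map characters across scripts: Cyrillic U+0430 looks like
# Latin "a" but remains a different code point after normalization.
print(fold_compat("\u0430") == "a")         # False
```

The last line is worth keeping in mind: NFKC folds compatibility forms only, so cross-script lookalikes need a separate check if they matter for your inputs.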

Installation

Install safeinput directly via pip:

pip install safeinput

For developmental environments (running tests):

pip install safeinput[dev]

Architecture Flow

To maximize security while maintaining high usability for legitimate user text, safeinput handles input processing systematically across independent specialized layers:

          [ User Input String ]
                    │
                    ▼
        ┌──────────────────────┐
        │  utils.clean_bytes   │ ──► Removes \x00 Null Bytes
        └──────────────────────┘
                    │
                    ▼
        ┌──────────────────────┐
        │   utils.normalize    │ ──► Normalizes Unicode Homoglyphs
        └──────────────────────┘
                    │
             ┌──────┴──────┐
             ▼             ▼
       ┌───────────┐ ┌───────────┐
       │    XSS    │ │   SQLi    │
       │ Sanitizer │ │ Detector  │
       └───────────┘ └───────────┘
             │             │
             ▼             ▼
       [Clean Text/  [Risk Score
        HTML]         & Surgical Redaction]
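
The ordering matters because later stages see only what the earlier ones hand them. The sketch below illustrates the flow with standard-library stand-ins; `preprocess`, `run_pipeline`, and the single regex "detector" are assumptions made for this example and do not reflect safeinput's internals.

```python
import re
import unicodedata
from typing import Callable, Dict

def preprocess(raw: str) -> str:
    """Shared first two layers: drop NUL bytes, then apply NFKC."""
    return unicodedata.normalize("NFKC", raw.replace("\x00", ""))

def run_pipeline(raw: str, stages: Dict[str, Callable[[str], object]]) -> Dict[str, object]:
    """Feed the same preprocessed text to each independent stage."""
    text = preprocess(raw)
    return {name: stage(text) for name, stage in stages.items()}

# Toy stand-in for a detector stage: one UNION SELECT signature.
union_select = re.compile(r"\bunion\s+select\b", re.IGNORECASE)

# "UNION" is written in full-width letters and a NUL byte follows SELECT.
payload = "1 \uff35\uff2e\uff29\uff2f\uff2e SELECT\x00 password"

print(bool(union_select.search(payload)))  # False: the raw text hides the keyword
result = run_pipeline(payload, {"sqli": lambda t: bool(union_select.search(t))})
print(result)                              # {'sqli': True}
```

Running the signature on the raw string misses the keyword, while the same signature matches once the shared preprocessing has run.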

Usage Guide

  1. Basic String Utility & Normalization: Remove hidden characters and enforce a consistent representation before passing strings to deeper parsing rules.

    from safeinput import clean_null_bytes, normalize_text
    
    # Remove dangerous null bytes that poison file systems or databases
    poisoned_input = "profile_photo.jpg\x00.exe"
    clean_path = clean_null_bytes(poisoned_input)
    print(clean_path)  # Output: "profile_photo.jpg.exe"
    
    # Normalize homoglyph bypasses (e.g., full-width uppercase characters)
    bypass_attempt = "\uff33\uff25\uff2c\uff25\uff23\uff34"  # "SELECT" written in full-width letters
    normalized = normalize_text(bypass_attempt)
    print(normalized)  # Output: "SELECT"
    
  2. XSS and HTML Sanitization: Either strip markup entirely or keep only an explicit allowlist of tags and attributes.

    from safeinput import strip_xss, sanitize_html
    
    # Aggressive approach: Strip all tags to extract plain text
    html_payload = "<script>alert('xss')</script>Welcome, <b>User</b>!"
    plain_text = strip_xss(html_payload)
    print(plain_text)  # Output: "Welcome, User!"
    
    # Contextual approach: Keep allowlisted markup while removing script-bearing attributes and URLs
    rich_text = "<p>Click <a href='javascript:alert(1)' onclick='bad()'>here</a> for your <b>profile</b></p>"
    allowed_tags = {"p", "b", "a"}
    allowed_attrs = {"href"}
    
    safe_html = sanitize_html(rich_text, allowed_tags=allowed_tags, allowed_attrs=allowed_attrs)
    print(safe_html)
    # Output: "<p>Click <a>here</a> for your <b>profile</b></p>"
    # Note: the javascript: URL and the inline onclick handler were both dropped.
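
For readers curious how a token-stream allowlist can work, here is a minimal sketch built on the standard library's `html.parser`. It is an illustration under stated assumptions, not safeinput's code: the class name `AllowlistSanitizer`, the `sanitize` helper, and the list of accepted URL schemes are all hypothetical.

```python
from html import escape
from html.parser import HTMLParser

class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    SAFE_SCHEMES = ("http://", "https://", "mailto:")  # assumed allowlist
    DROP_CONTENT = {"script", "style"}  # elements whose text is discarded too

    def __init__(self, allowed_tags, allowed_attrs):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.allowed_attrs = set(allowed_attrs)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in self.allowed_tags:
            return
        kept = []
        for name, value in attrs:
            if name not in self.allowed_attrs or value is None:
                continue
            # URL attributes must start with an allowlisted scheme.
            if name in ("href", "src") and not value.strip().lower().startswith(self.SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in self.DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag in self.allowed_tags:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))

def sanitize(markup, allowed_tags, allowed_attrs):
    parser = AllowlistSanitizer(allowed_tags, allowed_attrs)
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.out)

html_in = "<p>Click <a href='javascript:alert(1)' onclick='bad()'>here</a> <script>x()</script><b>ok</b></p>"
print(sanitize(html_in, {"p", "b", "a"}, {"href"}))
# <p>Click <a>here</a> <b>ok</b></p>
```

Because the output is rebuilt from parser events rather than edited in place, anything not explicitly allowlisted never reaches the result. The sketch does not special-case void elements such as `<br>`.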
    
  3. Non-Destructive SQLi Risk Telemetry: Rather than altering the input (which can mangle ordinary prose that happens to contain SQL keywords), the detector scores the text against independent structural signatures, such as numeric and string tautologies.

    from safeinput import SQLiDetector
    
    detector = SQLiDetector(high_risk_threshold=0.60)
    
    # Legitimate text with SQL keywords
    legit_text = "I need to select a plan and update my account billing profile."
    result = detector.analyze(legit_text)
    print(result["is_malicious"])        # Output: False
    print(result["risk_score"])          # Expected: below the 0.60 threshold, since is_malicious is False
    print(result["matched_indicators"])  # Expected: no exploit signature such as 'union_select_pattern'
    
    # Active exploit pattern
    exploit_text = "1 UNION SELECT username, password FROM users"
    result = detector.analyze(exploit_text)
    print(result["is_malicious"])       # Output: True
    print(result["risk_score"])          # Output: 1
    print(result["matched_indicators"])  # Expected to include 'union_select_pattern' for this input
    
    # Surgical Redaction (Removes only offending structural patterns)
    mixed_text = "Hello admin' OR '1'='1' -- please drop the tables; DROP TABLE users"
    clean_text = detector.sanitize(mixed_text)
    print(clean_text)
    # Output: "Hello admin[REDACTED] please drop the tables[REDACTED]"
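
The scoring idea can be sketched with a handful of weighted regular expressions. Everything below is an assumption made for illustration: the patterns, the weights, and the free-standing `analyze` function are hypothetical and are not taken from safeinput. Only the indicator names and the 0.60 threshold are borrowed from the examples above.

```python
import re
from typing import Dict, List

# Hypothetical signatures and weights, chosen only for illustration.
SIGNATURES = {
    "union_select_pattern": (re.compile(r"\bunion\s+(?:all\s+)?select\b", re.I), 0.7),
    "boolean_tautology": (re.compile(r"\bor\s+'?(\w+)'?\s*=\s*'?\1\b", re.I), 0.7),
    "sql_comment_syntax": (re.compile(r"--|/\*"), 0.2),
}

def analyze(text: str, threshold: float = 0.60) -> Dict[str, object]:
    """Score text by summing the weights of every signature that matches."""
    matched: List[str] = [name for name, (rx, _) in SIGNATURES.items() if rx.search(text)]
    score = min(1.0, float(sum(SIGNATURES[name][1] for name in matched)))
    return {"is_malicious": score >= threshold, "risk_score": score, "matched_indicators": matched}

legit = analyze("I need to select a plan and update my account billing profile.")
print(legit["is_malicious"], legit["matched_indicators"])      # False []

exploit = analyze("1 UNION SELECT username, password FROM users")
print(exploit["is_malicious"], exploit["matched_indicators"])  # True ['union_select_pattern']
```

Matching on structure ("union" followed by "select", or a self-equal comparison after "or") rather than on isolated keywords is what lets ordinary sentences containing "select" or "update" pass unflagged in this sketch.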
    

Running Project Tests

Validation logic is maintained inside tests/test_sanitizer.py. To run the test suite locally, make sure pytest is installed in your environment, then run:

pytest tests/

Security Disclaimer & OWASP Best Practices

safeinput is an application-level screening layer. It supports a defense-in-depth strategy, but it is not a substitute for foundational security controls.

To stay aligned with OWASP guidance:

  1. SQL Injection: This library is an intrusion detection and telemetry engine. It must not replace Parameterized Queries / Prepared Statements. In accordance with the OWASP SQL Injection Prevention Cheat Sheet, always enforce strict parameter separation inside your database drivers or native ORM architecture.
  2. Cross-Site Scripting (XSS): While input scrubbing is vital for rich text fields, your primary defense against XSS must always be Contextual Output Encoding. Ensure your application framework escapes data at the exact layer it is rendered to the DOM, following the rules laid out in the OWASP Cross-Site Scripting Prevention Cheat Sheet.
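
Both primary defenses are available in the Python standard library. The sketch below uses an in-memory SQLite database and `html.escape` purely for illustration; the table, column names, and sample values are made up for this example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, bio TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "<b>hi</b>"))

# Parameterized query: the driver binds the value, so the payload is
# compared as data and is never parsed as SQL.
hostile = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # []

# Output encoding for an HTML body context: escape at render time.
bio = conn.execute("SELECT bio FROM users WHERE name = ?", ("alice",)).fetchone()[0]
print(f"<p>{html.escape(bio)}</p>")  # <p>&lt;b&gt;hi&lt;/b&gt;</p>
conn.close()
```

Note that `html.escape` covers HTML body and quoted-attribute contexts only; JavaScript, CSS, and URL contexts each need their own encoding, which most template engines handle when auto-escaping is enabled.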

License

Distributed under the terms of the MIT License.

Download files

Source Distribution

safeinput-1.0.0.tar.gz (12.0 kB)

Uploaded Source

Built Distribution

safeinput-1.0.0-py3-none-any.whl (8.2 kB)

Uploaded Python 3

File details

Details for the file safeinput-1.0.0.tar.gz.

File metadata

  • Download URL: safeinput-1.0.0.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for safeinput-1.0.0.tar.gz
Algorithm Hash digest
SHA256 eb5d2ed594281a18d31d7f626af4334e2013afc2aaac3b865b21d71926b1a88a
MD5 5c5b548e5348b063d9c68903968e9ff4
BLAKE2b-256 d043c2f855b582e3f2cf97d2a8a41b23306f7c7364c76b41259f4f928c704608

Provenance

The following attestation bundles were made for safeinput-1.0.0.tar.gz:

Publisher: publish.yml on arsalananwar11/safeinput

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file safeinput-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: safeinput-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for safeinput-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 693d22ea4c19b98e4c319eddf11f6832b6e7f20cea78f965e22fd68b9e183a8c
MD5 bd05fcd6dc7f73a5b28181ea6474add3
BLAKE2b-256 7e7dffa7d4550204a2cb1c5e6f6fad20b81da7d1ef4a45c24d7b409822173503

Provenance

The following attestation bundles were made for safeinput-1.0.0-py3-none-any.whl:

Publisher: publish.yml on arsalananwar11/safeinput

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
