A simple Django app that will give you a cleaned HTML field.

django-cleanhtmlfield

django-cleanhtmlfield is a simple Django application that defines an HTMLField that automatically removes potentially malicious content.

For instance, suppose you let users enter arbitrary HTML content and one of them injects a JavaScript snippet:

<h1>Hello Friend</h1>
<script type="text/javascript">
    steal_all_passwords();
</script>
<p>This is for you!</p>

HTMLField filters this to:

<h1>Hello Friend</h1>
<p>This is for you!</p>
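
To make the idea concrete, here is a minimal standard-library sketch of removing an element together with its content. It is not the package's implementation (which relies on BeautifulSoup4); the class name RemoveWithContent, the helper strip_with_content, and the VOID_ELEMENTS tuple are hypothetical names introduced for this illustration only.

```python
from html.parser import HTMLParser

# Tags dropped together with everything inside them (mirrors the example
# REMOVE_WITH_CONTENT setting; hard-coded here for the sketch).
REMOVE_WITH_CONTENT = ("script", "object", "embed", "style", "form")
# Assumption: tags that never have a closing tag, so there is no content to skip.
VOID_ELEMENTS = ("embed",)


class RemoveWithContent(HTMLParser):
    """Re-emit HTML, skipping listed elements and their content."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a removed element

    def handle_starttag(self, tag, attrs):
        if tag in REMOVE_WITH_CONTENT:
            if tag not in VOID_ELEMENTS:
                self.skip_depth += 1
            return
        if self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no content to skip.
        if self.skip_depth == 0 and tag not in REMOVE_WITH_CONTENT:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in REMOVE_WITH_CONTENT:
            if tag not in VOID_ELEMENTS and self.skip_depth > 0:
                self.skip_depth -= 1
            return
        if self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def strip_with_content(html):
    """Return html without the elements listed in REMOVE_WITH_CONTENT."""
    parser = RemoveWithContent()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling strip_with_content on the snippet above keeps the h1 and p elements and drops the script element along with the call inside it. A real sanitizer has to cope with malformed markup as well, which is one reason to rely on the package rather than on a sketch like this.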

Quick start

  1. Install the package from PyPI using pip:
pip install django-cleanhtmlfield
  2. Add an HTMLField(strip_unsafe=True) field to your model:
from django.db import models
from django_cleanhtmlfield.fields import HTMLField


class MyModel(models.Model):
    some_content = HTMLField(strip_unsafe=True)
  3. Create and run migrations for the model changes, e.g.:
python manage.py makemigrations
python manage.py migrate

Requirements

HTML content is parsed and processed with BeautifulSoup4, which is installed automatically as a dependency. As a Django app, the package also requires Django, which is expected to be installed already.

Optional: If you want a WYSIWYG interface in the Django admin (or in any other Django form), you need the django-ckeditor package.

Field options

  • strip_unsafe (Default: False) - must be set to True to enable stripping of unsafe HTML content
  • widget_form_class (Default: None) - overrides the CSS form class of the widget (e.g., if you want to use django-ckeditor)

Configuration options

The following Django settings are available (see below for a full example):

  • ACCEPTABLE_ELEMENTS - tuple that contains all allowed HTML tags (e.g., 'a', 'span', 'p', 'div', ...)
  • ACCEPTABLE_ATTRIBUTES - tuple that contains all allowed HTML attributes (e.g., 'alt', 'style', 'target', 'title', ...)
  • ACCEPTABLE_STYLES - tuple that contains all allowed CSS styles (e.g., 'background-color', 'border-color', 'font-size', ...)
  • REMOVE_WITH_CONTENT - tuple that contains potentially malicious HTML tags that are removed together with their content (e.g., 'script', 'object', ...)
  • PRESERVE_STYLES_WHITESPACE - optional boolean that preserves the whitespace within styles (e.g., 'padding: 9px;' stays 'padding: 9px;'); by default the whitespace is stripped (e.g., 'padding: 9px;' becomes 'padding:9px;')
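
The interplay of ACCEPTABLE_STYLES and PRESERVE_STYLES_WHITESPACE can be pictured with a small sketch. This is an illustration under assumptions, not the package's code: the function filter_style and its preserve_whitespace parameter are hypothetical, the allow-list is shortened, and splitting on ';' is deliberately naive (it would break on values such as data URLs).

```python
# Hypothetical, shortened allow-list; the package reads the full tuple
# from the ACCEPTABLE_STYLES setting.
ACCEPTABLE_STYLES = ("color", "font-size", "padding")


def filter_style(style, preserve_whitespace=False):
    """Keep only allow-listed CSS declarations from a style attribute value."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Skip empty fragments, malformed declarations and disallowed properties.
        if not sep or not value or name not in ACCEPTABLE_STYLES:
            continue
        if preserve_whitespace:
            # Keep the declaration as written, trimmed at both ends.
            kept.append(declaration.strip() + ";")
        else:
            # Assumption: only whitespace around the colon is removed.
            kept.append(f"{name}:{value};")
    return (" " if preserve_whitespace else "").join(kept)
```

With the default, filter_style('padding: 9px; position: fixed') yields 'padding:9px;', and with preserve_whitespace=True the input 'padding: 9px;' is returned unchanged, matching the behaviour described for the setting.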

Example:

ACCEPTABLE_ELEMENTS = (
    'a', 'abbr', 'acronym', 'address', 'area', 'aria-label', 'b', 'big',
    'blockquote', 'br', 'button', 'caption', 'center', 'cite', 'code', 'col',
    'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt', 'em',
    'font', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img',
    'ins', 'kbd', 'label', 'legend', 'li', 'map', 'menu', 'ol',
    'p', 'pre', 'q', 's', 'samp', 'small', 'span', 'strike',
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th',
    'thead', 'tr', 'tt', 'u', 'ul', 'var', 'iframe', 'section', 'article',
)

ACCEPTABLE_ATTRIBUTES = (
    'abbr', 'accept', 'accesskey',
    'action', 'align', 'alt', 'axis', 'border', 'cellpadding', 'cellspacing',
    'char', 'charoff', 'charset', 'checked', 'cite', 'class', 'clear', 'cols',
    'colspan', 'color', 'compact', 'coords', 'data-mlang', 'data-equation', 'datetime', 'dir',
    'enctype', 'for', 'headers', 'height', 'href', 'hreflang', 'hspace',
    'id', 'ismap', 'label', 'lang', 'longdesc', 'maxlength', 'method',
    'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt',
    'rel', 'rev', 'rows', 'rowspan', 'role', 'rules', 'scope', 'shape', 'size', 'style',
    'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title', 'type',
    'usemap', 'valign', 'value', 'vspace', 'width',
)

ACCEPTABLE_STYLES = (
    'background-color', 'background', 'background-image', 'background-position', 'background-size', 'background-repeat',
    'background-attachment', 'background-origin', 'background-clip',
    'font-family', 'font-size', 'font-weight', 'font-style', 'color',
    'width', 'height', 'min-width', 'max-width', 'min-height', 'max-height', 'line-height',
    'text-decoration', 'text-transform', 'text-align', 'border', 'border-style', 'border-width',
    'border-top', 'border-bottom', 'border-left', 'border-right', 'border-top-style',
    'border-bottom-style', 'border-left-style', 'border-right-style', 'border-top-width',
    'border-bottom-width', 'border-left-width', 'border-right-width',
    'border-color',
    'border-top-color', 'border-bottom-color', 'border-left-color', 'border-spacing', 'border-collapse',
    'border-right-color',
    'border-radius',
    'vertical-align', 'clear', 'float',
    'margin', 'margin-left', 'margin-right', 'margin-top', 'margin-bottom',
    'outline',
    'padding', 'padding-left', 'padding-right', 'padding-top', 'padding-bottom',
)

REMOVE_WITH_CONTENT = ('script', 'object', 'embed', 'style', 'form', )

PRESERVE_STYLES_WHITESPACE = False
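
As a rough picture of what the element and attribute allow-lists mean, the following standard-library sketch drops attributes that are not listed and unwraps elements that are not listed while keeping their text. Both behaviours are assumptions made for illustration, as are the names AllowListFilter and filter_allowed and the shortened tuples; the package itself works with BeautifulSoup4 and may handle details differently.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, shortened allow-lists for the sketch.
ACCEPTABLE_ELEMENTS = ("a", "p", "strong")
ACCEPTABLE_ATTRIBUTES = ("href", "title")


class AllowListFilter(HTMLParser):
    """Re-emit HTML with only allow-listed elements and attributes."""

    def __init__(self):
        # Keep character references in text as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ACCEPTABLE_ELEMENTS:
            return  # unwrap: drop the tag, keep its text (assumption)
        # Attribute values arrive decoded, so escape them again on output.
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ACCEPTABLE_ATTRIBUTES
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ACCEPTABLE_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def filter_allowed(html):
    """Return html reduced to the allow-listed elements and attributes."""
    parser = AllowListFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, filter_allowed('<p onclick="x()">Hi <a href="/a" style="color:red">link</a></p>') keeps the p and a elements and the href attribute, and drops onclick and style. The sketch does not inspect URL schemes in href values.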

Compatibility with Django REST Framework

If you are using Django REST Framework, you need to add the following code to register a serializer field handler for HTMLField:

from django.utils.translation import gettext_lazy as _

from rest_framework import fields
from rest_framework.serializers import ModelSerializer

from django_cleanhtmlfield.fields import HTMLField
from django_cleanhtmlfield.helpers import clean_html


class RestHtmlField(fields.CharField):
    """Serializer field that cleans incoming HTML."""

    default_error_messages = {
        'invalid': _('"{input}" is not valid HTML.')
    }
    default_empty_html = False
    initial = False

    def to_internal_value(self, data):
        # Strip unsafe markup from the submitted value
        return clean_html(data, strip_unsafe=True)


# Use RestHtmlField for every HTMLField on a ModelSerializer's model
ModelSerializer.serializer_field_mapping[HTMLField] = RestHtmlField

Compatibility Matrix

This library should be compatible with the latest Django release. For reference, the matrix below lists the version combinations that are tested and guaranteed to work.

django-cleanhtmlfield Version   Django Versions   Python Versions
1.1                             2.2, 3.0, 3.1     3.5 - 3.8
1.2                             2.2, 3.1, 3.2     3.7 - 3.10
1.3                             3.2, 4.0, 4.1     3.7 - 3.10
1.4                             4.2, 5.0, 5.1     3.9 - 3.13

License

MIT License

Development and Tests

The test app is located in the tests subfolder.
