Classify contact form messages as spam or not.
Django Spam Classifier
Contact form spam getting you down? We know the feeling. It's demeaning, draining and relentless.
This is a very basic Django app that uses the dbacl Bayesian text classification tool to filter out contact form spam. It's not perfect, but it works very well at blocking the really offensive English text spam. The app was written to avoid depending on external services like reCAPTCHA or Akismet. These services work well enough, but introduce some privacy concerns.
Limitations
Update July 2024: The author is no longer actively using or maintaining this package and is instead replacing website contact forms with email links. While django-spam-classifier is reasonably effective when trained, the increasing volume of automated contact form spam means that even a small proportion getting through is overwhelming for many small websites.
The classifier currently doesn't work so well on non-English text, very short input, garbage input, or HTML-only input with a single hyperlink. It's possible that dbacl has options to deal with these cases more effectively.
Additionally, dbacl does not seem to be actively maintained, and is currently not available on Debian Bullseye. I may switch to bogofilter or other Bayesian filtering options in the future.
Getting started
- Install django-spam-classifier
- Install dbacl via your OS package manager
- Add a BASE_DIR setting
- Enable Django's django.contrib.sites app and configure your site domain via Django Admin (used for training links in emails)
- Add 'classifier' to your INSTALLED_APPS setting
- Add path('', include('classifier.urls')), to your project's urls.py
- Run python manage.py migrate
- Create the classifier_data directory to hold the classifier database
- In your contact form, call classifier.is_spam() on all text accepted by your form:

      # is_spam lives in the classifier package, per the step above.
      from classifier import is_spam

      spam, submission = is_spam('\n'.join(submission_fields))
      if spam:
          # Throw away the form submission and don't notify anyone.
          pass
      else:
          # Process the form submission as normal.
          pass

  Doing so will internally use dbacl to classify the submission as spam or not spam and generate a confidence of 0-100. Spam/not-spam with a high confidence is processed as you'd expect. If the confidence is below RECORD_AND_DISCARD_CONFIDENCE, the submission is treated as not spam because the confidence is too low to make a safe decision. The body is recorded in the Submissions model and can be manually classified via the Django Admin. If the confidence is above RECORD_AND_DISCARD_CONFIDENCE but below SILENTLY_DISCARD_CONFIDENCE, the submission is treated as confidently spam, but is also recorded to the Submissions model for manual classification.
- Add a training link to the footer of any notification email you send:

      email_body = email_body + spam_footer(submission, site)

  Which will output something like:

      --
      Spam score: spam (15% confidence)
      Train as spam: https://example.com/classifier/1704/spam/
      Train as not spam: https://example.com/classifier/1704/not-spam/
- Ensure you have a logging configuration set up so you can see log messages
- Add a cron job to regularly (e.g. daily) update the training database with any new manual classifications you've made:

      python manage.py train

- Visit the Django Admin and classify the low-confidence submissions you receive.
- Tune the Django settings as desired (optional):

      CLASSIFIER = {
          'SILENTLY_DISCARD_CONFIDENCE': 90,  # Defaults to 80
          'RECORD_AND_DISCARD_CONFIDENCE': 75,  # Defaults to 60
      }
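
The way the two confidence settings interact can be easier to follow as code. The following is a rough sketch only, not the package's actual implementation: the names Decision, effective_thresholds and decide are hypothetical, the boundary handling (whether a confidence exactly equal to a threshold counts as above or below it) is an assumption, and so is the idea that submissions at or above SILENTLY_DISCARD_CONFIDENCE are not recorded. The default values of 80 and 60 are taken from the settings comments above.

```python
from dataclasses import dataclass

# Defaults as stated in the settings comments above.
DEFAULTS = {
    "SILENTLY_DISCARD_CONFIDENCE": 80,
    "RECORD_AND_DISCARD_CONFIDENCE": 60,
}


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type, used only for this illustration."""
    is_spam: bool  # how the caller should treat the submission
    record: bool   # whether the body is kept for manual classification


def effective_thresholds(overrides=None):
    """Merge a CLASSIFIER-style dict over the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update(overrides or {})
    return merged


def decide(label_is_spam, confidence, overrides=None):
    """Map a raw (label, confidence 0-100) pair to a Decision.

    Assumption: a confidence equal to a threshold counts as reaching it.
    """
    t = effective_thresholds(overrides)
    if confidence < t["RECORD_AND_DISCARD_CONFIDENCE"]:
        # Too uncertain either way: treat as not spam, keep for review.
        return Decision(is_spam=False, record=True)
    if not label_is_spam:
        # Confidently not spam: process the form as normal.
        return Decision(is_spam=False, record=False)
    if confidence < t["SILENTLY_DISCARD_CONFIDENCE"]:
        # Fairly confident spam: discard, but keep for review.
        return Decision(is_spam=True, record=True)
    # Very confident spam: discard without recording (assumed).
    return Decision(is_spam=True, record=False)
```

With the defaults, decide(True, 15) lets the message through and records it, which matches the sample email footer showing "spam (15% confidence)" on a message that was still delivered. Raising the thresholds, as in the settings example, widens the band of submissions that reach a human.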
Development
Create a venv and install the development requirements:
python3 -m venv --system-site-packages [VENV_PATH]
source [VENV_PATH]/bin/activate
python -m pip install Django pytz
TODO: There is undoubtedly a better way of installing dev-dependencies. Perhaps poetry or flit? Are they the only tools that handle this? What's generally accepted?
Run tests with tox or:
PYTHONPATH=src:.:$PYTHONPATH DJANGO_SETTINGS_MODULE=tests.test_settings pytest tests
Create migrations with:
DJANGO_SETTINGS_MODULE=tests.test_settings python -m django makemigrations
Release History
0.1.0 (2022-08-26)
- Add some manual labelling to improve performance on non-English text and HTML
- Add admin filter for auto and manual spam status
- Update URLConfs for Django 4
0.0.7 (2021-10-01)
- Add admin actions to bulk mark spam/not-spam
- Add tox
0.0.6 (2021-03-15)
- Respond with a 404 if a classifier submission doesn't exist
Hashes for django_spam_classifier-0.1.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 9a54843aae25c4b21c7af93239f9b86d1ad7126f4cc00722b0bb0eadbcec74c6
MD5 | 84f83d19b5c87854adfb7b6dee9d2b55
BLAKE2b-256 | e7402e172d49a02aea07742494bcab747a37397dc52a4ca43935ec3f404087ab
Hashes for django_spam_classifier-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | edfcd88520ed833fdd9c0d2eb7ede46026e3c440f5af6b238b51b9a1b480e849
MD5 | f10fc1ac680ad171f150a52c2550febc
BLAKE2b-256 | cc5dd5ecd75dbe94866de3a71e7709974b10676a474c42c64a906696786d8783
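
The SHA256 digests listed above can be compared against a downloaded file before installing it. This is a minimal sketch using only the Python standard library; the helper names sha256_of and matches_digest are made up for illustration and are not part of this package.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 with an expected hex digest.

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution has been downloaded to the current directory, matches_digest('django_spam_classifier-0.1.2.tar.gz', '9a54843aae25c4b21c7af93239f9b86d1ad7126f4cc00722b0bb0eadbcec74c6') should return True for an unmodified file.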