A large text field, stored compressed in the database, for Django and MySQL.

django-mysql-compressed-fields

This package provides CompressedTextField, a MySQL-specific Django model field similar to TextField or CharField that stores its value in compressed form via zlib.

In particular you can replace a TextField or CharField like:

from django.db import models

class ProjectTextFile(models.Model):
    content = models.TextField(blank=True)

with:

from django.db import models
from mysql_compressed_fields import CompressedTextField

class ProjectTextFile(models.Model):
    content = CompressedTextField(blank=True)

such that the text value of the field is actually compressed in the database.

String-based lookups are supported:

html_files = ProjectTextFile.objects.filter(content__contains='<html')
html_files = ProjectTextFile.objects.filter(content__startswith='<!DOCTYPE')
html_files = ProjectTextFile.objects.filter(content__endswith='</html>')
empty_html_files = ProjectTextFile.objects.filter(content__in=['', '<html></html>'])

Advanced manipulations with MySQL's COMPRESS(), UNCOMPRESS(), and UNCOMPRESSED_LENGTH() functions are also supported:

from django.db.models import F
from mysql_compressed_fields import UncompressedLength

files = ProjectTextFile.objects.only('id').annotate(
    content_length=UncompressedLength(F('content'))
)

Dependencies

  • Django 3.2 or later required
  • MySQL 5.7 or later required
  • ...and nothing else 🎉

License

MIT

Migration Steps

To migrate an existing TextField or CharField to be a CompressedTextField:

  • Install this package:
    • pip3 install django-mysql-compressed-fields
  • Find an existing Django model with an uncompressed TextField or CharField that you want to compress. For example:
from django.db import models

class ProjectTextFile(models.Model):
    content = models.TextField(blank=True)
  • Add a *_compressed sibling field that will be used to hold the compressed version of the original field. Mark it as default=''. Include an explicit db_column=... value:
from django.db import models
from mysql_compressed_fields import CompressedTextField

class ProjectTextFile(models.Model):
    content = models.TextField(blank=True)
    content_compressed = CompressedTextField(
        blank=True,
        default='',  # needed by Django when adding a field
        db_column='content_compressed',  # pin column name
    )
  • Generate a migration to add the compressed field:
    • python3 manage.py makemigrations
  • Generate a new empty migration in the same app where the field is defined, which we will use to populate the compressed field:
    • python3 manage.py makemigrations --empty __APP_NAME__
  • Open the empty migration file. It should look something like:
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('ide', '0002_projecttextfile_content_compressed'),
    ]

    operations = [
    ]
  • Edit the operations field to use a RunPython step to populate the compressed field from the uncompressed field:
from django.db import migrations
from django.db.models import F
from mysql_compressed_fields import Compress

def _populate_content_compressed(apps, schema_editor):
    ProjectTextFile = apps.get_model('ide', 'ProjectTextFile')
    # NOTE: Assumes "content" field is already UTF-8 encoded,
    #       because CompressedTextField assumes UTF-8 encoding.
    ProjectTextFile.objects.update(content_compressed=Compress(F('content')))

class Migration(migrations.Migration):
    dependencies = [
        ('ide', '0002_projecttextfile_content_compressed'),
    ]

    operations = [
        migrations.RunPython(
            code=_populate_content_compressed,
            reverse_code=migrations.RunPython.noop,
            atomic=False,
        )
    ]
  • Remove the original uncompressed field from the model, leaving only the compressed field:
from django.db import models
from mysql_compressed_fields import CompressedTextField

class ProjectTextFile(models.Model):
    content_compressed = CompressedTextField(
        blank=True,
        default='',  # needed by Django when adding a field
        db_column='content_compressed',  # pin column name
    )
  • Generate a migration to remove the uncompressed field:
    • python3 manage.py makemigrations
  • Rename the compressed field to drop the *_compressed suffix, so that it takes the name of the original uncompressed field:
from django.db import models
from mysql_compressed_fields import CompressedTextField

class ProjectTextFile(models.Model):
    content = CompressedTextField(
        blank=True,
        default='',  # needed by Django when adding a field
        db_column='content_compressed',  # pin column name
    )
  • Generate a migration to rename the field:
    • python3 manage.py makemigrations
    • When prompted whether the field was renamed, answer y (for yes).
  • You now have a compressed version of the original field. All done! 🎉

Sponsor

This project is brought to you by TechSmart, which seeks to inspire the next generation of K-12 teachers and students to learn coding and create amazing things with computers. We use Django heavily.

API Reference

All classes and functions below should be imported directly from mysql_compressed_fields. For example:

from mysql_compressed_fields import CompressedTextField

Fields

CompressedTextField

class CompressedTextField(encode_errors='strict', decode_errors='strict', **options)

A large text field, stored compressed in the database.

Generally behaves like TextField. Stores values in the database using the same database column type as BinaryField. The value is compressed in the same format that MySQL's COMPRESS() function uses. Compression and decompression are performed by Django and not the database.
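
For orientation, MySQL documents the COMPRESS() format as a 4-byte length of the uncompressed data (low byte first) followed by zlib-compressed data, with an empty input stored as an empty value. The following is a minimal stdlib-only sketch of that layout. The names mysql_compress_sketch and mysql_uncompress_sketch are hypothetical and are not part of this package's API, and the sketch skips the extra '.' that MySQL appends when the result ends in a space.

```python
import struct
import zlib


def mysql_compress_sketch(data: bytes) -> bytes:
    """Approximate the byte layout produced by MySQL's COMPRESS()."""
    if not data:
        # MySQL stores an empty string as an empty string.
        return b''
    # 4-byte uncompressed length (little-endian), then the zlib data.
    # NOTE: MySQL also appends '.' if the result ends in a space; omitted here.
    return struct.pack('<I', len(data)) + zlib.compress(data)


def mysql_uncompress_sketch(blob: bytes) -> bytes:
    """Reverse mysql_compress_sketch()."""
    if not blob:
        return b''
    (expected_length,) = struct.unpack('<I', blob[:4])
    data = zlib.decompress(blob[4:])
    if len(data) != expected_length:
        raise ValueError('length header does not match decompressed data')
    return data


# Round-trip a text value, encoding it as UTF-8 first.
text = '<html>' + 'hello ' * 100 + '</html>'
blob = mysql_compress_sketch(text.encode('utf-8'))
restored = mysql_uncompress_sketch(blob).decode('utf-8')
```

Repetitive text like the sample above compresses well, so blob is much shorter than the original text.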

encode_errors controls how encoding errors are handled when saving the field. decode_errors controls how decoding errors are handled when loading the field. If 'strict' (the default), a UnicodeError exception is raised. Other possible values are 'ignore', 'replace', and any other name registered via codecs.register_error(). See Error Handlers for details.
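
The handler names are the standard ones from Python's codecs machinery. As a stdlib-only illustration (assuming UTF-8, and not using this package at all), this is how those handlers behave when encoding and decoding directly:

```python
# Bytes that are not valid UTF-8 (0xE9 is 'é' in Latin-1).
bad_bytes = b'caf\xe9'

# 'strict' raises UnicodeDecodeError, a subclass of UnicodeError.
try:
    bad_bytes.decode('utf-8', errors='strict')
    strict_decode_failed = False
except UnicodeError:
    strict_decode_failed = True

# 'replace' substitutes U+FFFD; 'ignore' drops the offending byte.
replaced = bad_bytes.decode('utf-8', errors='replace')
ignored = bad_bytes.decode('utf-8', errors='ignore')

# A lone surrogate cannot be encoded as UTF-8.
bad_text = 'caf\udce9'
try:
    bad_text.encode('utf-8', errors='strict')
    strict_encode_failed = False
except UnicodeError:
    strict_encode_failed = True

# When encoding, 'replace' substitutes '?'; 'ignore' drops the character.
encoded_replace = bad_text.encode('utf-8', errors='replace')
encoded_ignore = bad_text.encode('utf-8', errors='ignore')
```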

If you specify a max_length attribute, it will be reflected in the Textarea widget of the auto-generated form field. However it is not enforced at the model or database level. The max_length applies to the length of the uncompressed text rather than the compressed text.

String-based lookups can be used with this field type. Such lookups will transparently decompress the field on the database server.

html_files = ProjectTextFile.objects.filter(content__contains='<html')
html_files = ProjectTextFile.objects.filter(content__startswith='<!DOCTYPE')
html_files = ProjectTextFile.objects.filter(content__endswith='</html>')
empty_html_files = ProjectTextFile.objects.filter(content__in=['', '<html></html>'])

Note that F-expressions that reference this field type will always refer to the compressed value rather than the uncompressed value. So you may need to use the Compress() and Uncompress() database functions manually when working with F-expressions.

from django.db.models import F
from mysql_compressed_fields import Compress, Uncompress

# Copy a TextField value (in utf8 collation) to a CompressedTextField
ProjectTextFile.objects.filter(...).update(content=Compress(F('name')))

# Copy a CompressedTextField value to a TextField (in utf8 collation)
ProjectTextFile.objects.filter(...).update(name=Uncompress(F('content')))

# Copy a CompressedTextField value to a CompressedTextField
ProjectTextFile.objects.filter(...).update(content=F('content'))

The default form widget for this field is a django.contrib.admin.widgets.AdminTextareaWidget (a kind of Textarea).

Database functions

Compress

The MySQL COMPRESS() function, usable in F() expressions.

Uncompress

The MySQL UNCOMPRESS() function, usable in F() expressions.

UncompressedLength

The MySQL UNCOMPRESSED_LENGTH() function, usable in F() expressions.

compress

def compress(uncompressed_bytes: bytes) -> bytes:

The MySQL COMPRESS() function.

uncompress

def uncompress(compressed_bytes: bytes) -> bytes:

The MySQL UNCOMPRESS() function.

uncompressed_length

def uncompressed_length(compressed_bytes: bytes) -> int:

The MySQL UNCOMPRESSED_LENGTH() function.
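
Because MySQL documents the COMPRESS() format as starting with a 4-byte length header (low byte first), the uncompressed length can be read without decompressing anything. A minimal sketch of that idea, with the hypothetical name uncompressed_length_sketch standing in for the real function:

```python
import struct
import zlib


def uncompressed_length_sketch(blob: bytes) -> int:
    """Read the uncompressed length from a COMPRESS()-style header."""
    if not blob:
        # An empty compressed value represents an empty string.
        return 0
    (length,) = struct.unpack('<I', blob[:4])
    return length


payload = b'x' * 5000
# Build a COMPRESS()-style value by hand: length header + zlib data.
blob = struct.pack('<I', len(payload)) + zlib.compress(payload)
length = uncompressed_length_sketch(blob)
```

Only the first four bytes of blob are inspected, so the cost does not depend on the size of the stored value.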

compressed_length

def compressed_length(
    uncompressed_bytes: bytes,
    *, chunk_size: int=64 * 1000,
    stop_if_greater_than: Optional[int]=None) -> int:

Returns the length of COMPRESS(uncompressed_bytes).

If stop_if_greater_than is specified and the returned result is greater than stop_if_greater_than, the returned result is only a lower bound: the actual compressed length is no less than it.
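
One way such a function could work is to feed the input to zlib in chunk_size pieces and stop once the bytes produced so far exceed stop_if_greater_than. The sketch below is an assumption about the approach, not the package's actual implementation. The name compressed_length_sketch is hypothetical, and the 4 header bytes reflect MySQL's documented COMPRESS() format.

```python
import os
import zlib
from typing import Optional


def compressed_length_sketch(
        uncompressed_bytes: bytes,
        *, chunk_size: int = 64 * 1000,
        stop_if_greater_than: Optional[int] = None) -> int:
    """Estimate len(COMPRESS(uncompressed_bytes)) by compressing in chunks."""
    if not uncompressed_bytes:
        return 0  # COMPRESS('') is the empty string
    compressor = zlib.compressobj()
    total = 4  # the 4-byte uncompressed-length header
    for start in range(0, len(uncompressed_bytes), chunk_size):
        chunk = uncompressed_bytes[start:start + chunk_size]
        total += len(compressor.compress(chunk))
        if stop_if_greater_than is not None and total > stop_if_greater_than:
            # Early exit: the true length is at least this large.
            return total
    return total + len(compressor.flush())


data = os.urandom(200_000)  # random bytes barely compress
full = compressed_length_sketch(data, chunk_size=10_000)
partial = compressed_length_sketch(
    data, chunk_size=10_000, stop_if_greater_than=50_000)
```

Stopping early is useful when the caller only needs to know whether a value would exceed some size limit, since the rest of the input never has to be compressed.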

Running Tests

  • Install Docker.
  • Install MySQL CLI tools:
  • Add MySQL CLI tools to path:
    • export PATH="/usr/local/opt/mysql-client@8.0/bin:$PATH"
  • Start MySQL server:
    • docker run --name ide_db_server -e MYSQL_DATABASE=ide_db -e MYSQL_ROOT_PASSWORD=root -p 127.0.0.1:8889:3306 -d mysql:8.0
  • Run tests:
    • cd tests/test_data/mysite
    • poetry install
    • poetry run python3 manage.py test

Release Notes

v1.2.0

  • Add the encode_errors and decode_errors options to CompressedTextField.

v1.1.0

  • Fix to support Django 4.1.

v1.0.1

  • Add logo.

v1.0.0

  • Initial release.

