Skip to main content

Pack/unpack Python dicts into/out of MariaDB's Dynamic Columns format.

Project description

https://img.shields.io/pypi/v/mariadb-dyncol.svg https://img.shields.io/travis/adamchainz/mariadb-dyncol/master.svg

Pack/unpack Python dicts into/out of MariaDB’s Dynamic Columns format.

A quick example:

>>> mariadb_dyncol.pack({"key": "value"})
b'\x04\x01\x00\x03\x00\x00\x00\x03\x00key!value'
>>> mariadb_dyncol.unpack(mariadb_dyncol.pack({"key": "value"}))
{'key': 'value'}

Installation

Use pip:

pip install mariadb-dyncol

Tested on Python 3.6. Python 3.4+ supported.

Features

  • Sensible type mapping from Python to SQL
  • Tested against examples from MariaDB, including property/fuzz testing with hypothesis (which is amazing and found many bugs)

Why?

The normal way for adding data into dynamic columns fields is with the COLUMN_CREATE function, and its relatives. This allows you to do things like:

INSERT INTO mytable (attrs) VALUES (COLUMN_CREATE('key', 'value'))

Unfortunately the Django ORM is restricted and cannot use database functions like this in every instance, at least not until Django 1.9. It was this limitation I hit whilst implementing a dynamic columns field for my project django-mysql that spurred the creation of this library.

By pre-packing the dynamic columns, the above query can just insert the blob of data directly:

INSERT INTO mytable (attrs) VALUES (X'0401000300000003006B65792176616C7565')

Asides from being more easily implemented with the Django ORM, this approach of packing/unpacking dynamic columns in Python also has some advantages:

  • All data types are properly preserved in Python. The only way MariaDB provides of pulling back all values for a dynamic columns field is to call COLUMN_JSON, but JSON only supports strings and integers. Also COLUMN_JSON has a depth limit of 10, but the format has no actual limit.
  • The CPU overhead of packing/unpacking the dynamic columns is moved from you database server to your (presumably more scalable) clients.

API

All functions and names are accessible as attributes of the mariadb_dyncol module, which you can import with import mariadb_dyncol.

pack(mapping)

Packs the given mapping (a dict) into the MariaDB Dynamic Columns format for named columns and returns it as a byte string (Python 3’s bytes, Python 2’s str). This is suitable for then inserting into a table as part of a normal query.

The dict’s keys must all be unicode strings, and the values must all be one of the supported data types:

  • int between -(2 ** 32) + 1 and (2 ** 64) - 1 (Python 2: long is supported too)
  • str up to 4GB encoded in UTF-8 (Python 2: unicode)
  • float - anything except NaN or +/- inf
  • datetime.datetime - full range supported
  • datetime.date - full range supported
  • datetime.time - full range supported
  • Any dict that is valid by these rules, allowing nested keys. There is no nesting limit except from for MariaDB’s COLUMN_JSON function which restricts the depth to 10

Note that this does not support the DECIMAL type that MariaDB does (and would naturally map to Python’s Decimal) - it is a little more fiddly to pack/unpack, though certainly possible, and pull requests are welcomed. If you try and pack a Decimal, a DynColNotSupported exception will be raised.

There are other restrictions on the UTF-8 encoded column names as documented in MariaDB:

  • The maximum length of a column name is 16383 bytes
  • The maximum length of all column names (at one level in nested hierarchies) is 65535 bytes

All other unsupported types will raise a DynColTypeError. Out of range values will raise a DynColValueError.

Examples:

>>> mariadb_dyncol.pack({"a": 1})
b'\x04\x01\x00\x01\x00\x00\x00\x00\x00a\x02'
>>> mariadb_dyncol.pack({"a": "💩"})
b'\x04\x01\x00\x01\x00\x00\x00\x03\x00a!\xf0\x9f\x92\xa9'

unpack(bytestring)

Unpacks MariaDB dynamic columns data encoded byte string into a dict; the types you can expect back are those listed above. This is suitable for fetching the data direct from MariaDB and decoding in Python as opposed to with MariaDB’s COLUMN_JSON function, preserving the types that JSON discards.

As noted above, DECIMAL values are not supported, and unpacking this will raise DynColNotSupported. Also strings will only be decoded with the MySQL charsets utf8 or utf8mb4; strings with other charsets will raise DynColNotSupported as well.

Unsupported column formats, for example the old MariaDB numbered dynamic columns format, or corrupt data, will raise DynColValueError.

Examples:

>>> mariadb_dyncol.unpack(b'\x04\x01\x00\x01\x00\x00\x00\x03\x00a!\xf0\x9f\x92\xa9')
{"a": "💩"}
>>> mariadb_dyncol.unpack(b'\x04\x01\x00\x01\x00\x00\x00\x00\x00a\x02')
{"a": 1}

History

Pending release

3.0.0 (2019-02-07)

  • Drop Python 2 support, only Python 3.4+ is supported now.

2.0.0 (2018-10-20)

  • Use utf8mb4 character set for encoding strings. This seemed to be broken for emoji on older versions of MariaDB (10.1 or 10.2?), so utf8 was previously used, however this may have only been a display/COLUMN_JSON issue on such older versions. MariaDB internally now defaults to utf8mb44 for dynamic column strings. Since this changes the output of serialization slightly, please test before upgrading. Also you probably want to use utf8mb4 for everything else MariaDB in your application if you aren’t already - it is the default on MySQL 8+.

1.2.1 (2017-12-05)

  • Fix a packaging error which caused the tests to be installed alongside the package.
  • Don’t pin version of six to 1.9.0

1.2.0 (2016-05-24)

  • Disallowed str values on Python 2 - always use unicode
  • Added a benchmark script and made some optimizations that add up to a speed boost of about 10%.

1.1.0 (2015-10-13)

  • Tests now verify every operation against MariaDB’s COLUMN_CHECK and COLUMN_CREATE functions
  • Fixed column order when >1 UTF8 byte characters are involved
  • Fix encoding ints around size boundaries
  • Fix encoding times and datetimes with microseconds=0
  • Fix encoding float -0.0
  • Fix a data size boundaries off-by-one error
  • Fix decoding utf8mb4 strings

1.0.0 (2015-10-09)

  • Support to pack and unpack the named dynamic columns format. No support for DECIMAL values or strings with a non utf8mb4 charset.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
mariadb_dyncol-3.0.0-py3-none-any.whl (8.9 kB) Copy SHA256 hash SHA256 Wheel py3
mariadb-dyncol-3.0.0.tar.gz (14.3 kB) Copy SHA256 hash SHA256 Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page