Skip to main content

Generate gapless sequences of integer values.

Project description

The problem

By default, Django gives each model an auto-incrementing integer primary key. These primary keys look like they generate a continuous sequence of integers.

However, this behavior isn’t guaranteed.

If a transaction inserts a row and then is rolled back, the sequence counter isn’t rolled back for performance reasons, creating a gap in primary keys.

Such gaps may happen on all databases natively supported by Django:

They may also happen on most databases supported via third-party backends.

This can cause compliance issues for some use cases such as accounting.

This risk isn’t well known. Since most transactions succeed, values look sequential. Gaps will only be revealed by audits.

The solution

django-sequences provides a get_next_value function which is designed to be used as follows:

from django.db import transaction
from sequences import get_next_value
from invoices.models import Invoice

with transaction.atomic():
    Invoice.objects.create(number=get_next_value("invoice_numbers"))

Or, if you’d rather use an object-oriented API:

from django.db import transaction
from sequences import Sequence
from invoices.models import Invoice

invoice_numbers = Sequence("invoice_numbers")

with transaction.atomic():
    Invoice.objects.create(number=invoice_numbers.get_next_value())

get_next_value relies on the database’s transactional integrity to ensure that each value is returned exactly once. As a consequence, the guarantees of django-sequences apply only if you call get_next_value and save its return value to the database within the same transaction!

Getting started

django-sequences is tested with Django 2.2 (LTS), 3.0, and 3.1.

It is released under the BSD license, like Django itself.

Install django-sequences:

$ pip install django-sequences

Add it to the list of applications in your project’s settings:

INSTALLED_APPS += ['sequences.apps.SequencesConfig']

Run migrations:

$ django-admin migrate

API

get_next_value

>>> from sequences import get_next_value

This function generates a gapless sequence of integer values:

>>> get_next_value()
1
>>> get_next_value()
2
>>> get_next_value()
3

It supports multiple independent sequences:

>>> get_next_value("cases")
1
>>> get_next_value("cases")
2
>>> get_next_value("invoices")
1
>>> get_next_value("invoices")
2

The first value defaults to 1. It can be customized:

>>> get_next_value("customers", initial_value=1000)  # pro growth hacking

The initial_value parameter only matters when get_next_value is called for the first time for a given sequence — assuming the corresponding database transaction gets committed; as discussed above, if the transaction is rolled back, the generated value isn’t consumed. It’s also possible to initialize a sequence in a data migration and not use initial_value in actual code.

Sequences can loop:

>>> get_next_value("seconds", initial_value=0, reset_value=60)

When the sequence reaches reset_value, it restarts at initial_value. In other words, it generates reset_value - 2, reset_value - 1, initial_value, initial_value + 1, etc. In that case, each call to get_next_value must provide initial_value when it isn’t the default and reset_value.

Database transactions that call get_next_value for a given sequence are serialized. As a consequence, when you call get_next_value in a database transaction, other callers trying to get a value from the same sequence block until the transaction completes, either with a commit or a rollback. You should keep such transactions short to minimize the impact on performance.

This is why databases default to a faster behavior that may create gaps.

Passing nowait=True makes get_next_value raise an exception instead of blocking in this scenario. This is rarely useful. Also it doesn’t work for the first call. (This is a bug but it’s harmless and hard to fix.)

Calls to get_next_value for distinct sequences don’t interact with one another.

Finally, passing using="..." allows selecting the database on which the current sequence value is stored. When this parameter isn’t provided, it defaults to the default database for writing models of the sequences application. See Multiple databases for details.

To sum up, the complete signature of get_next_value is:

get_next_value(
    sequence_name="default",
    initial_value=1,
    reset_value=None,
    *,
    nowait=False,
    using=None,
)

get_last_value

>>> from sequences import get_last_value

This function returns the last value generated by a sequence:

>>> get_last_value()
None
>>> get_next_value()
1
>>> get_last_value()
1
>>> get_next_value()
2
>>> get_last_value()
2

If the sequence hasn’t generated a value yet, get_last_value returns None.

It supports independent sequences like get_next_value:

>>> get_next_value("cases")
1
>>> get_last_value("cases")
1
>>> get_next_value("invoices")
1
>>> get_last_value("invoices")
1

It accepts using="..." for selecting the database on which the current sequence value is stored, defaulting to the default database for reading models of the sequences application.

The complete signature of get_last_value is:

get_last_value(
    sequence_name="default",
    *,
    using=None,
)

get_last_value is a convenient and fast way to tell how many values a sequence generated but it makes no guarantees. Concurrent calls to get_next_value may produce unexpected results of get_last_value.

Sequence

>>> from sequences import Sequence

(not to be confused with sequences.models.Sequence, a private API)

This class stores parameters for a sequence and provides get_next_value and get_last_value methods:

>>> claim_ids = Sequence("claims")
>>> claim_ids.get_next_value()
1
>>> claim_ids.get_next_value()
2
>>> claim_ids.get_last_value()
2

This reduces the risk of errors when the same sequence is used in multiple places.

Instances of Sequence are also infinite iterators:

>>> next(claim_ids)
3
>>> next(claim_ids)
4

The complete API is:

Sequence(
    sequence_name="default",
    initial_value=1,
    reset_value=None,
    *,
    using=None,
)

Sequence.get_next_value(
    self,
    *,
    nowait=False
)

Sequence.get_last_value(
    self,
)

All parameters have the same meaning as in the get_next_value and get_last_value functions.

Contributing

You can run tests with:

$ poetry install
$ poetry shell
$ make test

If you’d like to contribute, please open an issue or a pull request on GitHub!

Database support

django-sequences is tested on PostgreSQL, MariaDB / MySQL, Oracle, and SQLite.

MySQL only supports the nowait parameter from version 8.0.1. MariaDB only supports nowait from version 10.3.

Multiple databases

Since django-sequences relies on the database to guarantee transactional integrity, the current value for a given sequence must be stored in the same database as models containing generated values.

In a project that uses multiple databases, you must write a suitable database router to create tables for the sequences application on all databases storing models containing sequential numbers.

Each database has its own namespace: a sequence with the same name stored in two databases will have independent counters in each database.

Isolation levels

Since django-sequences relies on the database’s transactional integrity, using a non-default transaction isolation level requires special care.

  • read uncommitted: django-sequences cannot work at this isolation level.

    Indeed, concurrent transactions can create gaps, as in this scenario:

    • Transaction A reads N and writes N + 1;

    • Transaction B reads N + 1 (dirty read) and writes N + 2;

    • Transaction A is rolled back;

    • Transaction B is committed;

    • N + 1 is a gap.

    The read uncommitted isolation level doesn’t provide sufficient guarantees. It will never be supported.

  • read committed: django-sequences works best at this isolation level, like Django itself.

  • repeatable read: django-sequences also works at this isolation level, provided your code handles serialization failures and retries transactions.

    This requirement isn’t specific to django-sequences. It’s generally needed when running at the repeatable read isolation level.

    Here’s a scenario where only one of two concurrent transactions can complete on PostgreSQL:

    • Transaction A reads N and writes N + 1;

    • Transaction B attemps to read; it must wait until transaction A completes;

    • Transaction A is committed;

    • Transaction B is aborted.

    On PostgreSQL, serialization failures are reported as: OperationalError: could not serialize access due to concurrent update.

    On MySQL, they result in: OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction').

    Concurrent transactions initializing the same sequence are also vulnerable, although that’s hardly ever a problem in practice.

    On PostgreSQL, this manifests as IntegrityError: duplicate key value violates unique constraint "sequences_sequence_pkey".

  • serializable: the situation is identical to the repeatable read level.

    SQLite always runs at the serializable isolation level. Serialization failures result in: OperationalError: database is locked.

Changelog

2.6

  • Improve documentation.

2.5

  • Fix Japanese and Turkish translations.

  • Restore compatibility with Python 3.5.

  • Support relabeling the sequences app with a custom AppConfig.

2.4

  • Add the get_last_value function.

  • Add the Sequence class.

2.3

  • Optimize performance on MySQL.

  • Test on MySQL, SQLite and Oracle.

2.2

  • Optimize performance on PostgreSQL ≥ 9.5.

2.1

  • Provide looping sequences with reset_value.

2.0

  • Add support for multiple databases.

  • Add translations.

  • nowait becomes keyword-only argument.

  • Drop support for Python 2.

1.0

  • Initial stable release.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

django-sequences-2.6.tar.gz (17.2 kB view hashes)

Uploaded Source

Built Distribution

django_sequences-2.6-py3-none-any.whl (27.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page