Bulk load Django models

Django Bulk Load

Load large batches of Django models into the DB using the Postgres COPY command. This library is a more performant alternative to bulk_create and bulk_update in Django.

Note: Currently, this library only supports Postgres. Other databases may be added in the future.

Install

pip install django-bulk-load

Benchmarks

bulk_update_models vs Django's bulk_update vs django-bulk-update

Results (seconds)

count: 1,000
bulk_update (Django):             0.45329761505126953
bulk_update (django-bulk-update): 0.1036691665649414
bulk_update_models:               0.04524850845336914

count: 10,000
bulk_update (Django):             6.0840747356414795
bulk_update (django-bulk-update): 2.433042049407959
bulk_update_models:               0.10899758338928223

count: 100,000
bulk_update (Django):             647.6648473739624
bulk_update (django-bulk-update): 619.0643970966339
bulk_update_models:               0.9625072479248047

count: 1,000,000
bulk_update (Django):             Does not complete
bulk_update (django-bulk-update): Does not complete
bulk_update_models:               14.923949003219604

See this thread for information on Django performance issues. https://groups.google.com/g/django-updates/c/kAn992Fkk24
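The raw timings are easier to compare as ratios. The short calculation below uses only the figures reported above (Django's bulk_update against bulk_update_models) and shows the gap widening from roughly 10x at 1,000 rows to several hundred times at 100,000 rows. The variable names are illustrative only.

```python
# Reported timings in seconds: (Django bulk_update, bulk_update_models)
timings = {
    1_000: (0.45329761505126953, 0.04524850845336914),
    10_000: (6.0840747356414795, 0.10899758338928223),
    100_000: (647.6648473739624, 0.9625072479248047),
}

# Speedup factor = Django time divided by bulk_update_models time
speedups = {count: django / library for count, (django, library) in timings.items()}

for count, factor in speedups.items():
    print(f"{count:>7,} rows: about {factor:.0f}x faster")
```

The 1,000,000-row case is left out because the Django run did not complete, so no ratio can be computed for it.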

Code

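from time import time

from django_bulk_load import bulk_update_models

# TestComplexModel is the benchmark's test model; its import is not shown in the original.
models = [TestComplexModel(id=i, integer_field=i, string_field=str(i)) for i in range(count)]

def run_bulk_update_django():
  # Django's built-in bulk_update
  start = time()
  TestComplexModel.objects.bulk_update(models, fields=["integer_field", "string_field"])
  print(time() - start)

def run_bulk_update_library():
  # bulk_update from the django-bulk-update package (note the update_fields argument)
  start = time()
  TestComplexModel.objects.bulk_update(models, update_fields=["integer_field", "string_field"])
  print(time() - start)

def run_bulk_update_models():
  # This library's bulk_update_models
  start = time()
  bulk_update_models(models)
  print(time() - start)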

bulk_insert_models vs Django's bulk_create

Results (seconds)

count: 1,000
bulk_create:        0.048630714416503906
bulk_insert_models: 0.03132152557373047

count: 10,000
bulk_create:        0.45952868461608887
bulk_insert_models: 0.1908433437347412

count: 100,000
bulk_create:        4.875206708908081
bulk_insert_models: 1.764514684677124

count: 1,000,000
bulk_create:        59.16990399360657
bulk_insert_models: 18.651455640792847

Code

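from time import time

from django_bulk_load import bulk_insert_models

# TestComplexModel is the benchmark's test model; its import is not shown in the original.
models = [TestComplexModel(integer_field=i, string_field=str(i)) for i in range(count)]

def run_bulk_create():
  # Django's built-in bulk_create
  start = time()
  TestComplexModel.objects.bulk_create(models)
  print(time() - start)

def run_bulk_insert_models():
  # This library's bulk_insert_models
  start = time()
  bulk_insert_models(models)
  print(time() - start)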

API

Just import and use the functions below. No changes to settings.py are needed.

bulk_insert_models()

INSERT a batch of models. It makes use of the Postgres COPY command to improve speed. If a row already exists, the entire insert will fail. See bulk_load.py for descriptions of all parameters.

from django_bulk_load import bulk_insert_models

bulk_insert_models(
    models: Sequence[Model],
    ignore_conflicts: bool = False,
    return_models: bool = False,
)
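To give a feel for what COPY consumes, the sketch below serializes rows into the tab-separated text format that Postgres accepts for `COPY ... FROM STDIN`. This is not the library's implementation; it only illustrates the idea with the standard library, and the helper names `copy_escape` and `rows_to_copy_text` are hypothetical.

```python
from typing import Any, Iterable, Sequence


def copy_escape(value: Any) -> str:
    """Render one value in Postgres COPY text format (hypothetical helper)."""
    if value is None:
        return r"\N"  # default NULL marker in COPY text format
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_copy_text(rows: Iterable[Sequence[Any]]) -> str:
    """Join rows into tab-separated lines, one line per row."""
    return "".join("\t".join(copy_escape(v) for v in row) + "\n" for row in rows)


# Three rows: a plain value, a NULL, and a value containing a tab character
payload = rows_to_copy_text([(1, "one"), (2, None), (3, "tab\there")])
```

A driver could then stream a payload like this to the server in a single round trip (for example, psycopg2 exposes `cursor.copy_expert` for that purpose), which avoids sending one parameterized INSERT per batch of rows.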

bulk_upsert_models()

UPSERT a batch of models: rows that match an existing record are updated, and all others are inserted. By default, it matches existing models using the model pk, but you can specify matching on other fields with pk_field_names. See bulk_load.py for descriptions of all parameters.

from django_bulk_load import bulk_upsert_models

bulk_upsert_models(
    models: Sequence[Model],
    pk_field_names: Sequence[str] = None,
    insert_only_field_names: Sequence[str] = None,
    model_changed_field_names: Sequence[str] = None,
    update_if_null_field_names: Sequence[str] = None,
    update_where: Callable[[Sequence[Field], str, str], Composable] = None,
    return_models: bool = False,
)
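The matching rule can be pictured with plain dictionaries. The sketch below is an in-memory analogy, not the library's code and not a database call: `upsert_rows` is a hypothetical helper that updates rows whose key fields match and appends the rest, mirroring the role of pk_field_names.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def upsert_rows(
    table: List[Row], incoming: Sequence[Row], pk_field_names: Sequence[str]
) -> Tuple[int, int]:
    """Update rows whose key fields match, insert the rest.

    Returns a (updated, inserted) pair. Hypothetical in-memory helper.
    """
    # Index existing rows by the tuple of their key field values
    index = {tuple(row[f] for f in pk_field_names): row for row in table}
    updated = inserted = 0
    for new_row in incoming:
        key = tuple(new_row[f] for f in pk_field_names)
        if key in index:
            index[key].update(new_row)  # match found: overwrite fields
            updated += 1
        else:
            copy = dict(new_row)  # no match: add as a new row
            table.append(copy)
            index[key] = copy
            inserted += 1
    return updated, inserted


table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
counts = upsert_rows(table, [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}], ["id"])
```

Passing several names in pk_field_names makes the match key a tuple of those fields, which is how matching on something other than the model pk would look in this analogy.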

bulk_update_models()

UPDATE a batch of models. By default, it matches existing models using the model pk, but you can specify matching on other fields with pk_field_names. If the model is not found in the database, it is ignored. See bulk_load.py for descriptions of all parameters.

from django_bulk_load import bulk_update_models

bulk_update_models(
    models: Sequence[Model],
    update_field_names: Sequence[str] = None,
    pk_field_names: Sequence[str] = None,
    model_changed_field_names: Sequence[str] = None,
    update_if_null_field_names: Sequence[str] = None,
    update_where: Callable[[Sequence[Field], str, str], Composable] = None,
    return_models: bool = False,
)
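The difference from an upsert is that unmatched models are dropped rather than inserted. The in-memory sketch below illustrates that behavior; it is not the library's implementation. The helper `update_rows` is hypothetical, and treating update_field_names as "only these fields are written" is an assumption based on the parameter name.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def update_rows(
    table: List[Row],
    incoming: Sequence[Row],
    pk_field_names: Sequence[str],
    update_field_names: Sequence[str],
) -> int:
    """Update matching rows in place and return how many were updated."""
    index = {tuple(row[f] for f in pk_field_names): row for row in table}
    updated = 0
    for new_row in incoming:
        existing = index.get(tuple(new_row[f] for f in pk_field_names))
        if existing is None:
            continue  # not found in the table: ignored, never inserted
        for field in update_field_names:
            existing[field] = new_row[field]  # assumed: only listed fields change
        updated += 1
    return updated


table = [{"id": 1, "name": "a", "count": 0}]
incoming = [{"id": 1, "name": "z", "count": 5}, {"id": 9, "name": "q", "count": 1}]
updated = update_rows(table, incoming, ["id"], ["count"])
```

In this example the row with id 9 has no match and is skipped, and the name of row 1 is left alone because only count is listed for update.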

bulk_insert_changed_models()

INSERTs a new record when any field in compare_field_names differs from the latest stored record for the same primary key. The "latest" record is the first one when the records for that key are sorted in descending order on the column passed in order_field_name. No record is inserted if the latest record has not changed. See bulk_load.py for descriptions of all parameters.

from django_bulk_load import bulk_insert_changed_models

bulk_insert_changed_models(
    models: Sequence[Model],
    pk_field_names: Sequence[str],
    compare_field_names: Sequence[str],
    order_field_name=None,
    return_models=None,
)
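The "latest record" rule is easier to follow with a small example. The sketch below reproduces the described selection logic in memory; it is not the library's implementation, and `select_changed_rows` is a hypothetical helper. Inserting a candidate that has no stored history at all is an assumption, since the description above does not cover that case.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def select_changed_rows(
    existing: Sequence[Row],
    candidates: Sequence[Row],
    pk_field_names: Sequence[str],
    compare_field_names: Sequence[str],
    order_field_name: str,
) -> List[Row]:
    """Return the candidates that differ from the latest stored row for their key."""
    # Keep, per key, the stored row with the highest order_field_name value
    latest: Dict[tuple, Row] = {}
    for row in existing:
        key = tuple(row[f] for f in pk_field_names)
        current = latest.get(key)
        if current is None or row[order_field_name] > current[order_field_name]:
            latest[key] = row

    to_insert = []
    for row in candidates:
        last = latest.get(tuple(row[f] for f in pk_field_names))
        # Assumption: a key with no history counts as changed
        if last is None or any(row[f] != last[f] for f in compare_field_names):
            to_insert.append(row)
    return to_insert


existing = [
    {"sku": "A", "price": 10, "version": 1},
    {"sku": "A", "price": 12, "version": 2},
    {"sku": "B", "price": 5, "version": 1},
]
candidates = [
    {"sku": "A", "price": 10, "version": 3},  # differs from latest price 12
    {"sku": "B", "price": 5, "version": 2},   # same as latest, skipped
    {"sku": "C", "price": 7, "version": 1},   # no history
]
changed = select_changed_rows(existing, candidates, ["sku"], ["price"], "version")
```

Note that the candidate for "A" is selected even though its price equals an older stored value, because only the latest stored record is compared.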

bulk_select_model_dicts()

Select model dictionaries by filter_field_names. For performance reasons, it returns dictionaries rather than Django models. This is useful when querying a very large set of models or when filtering on multiple fields with IN clauses.

from django_bulk_load import bulk_select_model_dicts

bulk_select_model_dicts(
    model_class: Type[Model],
    filter_field_names: Iterable[str],
    select_field_names: Iterable[str],
    filter_data: Iterable[Sequence],
    select_for_update=False,
    skip_filter_transform=False,
)
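The argument shapes can be illustrated without a database. In the sketch below, each entry of filter_data is a sequence of values lined up with filter_field_names, and the result holds only the fields named in select_field_names. This is an in-memory analogy, not the library's code; `select_dicts` is a hypothetical helper and the exact shape of the library's return value is assumed.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def select_dicts(
    rows: Iterable[Row],
    filter_field_names: Sequence[str],
    select_field_names: Sequence[str],
    filter_data: Iterable[Sequence],
) -> List[Row]:
    """Return selected fields of rows whose filter fields match any filter_data entry."""
    # A set of tuples gives constant-time membership checks per row
    wanted = {tuple(values) for values in filter_data}
    return [
        {f: row[f] for f in select_field_names}
        for row in rows
        if tuple(row[f] for f in filter_field_names) in wanted
    ]


rows = [
    {"id": 1, "region": "eu", "kind": "a", "total": 10},
    {"id": 2, "region": "eu", "kind": "b", "total": 20},
    {"id": 3, "region": "us", "kind": "a", "total": 30},
]
# Equivalent in spirit to: WHERE (region, kind) IN (('eu', 'b'), ('us', 'a'))
found = select_dicts(rows, ["region", "kind"], ["id", "total"], [("eu", "b"), ("us", "a")])
```

Matching on pairs such as (region, kind) is the multi-field case mentioned above, which is awkward to express with ordinary single-field IN filters.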

Contributing

At this time, we only accept pull requests from Cedar employees. All other pull requests will be closed.

Commit Syntax

All PRs must be a single commit and follow this commit message format: https://github.com/angular/angular/blob/master/CONTRIBUTING.md#-commit-message-format

Testing

You will need Docker installed. Then run the following command:

./test.sh
