bulk_update_or_create for Django model managers
django-bulk-update-or-create
Anyone using the Django ORM will eventually find themselves doing batch update_or_create operations: ingesting files from external sources, syncing with external APIs, etc.
If the number of records is large, the slowness of QuerySet.update_or_create will stand out: it is very practical to use, but it always does one SELECT followed by one INSERT (if the SELECT didn't return anything) or one UPDATE/.save (if it did).
Searching online shows that this does indeed happen to quite a few people, though there doesn't seem to be any good solution:
- bulk_create is really fast if you know all records are new (and you're not using multi-table inheritance)
- bulk_update does some nice voodoo to update several records with the same UPDATE statement (using a huge WHERE condition together with CASE), but you need to be sure they all exist
- UPSERTs (INSERT .. ON DUPLICATE KEY UPDATE) look interesting (TODO in a separate package), but they will be restricted by bulk_create limitations ==> cannot use them on models with multi-table inheritance
This package tries to tackle this by introducing bulk_update_or_create to the model QuerySet/Manager:
- update_or_create: (1 SELECT + 1 INSERT/UPDATE) * N
- bulk_update_or_create: 1 BIG_SELECT + 1 BIG_UPDATE + (<= N) INSERT
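The two cost formulas can be turned into a quick back-of-the-envelope calculator. This is a minimal sketch that only encodes the formulas above for a single batch; it assumes the whole batch fits in one SELECT and one UPDATE, and the function names are hypothetical, not part of the package.

```python
def loop_query_count(n: int) -> int:
    """Queries for calling update_or_create once per record: 1 SELECT + 1 INSERT/UPDATE each."""
    return 2 * n


def bulk_query_count(n: int, existing: int) -> int:
    """Queries for one bulk_update_or_create batch: 1 big SELECT + 1 big UPDATE + 1 INSERT per new record.

    Assumes the whole batch fits in a single SELECT and a single UPDATE, and counts
    the UPDATE even when no record exists yet, exactly as the formula is written.
    """
    if not 0 <= existing <= n:
        raise ValueError("existing must be between 0 and n")
    new_records = n - existing
    return 1 + 1 + new_records


for existing in (0, 500, 1000):
    print(f"n=1000, existing={existing}: loop={loop_query_count(1000)}, bulk={bulk_query_count(1000, existing)}")
```

For 1,000 records the loop always costs 2,000 queries, while the bulk version ranges from 2 (all updates) to 1,002 (all creates), which is consistent with the all-creates case gaining the least in the benchmark below.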
For a batch of records:
- SELECT all from the database (based on the match_field parameter)
- Update records in memory
- Use bulk_update for those
- Use INSERT/.create on each of the remaining ones
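The four steps can be simulated in plain Python to see where the queries go. This is a minimal sketch, not the package's code: FakeTable, Record and simulate_bulk_update_or_create are hypothetical names, and an in-memory dict stands in for the database, counting one "query" per SELECT, bulk UPDATE and INSERT. Skipping the UPDATE when nothing matched is an assumption of this sketch.

```python
from dataclasses import dataclass, replace


@dataclass
class Record:
    uuid: int
    data: str


class FakeTable:
    """Hypothetical in-memory stand-in for a database table that counts 'queries'."""

    def __init__(self, key: str = "uuid"):
        self.key = key
        self.rows = {}
        self.queries = 0

    def select_in(self, keys):
        # One SELECT ... WHERE key IN (...): returns copies, like freshly loaded instances.
        self.queries += 1
        return {k: replace(self.rows[k]) for k in set(keys) if k in self.rows}

    def bulk_update(self, records, fields):
        # One UPDATE statement covering every record.
        self.queries += 1
        for record in records:
            row = self.rows[getattr(record, self.key)]
            for field in fields:
                setattr(row, field, getattr(record, field))

    def insert(self, record):
        # One INSERT per record.
        self.queries += 1
        self.rows[getattr(record, self.key)] = replace(record)


def simulate_bulk_update_or_create(table, objs, update_fields):
    """Mirror the four steps against a FakeTable; returns (created, updated)."""
    match_values = [getattr(obj, table.key) for obj in objs]
    existing = table.select_in(match_values)        # step 1: one big SELECT

    to_update, to_create = [], []
    for obj in objs:
        current = existing.get(getattr(obj, table.key))
        if current is None:
            to_create.append(obj)
            continue
        for field in update_fields:                  # step 2: update in memory
            setattr(current, field, getattr(obj, field))
        to_update.append(current)

    if to_update:                                    # assumption: no UPDATE when nothing matched
        table.bulk_update(to_update, update_fields)  # step 3: one big UPDATE
    for obj in to_create:
        table.insert(obj)                            # step 4: INSERT each remaining one
    return to_create, to_update
```

With one existing row and three incoming records (one match, two new), the batch costs 1 SELECT + 1 UPDATE + 2 INSERTs = 4 queries, instead of 6 for a per-record loop.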
The (SOFTCORE) performance test looks promising, showing more than 70% less time on average:
$ make testcmd
# default - sqlite
DJANGO_SETTINGS_MODULE=settings tests/manage.py bulk_it
loop of update_or_create - all creates: 3.966486692428589
loop of update_or_create - all updates: 4.020653247833252
loop of update_or_create - half half: 3.9968857765197754
bulk_update_or_create - all creates: 2.949239730834961
bulk_update_or_create - all updates: 0.15633511543273926
bulk_update_or_create - half half: 1.4585723876953125
# mysql
DJANGO_SETTINGS_MODULE=settings_mysql tests/manage.py bulk_it
loop of update_or_create - all creates: 5.511938571929932
loop of update_or_create - all updates: 5.321666955947876
loop of update_or_create - half half: 5.391834735870361
bulk_update_or_create - all creates: 1.5671980381011963
bulk_update_or_create - all updates: 0.14612770080566406
bulk_update_or_create - half half: 0.7262606620788574
# postgres
DJANGO_SETTINGS_MODULE=settings_postgresql tests/manage.py bulk_it
loop of update_or_create - all creates: 4.3584535121917725
loop of update_or_create - all updates: 3.6183276176452637
loop of update_or_create - half half: 4.145816087722778
bulk_update_or_create - all creates: 1.044851541519165
bulk_update_or_create - all updates: 0.14954638481140137
bulk_update_or_create - half half: 0.8407495021820068
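The "70% less time" figure can be checked against the timings above. This sketch copies the benchmark numbers verbatim and, as an assumption about what "average" means, computes the saving per backend over the three scenarios and then averages the three savings.

```python
# Timings (seconds) copied from the benchmark output above: (all creates, all updates, half half).
TIMINGS = {
    "sqlite": {
        "loop": (3.966486692428589, 4.020653247833252, 3.9968857765197754),
        "bulk": (2.949239730834961, 0.15633511543273926, 1.4585723876953125),
    },
    "mysql": {
        "loop": (5.511938571929932, 5.321666955947876, 5.391834735870361),
        "bulk": (1.5671980381011963, 0.14612770080566406, 0.7262606620788574),
    },
    "postgres": {
        "loop": (4.3584535121917725, 3.6183276176452637, 4.145816087722778),
        "bulk": (1.044851541519165, 0.14954638481140137, 0.8407495021820068),
    },
}


def time_reduction(loop, bulk):
    """Fraction of total time saved by bulk_update_or_create over the three scenarios."""
    return 1 - sum(bulk) / sum(loop)


reductions = {name: time_reduction(t["loop"], t["bulk"]) for name, t in TIMINGS.items()}
average = sum(reductions.values()) / len(reductions)

for name, value in reductions.items():
    print(f"{name}: {value:.1%} less time")
print(f"average: {average:.1%} less time")
```

Under this reading the saving is roughly 62% on SQLite and about 83% to 85% on MySQL and PostgreSQL, so the 70% figure holds as an average across backends rather than for every backend.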
Installation
pip install django-bulk-update-or-create
Add it to your INSTALLED_APPS list in settings.py
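For example, the settings change might look like the excerpt below. It assumes the app is registered under the package's import name, bulk_update_or_create (the name used in the imports under Usage); the contrib entries are placeholders for whatever your project already lists.

```python
# settings.py (excerpt). Assumption: the app is registered under its import name.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # ... your project's other apps ...
    "bulk_update_or_create",
]
```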
Usage
- use BulkUpdateOrCreateQuerySet as the manager of your model(s)
from django.db import models
from bulk_update_or_create import BulkUpdateOrCreateQuerySet

class RandomData(models.Model):
    objects = BulkUpdateOrCreateQuerySet.as_manager()

    uuid = models.IntegerField(unique=True)
    data = models.CharField(max_length=200, null=True, blank=True)
- call bulk_update_or_create
items = [
    RandomData(uuid=1, data='data for 1'),
    RandomData(uuid=2, data='data for 2'),
]
RandomData.objects.bulk_update_or_create(items, ['data'], match_field='uuid')
- or use the context manager if you are updating a large number of items, as it manages a batch queue for you
with RandomData.objects.bulk_update_or_create_context(['data'], match_field='uuid', batch_size=10) as bulkit:
    for i in range(10000):
        bulkit.queue(RandomData(uuid=i, data=i + 20))
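To illustrate what managing a batch queue means, here is a minimal, hypothetical sketch in plain Python; BatchQueue, batch_queue and flush are illustrative names, not the package's implementation. Queued objects are buffered and handed to flush every batch_size items, and whatever is left is flushed when the with block exits normally.

```python
from contextlib import contextmanager


class BatchQueue:
    """Hypothetical batch queue: buffers objects and flushes them in fixed-size batches."""

    def __init__(self, flush, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush = flush
        self._batch_size = batch_size
        self._pending = []

    def queue(self, obj):
        self._pending.append(obj)
        if len(self._pending) >= self._batch_size:
            self.dump()

    def dump(self):
        # Hand the current batch to the flush callable (e.g. one bulk call per batch).
        if self._pending:
            batch, self._pending = self._pending, []
            self._flush(batch)


@contextmanager
def batch_queue(flush, batch_size):
    q = BatchQueue(flush, batch_size)
    yield q
    q.dump()  # flush whatever is left when the with-block ends normally
```

Queuing 25 items with batch_size=10 produces three flushes of 10, 10 and 5 items. In the package's case, each flush would presumably correspond to one bulk_update_or_create call for that batch.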
bulk_update_or_create supports yield_objects=True so you can iterate over the created/updated objects.
bulk_update_or_create_context provides the same information to the callback function specified as status_cb
Docs
WIP
ToDo
- Docs!
- Add option to use bulk_create for creates: assert model is not multi-table, if enabled
- Fix the collation mess: the keyword arg case_insensitive_match should be dropped and collation detected at runtime
- Add support for multiple match_field - probably will need to use WHERE (K1=X and K2=Y) or (K1=.. and K2 =..) instead of a multi-column IN for those, as that SQL standard syntax doesn't seem widely adopted yet
- Link to UPSERT alternative package once done!
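The multiple match_field idea in the ToDo above can be sketched as a WHERE clause builder. This is a minimal, hypothetical helper (build_multi_key_where is not part of the package) that produces OR-ed AND groups with %s placeholders, the placeholder style used with Django's database cursors. Within Django itself the same shape could be expressed by OR-ing Q objects, e.g. Q(k1=1, k2="a") | Q(k1=2, k2="b").

```python
def build_multi_key_where(fields, rows):
    """Build "(k1 = %s AND k2 = %s) OR (...)" plus a flat parameter list.

    `fields` are column names and must be trusted identifiers: they are pasted
    into the SQL text, while the row values go through placeholders.
    """
    if not rows:
        raise ValueError("need at least one row to match")
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs one value per field")
    group = "(" + " AND ".join(f"{field} = %s" for field in fields) + ")"
    where = " OR ".join([group] * len(rows))
    params = [value for row in rows for value in row]
    return where, params


where, params = build_multi_key_where(["k1", "k2"], [(1, "a"), (2, "b")])
print(where)   # (k1 = %s AND k2 = %s) OR (k1 = %s AND k2 = %s)
print(params)  # [1, 'a', 2, 'b']
```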
File details
Details for the file django-bulk-update-or-create-0.3.0.tar.gz.
File metadata
- Download URL: django-bulk-update-or-create-0.3.0.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1cb7bdf93a7264799c275b26595c1a5060f3091dfd3709e743d710a9758bf9ea |
| MD5 | aab6798e930c00b74703d5ac91f7bc69 |
| BLAKE2b-256 | 1b48c5eae329bf835eb0e4d53158e6fcfc784b3f17ae8154519d9b1f044f7489 |
File details
Details for the file django_bulk_update_or_create-0.3.0-py3-none-any.whl.
File metadata
- Download URL: django_bulk_update_or_create-0.3.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c251375e0331d469c2e9b2320e3b3d0f6a1658289ed97520ef230fa096ded645 |
| MD5 | dc9f69a0bbe6d57ed87d088ea3ce0e01 |
| BLAKE2b-256 | 9ec24077eb674ac3f853009ff118778de7941d224ccb1eb79c693dc84378f1a6 |