Bulk load Django models
Project description
Django Bulk Load
Load large batches of Django models into the DB using the Postgres COPY command. This library is a more performant alternative to bulk_create and bulk_update in Django.
Note: Currently, this library only supports Postgres. Other databases may be added in the future.
Install
pip install django-bulk-load
Benchmarks
bulk_update_models vs Django's bulk_update vs django-bulk-update
Results
count: 1,000
bulk_update (Django): 0.45329761505126953
bulk_update (django-bulk-update): 0.1036691665649414
bulk_update_models: 0.04524850845336914
count: 10,000
bulk_update (Django): 6.0840747356414795
bulk_update (django-bulk-update): 2.433042049407959
bulk_update_models: 0.10899758338928223
count: 100,000
bulk_update (Django): 647.6648473739624
bulk_update (django-bulk-update): 619.0643970966339
bulk_update_modelsL 0.9625072479248047
count: 1,000,000
bulk_update (Django): Does not complete
bulk_update (django-bulk-update): Does not complete
bulk_update_models: 14.923949003219604
See this thread for information on Django performance issues. https://groups.google.com/g/django-updates/c/kAn992Fkk24
Code
models = [TestComplexModel(id=i, integer_field=i, string_field=str(i)) for i in range(count)]
def run_bulk_update_django():
start = time()
TestComplexModel.objects.bulk_update(models, fields=["integer_field", "string_field"])
print(time() - start)
def run_bulk_update_library():
start = time()
TestComplexModel.objects.bulk_update(models, update_fields=["integer_field", "string_field"])
print(time() - start)
def run_bulk_update_models():
start = time()
bulk_update_models(models)
print(time() - start)
bulk_insert_models vs Django's bulk_create
Results
count: 1,000
bulk_create: 0.048630714416503906
bulk_insert_models: 0.03132152557373047
count: 10,000
bulk_create: 0.45952868461608887
bulk_insert_models: 0.1908433437347412
count: 100,000
bulk_create: 4.875206708908081
bulk_insert_models: 1.764514684677124
count: 1,000,000
bulk_create: 59.16990399360657
bulk_insert_models: 18.651455640792847
Code
models = [TestComplexModel(integer_field=i, string_field=str(i)) for i in range(count)]
def run_bulk_create():
start = time()
TestComplexModel.objects.bulk_create(models)
print(time() - start)
def run_bulk_insert_models():
start = time()
bulk_insert_models(models)
print(time() - start)
API
Just import and use the functions below. No need to change settings.py
bulk_insert_models()
INSERT a batch of models. It makes use of the Postgres COPY command to improve speed. If a row already exist, the entire insert will fail. See bulk_load.py for descriptions of all parameters.
from django_bulk_load import bulk_insert_models
bulk_insert_models(
models: Sequence[Model],
ignore_conflicts: bool = False,
return_models: bool = False,
)
bulk_upsert_models()
UPSERT a batch of models. It replicates UPSERTing.
By default, it matches existing models using the model pk
, but you can specify matching on other fields with
pk_field_names
. See bulk_load.py for descriptions of all parameters.
from django_bulk_load import bulk_upsert_models
bulk_upsert_models(
models: Sequence[Model],
pk_field_names: Sequence[str] = None,
insert_only_field_names: Sequence[str] = None,
model_changed_field_names: Sequence[str] = None,
update_if_null_field_names: Sequence[str] = None,
update_where: Callable[[Sequence[Field], str, str], Composable] = None,
return_models: bool = False,
)
bulk_update_models()
UPDATE a batch of models. By default, it matches existing models using the model pk
, but you can specify matching on other fields with
pk_field_names
. If the model is not found in the database, it is ignored. See bulk_load.py for descriptions of all parameters.
from django_bulk_load import bulk_update_models
bulk_update_models(
models: Sequence[Model],
update_field_names: Sequence[str] = None,
pk_field_names: Sequence[str] = None,
model_changed_field_names: Sequence[str] = None,
update_if_null_field_names: Sequence[str] = None,
update_where: Callable[[Sequence[Field], str, str], Composable] = None,
return_models: bool = False,
)
bulk_insert_changed_models()
INSERTs a new record in the database when a model field has changed in any of compare_field_names
,
with respect to its latest state, where "latest" is defined by ordering the records
for a given primary key by sorting in descending order on the column passed in
order_field_name
. Does not INSERT a new record if the latest record has not changed. See bulk_load.py for descriptions of all parameters.
from django_bulk_load import bulk_insert_changed_models
bulk_insert_changed_models(
models: Sequence[Model],
pk_field_names: Sequence[str],
compare_field_names: Sequence[str],
order_field_name=None,
return_models=None,
)
bulk_select_model_dicts()
Select/Get model dictionaries by filter_field_names. It returns dictionaries, not Django models for performance reasons. This is useful when querying a very large set of models or multiple field IN clauses.
from django_bulk_load import bulk_select_model_dicts
bulk_select_model_dicts(
model_class: Type[Model],
filter_field_names: Iterable[str],
select_field_names: Iterable[str],
filter_data: Iterable[Sequence],
select_for_update=False,
skip_filter_transform=False,
)
Contributing
We are not accepting pull requests from anyone outside Cedar employees at this time. All pull requests will be closed.
Commit Syntax
All PRs must be a single commit and follow the following syntax https://github.com/angular/angular/blob/master/CONTRIBUTING.md#-commit-message-format
Testing
You will need Docker installed and run the following command
./test.sh
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file django-bulk-load-1.4.3.tar.gz
.
File metadata
- Download URL: django-bulk-load-1.4.3.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ac6c9f0166b50ce3d3824b224b620084ff56436f6f741b43da1014fa466012b4 |
|
MD5 | 2afe7a287fcad5cd9a4b31261f8eb8a0 |
|
BLAKE2b-256 | e7a593e091651d4c226d83ad8ab82d7ecb9875fbe5ca06faa78acec4cd9f77d8 |
File details
Details for the file django_bulk_load-1.4.3-py3-none-any.whl
.
File metadata
- Download URL: django_bulk_load-1.4.3-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b9bfd3d725c101d23a12a0e7dd16f06bfab1b92949cf7fc15954ad3382e86141 |
|
MD5 | 1737d980c44c577f313c3a7020b11029 |
|
BLAKE2b-256 | fe7cfe5a7a15d8b99ed21182df1cd1b9fedcf305ebfcbecc18d1deac99011bb8 |