DSE - Simplified "bulk" insert/update for Django.
Project description
DSE - Simplified "bulk" insert/update for Django
================================================
Version : 1.0.2
Author : Thomas Weholt <thomas@weholt.org>
License : GPL v3.0
Status : Stable
Url : https://bitbucket.org/weholt/dse
Docs at http://readthedocs.org/docs/dse/en/latest/index.html
==Background==
* DSE is available for one reason - to insert/update lots of data as fast as possible.
* DSE vs Django ORM: typical speed gain is around 10X.
* DSE is aware of default values specified in your Django models and will use
those if no value is given for a field in an insert statement.
* DSE caches SQL-statements, both inserts and updates, and executes them when
a specified number of statements has been prepared or when it`s told manually to flush
cached statements to the database. The actual sql execution is done using
DB API cursor.executemany and this is much faster than executing SQL-statements
in sequence and way faster than using the Django ORM.
* DSE uses a dictionary to specify what fields to update. It differs between
update and insert statements by looking for a key similar to the primary key for
a given model.
* DSE will try to execute as many statements as possible, but will detect update
statements where only a few sql fields/columns are updated and execute those
statements individually.
* It has been designed to be used outside Django as well, but the main focus
is good Django integration.
* My hope is to see something like this in the django core.
==Plans for the future==
* Refactoring of the SQL-builder code.
* Profiling and performance tuning.
* More focus on thread-safety and if needed locking.
==Installation==
pip install dse
or
hg clone https://bitbucket.org/weholt/dse
==Example usage==
You got a model like:
gender = (('M', 'Male'), ('F', 'Female'))
class Person(models.Model):
name = models.CharField(max_length = 30)
age = models.IntegerField(default = 30)
sex = models.CharField(max_length = 1, choices = gender, default = 'M')
Using dse:
import dse
dse.patch_models() # Monkey patch all your models and expose dse for all models:
with Person.dse as d:
for name, age, sex in (('Thomas', 36, 'M'), ('Joe', 40, 'M'), ('Jane', 28, 'F')):
d.add_item(dict(name = name, age = age, sex = sex))
Nothing will be inserted into the database before the loop is done ( or you
insert 1000 items ). Then the items will be inserted using cursor.executemany,
using plain SQL - no ORM in sight.
DSE features singletons (NB! very experimental, no locking/thread support yet!):
import dse.singleton
p1 = dse.singleton.Models.Person()
p2 = dse.singleton.Models.Person()
print p1 is p2 # should print True
p1.add_item(dict(name = 'Joe'))
p2.flush()
print Person.objects.all().count() # should print 1
Singletons makes it possible to cache entries across sections of code and cache
even more data, hitting the db less.
DSE using default values defined in your model:
with Person.dse as d:
# Adding an item, just defining a name and using the default values from the model:
d.add_item({'name': 'John'})
# Overriding the default values? Just specify a valid value
d.add_item({'name': 'Thomas', 'age': 36, 'sex': 'M'})
# Update record with id = 1 and set its name to John. This will trigger
# a SQL-statement for this update alone, since not all columns are specified:
d.add_item({'id': 1, 'name': 'John'})
To use delayed execution of SQL statements:
Person.dse.add_item({'id': 2, 'name': 'Al Capone'}) # will NOT trigger anything
Person.dse.add_item({'id': 3, 'name': 'John Dillinger'}) # will NOT trigger anything
Person.dse.add_item({'name': 'Scarface'}) # will NOT trigger anything
Person.dse.flush() # will execute both update statements and insert a record for "Scarface"
Say you want to update all records with some calculated value, something you
couldn`t find a way to do in SQL. Using dse this is easy and fast:
with Person.dse as d:
# Use Djangos ORM to generate dictionaries to use in DSE; objects.all().values().
for item in Person.objects.all().values():
item['somevar'] = calculated_value
d.add_item(item)
I`ve recieved some questions about transaction handling. Below is an simple example,
but I`m looking into other ways of handling transactions as well:
from django.db import transaction
import dse
@transaction.commit_manually
def some_method():
with SomeModel.dse as d:
for item in somelist:
SomeModel.dse.add_item(
{'some_column': item.some_value,
'another_column': item.another_value})
transaction.commit()
==Release notes==
1.0.2 : reconnect if cursor is dead.
1.0.1 : fixed issue #9 "Factory can eat up memory" reported by vangheem. When finding fields related to a table only the top row is fetched.
1.0.0 : Version bump. Added unittest for issue #8.
1.0.0-RC1 : updated README.txt.
0.9.4 : - PEP8 and pyflake.
0.9.3 : - Fixed issue #7: dse causes django-debug-toolbar to crash. Thanks to ringemup for pointing that out. Added some docstrings.
0.9.2 : - Corrected type in usage.rst and README.txt.
0.9.1 : - Refactored code even more, added usage.rst, singleton support in the singleton-package and some performance tests. Models not monkey patched be default anymore, must call dse.patch_models().
0.9.0 : - Refactored code and cleaned up tests folder. Focus on getting singleton support in before 1.0.0. And more tests.
0.8.2 : - added 'pysqlite2' to _DBMAP. Thanks to David Marble for 0.8.1 and 0.8.2.
0.8.1 : - attempt to fix quoting problems with fields on postgresql.
0.8.0 : - fixed crash when more than one database connection has been configured. No ModelFactory will be triggered.
0.7.0 : - don`t remember.
0.6.0 : - added support for the with-statement.
- added an ModelDelayedExecutor-instance to each model, so you can do Model.dse.add_item
instead of dse.ModelFactory.Model.add_item.
- renamed dse.modelfactory to dse.ModelFactory to be more style-compliant.
0.5.1 : just some notes on transaction handling.
0.5.0 :
- added modelfactory. Upon first import a modelfactory will be created in the DSE module. It`s basically just a
helper-class containing ModelDelayedExecutor-instances for all models in all apps found in INSTALLED_APPS in
settings.py.
- to change the default item limit before automatic execution of cached SQL statements to 10000 instead of the default 1000::
import dse
dse.ITEM_LIMIT = 10000
0.4.0 :
- fixed serious bug when using mass updates. Using cursor.executemany is only possible when values
for all columns are specified. If only values for a subset of the columns is specified that will be
executed as a seperate SQL-call. NOTE! Using dex.get_items() or Djangos Model.objects.values() will give you
all the fields.
- code clean-up.
- added custom exceptions; UpdateManyException, UpdateOneException and InsertManyException.
================================================
Version : 1.0.2
Author : Thomas Weholt <thomas@weholt.org>
License : GPL v3.0
Status : Stable
Url : https://bitbucket.org/weholt/dse
Docs at http://readthedocs.org/docs/dse/en/latest/index.html
==Background==
* DSE is available for one reason - to insert/update lots of data as fast as possible.
* DSE vs Django ORM: typical speed gain is around 10X.
* DSE is aware of default values specified in your Django models and will use
those if no value is given for a field in an insert statement.
* DSE caches SQL-statements, both inserts and updates, and executes them when
a specified number of statements has been prepared or when it`s told manually to flush
cached statements to the database. The actual sql execution is done using
DB API cursor.executemany and this is much faster than executing SQL-statements
in sequence and way faster than using the Django ORM.
* DSE uses a dictionary to specify what fields to update. It differs between
update and insert statements by looking for a key similar to the primary key for
a given model.
* DSE will try to execute as many statements as possible, but will detect update
statements where only a few sql fields/columns are updated and execute those
statements individually.
* It has been designed to be used outside Django as well, but the main focus
is good Django integration.
* My hope is to see something like this in the django core.
==Plans for the future==
* Refactoring of the SQL-builder code.
* Profiling and performance tuning.
* More focus on thread-safety and if needed locking.
==Installation==
pip install dse
or
hg clone https://bitbucket.org/weholt/dse
==Example usage==
You got a model like:
gender = (('M', 'Male'), ('F', 'Female'))
class Person(models.Model):
name = models.CharField(max_length = 30)
age = models.IntegerField(default = 30)
sex = models.CharField(max_length = 1, choices = gender, default = 'M')
Using dse:
import dse
dse.patch_models() # Monkey patch all your models and expose dse for all models:
with Person.dse as d:
for name, age, sex in (('Thomas', 36, 'M'), ('Joe', 40, 'M'), ('Jane', 28, 'F')):
d.add_item(dict(name = name, age = age, sex = sex))
Nothing will be inserted into the database before the loop is done ( or you
insert 1000 items ). Then the items will be inserted using cursor.executemany,
using plain SQL - no ORM in sight.
DSE features singletons (NB! very experimental, no locking/thread support yet!):
import dse.singleton
p1 = dse.singleton.Models.Person()
p2 = dse.singleton.Models.Person()
print p1 is p2 # should print True
p1.add_item(dict(name = 'Joe'))
p2.flush()
print Person.objects.all().count() # should print 1
Singletons makes it possible to cache entries across sections of code and cache
even more data, hitting the db less.
DSE using default values defined in your model:
with Person.dse as d:
# Adding an item, just defining a name and using the default values from the model:
d.add_item({'name': 'John'})
# Overriding the default values? Just specify a valid value
d.add_item({'name': 'Thomas', 'age': 36, 'sex': 'M'})
# Update record with id = 1 and set its name to John. This will trigger
# a SQL-statement for this update alone, since not all columns are specified:
d.add_item({'id': 1, 'name': 'John'})
To use delayed execution of SQL statements:
Person.dse.add_item({'id': 2, 'name': 'Al Capone'}) # will NOT trigger anything
Person.dse.add_item({'id': 3, 'name': 'John Dillinger'}) # will NOT trigger anything
Person.dse.add_item({'name': 'Scarface'}) # will NOT trigger anything
Person.dse.flush() # will execute both update statements and insert a record for "Scarface"
Say you want to update all records with some calculated value, something you
couldn`t find a way to do in SQL. Using dse this is easy and fast:
with Person.dse as d:
# Use Djangos ORM to generate dictionaries to use in DSE; objects.all().values().
for item in Person.objects.all().values():
item['somevar'] = calculated_value
d.add_item(item)
I`ve recieved some questions about transaction handling. Below is an simple example,
but I`m looking into other ways of handling transactions as well:
from django.db import transaction
import dse
@transaction.commit_manually
def some_method():
with SomeModel.dse as d:
for item in somelist:
SomeModel.dse.add_item(
{'some_column': item.some_value,
'another_column': item.another_value})
transaction.commit()
==Release notes==
1.0.2 : reconnect if cursor is dead.
1.0.1 : fixed issue #9 "Factory can eat up memory" reported by vangheem. When finding fields related to a table only the top row is fetched.
1.0.0 : Version bump. Added unittest for issue #8.
1.0.0-RC1 : updated README.txt.
0.9.4 : - PEP8 and pyflake.
0.9.3 : - Fixed issue #7: dse causes django-debug-toolbar to crash. Thanks to ringemup for pointing that out. Added some docstrings.
0.9.2 : - Corrected type in usage.rst and README.txt.
0.9.1 : - Refactored code even more, added usage.rst, singleton support in the singleton-package and some performance tests. Models not monkey patched be default anymore, must call dse.patch_models().
0.9.0 : - Refactored code and cleaned up tests folder. Focus on getting singleton support in before 1.0.0. And more tests.
0.8.2 : - added 'pysqlite2' to _DBMAP. Thanks to David Marble for 0.8.1 and 0.8.2.
0.8.1 : - attempt to fix quoting problems with fields on postgresql.
0.8.0 : - fixed crash when more than one database connection has been configured. No ModelFactory will be triggered.
0.7.0 : - don`t remember.
0.6.0 : - added support for the with-statement.
- added an ModelDelayedExecutor-instance to each model, so you can do Model.dse.add_item
instead of dse.ModelFactory.Model.add_item.
- renamed dse.modelfactory to dse.ModelFactory to be more style-compliant.
0.5.1 : just some notes on transaction handling.
0.5.0 :
- added modelfactory. Upon first import a modelfactory will be created in the DSE module. It`s basically just a
helper-class containing ModelDelayedExecutor-instances for all models in all apps found in INSTALLED_APPS in
settings.py.
- to change the default item limit before automatic execution of cached SQL statements to 10000 instead of the default 1000::
import dse
dse.ITEM_LIMIT = 10000
0.4.0 :
- fixed serious bug when using mass updates. Using cursor.executemany is only possible when values
for all columns are specified. If only values for a subset of the columns is specified that will be
executed as a seperate SQL-call. NOTE! Using dex.get_items() or Djangos Model.objects.values() will give you
all the fields.
- code clean-up.
- added custom exceptions; UpdateManyException, UpdateOneException and InsertManyException.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
dse-1.0.2.tar.gz
(10.2 kB
view hashes)