Skip to main content

DSE - Delayed SQL Executor

Project description

==========================
DSE - Delayed SQL Executor
==========================

Version : 0.5.1
Author : Thomas Weholt <thomas@weholt.org>
License : GPL v3.0
Status : Beta
Url : https://bitbucket.org/weholt/dse

Background
==========

DSE is concept of caching SQL-statements, both inserts and updates, and executing them when a specified
number of statements has been prepared. This is done using DB API cursor.executemany(list of cached statements)
and this is way faster than executing SQL-statements in sequence.

DSE also is a way to solve a recurring problem when using the Django ORM; how to insert or update a bunch of
records without the huge performance hit of using the ORM to do it, for instance when you want to
scan a filesystem and add or update a record for each file found.

It has been designed to be used outside Django as well, but the main focus is good Django integration.

A prepared parameter based SQL-statement is built based on the structure/schema of the table in the database
or the model when using Django. To update or insert a record you add a plain dictionary with keys
corresponding to fields in a table/model to DSE using the add_item()-method. You only add values you want
to update and/or values for any required field. DSE also handles getting any defined default value from a model.

If a key in the dictionary is similar to the primary key of the table it will result in an update being executed,
ie. in most cases using Django, if the dictionary contains a key called "id", it will be interpreted as an update.

When calling add_item(dict) you put the dict in a cache. When the cache reaches a specified number of elements
or close()/flush() is called a cursor.executemany(cache) will be called and the cache will be cleared.

By default no SQL-statements will be executed until 1000 elements are cached or the flush/close-method is called.
You can use another cache limit when you create a DSE instance, like so ::

import dse
dex = dse.ModelDelayedExecutor(djangomodel, item_limit = 5000)

NOTE! This is still more of a proof-of-concept type of code and I`d like to get comments, warnings and
suggestions on how to improve it. It has been tested on Sqlite3 so far without problems. The code needs
clean-up and documentation. I`ve got a testsuite running, but haven`t found a way to get it included in the
setup.py file. I`ll try to add it in the next release.

Release notes
=============

0.5.1 : just some notes on transaction handling.

0.5.0 :
- added modelfactory. Upon first import a modelfactory will be created in the DSE module. It`s basically just a
helper-class containing ModelDelayedExecutor-instances for all models in all apps found in INSTALLED_APPS in
settings.py.
- to change the default item limit before automatic execution of cached SQL statements to 10000 instead of the default 1000::

import dse
dse.ITEM_LIMIT = 10000

0.4.0 :
- fixed serious bug when using mass updates. Using cursor.executemany is only possible when values
for all columns are specified. If only values for a subset of the columns is specified that will be
executed as a seperate SQL-call. NOTE! Using dex.get_items() or Djangos Model.objects.values() will give you
all the fields.
- code clean-up.
- added custom exceptions; UpdateManyException, UpdateOneException and InsertManyException.

Example usage
=============

You got a model called foobar in an app called someapp, looking like::

#!/usr/bin/env python
class foobar(models.Model):
name = models.CharField(max_length = 200)
age = models.IntegerField(default = 20)
sex = models.CharField(max_length = 1, choices = (('F', 'F'), ('M', 'M')), default = "M")

import dse
from someapp.models import foobar

# Constructing a DSE-instance based in the foobar-model:
dex = dse.ModelDelayedExecutor(foobar)

# Adding a new item, just defining a name and using the default values from the model:
dex.add_item({'name': 'John'})

# Overriding the default values? Just specify a valid value
dex.add_item({'name': 'Thomas', 'age': 36, 'sex': 'M'})

# Update record with id = 1 and set its name to John ( as of version 0.4.0 this will trigger a SQL-statement for this update alone, since not all columns are specified ):
dex.add_item({'id': 1, 'name': 'John'})

# Calling close() or flush() will trigger executing all prepared SQL-statements
dex.close()

Or using the modelfactory introduced in version 0.5.0. Given you have defined a model foobar and added your app to INSTALLED_APPS in settings.py,
all you have to do to update a foobar-record with id 1 and set name to "Johnny Handsome" and age to 40::

import dse
dse.modelfactory.foobar.execute({'id': 1, 'name': 'Johnny Handsome', 'age': 40}) # will trigger an UPDATE-statement

To use delayed execution of SQL statements::

dse.modelfactory.foobar.add_item({'id': 2, 'name': 'Al Capone'}) # will NOT trigger anything
dse.modelfactory.foobar.add_item({'id': 3, 'name': 'John Dillinger'}) # will NOT trigger anything
dse.modelfactory.foobar.add_item({'name': 'Scarface'}) # will NOT trigger anything
dse.modelfactory.flush() # will execute both update SQL statements and insert a record for "Scarface"

Say you want to update all records with some calculated value, something you couldn`t find a way to do in SQL.
Using dse this is easy and fast::

# Use Djangos ORM to generate dictionaries to use in DSE; objects.all().values().
for item in foobar.objects.all().values():
item['somevar'] = calculated_value
dse.modelfactory.foobar.add_item(item)
dse.modelfactory.close() # Will trigger one call to cursor.executemany because .objects.values() gives you all the columns.

I`ve recieved some questions about transaction handling. Below is an simple example, but I`m looking into other ways of handling transactions as well::

from django.db import transaction
import dse

@transaction.commit_manually
def some_method():
for item in somelist:
dse.modelfactory.SomeModel.add_item(
{'some_column': item.some_value,
'another_column': item.another_value})

dse.modelfactory.close()
transaction.commit()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dse-0.5.1.zip (14.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page