forgetSQL is a Python module for accessing SQL databases by creating
|Author:||Stian Soiland <email@example.com>|
|License:||GNU Lesser General Public License (LGPL) See the file COPYING for details.|
forgetSQL is a Python module for accessing SQL databases by creating classes that maps SQL tables to objects, normally one class pr. SQL table. The idea is to forget everything about SQL and just worrying about normal classes and objects.
- What is forgetSQL?
- Normal use
- Advanced use
- Specializing the forgetters
- Wrapping SQL queries
- Framework suggestion
Installation of the forgetSQL module is pretty straight forward:
python setup.py install
This will install forgetSQL.py into site-packages/ of your Python distribution.
- Python 2.2.1 (True/False, new style classes, classmethods, iterators)
- Some database module (tested: MySQLdb, psycopg)
If using psycopg, mx.DateTime is needed to avoid a psycopg bug related to re-inserting dates. psycopg depends on mx.DateTime, so that shouldn’t be a problem.
Let’s start by showing an example using an imaginary database mydatabase:
This example is based on these SQL tables:
And should output something like:
Account details for Stian S�iland Group unix (15) Other members: Magnus Nordseth Erlend Mjaavatten
In regular SQL programming, this could be done something like this:
cursor = dbconn.cursor() cursor.execute("SELECT fullname,groupid FROM account WHERE accountid=%s", ('stain',)) fullname,groupid = cursor.fetchone() print "Account details for", fullname cursor.execute("SELECT name FROM group WHERE groupid=%s" % groupid) (groupname,) = cursor.fetchone() print "Group %s (%s)" % (groupid, name) cursor.execute("""SELECT fullname FROM account JOIN group USING (groupid) WHERE group.groupid=%s AND NOT account.accountid=%s""", (groupid, accountid)) print "Other members:" for (membername,) in cursor.fetchall(): print membername
Now, using forgetSQL:
from mydatabase import * account = Account("stain") # primary key print "Account details for", account.fullname group = account.group print "Group %s (%s)" % (group.name, group.groupid) print "Other members: " for member in group.getChildren(Account): # find Account with group as foreign key if member <> account: print member.fullname
Notice the difference in size and complexity of these two examples.
The first example is tightly bound against SQL. The programmer is forced to think about SQL instead of the real code. This programming style tends to move high-level details to SQL, even if it is not neccessary. In this example, when getting “other members”, the detail of skipping the active user is done in SQL.
This would hardly save any CPU time on modern computers, but has made the code more complex. Thinking in SQL makes your program very large, as everything can be solved by some specialized SQL. Trying to change your program or database structure at a later time would be a nightmare.
Now, forgetSQL removes all those details for the every-day-SQL tasks. It will not be hyper-effective or give you points in the largest-join-ever-possible-contest, but it will help you focus on what you should be thinking of, making your program work.
If you at a later point (when everything runs without failures) discovers that you need to optimize something with a mega-query in SQL, you could just replace that code with regular SQL operations. Of course, if you’ve been using test-driven development (like in http://c2.com/cgi/wiki?ExtremeProgramming ) your tests will show if the replaced code works.
Another alternative could be to use views and stored procedure, and layer forgetSQL on top of those views and procedures. This has never been tested, though. =)
For each table in your database, a class is created. Each instance created of these classes refer to a row in the given table. Each instance have attributes that refer to the fields in the database. Note that the instance is not created until you access that particular row.
So accessing a column of a row is simply accessing the attribute row.column. Now, if this column is a reference to another table, a foreign key, instead of an identifier you will in row.column find an instance from the other table, ie. from the other class.
This is what happens in the example above, group = account.group retrieves this instance. Further attribute access within this instance is resolved from the matching row in the group table.
If you want to change some value, you could just change the attribute value. In the example, if you want to change my name, simply run account.fullname = "Knut Carlsen" (my boss).
You can retrieve every row in some table that refers to the current object. This is what happens in group.getChildren(Account), which will return a list of those Accounts that have a foreign key refering to group.
If you retrieve the objects several times, the constructor will return the same object the second time (unless some timeout has expired). This means that changes done to the object is immediately visible to all instances. This is to reflect normal behaviour in object oriented programming.
>>> stain = Account("stain") >>> stain2 = Account("stain") >>> stain.fullname = "Knut Carlsen" >>> print stain2.fullname Knut Carlsen
forgetSQL is not a way to store objects in a database. It is a way to use databases as objects. You cannot store arbitrary objects in the database unless you use pickling.
forgetSQL does not help you with database design, although you might choose a development style that uses regular classes and objects at first, and then design the database afterwards. You could then change your classes to use forgetSQL for data retrieval and storage, and later possibly replace forgetSQL classes with even more advanced objects.
forgetSQL does not remove the need of heavy duty SQL. In some situations, SQL is simply the best solution. forgetSQL might involve many SQL operations for something that could be done in a single operations with a large magic query. If something does not scale up with forgetSQL, even if you refactored your code, you might try using SQL instead. This example would use excessive time in a table with a million rows:
for row in table.getAll(): row.backedUp = True row.save()
This would involve creating one million object instances (each row), one million SELECTs (to get the other values that needs to be saved), and one million UPDATEs. By using getAllIterator you could reduce this to just one million UPDATEs (one SELECT, reusing the same object), but still it would be far much slower than UPDATE table SET backedUp=true.
forgetSQL does not support commits/rollback. This might be implemented later, but I’m still unsure of how to actually use this in programming. Any suggestions are welcome.
forgetSQL does not ensure that objects in memory are in sync with what is stored in the database. The values in the object will be a snapshot of how the row were at the time you first tried to retrieve an attribute. If you change some value, and then save the object, the row is updated to your version, no matter what has happened in the database meanwhile. An object does not timeout while in memory, it does not refresh it’s values unless you call _loadDB() manually, as automatically updating could confuse programmers. However, a timeout value is set, and if exceeded, new objects retrieved from database (ie. Account("stain") will be fresh.
It is not easy to make a general way to ensure objects are updated. For instance, always checking it could be heavy. It could also confuse some programs if an object suddenly changes some of it’s attributes without telling, this could fuck up any updates the program is attempting to do. On the other hand, saving a changed object as forgetSQL is now, will overwrite all attributes, not just the changed ones.
Before you can use forgetSQL, you will need to generate a module containg the classes representing database tables. Luckily, forgetSQL ships with a program that can do this for you by guessing.
The program is called forgetsql-generate and should be installed by setup.py or the packaging system. You might need the devel-version of the forgetSQL package.
Create a file tables.txt, with a list of database tables, one per line. (This is needed since there is no consistent way to query a database about it’s tables)
Then generate the module representing your tables:
forgetsql-generate --dbmodule psycopg --username=johndoe --password=Jens1PuLe --database=genious --tables tables.txt --output Genious.py
Alternative, you could pipe the table list to forgetsql-generate and avoid --tables – and likewise drop --output and capture stdout from forgetsql-generate.
The generated module is ready for use, except that you need should set database connecting details. One possible way is included in the generated code, commented out and without a password.
It is recommended to set connection details from the outside instead, since the tables might be used by different parts of a system using different database passwords, connection details could be in a configuration file, you need persistent database connections, etc.
The way to do this is to set Genious._Wrapper.cursor to a cursor method, and Genious._Wrapper._dbModule to the database module used:
import Genious import psycopg conn = psycopg.connect(user="blal", pass="sdlksdlk", database="blabla") Genious._Wrapper.cursor = conn.cursor() Genious._Wrapper._dbModule = psycopg
This should be refactored to a more userfriendly interface.
We’ll call a class that is a representation of a database table a forgetter, because it inherits forgetSQL.Forgetter. This chapter will present normal usage of such forgetters by examples.
account = Account("stain") print account.fullname
If the primary key is wrong (ie. the row does not exist) accessing account.fullname will raise forgetSQL.NotFound. The object is actually not loaded from the database until a attribute is read. (delayed loading) One problem with that is that forgetSQL.NotFound will not be raised until the attribute is read.
To test if the primary key is valid, force a load:
account = Account("stain") try: account.load() except forgetSQL.NotFound(): print "Cannot find stain" return
allAccounts = Account.getAll() for account in allAccounts: print account.accountid, account.fullname
Note that getAll is a class method, so it is available even before creating some Account. The returned list will be empty if nothing is found.
Also note that if what you want to do is to iterate, using getAllIterator() would work well. This avoids creating all objects at once.
account = Account() account.accountid = "jennyme" # primary key account.fullname = "Jenny Marie Ellingsaeter" account.save()
If you have forgotten to set some required fields, save() will fail. If you don’t set the primary key, forgetSQL will try to guess the sequence name (tablename_primarykey_seq) to retrieve a new one. This might or might not work. For MySQL some other magic is involved, but it should work.
account = Account("stain") account.fullname = "Stian Stornes" # got married to a beautiful man
You can choose wether you want to call save() or not. If you don’t call save(), the object will be saved when the object reference disappaers (ie. del account, end of function, etc.) and collected by the garbage collector. Note that this might be delayed, and that any errors will be disgarded.
If you are unsure if you have used the correct datatype or want to catch save-errors, use save():
group = Group(17) group.accountid = 'itil' # a string won't work in a integer field try: group.save() except Exception, e: print "Could not save group %s: %s" % (group, e)
The exception raised will be database module specific, like psycopg.ProgrammingError, possible containing some useful information.
save() will return True if successful.
If you changed an attribute, and you don’t want to save the change to the database (as this will happen when the garbage collector kicks in), you have two choices:
reset the instance to a blank state:
This sets everything to None, including the primary key. If you have referenced the instance anywhere else, they will now experience a blank instance.
reload from database:
Note, load() will perform a new SELECT.
Note that you don’t have to reset() if you haven’t changed any attributes, the instance will only save if anything has changed.
account = Account("stain") print account.group.accountid print account.group.name
An attribute which is a foreign key to some other table will be identified by forgetsql-generate if it’s name is something like other_table_id. If the generator could not identify foreign keys correctly, modify _userClasses in the generated Forgetter definition. (See Specializing the forgetters).
To access the real primary key, use account.group.accountid or account.group._getID(). Note that the latter will return a tupple (in case the primary key contained several columns).o
You can set a foreign key attribute to a new object from the foreign class:
import random allGroups = Group.getAll() for account in Account.getAll(): # Set the group to one of the Group instances # in allGroups account.group = random.choice(allGroups) del account # Note that by reusing the account variable all of these # will be saved by the garbage collector
or to just the foreign primary key:
account.group = 18
Note that this referencing magic makes JOIN unneccessary in many cases, but be aware that due to lazy loading (attributes are not loaded from database before they are accessed for the first time), in some cases this might result in many SELECT-calls. There are ways to avoid this, see Wrapping SQL queries.
You might want to walk in reverse, finding all accounts that have a given group as a foreign key:
group = Group(15) members = group.getChildren(Account)
This is equivalent to SQL:
SELECT * FROM account WHERE groupid=15
Note that although rows are represented as instances, they will not be deleted from the database by dereferencing. Simply removing a name binding only removes the representation. (and actually forces a save() if anything has changed).
To remove a row from the database:
account = Account("stornes") account.delete()
delete() might fail if your database claims reference integrity but does not cascade delete:
group = Group(17) group.delete()
You may specify a where-sentence to be inserted into the SELECT-call of getAll-methods:
members = Account.getAll(where="groupid=17")
Note that you must take care of proper escaping on your own by using this approach. Most database modules have some form of escape functions.
In many cases, what you want to do with WHERE is probably the same as with getChildren():
group = Group(17) members = group.getChildren(Account)
This will be as effective as generating a WHERE-clasule, since group.load() won’t be run (no attributes accessed, only the primary key).
The sentence is directly inserted, so you need to use the actual SQL column names, not the attribute names. You can use AND and OR as you like.
If you have several clauses to be AND-ed together, forgetSQL can do this for you, as the where-parameter can be a list:
where =  where.append("groupid=17") if something: where.append("fullname like 'Stian%'") Account.getAll(where=where)
If you have specified _orderBy (see Specializing the forgetters), the results of getAll* and getChildren will be ordered by those attributes.
If you want to specify ordering manually, you can supply a keyword argument to getAll:
all = Account.getAll(orderBy="fullname")
The value of orderBy could be either a string (representing the object attribute to be sorted) or a tupple of strings (order by A, then B, etc.). Note that you can only order by attributes defined in the given table.
If you want some other fancy sorting, sort the list after retrieval using regular list.sort():
all = Account.getAll() all.sort(lambda a,b: cmp(a.split()[-1], b.split()[-1])) # order by last name! :=)
There are specialized getAll methods for different situations.
If you just want the IDs in a table:
>>> all = Account.getAllIDs() ['stornes', 'stain', 'magnun', 'mjaavatt']
The regular getAll() actually runs getAllIDs(), and returns a list of instances based on those IDs. The real data is not loaded until attribute access. In some cases, this might be OK, for instance if you want to call getChildren and really don’t care about the attribute values.
If you are going to iterate through the list, a common case, use instead:
for account in Account.getAllIterator(): print account.fullname
This will return an iterator, not a list, returning Account objects. For each iteration, a new instance is returned, with all fields loaded. Internally in the iterator, a buffer of results from SELECT * is contained.
In Python, object creation is a bit expensive, so you might reuse the same object for each iteration by creating it first and specifying it as the keyword argument useObject:
for account in Account.getAllIterator(useObject=Account()): print account.fullname
Note that changes made to account in this case will be flushed unless you manually call save(). Do not pass this instance on, as it’s content will change for each iteration.
Finally, getAllText() will use _shortView (See Specializing the forgetters) and return tupples of (id, text). This is useful for a dropdown-list of selectors.
Sorry, this section is currently unfinished.
Sorry, this section is currently unfinished.
Sorry, this section is currently unfinished.