a little fork of peewee
According to Charles, the Peewee ORM is currently going through a major 3.0 rewrite and isn’t accepting many (any?) new features for the 2.x line. Because of this, I made Herman as a light fork of Peewee (2.8.1) implementing some syntactic sugar and minor tweaks I’ve liked from other ORMs.
This is mostly for my company’s internal use, but anyone else is free to use it. I’m maintaining API compatibility as a drop in replacement for peewee 2.x.
sudo pip uninstall peewee sudo pip install herman
Or to upgrade:
sudo pip install herman --upgrade
There is no `herman` package. Herman represents itself as peewee (with the same peewee __version__) as a drop-in replacement. To check if you’re running Peewee or Herman, check for peewee.__herman_version__.
The SQL generated by an ORM isn’t just an artifact to be consumed by the database. Developers need to be able to read it as well. The more readable it is and the closer it is to what one would naturally write, the less time it takes to manually evaluate / debug it.
For example, this query:
Article.select() .join(Person, join_type=pw.JOIN.LEFT_OUTER, on=(Article.author==Person.id)) .where(Article.author==derek)
In Peewee generates this:
SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" FROM "article" AS t1 LEFT OUTER JOIN "person" AS t2 ON ("t1"."author_id" = "t2"."id") WHERE ("t1"."author_id" = ?
Whereas Herman generates:
SELECT a1.id, a1.title, a1.author_id, a1.editor_id FROM article AS a1 LEFT OUTER JOIN person AS p1 ON (a1.author_id = p1.id) WHERE (a1.author_id = ?)
You’ll notice that in Herman:
In Peewee generates this:
SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" FROM "article" AS t1
Where in Herman this is generated:
SELECT a1.id, a1.title, a1.author_id, a1.editor_id FROM article AS a1 LIMIT 1
Notice the additional “LIMIT 1” in Herman. Peewee 2.8.1 rather loads entire result set and returns this first object. (It’s my understanding future versions of Peewee will also add a “LIMIT 1”.)
In Peewee you can create table aliases as a variable outside your query. For example:
author_table = Person.alias() Article.select() .join(author_table) .where(author_table.name=='Derek')
This gets you a very respectable:
SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" FROM "article" AS t1 INNER JOIN "person" AS t2 ON ("t1"."author_id" = "t2"."id") WHERE ("t2"."name" = ?)
But in some queries it’s very useful to specify what the aliases are. In Herman you can do this:
author_table = Person.alias('author') Article.select() .join(author_table) .where(author_table.name=='Derek')
Which generates this:
SELECT a1.id, a1.title, a1.author_id, a1.editor_id FROM article AS a1 INNER JOIN person AS author ON (a1.author_id = author.id) WHERE (author."name" = ?)
These two statements are equivalent in Herman:
author_table = Person.alias('author') author_table = Person.as_('author')
And because aliases are named, you’re no longer required to use the external variable:
Article.select() .join(Person.as_('author')) .where(Person.as_('author').name=='Derek')
Whether in-lining the aliases makes it easier or harder to read is entirely dependent on the query and the code block it’s in, but it’s good to have the option. Both are supported in Herman.
If a query is otherwise invalid and there is only one possible interpretation of the query author’s intent, automatically tie the un-aliased column to the appropriate alias. For example:
Person.alias('xyz').select().where(Person.name == 'Derek')
Generates invalid SQL in Peewee (which is run on the database, which throws an exception):
SELECT "t1"."id", "t1"."name", "t1"."parent_id" FROM "person" AS t1 WHERE ("t2"."name" = ?) peewee.OperationalError: no such column: t2.name
But in Herman it’s valid:
SELECT xyz.id, xyz."name", xyz.parent_id FROM person AS xyz WHERE (xyz."name" = ?)
This would NOT work if more than one Person alias were included in the query.
Peewee will generate and run on the database SQL it knows is invalid. For example:
Person.select().where(Article.title == 'xyz').first()
In Peewee will throw a database error:
SELECT "t1"."id", "t1"."name", "t1"."parent_id" FROM "person" AS t1 WHERE ("t2"."title" = ?) peewee.OperationalError: no such column: t2.title
This isn’t good for two reasons. First, I don’t like relying on the database to catch easily detectable errors for us. Second, the error is opaque and specific to the internal implementation details of peewee (the “t2”).
Herman on the other hand will not generate the SQL at all, instead raising:
peewee.ProgrammingError: <class '__main__.Article'> is not a part of this query
The get() method in Peewee adds a “LIMIT 1” and returns the first object. I feel this is incorrect behavior. The difference between first() and get() is get() should assert that only one matching record exists. (This is something Django got right IMO.) If my query criteria hasn’t correctly isolated a unique row the ORM should throw an exception.
This is why Herman added a “LIMIT 2” to the SQL genned from get(), and does a check on the number of object returned. For example:
derek = Person.create(name='Derek') callie = Person.create(name='Callie') Person.select().get()
peewee.DataError: Too many instances matching query exist: SQL: SELECT p1.id, p1."name", p1.parent_id FROM person AS p1 LIMIT 2
Rather than returning a random Person object selected by the database.
In Herman, this:
Is the same as this:
It’s just a nomenclature I preferred from Django. I made it uppercase to prevent conflicts with any columns named “all”, and to highlight that it’s effectively a constant.
If I have a set of models:
class Person(pw.Model): name = pw.CharField() class Article(pw.Model): title = pw.CharField() author = pw.ForeignKeyField(db_column='author_id', rel_model=Person, to_field='id') editor = pw.ForeignKeyField(db_column='editor_id', rel_model=Person, to_field='id', related_name='edited_articles', null=True) class Reply(pw.Model): text = pw.CharField() article = pw.ForeignKeyField(db_column='article_id', rel_model=Article, to_field='id')
And I want to do something fancy like get all replies with their articles and authors and editors, in Peewee I have to do something like this:
author_table = Person.alias() editor_table = Person.alias() replies = Reply.select(Reply, author_table, editor_table) .join(Article) .join(author_table, join_type=pw.JOIN.LEFT_OUTER, on=(author_table==Article.author)) .switch(Article) .join(editor_table, join_type=pw.JOIN.LEFT_OUTER, on=(editor_table==Article.editor)) .where(author_table.name=="Derek")
Which is all sorts of complicated. For instance:
Herman offers a simpler syntax:
Reply.ALL .plus(Reply.article, Article.author.as_('author')) .plus(Reply.article, Article.editor) .where(Person.as_('author').name=="Derek")
Which generates the same SQL:
SELECT r1.id, r1."text", r1.article_id, a1.id, a1.title, a1.author_id, a1.editor_id, author.id, author."name", author.parent_id, p1.id, p1."name", p1.parent_id FROM reply AS r1 LEFT OUTER JOIN article AS a1 ON (r1.article_id = a1.id) LEFT OUTER JOIN person AS author ON (a1.author_id = author.id) LEFT OUTER JOIN person AS p1 ON (a1.editor_id = p1.id) WHERE (author."name" = ?)
The plus() method takes a variable number of ForeignKeyField objects which represent a path away from the primary query object (Reply in this case). For example:
Gets all the replies with their associated articles.
Gets all the replies with their associated articles and all the articles editors. Note that “Reply.article” is a foreign key from Reply to Article, and Article.editor is a foreign key from Article to Person. The list of foreign keys must create a logical path where the “to” type of one FK matches the “from” type of the next FK.
Following the same path twice is harmless. For instance:
Is no different than calling plus() once. This is why we can specify multiple paths that have some overlap, for example:
Reply.ALL .plus(Reply.article, Article.author) .plus(Reply.article, Article.editor)
Which will join on the article table only once.
You can also alias your joined tables with:
Which allows you to reference it later in your conditional:
Herman’s plus() also supports following foreign keys from one-to-many. Like:
Which internally does a prefetch to populate the article with all of its replies. There will be O(k) SQL statements executed, where k is the number of two-many relationships. All of these queries will be grouped into one transaction to guarantee correctness.
Remember that foreign keys represent edges in your object graph, and a call to plus(*edges) tells Herman to include that path from the object graph in your query.
This semantics for plus() have been co-opted from the DKO project, which I authored for my former employer. DKO’s version of this syntax has been in broad production use since 2010 by hundreds of developers, accessing some of the largest (billions of rows) conventional relational databases that exist.
If you call:
Peewee this will load a list of all objects, permanently cache said list, and then call len() on that cache.
In Herman this will call count() on the database and return the resulting integer. It does not build the list of objects in python nor cache anything. However, for backwards compatability, if something else has already populated the cached results of the query, it will call len() on that.
The semantics behind Peewee’s circular foreign key dependencies get kind of unwieldy when you have more than a few models (and they’re spread over multiple files). This is because the DeferredRelation object has to be defined, used, then the other model defined in another file, then set_model has to be called on the original, and then you’re left with the object reference dangling around that has no purpose. IE the example in the docs:
# Create a reference object to stand in for our as-yet-undefined Tweet model. DeferredTweet = DeferredRelation() class User(Model): username = CharField() # Tweet has not been defined yet so use the deferred reference. favorite_tweet = ForeignKeyField(DeferredTweet, null=True) class Tweet(Model): message = TextField() user = ForeignKeyField(User, related_name='tweets') # Now that Tweet is defined, we can initialize the reference. DeferredTweet.set_model(Tweet)
Ours happens all in the model definition with an optional parameter given to DeferredRelation. Like:
class User(Model): username = CharField() # Tweet has not been defined yet so use the deferred reference. favorite_tweet = ForeignKeyField(DeferredRelation('Tweet'), null=True) class Tweet(Model): message = TextField() user = ForeignKeyField(User, related_name='tweets')
This removes the need for the extra variable in the global namespace and the coordination of it over multiple files. And since the parameter is optional, it is fully backwards-compatible with the old syntax.
Our patch for this has been incorporated upstream, so this is forwards-compatible too, following Peewee’s next release.
If you try to do a IN operation on an empty list:
User.select().where(User.id << )
Peewee will generate the following SQL:
SELECT "t1"."id", "t1"."username" FROM "user" AS t1 WHERE ("t1"."id" IN ())
Which the database will reject as invalid, throwing an exception. We instead generate a “false” statement:
SELECT u1.id, u1.username FROM "user" AS u1 WHERE (0 = 1)
So you don’t have to manually test for empty lists every time you use a SQL IN.
Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use.
New to peewee? Here is a list of documents you might find most helpful when getting started:
Defining models is similar to Django or SQLAlchemy:
from peewee import * from playhouse.sqlite_ext import SqliteExtDatabase import datetime db = SqliteExtDatabase('my_database.db') class BaseModel(Model): class Meta: database = db class User(BaseModel): username = CharField(unique=True) class Tweet(BaseModel): user = ForeignKeyField(User, related_name='tweets') message = TextField() created_date = DateTimeField(default=datetime.datetime.now) is_published = BooleanField(default=True)
Connect to the database and create tables:
db.connect() db.create_tables([User, Tweet])
Create a few rows:
charlie = User.create(username='charlie') huey = User(username='huey') huey.save() # No need to set `is_published` or `created_date` since they # will just use the default values we specified. Tweet.create(user=charlie, message='My first tweet')
Queries are expressive and composable:
# A simple query selecting a user. User.get(User.username == 'charles') # Get tweets created by one of several users. The "<<" operator # corresponds to the SQL "IN" operator. usernames = ['charlie', 'huey', 'mickey'] users = User.select().where(User.username << usernames) tweets = Tweet.select().where(Tweet.user << users) # We could accomplish the same using a JOIN: tweets = (Tweet .select() .join(User) .where(User.username << usernames)) # How many tweets were published today? tweets_today = (Tweet .select() .where( (Tweet.created_date >= datetime.date.today()) & (Tweet.is_published == True)) .count()) # Paginate the user table and show me page 3 (users 41-60). User.select().order_by(User.username).paginate(3, 20) # Order users by the number of tweets they've created: tweet_ct = fn.Count(Tweet.id) users = (User .select(User, tweet_ct.alias('ct')) .join(Tweet, JOIN.LEFT_OUTER) .group_by(User) .order_by(tweet_ct.desc())) # Do an atomic update Counter.update(count=Counter.count + 1).where( Counter.url == request.url)
Check out the example app for a working Twitter-clone website written with Flask.
Check the documentation for more examples.
Specific question? Come hang out in the #peewee channel on irc.freenode.net, or post to the mailing list, http://groups.google.com/group/peewee-orm . If you would like to report a bug, create a new issue on GitHub.
I’ve written a number of blog posts about building applications and web-services with peewee (and usually Flask). If you’d like to see some real-life applications that use peewee, the following resources may be useful: