Wraps dicts in an object for convenient document management
Project description
Document is a simple wrapper for dicts that provides an object-oriented interface for accessing keys, as well as the ability to add metadata and utility functions to your data. The primary purpose of the Document class is to make working with PyMongo data easier, but it is in no way restricted to this use case. It has no dependencies outside of Python’s standard library.
Document is released under the MIT license.
Installation
Document can be installed from PyPI:
easy_install document
or:
pip install document
You can also simply download the document module and add it to your project.
Basics
Let’s first take a look at the constructor. The Document constructor takes any number of keyword arguments, which are stored internally as a dict.
>>> from document import Document
>>> my_doc = Document(foo='bar', baz=12)
The dictionary keys can be accessed either as properties or keys:
>>> my_doc.foo
'bar'
>>> my_doc['baz']
12
When using property access, you can also set new keys:
>>> my_doc.bar = 1
If you access a missing property, you will get a KeyError instead of AttributeError because, under the hood, we are looking up dictionary keys rather than attributes.
>>> my_doc.bogus
Traceback (most recent call last):
....
KeyError: 'bogus'
This difference is worth noting if you are a practitioner of EAFP.
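In practice, this means EAFP-style code should catch KeyError. The sketch below uses a tiny stand-in class (MiniDoc, a hypothetical name, not the library's implementation) that mimics the behaviour described above, so it runs without the document package installed. It also shows that getattr() with a default does not help here, because the default is only used when AttributeError is raised.

```python
class MiniDoc(object):
    """Hypothetical stand-in that mimics the documented lookup behaviour."""

    def __init__(self, **kwargs):
        # Store the data dict directly in the instance namespace.
        self.__dict__['_data'] = dict(kwargs)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails. A missing key
        # raises KeyError here, not AttributeError.
        return self.__dict__['_data'][name]


doc = MiniDoc(foo='bar')

# EAFP: attempt the access and handle the documented exception.
try:
    value = doc.bogus
except KeyError:
    value = None

# getattr() with a default only swallows AttributeError, so the
# KeyError still propagates to the caller.
try:
    getattr(doc, 'bogus', 'fallback')
    fell_back = True
except KeyError:
    fell_back = False
```

If the real Document class behaves as described above, the same try/except KeyError pattern should apply to it unchanged.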
Unlike normal Python dictionaries, key access can drill down multiple levels. Consider this example:
>>> another_doc = Document(foo={'bar': 'baz'})
>>> another_doc['foo.bar']
'baz'
As you can see, using a period in the key name gives us access to the nested dict’s key. For brevity, we will call such keys ‘multipart’ keys.
The multipart keys also work when setting values:
>>> another_doc['foo.bar'] = 'fam'
>>> another_doc.foo
{'bar': 'fam'}
You can also use the get() method with the multipart keys.
>>> another_doc.get('foo.bar')
'fam'
>>> print(another_doc.get('foo.baz'))
None
Testing for existence of a key works with multipart keys as well:
>>> 'foo.bar' in another_doc
True
>>> 'foo.baz' in another_doc
False
Because of the multipart keys, you cannot use periods in your own key names. Such keys simply become inaccessible through the normal interface. You can still reach them through the private _document property, but that is not recommended, since the private property is an implementation detail and may be renamed or removed in future releases.
Although Document sports the full array of dict methods like pop() and items(), they don’t work with multipart keys, only with top-level keys.
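To see why keys containing periods become unreachable, here is a minimal sketch of how multipart lookup could work on plain nested dicts. The helper names get_path and set_path are hypothetical and are not part of the document module. Creating intermediate dicts when setting a value is an assumption of this sketch, not documented behaviour.

```python
def get_path(data, key, default=None):
    """Look up a dotted key in nested dicts, one part at a time."""
    current = data
    for part in key.split('.'):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def set_path(data, key, value):
    """Set a dotted key (assumption: missing intermediate dicts are created)."""
    parts = key.split('.')
    current = data
    for part in parts[:-1]:
        current = current.setdefault(part, {})
    current[parts[-1]] = value


data = {'foo': {'bar': 'baz'}, 'a.b': 1}

found = get_path(data, 'foo.bar')      # walks data['foo']['bar']
missing = get_path(data, 'foo.baz')    # no such nested key, so the default
shadowed = get_path(data, 'a.b')       # looks for data['a']['b'], never 'a.b'

set_path(data, 'foo.bar', 'fam')
```

Because the key is always split on the period first, a literal 'a.b' key is never looked up as a whole, which matches the limitation described above.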
Apart from dict methods, Document implements a few non-standard methods. One of them is slice() which allows you to get a dict containing a subset of the keys.
>>> a_doc = Document(foo=1, bar=2, baz=3)
>>> a_doc.slice('foo', 'baz')
{'foo': 1, 'baz': 3}
To get back the full dict with all keys, use the to_dict() method:
>>> a_doc.to_dict()
{'foo': 1, 'bar': 2, 'baz': 3}
Note that to_dict() always returns a copy of the internal dict, not a reference to it. Any modification you make to the dict returned by to_dict() will not be reflected in whatever is stored in the document.
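The text above does not say whether that copy is shallow or deep. The sketch below uses plain dicts only (no Document involved) to show the difference. If you need nested values to be fully independent and are unsure which kind of copy you have, you can deep-copy the result yourself with the standard library.

```python
import copy

internal = {'foo': 1, 'tags': ['a', 'b']}

shallow = dict(internal)          # new top-level dict, nested values shared
deep = copy.deepcopy(internal)    # fully independent copy

shallow['foo'] = 2                # does not touch ``internal``
shallow['tags'].append('c')       # DOES change ``internal['tags']``
deep['tags'].append('d')          # does not touch ``internal``
```

With either kind of copy, rebinding a top-level key leaves the original untouched. Only in-place changes to nested mutable values can leak through a shallow copy.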
For convenience, and for Python purists, the Document object provides a from_dict() method that returns a new document from a dict.
>>> b_doc = Document.from_dict({'foo': 'bar'})
>>> b_doc.foo
'bar'
If you don’t care about purity, you can always use the ** magic and the constructor.
>>> b_doc = Document(**{'foo': 'bar'})
The main difference between using from_dict and the ** magic is the type of the keys that end up in the dict. When you use the magic (and keyword arguments for that matter), the keys all become strings (in Python 2.x), whereas unicode keys can be preserved when using from_dict() (and also the update() method).
Extending
Now you might be wondering why you need a whole class to deal with dicts when dicts work perfectly fine in Python. That’s a valid question. The main motivation behind Document was to allow developers to define custom methods, and especially properties, that are separate from the data but still accessible through a similar interface. This lets us attach utility methods and metadata to our data without having them serialized or saved into the database.
To demonstrate this we will create a custom User document.
To create such a document, we first subclass the Document class. This is generally the intended purpose of the Document class, and you should always subclass it and add new properties. If you feel you don’t need to subclass, you can probably get away with a plain dict.
Back to our example, let’s say we have a user document that should have an authenticated flag that is, for obvious reasons, only used during a request-response cycle, and not saved to the database. We also want a method that checks the password, as well as one that sets it. The subclass might look something like this:
class User(Document):
    authenticated = False

    def check_password(self, password):
        return encrypt(password) == self.password

    def set_password(self, password):
        self.password = encrypt(password)
Now we can, say, retrieve a dict from a database and convert it to a user document (using some imaginary database and request API in this example):
user_dict = db.users.get(username='foo')
password = request.params['password']
user = User.from_dict(user_dict)
if user.check_password(password):
    user.authenticated = True
    session['user'] = user
    return 'success!'
return 'wrong username or password'
Suppose the database expects us to save a new record by passing it a dict representing the record’s data (which is how PyMongo works, for example). Let’s store a new user:
username = request.params['username']
password = request.params['password']
user = User(username=username)
user.set_password(password)
db.users.save(user.to_dict())
By using the to_dict() method, we avoid having to deal with the authenticated property, as well as the two methods we have defined on the User document. Only the username and the encrypted password are saved. This provides a clean separation between what we consider metadata and the actual data.
This separation has other consequences. Comparing two records with different metadata will only compare the actual data. For example:
>>> class FooDoc(Document):
...     meta = True
>>> foo1 = FooDoc(foo=1)
>>> foo2 = FooDoc(foo=1)
>>> foo1.meta = False
>>> foo1 == foo2
True
Despite the two documents having different values for the meta property, they are still considered equal because the actual data is equal.
Another thing to note is that, because we can have custom properties and can also assign dictionary keys using property syntax, only the properties defined on the class are actually set as properties. Everything else is treated as a dictionary key. To demonstrate this, we will use the FooDoc class defined before.
>>> foo1 = FooDoc(foo=1)
>>> foo1.meta = True  # Sets the ``meta`` property
>>> foo1.metadata = 'bar'  # Creates an actual dict key called ``metadata``
>>> foo1.to_dict()
{'foo': 1, 'metadata': 'bar'}
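The routing and comparison rules described in this section can be sketched with a small stand-in class. MiniDoc is a hypothetical name and this is not the library's actual implementation. It only illustrates one way such behaviour could be built with __setattr__ and __eq__, assuming that "defined on the class" means the name is found by a class-level lookup.

```python
class MiniDoc(object):
    """Hypothetical stand-in, not the document module's real code."""

    def __init__(self, **kwargs):
        # Write to __dict__ directly to bypass our own __setattr__.
        self.__dict__['_data'] = dict(kwargs)

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. for data keys.
        return self.__dict__['_data'][name]

    def __setattr__(self, name, value):
        if hasattr(type(self), name):
            # Name is declared on the class: keep it as instance metadata.
            object.__setattr__(self, name, value)
        else:
            # Any other name becomes a key in the data dict.
            self.__dict__['_data'][name] = value

    def __eq__(self, other):
        if not isinstance(other, MiniDoc):
            return NotImplemented
        # Only the data is compared; metadata is ignored.
        return self.to_dict() == other.to_dict()

    def to_dict(self):
        # Return a copy so callers cannot mutate the internal dict.
        return dict(self._data)


class FooDoc(MiniDoc):
    meta = True


foo1 = FooDoc(foo=1)
foo2 = FooDoc(foo=1)
foo1.meta = False         # class-level name: stored as metadata
foo1.metadata = 'bar'     # unknown name: stored as a data key
foo2.metadata = 'bar'
```

Here foo1 and foo2 compare equal even though their meta values differ, and meta never appears in the output of to_dict().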
API documentation
The whole document module is a little under 440 lines of code including inline documentation and doctests. Therefore, you are advised to look at the source code for in-depth API documentation. All examples in the inline documentation double as unit tests so they are virtually guaranteed to work as documented.
Reporting bugs
Report all bugs to the BitBucket issue tracker.