
Wraps dicts in an object for convenient document management

Project description

Document is a simple wrapper for dicts that provides an object-oriented interface for accessing keys, as well as the ability to add metadata and utility functions to your data. The primary purpose of the Document class is to make working with PyMongo data easier, but it is in no way restricted to this use case. It has no dependencies outside of Python’s standard library.

Document is released under MIT license.


Document can be installed from PyPI:

easy_install document


pip install document

You can also simply download the document module and add it to your project.


Let’s first take a look at the constructor. The Document constructor takes any number of keyword arguments, which are stored internally as a dict.

>>> from document import Document
>>> my_doc = Document(foo='bar', baz=12)

The dictionary keys can be accessed either as properties or keys:

>>> my_doc.foo
'bar'
>>> my_doc['baz']
12

When using property access, you can also set new keys:

>>> = 1

If you access a missing property, you will get a KeyError instead of AttributeError because, under the hood, we are looking up dictionary keys rather than attributes.

>>> my_doc.bogus
Traceback (most recent call last):
KeyError: 'bogus'

This difference is worth noting if you are a practitioner of EAFP.
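To see why this matters for EAFP-style code, here is a minimal sketch. AttrDict is a hypothetical stand-in written for this example, not the real Document class; it only assumes that attribute lookup falls through to a dictionary lookup, as described above.

```python
class AttrDict(object):
    """Hypothetical stand-in, not the real Document class."""

    def __init__(self, **kwargs):
        # Store the data directly in the instance __dict__.
        self.__dict__['_data'] = dict(kwargs)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. A missing key
        # raises KeyError here, not AttributeError.
        return self._data[name]


doc = AttrDict(foo='bar')

# An except clause written for plain objects will not fire.
try:
    value = doc.bogus
except AttributeError:
    value = 'caught AttributeError'
except KeyError:
    value = 'caught KeyError'

# getattr() with a default only swallows AttributeError, so the
# KeyError still propagates to the caller.
try:
    fallback = getattr(doc, 'bogus', None)
except KeyError:
    fallback = 'KeyError escaped getattr'
```

In other words, code that guards attribute access with except AttributeError, or relies on getattr() defaults, needs to catch KeyError when it works with objects that behave this way.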

Unlike normal Python dictionaries, key access can drill down multiple levels. Consider this example:

>>> another_doc = Document(foo={'bar': 'baz'})
>>> another_doc['']

As you can see, using a period in the key name gives us access to the nested dict’s key. For brevity, we will call such keys ‘multipart’ keys.

The multipart keys also work when setting values:

>>> another_doc[''] = 'fam'
{'bar': 'fam'}

You can also use the get() method with the multipart keys.

>>> another_doc.get('')
>>> another_doc.get('foo.baz')

Testing for existence of a key works with multipart keys as well:

>>> '' in another_doc
>>> 'foo.baz' in another_doc

Because of the multipart keys, you cannot use periods in your keys. Such keys simply become inaccessible through the normal interface. You can still access them through the private _document property, but that is not recommended, since the private property is an implementation detail and may be renamed or removed in future releases.
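The multipart key behaviour described above can be sketched with plain functions over nested dicts. The helper names get_multipart, set_multipart and has_multipart are hypothetical and exist only for this illustration; the real Document may implement the lookup differently.

```python
def get_multipart(data, key):
    """Look up a dotted key such as 'foo.bar' in nested dicts."""
    current = data
    for part in key.split('.'):
        current = current[part]  # raises KeyError if a part is missing
    return current


def set_multipart(data, key, value):
    """Set the value at a dotted key; parent dicts must already exist."""
    parts = key.split('.')
    current = data
    for part in parts[:-1]:
        current = current[part]
    current[parts[-1]] = value


def has_multipart(data, key):
    """Return True if every part of the dotted key can be resolved."""
    try:
        get_multipart(data, key)
    except (KeyError, TypeError):
        return False
    return True


nested = {'foo': {'bar': 'baz'}}
first_value = get_multipart(nested, 'foo.bar')
set_multipart(nested, 'foo.bar', 'fam')

# A literal key that contains a period is split into parts as well,
# so it can no longer be reached through this interface.
dotted = {'a.b': 1}
dotted_found = has_multipart(dotted, 'a.b')
```

The last two lines show why periods in key names are a problem: the lookup goes searching for a key 'a' containing a key 'b', and never tries the literal key 'a.b'.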

Although Document sports the full array of dict methods like pop() and items(), these work only with top-level keys, not with multipart keys.

Apart from dict methods, Document implements a few non-standard methods. One of them is slice() which allows you to get a dict containing a subset of the keys.

>>> a_doc = Document(foo=1, bar=2, baz=3)
>>> a_doc.slice('foo', 'baz')
{'foo': 1, 'baz': 3}

To get back the full dict with all keys, use the to_dict() method:

>>> a_doc.to_dict()
{'foo': 1, 'bar': 2, 'baz': 3}

Note that to_dict() always returns a copy of the internal dict, not a reference to it. Any modification you make to the dict returned by to_dict() is not reflected in the data stored in the document.
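The copy semantics can be illustrated with a small stand-in class. MiniDoc is hypothetical and is not the real Document. In particular, this description does not say whether to_dict() makes a shallow or a deep copy, nor what slice() does with missing keys, so the sketch simply assumes a deep copy and a KeyError respectively.

```python
import copy


class MiniDoc(object):
    """Hypothetical stand-in, not the real Document class."""

    def __init__(self, **kwargs):
        self._data = dict(kwargs)

    def slice(self, *keys):
        # Assumption: a missing key raises KeyError.
        return {key: self._data[key] for key in keys}

    def to_dict(self):
        # Assumption: a deep copy, so nested values are independent too.
        return copy.deepcopy(self._data)


doc = MiniDoc(foo=1, bar=2, baz=3)
subset = doc.slice('foo', 'baz')

exported = doc.to_dict()
exported['foo'] = 99  # changes the copy only
```

After the assignment, doc.to_dict() still reports the original value for foo, because exported is a separate object.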

For convenience, and for Python purists, the Document object provides a from_dict() method that returns a new document from a dict.

>>> b_doc = Document.from_dict({'foo': 'bar'})

If you don’t care about purity, you can always use the ** magic and the constructor.

>>> b_doc = Document(**{'foo': 'bar'})

The main difference between using from_dict and the ** magic is the type of the keys that end up in the dict. When you use the magic (and keyword arguments for that matter), the keys all become strings (in Python 2.x), whereas unicode keys can be preserved when using from_dict() (and also the update() method).
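On Python 3 the str/unicode distinction no longer exists, but a related difference between the two approaches remains: keyword arguments must have string names, while a dict passed as a single argument may use any hashable key. The sketch below shows this with a hypothetical MiniDoc stand-in (not the real Document), whose from_dict() is assumed to copy the dict as-is.

```python
class MiniDoc(object):
    """Hypothetical stand-in, not the real Document class."""

    def __init__(self, **kwargs):
        self._data = dict(kwargs)

    @classmethod
    def from_dict(cls, data):
        # The dict is passed as one positional argument, so its keys
        # never have to be valid keyword names.
        doc = cls()
        doc._data.update(data)
        return doc


data = {1: 'one', 'two': 2}

from_dict_doc = MiniDoc.from_dict(data)

# Unpacking with ** fails because 1 is not a string.
try:
    MiniDoc(**data)
    unpack_error = None
except TypeError as exc:
    unpack_error = exc
```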


Now you might be wondering why you need a whole class to deal with dicts when dicts work perfectly fine in Python. That’s a valid question. The main motivation behind Document was to allow developers to define custom methods, and especially properties, that are separate from the data but still accessible through a similar interface. This lets us attach utility methods and metadata to our data without having them serialized or saved into the database.

To demonstrate this we will create a custom User document.

To create such a document, we first subclass the Document class. This is generally the intended purpose of the Document class, and you should always subclass it and add new properties. If you feel you don’t need to subclass, you can probably get away with a plain dict.

Back to our example, let’s say we have a user document that should have an authenticated flag that is, for obvious reasons, only used during a request-response cycle, and not saved to the database. We also want a method that checks a password, as well as one that sets it. The subclass might look something like this:

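class User(Document):
    # Metadata: defined on the class, so it is never part of the data
    authenticated = False

    # encrypt() is a placeholder for your own password-hashing function
    def check_password(self, password):
        return encrypt(password) == self.password

    def set_password(self, password):
        self.password = encrypt(password)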

Now we can, say, retrieve a dict from a database and convert it to a user document (using some imaginary database and request API in this example):

user_dict = db.users.get(username='foo')
password = request.params['password']
user = User.from_dict(user_dict)
if user.check_password(password):
    user.authenticated = True
    session['user'] = user
    return 'success!'
return 'wrong username or password'

Suppose the database expects us to save a new record by passing it a dict representing the record’s data (which is how PyMongo works, for example). Let’s store a new user:

username = request.params['username']
password = request.params['password']
user = User(username=username)

By using the to_dict() method, we avoid having to deal with the authenticated property, as well as the two methods we have defined on the User document. Only the username and the encrypted password are saved. This provides a clean separation between what we consider metadata and the actual data.

This separation has other consequences. Comparing two records with different metadata will only compare the actual data. For example:

>>> class FooDoc(Document):
...    meta = True
>>> foo1 = FooDoc(foo=1)
>>> foo2 = FooDoc(foo=1)
>>> foo1.meta = False
>>> foo1 == foo2
True

Despite the two documents having different values for the meta property, they are still considered equal because the actual data is equal.
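A data-only comparison of this kind can be sketched as follows. MiniDoc is a hypothetical stand-in and not the real Document; it only assumes that equality is delegated to the wrapped dict.

```python
class MiniDoc(object):
    """Hypothetical stand-in, not the real Document class."""

    meta = True  # metadata lives on the class, not in the data

    def __init__(self, **kwargs):
        self._data = dict(kwargs)

    def __eq__(self, other):
        if not isinstance(other, MiniDoc):
            return NotImplemented
        # Only the wrapped data takes part in the comparison.
        return self._data == other._data


first = MiniDoc(foo=1)
second = MiniDoc(foo=1)
third = MiniDoc(foo=2)

first.meta = False  # metadata now differs between first and second

same_data_equal = (first == second)
different_data_equal = (first == third)
```

Here same_data_equal is True even though the meta values differ, while different_data_equal is False because the data itself differs.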

Another thing to note is that, because we can have custom properties and can also assign dictionary keys using property syntax, only the properties that are defined on the class can actually be set as properties. Everything else is considered a dictionary key. To demonstrate this, we will use the FooDoc class defined before.

>>> foo1 = FooDoc(foo=1)
>>> foo1.meta = True  # Sets the ``meta`` property
>>> foo1.metadata = 'bar'  # Creates an actual dict key called ``metadata``
>>> foo1.to_dict()
{'foo': 1, 'metadata': 'bar'}
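One way such routing could work is to check, on every assignment, whether the name is already defined on the class. The sketch below is a guess at the mechanism, not the real implementation; MiniDoc is a hypothetical stand-in for Document.

```python
class MiniDoc(object):
    """Hypothetical stand-in, not the real Document class."""

    meta = True

    def __init__(self, **kwargs):
        # Bypass our own __setattr__ so _data becomes a real attribute.
        object.__setattr__(self, '_data', dict(kwargs))

    def __setattr__(self, name, value):
        if hasattr(type(self), name):
            # Declared on the class: store it as ordinary metadata.
            object.__setattr__(self, name, value)
        else:
            # Anything else becomes a key in the data dict.
            self._data[name] = value

    def __getattr__(self, name):
        # Called only when normal lookup fails; fall back to the data.
        return self._data[name]

    def to_dict(self):
        return dict(self._data)


doc = MiniDoc(foo=1)
doc.meta = False        # 'meta' exists on the class, so it stays metadata
doc.metadata = 'bar'    # not on the class, so it becomes a dict key
result = doc.to_dict()
```

A consequence of this design is that a misspelled metadata name silently ends up in the data, so it pays to declare every metadata property on the subclass up front.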

API documentation

The whole document module is a little under 440 lines of code including inline documentation and doctests. Therefore, you are advised to look at the source code for in-depth API documentation. All examples in the inline documentation double as unit tests so they are virtually guaranteed to work as documented.

Reporting bugs

Report all bugs to the BitBucket issue tracker
