
Basic MongoDB wrapper for object-oriented collection handling

Project description

Python Basic Utilities - Mongo pbumongo

Available on PyPI

Table of Contents

  1. Installation
  2. Usage
  3. Classes
    1. AbstractMongoStore - abstract class for handling MongoDB collection access
      1. MongoConnection - a helper class to assist with creating multiple store instances
    2. AbstractMongoDocument - abstract class for wrapping MongoDB BSON documents
    3. ProgressUpdater - a collection of classes to help with updating job progress
  4. Archives

Installation

Install via pip:

pip install pbumongo

Usage

It is good practice to associate a sub-class of AbstractMongoDocument with a sub-class of AbstractMongoStore. This is done through the deserialised_class parameter in the super() constructor call of the store class. Any query method will use that class to deserialise BSON documents into instances of the provided class, which should extend AbstractMongoDocument.

Example: let's say we want to implement access to a collection containing user documents. We'll define a class User that extends AbstractMongoDocument and a class UserStore that extends AbstractMongoStore.

# main imports
from pbumongo import AbstractMongoDocument, AbstractMongoStore
# supporting imports
import crypt  # note: Unix-only; deprecated since Python 3.11 and removed in 3.13
from typing import List, Optional
from time import time


# this is an example of a minimum viable class
class User(AbstractMongoDocument):
    def __init__(self):
        super().__init__()
        # define attributes with meaningful defaults
        self.username: Optional[str] = None
        self.password: Optional[str] = None
        self.permissions: List[str] = []
        self.last_login: int = 0

    def get_attribute_mapping(self) -> dict:
        # the values are what is used inside MongoDB documents
        return {
            "username": "username",
            "password": "password",
            "permissions": "permissions",
            "last_login": "lastLogin"
        }

    @staticmethod
    def from_json(json: dict):
        user = User()
        user.extract_system_fields(json)
        return user


class UserStore(AbstractMongoStore):
    def __init__(self, mongo_url, mongo_db, collection_name):
        super().__init__(mongo_url, mongo_db, collection_name, deserialised_class=User, data_model_version=1)

    def login(self, username, password) -> Optional[User]:
        # fetch by username first; crypt generates a new salt on every call,
        # so hashing the password again would never match the stored hash
        user: Optional[User] = self.query_one({"username": username})
        if user is None:
            return None
        # re-hash using the stored hash as salt; a match confirms the password
        if crypt.crypt(password, user.password) != user.password:
            return None
        # update last_login attribute and save it in the database as well
        user.last_login = round(time())
        self.update_one(AbstractMongoStore.id_query(user.id),
                        AbstractMongoStore.set_update("lastLogin", user.last_login))
        return user

    def create_user(self, username, password) -> User:
        # check if this user already exists
        existing = self.query_one({"username": username})
        if existing is not None:
            raise ValueError(f"User with username '{username}' already exists.")
        # create new user object
        user = User()
        user.username = username
        user.password = crypt.crypt(password, crypt.METHOD_MD5)
        # store in database and return document
        user_id = self.create(user)
        return self.get(user_id)

MongoConnection

To use these classes in your application, you can create the UserStore instance directly or use the MongoConnection helper. The helper is useful when you have many collections and don't want to repeat the Mongo connection URL and DB name in every constructor.

from pbumongo import MongoConnection
from mypackage import UserStore  # see implementation above

con = MongoConnection("mongodb://localhost:27017", "myDbName")
user_store = con.create_store(store_class=UserStore, collection_name="users")

user = user_store.login(username="admin", password="mypassword")

Classes

AbstractMongoStore

This is an abstract class and cannot be instantiated directly. Instead, define a class that extends this class.

Constructor

__init__(mongo_url, mongo_db, collection_name, deserialised_class, data_model_version=1, archive_store=None)

  • mongo_url - this is the Mongo connection URL containing the host, port and optional username, password
  • mongo_db - this is the Mongo DB name - the one you provide when using use <dbname> on the Mongo shell
  • collection_name - the name of the collection - e.g. myCollection for db.myCollection.find({}) on the Mongo shell
  • deserialised_class - used for all the query methods to deserialise the BSON document into a class with attributes for easier access
  • data_model_version - a number that can be used for database migration as an app develops over time
  • archive_store - an optional store instance used as archive/backup for this store - see Archives

Methods

  • get(doc_id: str) - fetches a single document with a matching doc_id == document["_id"]
  • get_all() - fetches the entire collection content and deserialises every document. Careful, this is not an iterator, but returns a list of all the documents and can consume quite a bit of compute and memory.
  • create(document) - creates a new document and returns the _id of the newly created BSON document as a string. The document can be either a dict or an instance of the deserialised_class provided in the super().__init__(..) call.
    • Since version 1.0.1 a new parameter is available: create(document, return_doc=True), which returns the entire document/object instead of just the _id of the newly created document.
  • query_one(query: dict) - fetches a single document and deserialises it or returns None if no document can be found
  • query(query: dict, sorting, paging) - fetches multiple documents and deserialises them. sorting can be an attribute name (as provided in the BSON) or a dictionary with the sort order. paging is an instance of pbumongo.PagingInformation.
  • update_one(query: dict, update: dict) - proxies the db.collection.updateOne(..) function from the Mongo shell
  • update(query: dict, update: dict) - same as update_one, but updates all documents matching the query
  • update_full(document) - shortcut for updating the entire document with an updated version, the query will be constructed from the id/_id provided by the document.
  • delete(doc_id) - deletes a single document with the provided document ID
  • delete_many(query: dict) - deletes multiple documents matching the query.
  • set_archive_store(archive_store: AbstractMongoStore) - pass another store instance to be used for backups/archives; overriding this method is also a good place to create indexes in the main store - see Archives
  • run_archive(options=None) - can be implemented by the sub-class; by default it does nothing. options can be anything the implementing class needs

Static Methods

  • AbstractMongoStore.id_query(string_id: str) - creates a query { "_id": ObjectId(string_id) }, which can be used to query the database
  • AbstractMongoStore.set_update(keys, values) - creates a $set update statement. If only a single attribute is updated, you can pass key and value directly as parameters - e.g. updating a key "checked" to True can be done with .set_update("checked", True). If you update multiple attributes, provide them as lists in matching order.
  • AbstractMongoStore.unset_update(keys) - creates an $unset update statement with the attributes listed as keys. As with .set_update, you can provide a single key without a list for ease of use.
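As a rough illustration of the update documents these helpers are described as producing, here is a hypothetical, self-contained sketch - the real implementation lives in pbumongo, and these functions only mirror the documented behaviour:

```python
# Hypothetical sketch of the update documents described above.
# The actual helpers are AbstractMongoStore.set_update / unset_update.

def sketch_set_update(keys, values):
    # a single key/value pair can be passed without wrapping it in lists
    if not isinstance(keys, list):
        keys, values = [keys], [values]
    return {"$set": dict(zip(keys, values))}


def sketch_unset_update(keys):
    if not isinstance(keys, list):
        keys = [keys]
    # MongoDB ignores the $unset value; an empty string is conventional
    return {"$unset": {key: "" for key in keys}}


print(sketch_set_update("checked", True))
print(sketch_set_update(["checked", "lastLogin"], [True, 1700000000]))
print(sketch_unset_update("legacyField"))
```

Either form can then be passed as the update argument of update_one or update.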

AbstractMongoDocument

This is an abstract class and cannot be instantiated directly. Instead, define a class that extends this class.

Constructor

__init__(doc_id=None, data_model_version=None)

The parameters are entirely optional. Generally it is recommended to use the static method from_json(json: dict) to create instances from BSON documents you've loaded from the database, instead of calling the constructor. For new documents, you would not provide the _id, as the store class handles that.

Methods

For methods and static methods please see the documentation of JsonDocument from pbu. AbstractMongoDocument extends that class.

ProgressUpdater

The ProgressUpdater class is part of a set of classes that assist with keeping track of job progress. The other classes are:

  • ProgressObject: a database object with fields for a status (see pbu > JobStatus), start and end timestamp, total count, processed count, a list of errors and a main error.
  • ProgressObjectStore: an abstract class that provides store methods to update status, progress and errors of a ProgressObject
  • ProgressError: a JSON document containing an error message as well as a dictionary for data related to the error. These objects will be appended to a ProgressObject's errors list.
  • ProgressUpdater: an object to pass into a processor, which holds references to the progress store and progress object and provides methods for updating progress and handling errors.

Both ProgressObject and ProgressObjectStore are abstract classes and should be extended with the remaining attributes of a process/job definition (like a name/label, extra configuration, etc.). ProgressObject is an AbstractMongoDocument and ProgressObjectStore is an AbstractMongoStore.
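A minimal sketch of extending a progress object with job-specific attributes, following the same get_attribute_mapping pattern as the User example above. The base class here is a local stand-in so the sketch is self-contained; in a real application you would extend pbumongo's ProgressObject instead, and the field names are assumptions for illustration:

```python
# Stand-in for pbumongo's ProgressObject, reduced to the mapping method
# so this sketch runs without a database. Field names are illustrative.
class ProgressObjectStandIn:
    def get_attribute_mapping(self) -> dict:
        return {"status": "status", "processed": "processed", "total": "total"}


class ExportJob(ProgressObjectStandIn):
    def __init__(self):
        self.label: str = ""          # extra attribute: human-readable job name
        self.target_format: str = ""  # extra attribute: e.g. "csv" or "xlsx"

    def get_attribute_mapping(self) -> dict:
        # merge the base mapping with the job-specific attributes
        mapping = super().get_attribute_mapping()
        mapping.update({"label": "label", "target_format": "targetFormat"})
        return mapping


job = ExportJob()
print(job.get_attribute_mapping())
```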

Archives

Since 1.3.0 each AbstractMongoStore provides an interface for archives/backups with the following goals in mind:

  • When a collection contains many documents and has several indexes for faster queries, MongoDB's memory consumption can get quite high. It can then make sense to archive older documents in a separate, structurally identical store/collection that doesn't have these indexes.
  • The main store should have access to its own archive.
  • It's possible to provide a different store class as archive store.

Usage:

from pbumongo import MongoConnection
from my_stores import InvoiceStore

con = MongoConnection("mongodb://localhost:27017", "myDbName")
invoice_store = con.create_store(
    store_class=InvoiceStore, 
    collection_name="invoices"
).set_archive_store(
    con.create_store(
        store_class=InvoiceStore, 
        collection_name="invoicesArchive"
    )
)

This creates 2 instances of InvoiceStore, each with their own collection name. The invoice_store (the main store) knows about its archive store and can access it as self.archive_store.

The second instance of InvoiceStore (the archive store) can detect whether it is the archive by checking: if self.archive_store is None.

The set_archive_store method is only called for the main store, which makes it the best place to create indexes instead of doing this in the constructor.

class InvoiceStore(AbstractMongoStore):
    def set_archive_store(self, archive_store):
        # create/ensure your indexes
        self.collection.create_index(...)
        self.collection.create_index(...)
        self.collection.create_index(...)

        # ensure to call super() - this will set self.archive_store
        return super().set_archive_store(archive_store)

You can pass a different class as archive_store, but I would not recommend it: it complicates things if you also want to use the archive for lookups when the main store does not return any results (e.g. for a start/end date query, see example below). This can, however, be mitigated by having the archive store translate its own document structure into the structure of the main store.

By default, no other method uses the archive store; it is purely there for convenience. The same is true for the run_archive() method.

Archive lookups can be expensive and slow down a system. Be careful about when to allow access to the archive store, and allow for longer query times when using it, as it shouldn't have the same indexes as the main store.

Example of query that uses the archive store:

class MyStore(AbstractMongoStore):
    def find_by_dates(self, start: int, end: int):
        query = {"timestamp": {"$gte": start, "$lte": end}}
        result = self.query(query)

        if len(result) == 0 and self.archive_store is not None:
            # regular query did not return anything, proxy the query to the archive store
            return self.archive_store.find_by_dates(start, end)
        return result

You can also combine archive results with regular results, provided they map to the same object. The start parameter in the above example is well suited for this: check whether start < last_archive_date before querying the archive.
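A minimal sketch of that combination logic, using plain dicts in place of deserialised documents. The names query_main, query_archive and last_archive_date are hypothetical stand-ins for the two stores' query calls and the newest timestamp held in the archive:

```python
# Hypothetical sketch: combine main-store and archive results for a
# start/end date query. query_main / query_archive stand in for the
# two stores; last_archive_date is the newest archived timestamp.

def find_by_dates_combined(start, end, query_main, query_archive, last_archive_date):
    results = query_main(start, end)
    # only hit the (slower, index-free) archive if the requested window
    # reaches back before the newest archived document
    if start < last_archive_date:
        results = query_archive(start, end) + results
    return results


main_docs = [{"timestamp": 150}, {"timestamp": 180}]
archive_docs = [{"timestamp": 40}, {"timestamp": 90}]

combined = find_by_dates_combined(
    start=30, end=200,
    query_main=lambda s, e: [d for d in main_docs if s <= d["timestamp"] <= e],
    query_archive=lambda s, e: [d for d in archive_docs if s <= d["timestamp"] <= e],
    last_archive_date=100,
)
print(len(combined))  # 4 documents: archive results followed by main results
```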
