
MongoDB aggregation pipelines made easy. Joins, grouping, counting and much more...

Project description

Overview

Monggregate is a library that aims to simplify the use of MongoDB aggregation pipelines in Python. It is built on top of MongoDB's official Python driver, pymongo, and on pydantic.

Features

  • Provides an object-oriented interface to the aggregation pipeline.
  • Lets you focus on your requirements rather than on MongoDB syntax.
  • Integrates the relevant MongoDB documentation, so you can refer to it quickly without navigating to the website (see the sketch after this list).
  • Enables autocompletion on the various MongoDB features.
  • Offers a pandas-style way to chain operations on data.
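
For instance, because the documentation is embedded in the docstrings, you can read it straight from an interactive session instead of in the browser. A minimal sketch (the exact docstring content depends on the installed version):

from monggregate import Pipeline

# Each stage method carries the corresponding MongoDB documentation
# in its docstring, so help() surfaces it directly.
help(Pipeline.match)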

Requirements

This package requires Python > 3.10 and pydantic > 1.8.0.

Installation

The package is available on PyPI:

pip install monggregate

Usage

The examples below use MongoDB's sample_mflix sample database.

Basic Pipeline usage

import os

from dotenv import load_dotenv
import pymongo
from monggregate import Pipeline

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Creating the pipeline
pipeline = Pipeline()

# The pipeline below returns the most recent movie titled "A Star is Born"
pipeline.match(
    title="A Star is Born"
).sort(
    by="year",
    descending=True
).limit(
    value=1
)

# Executing the pipeline
results = db["movies"].aggregate(pipeline.export())

for doc in results:
    print(doc)
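
For reference, pipeline.export() returns the plain list of stage dictionaries that pymongo's aggregate method expects. For the pipeline above, the exported value should be roughly equivalent to the following raw syntax (a sketch of what the library saves you from writing by hand):

# Hypothetical output of pipeline.export() for the pipeline above
[
    {"$match": {"title": "A Star is Born"}},
    {"$sort": {"year": -1}},
    {"$limit": 1}
]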

More advanced usage with MongoDB operators

import os

from dotenv import load_dotenv
import pymongo
from monggregate import Pipeline, S

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]


# Creating the pipeline
pipeline = Pipeline()
pipeline.match(
    year=S.type_("number")  # Filter out documents whose year field is not a number
).group(
    by="year",
    query={
        "movie_count": S.sum(1),  # Count the movies per year
        "movie_titles": S.push("$title")
    }
).sort(
    by="_id",
    descending=True
).limit(10)

# Executing the pipeline
results = db["movies"].aggregate(pipeline.export())

for doc in results:
    print(doc)
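
Again for reference, the exported pipeline should correspond roughly to the raw aggregation syntax below (a sketch; monggregate's by argument maps to the $group stage's _id key):

# Hypothetical output of pipeline.export() for the pipeline above
[
    {"$match": {"year": {"$type": "number"}}},
    {"$group": {
        "_id": "$year",
        "movie_count": {"$sum": 1},
        "movie_titles": {"$push": "$title"}
    }},
    {"$sort": {"_id": -1}},
    {"$limit": 10}
]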

Advanced usage with Expressions

from monggregate import Pipeline, S, Expression

pipeline = Pipeline()
pipeline.lookup(
    right="comments",
    right_on="movie_id",  # In sample_mflix, each comment references its movie via movie_id
    left_on="_id",
    name="related_comments"
).add_fields(
    comment_count=Expression.field("related_comments").size()
).match(
    comment_count=S.gte(2)  # Keep only movies with at least two comments
)
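
For comparison, this pipeline should export to roughly the following raw stages (a sketch; the lookup arguments map onto $lookup's from/localField/foreignField/as keys):

# Hypothetical output of pipeline.export() for the pipeline above
[
    {"$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "movie_id",
        "as": "related_comments"
    }},
    {"$addFields": {"comment_count": {"$size": "$related_comments"}}},
    {"$match": {"comment_count": {"$gte": 2}}}
]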

Motivation

The main driver for building this package was how inconvenient it was for me to build aggregation pipelines using pymongo or any other tool.

With pymongo, the official MongoDB driver for Python, there is no direct support for building aggregation pipelines.

pymongo exposes an aggregate method, but the pipeline it takes is just a list of complex dictionaries that quickly becomes long, nested, and overwhelming.

In the end, such a pipeline is barely readable for the developer who built it, let alone for others. Moreover, during development it is often necessary to refer to the online documentation multiple times. This package therefore integrates the online documentation into the docstrings of its various classes and modules. Essentially, the package mirrors every* stage and operator available in MongoDB.

*In reality, it currently covers only a subset of the available stages and operators. Help increasing the coverage is welcome.

Roadmap

As of now, the package covers around 40% of the available stages and 25% of the available operators. I would argue that the most important ones are covered, but that is subjective. The goal is to quickly reach 100% coverage of both stages and operators. The source code embeds most of the online MongoDB documentation, so if the online documentation evolves, it will need to be updated here as well. The documentation is not yet consistent throughout the package and will be standardized later on. Some minor refactoring tasks are also required.

There are already a couple of issues, which I opened myself, noting the next tasks to be tackled.

Feel free to open an issue if you find a bug or want to propose an enhancement, and feel free to open a PR to propose new interfaces for stages that have not been covered yet.

Going further

  • Check out this GitHub repo for more examples.
  • Check out this tutorial on Medium. (It's not behind the paywall.)

