MongoDB aggregation pipelines made easy. Joins, grouping, counting and much more...

Project description

Overview

Monggregate is a library that simplifies the use of MongoDB aggregation pipelines in Python. It is built on pymongo, the official MongoDB Python driver, and on pydantic.

Features

  • Provides an OOP interface to the aggregation pipeline.
  • Allows you to focus on your requirements rather than on MongoDB syntax.
  • Integrates the MongoDB documentation into the package, so you can refer to it quickly without navigating to the website.
  • Offers a pandas-style way to chain operations on data.
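
To make the "pandas-style" chaining idea concrete, here is a minimal sketch of how a chainable builder can work in general. The `SketchPipeline` class and its method names are hypothetical and written only for illustration. It is not monggregate's implementation, and monggregate's real classes may behave differently.

```python
from typing import Any


class SketchPipeline:
    """Hypothetical stand-in for a chainable pipeline builder (not monggregate's class)."""

    def __init__(self) -> None:
        # Each entry is one aggregation stage in raw MongoDB syntax.
        self.stages: list[dict[str, Any]] = []

    def match(self, query: dict[str, Any]) -> "SketchPipeline":
        self.stages.append({"$match": query})
        return self  # returning self is what makes chaining possible

    def sort(self, by: dict[str, int]) -> "SketchPipeline":
        self.stages.append({"$sort": by})
        return self

    def limit(self, value: int) -> "SketchPipeline":
        self.stages.append({"$limit": value})
        return self


# Stages are added in call order, one method call per stage.
chained = (
    SketchPipeline()
    .match({"title": "A Star is Born"})
    .sort({"year": 1})
    .limit(1)
)
print(chained.stages)
```

Each method appends one stage and returns the builder itself, so the calls read top to bottom in the order the stages run.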

Requirements

This package requires Python > 3.10 and pydantic > 1.8.0.

Installation

Manually

  1. Download the repo from https://github.com/VianneyMI/mongreggate
  2. Copy the repo to your project
  3. Navigate to the folder containing the downloaded repo
  4. Install the repo locally by executing the following command: python -m pip install -e .

PIP

The package is also available on PyPI:

pip install monggregate

Usage

The examples below use the MongoDB sample_mflix database.

... through the stage classes

import os

from dotenv import load_dotenv
import pymongo
from monggregate.stages import Match, Limit, Sort

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Get a reference to the "movies" collection:
movie_collection = db["movies"]

# Creating the pipeline
filter_on_title = Match(
    query = {
        "title" : "A Star is Born"
    }
)
sorting_per_year = Sort(
    query = {
        "year":1
    }
)

limiting_to_most_recent = Limit(
    value=1
)

pipeline = [filter_on_title, sorting_per_year, limiting_to_most_recent]
pipeline = [stage.statement for stage in pipeline]

# Launching the pipeline

results = movie_collection.aggregate(pipeline)

... through the pipeline interface

Approach #1

import os

from dotenv import load_dotenv
import pymongo
from monggregate.pipeline import Pipeline

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Creating the pipeline
pipeline = Pipeline(
    collection="movies",
)

pipeline.match(
    query = {
        "title" : "A Star is Born"
    }
).sort(
    query = {
        "year":1
    }
).limit(
    value=1
)

# Executing the pipeline
db["movies"].aggregate(pipeline())

Approach #2

import os

from dotenv import load_dotenv
import pymongo
from monggregate.pipeline import Pipeline

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Creating the pipeline
pipeline = Pipeline(
    _db=db,
    on_call="run",
    collection="movies",
)

pipeline.match(
    query = {
        "title" : "A Star is Born"
    }
).sort(
    query = {
        "year":1
    }
).limit(
    value=1
)

# Executing the pipeline
pipeline()

Motivation

The main driver for building this package was how inconvenient it was for me to build aggregation pipelines using pymongo or any other tool.

With pymongo, which is the official MongoDB driver for Python, there is no direct support for aggregation pipelines.

pymongo exposes an aggregate method, but the pipeline passed to it is just a list of complex dictionaries that quickly becomes long, nested and overwhelming.
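
As an illustration of that point, here is a sketch of a raw pipeline written directly as a list of dictionaries, the form that pymongo's `aggregate` method expects. The field names (`genres`, `year`, `imdb.rating`) are assumptions about the sample_mflix `movies` collection, and the `stage_names` helper is hypothetical, added only to make the structure visible without a database connection.

```python
# A raw aggregation pipeline in plain MongoDB syntax.
# Field names are assumed from the sample_mflix "movies" collection.
raw_pipeline = [
    {"$match": {"genres": "Drama", "year": {"$gte": 2000}}},
    {
        "$group": {
            "_id": "$year",
            "movie_count": {"$sum": 1},
            "average_rating": {"$avg": "$imdb.rating"},
        }
    },
    {"$sort": {"average_rating": -1}},
    {"$limit": 5},
]


def stage_names(pipeline: list[dict]) -> list[str]:
    """Return the operator name of each stage (hypothetical helper)."""
    return [next(iter(stage)) for stage in pipeline]


print(stage_names(raw_pipeline))  # ['$match', '$group', '$sort', '$limit']
```

With a live connection, this list would be passed as `db["movies"].aggregate(raw_pipeline)`. Even with only four stages, the nesting is already three levels deep.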

In the end, the pipeline is barely readable for the person who built it, let alone for other developers. Besides, during development it is often necessary to refer to the online documentation multiple times. The package therefore integrates the online documentation into the docstrings of its classes and modules.

Roadmap

As of now, the package covers 33% of the available stages and barely 18% of the available operators. The goal is to quickly reach 100% of both stages and operators. The source code integrates most of the online MongoDB documentation. If the online documentation evolves, it will need to be updated here as well. The current documentation is not consistent throughout the package; it will need to be standardized later on. Some minor refactoring is also required.

There are already a couple of issues that I noted myself; they will be tackled next.

Feel free to open an issue if you find a bug or want to propose an enhancement. Feel free to open a PR to propose new interfaces for stages that have not been covered yet.

Going further

  • Check out this GitHub repo for more examples.
  • Check out this tutorial on Medium. (It is not behind the paywall.)
