
Interface to ease the creation and usage of MongoDB aggregation pipelines in Python

Project description

Overview

Monggregate is a library that aims to simplify the use of MongoDB aggregation pipelines in Python. It is built on pymongo, the official MongoDB Python driver, and on pydantic.

Features

  • Provides an OOP interface to the aggregation pipeline.
  • Lets you focus on your requirements rather than on MongoDB syntax.
  • Integrates the MongoDB documentation into its docstrings, so you can refer to it without having to navigate to the website.
  • Offers a pandas-style way to chain operations on data.

Requirements

This package requires Python > 3.10 and pydantic > 1.8.0.

Installation

  1. Download the repo from https://github.com/VianneyMI/monggregate
  2. Copy the repo into your project
  3. Navigate to the folder containing the downloaded repo
  4. Install the package locally in editable mode by running: python -m pip install -e .

Usage

The examples below use the MongoDB sample_mflix database.

... through the stage classes

import os

from dotenv import load_dotenv
import pymongo
from monggregate.stages import Match, Limit, Sort

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Get a reference to the "movies" collection:
movie_collection = db["movies"]

# Creating the pipeline
filter_on_title = Match(
    query={"title": "A Star is Born"}
)

# Sort by year in descending order so the most recent movie comes first
sorting_per_year = Sort(
    query={"year": -1}
)

limiting_to_most_recent = Limit(
    value=1
)

pipeline = [filter_on_title, sorting_per_year, limiting_to_most_recent]
# Convert each stage object into its raw MongoDB statement (a dict)
pipeline = [stage.statement for stage in pipeline]

# Launching the pipeline
results = movie_collection.aggregate(pipeline)
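The list comprehension above turns each stage object into the plain dictionary that pymongo expects. A minimal sketch of that mechanism follows. The classes only mirror the names Match, Sort and Limit and their statement property. They are hypothetical dataclass stand-ins, not monggregate's pydantic-based implementation. The sketch also assumes that each stage maps one-to-one to the MongoDB $match, $sort and $limit stages.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Match:
    """Hypothetical stand-in for a $match stage."""
    query: dict[str, Any]

    @property
    def statement(self) -> dict[str, Any]:
        # $match keeps only the documents that satisfy the query
        return {"$match": self.query}


@dataclass
class Sort:
    """Hypothetical stand-in for a $sort stage."""
    query: dict[str, int]

    @property
    def statement(self) -> dict[str, Any]:
        # 1 sorts a field ascending, -1 sorts it descending
        return {"$sort": self.query}


@dataclass
class Limit:
    """Hypothetical stand-in for a $limit stage."""
    value: int

    @property
    def statement(self) -> dict[str, Any]:
        # $limit passes only the first `value` documents to the next stage
        return {"$limit": self.value}


stages = [
    Match(query={"title": "A Star is Born"}),
    Sort(query={"year": -1}),
    Limit(value=1),
]
pipeline = [stage.statement for stage in stages]
print(pipeline)
# [{'$match': {'title': 'A Star is Born'}}, {'$sort': {'year': -1}}, {'$limit': 1}]
```

The resulting list is exactly what a raw pymongo aggregate call takes. The objects simply give each stage a name and typed arguments.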

... through the pipeline interface

Approach #1

import os

from dotenv import load_dotenv
import pymongo
from monggregate.pipeline import Pipeline

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Creating the pipeline
pipeline = Pipeline(
    collection="movies",
)

pipeline.match(
    query = {
        "title" : "A Star is Born"
    }
).sort(
    query = {
        "year": -1  # descending: most recent first
    }
).limit(
    value=1
)

# Export the stages as a list of dicts and execute them with pymongo
results = db["movies"].aggregate(pipeline())
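The chained calls above work because each method adds a stage and then returns the pipeline itself. A minimal sketch of that fluent-builder pattern follows. ChainablePipeline is a hypothetical class written for illustration, not monggregate's actual implementation. It assumes that calling the object exports the list of stage dictionaries, as pipeline() does in Approach #1.

```python
from typing import Any


class ChainablePipeline:
    """Hypothetical fluent builder: each method appends a stage and returns self."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self.stages: list[dict[str, Any]] = []

    def match(self, query: dict[str, Any]) -> "ChainablePipeline":
        self.stages.append({"$match": query})
        return self  # returning self is what makes chaining possible

    def sort(self, query: dict[str, int]) -> "ChainablePipeline":
        self.stages.append({"$sort": query})
        return self

    def limit(self, value: int) -> "ChainablePipeline":
        self.stages.append({"$limit": value})
        return self

    def __call__(self) -> list[dict[str, Any]]:
        # Return a new list so appending to the result does not change the builder
        return list(self.stages)


pipeline = ChainablePipeline(collection="movies")
pipeline.match(query={"title": "A Star is Born"}).sort(query={"year": -1}).limit(value=1)
exported = pipeline()
```

The exported list can be passed directly to a pymongo collection's aggregate method. The builder itself never needs a database connection.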

Approach #2

import os

from dotenv import load_dotenv
import pymongo
from monggregate.pipeline import Pipeline

# Load config from a .env file:
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster:
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database:
db = client["sample_mflix"]

# Creating the pipeline
pipeline = Pipeline(
    _db=db,
    on_call="run",
    collection="movies",
)

pipeline.match(
    query = {
        "title" : "A Star is Born"
    }
).sort(
    query = {
        "year": -1  # descending: most recent first
    }
).limit(
    value=1
)

# Executing the pipeline: with on_call="run", calling it runs the aggregation against _db
results = pipeline()
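Approach #2 differs from Approach #1 only in what calling the pipeline does. With on_call="run" and a database reference, the call executes the aggregation instead of returning the stages. The sketch below illustrates that dispatch, with some assumptions:

- DispatchingPipeline and its "export"/"run" modes mirror the example above but are hypothetical.
- The database only needs to support db[name].aggregate(stages), which is how a pymongo Database and Collection behave.
- A tiny in-memory fake stands in for the database, so the sketch runs without a MongoDB server.

```python
from typing import Any, Literal


class DispatchingPipeline:
    """Hypothetical pipeline whose __call__ either exports or runs the stages."""

    def __init__(self, collection: str, db: Any = None,
                 on_call: Literal["export", "run"] = "export") -> None:
        self.collection = collection
        self.db = db
        self.on_call = on_call
        self.stages: list[dict[str, Any]] = []

    def limit(self, value: int) -> "DispatchingPipeline":
        self.stages.append({"$limit": value})
        return self

    def __call__(self) -> Any:
        if self.on_call == "run":
            if self.db is None:
                raise ValueError("on_call='run' requires a database reference")
            # With pymongo, db[collection] is a Collection and aggregate() returns a cursor
            return self.db[self.collection].aggregate(self.stages)
        # Default: export the raw stage dictionaries
        return list(self.stages)


class FakeCollection:
    """In-memory stand-in that only understands $limit, for demonstration."""

    def __init__(self, docs: list[dict[str, Any]]) -> None:
        self.docs = docs

    def aggregate(self, stages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        result = list(self.docs)
        for stage in stages:
            if "$limit" in stage:
                result = result[: stage["$limit"]]
        return result


fake_db = {"movies": FakeCollection([{"title": "A"}, {"title": "B"}])}
exported = DispatchingPipeline(collection="movies").limit(value=1)()
ran = DispatchingPipeline(collection="movies", db=fake_db, on_call="run").limit(value=1)()
```

Keeping the "export" mode as the default means the same pipeline object stays usable without a live connection, for example when inspecting or testing the generated stages.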

Motivation

The main driver for building this package was how inconvenient it was for me to build aggregation pipelines with pymongo or any other tool.

pymongo, the official MongoDB driver for Python, offers no dedicated tooling for building aggregation pipelines.

It exposes an aggregate method, but the pipeline you pass to it is just a list of complex dictionaries that quickly becomes large and overwhelming.

In the end, the pipeline is barely readable for the person who built it, let alone for other developers. Besides, during development it is often necessary to refer to the online documentation multiple times. The package therefore aims to integrate the online documentation into the docstrings of its classes and modules.

Roadmap

The goal is to publish the package to PyPI before the end of the year. For now, I am still building it. Feel free to open an issue if you find a bug or want to propose an enhancement.


Download files


Source Distribution

monggregate-0.9.0.tar.gz (60.8 kB)

Built Distribution

monggregate-0.9.0-py3-none-any.whl (90.6 kB)

File details

Details for the file monggregate-0.9.0.tar.gz.

File metadata

  • Download URL: monggregate-0.9.0.tar.gz
  • Size: 60.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.4

File hashes

Hashes for monggregate-0.9.0.tar.gz
  • SHA256: 3c0e01f9b502bd19c11ced49fb07ee599efa2dfff34fa96145f47ed26eba2246
  • MD5: 71237d4ff64f56d0ac8cf30a06d6c900
  • BLAKE2b-256: ad0233f1897ba937daf59f2079fd674642fd495d369e10bfdd3c12d411b0b654


File details

Details for the file monggregate-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: monggregate-0.9.0-py3-none-any.whl
  • Size: 90.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.4

File hashes

Hashes for monggregate-0.9.0-py3-none-any.whl
  • SHA256: b698e4bc4c2591d8292748775d2321546b28099e52db5d28c276f1e64896cee6
  • MD5: d2529835ca159ca3b6b7663238806e0e
  • BLAKE2b-256: 8f0deffd015197dfdea1d7c5c124a93ef3fd29afd8b3a89edaa522f0df68f500
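Because SHA256 digests are listed for both files, a downloaded archive can be checked before it is installed. Below is a minimal sketch that uses only the standard library's hashlib. sha256_of and matches are illustrative helper names, not part of monggregate. The file name and expected digest in the usage lines are copied from the wheel's hashes above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large files do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing
    return sha256_of(path) == expected_hex.lower()


# Usage: only runs the check if the wheel has been downloaded to the current folder
wheel = Path("monggregate-0.9.0-py3-none-any.whl")
if wheel.exists():
    print(matches(wheel, "b698e4bc4c2591d8292748775d2321546b28099e52db5d28c276f1e64896cee6"))
```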

