
Transfer data between pandas dataframes and MongoDB


Overview

This package lets you read pandas DataFrames from MongoDB and write them back with a single function call.

  • Free software: MIT license

Quick Start

Install pdmongo:

pip install pdmongo

Write a pandas DataFrame to a MongoDB collection:

import pandas as pd
import pdmongo as pdm

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.to_mongo("MyCollection", "mongodb://localhost:27017/mydb")
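
As a rough mental model (an illustration only, not pdmongo's actual implementation), each DataFrame row becomes one MongoDB document keyed by column name, leaving aside details such as the index or the `_id` field that MongoDB adds automatically. The plain-Python sketch below shows that mapping for the frame above; `columns_to_documents` is a hypothetical helper name, not part of pdmongo.

```python
def columns_to_documents(columns):
    """Turn a column-oriented mapping into a list of row documents.

    `columns` maps column name -> list of values, all of equal length.
    Hypothetical helper for illustration; not part of pdmongo.
    """
    names = list(columns)
    # zip(*values) walks the columns in parallel, yielding one row at a time
    return [dict(zip(names, row)) for row in zip(*columns.values())]


docs = columns_to_documents({'A': [1, 2], 'B': [3, 4]})
print(docs)  # [{'A': 1, 'B': 3}, {'A': 2, 'B': 4}]
```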

Read a MongoDB collection into a pandas DataFrame:

import pdmongo as pdm

df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
print(df)
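
Both calls take a MongoDB connection string whose path component names the database (here `mydb`). The sketch below uses only Python's standard library to pull such a URI apart. It is meant to show the layout of the URI and should not be read as the way pdmongo parses it.

```python
from urllib.parse import urlsplit

uri = "mongodb://localhost:27017/mydb"
parts = urlsplit(uri)

host = parts.hostname               # server name
port = parts.port                   # TCP port as an int
database = parts.path.lstrip("/")   # database name taken from the URI path

print(host, port, database)  # localhost 27017 mydb
```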

Examples / use cases

Reading a MongoDB collection into a pandas data frame (aggregation query)

You can use an aggregation query to filter or transform data inside MongoDB before fetching it into a data frame. This lets you delegate the expensive filtering and transformation work to MongoDB.

Reading a collection from MongoDB into a pandas DataFrame by using an aggregation query:

import pdmongo as pdm
import pandas as pd

# First generate some data and write it to MongoDB
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.to_mongo('MyCollection', "mongodb://localhost:27017/mydb")

# Filter with an aggregation query and load the results into a data frame.
query = [{"$match": {'A': 1}}]
df = pdm.read_mongo("MyCollection", query, "mongodb://localhost:27017/mydb")
print(df)  # Only rows where A equals 1 are returned

The query is an aggregation pipeline (a list of stages) in the same format accepted by the aggregate method of the pymongo package.
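
The `$match` stage above tests for equality. A pipeline can also chain several stages, for example a range filter followed by a projection and a sort. The sketch below builds such a pipeline; `build_pipeline` and `load_filtered` are hypothetical helper names, while `$match`, `$gt`, `$project` and `$sort` are standard MongoDB aggregation operators. The collection name and URI are the ones assumed throughout this README.

```python
def build_pipeline(min_a):
    """Return a pipeline keeping documents with A > min_a, sorted by A.

    Hypothetical helper for illustration; not part of pdmongo.
    """
    return [
        {"$match": {"A": {"$gt": min_a}}},          # range filter, runs in MongoDB
        {"$project": {"_id": 0, "A": 1, "B": 1}},   # keep A and B, drop _id
        {"$sort": {"A": 1}},                        # ascending order by A
    ]


def load_filtered(min_a, uri="mongodb://localhost:27017/mydb"):
    """Read the filtered collection into a DataFrame (needs a running MongoDB)."""
    import pdmongo as pdm  # imported here so build_pipeline works without pdmongo

    return pdm.read_mongo("MyCollection", build_pipeline(min_a), uri)


pipeline = build_pipeline(1)
print(pipeline[0])  # {'$match': {'A': {'$gt': 1}}}
```

Calling `load_filtered(1)` against the two-row collection written above would be expected to return only the row where A is 2.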

Write MongoDB to a PostgreSQL table

You can write a MongoDB collection to a PostgreSQL table:

import pandas as pd
import pdmongo as pdm
from sqlalchemy import create_engine

# Generate some data and write them to MongoDB
df = pd.DataFrame({'A': [1, 2, 3]})
df.to_mongo("MyCollection", "mongodb://localhost:27017/mydb")

# Read data from MongoDB and write them to PostgreSQL
new_df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
# SQLAlchemy 1.4+ requires the "postgresql://" scheme; "postgres://" is no longer accepted
engine = create_engine('postgresql://postgres:postgres@localhost:5432', echo=False)
new_df[["A"]].to_sql("APostgresTable", engine)
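
Real credentials often contain characters such as `@` or `/` that must be percent-encoded before they go into a database URL. The sketch below assembles a `postgresql://` URL with the standard library; `make_postgres_url` is a hypothetical helper, and the default host, port and database name are assumptions for a local setup.

```python
from urllib.parse import quote_plus


def make_postgres_url(user, password, host="localhost", port=5432, database="postgres"):
    """Build a SQLAlchemy-style PostgreSQL URL with escaped credentials.

    Hypothetical helper for illustration; not part of pdmongo or SQLAlchemy.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = make_postgres_url("postgres", "p@ss/word", database="mydb")
print(url)  # postgresql://postgres:p%40ss%2Fword@localhost:5432/mydb
```

The resulting string can be passed to `create_engine` in place of the hard-coded URL.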

Plot data retrieved from a MongoDB Collection

You can plot a collection retrieved from MongoDB:

import numpy as np
import pandas as pd
import pdmongo as pdm
import matplotlib.pyplot as plt

# Generate data and write them to MongoDB
df = pd.DataFrame({'Value': np.random.randn(1000)})
df.to_mongo('TimeSeries', 'mongodb://localhost:27017/mydb')

# Read collection from MongoDB and plot data
new_df = pdm.read_mongo("TimeSeries", [], "mongodb://localhost:27017/mydb")
new_df.plot()
plt.show()

Installation

pip install pdmongo

You can also install the in-development version with:

pip install https://github.com/pakallis/python-pandas-mongo/archive/master.zip

Documentation

You can find the documentation at:

https://python-pandas-mongo.readthedocs.io/

Development

To run all the tests, run:

tox

To combine the coverage data from all the tox environments, run:

Windows

set PYTEST_ADDOPTS=--cov-append
tox

Other

PYTEST_ADDOPTS=--cov-append tox

Changelog

0.3.4 (2022-11-17)

  • Support for python3.7-3.10

  • Fix wrong version of Python in CI

0.3.3 (2022-11-17)

  • Restrict pandas to >=0.20,<1.6

  • Restrict pymongo to >=13,<4.4

  • Remove hypothesis

  • Run tests with tox in CI

  • Add flake8 checks in CI

0.2.3 (2022-11-12)

  • Add prepare release script

0.2.2 (2022-11-12)

  • Fix lint offenses

0.2.1 (2022-11-12)

  • Minor changes

0.2.0 (2022-11-12)

  • Add compatibility for pymongo 4+

0.1.0 (2020-05-05)

  • Added static typing

  • Added mypy to travis CI

  • Removed unnecessary params

0.0.2 (2020-05-04)

  • Dropped support for pypy3

0.0.1 (2020-04-30)

  • Added read_mongo and basic support for reading MongoDB collections into pandas dataframes

  • Added to_mongo and basic support for writing pandas dataframes in MongoDB collections

0.0.0 (2020-03-22)

  • First release on PyPI.

