
Transfer data between pandas dataframes and MongoDB

Overview

This package lets you read pandas DataFrames from MongoDB collections, and write them back, with a single function call.

  • Free software: MIT license

Quick Start

Install pdmongo:

pip install pdmongo

Write a pandas DataFrame to a MongoDB collection:

import pandas as pd
import pdmongo as pdm

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.to_mongo("MyCollection", "mongodb://localhost:27017/mydb")

Read a MongoDB collection into a pandas DataFrame:

import pdmongo as pdm

df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
print(df)

Examples / use cases

Reading a MongoDB collection into a pandas data frame (aggregation query)

You can use an aggregation query to filter or transform data in MongoDB before fetching it into a data frame. This delegates the expensive filtering and transformation work to MongoDB, so only the results are loaded into pandas.

Reading a collection from MongoDB into a pandas DataFrame by using an aggregation query:

import pdmongo as pdm
import pandas as pd

# First generate some data and write them to MongoDB
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.to_mongo('MyCollection', "mongodb://localhost:27017/mydb")

# Filter with an aggregate query and parse results into a data frame.
query = [{"$match": {'A': 1} }]
df = pdm.read_mongo("MyCollection", query, "mongodb://localhost:27017/mydb")
print(df)  # Only rows where A == 1 are returned

The query argument takes the same pipeline (a list of stage documents) as the aggregate method of the pymongo package.
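As a minimal sketch of a slightly richer query, the pipeline below keeps documents where A is greater than a threshold and drops MongoDB's _id field. The names build_pipeline and min_a are hypothetical and introduced only for illustration; $match, $gt, and $project are standard MongoDB aggregation operators.

```python
def build_pipeline(min_a):
    """Return an aggregation pipeline (a list of stage documents)."""
    return [
        # Keep only documents whose field A is strictly greater than min_a.
        {"$match": {"A": {"$gt": min_a}}},
        # Drop MongoDB's _id field and keep fields A and B.
        {"$project": {"_id": 0, "A": 1, "B": 1}},
    ]


query = build_pipeline(1)
print(query)
```

The resulting list can be passed as the second argument of read_mongo, for example pdm.read_mongo("MyCollection", query, "mongodb://localhost:27017/mydb").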

Write a MongoDB collection to a PostgreSQL table

You can write a MongoDB collection to a PostgreSQL table:

import pandas as pd
import pdmongo as pdm
from sqlalchemy import create_engine

# Generate some data and write them to MongoDB
df = pd.DataFrame({'A': [1, 2, 3]})
df.to_mongo("MyCollection", "mongodb://localhost:27017/mydb")

# Read data from MongoDB and write them to PostgreSQL
new_df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
# SQLAlchemy 1.4+ no longer accepts the "postgres://" scheme; use "postgresql://"
engine = create_engine('postgresql://postgres:postgres@localhost:5432', echo=False)
new_df[["A"]].to_sql("APostgresTable", engine)
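The connection URL in this example uses a plain password and no database name. As a hedged sketch, the helper below assembles a PostgreSQL URL with the standard library and percent-encodes the credentials so that characters such as @ or / do not break URL parsing. The function postgres_url, its parameters, and the database name mydb are assumptions made for illustration; they are not part of pdmongo or SQLAlchemy.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, host="localhost", port=5432, database="mydb"):
    """Build a postgresql:// URL, escaping special characters in the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# A password containing "@" and "/" is percent-encoded before being embedded.
url = postgres_url("postgres", "p@ss/word")
print(url)
```

The resulting string can be passed to create_engine in place of the literal URL.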

Plot data retrieved from a MongoDB Collection

You can plot a collection retrieved from MongoDB:

import numpy as np
import pandas as pd
import pdmongo as pdm
import matplotlib.pyplot as plt

# Generate data and write them to MongoDB
df = pd.DataFrame({'Value': np.random.randn(1000)})
df.to_mongo('TimeSeries', 'mongodb://localhost:27017/mydb')

# Read collection from MongoDB and plot data
new_df = pdm.read_mongo("TimeSeries", [], "mongodb://localhost:27017/mydb")
new_df.plot()
plt.show()

Installation

pip install pdmongo

You can also install the in-development version with:

pip install https://github.com/pakallis/python-pandas-mongo/archive/master.zip

Documentation

You can find the documentation at:

https://python-pandas-mongo.readthedocs.io/

Development

To run all the tests, run:

tox

To combine the coverage data from all the tox environments, run:

Windows

set PYTEST_ADDOPTS=--cov-append
tox

Other

PYTEST_ADDOPTS=--cov-append tox

Changelog

0.3.4 (2022-11-17)

  • Support for python3.7-3.10

  • Fix wrong version of Python in CI

0.3.3 (2022-11-17)

  • Restrict pandas to >=0.20,<1.6

  • Restrict pymongo to >=13,<4.4

  • Remove hypothesis

  • Run tests with tox in CI

  • Add flake8 checks in CI

0.2.3 (2022-11-12)

  • Add prepare release script

0.2.2 (2022-11-12)

  • Fix lint offenses

0.2.1 (2022-11-12)

  • Minor changes

0.2.0 (2022-11-12)

  • Add compatibility for pymongo 4+

0.1.0 (2020-05-05)

  • Added static typing

  • Added mypy to travis CI

  • Removed unnecessary params

0.0.2 (2020-05-04)

  • Dropped support for pypy3

0.0.1 (2020-04-30)

  • Added read_mongo and basic support for reading MongoDB collections into pandas dataframes

  • Added to_mongo and basic support for writing pandas dataframes in MongoDB collections

0.0.0 (2020-03-22)

  • First release on PyPI.
