Skip to main content

Storing Pandas Data in a MongoDB database

Project description

Antarctic

CI Release Binder

Project to persist Pandas data structures in a MongoDB database.

Installation

pip install antarctic

Usage

This project (unless the popular arctic project which I admire) is based on top of MongoEngine, see https://pypi.org/project/mongoengine/ MongoEngine is an ORM for MongoDB. MongoDB stores documents. We introduce here two new fields --- one for a Pandas Series and one for a Pandas DataFrame.

from mongoengine import Document, connect
from antarctic.PandasFields import SeriesField, FrameField

# connect with your existing MongoDB (here I am using a popular interface mocking a MongoDB)
client = connect(db="test", host="mongomock://localhost")

# Define the blueprint for a portfolio document
class Portfolio(Document):
    nav = SeriesField()
    weights = FrameField()
    prices = FrameField()

The portfolio objects works exactly the way you think it works

p = Portfolio()
p.nav = pd.Series(...)
p.prices = pd.DataFrame(...)
p.save()

print(p.nav)
print(p.prices)

Behind the scenes we convert the both Series and Frame objects into json documents and store them in a MongoDB database.

We don't apply any clever conversion into compressed bytestreams. Performance is not our main concern here.

Database?

Storing json or bytestream representations of Pandas objects is not exactly a database. Appending is rather expensive as one would have to extract the original Pandas object, append to it and convert the new object back into a json or bytestream representation. Clever sharding can mitigate such effects but at the end of the day you shouldn't update such objects too often. Often practitioners use a small database for recording (e.g. over the last 24h) and update the MongoDB database once a day. It's extremely fast to read the Pandas objects out of such a construction.

Also note that in theory one could try to build this on top of pyarrow and support both R and Python.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

antarctic-0.2.1.tar.gz (6.1 kB view hashes)

Uploaded Source

Built Distribution

antarctic-0.2.1-py3-none-any.whl (5.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page