Storing Pandas Data in a MongoDB database
Antarctic
Project to persist Pandas data structures in a MongoDB database.
Installation
```shell
pip install antarctic
```
Usage
This project (unlike the popular arctic project, which I admire) is built on top of MongoEngine (see https://pypi.org/project/mongoengine/). MongoEngine is an object-document mapper (ODM) for MongoDB, the document-store counterpart of an ORM. Since MongoDB stores documents, we introduce two new document fields: one for a Pandas Series and one for a Pandas DataFrame.
```python
from mongoengine import Document, connect

from antarctic.PandasFields import SeriesField, FrameField

# Connect to your existing MongoDB (here we use mongomock, which mocks a
# MongoDB server in memory; the mongomock package must be installed)
client = connect(db="test", host="mongomock://localhost")


# Define the blueprint for a portfolio document
class Portfolio(Document):
    nav = SeriesField()
    weights = FrameField()
    prices = FrameField()
```
Portfolio objects work exactly the way you would expect:
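```python
import pandas as pd

p = Portfolio()
p.nav = pd.Series(...)  # placeholder: supply your own data
p.prices = pd.DataFrame(...)  # placeholder: supply your own data
p.save()

print(p.nav)
print(p.prices)
```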
Behind the scenes, we convert both Series and DataFrame objects into JSON documents and store them in the MongoDB database.
We don't apply any clever conversion into compressed byte streams; performance is not our main concern here.
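The exact conversion is not spelled out here, so the following is only a minimal sketch under stated assumptions: it uses the standard `json` module and a hypothetical layout (ISO-8601 timestamps for a `DatetimeIndex`, numeric values) that may well differ from what antarctic actually stores. The helper names `series_to_json`, `series_from_json`, `frame_to_json` and `frame_from_json` are made up for illustration. In MongoEngine, a custom field usually subclasses `BaseField` and overrides `to_mongo` (Python value to stored value) and `to_python` (stored value back to Python), so helpers like these are the kind of thing such methods could call.

```python
import json

import pandas as pd

# Hypothetical encoding for illustration only; antarctic's real layout may differ.
# Assumes a DatetimeIndex (e.g. a NAV or price history) and numeric values.


def series_to_json(series: pd.Series) -> str:
    """Encode a Series as a JSON string: name, ISO-8601 index and values."""
    payload = {
        "name": series.name,
        "index": [ts.isoformat() for ts in series.index],
        "data": series.tolist(),  # plain Python floats, JSON-serialisable
    }
    return json.dumps(payload)


def series_from_json(text: str) -> pd.Series:
    """Rebuild a Series from the JSON produced by series_to_json."""
    payload = json.loads(text)
    return pd.Series(
        payload["data"],
        index=pd.to_datetime(payload["index"]),
        name=payload["name"],
        dtype=float,
    )


def frame_to_json(frame: pd.DataFrame) -> str:
    """Encode a numeric DataFrame as JSON: ISO-8601 index, column labels, rows."""
    payload = {
        "index": [ts.isoformat() for ts in frame.index],
        "columns": [str(column) for column in frame.columns],
        "data": frame.to_numpy(dtype=float).tolist(),  # list of row lists
    }
    return json.dumps(payload)


def frame_from_json(text: str) -> pd.DataFrame:
    """Rebuild a DataFrame from the JSON produced by frame_to_json."""
    payload = json.loads(text)
    return pd.DataFrame(
        payload["data"],
        index=pd.to_datetime(payload["index"]),
        columns=payload["columns"],
    )
```

A round trip such as `series_from_json(series_to_json(nav))` gives back the same values and timestamps, although this simple layout does not preserve the index frequency (e.g. daily).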
Database?
Storing JSON or byte-stream representations of Pandas objects is not exactly a database. Appending is rather expensive: one has to read the original Pandas object, append to it, and convert the combined object back into its JSON or byte-stream representation. Clever sharding can mitigate this, but at the end of the day you shouldn't update such objects too often. Practitioners often record incoming data in a small database (e.g. covering the last 24 hours) and update the MongoDB database once a day. Reading the Pandas objects back out of such a setup is extremely fast.
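The read-modify-write cost of appending, and the once-a-day batching that avoids paying it on every update, can be sketched as follows. Everything here is hypothetical: `encode` and `decode` use an assumed JSON layout (ISO-8601 timestamps plus a list of values), not necessarily antarctic's, and the in-memory `RecentBuffer` class merely stands in for the small recording database. In practice that would be a real store, and `flush` would run on a daily schedule.

```python
import json

import pandas as pd


def encode(series: pd.Series) -> str:
    """Hypothetical JSON layout: ISO-8601 timestamps plus a list of values."""
    return json.dumps(
        {"index": [ts.isoformat() for ts in series.index], "data": series.tolist()}
    )


def decode(text: str) -> pd.Series:
    payload = json.loads(text)
    return pd.Series(payload["data"], index=pd.to_datetime(payload["index"]), dtype=float)


def append(stored: str, new: pd.Series) -> str:
    """Read-modify-write: the whole stored history is decoded and re-encoded,
    so the cost grows with the length of the history, not with len(new)."""
    history = decode(stored)
    combined = pd.concat([history, new])
    # If a timestamp appears twice, keep the most recent value (sketch assumption)
    combined = combined[~combined.index.duplicated(keep="last")].sort_index()
    return encode(combined)


class RecentBuffer:
    """Hypothetical stand-in for the small recording store: collects recent
    points cheaply and writes them to the large document in one batch."""

    def __init__(self) -> None:
        self._points: dict[pd.Timestamp, float] = {}

    def record(self, when: str, value: float) -> None:
        # Cheap: no decoding or re-encoding of the stored history
        self._points[pd.Timestamp(when)] = value

    def flush(self, stored: str) -> str:
        # Expensive, but only once per batch (e.g. once a day)
        if not self._points:
            return stored
        new = pd.Series(self._points, dtype=float).sort_index()
        self._points.clear()
        return append(stored, new)
```

Because `append` always decodes and re-encodes the full history, collecting many small updates and writing them in a single `flush` limits the expensive rewrite to once per batch.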
Also note that, in principle, one could build this on top of pyarrow (the Python bindings for Apache Arrow, which also has an R package) and support both R and Python.
Hashes for antarctic-0.1.3-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 24f5633e2d6daea724242fda78394f77d9c9f7f17b71f9b0769164d4979d0370
MD5 | f81b0c8341da8da019232ccccfc3eb93
BLAKE2b-256 | d9c1b3b49bd6cc836b3fb6b975a1ed88926287d0d62a0fbb3fab83e3702bf466