
Antarctic


Project to persist Pandas data structures in a MongoDB database.

Installation

pip install antarctic

Usage

This project (unlike the popular arctic project, which I admire) is built on top of MongoEngine. MongoEngine is an ORM for MongoDB. MongoDB stores documents. We introduce a new field and extend the Document class to make Antarctic a convenient choice for storing Pandas (time series) data.

Fields

We first introduce a new field, the PandasField.

from mongoengine import Document, connect
from antarctic.pandas_field import PandasField

# connect to your existing MongoDB (here I am using mongomock, a popular in-memory mock of MongoDB)
client = connect(db="test", host="mongomock://localhost")


# Define the blueprint for a portfolio document
class Portfolio(Document):
	nav = PandasField()
	weights = PandasField()
	prices = PandasField()

The Portfolio object works exactly the way you think it works:

p = Portfolio()
p.nav = pd.Series(...).to_frame(name="nav")
p.prices = pd.DataFrame(...)
p.save()

print(p.nav["nav"])
print(p.prices)
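
To make this concrete, here is a sketch of the same round trip with made-up sample data (the dates and numbers below are purely illustrative):

import pandas as pd

dates = pd.date_range("2023-01-01", periods=3, freq="D")

p = Portfolio()
p.nav = pd.Series([100.0, 101.2, 100.7], index=dates).to_frame(name="nav")
p.prices = pd.DataFrame({"A": [10.0, 10.1, 10.3], "B": [20.0, 19.8, 20.2]}, index=dates)
p.save()

print(p.nav["nav"])    # the stored nav column, back as a Series
print(p.prices)        # the full price frame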

Behind the scenes we convert the DataFrame objects into Parquet bytestreams and store them in a MongoDB database.

The format should also be readable by R.
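
As an illustration of the idea (this is not Antarctic's internal code, and it assumes pyarrow is installed for Parquet support), a DataFrame round trip through a Parquet bytestream looks roughly like this:

import io
import pandas as pd

df = pd.DataFrame({"nav": [100.0, 101.5, 99.8]})

# serialize the frame into a Parquet bytestream; this is roughly the payload stored in MongoDB
buffer = io.BytesIO()
df.to_parquet(buffer)
payload = buffer.getvalue()

# reading the bytes back yields the original frame
restored = pd.read_parquet(io.BytesIO(payload))
assert restored.equals(df)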

Documents

In most cases we have many copies of very similar documents, e.g. we store Portfolios and Symbols rather than just a single Portfolio or Symbol. For this purpose we have developed the abstract XDocument class, which relies on MongoEngine's Document class. It provides some convenient tools to simplify looping over all (or a subset of) documents of the same type, e.g.

from mongoengine import connect
from antarctic.document import XDocument
from antarctic.pandas_field import PandasField

client = connect(db="test", host="mongodb://localhost")


class Symbol(XDocument):
	price = PandasField()

We define a bunch of symbols and assign a price to each (or some of them):

import numpy as np
import pandas as pd

s1 = Symbol(name="A", price=pd.Series(...).to_frame(name="price")).save()
s2 = Symbol(name="B", price=pd.Series(...).to_frame(name="price")).save()

# We can access subsets like
for symbol in Symbol.subset(names=["B"]):
	print(symbol)

# often we need a dictionary of Symbols:
Symbol.to_dict(objects=[s1, s2])

# Each XDocument also provides a field for reference data:
s1.reference["MyProp1"] = "ABC"
s2.reference["MyProp2"] = "BCD"

# You can loop over (subsets) of Symbols and extract reference and/or series data
print(Symbol.reference_frame(objects=[s1, s2]))
print(Symbol.frame(series="price", key="price"))
print(Symbol.apply(func=lambda x: x.price["price"].mean(), default=np.nan))

The XDocument class exposes DataFrames for both reference and time series data. There is also an apply method for applying a function to (a subset of) the documents.

Database vs. Datastore

Storing JSON or bytestream representations of Pandas objects is not exactly a database. Appending is rather expensive, as one would have to extract the original Pandas object, append to it, and convert the new object back into its JSON or bytestream representation. Clever sharding can mitigate such effects, but at the end of the day you shouldn't update such objects too often. In practice, practitioners often use a small database for recording (e.g. over the last 24 hours) and update the MongoDB database once a day. Reading the Pandas objects back out of such a construction is extremely fast.

Often such concepts are called DataStores.
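
A sketch of the once-a-day update pattern described above (the query, the intraday buffer and the numbers are purely illustrative):

import pandas as pd

# hypothetical rows recorded elsewhere over the last 24 hours
new_rows = pd.DataFrame(
    {"price": [10.4, 10.5]},
    index=pd.date_range("2023-01-04", periods=2, freq="D"),
)

# appending is read-modify-write: load the stored frame,
# concatenate the new rows and persist the result again
symbol = Symbol.objects(name="A").first()
symbol.price = pd.concat([symbol.price, new_rows])
symbol.save()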

