
This project has been archived. The maintainers have marked it as archived and no new releases are expected.

Project description

Antarctic


Project to persist Pandas and Polars data structures in a MongoDB database.

Installation

pip install antarctic

Usage

This project (unlike the popular arctic project, which I admire) is built on top of MongoEngine, an ORM for MongoDB. MongoDB stores documents. We introduce new fields and extend the Document class to make Antarctic a convenient choice for storing Pandas and Polars (time series) data.

PandasField

We first introduce the PandasField for storing Pandas DataFrames.

import mongomock
import pandas as pd
import numpy as np

from mongoengine import Document, connect
from antarctic.pandas_field import PandasField

# connect with your existing MongoDB
# (here I am using a popular interface mocking a MongoDB)
client = connect('mongoenginetest',
                 host='mongodb://localhost',
                 mongo_client_class=mongomock.MongoClient,
                 uuidRepresentation="standard")

# Define the blueprint for a portfolio document
class Portfolio(Document):
    nav = PandasField()
    weights = PandasField()
    prices = PandasField()

The Portfolio object works exactly the way you would expect:

data = pd.read_csv("tests/test_antarctic/resources/price.csv", index_col=0, parse_dates=True)

p = Portfolio()
p.nav = data["A"].to_frame(name="nav")
p.prices = data[["B","C","D"]] #pd.DataFrame(...)
portfolio = p.save()

nav = p.nav["nav"]
prices = p.prices

Behind the scenes we convert the DataFrame objects into Parquet bytestreams and store them in a MongoDB database.

The format should also be readable by R.

PolarsField

Antarctic also supports storing Polars DataFrames using the PolarsField.

import polars as pl
from mongoengine import Document, StringField
from antarctic.polars_field import PolarsField

class Artist(Document):
    name = StringField(unique=True, required=True)
    data = PolarsField()

The PolarsField works similarly to PandasField:

a = Artist(name="Artist1")
a.data = pl.DataFrame({"A": [2.0, 2.0], "B": [2.0, 2.0]})
a.save()

# Retrieve the data
df = a.data

PolarsField uses zstd compression by default for efficient storage, but you can specify other compression algorithms:

class CustomArtist(Document):
    name = StringField(unique=True, required=True)
    data = PolarsField(compression="snappy")  # Options: lz4, uncompressed, snappy, gzip, brotli, zstd

XDocument

In most cases we keep many very similar documents, e.g. we store Portfolios and Symbols rather than a single Portfolio or Symbol. For this purpose we have developed the abstract XDocument class, built on MongoEngine's Document class. It provides some convenient tools to simplify looping over all, or a subset of, documents of the same type, e.g.

from antarctic.document import XDocument
from antarctic.pandas_field import PandasField

class Symbol(XDocument):
    price = PandasField()

We define a bunch of symbols and assign a price to each (or some) of them:

s1 = Symbol(name="A", price=data["A"].to_frame(name="price")).save()
s2 = Symbol(name="B", price=data["B"].to_frame(name="price")).save()

# We can access subsets like
for symbol in Symbol.subset(names=["B"]):
    _ = symbol  # no-op: avoid printing during tests

# often we need a dictionary of Symbols:
symbols = Symbol.to_dict(objects=[s1, s2])

# Each XDocument also provides a field for reference data:
s1.reference["MyProp1"] = "ABC"
s2.reference["MyProp2"] = "BCD"

# You can loop over (subsets) of Symbols and extract reference and/or series data
_reference = Symbol.reference_frame(objects=[s1, s2])
_frame = Symbol.frame(series="price", key="price")
_applied = list(Symbol.apply(func=lambda x: x.price["price"].mean(), default=np.nan))

The XDocument class exposes DataFrames for both reference and time series data. There is also an apply method for applying a function to a (subset of) documents.

Database vs. Datastore

Storing JSON or bytestream representations of Pandas objects is not exactly a database. Appending is rather expensive, as one has to extract the original Pandas object, append to it, and convert the new object back into a JSON or bytestream representation. Clever sharding can mitigate such effects, but at the end of the day you shouldn't update such objects too often. In practice, practitioners often use a small database for recording (e.g. over the last 24 hours) and update the MongoDB database once a day. Reading the Pandas objects out of such a construction is extremely fast.

Often such concepts are called DataStores.

uv

Starting with

make install

will install uv and create the virtual environment defined in pyproject.toml and locked in uv.lock.

marimo

We install marimo on the fly within the aforementioned virtual environment. Executing

make marimo

will install and start marimo.

Download files


Source Distribution

antarctic-0.9.9.tar.gz (260.6 kB)

Uploaded Source

Built Distribution


antarctic-0.9.9-py3-none-any.whl (10.4 kB)

Uploaded Python 3

File details

Details for the file antarctic-0.9.9.tar.gz.

File metadata

  • Download URL: antarctic-0.9.9.tar.gz
  • Upload date:
  • Size: 260.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for antarctic-0.9.9.tar.gz:

  • SHA256: 418c5f57feaba809aa2051ee51e78c071a487ca8679af5439955d7f3f90ed5d1
  • MD5: 69a7f8a8e806425494a941298b64bfc9
  • BLAKE2b-256: 9edbb7461ed0c71e02252208b207a908a4ef7adb8525d029c4054e8be100164d


Provenance

The following attestation bundles were made for antarctic-0.9.9.tar.gz:

Publisher: rhiza_release.yml on tschm/antarctic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file antarctic-0.9.9-py3-none-any.whl.

File metadata

  • Download URL: antarctic-0.9.9-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for antarctic-0.9.9-py3-none-any.whl:

  • SHA256: caa3c9df05656b4cf47eb85dbbefaef600455055130ca788aa121f6e13dea166
  • MD5: c4837428eeae6d9ae07d9fa7248be616
  • BLAKE2b-256: b1939339745cd26611b2d8cae9f84434884889c6a246f44484daeb183fb52e55


Provenance

The following attestation bundles were made for antarctic-0.9.9-py3-none-any.whl:

Publisher: rhiza_release.yml on tschm/antarctic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
