Dynamic metadata storage

dyna-store

Efficient handling of high cardinality metadata.

In order to explain the main concept, let's go through an example:

How it works

Given the id cwxpd-3BDb2jXPk:

  1. cwxpd is the template id. We can use it to retrieve the template from a database.
  2. The user id is an integer encoded in the id as 3BD -> 213123.
  3. The timestamp is a datetime encoded in the id as b2jXPk -> 2024-05-22 15:13:56+00:00.
  4. The id can then be fully parsed by combining the high cardinality fields (user_id, timestamp) read from the id itself with the other fields (algorithm, promoted) stored in the template.
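
The decoding steps above can be reproduced in a few lines of Python. This is a minimal sketch, not dyna-store's own code: the alphabet ordering (lowercase, then uppercase, then digits), the field widths (3 and 6 characters) and the reading of the timestamp as seconds since the Unix epoch are assumptions inferred from the example values, and `b62_decode` and `split_example_id` are hypothetical helper names.

```python
import string
from datetime import datetime, timezone

# Assumed alphabet ordering, inferred from the example values above.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def b62_decode(text: str) -> int:
    """Decode a base-62 string into a non-negative integer."""
    value = 0
    for char in text:
        value = value * len(ALPHABET) + ALPHABET.index(char)
    return value


def split_example_id(full_id: str) -> tuple[str, int, datetime]:
    """Split an id shaped like the example into its three parts."""
    template_id, payload = full_id.split("-", 1)
    # Assumed layout: 3 characters of user id, then 6 characters of timestamp.
    user_id = b62_decode(payload[:3])
    timestamp = datetime.fromtimestamp(b62_decode(payload[3:9]), tz=timezone.utc)
    return template_id, user_id, timestamp


template_id, user_id, timestamp = split_example_id("cwxpd-3BDb2jXPk")
print(template_id)  # cwxpd
print(user_id)      # 213123
print(timestamp)    # 2024-05-22 15:13:56+00:00
```

In the library itself the field positions come from the template rather than being hard-coded as they are here.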

Use case

We have an online shopping website where users are shown recommended products. Each recommendation can lead to a number of user events (viewed, clicked, purchased, etc.). These events are stored in a database for further analysis. For each event, we want to be able to relate it to the recommendation that led to it: in particular when the recommendation was made, with which algorithm, etc.

Without dyna-store

An approach would be to store all the recommendations in a database table:

id          userId  timestamp      algorithm
BShivYLGif  user1   1716384775942  algo-1
5SvIjZIXMm  user1   1716384793233  algo-2
DkBoUmvMs0  user2   1716384489455  algo-2
Nm8NabCct8  user2   1716384483847  algo-2
5ZO053OGpX  user2   1716384448985  algo-2

Each recommendation would have a unique identifier (the primary key), maybe generated by the database.

Then, we can attach to each event the recommendation id. At any point we can query the recommendation table to get the details of the recommendation.

This works, but it has a limitation. The recommendation table can grow very large:

  • if you are computing recommendations on the fly, a single user session can generate a lot of recommendations
  • if you are pre-computing recommendations (to ensure a fast first page view), that's also a lot of data to store every day.

With dyna-store

dyna-store intends to address this limitation by:

  • storing less information in the database
  • storing more information in the recommendation id itself

We will first split our fields between two categories:

  • the low cardinality fields (algorithm in our example) - they don't have many different values and can be stored in the database
  • the high cardinality fields (userId, timestamp in our example) - they have many different values and will be stored in the id.
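
The split described above can be sketched as a small function. This is only an illustration: `split_fields` is a hypothetical helper, the record is modelled as a plain dict, and the `hcf` parameter name is borrowed from the Usage section below.

```python
from datetime import datetime, timezone


def split_fields(record: dict, hcf: list[str]) -> tuple[dict, dict]:
    """Return (high_cardinality, low_cardinality) views of a record."""
    high = {name: value for name, value in record.items() if name in hcf}
    low = {name: value for name, value in record.items() if name not in hcf}
    return high, low


record = {
    "userId": "user1",
    "timestamp": datetime(2024, 5, 22, 13, 32, 55, tzinfo=timezone.utc),
    "algorithm": "algo-1",
}
high, low = split_fields(record, hcf=["userId", "timestamp"])
print(sorted(high))  # ['timestamp', 'userId']
print(low)           # {'algorithm': 'algo-1'}
```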

Then, in the database, we store the low cardinality fields in a new template table, together with the information needed to parse the values contained in the id:

id          userId                                 timestamp                                algorithm
BShivYLGif  { __hcf: 1, i: 0, l: 5, t: "string" }  { __hcf: 1, i: 5, l: 5, t: "datetime" }  algo-1
5SvIjZIXMm  { __hcf: 1, i: 0, l: 5, t: "string" }  { __hcf: 1, i: 5, l: 5, t: "datetime" }  algo-2

The recommendation id then contains two parts:

  • the database id of the template: BShivYLGif
  • the high cardinality fields, b62 encoded: user1dXed

Together they give the recommendation id BShivYLGif-user1dXed.

This id is then attached to each event. From the id we can regenerate the original metadata, assuming we have access to the templates table. In some cases, this can lead to a drastic reduction in the amount of data stored in the database.
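
To make the round trip concrete, here is a minimal sketch of building an id and regenerating the metadata from it. It is not dyna-store's implementation. The descriptor keys (`__hcf`, `i`, `l`, `t`) are taken from the table above and read as a marker, start index, length and type, which is an assumption. The alphabet ordering, the `t0`-style template ids, the in-memory `templates` dict and the helper names `build` and `parse_id` are all hypothetical.

```python
import string
from datetime import datetime, timezone

# Assumed base-62 alphabet ordering.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def b62_encode(number: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def b62_decode(text: str) -> int:
    """Decode a base-62 string into an integer."""
    value = 0
    for char in text:
        value = value * 62 + ALPHABET.index(char)
    return value


def build(record: dict, hcf: list[str], templates: dict) -> str:
    """Store (or reuse) a template for the record and return the full id."""
    template, payload = {}, ""
    for name, value in record.items():
        if name not in hcf:
            template[name] = value  # low cardinality: kept in the template
            continue
        if isinstance(value, datetime):
            # Whole seconds only: sub-second precision is dropped.
            chunk, kind = b62_encode(int(value.timestamp())), "datetime"
        else:
            chunk, kind = str(value), "string"
        template[name] = {"__hcf": 1, "i": len(payload), "l": len(chunk), "t": kind}
        payload += chunk
    # Reuse an identical template if one is already stored.
    for template_id, existing in templates.items():
        if existing == template:
            return f"{template_id}-{payload}"
    template_id = f"t{len(templates)}"  # hypothetical id scheme
    templates[template_id] = template
    return f"{template_id}-{payload}"


def parse_id(full_id: str, templates: dict) -> dict:
    """Regenerate the original record from an id and the templates table."""
    template_id, payload = full_id.split("-", 1)
    record = {}
    for name, spec in templates[template_id].items():
        if not (isinstance(spec, dict) and spec.get("__hcf")):
            record[name] = spec
            continue
        chunk = payload[spec["i"]:spec["i"] + spec["l"]]
        if spec["t"] == "datetime":
            record[name] = datetime.fromtimestamp(b62_decode(chunk), tz=timezone.utc)
        else:
            record[name] = chunk
    return record


templates = {}
first = {
    "userId": "user1",
    "timestamp": datetime(2024, 5, 22, 15, 13, 56, tzinfo=timezone.utc),
    "algorithm": "algo-1",
}
second = dict(first, userId="user2")
first_id = build(first, ["userId", "timestamp"], templates)
second_id = build(second, ["userId", "timestamp"], templates)
print(first_id)                               # t0-user1b2jXPk
print(len(templates))                         # 1
print(parse_id(first_id, templates) == first)  # True
```

Both recommendations share a single template row, which is where the storage saving comes from. In this sketch a value of a different length (for example a longer user id) would produce a different template.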

Usage

from datetime import datetime

from dyna_store import DynaStore, Metadata, MetadataId

# a model for your recommendations metadata
class Recommendation:
    user_id: int
    timestamp: datetime
    algorithm: str

# create a store by extending the DynaStore class
class RecommendationStore(DynaStore):
    def save_metadata(self, _metadata: Metadata) -> MetadataId:
        # here you need to handle the saving of the metadata
        # could be in your database, in a file, etc.
        # you need to create and return a unique id for this metadata.
        raise NotImplementedError

    def load_metadata(self, _id: MetadataId) -> Metadata:
        # here you need to handle the loading of the metadata from an id
        # could be from your database, from a file, etc.
        raise NotImplementedError

store = RecommendationStore(hcf=["user_id", "timestamp"])

# saving recommendations
# (user_id is an integer here, to match the int annotation on Recommendation)
recommendation_id = store.create(Recommendation(user_id=213123, timestamp=datetime.now(), algorithm="algo-1"))
# returns a Recommendation id

# loading recommendations
store.parse(recommendation_id)
# returns a Recommendation object
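
The two methods left to implement above only need somewhere to put a template and a way to get it back. As an illustration, the sketch below keeps templates in memory and derives the id from the template's content, so saving the same template twice does not add a row. It deliberately does not import dyna_store. `InMemoryTemplateTable` is a hypothetical class, and it assumes the metadata is a JSON-serialisable dict, which the text above does not confirm. The 10-character truncated hash is an arbitrary choice, and collisions are possible in principle.

```python
import hashlib
import json


class InMemoryTemplateTable:
    """Hypothetical storage backend: one row per distinct template."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, metadata: dict) -> str:
        # Serialise with sorted keys so equal templates produce equal text.
        serialised = json.dumps(metadata, sort_keys=True)
        # Derive the id from the content: saving the same template again
        # returns the same id and overwrites the row with identical data.
        metadata_id = hashlib.sha256(serialised.encode("utf-8")).hexdigest()[:10]
        self._rows[metadata_id] = serialised
        return metadata_id

    def load(self, metadata_id: str) -> dict:
        return json.loads(self._rows[metadata_id])

    def __len__(self) -> int:
        return len(self._rows)


table = InMemoryTemplateTable()
first_id = table.save({"algorithm": "algo-1", "promoted": False})
second_id = table.save({"promoted": False, "algorithm": "algo-1"})
print(first_id == second_id)  # True
print(len(table))             # 1
print(table.load(first_id))   # {'algorithm': 'algo-1', 'promoted': False}
```

A store could delegate save_metadata and load_metadata to an object like this one, or to the equivalent queries against a real database table.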

FAQ

What database does it support?

All of them, and none of them. You need to handle the storage of the metadata yourself. It could be in a database, in a file, etc.

What about security?

The high cardinality fields are stored in the id and are not encrypted. Anyone in possession of an id could:

  • read the values of the high cardinality fields
  • generate new ids with different high cardinality field values

This needs to be taken into account.
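
One possible way to address the second point is to append a keyed signature to each id and reject ids whose signature does not match. The sketch below uses Python's standard hmac module and is not a dyna-store feature. `sign`, `verify`, `SECRET_KEY`, the `.` separator and the 16-character tag length are hypothetical choices. Signing does not hide the field values, so the first point still applies.

```python
import hashlib
import hmac

# Hypothetical key for illustration; load a real secret from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def _tag(recommendation_id: str) -> str:
    """Compute a truncated HMAC-SHA256 tag for an id."""
    digest = hmac.new(SECRET_KEY, recommendation_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sign(recommendation_id: str) -> str:
    """Return the id with its tag appended after a dot."""
    return f"{recommendation_id}.{_tag(recommendation_id)}"


def verify(signed_id: str) -> str:
    """Return the original id, or raise ValueError if the tag does not match."""
    recommendation_id, _, tag = signed_id.rpartition(".")
    expected = _tag(recommendation_id)
    # Constant-time comparison on bytes to avoid leaking timing information.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("invalid signature")
    return recommendation_id


signed = sign("BShivYLGif-user1dXed")
print(verify(signed))  # BShivYLGif-user1dXed

tampered = signed.replace("user1", "user2")
try:
    verify(tampered)
except ValueError as error:
    print(error)  # invalid signature
```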
