
Collection of tools for schema parsing and workload generation used by MongoDB Research


mdbrtools

This package contains experimental tools for schema analysis and query workload generation used by MongoDB Research (MDBR).

Disclaimer

This tool is not officially supported or endorsed by MongoDB Inc. The code is released for use "AS IS" without warranties of any kind, including, but not limited to, its installation, use, or performance. Do not run this tool in critical production systems.

Installation

Installation with pip

This tool requires Python 3.x and pip on your system. To install mdbrtools, run the following command:

pip install mdbrtools

Installation from source

Clone the repository from GitHub. From the top-level directory, run:

pip install -e .

This installs an editable development version of mdbrtools in your current Python environment.
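
Either installation route can be checked from Python. The snippet below is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of mdbrtools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("mdbrtools")
if found is None:
    print("mdbrtools is not installed in this environment")
else:
    print(f"mdbrtools {found} is installed")
```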

Usage

See the ./notebooks directory for more detailed examples of schema parsing and workload generation.

Schema Parsing

Schema parsing operates on a list of Python dictionaries.

from mdbrtools.schema import parse_schema
from pprint import pprint

docs = [
    {"_id": 1, "mixed_field": "world", "missing_field": False},
    {"_id": 2, "mixed_field": 123},
    {"_id": 3, "mixed_field": False, "missing_field": True},
]

schema = parse_schema(docs)
pprint(dict(schema))

Converting the schema object to a dictionary will output some general information about the schema:

{'_id': [{'counter': 3, 'type': 'int'}],
 'missing_field': [{'counter': 2, 'type': 'bool'}],
 'mixed_field': [{'counter': 1, 'type': 'str'},
                 {'counter': 1, 'type': 'int'},
                 {'counter': 1, 'type': 'bool'}]}

For access to types, values and uniqueness information, see the examples in ./notebooks/schema_parsing.ipynb.
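
The dictionary form shown above is plain Python data, so it can be inspected without any further mdbrtools calls. The sketch below copies that output verbatim and derives two facts from it: which fields were seen with more than one type, and which fields are absent from some documents. It assumes that each counter is the number of documents in which the field had that type, which matches the three-document example but is not confirmed beyond it. The helper names are hypothetical.

```python
# Copied from the output above; in practice this would be dict(schema)
# as returned by parse_schema(docs).
schema_dict = {
    "_id": [{"counter": 3, "type": "int"}],
    "missing_field": [{"counter": 2, "type": "bool"}],
    "mixed_field": [
        {"counter": 1, "type": "str"},
        {"counter": 1, "type": "int"},
        {"counter": 1, "type": "bool"},
    ],
}
num_docs = 3  # number of documents that were parsed


def mixed_type_fields(summary):
    """Return the fields that were observed with more than one type."""
    return sorted(name for name, types in summary.items() if len(types) > 1)


def sparse_fields(summary, total_docs):
    """Return the fields whose type counters sum to fewer than total_docs.

    Assumption: one counter increment per document, so a lower sum means
    the field is missing from at least one document.
    """
    return sorted(
        name
        for name, types in summary.items()
        if sum(entry["counter"] for entry in types) < total_docs
    )


print(mixed_type_fields(schema_dict))        # ['mixed_field']
print(sparse_fields(schema_dict, num_docs))  # ['missing_field']
```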

Workload Generation

Workload generation takes either a list of Python dictionaries or a MongoCollection object as input.

from mdbrtools.workload import Workload

docs = [
    {"_id": 1, "mixed_field": "world", "missing_field": False},
    {"_id": 2, "mixed_field": 123},
    {"_id": 3, "mixed_field": False, "missing_field": True},
]

workload = Workload()
workload.generate(docs, num_queries=5)

for query in workload:
    print(query.to_mql())

The generated MQL queries are:

{'missing_field': True}
{'missing_field': {'$exists': False}, '_id': {'$gte': 3}}
{'_id': {'$gt': 3}, 'mixed_field': False, 'missing_field': {'$exists': False}}
{'mixed_field': {'$gte': 'world'}, '_id': 3, 'missing_field': {'$ne': False}}
{'mixed_field': 'world'}
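
Because to_mql() prints ordinary dictionaries, a generated workload can be summarized with standard Python. The following sketch copies the five queries above and tallies the number of predicates per query and how often each operator appears. It treats a bare value as an implicit $eq and any dictionary value as an operator document, which holds for these queries but would misread an equality match on an embedded document. The helper name predicate_operators is hypothetical.

```python
from collections import Counter

# The five queries printed above, copied verbatim.
queries = [
    {"missing_field": True},
    {"missing_field": {"$exists": False}, "_id": {"$gte": 3}},
    {"_id": {"$gt": 3}, "mixed_field": False, "missing_field": {"$exists": False}},
    {"mixed_field": {"$gte": "world"}, "_id": 3, "missing_field": {"$ne": False}},
    {"mixed_field": "world"},
]


def predicate_operators(query):
    """Yield (field, operator) pairs; a bare value counts as implicit $eq."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            # Assumption: a dict value is an operator document such as {"$gte": 3}.
            for operator in condition:
                yield field, operator
        else:
            yield field, "$eq"


predicates_per_query = [len(list(predicate_operators(q))) for q in queries]
operator_counts = Counter(op for q in queries for _, op in predicate_operators(q))

print(predicates_per_query)   # [1, 2, 3, 3, 1]
print(dict(operator_counts))
```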

The workload generator supports a number of different constraints on the queries:

  • min. and max. number of predicates per query
  • allowing only certain fields
  • which query operators are allowed for which data types
  • control over the weights by which operators are randomly chosen
  • min. and max. query selectivity constraints

See the notebook under ./notebooks/workload_generation.ipynb for examples.
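
To make the selectivity constraint concrete, the sketch below computes selectivity for a few queries against the example documents. It is an illustration only and does not use or reproduce the mdbrtools implementation. It assumes that selectivity means the fraction of documents a query matches, and it supports just two predicate forms, equality and $exists. The functions matches and selectivity are hypothetical helpers.

```python
docs = [
    {"_id": 1, "mixed_field": "world", "missing_field": False},
    {"_id": 2, "mixed_field": 123},
    {"_id": 3, "mixed_field": False, "missing_field": True},
]


def matches(doc, query):
    """Simplified matcher: handles equality and $exists on top-level fields only."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            for operator, argument in condition.items():
                if operator != "$exists":
                    raise NotImplementedError(f"{operator} is not modelled in this sketch")
                if (field in doc) != argument:
                    return False
        else:
            if field not in doc:
                return False
            value = doc[field]
            # Strict type check so that True does not match 1, as it would with
            # plain Python equality. Numeric cross-type matches (int vs float)
            # are not modelled here.
            if type(value) is not type(condition) or value != condition:
                return False
    return True


def selectivity(documents, query):
    """Fraction of documents matched by the query (assumed definition)."""
    return sum(matches(doc, query) for doc in documents) / len(documents)


candidates = [
    {"missing_field": True},
    {"missing_field": {"$exists": False}},
    {"_id": 2, "missing_field": {"$exists": True}},
    {},
]

# Keep only queries whose selectivity falls inside the chosen bounds.
min_selectivity, max_selectivity = 0.1, 0.5
kept = [
    q for q in candidates
    if min_selectivity <= selectivity(docs, q) <= max_selectivity
]
print(kept)
```

With these bounds, the first two candidates each match one of three documents and are kept; the third matches none and the empty query matches all, so both are dropped.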

Tests

To execute the unit tests, run the following from the top-level directory:

python -m unittest discover ./tests

License

MIT, see LICENSE.
