datamock

Python-native mocking of realistic datasets by defining schemas for prototyping, testing, and demos

Installation

pip install datamock

Usage

Here's a moderately complex example demonstrating how to model an e-commerce system with customers, orders, and products. This showcases features like nested schemas, ListOf, and Derived fields.

import json

from datamock import Schema, ListOf, Derived, String, Float, Choice
from datamock.field import Name, Email

# Define a schema for a product
class Product(Schema):
    name = String(min_length=5, max_length=20)
    price = Float((10, 1000), round_to=2)
    category = Choice(choices=['electronics', 'books', 'clothing', 'home goods'])

# Define a schema for an order, which contains a list of products
class Order(Schema):
    order_id = String(regex=r'[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}')
    products = ListOf(Product(), min_items=1, max_items=3)
    # The total cost is derived from the sum of the prices of the products in the order
    total_cost = Derived(
        lambda context: sum(p['price'] for p in context['products'])
    )

# Define a schema for a customer, who can have multiple orders
class Customer(Schema):
    name = Name()
    email = Email()
    orders = ListOf(Order(), min_items=1, max_items=3)
    # The total amount spent by the customer is derived from the sum of the total costs of their orders
    total_spent = Derived(
        lambda context: sum(o['total_cost'] for o in context['orders'])
    )

# Generate a batch of 3 customers
customers = Customer.generate_batch(3)

print(json.dumps(customers, indent=2))

The above will generate random data satisfying the schema. An example output is:

[
  {
    "name": "Joseph May",
    "email": "ymathis@example.com",
    "orders": [
      {
        "order_id": "ca077fe5-a5a6-349d-d45d-83bd54c4ffb9",
        "products": [
          {
            "name": "x4_E'`\r68F~",
            "price": 82.26,
            "category": "electronics"
          },
          {
            "name": "B.J_$",
            "price": 457.28,
            "category": "clothing"
          }
        ],
        "total_cost": 539.54
      }
    ],
    "total_spent": 539.54
  },
  {
    "name": "Timothy Sanchez",
    "email": "connor13@example.org",
    "orders": [
      {
        "order_id": "ac51d095-155c-9f57-188b-a2d91034e06a",
        "products": [
          {
            "name": "HaYD;eK\\^i",
            "price": 814.76,
            "category": "clothing"
          }
        ],
        "total_cost": 814.76
      },
      {
        "order_id": "7cfca1f6-43af-8e4f-c31b-754a88e0b5c8",
        "products": [
          {
            "name": "D6eE<Y`AC2o",
            "price": 106.45,
            "category": "electronics"
          },
          {
            "name": "FUwcTh)hX\u000bb5]DeK",
            "price": 936.42,
            "category": "clothing"
          }
        ],
        "total_cost": 1042.87
      },
      {
        "order_id": "48104c52-3076-1e53-9070-91795c55afab",
        "products": [
          {
            "name": "z8bG3g*I7R#eyW",
            "price": 182.25,
            "category": "books"
          }
        ],
        "total_cost": 182.25
      }
    ],
    "total_spent": 2039.8799999999999
  },
  {
    "name": "Robert Lam",
    "email": "middletonamanda@example.org",
    "orders": [
      {
        "order_id": "af634dc5-3d67-e501-1b16-c4e2de49d66b",
        "products": [
          {
            "name": ";\\9^%u0Vt#'?Un\\( ;U6",
            "price": 499.4,
            "category": "books"
          },
          {
            "name": "@.,W(@nP-ZfOrq",
            "price": 373.37,
            "category": "clothing"
          }
        ],
        "total_cost": 872.77
      },
      {
        "order_id": "568d3603-14b7-7e20-7c8f-ec94f7fa98e9",
        "products": [
          {
            "name": ")GG,Tv]9m\"(\u000bn\r<5 ",
            "price": 640.68,
            "category": "home goods"
          },
          {
            "name": "pqy-Ze\u000bf9`PHde9\u000b00,`",
            "price": 184.34,
            "category": "clothing"
          }
        ],
        "total_cost": 825.02
      }
    ],
    "total_spent": 1697.79
  }
]
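The `total_spent` value of `2039.8799999999999` in the sample output is ordinary binary floating-point accumulation rather than a generation error. If cleaner totals are wanted, one option is to round inside the callable passed to `Derived`. The following is a minimal sketch in plain Python, assuming (as in the example above) that the callable receives a dict-like context. `rounded_sum` is a hypothetical helper and not part of datamock.

```python
def rounded_sum(items, key, ndigits=2):
    """Sum item[key] over items and round the result to ndigits decimal places."""
    return round(sum(item[key] for item in items), ndigits)


# Order totals copied from the second customer in the sample output above.
orders = [
    {"total_cost": 814.76},
    {"total_cost": 1042.87},
    {"total_cost": 182.25},
]

total_spent = rounded_sum(orders, "total_cost")
print(total_spent)
```

In a schema, the helper could be used as `Derived(lambda context: rounded_sum(context['orders'], 'total_cost'))`.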

Field Types

Static

Field type that always generates the same fixed value. The value can be of any type.

from datamock import Static

static_value = Static(value='my-static-value')
generated_value = static_value.generate()
print(generated_value)

Choice

Choice field type to be used when generated values should come from a pre-defined set of choices.

from datamock import Choice

# No weights: every choice is equally likely
my_choices = ['option1', 'option2', 'option3']
choice_field = Choice(choices=my_choices)
generated_value = choice_field.generate()
print(generated_value)

# Weighting: choices can be of any type, with one weight per choice
my_choices = [{'key1': 1}, {'key2': 'something'}, {'key3': 2.0, 'key4': 'this'}]
weights = [0.1, 0.8, 0.1]
choice_field = Choice(choices=my_choices, weights=weights)
generated_value = choice_field.generate()
print(generated_value)
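To make the effect of weights concrete, here is a small sketch using the standard library's `random.choices`. It only illustrates what weighted selection means; it is not datamock's implementation, and `weighted_pick` is a hypothetical name. In this sketch the weights are relative and do not have to sum to 1.

```python
import random


def weighted_pick(choices, weights=None, rng=random):
    """Return one element of choices; with weights=None every element is equally likely."""
    return rng.choices(choices, weights=weights, k=1)[0]


options = ["option1", "option2", "option3"]
rng = random.Random(0)  # seeded so the run is repeatable

# Draw many samples and count how often each option appears.
sample = [weighted_pick(options, [0.1, 0.8, 0.1], rng) for _ in range(1000)]
counts = {option: sample.count(option) for option in options}
print(counts)
```

With weights of 0.1, 0.8, and 0.1, roughly four out of five draws should be `option2`.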

String

String type that generates random strings. The output can be constrained by length (min_length, max_length) or by a regular expression (regex).

from datamock import String

social_security_no_regex = r'\d{3}-\d{2}-\d{4}'
string_field = String(regex=social_security_no_regex)
generated_value = string_field.generate()
print(generated_value)
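When regex-constrained strings feed a test suite, it can be useful to assert that each value really conforms to the pattern. The sketch below uses only the standard `re` module; `conforms` is a hypothetical helper and the hard-coded strings stand in for generated values.

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def conforms(value, pattern=SSN_PATTERN):
    """True only if the whole string matches the pattern, not just a prefix of it."""
    return pattern.fullmatch(value) is not None


print(conforms("123-45-6789"))   # well-formed value
print(conforms("123-45-67890"))  # one digit too many
```

`fullmatch` is used rather than `match` so that trailing characters cause the check to fail.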

Float

Float type to generate random floating point numbers. This can be controlled in various ways:

  1. Generate a float from $\mathcal{U}(\texttt{min}, \texttt{max})$
  2. Generate a float from $\mathcal{N}(\mu, \sigma)$

The generated value may optionally be rounded to a specified number of decimal places.

from datamock import Float

# Uniform distribution
float_field = Float((0.0, 1.0))
generated_value = float_field.generate()
print(generated_value)

# Normal distribution
float_field = Float(
    distribution="normal",
    distribution_params={"mean": 0.5, "std": 0.1},
    round_to=4
)
generated_value = float_field.generate()
print(generated_value)
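The two sampling modes can be sketched with the standard library alone. This is an illustration of uniform versus normal sampling with optional rounding, not datamock's code; `sample_float` and its parameter names are assumptions made for this example.

```python
import random


def sample_float(rng, distribution="uniform", low=0.0, high=1.0,
                 mean=0.0, std=1.0, round_to=None):
    """Draw one float from U(low, high) or N(mean, std), optionally rounded."""
    if distribution == "uniform":
        value = rng.uniform(low, high)
    elif distribution == "normal":
        value = rng.gauss(mean, std)
    else:
        raise ValueError(f"unsupported distribution: {distribution!r}")
    return value if round_to is None else round(value, round_to)


rng = random.Random(0)
uniform_value = sample_float(rng, low=0.0, high=1.0)
normal_value = sample_float(rng, "normal", mean=0.5, std=0.1, round_to=4)
print(uniform_value, normal_value)
```

Note that a normal distribution is unbounded, so values outside any intended range remain possible unless they are clipped separately.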

Int

Int type to generate uniform random integers within a range.

from datamock import Int

int_field = Int((0, 100))
generated_value = int_field.generate()
print(generated_value)

Maybe

Field type with behaviour like Optional (i.e. makes a field null with a specified probability).

from datamock import Maybe, Int

maybe_int_field = Maybe(Int(), probability=0.1)
generated_value = maybe_int_field.generate()
print(generated_value)
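The "null with a specified probability" behaviour can be pictured as follows. This is a plain-Python sketch that assumes, as the description above states, that `probability` is the chance of producing a null; `maybe` is a hypothetical function, not the library's implementation.

```python
import random


def maybe(generate, probability, rng=random):
    """Return None with the given probability, otherwise the result of generate()."""
    if rng.random() < probability:
        return None
    return generate()


rng = random.Random(42)
values = [maybe(lambda: rng.randint(0, 100), 0.1, rng) for _ in range(1000)]

# The observed share of None values should be close to the requested probability.
null_share = values.count(None) / len(values)
print(f"share of None: {null_share:.3f}")
```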

Boolean

Boolean field type, with optional weighting. If provided, the weights array is of the format [<true_weight>, <false_weight>].

from datamock import Boolean

boolean_field = Boolean(weights=[0.8, 0.2])
generated_value = boolean_field.generate()
print(generated_value)

Date

Date field type to generate a random date within a specified date range.

from datamock import Date

date_field = Date(start='2000-01-01', end='2030-01-01', fmt='%Y-%m-%d')
generated_value = date_field.generate()
print(generated_value)
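For reference, drawing a random date within a range can be sketched with the standard `datetime` module. The function below is hypothetical and treats both endpoints as inclusive, which is an assumption of this sketch; how datamock treats the `end` date is not stated above.

```python
import random
from datetime import datetime, timedelta


def random_date(start, end, fmt="%Y-%m-%d", rng=random):
    """Return a formatted date chosen uniformly from [start, end] (both inclusive)."""
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    if end_dt < start_dt:
        raise ValueError("end must not be earlier than start")
    offset_days = rng.randint(0, (end_dt - start_dt).days)
    return (start_dt + timedelta(days=offset_days)).strftime(fmt)


value = random_date("2000-01-01", "2030-01-01", rng=random.Random(3))
print(value)
```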

Custom

Custom field type that lets the user specify how values are generated for the field. This is done by providing a callable that accepts no arguments. It can be used to generate values that do not conform to one of the existing field types. Possible use cases include making API calls, running inference on ML models, and calling external libraries.

import random

from datamock import Custom

def generate_random_embedding():
    return [random.random() for _ in range(512)]

custom_field = Custom(generate_random_embedding)
generated_value = custom_field.generate()
print(generated_value)
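Because the callable must accept no arguments, a generator function that needs parameters has to be adapted first. One way to do this, sketched here with the standard library's `functools.partial`, is to bind the parameters up front. `random_embedding` and its parameters are hypothetical names used for illustration.

```python
import random
from functools import partial


def random_embedding(dim, rng):
    """Return a list of dim random floats in [0, 1)."""
    return [rng.random() for _ in range(dim)]


# Bind the arguments so the result can be called with no arguments.
make_embedding = partial(random_embedding, dim=8, rng=random.Random(0))

embedding = make_embedding()
print(len(embedding))
```

The resulting zero-argument callable could then be passed as `Custom(make_embedding)`.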

Derived

Field type whose value is computed from the values of other fields in the same schema. The callable receives a context containing the generated values, keyed by field name.

from datamock import Derived, Schema, Float, Int


# Example 1 - single source field
class Example1Schema(Schema):
    base_price = Float((100, 200))
    price_with_vat = Derived(lambda ctx: ctx['base_price'] * 1.2)

schema = Example1Schema()
print(schema.generate())

# Example 2 - multiple source fields
class Example2Schema(Schema):
    quantity = Int((1, 100))
    unit_price = Float((100, 200))
    total = Derived(lambda ctx: ctx['quantity'] * ctx['unit_price'])

schema = Example2Schema()
print(schema.generate())
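One way to picture how a derived value can depend on other fields is a two-pass scheme: generate the independent fields first, then evaluate each derived callable against the resulting context. The sketch below is an assumption about how such a mechanism could work, written in plain Python; it is not taken from datamock's source, and `generate_record` is a hypothetical name.

```python
import random


def generate_record(generators, derived):
    """Build one record: plain fields first, then derived fields from the context."""
    context = {name: generate() for name, generate in generators.items()}
    for name, compute in derived.items():
        # Each derived callable sees every value produced so far,
        # including derived fields that were computed before it.
        context[name] = compute(context)
    return context


rng = random.Random(1)
record = generate_record(
    generators={
        "quantity": lambda: rng.randint(1, 100),
        "unit_price": lambda: round(rng.uniform(100, 200), 2),
    },
    derived={"total": lambda ctx: ctx["quantity"] * ctx["unit_price"]},
)
print(record)
```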

FakerProvider

Field type from which any supported provider from the Faker library can be leveraged.

from datamock import FakeProvider

string_field = FakeProvider(faker_provider='name')
generated_value = string_field.generate()
print(generated_value)

Common providers are provided as lightweight fields and can be used as follows:

from datamock.field import City, Name, UUID, URL  # and more...

# Example usage:
city_field = City()
generated_value = city_field.generate()
print(generated_value)

ListOf

List field type for generating lists of values, where each element comes from a field type or a (possibly nested) schema. ListOf fields can themselves be nested as required.

from datamock import ListOf, Float

list_field = ListOf(Float())
generated_value = list_field.generate()
print(generated_value)
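The `min_items` and `max_items` arguments used in the e-commerce example bound the length of the generated list. A plain-Python sketch of that idea, including nesting, is shown below; `list_of` is a hypothetical function and the inclusive bounds are an assumption of this sketch.

```python
import random


def list_of(generate, min_items=1, max_items=3, rng=random):
    """Return a list whose length is drawn uniformly from [min_items, max_items]."""
    if min_items < 0 or max_items < min_items:
        raise ValueError("require 0 <= min_items <= max_items")
    return [generate() for _ in range(rng.randint(min_items, max_items))]


rng = random.Random(7)

# A flat list of one to three prices.
prices = list_of(lambda: round(rng.uniform(10, 1000), 2), 1, 3, rng)

# Nesting: a list of exactly three rows, each holding exactly two floats.
matrix = list_of(lambda: list_of(rng.random, 2, 2, rng), 3, 3, rng)

print(prices)
print(matrix)
```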

Schemas

Fields can be combined into schemas. A schema can then generate data either one record at a time (generate) or in batches (generate_batch). For example:

from datamock import Schema, Float, Int, Maybe, Boolean, ListOf
from datamock.field import Name

class Person(Schema):
    name = Name()
    salary = Maybe(Float((10_000, 100_000)))
    is_member = Boolean()
    device_ids = ListOf(Int())

person = Person()
print(person.generate())
print(person.generate_batch(10))

Schemas can also be nested:

from datamock import Schema, Float
from datamock.field import Name, Country

class Manufacturer(Schema):
    name = Name()
    country = Country()

class Product(Schema):
    manufacturer = Manufacturer()
    price = Float()

product = Product()
print(product.generate())
print(product.generate_batch(5))
