datafactory

datafactory generates test data.

https://badge.fury.io/py/datafactory.svg

https://github.com/walkframe/datafactory/workflows/master/badge.svg

https://img.shields.io/badge/code%20style-black-000000.svg

https://codecov.io/gh/walkframe/datafactory/branch/master/graph/badge.svg

https://img.shields.io/badge/License-Apache%202.0-blue.svg

Overview

datafactory generates flexible data according to the rules you give it.

Its features are divided into fields, models, containers, and formatters. In database terms, fields are columns, models are records, and containers are tables.

A key strength of datafactory is its flexibility in type specification. Containers can also be nested.

Formatters support data formatting and file output.
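The column/record/table analogy can be sketched in plain Python. This is only an illustration using the standard library; make_record and make_table are hypothetical names, not datafactory APIs, and the real classes may work differently.

```python
import itertools

# Plain-Python analogy only; none of these names are datafactory APIs.

def make_record(fields):
    """A 'model': draw one value from each field to build one record."""
    return {name: next(values) for name, values in fields.items()}

def make_table(fields, count):
    """A 'container': collect `count` records into a list."""
    return [make_record(fields) for _ in range(count)]

fields = {
    "id": itertools.count(1),     # a 'field' that increments
    "x": itertools.cycle("abc"),  # a 'field' that cycles through values
}

table = make_table(fields, 4)
print(table)
```

Each field keeps its own state between records, which is why the fourth record wraps around to 'a' again.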

Requirements

Python 3.5 or later.

Install

$ pip install datafactory

Usage

Basic Example

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:     'id': datafactory.IncrementField(),
   ...:     'x': datafactory.CycleField(['a', 'b', 'c']),
   ...:     # When BLANK is chosen, the key is omitted.
   ...:     'option': datafactory.ChoiceField([True, False, datafactory.BLANK]),
   ...: })

In [3]: container = datafactory.Container(model, 5, render=True)

In [4]: container
Out[4]:
[{'id': 1, 'x': 'a'},
 {'id': 2, 'x': 'b', 'option': False},
 {'id': 3, 'x': 'c', 'option': True},
 {'id': 4, 'x': 'a'},
 {'id': 5, 'x': 'b'}]

# Specify rewrite=True if the file already exists.
In [5]: datafactory.JsonFormatter(container).write('/tmp/test.json', rewrite=True)

In [6]: !cat /tmp/test.json
[
 {
  "x": "a",
  "id": 1
 },
 {
  "x": "b",
  "id": 2,
  "option": false
 },
 {
  "x": "c",
  "id": 3,
  "option": true
 },
 {
  "x": "a",
  "id": 4
 },
 {
  "x": "b",
  "id": 5
 }
]
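In the JSON above, records where BLANK was chosen have no "option" key at all, rather than a null value. The standard-library sketch below (not datafactory code) shows what a consumer of such a file would see; the two records are copied from the output above.

```python
import json

# Two records copied from the output above; the first has no "option" key.
records = [
    {"id": 1, "x": "a"},
    {"id": 2, "x": "b", "option": False},
]

text = json.dumps(records, indent=1)
loaded = json.loads(text)

# A missing key is different from a key whose value is null or false.
has_option = ["option" in record for record in loaded]
print(has_option)
```

Code that reads the file should therefore use a membership test or dict.get() instead of indexing "option" directly.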

TSV Example

In [1]: import datafactory

In [2]: model = datafactory.ListModel([
   ...:     datafactory.IncrementField(start=10, step=5),
   ...:     datafactory.HashOfField(2, 'md5'),  # MD5 hash of the third column (index 2).
   ...:     datafactory.ChoiceField(['foo', 'bar', 'baz']),
   ...:     datafactory.CycleField(range(0, 30, 10)),
   ...: ]).ordering(2)  # render index 2 (the third column) first

# IterContainer saves memory because it generates one element at a time.
In [3]: container = datafactory.IterContainer(model, 10)  # repeat 10 times.

In [4]: datafactory.CsvFormatter(
   ...:     container,
   ...:     delimiter='\t',
   ...:     header=['id', 'hash-of-name', 'name', 'value']
   ...: ).write('/tmp/test.csv', rewrite=True)

In [5]: !cat /tmp/test.csv
id    hash-of-name    name    value
10    acbd18db4cc2f85cedef654fccc4a4d8        foo     0
15    acbd18db4cc2f85cedef654fccc4a4d8        foo     10
20    73feffa4b7f6bb68e44cf984c85f6e88        baz     20
25    acbd18db4cc2f85cedef654fccc4a4d8        foo     0
30    acbd18db4cc2f85cedef654fccc4a4d8        foo     10
35    73feffa4b7f6bb68e44cf984c85f6e88        baz     20
40    73feffa4b7f6bb68e44cf984c85f6e88        baz     0
45    73feffa4b7f6bb68e44cf984c85f6e88        baz     10
50    37b51d194a7513e45b56f6524f2d51f2        bar     20
55    37b51d194a7513e45b56f6524f2d51f2        bar     0
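The hash column above can be checked independently with the standard library. This sketch assumes the hash is the MD5 hex digest of the UTF-8 bytes of the name column, which is consistent with the values shown; the three sample rows are copied from the output above.

```python
import csv
import hashlib
import io

# Three rows copied from the output above, tab-separated.
sample = (
    "id\thash-of-name\tname\tvalue\n"
    "10\tacbd18db4cc2f85cedef654fccc4a4d8\tfoo\t0\n"
    "20\t73feffa4b7f6bb68e44cf984c85f6e88\tbaz\t20\n"
    "50\t37b51d194a7513e45b56f6524f2d51f2\tbar\t20\n"
)

def hash_matches(row):
    """Return True if the hash column is the MD5 of the name column."""
    expected = hashlib.md5(row["name"].encode("utf-8")).hexdigest()
    return row["hash-of-name"] == expected

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
results = [hash_matches(row) for row in rows]
print(results)
```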

Custom Example

If an object is callable, the result of calling it is stored.

Model

In [1]: import datafactory

In [2]: def square(k, i):
   ...:     return k * i
   ...:

In [3]: container = datafactory.DictContainer(square)

In [4]: container(['a', 'b', 'c', 'd', 'e'])
Out[4]: {'a': '', 'b': 'b', 'c': 'cc', 'd': 'ddd', 'e': 'eeee'}
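Judging from the output above, the callable appears to receive each key together with its index. A plain-Python sketch of that behaviour follows; build_dict and repeat_key are hypothetical names, and the (key, index) calling convention is an assumption inferred from the output, not a documented signature.

```python
def build_dict(func, keys):
    # Assumption: the callable is invoked as func(key, index).
    return {key: func(key, index) for index, key in enumerate(keys)}

def repeat_key(k, i):
    # For a string key, k * i repeats the key i times.
    return k * i

result = build_dict(repeat_key, ["a", "b", "c", "d", "e"])
print(result)
```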

Field

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:    'col1': (lambda r, i: i),
   ...:    'col2': (lambda r: r['col1'] + 1),
   ...:    'col3': (lambda r: r['col2'] * 2),
   ...:    'col4': 100,  # fixed value
   ...: }).ordering('col1', 'col2', 'col3')

In [3]: container = datafactory.ListContainer(model)

In [4]: container(4)
Out[4]:
[{'col1': 0, 'col2': 1, 'col3': 2, 'col4': 100},
 {'col1': 1, 'col2': 2, 'col3': 4, 'col4': 100},
 {'col1': 2, 'col2': 3, 'col3': 6, 'col4': 100},
 {'col1': 3, 'col2': 4, 'col3': 8, 'col4': 100}]
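The ordering matters because col2 reads col1 and col3 reads col2. The sketch below reproduces that dependency chain in plain Python; build_record is a hypothetical helper, and for simplicity every callable here takes (record, index), which is an assumption rather than datafactory's actual dispatch.

```python
def build_record(rules, order, index):
    """Evaluate the rules named in `order` first, then the rest."""
    record = {}
    remaining = [name for name in rules if name not in order]
    for name in list(order) + remaining:
        rule = rules[name]
        # Callables are executed; anything else is stored as a fixed value.
        record[name] = rule(record, index) if callable(rule) else rule
    return record

rules = {
    "col3": lambda r, i: r["col2"] * 2,
    "col2": lambda r, i: r["col1"] + 1,
    "col1": lambda r, i: i,
    "col4": 100,  # fixed value
}

records = [build_record(rules, ("col1", "col2", "col3"), i) for i in range(4)]
print(records)
```

Without the explicit order, col3 would be evaluated before col2 exists in the record and raise a KeyError.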

Limited Number of Elements Example

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:     # x: 'a' can be picked once, 'b' twice, and 'c' three times.
   ...:     'x': datafactory.PickoutField({'a': 1, 'b': 2, 'c': 3}, missing=None),
   ...:     # y: 'a' can be picked twice; 'b' and 'c' once each.
   ...:     'y': datafactory.PickoutField(['a', 'a', 'b', 'c'], missing='*'),
   ...:     # z: 'a' and 'b' cannot be picked; 'c' can be picked five times.
   ...:     'z': datafactory.PickoutField(['c']*5, missing=None),
   ...: })

In [3]: container = datafactory.ListContainer(model)

In [4]: container(6)
Out[4]:
[{'x': 'a', 'y': 'a', 'z': 'c'},
 {'x': 'c', 'y': 'b', 'z': 'c'},
 {'x': 'c', 'y': 'a', 'z': 'c'},
 {'x': 'b', 'y': 'c', 'z': 'c'},
 {'x': 'c', 'y': '*', 'z': 'c'},
 {'x': 'b', 'y': '*', 'z': None}]
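The limited-pick behaviour can be approximated in plain Python by shuffling a pool that contains each value as many times as it may be picked. This is a sketch under that assumption, not PickoutField's implementation; pickout is a hypothetical helper name.

```python
import random
from collections import Counter

def pickout(limits, count, missing=None, rng=random):
    """Pick `count` values, never exceeding each value's limit."""
    # Expand {'a': 1, 'b': 2} into ['a', 'b', 'b'] and shuffle it.
    pool = [value for value, limit in limits.items() for _ in range(limit)]
    rng.shuffle(pool)
    picked = pool[:count]
    # Once the pool is exhausted, fill the rest with the `missing` value.
    return picked + [missing] * (count - len(picked))

values = pickout({"a": 1, "b": 2}, 5, missing="*", rng=random.Random(0))
counts = Counter(values)
print(values)
```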

Combination Example

Test data that combines multiple elements can be generated by using the repeat argument of CycleField and SequenceField.

In [1]: import datafactory

In [2]: l0 = ['a', 'b']

In [3]: l1 = ['a', 'b', 'c']

In [4]: l2 = ['a', 'b', 'c', 'd']

In [5]: model = datafactory.ListModel([
   ...:     datafactory.SequenceField(l0, repeat=len(l1)*len(l2), missing=datafactory.ESCAPE),
   ...:     datafactory.CycleField(l1, repeat=len(l2)),
   ...:     datafactory.CycleField(l2),
   ...: ])

In [6]: container = datafactory.Container(model)

# Specifying ESCAPE as the missing argument makes the container detect
# the end of the elements and stop before reaching 10000.
In [7]: container(10000)
Out[7]:
[['a', 'a', 'a'],
 ['a', 'a', 'b'],
 ['a', 'a', 'c'],
 ['a', 'a', 'd'],
 ['a', 'b', 'a'],
 ['a', 'b', 'b'],
 ['a', 'b', 'c'],
 ['a', 'b', 'd'],
 ['a', 'c', 'a'],
 ['a', 'c', 'b'],
 ['a', 'c', 'c'],
 ['a', 'c', 'd'],
 ['b', 'a', 'a'],
 ['b', 'a', 'b'],
 ['b', 'a', 'c'],
 ['b', 'a', 'd'],
 ['b', 'b', 'a'],
 ['b', 'b', 'b'],
 ['b', 'b', 'c'],
 ['b', 'b', 'd'],
 ['b', 'c', 'a'],
 ['b', 'c', 'b'],
 ['b', 'c', 'c'],
 ['b', 'c', 'd']]
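The rows above are the Cartesian product of the three lists, with the last column varying fastest. For comparison, the same rows can be produced with the standard library's itertools.product; this is an equivalent plain-Python result, not a description of how the fields work internally.

```python
import itertools

l0 = ['a', 'b']
l1 = ['a', 'b', 'c']
l2 = ['a', 'b', 'c', 'd']

# product() varies the rightmost list fastest, like the output above.
rows = [list(combo) for combo in itertools.product(l0, l1, l2)]

print(len(rows))  # 2 * 3 * 4 = 24 combinations
print(rows[:5])
```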

Nested Example

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:     'a': datafactory.ListModel([
   ...:         datafactory.CycleField(['b', 'c']),
   ...:         datafactory.CycleField(['d', 'e']),
   ...:     ]),
   ...:     datafactory.ChoiceField(['f', 'g', 'h']): datafactory.DictContainer(lambda x: x * 2, 5)
   ...: })

In [3]: datafactory.Container(model, 10, render=True)
Out[3]:
[{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}}]

datetime Utility

choice

Returns a random datetime between start and end.

In [1]: from datafactory.utils.datetime import choice


In [2]: choice(1988, '2015-11-11T11:11:11.111111')
Out[2]: datetime.datetime(2009, 11, 30, 23, 25, 43, 240031)

# tuple: datetime(*tuple), dict: datetime(**dict)
In [3]: choice((1988, 5, 22), {'year': 2015, 'month': 11, 'day': 11})
Out[3]: datetime.datetime(1996, 7, 1, 11, 14, 59, 314809)

In [4]: from datetime import datetime, date

In [5]: choice(date(1988, 5, 22), datetime(2015, 11, 11, 11, 11, 11))
Out[5]: datetime.datetime(2011, 3, 23, 19, 39, 14, 476901)
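Picking a random moment between two datetimes can be sketched with the standard library as follows. random_between is a hypothetical helper that only accepts datetime objects; it does not reproduce the argument conversion that choice performs, and its sampling method is an assumption.

```python
import random
from datetime import datetime

def random_between(start, end, rng=random):
    """Return a datetime chosen uniformly between start and end."""
    span = end - start                  # a timedelta
    return start + span * rng.random()  # timedelta * float is allowed

start = datetime(1988, 5, 22)
end = datetime(2015, 11, 11, 11, 11, 11)
value = random_between(start, end, rng=random.Random(42))
print(value)
```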

generator

A generator that yields datetime objects at regular intervals.

In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import generator

# If you omit the end argument, it yields objects indefinitely.
In [3]: g = generator(start=2015, interval=timedelta(days=1, hours=12))

In [4]: next(g)
Out[4]: datetime.datetime(2015, 1, 1, 0, 0)

In [5]: next(g)
Out[5]: datetime.datetime(2015, 1, 2, 12, 0)

In [6]: next(g)
Out[6]: datetime.datetime(2015, 1, 4, 0, 0)

In [7]: next(g)
Out[7]: datetime.datetime(2015, 1, 5, 12, 0)
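The same stepping behaviour can be sketched as an ordinary Python generator. datetime_steps is a hypothetical name; the sketch handles only datetime inputs and positive intervals, and it treats the end as inclusive, which is an assumption.

```python
import itertools
from datetime import datetime, timedelta

def datetime_steps(start, interval, end=None):
    """Yield start, start + interval, ... (forever if end is None)."""
    current = start
    while end is None or current <= end:
        yield current
        current += interval

steps = datetime_steps(datetime(2015, 1, 1), timedelta(days=1, hours=12))
first_four = list(itertools.islice(steps, 4))
print(first_four)
```

Because the generator is unbounded when end is omitted, take values with next() or itertools.islice rather than list().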

range

Returns a list of datetime objects generated at regular intervals.

In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import range

In [3]: range(2015, '2015/2/1')
Out[3]:
[datetime.datetime(2015, 1, 1, 0, 0),
 datetime.datetime(2015, 1, 2, 0, 0),
 datetime.datetime(2015, 1, 3, 0, 0),
 datetime.datetime(2015, 1, 4, 0, 0),
 datetime.datetime(2015, 1, 5, 0, 0),
 datetime.datetime(2015, 1, 6, 0, 0),
 datetime.datetime(2015, 1, 7, 0, 0),
 datetime.datetime(2015, 1, 8, 0, 0),
 datetime.datetime(2015, 1, 9, 0, 0),
 datetime.datetime(2015, 1, 10, 0, 0),
 datetime.datetime(2015, 1, 11, 0, 0),
 datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 13, 0, 0),
 datetime.datetime(2015, 1, 14, 0, 0),
 datetime.datetime(2015, 1, 15, 0, 0),
 datetime.datetime(2015, 1, 16, 0, 0),
 datetime.datetime(2015, 1, 17, 0, 0),
 datetime.datetime(2015, 1, 18, 0, 0),
 datetime.datetime(2015, 1, 19, 0, 0),
 datetime.datetime(2015, 1, 20, 0, 0),
 datetime.datetime(2015, 1, 21, 0, 0),
 datetime.datetime(2015, 1, 22, 0, 0),
 datetime.datetime(2015, 1, 23, 0, 0),
 datetime.datetime(2015, 1, 24, 0, 0),
 datetime.datetime(2015, 1, 25, 0, 0),
 datetime.datetime(2015, 1, 26, 0, 0),
 datetime.datetime(2015, 1, 27, 0, 0),
 datetime.datetime(2015, 1, 28, 0, 0),
 datetime.datetime(2015, 1, 29, 0, 0),
 datetime.datetime(2015, 1, 30, 0, 0),
 datetime.datetime(2015, 1, 31, 0, 0),
 datetime.datetime(2015, 2, 1, 0, 0)]

# +/-3 hours of noise, 0 to +5 minutes of noise
In [4]: range(2015, '2015-01-15', hours=3, minutes=(0, 5))
Out[4]:
[datetime.datetime(2015, 1, 1, 3, 1),
 datetime.datetime(2015, 1, 2, 0, 3),
 datetime.datetime(2015, 1, 3, 2, 0),
 datetime.datetime(2015, 1, 3, 22, 2),
 datetime.datetime(2015, 1, 4, 22, 3),
 datetime.datetime(2015, 1, 6, 0, 2),
 datetime.datetime(2015, 1, 7, 0, 4),
 datetime.datetime(2015, 1, 8, 0, 4),
 datetime.datetime(2015, 1, 8, 21, 3),
 datetime.datetime(2015, 1, 9, 22, 0),
 datetime.datetime(2015, 1, 11, 0, 0),
 datetime.datetime(2015, 1, 11, 22, 1),
 datetime.datetime(2015, 1, 12, 22, 5),
 datetime.datetime(2015, 1, 14, 3, 0),
 datetime.datetime(2015, 1, 15, 2, 5)]

# A negative interval can be specified to step backwards.
In [5]: range(start='2015-5-22', end='2015-04-22', interval=timedelta(days=-1))
Out[5]:
[datetime.datetime(2015, 5, 22, 0, 0),
 datetime.datetime(2015, 5, 21, 0, 0),
 datetime.datetime(2015, 5, 20, 0, 0),
 datetime.datetime(2015, 5, 19, 0, 0),
 datetime.datetime(2015, 5, 18, 0, 0),
 datetime.datetime(2015, 5, 17, 0, 0),
 datetime.datetime(2015, 5, 16, 0, 0),
 datetime.datetime(2015, 5, 15, 0, 0),
 datetime.datetime(2015, 5, 14, 0, 0),
 datetime.datetime(2015, 5, 13, 0, 0),
 datetime.datetime(2015, 5, 12, 0, 0),
 datetime.datetime(2015, 5, 11, 0, 0),
 datetime.datetime(2015, 5, 10, 0, 0),
 datetime.datetime(2015, 5, 9, 0, 0),
 datetime.datetime(2015, 5, 8, 0, 0),
 datetime.datetime(2015, 5, 7, 0, 0),
 datetime.datetime(2015, 5, 6, 0, 0),
 datetime.datetime(2015, 5, 5, 0, 0),
 datetime.datetime(2015, 5, 4, 0, 0),
 datetime.datetime(2015, 5, 3, 0, 0),
 datetime.datetime(2015, 5, 2, 0, 0),
 datetime.datetime(2015, 5, 1, 0, 0),
 datetime.datetime(2015, 4, 30, 0, 0),
 datetime.datetime(2015, 4, 29, 0, 0),
 datetime.datetime(2015, 4, 28, 0, 0),
 datetime.datetime(2015, 4, 27, 0, 0),
 datetime.datetime(2015, 4, 26, 0, 0),
 datetime.datetime(2015, 4, 25, 0, 0),
 datetime.datetime(2015, 4, 24, 0, 0),
 datetime.datetime(2015, 4, 23, 0, 0),
 datetime.datetime(2015, 4, 22, 0, 0)]
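An inclusive range that works in both directions can be sketched with the standard library as below. datetime_range is a hypothetical helper, not the library function: it accepts only datetime objects, has no noise support, and its default interval of one day is inferred from the first example above.

```python
from datetime import datetime, timedelta

def datetime_range(start, end, interval=timedelta(days=1)):
    """Return datetimes from start to end (inclusive) stepping by interval."""
    if interval == timedelta(0):
        raise ValueError("interval must not be zero")
    forward = interval > timedelta(0)
    result = []
    current = start
    # Compare in the direction of travel so negative intervals terminate.
    while (current <= end) if forward else (current >= end):
        result.append(current)
        current += interval
    return result

forward = datetime_range(datetime(2015, 1, 1), datetime(2015, 2, 1))
backward = datetime_range(
    datetime(2015, 5, 22), datetime(2015, 4, 22), timedelta(days=-1)
)
print(len(forward), len(backward))
```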

common

noise

A random offset from the exact time can be specified with noise parameters. The functions that accept noise parameters are “generator” and “range” in datafactory.utils.datetime.

**noise is passed as keyword arguments, all of which are optional.

The available keys are the same as the timedelta arguments.

days
hours
minutes
seconds
microseconds
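One way to picture the noise parameters is as a random timedelta added to each value. The sketch below assumes, based on the range example, that a single number n means an offset between -n and +n and that a tuple gives explicit (low, high) bounds; it also assumes whole-number offsets. apply_noise is a hypothetical helper, not part of datafactory.

```python
import random
from datetime import datetime, timedelta

def apply_noise(moment, rng=random, **noise):
    """Shift `moment` by a random offset per timedelta key."""
    offsets = {}
    for unit, spec in noise.items():
        # Assumption: n means (-n, +n); a tuple is taken as (low, high).
        low, high = spec if isinstance(spec, tuple) else (-spec, spec)
        offsets[unit] = rng.randint(low, high)
    return moment + timedelta(**offsets)

base = datetime(2015, 1, 1)
noisy = apply_noise(base, rng=random.Random(1), hours=3, minutes=(0, 5))
print(noisy)
```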

argtype

Besides datetime, the following argument types are accepted.

int: It is evaluated as a year.
str: It is parsed as a datetime from the numeric parts of the string.
tuple: It is passed to datetime as positional arguments.
dict: It is passed to datetime as keyword arguments.
date: It is converted to the datetime type.
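The conversion rules listed above can be sketched as a single normalizing function. to_datetime is a hypothetical name and this is only an approximation using the standard library; in particular, defaulting a missing month or day to 1 for strings is an assumption.

```python
import re
from datetime import date, datetime

def to_datetime(value):
    """Normalize the accepted argument types to a datetime."""
    if isinstance(value, datetime):  # check before date: datetime is a date
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, int):       # evaluated as a year
        return datetime(value, 1, 1)
    if isinstance(value, str):       # use only the numeric parts
        parts = [int(p) for p in re.findall(r"\d+", value)][:7]
        parts += [1] * (3 - len(parts))  # assumption: month/day default to 1
        return datetime(*parts)
    if isinstance(value, tuple):
        return datetime(*value)
    if isinstance(value, dict):
        return datetime(**value)
    raise TypeError("unsupported type: %r" % type(value))

print(to_datetime(2015))
print(to_datetime("2015/2/1"))
print(to_datetime((1988, 5, 22)))
```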

history

1.0.x

Initial release.


Release history

1.0.1: Jun 6, 2020
1.0.0: Jun 4, 2020

