datafactory generates test data.
Requirements
Python 3.5 or later.
Install
$ pip install datafactory
Usage
Basic Example
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: 'id': datafactory.IncrementField(),
...: 'x': datafactory.CycleField(['a', 'b', 'c']),
...: # Fields whose value is BLANK are omitted.
...: 'option': datafactory.ChoiceField([True, False, datafactory.BLANK]),
...: })
In [3]: container = datafactory.Container(model, 5, render=True)
In [4]: container
Out[4]:
[{'id': 1, 'x': 'a'},
{'id': 2, 'x': 'b', 'option': False},
{'id': 3, 'x': 'c', 'option': True},
{'id': 4, 'x': 'a'},
{'id': 5, 'x': 'b'}]
# Specify rewrite=True to overwrite the file if it already exists.
In [5]: datafactory.JsonFormatter(container).write('/tmp/test.json', rewrite=True)
In [6]: !cat /tmp/test.json
[
{
"x": "a",
"id": 1
},
{
"x": "b",
"id": 2,
"option": false
},
{
"x": "c",
"id": 3,
"option": true
},
{
"x": "a",
"id": 4
},
{
"x": "b",
"id": 5
}
]
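To make the behaviour of the three fields above concrete, here is a minimal standard-library sketch. It does not use datafactory; the sentinel `BLANK` and the helper `make_records` are hypothetical stand-ins, and the real implementation may differ.

```python
import itertools
import random

BLANK = object()  # hypothetical sentinel standing in for datafactory.BLANK


def make_records(count, seed=None):
    """Build `count` dicts; keys whose value is BLANK are left out."""
    rng = random.Random(seed)
    ids = itertools.count(1)               # like IncrementField: 1, 2, 3, ...
    xs = itertools.cycle(['a', 'b', 'c'])  # like CycleField: a, b, c, a, ...
    records = []
    for _ in range(count):
        row = {
            'id': next(ids),
            'x': next(xs),
            'option': rng.choice([True, False, BLANK]),  # like ChoiceField
        }
        # Drop every key whose value is the BLANK sentinel.
        records.append({k: v for k, v in row.items() if v is not BLANK})
    return records


records = make_records(5, seed=0)
```

Which records carry an `option` key depends on the random choices, as in the session above.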
TSV Example
In [1]: import datafactory
In [2]: model = datafactory.ListModel([
...: datafactory.IncrementField(start=10, step=5),
...: datafactory.HashOfField(2, 'md5'), # MD5 hash of the third column's value (index 2).
...: datafactory.ChoiceField(['foo', 'bar', 'baz']),
...: datafactory.CycleField(range(0, 30, 10)),
...: ]).ordering(2) # render index 2 (the third column) first
# IterContainer saves memory because it generates one element at a time.
In [3]: container = datafactory.IterContainer(model, 10) # repeat 10 times.
In [4]: datafactory.CsvFormatter(
...: container,
...: delimiter='\t',
...: header=['id', 'hash-of-name', 'name', 'value']
...: ).write('/tmp/test.csv', rewrite=True)
In [5]: !cat /tmp/test.csv
id hash-of-name name value
10 acbd18db4cc2f85cedef654fccc4a4d8 foo 0
15 acbd18db4cc2f85cedef654fccc4a4d8 foo 10
20 73feffa4b7f6bb68e44cf984c85f6e88 baz 20
25 acbd18db4cc2f85cedef654fccc4a4d8 foo 0
30 acbd18db4cc2f85cedef654fccc4a4d8 foo 10
35 73feffa4b7f6bb68e44cf984c85f6e88 baz 20
40 73feffa4b7f6bb68e44cf984c85f6e88 baz 0
45 73feffa4b7f6bb68e44cf984c85f6e88 baz 10
50 37b51d194a7513e45b56f6524f2d51f2 bar 20
55 37b51d194a7513e45b56f6524f2d51f2 bar 0
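The hash-of-name column can be checked independently with the standard library. The sketch below assumes the name is encoded as UTF-8 before hashing; `md5_hex` is a hypothetical helper, not part of datafactory.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


digests = {name: md5_hex(name) for name in ['foo', 'bar', 'baz']}
# digests['foo'] is 'acbd18db4cc2f85cedef654fccc4a4d8', as in the rows above.
```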
Custom Example
If an object is callable, the result of calling it is stored.
Model
In [1]: import datafactory
In [2]: def square(k, i):
...: return k * i
...:
In [3]: container = datafactory.DictContainer(square)
In [4]: container(['a', 'b', 'c', 'd', 'e'])
Out[4]: {'a': '', 'b': 'b', 'c': 'cc', 'd': 'ddd', 'e': 'eeee'}
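Judging from the output above, the callable receives each key together with its index. A plain-Python equivalent of that session, written without datafactory and only as an illustration, looks like this:

```python
def square(k, i):
    # Same callable as above: repeats the key `i` times.
    return k * i


keys = ['a', 'b', 'c', 'd', 'e']
# Call the function with (key, index) and store the result under the key.
result = {k: square(k, i) for i, k in enumerate(keys)}
```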
Field
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: 'col1': (lambda r, i: i),
...: 'col2': (lambda r: r['col1'] + 1),
...: 'col3': (lambda r: r['col2'] * 2),
...: 'col4': 100, # fixed value
...: }).ordering('col1', 'col2', 'col3')
In [3]: container = datafactory.ListContainer(model)
In [4]: container(4)
Out[4]:
[{'col1': 0, 'col2': 1, 'col3': 2, 'col4': 100},
{'col1': 1, 'col2': 2, 'col3': 4, 'col4': 100},
{'col1': 2, 'col2': 3, 'col3': 6, 'col4': 100},
{'col1': 3, 'col2': 4, 'col3': 8, 'col4': 100}]
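The ordering matters because col2 reads col1 and col3 reads col2. The following sketch shows one way such callables could be evaluated in order. It is an assumption about the mechanism, not datafactory's code, and `render_record` is a hypothetical helper.

```python
import inspect


def render_record(fields, order, index):
    """Evaluate `fields` for one record, resolving the keys in `order` first."""
    record = {}
    remaining = [k for k in fields if k not in order]
    for key in list(order) + remaining:
        value = fields[key]
        if callable(value):
            # Two-parameter callables get (record, index); others get the record.
            n_params = len(inspect.signature(value).parameters)
            value = value(record, index) if n_params >= 2 else value(record)
        record[key] = value  # non-callables are stored as fixed values
    return record


fields = {
    'col1': lambda r, i: i,
    'col2': lambda r: r['col1'] + 1,
    'col3': lambda r: r['col2'] * 2,
    'col4': 100,
}
rows = [render_record(fields, ('col1', 'col2', 'col3'), i) for i in range(4)]
```

Evaluating col3 before col2 in this sketch would raise a KeyError, which is why the dependent columns are listed in dependency order.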
Limited Number of Elements Example
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: # x: 'a' can be picked at most 1 time, 'b' at most 2 times, 'c' at most 3 times.
...: 'x': datafactory.PickoutField({'a': 1, 'b': 2, 'c': 3}, missing=None),
...: # y: 'a' can be picked at most 2 times, 'b' and 'c' at most 1 time each.
...: 'y': datafactory.PickoutField(['a', 'a', 'b', 'c'], missing='*'),
...: # z: 'a' and 'b' cannot be picked; 'c' can be picked at most 5 times.
...: 'z': datafactory.PickoutField(['c']*5, missing=None),
...: })
In [3]: container = datafactory.ListContainer(model)
In [4]: container(6)
Out[4]:
[{'x': 'a', 'y': 'a', 'z': 'c'},
{'x': 'c', 'y': 'b', 'z': 'c'},
{'x': 'c', 'y': 'a', 'z': 'c'},
{'x': 'b', 'y': 'c', 'z': 'c'},
{'x': 'c', 'y': '*', 'z': 'c'},
{'x': 'b', 'y': '*', 'z': None}]
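The limiting behaviour above can be pictured as drawing from a stock that runs out. The sketch below is a standard-library illustration under that assumption; `pickout` is a hypothetical helper, and whether datafactory weights the draw by the remaining count is not stated in this document.

```python
import random
from collections import Counter


def pickout(stock, count, missing=None, rng=random):
    """Draw `count` values from `stock` without exceeding each value's limit."""
    remaining = Counter(stock)  # accepts {'a': 1, ...} or ['a', 'a', ...]
    result = []
    for _ in range(count):
        candidates = list(remaining.elements())  # values with stock left
        if not candidates:
            result.append(missing)  # stock exhausted: fall back to `missing`
            continue
        value = rng.choice(candidates)
        remaining[value] -= 1
        result.append(value)
    return result


picks = pickout(['a', 'a', 'b', 'c'], 6, missing='*')
limited = pickout({'a': 1, 'b': 2, 'c': 3}, 6)
```

With four items in stock and six draws, the last two entries of `picks` are always the `missing` value, matching the `y` column above.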
Combination Example
Test data that combines multiple elements can be generated by using the repeat argument of CycleField and SequenceField.
In [1]: import datafactory
In [2]: l0 = ['a', 'b']
In [3]: l1 = ['a', 'b', 'c']
In [4]: l2 = ['a', 'b', 'c', 'd']
In [5]: model = datafactory.ListModel([
...: datafactory.SequenceField(l0, repeat=len(l1)*len(l2), missing=datafactory.ESCAPE),
...: datafactory.CycleField(l1, repeat=len(l2)),
...: datafactory.CycleField(l2),
...: ])
In [6]: container = datafactory.Container(model)
# When ESCAPE is passed as the missing argument, the container detects
# the end of the elements and stops before reaching 10000.
In [7]: container(10000)
Out[7]:
[['a', 'a', 'a'],
['a', 'a', 'b'],
['a', 'a', 'c'],
['a', 'a', 'd'],
['a', 'b', 'a'],
['a', 'b', 'b'],
['a', 'b', 'c'],
['a', 'b', 'd'],
['a', 'c', 'a'],
['a', 'c', 'b'],
['a', 'c', 'c'],
['a', 'c', 'd'],
['b', 'a', 'a'],
['b', 'a', 'b'],
['b', 'a', 'c'],
['b', 'a', 'd'],
['b', 'b', 'a'],
['b', 'b', 'b'],
['b', 'b', 'c'],
['b', 'b', 'd'],
['b', 'c', 'a'],
['b', 'c', 'b'],
['b', 'c', 'c'],
['b', 'c', 'd']]
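The rows above are the Cartesian product of the three lists, with the last column varying fastest. For comparison, the same 24 rows can be produced with the standard library alone; this is a cross-check, not how datafactory builds them.

```python
import itertools

l0 = ['a', 'b']
l1 = ['a', 'b', 'c']
l2 = ['a', 'b', 'c', 'd']

# itertools.product varies the rightmost input fastest, like the output above.
rows = [list(combo) for combo in itertools.product(l0, l1, l2)]
```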
Nested Example
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: 'a': datafactory.ListModel([
...: datafactory.CycleField(['b', 'c']),
...: datafactory.CycleField(['d', 'e']),
...: ]),
...: datafactory.ChoiceField(['f', 'g', 'h']): datafactory.DictContainer(lambda x: x * 2, 5)
...: })
In [3]: datafactory.Container(model, 10, render=True)
Out[3]:
[{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}}]
datetime Utility
choice
Returns a randomly chosen datetime between start and end.
In [1]: from datafactory.utils.datetime import choice
In [2]: choice(1988, '2015-11-11T11:11:11.111111')
Out[2]: datetime.datetime(2009, 11, 30, 23, 25, 43, 240031)
# tuple: datetime(*tuple), dict: datetime(**dict)
In [3]: choice((1988, 5, 22), {'year': 2015, 'month': 11, 'day': 11})
Out[3]: datetime.datetime(1996, 7, 1, 11, 14, 59, 314809)
In [4]: from datetime import datetime, date
In [5]: choice(date(1988, 5, 22), datetime(2015, 11, 11, 11, 11, 11))
Out[5]: datetime.datetime(2011, 3, 23, 19, 39, 14, 476901)
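A random datetime in a range can be drawn by scaling the span between the two endpoints. The sketch below assumes a uniform distribution, which this document does not confirm for choice; `random_datetime` is a hypothetical helper.

```python
import random
from datetime import datetime


def random_datetime(start, end, rng=random):
    """Return a datetime picked uniformly between `start` and `end`."""
    span = end - start                  # a timedelta
    return start + span * rng.random()  # timedelta * float is allowed


start = datetime(1988, 5, 22)
end = datetime(2015, 11, 11, 11, 11, 11)
picked = random_datetime(start, end)
```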
generator
A generator that yields datetime objects at regular intervals.
In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import generator
# If the end argument is omitted, the generator yields objects indefinitely.
In [3]: g = generator(start=2015, interval=timedelta(days=1, hours=12))
In [4]: next(g)
Out[4]: datetime.datetime(2015, 1, 1, 0, 0)
In [5]: next(g)
Out[5]: datetime.datetime(2015, 1, 2, 12, 0)
In [6]: next(g)
Out[6]: datetime.datetime(2015, 1, 4, 0, 0)
In [7]: next(g)
Out[7]: datetime.datetime(2015, 1, 5, 12, 0)
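The session above behaves like a simple loop that keeps adding the interval. A minimal sketch, assuming a positive interval and an inclusive end (`datetime_generator` is a hypothetical name, not the library function):

```python
import itertools
from datetime import datetime, timedelta


def datetime_generator(start, interval, end=None):
    """Yield datetimes from `start`, stepping by `interval`; endless if no end."""
    current = start
    while end is None or current <= end:
        yield current
        current += interval


g = datetime_generator(datetime(2015, 1, 1), timedelta(days=1, hours=12))
first_four = list(itertools.islice(g, 4))  # take 4 values from the endless stream
```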
range
Returns a list of datetime objects generated at regular intervals.
In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import range
In [3]: range(2015, '2015/2/1')
Out[3]:
[datetime.datetime(2015, 1, 1, 0, 0),
datetime.datetime(2015, 1, 2, 0, 0),
datetime.datetime(2015, 1, 3, 0, 0),
datetime.datetime(2015, 1, 4, 0, 0),
datetime.datetime(2015, 1, 5, 0, 0),
datetime.datetime(2015, 1, 6, 0, 0),
datetime.datetime(2015, 1, 7, 0, 0),
datetime.datetime(2015, 1, 8, 0, 0),
datetime.datetime(2015, 1, 9, 0, 0),
datetime.datetime(2015, 1, 10, 0, 0),
datetime.datetime(2015, 1, 11, 0, 0),
datetime.datetime(2015, 1, 12, 0, 0),
datetime.datetime(2015, 1, 13, 0, 0),
datetime.datetime(2015, 1, 14, 0, 0),
datetime.datetime(2015, 1, 15, 0, 0),
datetime.datetime(2015, 1, 16, 0, 0),
datetime.datetime(2015, 1, 17, 0, 0),
datetime.datetime(2015, 1, 18, 0, 0),
datetime.datetime(2015, 1, 19, 0, 0),
datetime.datetime(2015, 1, 20, 0, 0),
datetime.datetime(2015, 1, 21, 0, 0),
datetime.datetime(2015, 1, 22, 0, 0),
datetime.datetime(2015, 1, 23, 0, 0),
datetime.datetime(2015, 1, 24, 0, 0),
datetime.datetime(2015, 1, 25, 0, 0),
datetime.datetime(2015, 1, 26, 0, 0),
datetime.datetime(2015, 1, 27, 0, 0),
datetime.datetime(2015, 1, 28, 0, 0),
datetime.datetime(2015, 1, 29, 0, 0),
datetime.datetime(2015, 1, 30, 0, 0),
datetime.datetime(2015, 1, 31, 0, 0),
datetime.datetime(2015, 2, 1, 0, 0)]
# ±3 hours of noise and 0 to +5 minutes of noise
In [4]: range(2015, '2015-01-15', hours=3, minutes=(0, 5))
Out[4]:
[datetime.datetime(2015, 1, 1, 3, 1),
datetime.datetime(2015, 1, 2, 0, 3),
datetime.datetime(2015, 1, 3, 2, 0),
datetime.datetime(2015, 1, 3, 22, 2),
datetime.datetime(2015, 1, 4, 22, 3),
datetime.datetime(2015, 1, 6, 0, 2),
datetime.datetime(2015, 1, 7, 0, 4),
datetime.datetime(2015, 1, 8, 0, 4),
datetime.datetime(2015, 1, 8, 21, 3),
datetime.datetime(2015, 1, 9, 22, 0),
datetime.datetime(2015, 1, 11, 0, 0),
datetime.datetime(2015, 1, 11, 22, 1),
datetime.datetime(2015, 1, 12, 22, 5),
datetime.datetime(2015, 1, 14, 3, 0),
datetime.datetime(2015, 1, 15, 2, 5)]
# A negative interval can be specified to step backwards.
In [5]: range(start='2015-5-22', end='2015-04-22', interval=timedelta(days=-1))
Out[5]:
[datetime.datetime(2015, 5, 22, 0, 0),
datetime.datetime(2015, 5, 21, 0, 0),
datetime.datetime(2015, 5, 20, 0, 0),
datetime.datetime(2015, 5, 19, 0, 0),
datetime.datetime(2015, 5, 18, 0, 0),
datetime.datetime(2015, 5, 17, 0, 0),
datetime.datetime(2015, 5, 16, 0, 0),
datetime.datetime(2015, 5, 15, 0, 0),
datetime.datetime(2015, 5, 14, 0, 0),
datetime.datetime(2015, 5, 13, 0, 0),
datetime.datetime(2015, 5, 12, 0, 0),
datetime.datetime(2015, 5, 11, 0, 0),
datetime.datetime(2015, 5, 10, 0, 0),
datetime.datetime(2015, 5, 9, 0, 0),
datetime.datetime(2015, 5, 8, 0, 0),
datetime.datetime(2015, 5, 7, 0, 0),
datetime.datetime(2015, 5, 6, 0, 0),
datetime.datetime(2015, 5, 5, 0, 0),
datetime.datetime(2015, 5, 4, 0, 0),
datetime.datetime(2015, 5, 3, 0, 0),
datetime.datetime(2015, 5, 2, 0, 0),
datetime.datetime(2015, 5, 1, 0, 0),
datetime.datetime(2015, 4, 30, 0, 0),
datetime.datetime(2015, 4, 29, 0, 0),
datetime.datetime(2015, 4, 28, 0, 0),
datetime.datetime(2015, 4, 27, 0, 0),
datetime.datetime(2015, 4, 26, 0, 0),
datetime.datetime(2015, 4, 25, 0, 0),
datetime.datetime(2015, 4, 24, 0, 0),
datetime.datetime(2015, 4, 23, 0, 0),
datetime.datetime(2015, 4, 22, 0, 0)]
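Both directions shown above can be handled by letting the sign of the interval decide the stopping test. The sketch below assumes the end point is inclusive, as the outputs suggest; `datetime_range` is a hypothetical helper and ignores noise.

```python
from datetime import datetime, timedelta


def datetime_range(start, end, interval=timedelta(days=1)):
    """Return datetimes from `start` to `end` inclusive, stepping by `interval`."""
    if interval == timedelta(0):
        raise ValueError('interval must be non-zero')
    forward = interval > timedelta(0)
    result = []
    current = start
    # Step up toward `end` for positive intervals, down for negative ones.
    while (current <= end) if forward else (current >= end):
        result.append(current)
        current += interval
    return result


january = datetime_range(datetime(2015, 1, 1), datetime(2015, 2, 1))
backwards = datetime_range(datetime(2015, 5, 22), datetime(2015, 4, 22),
                           interval=timedelta(days=-1))
```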
common
noise
Noise parameters specify a random gap from the exact time. The “datetimes.generator” and “datetimes.range” functions accept them. Noise arguments are optional and must be passed as keyword arguments. The available keys are the same as the timedelta arguments, namely:
days
hours
minutes
seconds
microseconds
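As a rough illustration of how such keyword arguments could become a random offset, the sketch below reads a single number n as the range -n to +n and a tuple as (low, high). That reading is inferred from the range example (hours=3, minutes=(0, 5)) and is an assumption; `noise_delta` is a hypothetical helper, not datafactory's API.

```python
import random
from datetime import datetime, timedelta


def noise_delta(rng=random, **noise):
    """Build a random timedelta from timedelta-style keyword arguments."""
    kwargs = {}
    for unit, spec in noise.items():
        # Assumption: a number n means -n..+n, a tuple means (low, high).
        low, high = spec if isinstance(spec, tuple) else (-spec, spec)
        kwargs[unit] = rng.randint(low, high)  # whole units, bounds inclusive
    return timedelta(**kwargs)


base = datetime(2015, 1, 1)
noisy = base + noise_delta(hours=3, minutes=(0, 5))
```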
argtype
Besides datetime objects, the following argument types are accepted wherever a datetime is expected.
- int:
processed as the year.
- str or unicode:
a datetime object is created from the numeric parts of the string.
- tuple:
processed as (year, month, day).
- dict:
the items are passed as datetime keyword arguments.
- date:
the time part (hour:minute:second) is filled in with 00:00:00.
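The conversions listed above can be sketched with the standard library as follows. `to_datetime` is a hypothetical helper written for illustration; the library's own parsing may differ, for example in how it treats fractional seconds.

```python
import re
from datetime import date, datetime


def to_datetime(value):
    """Coerce int, str, tuple, dict or date into a datetime object."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)  # time is 00:00:00
    if isinstance(value, int):
        return datetime(value, 1, 1)  # treated as the year
    if isinstance(value, str):
        parts = [int(p) for p in re.findall(r'\d+', value)]
        parts += [1] * (3 - len(parts))  # default a missing month/day to 1
        # A trailing fraction is used as microseconds (exact for 6 digits).
        return datetime(*parts)
    if isinstance(value, tuple):
        return datetime(*value)
    if isinstance(value, dict):
        return datetime(**value)
    raise TypeError('unsupported type: %r' % type(value))


examples = [to_datetime(2015), to_datetime('2015/2/1'), to_datetime((1988, 5, 22))]
```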
history
1.0.0
Initial release.