Generate fake data conforming to a Table Schema
Generate tabular fake data conforming to a Table Schema
Usage
Installation
$ pip3 install tsfaker
Simple usage
Generate 3 rows of fake data from a single table schema file.
$ tsfaker https://gitlab.com/healthdatahub/tsfaker/raw/master/tests/schemas/implemented_types.json --nrows 3 --pretty
   string             number             integer      date        datetime             year  yearmonth
0  QZluRNRoaJ         8524064526.189381  5603365028   1918-06-09  1963-02-25T15:27:14  1927  1968-03
1  OAXCFryYDVMWmRTnP  8084094810.096195  -9782888534  1995-06-06  1924-06-14T07:41:59  1928  1929-02
2                     -6416720321.04726  -1060427558  2006-12-11  2002-12-25T07:41:47  1999  1914-11
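To illustrate what "conforming to types" means for the output above, here is a minimal Python sketch that draws one random value per Table Schema type (string, number, integer, date, datetime, year, yearmonth). It is not tsfaker's implementation. The generator table, the value ranges, and the `fake_rows` helper are hypothetical choices made for this example, and only the standard library is used.

```python
import random
import string
from datetime import datetime, timedelta

# Hypothetical sketch (not tsfaker's code): one random generator per
# Table Schema type used in the example above. Ranges are arbitrary.
START = datetime(1900, 1, 1)
END = datetime(2020, 12, 31)


def random_datetime(rng):
    """Uniform random datetime between START and END, to the second."""
    span = int((END - START).total_seconds())
    return START + timedelta(seconds=rng.randint(0, span))


GENERATORS = {
    # k may be 0, which yields an empty string
    "string": lambda rng: "".join(rng.choices(string.ascii_letters, k=rng.randint(0, 20))),
    "number": lambda rng: rng.uniform(-1e10, 1e10),
    "integer": lambda rng: rng.randint(-10**10, 10**10),
    "date": lambda rng: random_datetime(rng).date().isoformat(),
    "datetime": lambda rng: random_datetime(rng).isoformat(timespec="seconds"),
    "year": lambda rng: random_datetime(rng).year,
    "yearmonth": lambda rng: random_datetime(rng).strftime("%Y-%m"),
}


def fake_rows(schema, nrows, seed=None):
    """Return nrows dicts mapping each field name to a random value of its type."""
    rng = random.Random(seed)
    return [
        # Table Schema fields default to type "string" when no type is given
        {f["name"]: GENERATORS[f.get("type", "string")](rng) for f in schema["fields"]}
        for _ in range(nrows)
    ]
```

For instance, `fake_rows({"fields": [{"name": "age", "type": "integer"}]}, nrows=3, seed=42)` returns three single-column rows. Passing a fixed `seed` makes the output reproducible, which is convenient when fake data feeds automated tests.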
Advanced usage
Show the help message.
$ tsfaker --help
Usage: tsfaker [OPTIONS] [SCHEMA_DESCRIPTORS]...
...
Download example schemas from the schema-snds project.
$ git clone https://gitlab.com/healthdatahub/schema-snds && cd schema-snds
Generate fake data for all schemas in the schemas folder and write them to the fake_data folder.
$ mkdir fake_data
$ tsfaker schemas -o fake_data
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnE.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnFASTC.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnFASTC.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI SSR/T_SSRaa_nnE.json' will be written on 'fake_data/PMSI/PMSI SSR/T_SSRaa_nnE.csv'
...
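The log lines above suggest that each descriptor's path relative to the input folder is mirrored under the output folder, with `.json` replaced by `.csv`. The following sketch reproduces that mapping with `pathlib`. It is an assumption inferred from the logs rather than tsfaker's code, and the function names `output_path` and `plan_outputs` are hypothetical.

```python
from pathlib import Path


def output_path(descriptor, schemas_dir, output_dir):
    """Mirror a descriptor's location below schemas_dir into output_dir, as .csv."""
    relative = Path(descriptor).relative_to(schemas_dir)
    return Path(output_dir) / relative.with_suffix(".csv")


def plan_outputs(schemas_dir, output_dir):
    """Pair every *.json descriptor found recursively with its CSV destination."""
    return [
        (descriptor, output_path(descriptor, schemas_dir, output_dir))
        for descriptor in sorted(Path(schemas_dir).rglob("*.json"))
    ]
```

Before writing each file, its parent folder (for example `fake_data/PMSI/PMSI MCO`) would have to exist, which `csv_path.parent.mkdir(parents=True, exist_ok=True)` takes care of.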
Goals
We aim to generate fake data conforming to a schema.
We do not aim to generate realistic data that reflects statistical information about real data (see related work).
Implementation steps
1. Generate data conforming to field types.
2. Generate data conforming to formats and constraints, such as min/max, enum, missing values, unique, length, and regex.
3. Generate multiple tables that respect foreign key references, optionally using data provided as CSV files for some tables.
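Steps 2 and 3 can be sketched for a few Table Schema constraint keywords (`required`, `unique`, `enum`, `minimum`/`maximum`, `minLength`/`maxLength`) and for foreign keys, by sampling child keys from the parent table's existing keys. This is a hypothetical illustration, not tsfaker's design. The helper names, default ranges, missing-value rate, and retry limit are all assumptions, and `pattern` (regex) is left out.

```python
import random
import string


def fake_value(field, rng, missing_rate=0.1):
    """Draw one value for a Table Schema field, honouring a subset of its constraints.

    Hypothetical sketch: handles required/unique (for missing values), enum,
    minimum/maximum for integers and minLength/maxLength for strings.
    """
    constraints = field.get("constraints", {})
    optional = not constraints.get("required", False) and not constraints.get("unique", False)
    if optional and rng.random() < missing_rate:
        return ""  # assumes the default Table Schema missingValues: [""]
    if "enum" in constraints:
        return rng.choice(constraints["enum"])
    if field.get("type") == "integer":
        low, high = constraints.get("minimum"), constraints.get("maximum")
        if low is None:
            low = (high if high is not None else 1000) - 2000
        if high is None:
            high = low + 2000
        return rng.randint(low, high)
    # Any other type is treated as a string in this sketch
    low = constraints.get("minLength", 0)
    length = rng.randint(low, constraints.get("maxLength", low + 20))
    return "".join(rng.choices(string.ascii_letters, k=length))


def fake_column(field, nrows, rng, missing_rate=0.1):
    """Generate nrows values, redrawing duplicates when the field is unique."""
    unique = field.get("constraints", {}).get("unique", False)
    values, seen = [], set()
    for _ in range(100 * nrows):  # bounded retries so impossible constraints fail
        if len(values) == nrows:
            break
        value = fake_value(field, rng, missing_rate)
        if unique and value in seen:
            continue
        seen.add(value)
        values.append(value)
    if len(values) < nrows:
        raise ValueError(f"could not draw {nrows} values for field {field['name']!r}")
    return values


def fake_foreign_key(parent_values, nrows, rng):
    """Sample child key values among the referenced (parent) table's key values."""
    return rng.choices(parent_values, k=nrows)
```

Redrawing on duplicates is the simplest way to honour `unique`. The retry bound turns an unsatisfiable schema (for example, 5 unique integers between 0 and 1) into an explicit error instead of an endless loop.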
API
We want to provide both a Python API and a command-line API.
Development methodology
We follow a test-driven development (TDD) methodology: tests are written before the implementation.
Generated data should pass validation with goodtables.
We could proceed incrementally, satisfying more and more of the content checks included in the Table Schema specification.
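The actual validation is done by goodtables. As a self-contained illustration of the kind of type check a test written before the implementation might assert, the sketch below checks CSV cells against simplified versions of the Table Schema types and runs them on the three rows from the simple-usage output. The `CHECKS` table and the `type_errors` function are hypothetical and far less complete than goodtables: timezone suffixes, `NaN`/`INF`, `bareNumber`, and custom formats are not handled.

```python
import re
from datetime import datetime


def _parses_as(fmt):
    """Build a checker that accepts a cell if datetime.strptime(cell, fmt) succeeds."""
    def check(cell):
        try:
            datetime.strptime(cell, fmt)
        except ValueError:
            return False
        return True
    return check


# Simplified checks for the default format of each Table Schema type
CHECKS = {
    "string": lambda cell: True,
    "integer": lambda cell: re.fullmatch(r"[+-]?\d+", cell) is not None,
    "number": lambda cell: re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?", cell) is not None,
    "date": _parses_as("%Y-%m-%d"),
    "datetime": _parses_as("%Y-%m-%dT%H:%M:%S"),
    "year": lambda cell: re.fullmatch(r"\d{4}", cell) is not None,
    "yearmonth": _parses_as("%Y-%m"),
}


def type_errors(schema, rows, missing_values=("",)):
    """Return (row index, field name, cell) for every non-missing cell that fails its type."""
    errors = []
    for index, row in enumerate(rows):
        for field, cell in zip(schema["fields"], row):
            if cell in missing_values:
                continue  # a missing value is acceptable unless the field is required
            if not CHECKS[field.get("type", "string")](cell):
                errors.append((index, field["name"], cell))
    return errors


# The three rows from the simple-usage output, as they would be read from CSV
FIELDS = ["string", "number", "integer", "date", "datetime", "year", "yearmonth"]
SCHEMA = {"fields": [{"name": name, "type": name} for name in FIELDS]}
ROWS = [
    ["QZluRNRoaJ", "8524064526.189381", "5603365028", "1918-06-09", "1963-02-25T15:27:14", "1927", "1968-03"],
    ["OAXCFryYDVMWmRTnP", "8084094810.096195", "-9782888534", "1995-06-06", "1924-06-14T07:41:59", "1928", "1929-02"],
    ["", "-6416720321.04726", "-1060427558", "2006-12-11", "2002-12-25T07:41:47", "1999", "1914-11"],
]
```

Here `type_errors(SCHEMA, ROWS)` returns an empty list, while a row with `"1.5"` in the integer column would be reported as `(0, "integer", "1.5")`.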