Generate fake data conforming to a Table Schema
Generate tabular fake data conforming to a Table Schema.
The tsfaker library is available on PyPI.
This library was originally developed to generate a synthetic version of the SNDS database, which contains hundreds of tables; tsfaker therefore handles foreign keys efficiently.
Notes:
We aim to generate fake data conforming to a schema, not fake data with realistic statistical information (see Related work section).
This library is in beta and subject to frequent changes (see Release notes section).
Usage
Installation
$ pip3 install tsfaker
Simple usage
Generate 3 rows of fake data from a single table schema file.
$ tsfaker https://gitlab.com/healthdatahub/tsfaker/raw/master/tests/schemas/implemented_types.json --nrows 3 --pretty
   boolean         string            number      integer        date              datetime  year  yearmonth
0        1  haHoKysholbSI    9780230269.512  -7061309068  1914-10-03  1902-04-11T11:21:11Z  1939     196405
1        0      rLugGhNek    990894536.8945   2529879443  2026-09-08  2015-11-27T16:21:54Z  1932     192909
2        1         ipqVXm  -4371053960.8987   -529880373  1994-09-27  1937-01-12T18:40:15Z  2021     193303
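For readers who want to try a schema of their own, a Table Schema descriptor is a JSON document listing fields with a name and a type. The sketch below writes such a descriptor to disk. The field names, the `write_descriptor` helper and the `example_schema.json` file name are hypothetical and are not taken from `implemented_types.json`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical descriptor: field names are illustrative only.
# The types are among those listed in the output header above.
descriptor = {
    "fields": [
        {"name": "patient_id", "type": "integer"},
        {"name": "is_active", "type": "boolean"},
        {"name": "birth_date", "type": "date"},
        {"name": "label", "type": "string"},
    ]
}


def write_descriptor(descriptor, path):
    """Serialize a descriptor dict to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    out = write_descriptor(
        descriptor, Path(tempfile.gettempdir()) / "example_schema.json"
    )
    print(out)
```

Such a file could then be passed to tsfaker in place of the URL in the command above, assuming local paths are accepted as schema descriptors.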
Advanced usage
Show help message.
$ tsfaker --help
Usage: tsfaker [OPTIONS] [SCHEMA_DESCRIPTORS]...
...
Download example schemas from the schema-snds project.
$ git clone https://gitlab.com/healthdatahub/schema-snds && cd schema-snds
Generate fake data for all schemas in the schemas folder, using the CSV files in the nomenclatures folder, and write the output to the fake_data folder.
$ mkdir fake_data
$ tsfaker schemas -o fake_data -r nomenclatures
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnE.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnFASTC.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnFASTC.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI SSR/T_SSRaa_nnE.json' will be written on 'fake_data/PMSI/PMSI SSR/T_SSRaa_nnE.csv'
...
Release notes
Version 0.14
- [Fix] Update command line default value to match Click library version >=8.0
Version 0.13
- [Fix] Adapt maximum default integer value to local system
Version 0.12
- It is possible to specify trueValues and falseValues for boolean type (according to the Table Schema standard)
- Only one item is accepted in trueValues and falseValues arrays
- It is possible to specify a format for types date and datetime
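The 0.12 options can be pictured with a pair of field descriptors. This is a minimal sketch under assumptions: the field names, the `"O"`/`"N"` values, the `check_boolean_field` helper and the strptime-style pattern `%d/%m/%Y` are illustrative choices. Which format patterns tsfaker actually accepts is not stated here.

```python
import json
from datetime import date

# Hypothetical field descriptors; names and values are illustrative.
fields = [
    {
        "name": "is_reimbursed",
        "type": "boolean",
        "trueValues": ["O"],   # exactly one item, per the 0.12 note
        "falseValues": ["N"],  # exactly one item, per the 0.12 note
    },
    # Assumption: a strptime-style pattern, as in the Table Schema standard.
    {"name": "exam_date", "type": "date", "format": "%d/%m/%Y"},
]


def check_boolean_field(field):
    """Return True if trueValues and falseValues each hold exactly one item."""
    return all(
        len(field.get(key, [])) == 1 for key in ("trueValues", "falseValues")
    )


# How a date looks once rendered with the declared format.
example = date(1994, 9, 27).strftime(fields[1]["format"])

if __name__ == "__main__":
    print(json.dumps({"fields": fields}, indent=2))
    print(check_boolean_field(fields[0]))
    print(example)
```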
Version 0.11
- yearmonth type no longer follows the ISO 8601 format ‘YYYY-MM’; it is now generated without a dash, as ‘YYYYMM’
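The two renderings can be compared with Python's `strftime`. This only illustrates the formats themselves and is not tsfaker's own code. The helper names are hypothetical.

```python
from datetime import date


def yearmonth_iso(d):
    """Render a date as ISO 8601 year-month, e.g. 'YYYY-MM'."""
    return d.strftime("%Y-%m")


def yearmonth_compact(d):
    """Render a date as 'YYYYMM', the dash-free form described above."""
    return d.strftime("%Y%m")


if __name__ == "__main__":
    d = date(1964, 5, 17)
    print(yearmonth_iso(d))
    print(yearmonth_compact(d))
```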
Version 0.10
- boolean type is implemented, default values for this type are 0 for False and 1 for True