Generate fake data conforming to a Table Schema

Generate tabular fake data conforming to a Table Schema.

The tsfaker library is available on PyPI.

This library was originally developed to generate a synthetic version of the SNDS database, which contains hundreds of tables. For that reason, tsfaker is designed to handle foreign keys efficiently.

Notes:

We aim to generate fake data that conforms to a schema, not fake data with realistic statistical properties (see the Related work section).

This library is in beta and subject to frequent changes (see the Release notes section).

Usage

Installation

$ pip3 install tsfaker

Simple usage

Generate 3 rows of fake data from a single table schema file.

$ tsfaker https://gitlab.com/healthdatahub/tsfaker/raw/master/tests/schemas/implemented_types.json --nrows 3 --pretty
  boolean         string            number      integer        date              datetime  year yearmonth
0       1  haHoKysholbSI    9780230269.512  -7061309068  1914-10-03  1902-04-11T11:21:11Z  1939    196405
1       0      rLugGhNek    990894536.8945   2529879443  2026-09-08  2015-11-27T16:21:54Z  1932    192909
2       1         ipqVXm  -4371053960.8987   -529880373  1994-09-27  1937-01-12T18:40:15Z  2021    193303
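
For readers unfamiliar with Table Schema, the sketch below builds a small descriptor in Python. It is an illustration only: the real content of implemented_types.json is not reproduced here, and the field names are an assumption that simply mirrors the column headers in the output above. The "fields", "name" and "type" keys come from the Table Schema standard.

```python
import json

# Hypothetical descriptor: one field per type shown in the output above.
# The field names are assumed to match the column headers; the actual
# implemented_types.json file may differ.
descriptor = {
    "fields": [
        {"name": "boolean", "type": "boolean"},
        {"name": "string", "type": "string"},
        {"name": "number", "type": "number"},
        {"name": "integer", "type": "integer"},
        {"name": "date", "type": "date"},
        {"name": "datetime", "type": "datetime"},
        {"name": "year", "type": "year"},
        {"name": "yearmonth", "type": "yearmonth"},
    ]
}


def column_names(schema: dict) -> list:
    """Return the field names of a Table Schema descriptor, in order."""
    return [field["name"] for field in schema["fields"]]


# Serialize the descriptor to the JSON text a schema file would contain.
descriptor_json = json.dumps(descriptor, indent=2)
print(column_names(descriptor))
```

Saved to a file (for example under the hypothetical name my_table.json), such a descriptor could presumably be passed to tsfaker in place of the URL used above.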

Advanced usage

Show help message.

$ tsfaker --help
Usage: tsfaker [OPTIONS] [SCHEMA_DESCRIPTORS]...
...

Download example schemas from the schema-snds project.

$ git clone https://gitlab.com/healthdatahub/schema-snds && cd schema-snds

Generate fake data for every schema in the schemas folder, using the CSV files in the nomenclatures folder, and write the results to the fake_data folder.

$ mkdir fake_data
$ tsfaker schemas -o fake_data -r nomenclatures
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnE.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnFASTC.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnFASTC.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI SSR/T_SSRaa_nnE.json' will be written on 'fake_data/PMSI/PMSI SSR/T_SSRaa_nnE.csv'
...
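
When many tables reference each other, generated values must respect foreign keys. One general way to achieve this is to generate the referenced table first, then draw each foreign key value from the keys that already exist. The sketch below illustrates that idea with the standard library only. It is not tsfaker's implementation, and the table and column names (patient, stay, patient_id, stay_id) are hypothetical. The "foreignKeys" layout follows the Table Schema standard.

```python
import random

# Hypothetical child-table descriptor fragment declaring a foreign key,
# using the "foreignKeys" property defined by the Table Schema standard.
stay_schema = {
    "fields": [
        {"name": "stay_id", "type": "integer"},
        {"name": "patient_id", "type": "integer"},
    ],
    "foreignKeys": [
        {
            "fields": "patient_id",
            "reference": {"resource": "patient", "fields": "patient_id"},
        }
    ],
}


def generate_parent(nrows: int) -> list:
    """Generate parent rows with unique primary keys."""
    return [{"patient_id": i} for i in range(nrows)]


def generate_child(nrows: int, parent_rows: list, rng: random.Random) -> list:
    """Generate child rows whose foreign key is drawn from existing parent keys."""
    parent_keys = [row["patient_id"] for row in parent_rows]
    return [
        {"stay_id": i, "patient_id": rng.choice(parent_keys)}
        for i in range(nrows)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
patients = generate_parent(3)
stays = generate_child(5, patients, rng)
```

Because every child value is sampled from the parent column, the constraint holds by construction and no post-hoc validation or repair pass is needed.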

Release notes

Version 0.14

  • [Fix] Update command line default value to match Click library version >=8.0

Version 0.13

  • [Fix] Adapt maximum default integer value to local system

Version 0.12

  • It is possible to specify trueValues and falseValues for boolean type (according to TableSchema standard)
  • Only one item is accepted in trueValues and falseValues arrays
  • It is possible to specify a format for types date and datetime
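
As an illustration of these options, the sketch below shows what such field descriptors could look like and how a generated value might be rendered from them. The field names (is_active, visit_date) and the helper functions are hypothetical and are not part of tsfaker; only the "trueValues", "falseValues" and "format" properties come from the Table Schema standard. The fallback to 1 and 0 follows the defaults stated in the Version 0.10 note.

```python
from datetime import date

# Hypothetical field descriptors using Table Schema properties.
boolean_field = {
    "name": "is_active",
    "type": "boolean",
    "trueValues": ["yes"],   # a single item, as required since version 0.12
    "falseValues": ["no"],
}
date_field = {"name": "visit_date", "type": "date", "format": "%d/%m/%Y"}


def render_boolean(value: bool, field: dict) -> str:
    """Render a boolean using the field's single trueValues/falseValues item."""
    true_values = field.get("trueValues", ["1"])
    false_values = field.get("falseValues", ["0"])
    return true_values[0] if value else false_values[0]


def render_date(value: date, field: dict) -> str:
    """Render a date with the field's strftime pattern, or ISO 8601 by default."""
    fmt = field.get("format", "default")
    if fmt == "default":
        return value.isoformat()
    return value.strftime(fmt)


print(render_boolean(True, boolean_field))
print(render_date(date(1994, 9, 27), date_field))
```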

Version 0.11

  • The yearmonth type no longer follows the ISO 8601 format ‘YYYY-MM’; values are now generated without a dash, as ‘YYYYMM’
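
The difference between the two yearmonth layouts can be shown with Python's standard strftime patterns. This is a minimal sketch with hypothetical helper names, not tsfaker code.

```python
from datetime import date


def yearmonth_iso(value: date) -> str:
    """ISO 8601 layout with a dash: 'YYYY-MM'."""
    return value.strftime("%Y-%m")


def yearmonth_compact(value: date) -> str:
    """Compact layout without a dash: 'YYYYMM'."""
    return value.strftime("%Y%m")


sample = date(1964, 5, 1)
print(yearmonth_iso(sample))      # 1964-05
print(yearmonth_compact(sample))  # 196405
```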

Version 0.10

  • The boolean type is implemented; its default values are 0 for False and 1 for True
