
Generate fake data conforming to a Table Schema


Generate tabular fake data conforming to a Table Schema.

The tsfaker library is available on PyPI.

This library was originally developed to generate a synthetic version of the SNDS database. Because that database contains hundreds of tables, tsfaker is designed to handle foreign keys efficiently.
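To illustrate what handling foreign keys involves when generating fake tables, here is a minimal sketch in plain Python. It is not tsfaker's implementation: the table names, column names, and functions (generate_parent, generate_child) are hypothetical, and the approach shown (generate the referenced table first, then draw foreign key values from its keys) is only one simple way to keep references valid.

```python
import random


def generate_parent(n_rows, seed=0):
    """Generate a hypothetical 'patient' table with unique primary keys."""
    rng = random.Random(seed)
    return [
        {"patient_id": i, "birth_year": rng.randint(1900, 2020)}
        for i in range(n_rows)
    ]


def generate_child(n_rows, parent_rows, seed=1):
    """Generate a hypothetical 'stay' table whose patient_id column
    only contains values that exist in the parent table."""
    rng = random.Random(seed)
    parent_ids = [row["patient_id"] for row in parent_rows]
    return [
        {"stay_id": i, "patient_id": rng.choice(parent_ids)}
        for i in range(n_rows)
    ]


patients = generate_parent(5)
stays = generate_child(10, patients)
```

Every patient_id in stays is drawn from patients, so a join between the two fake tables never produces a dangling reference.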

Notes:

We aim to generate fake data conforming to a schema, not fake data with realistic statistical information (see the Related work section).

This library is in beta and subject to frequent changes (see the Release notes section).

Usage

Installation

$ pip3 install tsfaker

Simple usage

Generate 3 rows of fake data from a single table schema file.

$ tsfaker https://gitlab.com/healthdatahub/tsfaker/raw/master/tests/schemas/implemented_types.json --nrows 3 --pretty
  boolean         string            number      integer        date              datetime  year yearmonth
0       1  haHoKysholbSI    9780230269.512  -7061309068  1914-10-03  1902-04-11T11:21:11Z  1939    196405
1       0      rLugGhNek    990894536.8945   2529879443  2026-09-08  2015-11-27T16:21:54Z  1932    192909
2       1         ipqVXm  -4371053960.8987   -529880373  1994-09-27  1937-01-12T18:40:15Z  2021    193303
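For readers who have not written a Table Schema descriptor before, the sketch below builds a small one with one field per type shown in the output above and writes it to a JSON file. The field names and the write_descriptor helper are illustrative assumptions, not the content of implemented_types.json; only the type names come from the Table Schema specification.

```python
import json
from pathlib import Path

# Hypothetical descriptor: one field per type listed in the sample output.
# Field names are illustrative and simply mirror the column headers.
SCHEMA = {
    "fields": [
        {"name": "boolean", "type": "boolean"},
        {"name": "string", "type": "string"},
        {"name": "number", "type": "number"},
        {"name": "integer", "type": "integer"},
        {"name": "date", "type": "date"},
        {"name": "datetime", "type": "datetime"},
        {"name": "year", "type": "year"},
        {"name": "yearmonth", "type": "yearmonth"},
    ]
}


def write_descriptor(path):
    """Write SCHEMA as a JSON Table Schema descriptor and return its path."""
    path = Path(path)
    path.write_text(json.dumps(SCHEMA, indent=2), encoding="utf-8")
    return path
```

Calling write_descriptor("my_schema.json") produces a file that could presumably be passed to the command above in place of the URL, assuming tsfaker accepts a local file path in the same way it accepts the local schemas folder in the advanced example.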

Advanced usage

Show help message.

$ tsfaker --help
Usage: tsfaker [OPTIONS] [SCHEMA_DESCRIPTORS]...
...

Download example schemas from the schema-snds project.

$ git clone https://gitlab.com/healthdatahub/schema-snds && cd schema-snds

Generate fake data for all schemas in the schemas folder, using the csv files in the nomenclatures folder, and write the results to the fake_data folder.

$ mkdir fake_data
$ tsfaker schemas -o fake_data -r nomenclatures
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnE.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnFASTC.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnFASTC.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI SSR/T_SSRaa_nnE.json' will be written on 'fake_data/PMSI/PMSI SSR/T_SSRaa_nnE.csv'
...
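The log lines above suggest that the output folder mirrors the layout of the schemas folder, with each .json descriptor mapped to a .csv file at the same relative path. The sketch below reproduces that mapping with pathlib. The function name output_path_for is hypothetical, and the code only restates what the log shows; it is not taken from tsfaker.

```python
from pathlib import Path


def output_path_for(descriptor, schemas_dir, output_dir):
    """Map a descriptor path to the CSV path seen in the log lines:
    same relative location, '.json' replaced by '.csv'."""
    relative = Path(descriptor).relative_to(schemas_dir)
    return Path(output_dir) / relative.with_suffix(".csv")


target = output_path_for(
    "schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json", "schemas", "fake_data"
)
```

Such a helper can be handy for locating the generated file for a given descriptor after a run.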

Release notes

Version 0.14

  • [Fix] Update command line default value to match Click library version >=8.0

Version 0.13

  • [Fix] Adapt maximum default integer value to local system

Version 0.12

  • It is possible to specify trueValues and falseValues for the boolean type (according to the Table Schema standard)

  • Only one item is accepted in trueValues and falseValues arrays

  • It is possible to specify a format for types date and datetime
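As an illustration of the options listed for version 0.12, the sketch below declares two hypothetical fields and shows how their values would be rendered under the Table Schema conventions. The field names and the render function are assumptions made for this example, and the strftime-style date pattern follows the Table Schema specification; tsfaker's own handling may differ in detail.

```python
from datetime import date

# Hypothetical field descriptors using the options from version 0.12.
FIELDS = [
    # A single item in each array, as required by the note above.
    {"name": "is_active", "type": "boolean",
     "trueValues": ["yes"], "falseValues": ["no"]},
    # Table Schema date formats use strftime-style patterns.
    {"name": "start_date", "type": "date", "format": "%d/%m/%Y"},
]


def render(field, value):
    """Render a Python value as text according to a field descriptor."""
    if field["type"] == "boolean":
        # Defaults are 1 for True and 0 for False (see version 0.10).
        true_values = field.get("trueValues", ["1"])
        false_values = field.get("falseValues", ["0"])
        return true_values[0] if value else false_values[0]
    if field["type"] == "date":
        fmt = field.get("format", "default")
        if fmt in ("default", "any"):
            return value.isoformat()
        return value.strftime(fmt)
    raise ValueError("unsupported type in this sketch: " + field["type"])


print(render(FIELDS[0], True))                # yes
print(render(FIELDS[1], date(2020, 1, 31)))   # 31/01/2020
```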

Version 0.11

  • The yearmonth type does not follow the ISO 8601 format ‘YYYY-MM’; it is now generated without a dash, as ‘YYYYMM’
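The difference between the two yearmonth layouts can be shown in a few lines of Python. The format_yearmonth helper is hypothetical and only illustrates the two string formats; it is not part of tsfaker.

```python
from datetime import date


def format_yearmonth(value, iso=False):
    """Return 'YYYY-MM' when iso is True, otherwise the dash-free 'YYYYMM'."""
    return value.strftime("%Y-%m" if iso else "%Y%m")


print(format_yearmonth(date(1964, 5, 1)))            # 196405
print(format_yearmonth(date(1964, 5, 1), iso=True))  # 1964-05
```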

Version 0.10

  • The boolean type is implemented; its default values are 0 for False and 1 for True
