
Generate fake data conforming to a Table Schema

Project description

Generate tabular fake data conforming to a Table Schema

Usage

Installation

$ pip3 install tsfaker

Simple usage

Generate 3 rows of fake data from a single table schema file.

$ tsfaker https://gitlab.com/healthdatahub/tsfaker/raw/master/tests/schemas/implemented_types.json --nrows 3 --pretty
                string              number      integer        date             datetime  year yearmonth
0           QZluRNRoaJ   8524064526.189381   5603365028  1918-06-09  1963-02-25T15:27:14  1927   1968-03
1    OAXCFryYDVMWmRTnP   8084094810.096195  -9782888534  1995-06-06  1924-06-14T07:41:59  1928   1929-02
2                        -6416720321.04726  -1060427558  2006-12-11  2002-12-25T07:41:47  1999   1914-11
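The descriptor behind that URL is not reproduced here. As an illustration only, a Table Schema descriptor with the same seven column names could be built in Python as sketched below. The column names are copied from the output header; the assumption that each column's type equals its name is hypothetical and may not match the real file.

```python
import json

# Hypothetical descriptor: one field per column in the output above,
# assuming each column is named after its Table Schema type.
TYPES = ["string", "number", "integer", "date", "datetime", "year", "yearmonth"]
descriptor = {"fields": [{"name": t, "type": t} for t in TYPES]}

# Serialize the descriptor as JSON text, the format Table Schema files use.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Saved to a local file, such a descriptor could presumably be passed to tsfaker in place of the URL, since the help text lists SCHEMA_DESCRIPTORS as positional arguments; this has not been verified here.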

Advanced usage

Show help message.

$ tsfaker --help
Usage: tsfaker [OPTIONS] [SCHEMA_DESCRIPTORS]...
...

Download example schemas from the schema-snds project.

$ git clone https://gitlab.com/healthdatahub/schema-snds && cd schema-snds

Generate fake data for all schemas in the schemas folder, and write it to the fake_data folder.

$ mkdir fake_data
$ tsfaker schemas -o fake_data
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnE.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnFASTC.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnFASTC.csv'
2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI SSR/T_SSRaa_nnE.json' will be written on 'fake_data/PMSI/PMSI SSR/T_SSRaa_nnE.csv'
...

Goals

We aim to generate fake data conforming to a schema.

We do not aim to generate realistic data with statistical information (see related work).

Implementation steps

  • Generate data conforming to types

  • Generate data conforming to formats and constraints, such as min/max, enum, missing values, unique, length, and regex

  • Generate multiple tables conforming to foreign key references, optionally with data for some tables provided as CSV
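The first two steps can be sketched with the standard library alone. This is a minimal illustration, not tsfaker's implementation: the function names `fake_value` and `fake_rows`, the default value ranges, and the example schema are all assumptions. Only the constraint keys (`minimum`, `maximum`, `enum`, `minLength`, `maxLength`) come from the Table Schema specification.

```python
import random
import string
from datetime import date, timedelta


def fake_value(field, rng):
    """Return one random value for a Table Schema field dict (hypothetical helper)."""
    constraints = field.get("constraints", {})
    if "enum" in constraints:
        # An enum constraint restricts values to a fixed list.
        return rng.choice(constraints["enum"])
    ftype = field.get("type", "string")
    if ftype == "integer":
        # Default bounds are arbitrary; explicit constraints override them.
        low = constraints.get("minimum", -10**9)
        high = constraints.get("maximum", 10**9)
        return rng.randint(low, high)
    if ftype == "number":
        low = constraints.get("minimum", -1e9)
        high = constraints.get("maximum", 1e9)
        return rng.uniform(low, high)
    if ftype == "date":
        # Random day within about 100 years after 1920-01-01 (arbitrary range).
        return (date(1920, 1, 1) + timedelta(days=rng.randrange(36500))).isoformat()
    # Fallback: treat every other type as a string of random letters.
    min_len = constraints.get("minLength", 0)
    max_len = constraints.get("maxLength", max(min_len, 20))
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def fake_rows(descriptor, nrows, seed=0):
    """Generate nrows dicts, one key per field in the descriptor."""
    rng = random.Random(seed)
    return [
        {f["name"]: fake_value(f, rng) for f in descriptor["fields"]}
        for _ in range(nrows)
    ]


# Illustrative schema; the field names are made up for this example.
schema = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"minimum": 1, "maximum": 100}},
        {"name": "sex", "type": "string", "constraints": {"enum": ["F", "M"]}},
        {"name": "code", "type": "string", "constraints": {"minLength": 3, "maxLength": 5}},
        {"name": "birth", "type": "date"},
    ]
}
rows = fake_rows(schema, 3)
for row in rows:
    print(row)
```

Dispatching on the field type first and then narrowing by constraints mirrors the order of the steps above. Foreign keys, uniqueness, missing values, and regex patterns are left out of this sketch.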

API

  • We want to provide both a Python API and a command-line interface

Development methodology

We follow a test-driven development methodology, writing tests before writing the implementation.

We want generated data to pass validation with goodtables.

We can proceed incrementally, conforming to more and more of the content checks included in the Table Schema specification.
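To make the idea of a content check concrete, here is a small standard-library sketch of a row validator of the kind a test could assert against. It is neither goodtables nor part of tsfaker's test suite: the function `check_row`, its error messages, and the example schema are hypothetical, and it covers only a few types and constraints.

```python
from datetime import date


def check_row(row, descriptor):
    """Return a list of error messages for one row (empty if the row is valid)."""
    errors = []
    for field in descriptor["fields"]:
        name = field["name"]
        value = row.get(name)
        ftype = field.get("type", "string")
        constraints = field.get("constraints", {})
        if ftype == "integer":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{name}: expected integer")
                continue
            if "minimum" in constraints and value < constraints["minimum"]:
                errors.append(f"{name}: below minimum")
            if "maximum" in constraints and value > constraints["maximum"]:
                errors.append(f"{name}: above maximum")
        elif ftype == "date":
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                errors.append(f"{name}: expected ISO date")
        elif ftype == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        if "enum" in constraints and value not in constraints["enum"]:
            errors.append(f"{name}: not in enum")
    return errors


# Illustrative schema and rows; the field names are made up for this example.
schema = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"minimum": 1}},
        {"name": "sex", "type": "string", "constraints": {"enum": ["F", "M"]}},
        {"name": "birth", "type": "date"},
    ]
}
good = {"id": 7, "sex": "F", "birth": "1995-06-06"}
bad = {"id": 0, "sex": "X", "birth": "1995-13-01"}

print(check_row(good, schema))
print(check_row(bad, schema))
```

In a test-first workflow, a check like this would be written before the generator, and each new constraint would add one more branch here and one more failing test to satisfy.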

Download files

Source Distribution

tsfaker-0.7.tar.gz (16.1 kB)

Built Distribution

tsfaker-0.7-py3-none-any.whl (22.6 kB)
