Project description

ODSynth generates samples of synthetic data based on the expected schema of your data. Use it for:

  • Seeding your ETL applications

  • Benchmarking of ETL applications

  • Producing data in various formats (JSON, delimited text, XML, etc.) on disc

With the plugin system, developers can use their own providers, formatters and writers locally in their own applications.

Read the full documentation here

Installation

pip install odsynth

Basic Usage

Once installed, use ‘synth’ to print synthetic data to the console, or ‘publish’ to write the generated data to a medium such as local disc.

Using synth to generate json data

synth --schema-spec-file=../schema.yaml --format=json --num-samples=3
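If you want to drive the command line from a script, one option is to assemble the argument list in Python and hand it to subprocess. The sketch below is only an illustration: the helper names build_synth_command and run_synth are hypothetical, it uses only the flags shown in this README, and it assumes synth is on your PATH and prints the generated data to standard output.

```python
import shlex
import subprocess


def build_synth_command(schema_spec_file, output_format, num_samples, formatter_args=()):
    """Assemble a synth invocation from the flags documented in this README."""
    cmd = [
        "synth",
        f"--schema-spec-file={schema_spec_file}",
        f"--format={output_format}",
        f"--num-samples={num_samples}",
    ]
    # Each formatter argument (for example "delimiter=comma") gets its own flag.
    for arg in formatter_args:
        cmd += ["--formatter-arg", arg]
    return cmd


def run_synth(cmd):
    """Run the command and return its standard output.

    Assumption: synth is installed and writes the generated data to stdout.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


cmd = build_synth_command("../schema.yaml", "json", 3)
print(shlex.join(cmd))
# prints: synth --schema-spec-file=../schema.yaml --format=json --num-samples=3
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.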

Using synth to generate csv data

synth --schema-spec-file=../flat_schema.yaml --format=txt --num-samples=3 --formatter-arg delimiter=comma

The delimiter may be one of ‘comma’, ‘tab’ or ‘pipe’.
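To make the delimiter names concrete, here is a small standard-library sketch of how they could map to characters. The mapping and the render_rows helper are assumptions made for illustration and are not taken from ODSynth's source.

```python
import csv
import io

# Assumed mapping from the documented delimiter names to characters.
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def render_rows(rows, delimiter_name="comma"):
    """Render rows as delimited text using one of the named delimiters."""
    if delimiter_name not in DELIMITERS:
        raise ValueError(f"delimiter must be one of {sorted(DELIMITERS)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=DELIMITERS[delimiter_name], lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Jason", "Rogers"], ["Andrea", "Young"]]
print(render_rows(rows, "pipe"), end="")
# Jason|Rogers
# Andrea|Young
```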

Using ODSynth in your own code

from odsynth.schema import Schema


def generate_data():
    num_samples = 3
    batch_size = 5  # Batch size can be greater than num_samples
    output_format = "txt"  # Format can be json, xml, txt or pandas
    # Depending on the formatter, args may need to be provided. Default is None.
    formatter_args = ["delimiter=comma"]
    # The CSV formatter expects a tabular schema. The XML, JSON, Pandas and
    # Base formatters can accept hierarchical data.
    schema_spec_file = "./sample_schema/flat_schema.yaml"

    generator = Schema(schema_file=schema_spec_file).build_generator(
        num_examples=num_samples,
        batch_size=batch_size,
        format=output_format,
        formatter_args=formatter_args,
    )
    data = generator.get_data()

    # Prints generated data in csv format
    print(data)


if __name__ == "__main__":
    generate_data()
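The comments above note that delimited text needs a tabular schema, while JSON and XML can carry hierarchical data. The following standard-library sketch illustrates that distinction. The helpers is_tabular, to_json and to_xml are hypothetical names for this example and do not reflect ODSynth's formatter classes.

```python
import json
import xml.etree.ElementTree as ET


def is_tabular(record):
    """A record fits a single delimited row only if no value is nested."""
    return not any(isinstance(value, (dict, list)) for value in record.values())


def to_json(record):
    """JSON represents nested objects and arrays directly."""
    return json.dumps(record)


def _append(parent, key, value):
    # Lists become repeated elements; dicts become child elements.
    if isinstance(value, list):
        for item in value:
            _append(parent, key, item)
    elif isinstance(value, dict):
        child = ET.SubElement(parent, key)
        for child_key, child_value in value.items():
            _append(child, child_key, child_value)
    else:
        ET.SubElement(parent, key).text = str(value)


def to_xml(record, tag="record"):
    """XML represents nesting through child elements."""
    root = ET.Element(tag)
    for key, value in record.items():
        _append(root, key, value)
    return ET.tostring(root, encoding="unicode")


flat = {"firstname": "Jason", "lastname": "Rogers"}
nested = {"parent_age": 43, "children": [{"firstname": "Jason"}]}

print(is_tabular(flat))    # True
print(is_tabular(nested))  # False
print(to_xml(nested))
```

A nested record has no single obvious column layout, which is one plausible reason a delimited-text formatter would insist on a flat schema.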

Use ‘publish’ to load synthetic data to local disc in XML format

Publish 100 samples of the schema specified in flat_schema.yaml, 10 samples per batch.

publish --schema-spec-file=../flat_schema.yaml --format=xml --writer=local_disc --writer-arg output_dir=../odsynth_out --num-samples=100 --batch-size=10
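As a quick check of the arithmetic, 100 samples at 10 per batch gives 10 batches. The sketch below shows one way a sample count could be split into batches. It assumes that a final partial batch simply holds the remainder, which this README does not confirm, and batch_sizes is a hypothetical helper.

```python
def batch_sizes(num_samples, batch_size):
    """Split num_samples into batch sizes; the last batch may be smaller (assumed)."""
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size must be > 0")
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes


print(len(batch_sizes(100, 10)))  # 10
print(batch_sizes(3, 5))          # [3]
```

The second call covers the case where the batch size is greater than the number of samples: everything fits in one batch.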

For more on the data generator and the data publisher, see the help pages: synth --help or publish --help.

For the following schema:

fields:
    parent_firstname:
        provider: first_name
    parent_lastname:
        provider: last_name
    children:
        fields:
            firstname:
                provider: first_name
            lastname:
                provider: last_name
        max_count: 5
        is_array: true
    parent_age:
        provider: random_int
        provider_args:
            min: 25
            max: 55
    parent_ssn:
        provider: ssn

Output similar to the following is expected (values are generated randomly, so yours will differ):

{
    "parent_firstname": "Christopher",
    "parent_lastname": "Villegas",
    "children": [
        {"firstname": "Jason", "lastname": "Rogers"},
        {"firstname": "Andrea", "lastname": "Young"},
        {"firstname": "Michelle", "lastname": "Kaiser"}
    ],
    "parent_age": 43,
    "parent_ssn": "269-11-8507"
}
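To show how a schema of this shape relates to a record of this shape, here is a minimal standard-library sketch. It is not ODSynth's implementation: the schema is written as a Python dict instead of being loaded from YAML, the providers are simple stand-ins built on the random module, and the array length is assumed to fall between 1 and max_count.

```python
import random

SCHEMA = {
    "fields": {
        "parent_firstname": {"provider": "first_name"},
        "parent_lastname": {"provider": "last_name"},
        "children": {
            "fields": {
                "firstname": {"provider": "first_name"},
                "lastname": {"provider": "last_name"},
            },
            "max_count": 5,
            "is_array": True,
        },
        "parent_age": {"provider": "random_int", "provider_args": {"min": 25, "max": 55}},
        "parent_ssn": {"provider": "ssn"},
    }
}

FIRST_NAMES = ["Christopher", "Jason", "Andrea", "Michelle"]
LAST_NAMES = ["Villegas", "Rogers", "Young", "Kaiser"]


def make_providers(rng):
    """Stand-in providers keyed by the names used in the schema (assumed behaviour)."""

    def random_int(min=0, max=9999):
        # Parameter names mirror the provider_args keys in the schema.
        return rng.randint(min, max)

    def ssn():
        return f"{rng.randint(1, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"

    return {
        "first_name": lambda: rng.choice(FIRST_NAMES),
        "last_name": lambda: rng.choice(LAST_NAMES),
        "random_int": random_int,
        "ssn": ssn,
    }


def generate(spec, providers, rng):
    """Recursively turn a schema node into a value."""
    if "fields" in spec:
        def build_object():
            return {name: generate(child, providers, rng)
                    for name, child in spec["fields"].items()}

        if spec.get("is_array"):
            # Assumption: an array holds between 1 and max_count items.
            count = rng.randint(1, spec.get("max_count", 1))
            return [build_object() for _ in range(count)]
        return build_object()
    # Leaf node: call the named provider with any provider_args.
    return providers[spec["provider"]](**spec.get("provider_args", {}))


rng = random.Random(0)
record = generate(SCHEMA, make_providers(rng), rng)
print(record)
```

Each leaf in the schema names a provider, and each nested fields block becomes either an object or, with is_array set, a list of objects.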

License

ODSynth is released under the MIT License. See LICENSE for details.
