Project description
ODSynth generates samples of synthetic data based on the expected schema of your data. You can use it for:
- Seeding your ETL applications
- Benchmarking your ETL applications
- Producing data on disc in various formats (JSON, delimited text, XML, etc.)
With the plugin system, developers can use their own providers, formatters and writers locally in their own applications.
Read the full documentation here
Installation
pip install odsynth
Basic Usage
Once installed, you can use the synth command to generate synthetic data in the console, or the publish command to write generated data to a medium such as local disc.
Using synth to generate json data
synth --schema-spec-file=../schema.yaml --format=json --num-samples=3
Using synth to generate csv data
synth --schema-spec-file=../flat_schema.yaml --format=txt --num-samples=3 --formatter-arg delimiter=comma
The delimiter, passed with --formatter-arg, may be one of ‘comma’, ‘tab’ or ‘pipe’.
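The three delimiter names correspond to the usual separator characters. As an illustration only (this is not ODSynth's formatter code; the DELIMITERS mapping and the to_delimited helper are hypothetical), the sketch below shows how a named delimiter could be turned into a character and used to write rows with Python's standard csv module:

```python
import csv
import io

# Hypothetical mapping from the documented delimiter names to characters
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def to_delimited(rows, delimiter="comma"):
    """Render a list of flat dicts as delimited text with a header row."""
    if delimiter not in DELIMITERS:
        raise ValueError(f"delimiter must be one of {sorted(DELIMITERS)}")
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0]),  # column order follows the first record
        delimiter=DELIMITERS[delimiter],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"firstname": "Jason", "lastname": "Rogers"},
    {"firstname": "Andrea", "lastname": "Young"},
]
print(to_delimited(rows, "pipe"))
```

The csv module handles quoting for you, so values that contain the chosen delimiter are still written safely.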
Using ODSynth in your own code
```python
from odsynth.schema import Schema


def generate_data():
    num_samples = 3
    batch_size = 5  # Batch size can be greater than num_samples
    output_format = "txt"  # Format can be json, xml, txt or pandas
    # Depending on the formatter, args may need to be provided. Default is None.
    formatter_args = ["delimiter=comma"]
    # The CSV formatter expects a tabular schema.
    # The XML, JSON, Pandas and Base formatters can accept hierarchical data.
    schema_spec_file = "./sample_schema/flat_schema.yaml"

    generator = Schema(schema_file=schema_spec_file).build_generator(
        num_examples=num_samples,
        batch_size=batch_size,
        format=output_format,
        formatter_args=formatter_args,
    )
    data = generator.get_data()
    # Prints the generated data in CSV format
    print(data)


if __name__ == "__main__":
    generate_data()
```
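Both the CLI's --formatter-arg delimiter=comma and the formatter_args=["delimiter=comma"] list pass options as key=value strings, and --writer-arg output_dir=... follows the same shape. A minimal sketch of how such a list could be turned into keyword arguments, assuming the first "=" separates key from value (parse_formatter_args is a hypothetical name, not part of ODSynth):

```python
def parse_formatter_args(formatter_args):
    """Turn ["key=value", ...] into {"key": "value", ...}.

    Hypothetical helper: None (the documented default) yields an empty dict.
    Only the first "=" splits, so values may themselves contain "=".
    """
    parsed = {}
    for item in formatter_args or []:
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        parsed[key.strip()] = value.strip()
    return parsed


print(parse_formatter_args(["delimiter=comma"]))  # {'delimiter': 'comma'}
```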
Using publish to write synthetic data to local disc in XML format
Publish 100 samples of the schema specified in flat_schema.yaml, in batches of 10 examples:
publish --schema-spec-file=../flat_schema.yaml --format=xml --writer=local_disc --writer-arg output_dir=../odsynth_out --num-samples=100 --batch-size=10
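With --num-samples=100 and --batch-size=10, the samples are produced in 10 batches of 10. The Python example above also notes that the batch size may exceed the number of samples. Assuming batches are filled in order and the last one carries any remainder (an assumption about ODSynth's behaviour, not something documented here), the batch sizes can be sketched as:

```python
def batch_sizes(num_samples, batch_size):
    """Split num_samples into consecutive batches of at most batch_size.

    Assumption: the final batch carries any remainder, and a batch_size
    larger than num_samples yields a single, smaller batch.
    """
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size > 0")
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


print(batch_sizes(100, 10))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(batch_sizes(3, 5))     # [3]
```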
For more on the data generator and the data publisher, see the help pages: synth --help and publish --help.
For example, given the following schema:
```yaml
fields:
  parent_firstname:
    provider: first_name
  parent_lastname:
    provider: last_name
  children:
    fields:
      firstname:
        provider: first_name
      lastname:
        provider: last_name
    max_count: 5
    is_array: true
  parent_age:
    provider: random_int
    provider_args:
      min: 25
      max: 55
  parent_ssn:
    provider: ssn
```
Output like the following is expected (the values are generated, so they differ between runs):
```json
{
  "parent_firstname": "Christopher",
  "parent_lastname": "Villegas",
  "children": [
    {"firstname": "Jason", "lastname": "Rogers"},
    {"firstname": "Andrea", "lastname": "Young"},
    {"firstname": "Michelle", "lastname": "Kaiser"}
  ],
  "parent_age": 43,
  "parent_ssn": "269-11-8507"
}
```
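To show how the schema keys relate to the output, here is a self-contained sketch that walks the schema above (written as a Python dict to avoid a YAML dependency) and fills each field. It is not ODSynth's implementation: the tiny name lists are hypothetical stand-ins for the real first_name, last_name and ssn providers, and it assumes that an is_array field yields between 1 and max_count records and that random_int's min and max are inclusive.

```python
import json
import random

# The example schema from above, expressed as a Python dict
SCHEMA = {
    "fields": {
        "parent_firstname": {"provider": "first_name"},
        "parent_lastname": {"provider": "last_name"},
        "children": {
            "fields": {
                "firstname": {"provider": "first_name"},
                "lastname": {"provider": "last_name"},
            },
            "max_count": 5,
            "is_array": True,
        },
        "parent_age": {"provider": "random_int", "provider_args": {"min": 25, "max": 55}},
        "parent_ssn": {"provider": "ssn"},
    }
}

# Hypothetical stand-in providers: each takes a Random instance plus provider_args
PROVIDERS = {
    "first_name": lambda rng, **args: rng.choice(["Jason", "Andrea", "Michelle", "Christopher"]),
    "last_name": lambda rng, **args: rng.choice(["Rogers", "Young", "Kaiser", "Villegas"]),
    # Assumption: min and max are inclusive bounds
    "random_int": lambda rng, **args: rng.randint(args.get("min", 0), args.get("max", 9999)),
    # Rough stand-in for an SSN-shaped string: AAA-GG-SSSS
    "ssn": lambda rng, **args: f"{rng.randint(1, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}",
}


def generate_value(spec, rng):
    """Generate a value for one field spec, recursing into nested "fields"."""
    if "fields" in spec:
        def make_record():
            return {name: generate_value(sub_spec, rng) for name, sub_spec in spec["fields"].items()}

        if spec.get("is_array"):
            # Assumption: an array field holds between 1 and max_count records
            count = rng.randint(1, spec.get("max_count", 1))
            return [make_record() for _ in range(count)]
        return make_record()
    provider = PROVIDERS[spec["provider"]]
    return provider(rng, **spec.get("provider_args", {}))


rng = random.Random(42)  # seeded so the output is reproducible
record = generate_value(SCHEMA, rng)
print(json.dumps(record, indent=2))
```

Keeping the providers in a lookup table mirrors the idea of swapping providers in through a plugin system, though ODSynth's actual registration mechanism is not shown here.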
License
ODSynth is released under the MIT License. See LICENSE for details.
Credits
File details
Details for the file odsynth-0.0.4.tar.gz.
File metadata
- Download URL: odsynth-0.0.4.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 15e62293a27b6bd31b18d19bfd0cf487f51059515ebca60e129215384a9ac90b |
|
MD5 | 8eb2a60606315c5516984f6f8d3d1f02 |
|
BLAKE2b-256 | d5e2a3011f0a7a3fb0ac338e99dc2609ca296fb6deea9467882db74fdc290ac4 |
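To check a downloaded file against the digests listed here, you can compute its SHA256 with Python's standard hashlib module. A minimal sketch: the sha256_of helper name is illustrative, and the path assumes odsynth-0.0.4.tar.gz sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 published for odsynth-0.0.4.tar.gz
EXPECTED_SHA256 = "15e62293a27b6bd31b18d19bfd0cf487f51059515ebca60e129215384a9ac90b"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("odsynth-0.0.4.tar.gz")
    if archive.exists():
        matches = sha256_of(archive) == EXPECTED_SHA256
        print("SHA256 OK" if matches else "SHA256 MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use flat regardless of file size. pip can also enforce digests at install time through its hash-checking mode (--require-hashes with a requirements file).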
File details
Details for the file odsynth-0.0.4-py3-none-any.whl.
File metadata
- Download URL: odsynth-0.0.4-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | ee7427156a546530770032cd08b8ffe473dee5fb45928e2a99d1bf1b2cada37f
MD5 | 65abd187e30dff893b8c9d24b5ec4bd7
BLAKE2b-256 | 28a5c78d357ad75dd2d9ae27de5e6b928243e872335c46ae16283291e6382c7b