Python package: genesynth


[![Python packages](https://github.com/sterling312/genesynth/actions/workflows/github-actions-unittests.yaml/badge.svg)](https://github.com/sterling312/genesynth/actions/workflows/github-actions-unittests.yaml)

# genesynth

This library generates synthetic structured data from a configuration file, for use in testing as well as for structured-data training purposes. It leans as much as possible on C-level Python packages such as numpy and scipy to generate data at the field level, one type at a time. A graph then pieces together the complex dependencies between fields and drives the de-normalization and sampling from each field, so that data can be constructed quickly and in a scalable manner.
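The field-at-a-time, graph-ordered approach described above can be sketched roughly as follows. This is a minimal illustration and not genesynth's actual API: the `FIELDS` mapping, the field names, and the `generate` helper are all hypothetical, and the sketch assumes numpy plus the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

import numpy as np

N_ROWS = 5
rng = np.random.default_rng(seed=0)

# Hypothetical field specs: name -> (dependencies, generator).
# Each generator builds a whole column in one vectorized numpy call.
FIELDS = {
    "user_id": ((), lambda cols: np.arange(1, N_ROWS + 1)),
    "age": ((), lambda cols: rng.integers(18, 90, size=N_ROWS)),
    # A dependent field: sampled from an already generated column,
    # the way a foreign key references a parent key.
    "referrer_id": (
        ("user_id",),
        lambda cols: rng.choice(cols["user_id"], size=N_ROWS),
    ),
}


def generate(fields):
    """Generate every column in dependency order and return them by name."""
    graph = {name: set(deps) for name, (deps, _) in fields.items()}
    columns = {}
    # static_order() yields each field only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        _, make = fields[name]
        columns[name] = make(columns)
    return columns


columns = generate(FIELDS)
# De-normalize the independent columns into rows.
rows = list(zip(*(columns[name] for name in FIELDS)))
```

Generating a full column per call keeps the per-row work inside numpy, and the topological order guarantees that a dependent field only ever samples from columns that already exist.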

# install

`pip install genesynth`

# example

`$ python -m genesynth.server --host=0.0.0.0 -p 8080`

`$ make run`

`$ python -m genesynth.cli -f tests/test.yaml --stdout`

`$ make cli FILENAME=$(pwd)/tests/test.yaml`

# project status

## supported features

* load yaml as configuration file
* arbitrary row size support
* data type mapping with configurable parameters
* JSON (semi-structured data) support
* improved data type support
* foreign relationship support
* DOT file graph
* table graph
* built-in orchestrator using graph
* thread and process support
* intermediary data temporary cache
* graph visualization
* GenAI based field-level data generation (ollama & openai)
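To illustrate the kind of output the DOT file graph feature refers to, here is a minimal sketch that renders a dependency mapping as Graphviz DOT text. The `to_dot` helper and the table names are hypothetical, and the format genesynth itself emits may differ.

```python
def to_dot(dependencies, name="schema"):
    """Render {node: [nodes it depends on]} as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for node in sorted(dependencies):
        lines.append(f'    "{node}";')
        for parent in sorted(dependencies[node]):
            # Edge points from the parent to the node that depends on it.
            lines.append(f'    "{parent}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical tables: orders references users and products.
deps = {"users": [], "products": [], "orders": ["users", "products"]}
dot_text = to_dot(deps)
print(dot_text)
```

If the text is saved to a file such as `schema.dot`, Graphviz can render it with `dot -Tpng schema.dot -o schema.png`.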

## key features to add

* add yaml validator
* fix header support
* additional output file formats (JSON, PSQL dump, CSV with quotes, etc.)
* add support for JSON arrays
* improve constraint support
* add support for quoted strings
* add support for statistical distributions via kernel convolution
* optimize orchestration and disk cache efficiency
* optimize thread/process based generation
* convert serial to an autoincrement constraint for the integer type
* convert password to a constraint of the string type

## nice to have features to add

* support external scheduler
* support NLP based text generation
* support sklearn
* support integration with pytorch embedding
* support for object reference via $ref
* fix compatibility with [JSON schema array notation](https://json-schema.org/understanding-json-schema/reference/array.html#items)
* fix JSON array child objects appearing as separate items


Download files


Source Distribution

genesynth-0.6.4.tar.gz (23.4 kB)

Uploaded Source

Built Distribution


genesynth-0.6.4-py3-none-any.whl (29.5 kB)

Uploaded Python 3

File details

Details for the file genesynth-0.6.4.tar.gz.

File metadata

  • Download URL: genesynth-0.6.4.tar.gz
  • Upload date:
  • Size: 23.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for genesynth-0.6.4.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `95a5b5244d02f0c7e4d2ca6c8a3b31da5cf2a776007331f01b70bcc9048d9d99` |
| MD5 | `ffb41f71b07d9de6d8f157cf50fc064d` |
| BLAKE2b-256 | `9321cb6523b1c5e2c98cc6597024c848e4993da982af535f6d65cda2d177e16f` |


File details

Details for the file genesynth-0.6.4-py3-none-any.whl.

File metadata

  • Download URL: genesynth-0.6.4-py3-none-any.whl
  • Upload date:
  • Size: 29.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for genesynth-0.6.4-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `8a59f7184c7e5ae7bd427ebf593eae5985dc69f9762c6a34c9c329b4ad1818c3` |
| MD5 | `8769416989d6c0c88717a92163faa617` |
| BLAKE2b-256 | `4ed14c8820a01241a045af7e9a207c5e2068cd97fd0d9fd7b496e83133a1669d` |

