Data Generation Through Specification
Datacraft
Overview
Datacraft is a tool for generating data according to specifications. The goal is to separate the structure of the data from the values that populate it. It does this through two core concepts: the Data Spec and the Field Spec. A Data Spec defines all of the fields that should be generated for a record. It is independent of the output structure that the data will populate, so a single Data Spec can be used to generate JSON, XML, or a CSV file. Each field in the Data Spec has its own Field Spec that defines how the values for that field are created.
A variety of core field types are available for generating field values. Where the built-in types are not sufficient, custom types and their handlers can be added using Custom Code Loading. datacraft also supports templating using the Jinja2 templating engine format.
Data is a key part of any application. Synthetic data can be used to test and exercise a system while it is under
development or modification. Synthetic data described by a Data Spec is more compact and easier to modify, update,
and manage than the generated data itself. It also lends itself to sharing and reuse. Instead of hosting large data
files full of synthetic test data, you can build Data Specs that encapsulate the information needed to generate the
data. A well-designed Data Spec can be easier to inspect and reason through than thousands of lines of a CSV file.
datacraft makes it easy to generate millions or billions of records to use for development and testing of new or
existing systems.
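To make the separation between record values and output structure concrete, here is a minimal Python sketch. It is not datacraft's own code: it takes a few already-generated records and renders the same values as JSON and as CSV using only the standard library. The helper names `to_json` and `to_csv` and the sample records are hypothetical.

```python
import csv
import io
import json

# Hypothetical records, standing in for what a Data Spec might generate.
records = [
    {"first": "zebra", "last": "jones"},
    {"first": "hedgehog", "last": "smith"},
]


def to_json(rows):
    """Render the records as a JSON array."""
    return json.dumps(rows)


def to_csv(rows):
    """Render the same records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
print(json_text)
print(csv_text)
```

The values are produced once and only the rendering step changes, which is the property the Data Spec is meant to provide.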
Docs
Find the latest documentation and detailed usage information here: datacraft.readthedocs.io
Installation
$ pip install datacraft
$ datacraft -h # for full command line usage
Basic Usage
$ datacraft type-list # list all available field spec types ...
$ datacraft --type-help combine
INFO [05-Jun-2050 05:52:59 PM] Starting Loading Configurations...
INFO [05-Jun-2050 05:52:59 PM] Loading custom type loader: core
INFO [05-Jun-2050 05:52:59 PM] Loading custom type loader: xeger
-------------------------------------
combine | Example Spec:
{
  "combine": {
    "type": "combine",
    "refs": ["first", "last"],
    "config": {
      "join_with": " "
    }
  },
  "refs": {
    "first": {
      "type": "values",
      "data": ["zebra", "hedgehog", "llama", "flamingo"]
    },
    "last": {
      "type": "values",
      "data": ["jones", "smith", "williams"]
    }
  }
}
datacraft -s spec.json -i 3 --format json -x -l off
[{"combine": "zebra jones"}, {"combine": "hedgehog smith"}, {"combine": "llama williams"}]
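The output above pairs the first entries of `first` and `last`, then the second entries, and so on. The Python sketch below emulates that behaviour for this one spec, assuming that each `values` list is walked in order and wraps around when exhausted. It is an illustration only, not datacraft's implementation, and the function names `values_supplier` and `combine_records` are hypothetical.

```python
import itertools
import json

# The example spec from the help output above.
spec = {
    "combine": {
        "type": "combine",
        "refs": ["first", "last"],
        "config": {"join_with": " "},
    },
    "refs": {
        "first": {"type": "values", "data": ["zebra", "hedgehog", "llama", "flamingo"]},
        "last": {"type": "values", "data": ["jones", "smith", "williams"]},
    },
}


def values_supplier(field_spec):
    """Assumed behaviour: yield the listed values in order, wrapping around."""
    return itertools.cycle(field_spec["data"])


def combine_records(raw_spec, iterations):
    """Build `iterations` records for the single 'combine' field of raw_spec."""
    field = raw_spec["combine"]
    suppliers = [values_supplier(raw_spec["refs"][name]) for name in field["refs"]]
    joiner = field["config"]["join_with"]
    return [
        {"combine": joiner.join(next(s) for s in suppliers)}
        for _ in range(iterations)
    ]


records = combine_records(spec, 3)
print(json.dumps(records))
```

With three iterations this prints the same three records as the datacraft output shown above. Under the wrap-around assumption, a fourth iteration would pair `flamingo` with `jones`.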
Hashes for datacraft-0.4.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b95d0f38a6a9237ac8002e2bf8abb02f9ba865934ee42c21465eaee5c6f6968d
MD5 | 02be08ea89608f64ddf4ce9c39b1e4a7
BLAKE2b-256 | 300eb012aaa5c92f09d305766577ae8be0066b1e6a1a09fe94df65e900c54b45