Data Generation Through Specification

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Programming Language
Topic
- Software Development :: Build Tools

Project description

Datacraft

Datacraft: The Engine for Synthetic Data Generation

Datacraft is a powerful engine designed for generating customized synthetic data, with native support for JSON and other structured formats. Designed for efficiency and flexibility, it simplifies the creation and management of complex data structures for testing and development purposes. Whether you're working with JSON, XML, CSV, or database rows, Datacraft offers a streamlined approach to meet your needs.

Key Features:

- Flexible Data Design: Leverage Data Spec and Field Spec paradigms to separate data values from structure, enabling greater modularity and control.
- Customizability: Define custom field types with ease using Custom Code Loading.
- Jinja2 Templating: Integrate advanced templating through the Jinja2 engine for dynamic data generation.
- Python API: Integrate Datacraft directly into your Python workflows.
- Command-Line Support: Generate millions or even billions of records with simple command-line operations.

Transform your approach to synthetic data generation with Datacraft. Explore its capabilities and get started today.

Overview

Datacraft is a tool for generating synthetic data. It provides a JSON-based domain-specific language (DSL) for specifying the fields present in a record independently of the form the record takes. The goal is to separate the structure of the data from the values that populate it. Two core concepts make this possible: the Data Spec and the Field Spec. A Data Spec defines all the fields that should be generated for a record. It does not care about the structure of the records it will populate, so a single Data Spec could be used to generate JSON, XML, a CSV file, or rows in a database. Each field in the Data Spec is described by a Field Spec, which defines how the values for that field should be generated. A variety of built-in field types can be used to describe the data structure and format for fields. Where the built-in types are not sufficient, custom types and handlers can be created using Custom Code Loading. Datacraft also supports templating with the Jinja2 templating engine.
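To make the separation of values from structure concrete, here is a minimal standard-library sketch. It is not Datacraft's implementation: the `generate`, `to_json`, and `to_csv` helpers are hypothetical names, and only a toy version of a `values` field that cycles through its data is handled. The point is that one spec feeds two different output structures.

```python
import csv
import io
import itertools
import json


def generate(data_spec, count):
    """Yield `count` records from a toy Data Spec (hypothetical helper).

    Only a simplified "values" field is supported: each field cycles
    through its "data" list in order.
    """
    suppliers = {
        name: itertools.cycle(field_spec["data"])
        for name, field_spec in data_spec.items()
    }
    for _ in range(count):
        yield {name: next(values) for name, values in suppliers.items()}


def to_json(records):
    """Render records as a JSON array."""
    return json.dumps(list(records))


def to_csv(records):
    """Render records as CSV text with a header row (needs at least one record)."""
    records = list(records)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# One spec describes the values; the output structure is chosen separately.
data_spec = {
    "animal": {"type": "values", "data": ["zebra", "hedgehog", "llama"]},
    "count": {"type": "values", "data": [1, 2]},
}

json_text = to_json(generate(data_spec, 3))
csv_text = to_csv(generate(data_spec, 3))
print(json_text)
print(csv_text)
```

Neither renderer knows how the values were produced, and the spec knows nothing about JSON or CSV. That is the property the Data Spec and Field Spec split is meant to provide.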

Data is a key part of any application. Synthetic data can be used to test and exercise a system while it is under development or modification. Synthetic data generated from a Data Spec is more compact and easier to modify, update, and manage than static data files. It also lends itself to sharing and reuse. Instead of hosting large data files full of synthetic test data, you can build Data Specs that encapsulate the information needed to generate the data. If well designed, these can be easier to inspect and reason through than thousands of lines of a CSV file. Datacraft makes it easy to generate millions or billions of records for developing and testing new or existing systems. Datacraft also has a Python API, so you can generate synthetic data as part of your test suite or application without having to use online tools or external services.

Docs

Find the latest documentation and detailed usage information here: datacraft.readthedocs.io

Installation

$ pip install datacraft

$ datacraft -h # for full command line usage

Basic Usage

Command Line

$ datacraft type-list # list all available field spec types ...

$ datacraft --type-help combine
INFO [05-Jun-2050 05:52:59 PM] Starting Loading Configurations...
INFO [05-Jun-2050 05:52:59 PM] Loading custom type loader: core
INFO [05-Jun-2050 05:52:59 PM] Loading custom type loader: xeger
-------------------------------------
combine | Example Spec:
{
  "name": {
    "type": "combine",
    "refs": ["first", "last"],
    "config": {
      "join_with": " "
    }
  },
  "refs": {
    "first": {
      "type": "values",
      "data": ["zebra", "hedgehog", "llama", "flamingo"]
    },
    "last": {
      "type": "values",
      "data": ["jones", "smith", "williams"]
    }
  }
}
$ datacraft -s spec.json -i 3 --format json -x -l off
[{"name": "zebra jones"}, {"name": "hedgehog smith"}, {"name": "llama williams"}]
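The sample output above can be reproduced with a small standard-library sketch of what a combine step does. This is an illustration only: `combine` here is a hypothetical helper, and it assumes each referenced field cycles through its `data` list in order, which is consistent with the sample output but may not reflect how Datacraft samples values in general.

```python
import itertools


def combine(spec, field, count):
    """Toy version of the "combine" behaviour (hypothetical helper).

    Assumes every referenced field is a "values" field whose data is
    cycled in order.
    """
    field_spec = spec[field]
    join_with = field_spec.get("config", {}).get("join_with", "")
    suppliers = [
        itertools.cycle(spec["refs"][ref]["data"])
        for ref in field_spec["refs"]
    ]
    return [
        {field: join_with.join(str(next(s)) for s in suppliers)}
        for _ in range(count)
    ]


# Same spec as the CLI example above.
spec = {
    "name": {
        "type": "combine",
        "refs": ["first", "last"],
        "config": {"join_with": " "},
    },
    "refs": {
        "first": {
            "type": "values",
            "data": ["zebra", "hedgehog", "llama", "flamingo"],
        },
        "last": {"type": "values", "data": ["jones", "smith", "williams"]},
    },
}

records = combine(spec, "name", 3)
print(records)
```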

Python API

import datacraft

spec = {
    "id": {"type": "uuid"},
    "timestamp": {"type": "date.iso.millis"},
    "handle": {"type": "cc-word", "config": { "min": 4, "max": 8, "prefix": "@" } }
}

print(*datacraft.entries(spec, 3), sep='\n')

{'id': '40bf8be1-23d2-4e93-9b8b-b37103c4b18c', 'timestamp': '2050-12-03T20:40:03.709', 'handle': '@WPNn'}
{'id': '3bb5789e-10d1-4ae3-ae61-e0682dad8ecf', 'timestamp': '2050-11-20T02:57:48.131', 'handle': '@kl1KUdtT'}
{'id': '474a439a-8582-46a2-84d6-58bfbfa10bca', 'timestamp': '2050-11-29T18:08:44.971', 'handle': '@XDvquPI'}
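Records like these are plain dictionaries, so they can be handed to any serializer. The sketch below writes them as JSON Lines using only the standard library. The `write_json_lines` helper is a hypothetical name, and the records are hard-coded stand-ins copied from the sample output so that the example runs without Datacraft installed. In practice they would come from `datacraft.entries(spec, 3)`.

```python
import io
import json


def write_json_lines(records, stream):
    """Write one JSON object per line; return the number of records written."""
    written = 0
    for record in records:
        stream.write(json.dumps(record) + "\n")
        written += 1
    return written


# Stand-in records copied from the sample output above.
records = [
    {
        "id": "40bf8be1-23d2-4e93-9b8b-b37103c4b18c",
        "timestamp": "2050-12-03T20:40:03.709",
        "handle": "@WPNn",
    },
    {
        "id": "3bb5789e-10d1-4ae3-ae61-e0682dad8ecf",
        "timestamp": "2050-11-20T02:57:48.131",
        "handle": "@kl1KUdtT",
    },
]

# An in-memory stream keeps the example self-contained; an open file works too.
stream = io.StringIO()
count = write_json_lines(records, stream)
print(count)
print(stream.getvalue(), end="")
```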

Type Help

import datacraft

# List all registered types:
datacraft.registered_types()
['calculate', 'char_class', 'cc-ascii', 'cc-lower', '...', 'uuid', 'values', 'replace', 'regex_replace']

# Print API usage for a specific type or types
print(datacraft.type_usage('char_class', 'replace', '...'))
# Example Output
"""
-------------------------------------
replace | API Example:

import datacraft

spec = {
 "field": {
   "type": "values",
   "data": ["foo", "bar", "baz"]
 },
 "replacement": {
   "type": "replace",
   "data": {"ba": "fi"},
   "ref": "field"
 }
}

print(*datacraft.entries(spec, 3), sep='\n')

{'field': 'foo', 'replacement': 'foo'}
{'field': 'bar', 'replacement': 'fir'}
{'field': 'baz', 'replacement': 'fiz'}
"""
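The replace behaviour in the example output can be mirrored in a few lines of plain Python. This is a sketch under stated assumptions, not Datacraft's code: `replace_all` is a hypothetical helper that applies each substitution from the `data` mapping in turn with `str.replace`.

```python
def replace_all(value, replacements):
    """Apply each old -> new substitution in turn (hypothetical helper)."""
    for old, new in replacements.items():
        value = value.replace(old, new)
    return value


# Same values and replacement mapping as the example spec above.
field_values = ["foo", "bar", "baz"]
replacements = {"ba": "fi"}

records = [
    {"field": value, "replacement": replace_all(value, replacements)}
    for value in field_values
]
print(*records, sep="\n")
```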

For more detailed documentation please see: datacraft.readthedocs.io

Release history

- 0.12.1 (this version): Aug 24, 2025
- 0.12.0: Jun 19, 2025
- 0.11.1: Jan 24, 2025
- 0.11.0: Jan 20, 2025
- 0.10.2: Sep 5, 2024
- 0.10.1: Aug 10, 2024
- 0.10.0: Jul 5, 2024
- 0.9.0: Nov 14, 2023
- 0.8.1: Oct 17, 2023
- 0.8.0: Oct 11, 2023
- 0.7.3: Jul 20, 2023
- 0.7.2: Jul 20, 2023
- 0.7.1: Dec 4, 2022
- 0.7.0: Nov 23, 2022
- 0.6.0: Jul 13, 2022
- 0.4.0: Jun 5, 2022
- 0.3.2: May 19, 2022
- 0.3.1: May 19, 2022
- 0.3.0: May 16, 2022
- 0.2.2: Apr 4, 2022
- 0.2.1: Jan 30, 2022
- 0.2.0: Jan 23, 2022
- 0.1.0: Dec 11, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

datacraft-0.12.1-py3-none-any.whl (129.6 kB view details)

Uploaded Aug 24, 2025 Python 3

File details

Details for the file datacraft-0.12.1-py3-none-any.whl.

File metadata

Download URL: datacraft-0.12.1-py3-none-any.whl
Upload date: Aug 24, 2025
Size: 129.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.5

File hashes

Hashes for datacraft-0.12.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`59933679c9e36811d37b5c616b3e6fc0071e929bee08596255b7e2ba288b2b1d`
MD5	`965235f332227e5d7c337f8fcb41da08`
BLAKE2b-256	`fe93681e83f7b4fb38ea387cc65f99be8636e9eb84000dff119aeb1f9162e194`
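A downloaded wheel can be checked against the SHA256 digest listed above with the standard library. The sketch below is one way to do it. The helper names `sha256_of` and `matches` are hypothetical, and the wheel is assumed to sit in the current working directory under the file name shown above.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = (
    "59933679c9e36811d37b5c616b3e6fc0071e929bee08596255b7e2ba288b2b1d"
)


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Assumed download location; adjust the path as needed.
wheel = Path("datacraft-0.12.1-py3-none-any.whl")
if wheel.exists():
    print("hash ok" if matches(wheel, EXPECTED_SHA256) else "hash MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size.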

datacraft 0.12.1
