A flexible test data generation toolkit
TestDataX
This command-line application generates customizable test data quickly in a range of formats. It uses Mimesis to produce synthetic values, supports flexible schema configuration, and writes output either to files (JSON, CSV, ORC, Parquet) or as SQL for several database dialects (MySQL, MSSQL, Oracle). Users can set the number of rows, the data types, and per-field constraints for each generated data set.
Requirements
- Python 3.11+
Quick Start
# Install from PyPI
pip install testdatax
# Generate sample data
testdatax --rows 1000 --format json --output data.json
Features
- Generate realistic test data with Mimesis
- Support for multiple output formats (CSV, JSON, SQL, etc.)
- Customizable schema definitions
- Configurable data generation parameters
- CLI tool for easy test data generation
Supported Formats
- JSON
- CSV
- ORC
- Parquet
- MySQL
- MSSQL
- Oracle
CLI Usage
testdatax -o <output_file> -f <format> [-s <schema_file>] [-r <num_rows>] [-p <provider>] [--seed <seed>] [--null-rate <rate>] [-d]
Options:
-o, --output: Output file path (for SQL exports, this also sets the table name)
-f, --format: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
-r, --rows: Number of rows to generate (default: 10)
-s, --schema: Path to schema file
-p, --provider: Data provider; only mimesis is supported (default: mimesis)
--seed: Seed for reproducible output (optional)
--null-rate: Default NULL probability (0-1) for nullable fields (default: 0.1)
-d, --debug: Enable debug output
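For readers who want to see how the options above fit together, here is a hypothetical sketch using Python's standard argparse module. It is not TestDataX's actual source. The names build_parser and probability, the integer type for --seed, and the 0-1 range check on --null-rate are assumptions chosen to match the documented defaults.

```python
import argparse

FORMATS = ["csv", "json", "orc", "parquet", "mysql", "mssql", "oracle"]


def probability(value: str) -> float:
    """argparse type: accept a float in the documented 0-1 range for --null-rate."""
    rate = float(value)
    if not 0.0 <= rate <= 1.0:
        raise argparse.ArgumentTypeError(f"expected a value between 0 and 1, got {rate}")
    return rate


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options; the real CLI may differ."""
    parser = argparse.ArgumentParser(prog="testdatax")
    parser.add_argument("-o", "--output", required=True,
                        help="output file path (table name for SQL exports)")
    parser.add_argument("-f", "--format", required=True, choices=FORMATS)
    parser.add_argument("-s", "--schema", help="path to schema file")
    parser.add_argument("-r", "--rows", type=int, default=10)
    parser.add_argument("-p", "--provider", choices=["mimesis"], default="mimesis")
    parser.add_argument("--seed", type=int, default=None)  # assumption: integer seed
    parser.add_argument("--null-rate", type=probability, default=0.1)
    parser.add_argument("-d", "--debug", action="store_true")
    return parser


args = build_parser().parse_args(["-o", "users.csv", "-f", "csv", "-s", "schema.json"])
print(args.rows, args.provider, args.null_rate)  # 10 mimesis 0.1
```

Restricting --provider with choices reproduces the documented behaviour that any value other than mimesis is rejected.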
Reproducibility: passing --seed makes generation deterministic. The same schema, row count, provider and seed produce identical output on every run, which is ideal for stable test fixtures.
Usage Examples
Generate 10 rows of CSV data:
testdatax -o users.csv -f csv -s schema.json -r 10
Generate 1000 rows of Parquet data with debug output:
testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -d
Generate 1000 rows of Parquet data, selecting the Mimesis provider explicitly:
testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -p mimesis
Generate JSON data with default row count (10):
testdatax -o data.json -f json -s schema.json
Generate 100 rows of ORC data with a specific schema:
testdatax -o analytics.orc -f orc -s analytics_schema.json -r 100
Generate 1000 rows of MySQL output with the table name 'default':
testdatax -o default.sql -f mysql -r 1000
Generate 1000 rows of MSSQL output with the table name 'mstest':
testdatax -o mstest.sql -f mssql -r 1000
Generate 1000 rows of Oracle output with the table name 'oracle':
testdatax -o oracle.sql -f oracle -r 1000
Each command consists of:
-o, --output: Specify the output file path and name
-f, --format: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
-s, --schema: Path to your schema definition file
-r, --rows: Number of rows to generate (optional, defaults to 10)
-p, --provider: Data provider; only mimesis is supported (default: mimesis)
-d, --debug: Enable debug logging (optional)
Schema Example
{
"username": {
"type": "string",
"provider_field": "name"
},
"date_joined": {
"type": "datetime"
},
"date": {
"type": "date"
},
"age": {
"type": "integer",
"min": 18,
"max": 99
},
"is_active": {
"type": "boolean"
},
"float": {
"type": "float"
},
"uuid": {
"type": "uuid"
},
"status": {
"type": "enum",
"values": ["active", "inactive", "pending"]
}
}
Schema Configuration
The schema file defines the structure and constraints of your generated data. Each field in the schema can have the following properties:
Basic Field Properties
type: (required) The data type of the field
nullable: (optional) Boolean to allow null values (default: false)
unique: (optional) Boolean to ensure unique values (default: false)
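One plausible way to honour nullable and unique is shown in the sketch below. A nullable field returns None with probability null_rate, which matches the default of 0.1 for --null-rate. A unique field retries until it draws a value it has not produced before. The helper make_field_generator, the max_attempts limit, and the choice to exclude None from the uniqueness check are assumptions made for illustration. TestDataX's actual strategy is not documented here.

```python
import random
from collections.abc import Callable
from typing import Any


def make_field_generator(
    spec: dict[str, Any],
    draw: Callable[[random.Random], Any],
    rng: random.Random,
    null_rate: float = 0.1,
    max_attempts: int = 1000,
) -> Callable[[], Any]:
    """Wrap a raw value generator with nullable/unique handling (hypothetical helper)."""
    seen: set[Any] = set()

    def next_value() -> Any:
        if spec.get("nullable", False) and rng.random() < null_rate:
            return None  # in this sketch, NULLs are not tracked for uniqueness
        if not spec.get("unique", False):
            return draw(rng)
        for _ in range(max_attempts):
            value = draw(rng)
            if value not in seen:
                seen.add(value)
                return value
        raise RuntimeError("could not find a new unique value; widen the field's range")

    return next_value


rng = random.Random(7)
user_id = make_field_generator({"type": "integer", "unique": True}, lambda r: r.randint(1, 100), rng)
ids = [user_id() for _ in range(20)]
print(len(set(ids)))  # 20
```

With retry-based uniqueness, the value range must be comfortably larger than the row count. The error raised above signals a schema that cannot satisfy unique for the requested number of rows.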
Type-Specific Properties
String Fields
provider_field selects a provider-specific (Mimesis) field to generate realistic data.
{
"username": {
"type": "string",
"min_length": 5,
"max_length": 20,
"provider_field": "user_name"
},
"description": {
"type": "text",
"min_length": 100,
"max_length": 500
}
}
Numeric Fields
{
"age": {
"type": "integer",
"min": 18,
"max": 99
},
"score": {
"type": "float",
"min": 0.0,
"max": 100.0,
"precision": 2
}
}
Date and Time Fields
{
"created_at": {
"type": "datetime",
"start_date": "2020-01-01",
"end_date": "2023-12-31"
},
"birth_date": {
"type": "date",
"format": "%Y-%m-%d"
}
}
Note: start_date and end_date bound the generated range (inclusive). format applies a strftime pattern to date/datetime values in the CSV and JSON outputs only; the SQL, Parquet and ORC exporters keep native date types and ignore format.
Enum Fields
weights is optional and gives a probability weight for each entry in values, in the same order.
{
"status": {
"type": "enum",
"values": ["pending", "active", "suspended"],
"weights": [0.2, 0.7, 0.1]
}
}
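Weighted enum sampling corresponds to Python's random.choices, which accepts relative weights alongside the candidate values. The sketch below assumes that weights pair positionally with values, as in the example. It falls back to a uniform choice when weights is omitted. The draw_enum helper is hypothetical.

```python
import random
from collections import Counter
from typing import Any


def draw_enum(spec: dict[str, Any], rng: random.Random, k: int = 1) -> list[str]:
    """Draw k values from an enum field, honouring optional positional weights."""
    values = spec["values"]
    weights = spec.get("weights")  # None means every value is equally likely
    if weights is not None and len(weights) != len(values):
        raise ValueError("weights must have exactly one entry per value")
    return rng.choices(values, weights=weights, k=k)


status = {"type": "enum", "values": ["pending", "active", "suspended"], "weights": [0.2, 0.7, 0.1]}
counts = Counter(draw_enum(status, random.Random(5), k=10_000))
print(counts.most_common(1))  # "active" should account for roughly 70% of draws
```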
Using Mimesis provider fields
Specify Mimesis-backed generators with provider_field:
{
"name": {
"type": "string",
"provider_field": "name"
},
"email": {
"type": "string",
"provider_field": "email"
},
"address": {
"type": "string",
"provider_field": "address"
},
"company": {
"type": "string",
"provider_field": "company"
}
}
Complete Example
{
"user_id": {
"type": "uuid",
"unique": true
},
"username": {
"type": "string",
"provider_field": "user_name",
"unique": true
},
"email": {
"type": "string",
"provider_field": "email",
"unique": true
},
"age": {
"type": "integer",
"min": 18,
"max": 99
},
"status": {
"type": "enum",
"values": ["active", "inactive"],
"weights": [0.8, 0.2]
},
"created_at": {
"type": "datetime",
"start_date": "2020-01-01",
"end_date": "2023-12-31"
},
"is_verified": {
"type": "boolean",
"nullable": true
}
}
Data provider
TestDataX generates synthetic values using Mimesis. The CLI accepts -p mimesis (default); other values are rejected.
Migration from older schemas
- Prefer the JSON key provider_field for Mimesis field names.
- The legacy key faker is still accepted as a deprecated alias: it maps to the same Mimesis value_provider name string as provider_field (the Faker library is not used). Rename it to provider_field when updating schemas.
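The migration rule can be written as a small normalisation pass over a loaded schema: copy any legacy faker value into provider_field and emit a deprecation warning. This is a hypothetical sketch of the documented behaviour. The function name, the warning text, and the choice to let an explicit provider_field take precedence when both keys are present are all assumptions.

```python
import json
import warnings
from typing import Any


def normalise_schema(schema: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return a copy of the schema with the deprecated 'faker' key renamed to 'provider_field'."""
    result = {}
    for name, spec in schema.items():
        spec = dict(spec)  # shallow copy so the caller's schema is not mutated
        if "faker" in spec:
            legacy = spec.pop("faker")
            warnings.warn(
                f"field {name!r}: 'faker' is deprecated, rename it to 'provider_field'",
                DeprecationWarning,
                stacklevel=2,
            )
            spec.setdefault("provider_field", legacy)  # assumption: an explicit provider_field wins
        result[name] = spec
    return result


old = json.loads('{"email": {"type": "string", "faker": "email"}}')
print(normalise_schema(old))  # {'email': {'type': 'string', 'provider_field': 'email'}}
```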
Supported Data Types
- string
- text
- integer
- bigint
- float
- decimal
- boolean
- date
- datetime
- blob
- uuid
- enum
Database Type Mappings
| Generic Type | MySQL | MSSQL | Oracle |
|---|---|---|---|
| string | VARCHAR(255) | NVARCHAR(255) | VARCHAR2(255) |
| text | TEXT | NVARCHAR(MAX) | CLOB |
| integer | INT | INT | NUMBER(10) |
| bigint | BIGINT | BIGINT | NUMBER(19) |
| float | FLOAT | FLOAT | FLOAT |
| decimal | DECIMAL(18,2) | DECIMAL(18,2) | NUMBER(18,2) |
| boolean | TINYINT(1) | BIT | NUMBER(1) |
| date | DATE | DATE | DATE |
| datetime | DATETIME | DATETIME2 | TIMESTAMP |
| blob | LONGBLOB | VARBINARY(MAX) | BLOB |
| uuid | VARCHAR(36) | UNIQUEIDENTIFIER | VARCHAR2(36) |
| enum | ENUM | NVARCHAR(255) | VARCHAR2(255) |
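The mapping table can drive DDL generation directly. The sketch below builds a MySQL CREATE TABLE statement from a schema, using the MySQL column of the table. The dictionary literal, the function name, and the way nullable, unique and enum values are rendered are assumptions. TestDataX's own exporter output may differ.

```python
from typing import Any

# MySQL column of the mapping table above; enum is rendered separately below.
MYSQL_TYPES = {
    "string": "VARCHAR(255)", "text": "TEXT", "integer": "INT", "bigint": "BIGINT",
    "float": "FLOAT", "decimal": "DECIMAL(18,2)", "boolean": "TINYINT(1)",
    "date": "DATE", "datetime": "DATETIME", "blob": "LONGBLOB", "uuid": "VARCHAR(36)",
}


def mysql_create_table(table: str, schema: dict[str, dict[str, Any]]) -> str:
    """Render a hypothetical MySQL CREATE TABLE statement for a TestDataX-style schema."""
    columns = []
    for name, spec in schema.items():
        if spec["type"] == "enum":
            # Quote each allowed value, doubling embedded single quotes (SQL string escaping).
            members = ", ".join("'" + v.replace("'", "''") + "'" for v in spec["values"])
            col_type = f"ENUM({members})"
        else:
            col_type = MYSQL_TYPES[spec["type"]]
        null_clause = "" if spec.get("nullable", False) else " NOT NULL"
        unique_clause = " UNIQUE" if spec.get("unique", False) else ""
        columns.append(f"  `{name}` {col_type}{null_clause}{unique_clause}")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(columns) + "\n);"


schema = {
    "user_id": {"type": "uuid", "unique": True},
    "status": {"type": "enum", "values": ["active", "inactive"]},
    "is_verified": {"type": "boolean", "nullable": True},
}
print(mysql_create_table("users", schema))
```

Switching dialects would mean swapping in the MSSQL or Oracle column of the table. Oracle also uses double quotes rather than backticks for quoted identifiers.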
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file testdatax-0.22.2.tar.gz.
File metadata
- Download URL: testdatax-0.22.2.tar.gz
- Upload date:
- Size: 67.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86af7afda7347dae9ede3a32cc572aef6f7e53e7c66b884e2077683044167c41 |
| MD5 | 1d84508118edc29f56dbf204d62731e3 |
| BLAKE2b-256 | 76d0e35ff041304a1e63e8f544a5b98ec437c3126bc2d6893a4f09db596643b0 |
Provenance
The following attestation bundles were made for testdatax-0.22.2.tar.gz:
Publisher: publish.yml on JamesPBrett/TestDataX
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: testdatax-0.22.2.tar.gz
- Subject digest: 86af7afda7347dae9ede3a32cc572aef6f7e53e7c66b884e2077683044167c41
- Sigstore transparency entry: 1897924669
- Sigstore integration time:
- Permalink: JamesPBrett/TestDataX@a48fe7f909923fb0fecb02885c60aab9f0b2bcaa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/JamesPBrett
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a48fe7f909923fb0fecb02885c60aab9f0b2bcaa
- Trigger Event: push
File details
Details for the file testdatax-0.22.2-py3-none-any.whl.
File metadata
- Download URL: testdatax-0.22.2-py3-none-any.whl
- Upload date:
- Size: 84.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20b66196e7aa677e2c8940c22bd6abf6dab3f64bd479e829a2d6acca638e2912 |
| MD5 | d177ae837b1fd1c5245146836b0504d5 |
| BLAKE2b-256 | 1b31a496e58dec3736e8dfa8c69729a658b3b562cb28ab0ac80ffb97c0d18e1c |
Provenance
The following attestation bundles were made for testdatax-0.22.2-py3-none-any.whl:
Publisher: publish.yml on JamesPBrett/TestDataX
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: testdatax-0.22.2-py3-none-any.whl
- Subject digest: 20b66196e7aa677e2c8940c22bd6abf6dab3f64bd479e829a2d6acca638e2912
- Sigstore transparency entry: 1897924704
- Sigstore integration time:
- Permalink: JamesPBrett/TestDataX@a48fe7f909923fb0fecb02885c60aab9f0b2bcaa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/JamesPBrett
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a48fe7f909923fb0fecb02885c60aab9f0b2bcaa
- Trigger Event: push