A flexible test data generation toolkit
TestDataX
TestDataX is a command-line tool for generating customizable test data in several formats. It uses Faker for realistic field values, takes the structure of the data from a schema file, and writes output either as files (CSV, JSON, ORC, Parquet) or as SQL for MySQL, MSSQL, and Oracle. For each data set you can set the row count, the field types, and the constraints on each field.
Requirements
- Python 3.11+
- Additional dependencies are installed automatically by Poetry
Installation
Prerequisites
# Install Python 3.11+ if not already installed
brew install python@3.11
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -
# Verify Poetry installation
poetry --version
Install
# Clone the repository
git clone https://github.com/JamesPBrett/testdatax.git
cd testdatax
# Install dependencies
poetry install
Common Issues
- If Poetry is not found in PATH:
export PATH="$HOME/.local/bin:$PATH"
Features
- Generate realistic test data using Faker providers
- Support for multiple output formats (CSV, JSON, SQL, etc.)
- Customizable schema definitions
- Configurable data generation parameters
- CLI tool for easy data generation
Supported Formats
- JSON
- CSV
- ORC
- Parquet
- MySQL
- MSSQL
- Oracle
CLI Usage
testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> [-d]
Options:
- -o, --output: Output file path (table name for SQL exports)
- -f, --format: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
- -r, --rows: Number of rows to generate (default: 10)
- -s, --schema: Path to schema file
- -d, --debug: Enable debug output
Usage Examples
Generate 10 rows of CSV data:
testdatax -o users.csv -f csv -s schema.json -r 10
Generate 1000 rows of Parquet data with debug output:
testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -d
Generate JSON data with default row count (10):
testdatax -o data.json -f json -s schema.json
Generate ORC file with specific schema:
testdatax -o analytics.orc -f orc -s analytics_schema.json -r 100
Generate 1000 rows for MySQL, with 'default' as the table name:
testdatax -o default.sql -f mysql -r 1000
Generate 1000 rows for MSSQL, with 'mstest' as the table name:
testdatax -o mstest.sql -f mssql -r 1000
Generate 1000 rows for Oracle, with 'oracle' as the table name:
testdatax -o oracle.sql -f oracle -r 1000
Each command consists of:
- -o, --output: Specify the output file path and name
- -f, --format: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
- -s, --schema: Path to your schema definition file
- -r, --rows: Number of rows to generate (optional, defaults to 10)
- -d, --debug: Enable debug logging (optional)
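To make the flag semantics concrete, here is a minimal sketch of an argparse parser that mirrors the documented options. It is an illustration only, not the project's actual CLI code: `build_parser` is a hypothetical name, and treating `-o` and `-f` as required is an assumption.

```python
import argparse

# Format names as listed in the options above.
FORMATS = ["csv", "json", "orc", "parquet", "mysql", "mssql", "oracle"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="testdatax")
    # Assumption: output and format are mandatory; the README does not say so explicitly.
    parser.add_argument("-o", "--output", required=True,
                        help="Output file path (table name for SQL exports)")
    parser.add_argument("-f", "--format", required=True, choices=FORMATS,
                        help="Output format")
    parser.add_argument("-s", "--schema", help="Path to schema file")
    parser.add_argument("-r", "--rows", type=int, default=10,
                        help="Number of rows to generate")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Enable debug output")
    return parser


# Parse the same arguments as the JSON example, which relies on the default row count.
args = build_parser().parse_args(["-o", "data.json", "-f", "json", "-s", "schema.json"])
print(args.output, args.format, args.rows, args.debug)
```

Because `--rows` carries a default of 10, leaving out `-r` still yields a usable row count.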
Schema Example
{
"username": {
"type": "string",
"faker": "name"
},
"date_joined": {
"type": "datetime"
},
"date": {
"type": "date"
},
"age": {
"type": "integer",
"min": 18,
"max": 99
},
"is_active": {
"type": "boolean"
},
"float": {
"type": "float"
},
"uuid": {
"type": "uuid"
},
"status": {
"type": "enum",
"values": ["active", "inactive", "pending"]
}
}
Schema Configuration
The schema file defines the structure and constraints of your generated data. Each field in the schema can have the following properties:
Basic Field Properties
- type: (required) The data type of the field
- nullable: (optional) Boolean to allow null values (default: false)
- unique: (optional) Boolean to ensure unique values (default: false)
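One way to picture how `nullable` and `unique` could interact during generation is the sketch below. It is not TestDataX's implementation: `generate_column`, `null_rate`, and the retry limit are hypothetical names and values chosen for illustration.

```python
import random


def generate_column(make_value, rows, nullable=False, unique=False,
                    null_rate=0.1, rng=None):
    """Generate `rows` values while honouring the nullable and unique flags.

    `make_value` is a callable taking a random.Random and returning one value.
    `null_rate` is an assumed knob; the README does not say how often nulls occur.
    """
    rng = rng or random.Random()
    seen = set()
    column = []
    for _ in range(rows):
        # Nullable fields occasionally produce None instead of a value.
        if nullable and rng.random() < null_rate:
            column.append(None)
            continue
        value = make_value(rng)
        if unique:
            # Retry until an unused value appears, giving up after 1000 attempts.
            attempts = 0
            while value in seen:
                attempts += 1
                if attempts > 1000:
                    raise ValueError("could not find a new unique value")
                value = make_value(rng)
            seen.add(value)
        column.append(value)
    return column


ages = generate_column(lambda r: r.randint(18, 99), rows=20, unique=True,
                       rng=random.Random(0))
print(ages)
```

The retry limit matters because a unique field with a small value range (for example, a boolean) cannot fill more rows than it has distinct values.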
Type-Specific Properties
String Fields
{
"username": {
"type": "string",
"min_length": 5,
"max_length": 20,
"faker": "user_name"
},
"description": {
"type": "text",
"min_length": 100,
"max_length": 500
}
}
The faker property names the Faker provider used to generate realistic values for the field.
Numeric Fields
{
"age": {
"type": "integer",
"min": 18,
"max": 99
},
"score": {
"type": "float",
"min": 0.0,
"max": 100.0,
"precision": 2
}
}
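As a rough sketch of what `min`, `max`, and `precision` imply, the functions below draw values from a field definition. The function names, the fallback bounds of 0 and 100, and the inclusive treatment of both bounds are assumptions for illustration; the tool's real behaviour may differ.

```python
import random


def random_integer(spec, rng):
    """Draw an integer between min and max, both inclusive (assumed)."""
    # The fallback bounds are hypothetical defaults, not documented ones.
    return rng.randint(spec.get("min", 0), spec.get("max", 100))


def random_float(spec, rng):
    """Draw a float between min and max, rounded to `precision` if given."""
    value = rng.uniform(spec.get("min", 0.0), spec.get("max", 100.0))
    precision = spec.get("precision")
    return value if precision is None else round(value, precision)


rng = random.Random(1)
print(random_integer({"type": "integer", "min": 18, "max": 99}, rng))
print(random_float({"type": "float", "min": 0.0, "max": 100.0, "precision": 2}, rng))
```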
Date and Time Fields
{
"created_at": {
"type": "datetime",
"start_date": "2020-01-01",
"end_date": "2023-12-31"
},
"birth_date": {
"type": "date",
"format": "%Y-%m-%d"
}
}
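The date properties can be read as a range to sample from plus a strftime pattern for output. The sketch below shows that reading with the standard library; `random_datetime` and `format_date` are hypothetical helpers, and parsing `start_date` and `end_date` as `%Y-%m-%d` is an assumption based on the example values.

```python
import random
from datetime import datetime, timedelta


def random_datetime(spec, rng):
    """Pick a datetime between start_date and end_date (midnight to midnight)."""
    start = datetime.strptime(spec["start_date"], "%Y-%m-%d")
    end = datetime.strptime(spec["end_date"], "%Y-%m-%d")
    span_seconds = int((end - start).total_seconds())
    # Add a random whole number of seconds to the start of the range.
    return start + timedelta(seconds=rng.randint(0, span_seconds))


def format_date(value, spec):
    """Render a date or datetime using the field's strftime format string."""
    return value.strftime(spec.get("format", "%Y-%m-%d"))


created_at = {"type": "datetime", "start_date": "2020-01-01", "end_date": "2023-12-31"}
birth_date = {"type": "date", "format": "%Y-%m-%d"}

moment = random_datetime(created_at, random.Random(0))
print(moment.isoformat())
print(format_date(moment, birth_date))
```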
Enum Fields
{
"status": {
"type": "enum",
"values": ["pending", "active", "suspended"],
"weights": [0.2, 0.7, 0.1]
}
}
The weights list is optional and assigns a probability weight to each entry in values.
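Weighted enum selection can be sketched with `random.choices` from the standard library. This is an assumption about how weights might be applied, not a description of the tool's internals, and `pick_enum` is a hypothetical name.

```python
import random
from collections import Counter


def pick_enum(spec, rng):
    """Pick one entry from values, using weights when they are present."""
    weights = spec.get("weights")
    if weights is not None and len(weights) != len(spec["values"]):
        raise ValueError("weights and values must have the same length")
    # With weights=None, random.choices picks uniformly.
    return rng.choices(spec["values"], weights=weights, k=1)[0]


status = {
    "type": "enum",
    "values": ["pending", "active", "suspended"],
    "weights": [0.2, 0.7, 0.1],
}
rng = random.Random(42)
counts = Counter(pick_enum(status, rng) for _ in range(5000))
print(counts.most_common())
```

Over many rows the counts should approach the 20/70/10 split, with "active" the most frequent value.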
Using Faker
The generator supports Faker providers for generating realistic data:
{
"name": {
"type": "string",
"faker": "name"
},
"email": {
"type": "string",
"faker": "email"
},
"address": {
"type": "string",
"faker": "address"
},
"company": {
"type": "string",
"faker": "company"
}
}
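A `faker` value such as "name" or "email" matches the name of a method on a Faker instance, so a generator could resolve it by attribute lookup. The sketch below illustrates that idea under stated assumptions: `call_provider` is a hypothetical helper, and `StubProviders` is a stand-in so the example runs without Faker installed. With Faker available, an instance created by `Faker()` could be passed in place of the stub.

```python
def call_provider(source, provider_name):
    """Look up a provider method by name on `source` and call it."""
    method = getattr(source, provider_name, None)
    if not callable(method):
        raise ValueError(f"unknown provider: {provider_name!r}")
    return method()


class StubProviders:
    """Stand-in for a Faker instance, returning fixed values."""

    def name(self):
        return "Ada Lovelace"

    def email(self):
        return "ada@example.com"


schema = {
    "name": {"type": "string", "faker": "name"},
    "email": {"type": "string", "faker": "email"},
}

providers = StubProviders()
row = {field: call_provider(providers, spec["faker"]) for field, spec in schema.items()}
print(row)
```

Rejecting unknown provider names early gives a clearer error than failing partway through a large export.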
Complete Example
{
"user_id": {
"type": "uuid",
"unique": true
},
"username": {
"type": "string",
"faker": "user_name",
"unique": true
},
"email": {
"type": "string",
"faker": "email",
"unique": true
},
"age": {
"type": "integer",
"min": 18,
"max": 99
},
"status": {
"type": "enum",
"values": ["active", "inactive"],
"weights": [0.8, 0.2]
},
"created_at": {
"type": "datetime",
"start_date": "2020-01-01",
"end_date": "2023-12-31"
},
"is_verified": {
"type": "boolean",
"nullable": true
}
}
Supported Data Types
- string
- text
- integer
- bigint
- float
- decimal
- boolean
- date
- datetime
- blob
- uuid
- enum
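Since `type` is required and must be one of the types listed above, a schema can be checked before any data is generated. The following is a minimal sketch of such a check; `validate_schema` and its messages are hypothetical, and the enum and min/max rules are assumptions drawn from the examples in this README.

```python
SUPPORTED_TYPES = {
    "string", "text", "integer", "bigint", "float", "decimal",
    "boolean", "date", "datetime", "blob", "uuid", "enum",
}


def validate_schema(schema):
    """Return a list of problems found in the schema; empty means it looks valid."""
    problems = []
    for field, spec in schema.items():
        field_type = spec.get("type")
        if field_type is None:
            problems.append(f"{field}: missing required 'type'")
        elif field_type not in SUPPORTED_TYPES:
            problems.append(f"{field}: unsupported type '{field_type}'")
        if field_type == "enum":
            values = spec.get("values") or []
            weights = spec.get("weights")
            if not values:
                problems.append(f"{field}: enum needs a non-empty 'values' list")
            elif weights is not None and len(weights) != len(values):
                problems.append(f"{field}: 'weights' and 'values' differ in length")
        if "min" in spec and "max" in spec and spec["min"] > spec["max"]:
            problems.append(f"{field}: 'min' is greater than 'max'")
    return problems


schema = {
    "user_id": {"type": "uuid", "unique": True},
    "age": {"type": "integer", "min": 18, "max": 99},
    "status": {"type": "enum", "values": ["active", "inactive"], "weights": [0.8, 0.2]},
}
print(validate_schema(schema))
```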
Database Type Mappings
| Generic Type | MySQL | MSSQL | Oracle |
|---|---|---|---|
| string | VARCHAR(255) | NVARCHAR(255) | VARCHAR2(255) |
| text | TEXT | NVARCHAR(MAX) | CLOB |
| integer | INT | INT | NUMBER(10) |
| bigint | BIGINT | BIGINT | NUMBER(19) |
| float | FLOAT | FLOAT | FLOAT |
| decimal | DECIMAL(18,2) | DECIMAL(18,2) | NUMBER(18,2) |
| boolean | TINYINT(1) | BIT | NUMBER(1) |
| date | DATE | DATE | DATE |
| datetime | DATETIME | DATETIME2 | TIMESTAMP |
| blob | LONGBLOB | VARBINARY(MAX) | BLOB |
| uuid | VARCHAR(36) | UNIQUEIDENTIFIER | VARCHAR2(36) |
| enum | ENUM | NVARCHAR(255) | VARCHAR2(255) |
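The mapping table can be applied mechanically to turn a schema into a table definition. The sketch below does this for illustration only: `create_table_sql` is a hypothetical function, the expansion of MySQL `ENUM` into a quoted value list and the `NULL`/`NOT NULL` handling are assumptions, and identifier quoting is left out for brevity.

```python
DIALECTS = ("mysql", "mssql", "oracle")

# Generic type -> (MySQL, MSSQL, Oracle), copied from the table above.
MAPPINGS = {
    "string": ("VARCHAR(255)", "NVARCHAR(255)", "VARCHAR2(255)"),
    "text": ("TEXT", "NVARCHAR(MAX)", "CLOB"),
    "integer": ("INT", "INT", "NUMBER(10)"),
    "bigint": ("BIGINT", "BIGINT", "NUMBER(19)"),
    "float": ("FLOAT", "FLOAT", "FLOAT"),
    "decimal": ("DECIMAL(18,2)", "DECIMAL(18,2)", "NUMBER(18,2)"),
    "boolean": ("TINYINT(1)", "BIT", "NUMBER(1)"),
    "date": ("DATE", "DATE", "DATE"),
    "datetime": ("DATETIME", "DATETIME2", "TIMESTAMP"),
    "blob": ("LONGBLOB", "VARBINARY(MAX)", "BLOB"),
    "uuid": ("VARCHAR(36)", "UNIQUEIDENTIFIER", "VARCHAR2(36)"),
    "enum": ("ENUM", "NVARCHAR(255)", "VARCHAR2(255)"),
}


def create_table_sql(table, schema, dialect):
    """Build a CREATE TABLE statement for one dialect from a schema dict."""
    index = DIALECTS.index(dialect)
    columns = []
    for name, spec in schema.items():
        sql_type = MAPPINGS[spec["type"]][index]
        if sql_type == "ENUM":
            # Assumption: MySQL ENUM columns list the schema's values inline.
            quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in spec["values"])
            sql_type = f"ENUM({quoted})"
        null_clause = "NULL" if spec.get("nullable", False) else "NOT NULL"
        columns.append(f"  {name} {sql_type} {null_clause}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


schema = {
    "user_id": {"type": "uuid"},
    "status": {"type": "enum", "values": ["active", "inactive"]},
    "is_verified": {"type": "boolean", "nullable": True},
}
for dialect in DIALECTS:
    print(create_table_sql("users", schema, dialect))
```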
License
This project is licensed under the MIT License - see the LICENSE file for details.