
TestDataX

A flexible test data generation toolkit

TestDataX is a command-line tool for quick, customizable test data generation across a variety of formats. It uses Faker for realistic field values, supports flexible schema definitions, and writes output for several file types and database dialects. For each data set, you control the row count, field types, and constraints.

Requirements

  • Python 3.11+

Quick Start

```bash
# Install from PyPI
pip install testdatax

# Generate sample data
testdatax --rows 1000 --format json --output data.json
```


## Features

- Generate realistic test data using Faker data providers
- Support for multiple output formats (CSV, JSON, SQL, etc.)
- Customizable schema definitions
- Configurable data generation parameters
- CLI tool for easy test data generation

## Supported Formats

- JSON
- CSV
- ORC
- Parquet
- MySQL
- MSSQL
- Oracle

## CLI Usage
```bash
testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> [-d]
```

Options:

  • -o, --output: Output file path (for SQL formats, the file name stem is used as the table name)
  • -f, --format: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
  • -r, --rows: Number of rows to generate (default: 10)
  • -s, --schema: Path to schema file
  • -d, --debug: Enable debug output

Usage Examples

Generate 10 rows of CSV data:

testdatax -o users.csv -f csv -s schema.json -r 10

Generate 1000 rows of Parquet data with debug output:

testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -d

Generate JSON data with default row count (10):

testdatax -o data.json -f json -s schema.json

Generate ORC file with specific schema:

testdatax -o analytics.orc -f orc -s analytics_schema.json -r 100

Generate 1000 rows of MySQL SQL output (table name 'default'):

testdatax -o default.sql -f mysql -r 1000

Generate 1000 rows of MSSQL SQL output (table name 'mstest'):

testdatax -o mstest.sql -f mssql -r 1000

Generate 1000 rows of Oracle SQL output (table name 'oracle'):

testdatax -o oracle.sql -f oracle -r 1000


Schema Example

{
  "username": {
    "type": "string",
    "faker": "name"
  },
  "date_joined": {
    "type": "datetime"
  },
  "date": {
    "type": "date"
  },
  "age": {
    "type": "integer",
    "min": 18,
    "max": 99
  },
  "is_active": {
    "type": "boolean"
  },
  "float": {
    "type": "float"
  },
  "uuid": {
    "type": "uuid"
  },
  "status": {
    "type": "enum",
    "values": ["active", "inactive", "pending"]
  }
}
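Because the schema is plain JSON, it can be sanity-checked with Python's standard library before running the generator. This is a minimal sketch using a few field names from the example above:

```python
import json

# Inline copy of part of the example schema above.
schema_text = """
{
  "username": {"type": "string", "faker": "name"},
  "age": {"type": "integer", "min": 18, "max": 99},
  "status": {"type": "enum", "values": ["active", "inactive", "pending"]}
}
"""

schema = json.loads(schema_text)

# Every field needs at least a "type" property.
for field, spec in schema.items():
    assert "type" in spec, f"{field} is missing 'type'"

print(sorted(schema))
# → ['age', 'status', 'username']
```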

Schema Configuration

The schema file defines the structure and constraints of your generated data. Each field in the schema can have the following properties:

Basic Field Properties

  • type: (required) The data type of the field
  • nullable: (optional) Boolean to allow null values (default: false)
  • unique: (optional) Boolean to ensure unique values (default: false)

Type-Specific Properties

String Fields

Use the optional faker property to generate realistic values:

{
  "username": {
    "type": "string",
    "min_length": 5,
    "max_length": 20,
    "faker": "user_name"
  },
  "description": {
    "type": "text",
    "min_length": 100,
    "max_length": 500
  }
}

Numeric Fields

{
  "age": {
    "type": "integer",
    "min": 18,
    "max": 99
  },
  "score": {
    "type": "float",
    "min": 0.0,
    "max": 100.0,
    "precision": 2
  }
}
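Illustratively, the min/max/precision constraints behave like this sketch built on Python's random module. It is an assumption about the observable behavior, not the tool's actual implementation; gen_integer and gen_float are hypothetical helpers:

```python
import random

random.seed(7)  # fixed seed so the example is reproducible

def gen_integer(spec):
    # "min" and "max" are treated as inclusive bounds.
    return random.randint(spec["min"], spec["max"])

def gen_float(spec):
    # "precision" is the number of decimal places kept after rounding.
    value = random.uniform(spec["min"], spec["max"])
    return round(value, spec.get("precision", 2))

age = gen_integer({"type": "integer", "min": 18, "max": 99})
score = gen_float({"type": "float", "min": 0.0, "max": 100.0, "precision": 2})
print(age, score)
```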

Date and Time Fields

{
  "created_at": {
    "type": "datetime",
    "start_date": "2020-01-01",
    "end_date": "2023-12-31"
  },
  "birth_date": {
    "type": "date",
    "format": "%Y-%m-%d"
  }
}
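A date constrained by start_date/end_date can be sketched with the standard library. This illustrates the constraint, not the tool's internals; random_date is a hypothetical helper:

```python
import random
from datetime import date, timedelta

def random_date(start: str, end: str) -> date:
    """Return a uniformly random date between start and end, inclusive."""
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    offset = random.randint(0, (end_d - start_d).days)
    return start_d + timedelta(days=offset)

d = random_date("2020-01-01", "2023-12-31")
print(d.strftime("%Y-%m-%d"))  # rendered with the "%Y-%m-%d" format string
```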

Enum Fields

The optional weights property sets the probability of each value:

{
  "status": {
    "type": "enum",
    "values": ["pending", "active", "suspended"],
    "weights": [0.2, 0.7, 0.1]
  }
}
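The effect of weights can be seen by sampling with the standard library's random.choices, which implements the same weighted-draw idea:

```python
import random
from collections import Counter

random.seed(0)  # reproducible sample

values = ["pending", "active", "suspended"]
weights = [0.2, 0.7, 0.1]

# Draw 10,000 values with replacement according to the weights.
counts = Counter(random.choices(values, weights=weights, k=10_000))

# With weight 0.7, "active" clearly dominates the sample.
print(counts.most_common(1)[0][0])
# → active
```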

Using Faker

The generator supports Faker providers for generating realistic data:

{
  "name": {
    "type": "string",
    "faker": "name"
  },
  "email": {
    "type": "string",
    "faker": "email"
  },
  "address": {
    "type": "string",
    "faker": "address"
  },
  "company": {
    "type": "string",
    "faker": "company"
  }
}
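Conceptually, each faker value names a provider method on a Faker instance, which a generator can resolve by attribute lookup. The sketch below uses a stand-in class so it runs without the faker package installed; with the real library you would call the same methods on faker.Faker():

```python
class StubFaker:
    """Stand-in for faker.Faker, exposing two provider methods."""

    def name(self) -> str:
        return "Jane Doe"

    def email(self) -> str:
        return "jane.doe@example.com"

def generate_value(provider, spec):
    # Resolve the provider method named by the schema's "faker" key.
    return getattr(provider, spec["faker"])()

fake = StubFaker()
print(generate_value(fake, {"type": "string", "faker": "email"}))
# → jane.doe@example.com
```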

Complete Example

{
  "user_id": {
    "type": "uuid",
    "unique": true
  },
  "username": {
    "type": "string",
    "faker": "user_name",
    "unique": true
  },
  "email": {
    "type": "string",
    "faker": "email",
    "unique": true
  },
  "age": {
    "type": "integer",
    "min": 18,
    "max": 99
  },
  "status": {
    "type": "enum",
    "values": ["active", "inactive"],
    "weights": [0.8, 0.2]
  },
  "created_at": {
    "type": "datetime",
    "start_date": "2020-01-01",
    "end_date": "2023-12-31"
  },
  "is_verified": {
    "type": "boolean",
    "nullable": true
  }
}

Supported Data Types

  • string
  • text
  • integer
  • bigint
  • float
  • decimal
  • boolean
  • date
  • datetime
  • blob
  • uuid
  • enum

Database Type Mappings

| Generic Type | MySQL | MSSQL | Oracle |
|---|---|---|---|
| string | VARCHAR(255) | NVARCHAR(255) | VARCHAR2(255) |
| text | TEXT | NVARCHAR(MAX) | CLOB |
| integer | INT | INT | NUMBER(10) |
| bigint | BIGINT | BIGINT | NUMBER(19) |
| float | FLOAT | FLOAT | FLOAT |
| decimal | DECIMAL(18,2) | DECIMAL(18,2) | NUMBER(18,2) |
| boolean | TINYINT(1) | BIT | NUMBER(1) |
| date | DATE | DATE | DATE |
| datetime | DATETIME | DATETIME2 | TIMESTAMP |
| blob | LONGBLOB | VARBINARY(MAX) | BLOB |
| uuid | VARCHAR(36) | UNIQUEIDENTIFIER | VARCHAR2(36) |
| enum | ENUM | NVARCHAR(255) | VARCHAR2(255) |
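A lookup table like the one above is straightforward to represent in code; this sketch mirrors a few of its rows as an illustration of the mapping, not the tool's actual source:

```python
# Per-dialect column types for a few generic types from the table above.
TYPE_MAP = {
    "mysql":  {"string": "VARCHAR(255)", "boolean": "TINYINT(1)", "uuid": "VARCHAR(36)"},
    "mssql":  {"string": "NVARCHAR(255)", "boolean": "BIT", "uuid": "UNIQUEIDENTIFIER"},
    "oracle": {"string": "VARCHAR2(255)", "boolean": "NUMBER(1)", "uuid": "VARCHAR2(36)"},
}

def column_type(dialect: str, generic: str) -> str:
    """Translate a generic schema type into a dialect-specific column type."""
    return TYPE_MAP[dialect][generic]

print(column_type("oracle", "uuid"))
# → VARCHAR2(36)
```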

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

testdatax 0.1.1 is available on PyPI as a source distribution (testdatax-0.1.1.tar.gz, 19.1 kB) and a wheel (testdatax-0.1.1-py3-none-any.whl, 27.8 kB), both uploaded via Trusted Publishing from the publish.yml workflow on JamesPBrett/TestDataX.