TestDataX

A flexible test data generation toolkit

TestDataX is a command-line tool for quick, customizable test data generation across a range of formats. It uses Faker for realistic field values, supports flexible schema definitions, and writes output for several file types and database dialects. You can control data volume, field types, and constraints for each target data set.

Requirements

  • Python 3.11+
  • Additional dependencies are handled automatically by Poetry

Installation

Prerequisites

# Install Python 3.11+ if not already installed
brew install python@3.11

# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Verify Poetry installation
poetry --version

Install

# Clone the repository
git clone https://github.com/JamesPBrett/testdatax.git
cd testdatax

# Install dependencies
poetry install

Common Issues

  • If Poetry is not found in PATH:
    export PATH="$HOME/.local/bin:$PATH"
    

Features

  • Generate realistic test data using Faker data providers
  • Support for multiple output formats (CSV, JSON, SQL, etc.)
  • Customizable schema definitions
  • Configurable data generation parameters
  • CLI tool for easy data generation

Supported Formats

  • JSON
  • CSV
  • ORC
  • Parquet
  • MySQL
  • MSSQL
  • Oracle

CLI Usage

testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> [-d]

Options:

  • -o, --output: Output file path (used as the table name for SQL exports)
  • -f, --format: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
  • -r, --rows: Number of rows to generate (default: 10)
  • -s, --schema: Path to schema file
  • -d, --debug: Enable debug output

Usage Examples

Generate 10 rows of CSV data:

testdatax -o users.csv -f csv -s schema.json -r 10

Generate 1000 rows of Parquet data with debug output:

testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -d

Generate JSON data with default row count (10):

testdatax -o data.json -f json -s schema.json

Generate ORC file with specific schema:

testdatax -o analytics.orc -f orc -s analytics_schema.json -r 100

Generate 1000 rows of MySQL INSERT statements (table name 'default'):

testdatax -o default.sql -f mysql -r 1000

Generate 1000 rows of MSSQL INSERT statements (table name 'mstest'):

testdatax -o mstest.sql -f mssql -r 1000

Generate 1000 rows of Oracle INSERT statements (table name 'oracle'):

testdatax -o oracle.sql -f oracle -r 1000


Schema Example

{
  "username": {
    "type": "string",
    "faker": "name"
  },
  "date_joined": {
    "type": "datetime"
  },
  "date": {
    "type": "date"
  },
  "age": {
    "type": "integer",
    "min": 18,
    "max": 99
  },
  "is_active": {
    "type": "boolean"
  },
  "float": {
    "type": "float"
  },
  "uuid": {
    "type": "uuid"
  },
  "status": {
    "type": "enum",
    "values": ["active", "inactive", "pending"]
  }
}

Schema Configuration

The schema file defines the structure and constraints of your generated data. Each field in the schema can have the following properties:

Basic Field Properties

  • type: (required) The data type of the field
  • nullable: (optional) Boolean to allow null values (default: false)
  • unique: (optional) Boolean to ensure unique values (default: false)
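
As an illustration, a field that may be null but must stay unique when present could be declared like this (the field name is hypothetical):

```json
{
  "referral_code": {
    "type": "string",
    "nullable": true,
    "unique": true
  }
}
```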

Type-Specific Properties

String Fields

{
  "username": {
    "type": "string",
    "min_length": 5,
    "max_length": 20,
    "faker": "user_name"
  },
  "description": {
    "type": "text",
    "min_length": 100,
    "max_length": 500
  }
}

Numeric Fields

{
  "age": {
    "type": "integer",
    "min": 18,
    "max": 99
  },
  "score": {
    "type": "float",
    "min": 0.0,
    "max": 100.0,
    "precision": 2
  }
}
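
The min, max, and precision constraints can be thought of as a bounded draw followed by rounding. This is an illustrative sketch using the standard library, not TestDataX's actual implementation:

```python
import random

def gen_float(min_val: float, max_val: float, precision: int) -> float:
    """Draw a uniform float in [min_val, max_val], rounded to `precision` decimals."""
    return round(random.uniform(min_val, max_val), precision)

# Matches the "score" field above: 0.0-100.0 with 2 decimal places.
print(gen_float(0.0, 100.0, 2))
```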

Date and Time Fields

{
  "created_at": {
    "type": "datetime",
    "start_date": "2020-01-01",
    "end_date": "2023-12-31"
  },
  "birth_date": {
    "type": "date",
    "format": "%Y-%m-%d"
  }
}
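
Conceptually, a datetime field with start_date and end_date is a uniform pick within that window. A minimal stdlib sketch (again, not the tool's internals):

```python
import random
from datetime import datetime, timedelta

def gen_datetime(start_date: str, end_date: str) -> datetime:
    """Pick a uniformly random datetime between two ISO dates."""
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    offset = random.uniform(0, (end - start).total_seconds())
    return start + timedelta(seconds=offset)

print(gen_datetime("2020-01-01", "2023-12-31").strftime("%Y-%m-%d %H:%M:%S"))
```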

Enum Fields

The optional weights array assigns relative selection probabilities to the corresponding values.

{
  "status": {
    "type": "enum",
    "values": ["pending", "active", "suspended"],
    "weights": [0.2, 0.7, 0.1]
  }
}
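
Weighted selection like this maps directly onto Python's standard library; the sketch below illustrates the idea and is not TestDataX's actual code:

```python
import random

def gen_enum(values, weights=None):
    """Choose one value; `weights` are relative probabilities when given."""
    return random.choices(values, weights=weights, k=1)[0]

# With the weights above, "active" is drawn roughly 70% of the time.
print(gen_enum(["pending", "active", "suspended"], [0.2, 0.7, 0.1]))
```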

Using Faker

The generator supports Faker providers for generating realistic data:

{
  "name": {
    "type": "string",
    "faker": "name"
  },
  "email": {
    "type": "string",
    "faker": "email"
  },
  "address": {
    "type": "string",
    "faker": "address"
  },
  "company": {
    "type": "string",
    "faker": "company"
  }
}

Complete Example

{
  "user_id": {
    "type": "uuid",
    "unique": true
  },
  "username": {
    "type": "string",
    "faker": "user_name",
    "unique": true
  },
  "email": {
    "type": "string",
    "faker": "email",
    "unique": true
  },
  "age": {
    "type": "integer",
    "min": 18,
    "max": 99
  },
  "status": {
    "type": "enum",
    "values": ["active", "inactive"],
    "weights": [0.8, 0.2]
  },
  "created_at": {
    "type": "datetime",
    "start_date": "2020-01-01",
    "end_date": "2023-12-31"
  },
  "is_verified": {
    "type": "boolean",
    "nullable": true
  }
}
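
Before generating, it can be useful to sanity-check a schema file against the rules above. This is a hypothetical validator sketch (validate_schema is not part of TestDataX's API):

```python
import json

SUPPORTED_TYPES = {"string", "text", "integer", "bigint", "float", "decimal",
                   "boolean", "date", "datetime", "blob", "uuid", "enum"}

def validate_schema(schema: dict) -> list:
    """Return a list of problems; an empty list means the schema looks valid."""
    problems = []
    for field, spec in schema.items():
        if "type" not in spec:
            problems.append(f"{field}: missing required 'type'")
        elif spec["type"] not in SUPPORTED_TYPES:
            problems.append(f"{field}: unknown type {spec['type']!r}")
        if spec.get("type") == "enum" and not spec.get("values"):
            problems.append(f"{field}: enum needs a non-empty 'values' list")
    return problems

schema = json.loads('{"user_id": {"type": "uuid"}, "status": {"type": "enum"}}')
print(validate_schema(schema))
```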

Supported Data Types

  • string
  • text
  • integer
  • bigint
  • float
  • decimal
  • boolean
  • date
  • datetime
  • blob
  • uuid
  • enum

Database Type Mappings

| Generic Type | MySQL         | MSSQL            | Oracle        |
|--------------|---------------|------------------|---------------|
| string       | VARCHAR(255)  | NVARCHAR(255)    | VARCHAR2(255) |
| text         | TEXT          | NVARCHAR(MAX)    | CLOB          |
| integer      | INT           | INT              | NUMBER(10)    |
| bigint       | BIGINT        | BIGINT           | NUMBER(19)    |
| float        | FLOAT         | FLOAT            | FLOAT         |
| decimal      | DECIMAL(18,2) | DECIMAL(18,2)    | NUMBER(18,2)  |
| boolean      | TINYINT(1)    | BIT              | NUMBER(1)     |
| date         | DATE          | DATE             | DATE          |
| datetime     | DATETIME      | DATETIME2        | TIMESTAMP     |
| blob         | LONGBLOB      | VARBINARY(MAX)   | BLOB          |
| uuid         | VARCHAR(36)   | UNIQUEIDENTIFIER | VARCHAR2(36)  |
| enum         | ENUM          | NVARCHAR(255)    | VARCHAR2(255) |
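
A mapping like the table above can be modeled as a nested dictionary. The sketch below is hypothetical, covers only a few rows, and is not the tool's internal representation:

```python
# Hypothetical lookup mirroring part of the type-mapping table.
TYPE_MAP = {
    "string":  {"mysql": "VARCHAR(255)", "mssql": "NVARCHAR(255)", "oracle": "VARCHAR2(255)"},
    "integer": {"mysql": "INT", "mssql": "INT", "oracle": "NUMBER(10)"},
    "boolean": {"mysql": "TINYINT(1)", "mssql": "BIT", "oracle": "NUMBER(1)"},
    "uuid":    {"mysql": "VARCHAR(36)", "mssql": "UNIQUEIDENTIFIER", "oracle": "VARCHAR2(36)"},
}

def sql_type(generic: str, dialect: str) -> str:
    """Translate a generic schema type to a dialect-specific column type."""
    return TYPE_MAP[generic][dialect]

print(sql_type("boolean", "oracle"))  # NUMBER(1)
```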

License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files

Download the file for your platform.

Source Distribution

testdatax-0.1.0.tar.gz (19.4 kB)

Uploaded Source

Built Distribution


testdatax-0.1.0-py3-none-any.whl (28.0 kB)

Uploaded Python 3

File details

Details for the file testdatax-0.1.0.tar.gz.

File metadata

  • Download URL: testdatax-0.1.0.tar.gz
  • Size: 19.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for testdatax-0.1.0.tar.gz:

  • SHA256: 1fc485de28bc5edb84af5999b099f65e52187f1fb3bded4fe80d8e60f35aae1c
  • MD5: 21e06e345123442dfa91d93dc05f6ebe
  • BLAKE2b-256: 8dcb048a2103b5c664be10e5340642efe557fe38fd59d92a3b5f13244b263b49


Provenance

The following attestation bundles were made for testdatax-0.1.0.tar.gz:

Publisher: publish.yml on JamesPBrett/TestDataX

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file testdatax-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: testdatax-0.1.0-py3-none-any.whl
  • Size: 28.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for testdatax-0.1.0-py3-none-any.whl:

  • SHA256: 0c12bd5185d89a98808526fb53ba001f2d978693184b28473aedf278e611da69
  • MD5: 70ed3be10e91dbbb30cd1e47c14d4cca
  • BLAKE2b-256: 57cb93b87091d6c27861914a928442a73b288fba5c6c42e4e41b16dbb98614fb


Provenance

The following attestation bundles were made for testdatax-0.1.0-py3-none-any.whl:

Publisher: publish.yml on JamesPBrett/TestDataX

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
