api-test-data-generator

Generate structured, customizable test data for API testing.
A tool that automatically creates realistic fake data for testing your APIs — no more writing test data by hand.
You describe what your data should look like (using a simple schema file), and this tool generates as many records as you need, ready to use in your tests.
Table of Contents
- What does it do?
- Why not just use Faker directly?
- How it works — the big picture
- Requirements
- Installation
- Step 1 — Create a schema file
- Step 2 — Generate data from the terminal
- Step 3 — Preview your schema output instantly
- Step 4 — Use it inside Python or pytest
- All CLI options explained
- Output formats
- Schema field types — full reference
- Real-world schema examples
- Common recipes
- Using with pytest
- Reproducible data with seeds
- Error messages and what they mean
- Development setup
- Project structure
- Platform quick reference
- FAQ
What does it do?
Imagine you are building a user registration API and want to test it with 1000 different users. Writing those users by hand would take hours. With this tool you just:
- Describe what a user looks like (name, email, age, etc.) in a schema file
- Run one command
- Get a ready-to-use JSON, NDJSON, or CSV file with 1000 realistic users
Why not just use Faker directly?
Faker is a great library for generating individual fake values. But when testing an API, you need more than random values — you need structured records that match your API contract, can be exported to a file, and behave consistently across test runs. Doing that with raw Faker requires writing glue code every time.
Here is what that looks like in practice:
With raw Faker — you write this for every project:
from faker import Faker
import json, csv, random
fake = Faker()
random.seed(42)
Faker.seed(42) # easy to forget; causes non-reproducible tests if missed
users = []
for _ in range(1000):
    include_age = random.random() < 0.8  # optional fields need manual handling
    user = {
        "user_id": str(fake.uuid4()),
        "name": fake.name(),
        "email": fake.email(),
        "age": random.randint(18, 60) if include_age else None,
    }
    users.append(user)
# Flatten nested objects for CSV yourself
# Collect all fieldnames across all records yourself (or columns go missing)
# Write the export boilerplate yourself
with open("users.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=users[0].keys())  # bug: misses optional fields
    writer.writeheader()
    writer.writerows(users)
With this tool — define the schema once, run one command:
api-gen generate --schema user_schema.json --count 1000 --output users.csv --format csv --seed 42
The specific things this tool handles for you:
| Problem | Raw Faker | This tool |
|---|---|---|
| Structured multi-field records | Write a loop for every project | Defined once in a schema file |
| Optional fields | `if random.random() < 0.8` everywhere | Automatic — fields not in `required` appear 80% of the time |
| Reproducible output | Must seed both `random` and Faker separately | `--seed` handles both correctly |
| Schema validation | Write `jsonschema` calls yourself | Built in — validates every record by default |
| CSV with nested objects | Flatten and collect all fieldnames manually | Automatic dot-notation flattening |
| NDJSON export | Write the loop yourself | `--format ndjson` |
| No-code usage | Must write Python | `api-gen generate` from the terminal |
| Quick schema iteration | Generate, print, adjust, repeat in code | `api-gen preview --schema ...` |
If you are already comfortable with Faker and only need one or two fields, use Faker directly. If you need full records that match an API contract, reproducible datasets, or file export, this tool saves the boilerplate.
How it works — the big picture
Your schema file              This tool              Output file
(what data looks like)   →   (generates records)   →   (users.json / users.ndjson / users.csv)

user_schema.json   →   api-gen generate   →   users.json
A schema file is just a description of your data. For example:
"I want records that each have a user ID (UUID format), a name, an email address, and an age between 18 and 60."
The tool reads that description and creates as many records as you ask for.
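To make the big picture concrete, here is a minimal, purely illustrative sketch of what a schema-driven generator does internally. This is not the tool's actual implementation — it only supports a few field kinds and uses the standard library — but it shows the core idea: walk the schema's properties and produce one value per field.

```python
import random
import string
import uuid

# Illustrative sketch only — NOT the tool's real implementation.
def generate_record(schema, rng=random):
    record = {}
    for name, spec in schema.get("properties", {}).items():
        if spec.get("format") == "uuid":
            record[name] = str(uuid.uuid4())
        elif spec.get("type") == "integer":
            record[name] = rng.randint(spec.get("minimum", 0), spec.get("maximum", 100))
        elif spec.get("type") == "boolean":
            record[name] = rng.random() < 0.5
        else:  # fall back to a random string
            record[name] = "".join(rng.choices(string.ascii_letters, k=8))
    return record

schema = {"properties": {"id": {"type": "string", "format": "uuid"},
                         "age": {"type": "integer", "minimum": 18, "maximum": 60}}}
print(generate_record(schema))
```

The real tool adds Faker integration, optional fields, validation, and exporters on top of this basic walk.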
Requirements
- Python 3.11 or higher
- pip (comes with Python)
Check your Python version by running:
python --version
Installation
Linux / macOS
# Basic install
pip install api-test-data-generator
# Full install — includes pandas (faster CSV) and rich (prettier terminal output)
pip install "api-test-data-generator[all]"
Windows (PowerShell)
# Basic install
pip install api-test-data-generator
# Full install — use cmd /c to avoid a PowerShell bracket issue
cmd /c "pip install api-test-data-generator[all]"
Windows (Command Prompt)
pip install api-test-data-generator[all]
After installing, verify it worked:
api-gen --help
You should see a help message listing the available commands (generate and preview).
Step 1 — Create a schema file
A schema file tells the tool what your data should look like. You can write it in JSON or YAML — use whichever you prefer.
Create a file called user_schema.json:
{
  "type": "object",
  "properties": {
    "user_id": { "type": "string", "format": "uuid" },
    "name": { "type": "string", "faker": "name" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 18, "maximum": 60 },
    "is_active": { "type": "boolean" }
  },
  "required": ["user_id", "email"]
}
What each line means:
"format": "uuid"— generate a unique ID like550e8400-e29b-41d4-a716-446655440000"faker": "name"— generate a realistic full name like"Sarah Johnson""format": "email"— generate a valid email like"sarah.johnson@example.com""minimum": 18, "maximum": 60— age will always be between 18 and 60"required": [...]— these fields will always be present in every record
Step 2 — Generate data from the terminal
Once you have a schema file, run this command to generate data.
Linux / macOS
api-gen generate \
--schema user_schema.json \
--count 100 \
--output users.json
Windows (PowerShell)
api-gen generate `
--schema user_schema.json `
--count 100 `
--output users.json
Windows (Command Prompt)
api-gen generate --schema user_schema.json --count 100 --output users.json
This creates a users.json file with 100 user records. Open it and you will see something like:
[
  {
    "user_id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Sarah Johnson",
    "email": "sarah.johnson@example.com",
    "age": 34,
    "is_active": true
  },
  {
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "name": "Michael Torres",
    "email": "m.torres@example.org",
    "age": 27,
    "is_active": false
  }
]
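Output like the array above is plain JSON, so it can be consumed with the standard library and no extra dependencies. The snippet below parses the two sample records inline; with a real file you would use `json.load(open("users.json"))` instead.

```python
import json

# The sample records from above, embedded as a string for a self-contained demo.
sample = '''[
  {"user_id": "550e8400-e29b-41d4-a716-446655440000",
   "name": "Sarah Johnson", "email": "sarah.johnson@example.com",
   "age": 34, "is_active": true},
  {"user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
   "name": "Michael Torres", "email": "m.torres@example.org",
   "age": 27, "is_active": false}
]'''

users = json.loads(sample)           # with a real file: json.load(open("users.json"))
print(len(users), users[0]["name"])  # → 2 Sarah Johnson
```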
Step 3 — Preview your schema output instantly
Before generating a large dataset, use api-gen preview to check that your schema produces the records you expect — without saving any files.
api-gen preview --schema user_schema.json
By default it shows 3 records. Use --count to see up to 10:
api-gen preview --schema user_schema.json --count 5
Use --seed to get the same preview output every time (useful when iterating on a schema):
api-gen preview --schema user_schema.json --count 3 --seed 42
If rich is installed (via pip install "api-test-data-generator[cli]"), the output is syntax-highlighted. Otherwise it falls back to plain JSON — either way, nothing is written to disk.
Step 4 — Use it inside Python or pytest
You can also use the tool directly in your Python code without going to the terminal.
Generate a single record
from api_test_data_generator.generator import DataGenerator
# Load your schema and create a generator
gen = DataGenerator.from_file("user_schema.json")
# Generate one record
user = gen.generate_record()
print(user)
# {'user_id': '550e8400-...', 'name': 'Sarah Johnson', 'email': 'sarah@example.com', 'age': 34}
Generate many records at once
from api_test_data_generator.generator import DataGenerator
gen = DataGenerator.from_file("user_schema.json")
# Generate 500 records
users = gen.generate_bulk(500)
print(f"Generated {len(users)} users")
print(users[0]) # print the first one
Define the schema directly in Python (no file needed)
from api_test_data_generator.generator import DataGenerator
schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "format": "uuid"},
        "amount": {"type": "number", "minimum": 1.0, "maximum": 999.99},
        "status": {"enum": ["pending", "paid", "cancelled"]}
    },
    "required": ["order_id", "amount", "status"]
}
gen = DataGenerator.from_dict(schema)
order = gen.generate_record()
print(order)
# {'order_id': '...', 'amount': 47.83, 'status': 'paid'}
Export to a file from Python
from api_test_data_generator.generator import DataGenerator
from api_test_data_generator.exporters import export_json, export_csv, export_ndjson
gen = DataGenerator.from_file("user_schema.json", seed=42)
users = gen.generate_bulk(1000)
# Save as JSON
export_json(users, "output/users.json")
# Save as NDJSON (one record per line — great for log pipelines and streaming)
export_ndjson(users, "output/users.ndjson")
# Save as CSV
export_csv(users, "output/users.csv")
All CLI options explained
api-gen generate
api-gen generate [OPTIONS]
| Option | What it does | Required? | Default |
|---|---|---|---|
| `--schema PATH` | Path to your schema file (`.json` or `.yaml`) | Yes | — |
| `--count INT` | How many records to generate | No | 1 |
| `--output PATH` | Output file path, or `-` to print to stdout | Yes | — |
| `--format TEXT` | File format: `json`, `ndjson`, or `csv` | No | `json` |
| `--seed INT` | A number to make output repeatable (same seed = same data every time) | No | Random |
| `--no-validate` | Skip checking the output against the schema | No | Validates by default |
| `--verbose` | Show detailed logs while generating | No | Off |
api-gen preview
api-gen preview [OPTIONS]
| Option | What it does | Required? | Default |
|---|---|---|---|
| `--schema PATH` | Path to your schema file (`.json` or `.yaml`) | Yes | — |
| `--count INT` | How many records to preview (1–10) | No | 3 |
| `--seed INT` | A number to make the preview repeatable | No | Random |
| `--verbose` | Show detailed logs | No | Off |
Output formats
JSON (default)
Records are written as a JSON array — one file, all records.
api-gen generate --schema user_schema.json --count 100 --output users.json
# or explicitly:
api-gen generate --schema user_schema.json --count 100 --output users.json --format json
NDJSON (Newline-Delimited JSON)
Each record is written as a separate JSON object on its own line. This format is widely used for log ingestion, streaming pipelines (Kafka, Logstash, Elasticsearch bulk API), and tools that process one record at a time.
api-gen generate --schema user_schema.json --count 100 --output users.ndjson --format ndjson
Output looks like:
{"user_id": "550e8400-...", "name": "Sarah Johnson", "email": "sarah@example.com"}
{"user_id": "6ba7b810-...", "name": "Michael Torres", "email": "m.torres@example.org"}
CSV
Records are written as a comma-separated table. Nested objects are flattened to dot-notation columns (e.g. address.city, address.country). Install pandas for optimal column alignment with optional fields:
pip install "api-test-data-generator[csv]"
api-gen generate --schema user_schema.json --count 100 --output users.csv --format csv
Note: CSV does not support stdout output (--output -).
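The dot-notation flattening described above can be pictured with a short sketch. This is illustrative only — not the tool's actual CSV exporter — but it shows how a nested record becomes flat CSV columns:

```python
# Illustrative sketch of dot-notation flattening for CSV export.
# Not the tool's actual exporter code.
def flatten(record, prefix=""):
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, joining keys with a dot
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

row = flatten({"name": "Sarah", "address": {"city": "Springfield", "country": "US"}})
print(row)  # → {'name': 'Sarah', 'address.city': 'Springfield', 'address.country': 'US'}
```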
Stdout output
Use --output - to print records directly to your terminal or pipe them to another tool. Supports json and ndjson formats.
# Print JSON to terminal
api-gen generate --schema user_schema.json --count 5 --output - --format json
# Pipe NDJSON into another tool
api-gen generate --schema user_schema.json --count 1000 --output - --format ndjson | my-loader
# Pretty-print with Python's json.tool
api-gen generate --schema user_schema.json --count 3 --output - | python -m json.tool
Schema field types — full reference
Basic types
{ "type": "string" }
Generates a random text string like "XkLmpQrsT".
{ "type": "string", "minLength": 5, "maxLength": 20 }
Generates a string between 5 and 20 characters long.
{ "type": "integer", "minimum": 1, "maximum": 100 }
Generates a whole number between 1 and 100, e.g. 47.
{ "type": "number", "minimum": 0.0, "maximum": 99.99, "precision": 2 }
Generates a decimal number like 34.72. precision controls decimal places.
{ "type": "boolean" }
Generates true or false randomly.
Formatted strings
{ "type": "string", "format": "uuid" }
Generates a UUID like "550e8400-e29b-41d4-a716-446655440000".
{ "type": "string", "format": "email" }
Generates a valid email like "john.smith@example.com".
{ "type": "string", "format": "date" }
Generates a date like "2023-07-15".
{ "type": "string", "format": "date-time" }
Generates a datetime like "2023-07-15T14:32:10".
{ "type": "string", "format": "phone" }
Generates a phone number like "+1-555-867-5309".
{ "type": "object", "format": "address" }
Generates a full address as a nested object:
{
  "street": "123 Main St",
  "city": "Springfield",
  "state": "Illinois",
  "country": "United States",
  "postal_code": "62701"
}
Faker fields — use any realistic data type
The "faker" key lets you use any method from the Faker library to generate realistic data.
{ "type": "string", "faker": "name" }
→ "Sarah Johnson"
{ "type": "string", "faker": "first_name" }
→ "Sarah"
{ "type": "string", "faker": "company" }
→ "Acme Corp Ltd"
{ "type": "string", "faker": "job" }
→ "Software Engineer"
{ "type": "string", "faker": "city" }
→ "Nairobi"
{ "type": "string", "faker": "country" }
→ "Kenya"
{ "type": "string", "faker": "url" }
→ "https://www.example.com/page"
{ "type": "string", "faker": "sentence" }
→ "The quick brown fox jumps over the lazy dog."
{ "type": "string", "faker": "word" }
→ "discovery"
Tip: Browse all available faker providers at https://faker.readthedocs.io/en/master/providers.html
Note: If a `"faker"` method name is misspelled or does not exist, the tool logs a warning and falls back to a plain random string rather than crashing your generation run.
Enum — choose from a fixed list
{ "enum": ["active", "inactive", "pending"] }
Randomly picks one value from the list every time.
{ "enum": ["admin", "user", "guest"] }
→ "user"
Pattern — match a specific format using regex
{ "type": "string", "pattern": "[A-Z]{2}\\d{4}" }
Generates strings matching the pattern, e.g. "AB1234" (2 uppercase letters followed by 4 digits).
{ "type": "string", "pattern": "\\d{3}-\\d{2}-\\d{4}" }
→ "123-45-6789" (SSN-style format)
Arrays — lists of items
{
  "type": "array",
  "items": { "type": "string", "faker": "word" },
  "minItems": 1,
  "maxItems": 5
}
Generates a list of 1 to 5 random words, e.g. ["apple", "river", "quantum"].
{
  "type": "array",
  "items": { "type": "integer", "minimum": 1, "maximum": 100 },
  "minItems": 3,
  "maxItems": 3
}
Generates exactly 3 random integers, e.g. [42, 7, 88].
Nested objects — data inside data
{
  "type": "object",
  "properties": {
    "street": { "type": "string", "faker": "street_address" },
    "city": { "type": "string", "faker": "city" },
    "zip": { "type": "string", "faker": "postcode" }
  },
  "required": ["street", "city", "zip"]
}
Generates:
{
  "street": "742 Evergreen Terrace",
  "city": "Springfield",
  "zip": "62701"
}
Real-world schema examples
E-commerce order
Save as order_schema.json:
{
  "type": "object",
  "properties": {
    "order_id": { "type": "string", "format": "uuid" },
    "customer_name": { "type": "string", "faker": "name" },
    "customer_email": { "type": "string", "format": "email" },
    "status": { "enum": ["pending", "confirmed", "shipped", "delivered", "cancelled"] },
    "total_amount": { "type": "number", "minimum": 5.0, "maximum": 2000.0, "precision": 2 },
    "item_count": { "type": "integer", "minimum": 1, "maximum": 20 },
    "created_at": { "type": "string", "format": "date-time" },
    "shipping_address": {
      "type": "object",
      "properties": {
        "street": { "type": "string", "faker": "street_address" },
        "city": { "type": "string", "faker": "city" },
        "country": { "type": "string", "faker": "country" },
        "zip": { "type": "string", "faker": "postcode" }
      },
      "required": ["street", "city", "country", "zip"]
    }
  },
  "required": ["order_id", "customer_email", "status", "total_amount"]
}
Generate 200 orders:
api-gen generate --schema order_schema.json --count 200 --output orders.json
Or stream them as NDJSON into a pipeline:
api-gen generate --schema order_schema.json --count 200 --output - --format ndjson | my-importer
Healthcare patient record
Save as patient_schema.yaml:
type: object
properties:
  patient_id:
    type: string
    format: uuid
  full_name:
    type: string
    faker: name
  date_of_birth:
    type: string
    format: date
  gender:
    enum:
      - male
      - female
      - other
      - prefer_not_to_say
  blood_type:
    enum:
      - A+
      - A-
      - B+
      - B-
      - AB+
      - AB-
      - O+
      - O-
  phone:
    type: string
    format: phone
  email:
    type: string
    format: email
  registered_at:
    type: string
    format: date-time
required:
  - patient_id
  - full_name
  - date_of_birth
  - blood_type
Generate 500 patient records:
# Linux / macOS
api-gen generate \
--schema patient_schema.yaml \
--count 500 \
--output patients.json \
--seed 42
# Windows PowerShell
api-gen generate `
--schema patient_schema.yaml `
--count 500 `
--output patients.json `
--seed 42
# Windows Command Prompt
api-gen generate --schema patient_schema.yaml --count 500 --output patients.json --seed 42
Product catalogue
Save as product_schema.json:
{
  "type": "object",
  "properties": {
    "product_id": { "type": "string", "format": "uuid" },
    "name": { "type": "string", "faker": "catch_phrase" },
    "sku": { "type": "string", "pattern": "[A-Z]{3}-\\d{5}" },
    "category": { "enum": ["electronics", "clothing", "food", "books", "home", "sports"] },
    "price": { "type": "number", "minimum": 0.99, "maximum": 4999.99, "precision": 2 },
    "stock_count": { "type": "integer", "minimum": 0, "maximum": 500 },
    "is_available": { "type": "boolean" },
    "tags": {
      "type": "array",
      "items": { "type": "string", "faker": "word" },
      "minItems": 1,
      "maxItems": 6
    },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["product_id", "name", "sku", "category", "price"]
}
Common recipes
Preview your schema before generating a large dataset
api-gen preview --schema user_schema.json
api-gen preview --schema user_schema.json --count 5 --seed 1
Generate a single record to check your schema is correct
# Linux / macOS
api-gen generate --schema user_schema.json --count 1 --output test.json && cat test.json
# Windows PowerShell
api-gen generate --schema user_schema.json --count 1 --output test.json; Get-Content test.json
# Windows Command Prompt
api-gen generate --schema user_schema.json --count 1 --output test.json && type test.json
Print records directly to the terminal (no file)
api-gen generate --schema user_schema.json --count 3 --output - --format json
Pipe NDJSON records into another process
# Linux / macOS — pipe into any tool that reads one JSON object per line
api-gen generate --schema user_schema.json --count 1000 --output - --format ndjson | ./my-loader.sh
# Pretty-print with Python
api-gen generate --schema user_schema.json --count 5 --output - | python -m json.tool
Generate data without validating (faster for large datasets)
Validation checks that every generated record matches your schema. Skipping it is safe when you trust your schema and need speed.
api-gen generate --schema user_schema.json --count 50000 --output big_dataset.json --no-validate
Generate to NDJSON for Elasticsearch / Logstash bulk import
api-gen generate \
--schema user_schema.json \
--count 10000 \
--output users.ndjson \
--format ndjson \
--seed 42
Debug why a schema is not working
api-gen generate --schema user_schema.json --count 1 --output debug.json --verbose
Using with pytest
This is one of the most powerful ways to use the package — generate fresh test data automatically inside your tests.
Basic example
# tests/test_user_api.py
import pytest
import requests
from api_test_data_generator.generator import DataGenerator

@pytest.fixture
def user_generator():
    """Create a generator once, reuse it across tests."""
    return DataGenerator.from_file("schemas/user_schema.json", seed=42)

def test_create_user(user_generator):
    """Test that the API accepts a valid user."""
    user = user_generator.generate_record()
    response = requests.post("http://localhost:8000/users", json=user)
    assert response.status_code == 201
    assert response.json()["email"] == user["email"]

def test_create_100_users(user_generator):
    """Load test: create 100 users and verify all succeed."""
    users = user_generator.generate_bulk(100)
    for user in users:
        response = requests.post("http://localhost:8000/users", json=user)
        assert response.status_code == 201
Parametrize tests with generated data
import pytest
from api_test_data_generator.generator import DataGenerator

def get_test_users(count=5):
    gen = DataGenerator.from_file("schemas/user_schema.json", seed=99)
    return gen.generate_bulk(count)

@pytest.mark.parametrize("user", get_test_users(5))
def test_user_validation(user):
    """Run the same test for 5 different users."""
    assert "@" in user["email"]
    assert len(user["user_id"]) == 36  # UUID length
Use inline schema — no file needed
from api_test_data_generator.generator import DataGenerator

def test_order_processing():
    schema = {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "format": "uuid"},
            "amount": {"type": "number", "minimum": 1.0, "maximum": 500.0},
            "currency": {"enum": ["USD", "EUR", "GBP", "KES", "NGN"]}
        },
        "required": ["order_id", "amount", "currency"]
    }
    gen = DataGenerator.from_dict(schema, seed=1)
    orders = gen.generate_bulk(10)
    for order in orders:
        assert order["amount"] >= 1.0
        assert order["currency"] in ["USD", "EUR", "GBP", "KES", "NGN"]
Reproducible data with seeds
By default, every run generates different random data. If you need the same data every time — for example, to compare test results across runs or share data with a colleague — use the --seed option.
# These two commands produce identical output
api-gen generate --schema user_schema.json --count 10 --output users.json --seed 42
api-gen generate --schema user_schema.json --count 10 --output users.json --seed 42
In Python:
from api_test_data_generator.generator import DataGenerator
# Both generators produce the same records
gen1 = DataGenerator.from_file("user_schema.json", seed=42)
gen2 = DataGenerator.from_file("user_schema.json", seed=42)
records1 = gen1.generate_bulk(10)
records2 = gen2.generate_bulk(10)
assert records1 == records2 # Always True
Use a different seed number to get different (but still repeatable) data:
api-gen generate --schema user_schema.json --count 10 --output dataset_a.json --seed 1
api-gen generate --schema user_schema.json --count 10 --output dataset_b.json --seed 2
Error messages and what they mean
| Error message | What it means | How to fix it |
|---|---|---|
| `Schema file not found` | The path to your schema file is wrong | Check the file path and make sure the file exists |
| `Failed to parse schema` | Your JSON or YAML file has a syntax error | Validate your JSON at jsonlint.com or your YAML at yamllint.com |
| `Unsupported schema format` | You used a file extension other than `.json`, `.yaml`, or `.yml` | Rename your file to use one of those extensions |
| `Record failed schema validation` | A generated record does not match your schema | Check your schema for conflicting rules (e.g. `minimum` > `maximum`) |
| `Cannot export an empty record list to CSV` | You tried to export 0 records | Make sure `--count` is at least 1 |
| `No generator registered for type '...'` | You used an unsupported field type | See the field types reference for valid options |
| `CSV format does not support stdout output` | You used `--output -` with `--format csv` | Use a file path for CSV, or switch to `json` or `ndjson` |
Development setup
Follow these steps if you want to modify the code or contribute.
1. Clone or extract the project
# Linux / macOS
cd path/to/api_test_data_generator
# Windows PowerShell / Command Prompt
cd path\to\api_test_data_generator
2. Install in editable mode with all dependencies
Linux / macOS
pip install -e ".[dev,all]"
Windows (PowerShell)
cmd /c "pip install -e .[dev,all]"
Windows (Command Prompt)
pip install -e .[dev,all]
3. Run the tests
# All platforms
pytest
Run with more detail:
pytest -v
Run a specific test file:
pytest tests/test_field_types.py -v
Run without the coverage requirement:
pytest --no-cov
4. Check code style
# All platforms
flake8 api_test_data_generator/ --max-line-length=100
5. Build the package
# All platforms
pip install build
python -m build
The built files appear in dist/:
- `.whl` — installable wheel (use this for `pip install`)
- `.tar.gz` — source distribution
Project structure
api_test_data_generator/
│
├── api_test_data_generator/ ← the actual package code
│ ├── generator/
│ │ ├── core.py ← DataGenerator class (main entry point)
│ │ ├── schema_loader.py ← reads .json and .yaml schema files
│ │ ├── field_types.py ← one class per field type (string, integer, etc.)
│ │ ├── validators.py ← checks generated data matches the schema
│ │ └── exceptions.py ← custom error classes
│ │
│ ├── exporters/
│ │ ├── json_exporter.py ← saves records as a JSON array
│ │ ├── ndjson_exporter.py ← saves records as NDJSON (one object per line)
│ │ └── csv_exporter.py ← saves records as CSV (flattens nested data)
│ │
│ ├── cli/
│ │ └── main.py ← "api-gen generate" and "api-gen preview" commands
│ │
│ └── utils/
│ ├── seed_manager.py ← manages the random seed globally
│ └── randomizer.py ← helper functions for random data
│
├── tests/ ← 169 tests, ~95% coverage
├── examples/
│ ├── user_schema.json ← example user schema
│ └── order_schema.yaml ← example order schema
├── pyproject.toml ← package configuration
└── README.md
Platform quick reference
| Task | Linux / macOS | Windows PowerShell | Windows CMD |
|---|---|---|---|
| Line continuation | `\` | `` ` `` (backtick) | `^` |
| Install with extras | `".[dev,all]"` | `cmd /c "pip install -e .[dev,all]"` | `.[dev,all]` |
| Path separator | `/` | `\` or `/` (both work) | `\` |
| Run pytest | `pytest` | `pytest` | `pytest` |
| Run flake8 | `flake8 pkg/` | `flake8 pkg/` | `flake8 pkg/` |
| Build package | `python -m build` | `python -m build` | `python -m build` |
FAQ
Q: Why not just use Faker directly? Faker is great for generating individual values. This tool wraps Faker (and other generators) to produce structured multi-field records that match an API contract, handle optional fields automatically, validate output against your schema, and export to JSON/NDJSON/CSV — all with a single command. See the full comparison above.
Q: How is this different from just writing test data by hand? Writing data by hand is fine for 5–10 records. This tool is useful when you need hundreds or thousands of records, need them to be realistic (real-looking names, emails, UUIDs), or need the same data to be reproducible across test runs.
Q: Do I need to know JSON Schema to use this? Not deeply. The examples in this README cover the most common cases. You can copy and adapt them without knowing the full JSON Schema specification.
Q: Will the same seed always produce the same data? Yes, as long as your schema does not change and you use the same version of the tool. If you update your schema, the same seed may produce different records.
Q: My schema has a "faker" key but I am getting generic strings instead of names. Why?
The faker method name is probably misspelled. Check the full list of available methods at https://faker.readthedocs.io/en/master/providers.html. When a method name is not recognised, the tool logs a warning message and falls back to a plain string so your generation run is not interrupted.
Q: What is the difference between JSON and NDJSON output?
JSON produces a single array file ([{...}, {...}]) — good for loading the whole dataset at once. NDJSON writes one JSON object per line — good for streaming, log ingestion tools (Elasticsearch, Logstash, Kafka consumers), and processing large datasets without loading everything into memory.
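The streaming advantage of NDJSON can be shown in a few lines: each line is a complete JSON document, so records are parsed one at a time instead of loading the whole dataset. The snippet uses an in-memory string; with a real file you would iterate over `open("users.ndjson")` the same way.

```python
import io
import json

# Two NDJSON records — one complete JSON object per line.
ndjson = '{"user_id": "1", "name": "Sarah"}\n{"user_id": "2", "name": "Michael"}\n'

names = []
for line in io.StringIO(ndjson):  # with a real file: open("users.ndjson")
    record = json.loads(line)     # each line parses independently
    names.append(record["name"])

print(names)  # → ['Sarah', 'Michael']
```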
Q: Can I stream output to another process without writing a file?
Yes. Use --output - with --format json or --format ndjson. The records are printed to stdout and can be piped to any tool. CSV does not support stdout output.
Q: Can I use this with any testing framework? Yes. It returns plain Python dicts and lists, so it works with pytest, unittest, or any other framework. You can also use it to pre-generate data files and load them separately.
Q: How fast is it? Generating 10 000 records takes under 2 seconds on a standard laptop.
License
MIT — free to use, modify, and distribute.