Generate structured, customizable test data for API testing.


api-test-data-generator

A tool that automatically creates realistic fake data for testing your APIs — no more writing test data by hand.

You describe what your data should look like (using a simple schema file), and this tool generates as many records as you need, ready to use in your tests.




What does it do?

Imagine you are building a user registration API and want to test it with 1000 different users. Writing those users by hand would take hours. With this tool you just:

  1. Describe what a user looks like (name, email, age, etc.) in a schema file
  2. Run one command
  3. Get a ready-to-use JSON, NDJSON, or CSV file with 1000 realistic users

Why not just use Faker directly?

Faker is a great library for generating individual fake values. But when testing an API, you need more than random values — you need structured records that match your API contract, can be exported to a file, and behave consistently across test runs. Doing that with raw Faker requires writing glue code every time.

Here is what that looks like in practice:

With raw Faker — you write this for every project:

from faker import Faker
import json, csv, random

fake = Faker()
random.seed(42)
Faker.seed(42)  # easy to forget; causes non-reproducible tests if missed

users = []
for _ in range(1000):
    # Optional fields need manual handling: required fields always
    # present, the rest appear ~80% of the time
    user = {
        "user_id": str(fake.uuid4()),
        "email": fake.email(),
    }
    if random.random() < 0.8:
        user["name"] = fake.name()
    if random.random() < 0.8:
        user["age"] = random.randint(18, 60)
    users.append(user)

# Flatten nested objects for CSV yourself
# Collect all fieldnames across all records yourself (or columns go missing)
# Write the export boilerplate yourself
with open("users.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=users[0].keys())  # bug: misses optional fields
    writer.writeheader()
    writer.writerows(users)

With this tool — define the schema once, run one command:

api-gen generate --schema user_schema.json --count 1000 --output users.csv --format csv --seed 42

The specific things this tool handles for you:

| Problem | Raw Faker | This tool |
| --- | --- | --- |
| Structured multi-field records | Write a loop for every project | Defined once in a schema file |
| Optional fields | if random.random() < 0.8 everywhere | Automatic — fields not in required appear 80% of the time |
| Reproducible output | Must seed both random and Faker separately | --seed handles both correctly |
| Schema validation | Write jsonschema calls yourself | Built in — validates every record by default |
| CSV with nested objects | Flatten and collect all fieldnames manually | Automatic dot-notation flattening |
| NDJSON export | Write the loop yourself | --format ndjson |
| No-code usage | Must write Python | api-gen generate from the terminal |
| Quick schema iteration | Generate, print, adjust, repeat in code | api-gen preview --schema ... |

If you are already comfortable with Faker and only need one or two fields, use Faker directly. If you need full records that match an API contract, reproducible datasets, or file export, this tool saves the boilerplate.


How it works — the big picture

Your schema file          This tool              Output file
(what data looks like) → (generates records) → (users.json / users.ndjson / users.csv)

user_schema.json   →   api-gen generate   →   users.json

A schema file is just a description of your data. For example:

"I want records that each have a user ID (UUID format), a name, an email address, and an age between 18 and 60."

The tool reads that description and creates as many records as you ask for.


Requirements

  • Python 3.11 or higher
  • pip (comes with Python)

Check your Python version by running:

python --version

Installation

Linux / macOS

# Basic install
pip install api-test-data-generator

# Full install — includes pandas (faster CSV) and rich (prettier terminal output)
pip install "api-test-data-generator[all]"

Windows (PowerShell)

# Basic install
pip install api-test-data-generator

# Full install — use cmd /c to avoid a PowerShell bracket issue
cmd /c "pip install api-test-data-generator[all]"

Windows (Command Prompt)

pip install api-test-data-generator[all]

After installing, verify it worked:

api-gen --help

You should see a help message listing the available commands (generate and preview).


Step 1 — Create a schema file

A schema file tells the tool what your data should look like. You can write it in JSON or YAML — use whichever you prefer.

Create a file called user_schema.json:

{
  "type": "object",
  "properties": {
    "user_id":   { "type": "string", "format": "uuid" },
    "name":      { "type": "string", "faker": "name" },
    "email":     { "type": "string", "format": "email" },
    "age":       { "type": "integer", "minimum": 18, "maximum": 60 },
    "is_active": { "type": "boolean" }
  },
  "required": ["user_id", "email"]
}

What each line means:

  • "format": "uuid" — generate a unique ID like 550e8400-e29b-41d4-a716-446655440000
  • "faker": "name" — generate a realistic full name like "Sarah Johnson"
  • "format": "email" — generate a valid email like "sarah.johnson@example.com"
  • "minimum": 18, "maximum": 60 — age will always be between 18 and 60
  • "required": [...] — these fields will always be present in every record

Step 2 — Generate data from the terminal

Once you have a schema file, run this command to generate data.

Linux / macOS

api-gen generate \
  --schema user_schema.json \
  --count 100 \
  --output users.json

Windows (PowerShell)

api-gen generate `
  --schema user_schema.json `
  --count 100 `
  --output users.json

Windows (Command Prompt)

api-gen generate --schema user_schema.json --count 100 --output users.json

This creates a users.json file with 100 user records. Open it and you will see something like:

[
  {
    "user_id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Sarah Johnson",
    "email": "sarah.johnson@example.com",
    "age": 34,
    "is_active": true
  },
  {
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "name": "Michael Torres",
    "email": "m.torres@example.org",
    "age": 27,
    "is_active": false
  }
]

Step 3 — Preview your schema output instantly

Before generating a large dataset, use api-gen preview to check that your schema produces the records you expect — without saving any files.

api-gen preview --schema user_schema.json

By default it shows 3 records. Use --count to see up to 10:

api-gen preview --schema user_schema.json --count 5

Use --seed to get the same preview output every time (useful when iterating on a schema):

api-gen preview --schema user_schema.json --count 3 --seed 42

If rich is installed (via pip install "api-test-data-generator[cli]"), the output is syntax-highlighted. Otherwise it falls back to plain JSON — either way, nothing is written to disk.


Step 4 — Use it inside Python or pytest

You can also use the tool directly in your Python code without going to the terminal.

Generate a single record

from api_test_data_generator.generator import DataGenerator

# Load your schema and create a generator
gen = DataGenerator.from_file("user_schema.json")

# Generate one record
user = gen.generate_record()
print(user)
# {'user_id': '550e8400-...', 'name': 'Sarah Johnson', 'email': 'sarah@example.com', 'age': 34}

Generate many records at once

from api_test_data_generator.generator import DataGenerator

gen = DataGenerator.from_file("user_schema.json")

# Generate 500 records
users = gen.generate_bulk(500)
print(f"Generated {len(users)} users")
print(users[0])  # print the first one

Define the schema directly in Python (no file needed)

from api_test_data_generator.generator import DataGenerator

schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "format": "uuid"},
        "amount":   {"type": "number", "minimum": 1.0, "maximum": 999.99},
        "status":   {"enum": ["pending", "paid", "cancelled"]}
    },
    "required": ["order_id", "amount", "status"]
}

gen = DataGenerator.from_dict(schema)
order = gen.generate_record()
print(order)
# {'order_id': '...', 'amount': 47.83, 'status': 'paid'}

Export to a file from Python

from api_test_data_generator.generator import DataGenerator
from api_test_data_generator.exporters import export_json, export_csv, export_ndjson

gen = DataGenerator.from_file("user_schema.json", seed=42)
users = gen.generate_bulk(1000)

# Save as JSON
export_json(users, "output/users.json")

# Save as NDJSON (one record per line — great for log pipelines and streaming)
export_ndjson(users, "output/users.ndjson")

# Save as CSV
export_csv(users, "output/users.csv")

All CLI options explained

api-gen generate

api-gen generate [OPTIONS]
| Option | What it does | Required? | Default |
| --- | --- | --- | --- |
| --schema PATH | Path to your schema file (.json or .yaml) | Yes | |
| --count INT | How many records to generate | No | 1 |
| --output PATH | Output file path, or - to print to stdout | Yes | |
| --format TEXT | File format: json, ndjson, or csv | No | json |
| --seed INT | A number to make output repeatable (same seed = same data every time) | No | Random |
| --no-validate | Skip checking the output against the schema | No | Validates by default |
| --verbose | Show detailed logs while generating | No | Off |

api-gen preview

api-gen preview [OPTIONS]
| Option | What it does | Required? | Default |
| --- | --- | --- | --- |
| --schema PATH | Path to your schema file (.json or .yaml) | Yes | |
| --count INT | How many records to preview (1–10) | No | 3 |
| --seed INT | A number to make the preview repeatable | No | Random |
| --verbose | Show detailed logs | No | Off |

Output formats

JSON (default)

Records are written as a JSON array — one file, all records.

api-gen generate --schema user_schema.json --count 100 --output users.json
# or explicitly:
api-gen generate --schema user_schema.json --count 100 --output users.json --format json

NDJSON (Newline-Delimited JSON)

Each record is written as a separate JSON object on its own line. This format is widely used for log ingestion, streaming pipelines (Kafka, Logstash, Elasticsearch bulk API), and tools that process one record at a time.

api-gen generate --schema user_schema.json --count 100 --output users.ndjson --format ndjson

Output looks like:

{"user_id": "550e8400-...", "name": "Sarah Johnson", "email": "sarah@example.com"}
{"user_id": "6ba7b810-...", "name": "Michael Torres", "email": "m.torres@example.org"}

CSV

Records are written as a comma-separated table. Nested objects are flattened to dot-notation columns (e.g. address.city, address.country). Install pandas for optimal column alignment with optional fields:

pip install "api-test-data-generator[csv]"
api-gen generate --schema user_schema.json --count 100 --output users.csv --format csv

Note: CSV does not support stdout output (--output -).
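The dot-notation flattening can be pictured with a small sketch (an illustration, not the exporter's actual code):

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dot-notation keys, e.g.
    {"address": {"city": "Springfield"}} -> {"address.city": "Springfield"}."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat
```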

Stdout output

Use --output - to print records directly to your terminal or pipe them to another tool. Supports json and ndjson formats.

# Print JSON to terminal
api-gen generate --schema user_schema.json --count 5 --output - --format json

# Pipe NDJSON into another tool
api-gen generate --schema user_schema.json --count 1000 --output - --format ndjson | my-loader

# Pretty-print with Python's json.tool
api-gen generate --schema user_schema.json --count 3 --output - | python -m json.tool

Schema field types — full reference

Basic types

{ "type": "string" }

Generates a random text string like "XkLmpQrsT".

{ "type": "string", "minLength": 5, "maxLength": 20 }

Generates a string between 5 and 20 characters long.

{ "type": "integer", "minimum": 1, "maximum": 100 }

Generates a whole number between 1 and 100, e.g. 47.

{ "type": "number", "minimum": 0.0, "maximum": 99.99, "precision": 2 }

Generates a decimal number like 34.72. precision controls decimal places.
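Conceptually, a precision-2 number is equivalent to this one-liner (an illustration of the behaviour, not the tool's internals):

```python
import random

rng = random.Random(42)
# "minimum"/"maximum" bound the range; "precision" rounds the result
value = round(rng.uniform(0.0, 99.99), 2)
```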

{ "type": "boolean" }

Generates true or false randomly.


Formatted strings

{ "type": "string", "format": "uuid" }

Generates a UUID like "550e8400-e29b-41d4-a716-446655440000".

{ "type": "string", "format": "email" }

Generates a valid email like "john.smith@example.com".

{ "type": "string", "format": "date" }

Generates a date like "2023-07-15".

{ "type": "string", "format": "date-time" }

Generates a datetime like "2023-07-15T14:32:10".

{ "type": "string", "format": "phone" }

Generates a phone number like "+1-555-867-5309".

{ "type": "object", "format": "address" }

Generates a full address as a nested object:

{
  "street": "123 Main St",
  "city": "Springfield",
  "state": "Illinois",
  "country": "United States",
  "postal_code": "62701"
}

Faker fields — use any realistic data type

The "faker" key lets you use any method from the Faker library to generate realistic data.

{ "type": "string", "faker": "name" }

"Sarah Johnson"

{ "type": "string", "faker": "first_name" }

"Sarah"

{ "type": "string", "faker": "company" }

"Acme Corp Ltd"

{ "type": "string", "faker": "job" }

"Software Engineer"

{ "type": "string", "faker": "city" }

"Nairobi"

{ "type": "string", "faker": "country" }

"Kenya"

{ "type": "string", "faker": "url" }

"https://www.example.com/page"

{ "type": "string", "faker": "sentence" }

"The quick brown fox jumps over the lazy dog."

{ "type": "string", "faker": "word" }

"discovery"

Tip: Browse all available faker providers at https://faker.readthedocs.io/en/master/providers.html

Note: If a "faker" method name is misspelled or does not exist, the tool logs a warning and falls back to a plain random string rather than crashing your generation run.


Enum — choose from a fixed list

{ "enum": ["active", "inactive", "pending"] }

Randomly picks one value from the list every time.

{ "enum": ["admin", "user", "guest"] }

"user"


Pattern — match a specific format using regex

{ "type": "string", "pattern": "[A-Z]{2}\\d{4}" }

Generates strings matching the pattern, e.g. "AB1234" (2 uppercase letters followed by 4 digits).

{ "type": "string", "pattern": "\\d{3}-\\d{2}-\\d{4}" }

"123-45-6789" (SSN-style format)


Arrays — lists of items

{
  "type": "array",
  "items": { "type": "string", "faker": "word" },
  "minItems": 1,
  "maxItems": 5
}

Generates a list of 1 to 5 random words, e.g. ["apple", "river", "quantum"].

{
  "type": "array",
  "items": { "type": "integer", "minimum": 1, "maximum": 100 },
  "minItems": 3,
  "maxItems": 3
}

Generates exactly 3 random integers, e.g. [42, 7, 88].
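Array generation amounts to picking a length between minItems and maxItems and generating that many items (a conceptual sketch, not the tool's code):

```python
import random

rng = random.Random(7)
words = ["apple", "river", "quantum", "delta"]  # stand-in for the item generator
length = rng.randint(1, 5)                      # between minItems and maxItems
tags = [rng.choice(words) for _ in range(length)]
```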


Nested objects — data inside data

{
  "type": "object",
  "properties": {
    "street": { "type": "string", "faker": "street_address" },
    "city":   { "type": "string", "faker": "city" },
    "zip":    { "type": "string", "faker": "postcode" }
  },
  "required": ["street", "city", "zip"]
}

Generates:

{
  "street": "742 Evergreen Terrace",
  "city":   "Springfield",
  "zip":    "62701"
}

Real-world schema examples

E-commerce order

Save as order_schema.json:

{
  "type": "object",
  "properties": {
    "order_id":       { "type": "string", "format": "uuid" },
    "customer_name":  { "type": "string", "faker": "name" },
    "customer_email": { "type": "string", "format": "email" },
    "status":         { "enum": ["pending", "confirmed", "shipped", "delivered", "cancelled"] },
    "total_amount":   { "type": "number", "minimum": 5.0, "maximum": 2000.0, "precision": 2 },
    "item_count":     { "type": "integer", "minimum": 1, "maximum": 20 },
    "created_at":     { "type": "string", "format": "date-time" },
    "shipping_address": {
      "type": "object",
      "properties": {
        "street":  { "type": "string", "faker": "street_address" },
        "city":    { "type": "string", "faker": "city" },
        "country": { "type": "string", "faker": "country" },
        "zip":     { "type": "string", "faker": "postcode" }
      },
      "required": ["street", "city", "country", "zip"]
    }
  },
  "required": ["order_id", "customer_email", "status", "total_amount"]
}

Generate 200 orders:

api-gen generate --schema order_schema.json --count 200 --output orders.json

Or stream them as NDJSON into a pipeline:

api-gen generate --schema order_schema.json --count 200 --output - --format ndjson | my-importer

Healthcare patient record

Save as patient_schema.yaml:

type: object
properties:
  patient_id:
    type: string
    format: uuid
  full_name:
    type: string
    faker: name
  date_of_birth:
    type: string
    format: date
  gender:
    enum:
      - male
      - female
      - other
      - prefer_not_to_say
  blood_type:
    enum:
      - A+
      - A-
      - B+
      - B-
      - AB+
      - AB-
      - O+
      - O-
  phone:
    type: string
    format: phone
  email:
    type: string
    format: email
  registered_at:
    type: string
    format: date-time
required:
  - patient_id
  - full_name
  - date_of_birth
  - blood_type

Generate 500 patient records:

# Linux / macOS
api-gen generate \
  --schema patient_schema.yaml \
  --count 500 \
  --output patients.json \
  --seed 42

# Windows PowerShell
api-gen generate `
  --schema patient_schema.yaml `
  --count 500 `
  --output patients.json `
  --seed 42

# Windows Command Prompt
api-gen generate --schema patient_schema.yaml --count 500 --output patients.json --seed 42

Product catalogue

Save as product_schema.json:

{
  "type": "object",
  "properties": {
    "product_id":   { "type": "string", "format": "uuid" },
    "name":         { "type": "string", "faker": "catch_phrase" },
    "sku":          { "type": "string", "pattern": "[A-Z]{3}-\\d{5}" },
    "category":     { "enum": ["electronics", "clothing", "food", "books", "home", "sports"] },
    "price":        { "type": "number", "minimum": 0.99, "maximum": 4999.99, "precision": 2 },
    "stock_count":  { "type": "integer", "minimum": 0, "maximum": 500 },
    "is_available": { "type": "boolean" },
    "tags": {
      "type": "array",
      "items": { "type": "string", "faker": "word" },
      "minItems": 1,
      "maxItems": 6
    },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["product_id", "name", "sku", "category", "price"]
}

Common recipes

Preview your schema before generating a large dataset

api-gen preview --schema user_schema.json
api-gen preview --schema user_schema.json --count 5 --seed 1

Generate a single record to check your schema is correct

# Linux / macOS
api-gen generate --schema user_schema.json --count 1 --output test.json && cat test.json

# Windows PowerShell
api-gen generate --schema user_schema.json --count 1 --output test.json; Get-Content test.json

# Windows Command Prompt
api-gen generate --schema user_schema.json --count 1 --output test.json && type test.json

Print records directly to the terminal (no file)

api-gen generate --schema user_schema.json --count 3 --output - --format json

Pipe NDJSON records into another process

# Linux / macOS — pipe into any tool that reads one JSON object per line
api-gen generate --schema user_schema.json --count 1000 --output - --format ndjson | ./my-loader.sh

# Pretty-print with Python
api-gen generate --schema user_schema.json --count 5 --output - | python -m json.tool

Generate data without validating (faster for large datasets)

Validation checks that every generated record matches your schema. Skipping it is safe when you trust your schema and need speed.

api-gen generate --schema user_schema.json --count 50000 --output big_dataset.json --no-validate

Generate to NDJSON for Elasticsearch / Logstash bulk import

api-gen generate \
  --schema user_schema.json \
  --count 10000 \
  --output users.ndjson \
  --format ndjson \
  --seed 42

Debug why a schema is not working

api-gen generate --schema user_schema.json --count 1 --output debug.json --verbose

Using with pytest

This is one of the most powerful ways to use the package — generate fresh test data automatically inside your tests.

Basic example

# tests/test_user_api.py
import pytest
import requests
from api_test_data_generator.generator import DataGenerator


@pytest.fixture
def user_generator():
    """Create a generator once, reuse it across tests."""
    return DataGenerator.from_file("schemas/user_schema.json", seed=42)


def test_create_user(user_generator):
    """Test that the API accepts a valid user."""
    user = user_generator.generate_record()

    response = requests.post("http://localhost:8000/users", json=user)

    assert response.status_code == 201
    assert response.json()["email"] == user["email"]


def test_create_100_users(user_generator):
    """Load test: create 100 users and verify all succeed."""
    users = user_generator.generate_bulk(100)

    for user in users:
        response = requests.post("http://localhost:8000/users", json=user)
        assert response.status_code == 201

Parametrize tests with generated data

import pytest
from api_test_data_generator.generator import DataGenerator


def get_test_users(count=5):
    gen = DataGenerator.from_file("schemas/user_schema.json", seed=99)
    return gen.generate_bulk(count)


@pytest.mark.parametrize("user", get_test_users(5))
def test_user_validation(user):
    """Run the same test for 5 different users."""
    assert "@" in user["email"]
    assert len(user["user_id"]) == 36  # UUID length

Use inline schema — no file needed

from api_test_data_generator.generator import DataGenerator


def test_order_processing():
    schema = {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "format": "uuid"},
            "amount":   {"type": "number", "minimum": 1.0, "maximum": 500.0},
            "currency": {"enum": ["USD", "EUR", "GBP", "KES", "NGN"]}
        },
        "required": ["order_id", "amount", "currency"]
    }

    gen = DataGenerator.from_dict(schema, seed=1)
    orders = gen.generate_bulk(10)

    for order in orders:
        assert order["amount"] >= 1.0
        assert order["currency"] in ["USD", "EUR", "GBP", "KES", "NGN"]

Reproducible data with seeds

By default, every run generates different random data. If you need the same data every time — for example, to compare test results across runs or share data with a colleague — use the --seed option.

# These two commands produce identical output
api-gen generate --schema user_schema.json --count 10 --output users.json --seed 42
api-gen generate --schema user_schema.json --count 10 --output users.json --seed 42

In Python:

from api_test_data_generator.generator import DataGenerator

# Both generators produce the same records
gen1 = DataGenerator.from_file("user_schema.json", seed=42)
gen2 = DataGenerator.from_file("user_schema.json", seed=42)

records1 = gen1.generate_bulk(10)
records2 = gen2.generate_bulk(10)

assert records1 == records2  # Always True

Use a different seed number to get different (but still repeatable) data:

api-gen generate --schema user_schema.json --count 10 --output dataset_a.json --seed 1
api-gen generate --schema user_schema.json --count 10 --output dataset_b.json --seed 2

Error messages and what they mean

| Error message | What it means | How to fix it |
| --- | --- | --- |
| Schema file not found | The path to your schema file is wrong | Check the file path and make sure the file exists |
| Failed to parse schema | Your JSON or YAML file has a syntax error | Validate your JSON at jsonlint.com or your YAML at yamllint.com |
| Unsupported schema format | You used a file extension other than .json, .yaml, or .yml | Rename your file to use one of those extensions |
| Record failed schema validation | A generated record does not match your schema | Check your schema for conflicting rules (e.g. minimum > maximum) |
| Cannot export an empty record list to CSV | You tried to export 0 records | Make sure --count is at least 1 |
| No generator registered for type '...' | You used an unsupported field type | See the field types reference for valid options |
| CSV format does not support stdout output | You used --output - with --format csv | Use a file path for CSV, or switch to json or ndjson |

Development setup

Follow these steps if you want to modify the code or contribute.

1. Clone or extract the project

# Linux / macOS
cd path/to/api_test_data_generator

# Windows PowerShell / Command Prompt
cd path\to\api_test_data_generator

2. Install in editable mode with all dependencies

Linux / macOS

pip install -e ".[dev,all]"

Windows (PowerShell)

cmd /c "pip install -e .[dev,all]"

Windows (Command Prompt)

pip install -e .[dev,all]

3. Run the tests

# All platforms
pytest

Run with more detail:

pytest -v

Run a specific test file:

pytest tests/test_field_types.py -v

Run without the coverage requirement:

pytest --no-cov

4. Check code style

# All platforms
flake8 api_test_data_generator/ --max-line-length=100

5. Build the package

# All platforms
pip install build
python -m build

The built files appear in dist/:

  • .whl — installable wheel (use this for pip install)
  • .tar.gz — source distribution

Project structure

api_test_data_generator/
│
├── api_test_data_generator/     ← the actual package code
│   ├── generator/
│   │   ├── core.py              ← DataGenerator class (main entry point)
│   │   ├── schema_loader.py     ← reads .json and .yaml schema files
│   │   ├── field_types.py       ← one class per field type (string, integer, etc.)
│   │   ├── validators.py        ← checks generated data matches the schema
│   │   └── exceptions.py        ← custom error classes
│   │
│   ├── exporters/
│   │   ├── json_exporter.py     ← saves records as a JSON array
│   │   ├── ndjson_exporter.py   ← saves records as NDJSON (one object per line)
│   │   └── csv_exporter.py      ← saves records as CSV (flattens nested data)
│   │
│   ├── cli/
│   │   └── main.py              ← "api-gen generate" and "api-gen preview" commands
│   │
│   └── utils/
│       ├── seed_manager.py      ← manages the random seed globally
│       └── randomizer.py        ← helper functions for random data
│
├── tests/                       ← 169 tests, ~95% coverage
├── examples/
│   ├── user_schema.json         ← example user schema
│   └── order_schema.yaml        ← example order schema
├── pyproject.toml               ← package configuration
└── README.md

Platform quick reference

| Task | Linux / macOS | Windows PowerShell | Windows CMD |
| --- | --- | --- | --- |
| Line continuation | \ | ` (backtick) | ^ |
| Install with extras | ".[dev,all]" | cmd /c "pip install -e .[dev,all]" | .[dev,all] |
| Path separator | / | \ or / (both work) | \ |
| Run pytest | pytest | pytest | pytest |
| Run flake8 | flake8 pkg/ | flake8 pkg/ | flake8 pkg/ |
| Build package | python -m build | python -m build | python -m build |

FAQ

Q: Why not just use Faker directly? Faker is great for generating individual values. This tool wraps Faker (and other generators) to produce structured multi-field records that match an API contract, handle optional fields automatically, validate output against your schema, and export to JSON/NDJSON/CSV — all with a single command. See the full comparison above.

Q: How is this different from just writing test data by hand? Writing data by hand is fine for 5–10 records. This tool is useful when you need hundreds or thousands of records, need them to be realistic (real-looking names, emails, UUIDs), or need the same data to be reproducible across test runs.

Q: Do I need to know JSON Schema to use this? Not deeply. The examples in this README cover the most common cases. You can copy and adapt them without knowing the full JSON Schema specification.

Q: Will the same seed always produce the same data? Yes, as long as your schema does not change and you use the same version of the tool. If you update your schema, the same seed may produce different records.

Q: My schema has a "faker" key but I am getting generic strings instead of names. Why? The faker method name is probably misspelled. Check the full list of available methods at https://faker.readthedocs.io/en/master/providers.html. When a method name is not recognised, the tool logs a warning message and falls back to a plain string so your generation run is not interrupted.

Q: What is the difference between JSON and NDJSON output? JSON produces a single array file ([{...}, {...}]) — good for loading the whole dataset at once. NDJSON writes one JSON object per line — good for streaming, log ingestion tools (Elasticsearch, Logstash, Kafka consumers), and processing large datasets without loading everything into memory.

Q: Can I stream output to another process without writing a file? Yes. Use --output - with --format json or --format ndjson. The records are printed to stdout and can be piped to any tool. CSV does not support stdout output.

Q: Can I use this with any testing framework? Yes. It returns plain Python dicts and lists, so it works with pytest, unittest, or any other framework. You can also use it to pre-generate data files and load them separately.

Q: How fast is it? Generating 10 000 records takes under 2 seconds on a standard laptop.


License

MIT — free to use, modify, and distribute.
