api-test-data-generator

Generate structured, customizable test data for API testing.
A tool that automatically creates realistic fake data for testing your APIs — no more writing test data by hand.
You describe what your data should look like (using a simple schema file), and this tool generates as many records as you need, ready to use in your tests.
Table of Contents
- What does it do?
- Why not just use Faker directly?
- How it works — the big picture
- Requirements
- Installation
- Step 1 — Create a schema file
- Step 2 — Generate data from the terminal
- Step 3 — Preview your schema output instantly
- Step 4 — Use it inside Python or pytest
- All CLI options explained
- Output formats
- Schema field types — full reference
- Real-world schema examples
- Common recipes
- Using with pytest
- Reproducible data with seeds
- Error messages and what they mean
- Development setup
- Project structure
- Platform quick reference
- FAQ
What does it do?
Imagine you are building a user registration API and want to test it with 1000 different users. Writing those users by hand would take hours. With this tool you just:
- Describe what a user looks like (name, email, age, etc.) in a schema file
- Run one command
- Get a ready-to-use JSON, NDJSON, or CSV file with 1000 realistic users
Why not just use Faker directly?
Faker is a great library for generating individual fake values. But when testing an API, you need more than random values — you need structured records that match your API contract, can be exported to a file, and behave consistently across test runs. Doing that with raw Faker requires writing glue code every time.
Here is what that looks like in practice:
With raw Faker — you write this for every project:
from faker import Faker
import json, csv, random
fake = Faker()
random.seed(42)
Faker.seed(42) # easy to forget; causes non-reproducible tests if missed
users = []
for _ in range(1000):
    include_age = random.random() < 0.8  # optional fields need manual handling
    user = {
        "user_id": str(fake.uuid4()),
        "name": fake.name(),
        "email": fake.email(),
        "age": random.randint(18, 60) if include_age else None,
    }
    users.append(user)
# Flatten nested objects for CSV yourself
# Collect all fieldnames across all records yourself (or columns go missing)
# Write the export boilerplate yourself
with open("users.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=users[0].keys())  # bug: misses optional fields
    writer.writeheader()
    writer.writerows(users)
With this tool — define the schema once, run one command:
api-gen generate --schema user_schema.json --count 1000 --output users.csv --format csv --seed 42
The specific things this tool handles for you:
| Problem | Raw Faker | This tool |
|---|---|---|
| Structured multi-field records | Write a loop for every project | Defined once in a schema file |
| Optional fields | `if random.random() < 0.8` everywhere | Automatic — fields not in `required` appear 80% of the time |
| Reproducible output | Must seed both `random` and Faker separately | `--seed` handles both correctly |
| Schema validation | Write `jsonschema` calls yourself | Built in — validates every record by default |
| CSV with nested objects | Flatten and collect all fieldnames manually | Automatic dot-notation flattening |
| NDJSON export | Write the loop yourself | `--format ndjson` |
| No-code usage | Must write Python | `api-gen generate` from the terminal |
| Quick schema iteration | Generate, print, adjust, repeat in code | `api-gen preview --schema ...` |
If you are already comfortable with Faker and only need one or two fields, use Faker directly. If you need full records that match an API contract, reproducible datasets, or file export, this tool saves the boilerplate.
How it works — the big picture
Your schema file              This tool              Output file
(what data looks like)   →   (generates records)   →   (users.json / users.ndjson / users.csv)

user_schema.json   →   api-gen generate   →   users.json
A schema file is just a description of your data. For example:
"I want records that each have a user ID (UUID format), a name, an email address, and an age between 18 and 60."
The tool reads that description and creates as many records as you ask for.
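To make the big picture concrete, here is a minimal, purely illustrative sketch of what a schema-driven generator does internally. This is not the tool's actual implementation — it only supports a few field kinds and uses the standard library — but it shows the core idea: walk the schema's properties and produce one value per field.

```python
import random
import string
import uuid

# Illustrative sketch only — NOT the tool's real implementation.
def generate_record(schema, rng=random):
    record = {}
    for name, spec in schema.get("properties", {}).items():
        if spec.get("format") == "uuid":
            record[name] = str(uuid.uuid4())
        elif spec.get("type") == "integer":
            record[name] = rng.randint(spec.get("minimum", 0), spec.get("maximum", 100))
        elif spec.get("type") == "boolean":
            record[name] = rng.random() < 0.5
        else:  # fall back to a random string
            record[name] = "".join(rng.choices(string.ascii_letters, k=8))
    return record

schema = {"properties": {"id": {"type": "string", "format": "uuid"},
                         "age": {"type": "integer", "minimum": 18, "maximum": 60}}}
print(generate_record(schema))
```

The real tool adds Faker integration, optional fields, validation, and exporters on top of this basic walk.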
Requirements
- Python 3.11 or higher
- pip (comes with Python)
Check your Python version by running:
python --version
Installation
Linux / macOS
# Basic install
pip install api-test-data-generator
# Full install — includes pandas (faster CSV) and rich (prettier terminal output)
pip install "api-test-data-generator[all]"
Windows (PowerShell)
# Basic install
pip install api-test-data-generator
# Full install — use cmd /c to avoid a PowerShell bracket issue
cmd /c "pip install api-test-data-generator[all]"
Windows (Command Prompt)
pip install api-test-data-generator[all]
After installing, verify it worked:
api-gen --help
You should see a help message listing the available commands (generate and preview).
Step 1 — Create a schema file
A schema file tells the tool what your data should look like. You can write it in JSON or YAML — use whichever you prefer.
Create a file called user_schema.json:
{
  "type": "object",
  "properties": {
    "user_id": { "type": "string", "format": "uuid" },
    "name": { "type": "string", "faker": "name" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 18, "maximum": 60 },
    "is_active": { "type": "boolean" }
  },
  "required": ["user_id", "email"]
}
What each line means:
"format": "uuid"— generate a unique ID like550e8400-e29b-41d4-a716-446655440000"faker": "name"— generate a realistic full name like"Sarah Johnson""format": "email"— generate a valid email like"sarah.johnson@example.com""minimum": 18, "maximum": 60— age will always be between 18 and 60"required": [...]— these fields will always be present in every record
Step 2 — Generate data from the terminal
Once you have a schema file, run this command to generate data.
Linux / macOS
api-gen generate \
--schema user_schema.json \
--count 100 \
--output users.json
Windows (PowerShell)
api-gen generate `
--schema user_schema.json `
--count 100 `
--output users.json
Windows (Command Prompt)
api-gen generate --schema user_schema.json --count 100 --output users.json
This creates a users.json file with 100 user records. Open it and you will see something like:
[
  {
    "user_id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Sarah Johnson",
    "email": "sarah.johnson@example.com",
    "age": 34,
    "is_active": true
  },
  {
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "name": "Michael Torres",
    "email": "m.torres@example.org",
    "age": 27,
    "is_active": false
  }
]
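Output like the array above is plain JSON, so it can be consumed with the standard library and no extra dependencies. The snippet below parses the two sample records inline; with a real file you would use `json.load(open("users.json"))` instead.

```python
import json

# The sample records from above, embedded as a string for a self-contained demo.
sample = '''[
  {"user_id": "550e8400-e29b-41d4-a716-446655440000",
   "name": "Sarah Johnson", "email": "sarah.johnson@example.com",
   "age": 34, "is_active": true},
  {"user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
   "name": "Michael Torres", "email": "m.torres@example.org",
   "age": 27, "is_active": false}
]'''

users = json.loads(sample)           # with a real file: json.load(open("users.json"))
print(len(users), users[0]["name"])  # → 2 Sarah Johnson
```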
Step 3 — Preview your schema output instantly
Before generating a large dataset, use api-gen preview to check that your schema produces the records you expect — without saving any files.
api-gen preview --schema user_schema.json
By default it shows 3 records. Use --count to see up to 10:
api-gen preview --schema user_schema.json --count 5
Use --seed to get the same preview output every time (useful when iterating on a schema):
api-gen preview --schema user_schema.json --count 3 --seed 42
If rich is installed (via pip install "api-test-data-generator[cli]"), the output is syntax-highlighted. Otherwise it falls back to plain JSON — either way, nothing is written to disk.
Step 4 — Use it inside Python or pytest
You can also use the tool directly in your Python code without going to the terminal.
Generate a single record
from api_test_data_generator.generator import DataGenerator
# Load your schema and create a generator
gen = DataGenerator.from_file("user_schema.json")
# Generate one record
user = gen.generate_record()
print(user)
# {'user_id': '550e8400-...', 'name': 'Sarah Johnson', 'email': 'sarah@example.com', 'age': 34}
Generate many records at once
from api_test_data_generator.generator import DataGenerator
gen = DataGenerator.from_file("user_schema.json")
# Generate 500 records
users = gen.generate_bulk(500)
print(f"Generated {len(users)} users")
print(users[0]) # print the first one
Define the schema directly in Python (no file needed)
from api_test_data_generator.generator import DataGenerator
schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "format": "uuid"},
        "amount": {"type": "number", "minimum": 1.0, "maximum": 999.99},
        "status": {"enum": ["pending", "paid", "cancelled"]}
    },
    "required": ["order_id", "amount", "status"]
}
gen = DataGenerator.from_dict(schema)
order = gen.generate_record()
print(order)
# {'order_id': '...', 'amount': 47.83, 'status': 'paid'}
Export to a file from Python
from api_test_data_generator.generator import DataGenerator
from api_test_data_generator.exporters import export_json, export_csv, export_ndjson
gen = DataGenerator.from_file("user_schema.json", seed=42)
users = gen.generate_bulk(1000)
# Save as JSON
export_json(users, "output/users.json")
# Save as NDJSON (one record per line — great for log pipelines and streaming)
export_ndjson(users, "output/users.ndjson")
# Save as CSV
export_csv(users, "output/users.csv")
All CLI options explained
api-gen generate
api-gen generate [OPTIONS]
| Option | What it does | Required? | Default |
|---|---|---|---|
| `--schema PATH` | Path to your schema file (`.json` or `.yaml`) | Yes | — |
| `--count INT` | How many records to generate | No | 1 |
| `--output PATH` | Output file path, or `-` to print to stdout | Yes | — |
| `--format TEXT` | File format: `json`, `ndjson`, or `csv` | No | `json` |
| `--seed INT` | A number to make output repeatable (same seed = same data every time) | No | Random |
| `--no-validate` | Skip checking the output against the schema | No | Validates by default |
| `--verbose` | Show detailed logs while generating | No | Off |
api-gen preview
api-gen preview [OPTIONS]
| Option | What it does | Required? | Default |
|---|---|---|---|
| `--schema PATH` | Path to your schema file (`.json` or `.yaml`) | Yes | — |
| `--count INT` | How many records to preview (1–10) | No | 3 |
| `--seed INT` | A number to make the preview repeatable | No | Random |
| `--verbose` | Show detailed logs | No | Off |
Output formats
JSON (default)
Records are written as a JSON array — one file, all records.
api-gen generate --schema user_schema.json --count 100 --output users.json
# or explicitly:
api-gen generate --schema user_schema.json --count 100 --output users.json --format json
NDJSON (Newline-Delimited JSON)
Each record is written as a separate JSON object on its own line. This format is widely used for log ingestion, streaming pipelines (Kafka, Logstash, Elasticsearch bulk API), and tools that process one record at a time.
api-gen generate --schema user_schema.json --count 100 --output users.ndjson --format ndjson
Output looks like:
{"user_id": "550e8400-...", "name": "Sarah Johnson", "email": "sarah@example.com"}
{"user_id": "6ba7b810-...", "name": "Michael Torres", "email": "m.torres@example.org"}
CSV
Records are written as a comma-separated table. Nested objects are flattened to dot-notation columns (e.g. address.city, address.country). Install pandas for optimal column alignment with optional fields:
pip install "api-test-data-generator[csv]"
api-gen generate --schema user_schema.json --count 100 --output users.csv --format csv
Note: CSV does not support stdout output (--output -).
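The dot-notation flattening described above can be pictured with a short sketch. This is illustrative only — not the tool's actual CSV exporter — but it shows how a nested record becomes flat CSV columns:

```python
# Illustrative sketch of dot-notation flattening for CSV export.
# Not the tool's actual exporter code.
def flatten(record, prefix=""):
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, joining keys with a dot
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

row = flatten({"name": "Sarah", "address": {"city": "Springfield", "country": "US"}})
print(row)  # → {'name': 'Sarah', 'address.city': 'Springfield', 'address.country': 'US'}
```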
Stdout output
Use --output - to print records directly to your terminal or pipe them to another tool. Supports json and ndjson formats.
# Print JSON to terminal
api-gen generate --schema user_schema.json --count 5 --output - --format json
# Pipe NDJSON into another tool
api-gen generate --schema user_schema.json --count 1000 --output - --format ndjson | my-loader
# Pretty-print with Python's json.tool
api-gen generate --schema user_schema.json --count 3 --output - | python -m json.tool
Schema field types — full reference
Basic types
{ "type": "string" }
Generates a random text string like "XkLmpQrsT".
{ "type": "string", "minLength": 5, "maxLength": 20 }
Generates a string between 5 and 20 characters long.
{ "type": "integer", "minimum": 1, "maximum": 100 }
Generates a whole number between 1 and 100, e.g. 47.
{ "type": "number", "minimum": 0.0, "maximum": 99.99, "precision": 2 }
Generates a decimal number like 34.72. precision controls decimal places.
{ "type": "boolean" }
Generates true or false randomly.
Formatted strings
{ "type": "string", "format": "uuid" }
Generates a UUID like "550e8400-e29b-41d4-a716-446655440000".
{ "type": "string", "format": "email" }
Generates a valid email like "john.smith@example.com".
{ "type": "string", "format": "date" }
Generates a date like "2023-07-15".
{ "type": "string", "format": "date-time" }
Generates a datetime like "2023-07-15T14:32:10".
{ "type": "string", "format": "phone" }
Generates a phone number like "+1-555-867-5309".
{ "type": "object", "format": "address" }
Generates a full address as a nested object:
{
  "street": "123 Main St",
  "city": "Springfield",
  "state": "Illinois",
  "country": "United States",
  "postal_code": "62701"
}
Faker fields — use any realistic data type
The "faker" key lets you use any method from the Faker library to generate realistic data.
{ "type": "string", "faker": "name" }
→ "Sarah Johnson"
{ "type": "string", "faker": "first_name" }
→ "Sarah"
{ "type": "string", "faker": "company" }
→ "Acme Corp Ltd"
{ "type": "string", "faker": "job" }
→ "Software Engineer"
{ "type": "string", "faker": "city" }
→ "Nairobi"
{ "type": "string", "faker": "country" }
→ "Kenya"
{ "type": "string", "faker": "url" }
→ "https://www.example.com/page"
{ "type": "string", "faker": "sentence" }
→ "The quick brown fox jumps over the lazy dog."
{ "type": "string", "faker": "word" }
→ "discovery"
Tip: Browse all available faker providers at https://faker.readthedocs.io/en/master/providers.html
Note: If a `"faker"` method name is misspelled or does not exist, the tool logs a warning and falls back to a plain random string rather than crashing your generation run.
Enum — choose from a fixed list
{ "enum": ["active", "inactive", "pending"] }
Randomly picks one value from the list every time.
{ "enum": ["admin", "user", "guest"] }
→ "user"
Pattern — match a specific format using regex
{ "type": "string", "pattern": "[A-Z]{2}\\d{4}" }
Generates strings matching the pattern, e.g. "AB1234" (2 uppercase letters followed by 4 digits).
{ "type": "string", "pattern": "\\d{3}-\\d{2}-\\d{4}" }
→ "123-45-6789" (SSN-style format)
Arrays — lists of items
{
  "type": "array",
  "items": { "type": "string", "faker": "word" },
  "minItems": 1,
  "maxItems": 5
}
Generates a list of 1 to 5 random words, e.g. ["apple", "river", "quantum"].
{
  "type": "array",
  "items": { "type": "integer", "minimum": 1, "maximum": 100 },
  "minItems": 3,
  "maxItems": 3
}
Generates exactly 3 random integers, e.g. [42, 7, 88].
Nested objects — data inside data
{
  "type": "object",
  "properties": {
    "street": { "type": "string", "faker": "street_address" },
    "city": { "type": "string", "faker": "city" },
    "zip": { "type": "string", "faker": "postcode" }
  },
  "required": ["street", "city", "zip"]
}
Generates:
{
  "street": "742 Evergreen Terrace",
  "city": "Springfield",
  "zip": "62701"
}
Real-world schema examples
E-commerce order
Save as order_schema.json:
{
  "type": "object",
  "properties": {
    "order_id": { "type": "string", "format": "uuid" },
    "customer_name": { "type": "string", "faker": "name" },
    "customer_email": { "type": "string", "format": "email" },
    "status": { "enum": ["pending", "confirmed", "shipped", "delivered", "cancelled"] },
    "total_amount": { "type": "number", "minimum": 5.0, "maximum": 2000.0, "precision": 2 },
    "item_count": { "type": "integer", "minimum": 1, "maximum": 20 },
    "created_at": { "type": "string", "format": "date-time" },
    "shipping_address": {
      "type": "object",
      "properties": {
        "street": { "type": "string", "faker": "street_address" },
        "city": { "type": "string", "faker": "city" },
        "country": { "type": "string", "faker": "country" },
        "zip": { "type": "string", "faker": "postcode" }
      },
      "required": ["street", "city", "country", "zip"]
    }
  },
  "required": ["order_id", "customer_email", "status", "total_amount"]
}
Generate 200 orders:
api-gen generate --schema order_schema.json --count 200 --output orders.json
Or stream them as NDJSON into a pipeline:
api-gen generate --schema order_schema.json --count 200 --output - --format ndjson | my-importer
Healthcare patient record
Save as patient_schema.yaml:
type: object
properties:
  patient_id:
    type: string
    format: uuid
  full_name:
    type: string
    faker: name
  date_of_birth:
    type: string
    format: date
  gender:
    enum:
      - male
      - female
      - other
      - prefer_not_to_say
  blood_type:
    enum:
      - A+
      - A-
      - B+
      - B-
      - AB+
      - AB-
      - O+
      - O-
  phone:
    type: string
    format: phone
  email:
    type: string
    format: email
  registered_at:
    type: string
    format: date-time
required:
  - patient_id
  - full_name
  - date_of_birth
  - blood_type
Generate 500 patient records:
# Linux / macOS
api-gen generate \
--schema patient_schema.yaml \
--count 500 \
--output patients.json \
--seed 42
# Windows PowerShell
api-gen generate `
--schema patient_schema.yaml `
--count 500 `
--output patients.json `
--seed 42
# Windows Command Prompt
api-gen generate --schema patient_schema.yaml --count 500 --output patients.json --seed 42
Product catalogue
Save as product_schema.json:
{
  "type": "object",
  "properties": {
    "product_id": { "type": "string", "format": "uuid" },
    "name": { "type": "string", "faker": "catch_phrase" },
    "sku": { "type": "string", "pattern": "[A-Z]{3}-\\d{5}" },
    "category": { "enum": ["electronics", "clothing", "food", "books", "home", "sports"] },
    "price": { "type": "number", "minimum": 0.99, "maximum": 4999.99, "precision": 2 },
    "stock_count": { "type": "integer", "minimum": 0, "maximum": 500 },
    "is_available": { "type": "boolean" },
    "tags": {
      "type": "array",
      "items": { "type": "string", "faker": "word" },
      "minItems": 1,
      "maxItems": 6
    },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["product_id", "name", "sku", "category", "price"]
}
Common recipes
Preview your schema before generating a large dataset
api-gen preview --schema user_schema.json
api-gen preview --schema user_schema.json --count 5 --seed 1
Generate a single record to check your schema is correct
# Linux / macOS
api-gen generate --schema user_schema.json --count 1 --output test.json && cat test.json
# Windows PowerShell
api-gen generate --schema user_schema.json --count 1 --output test.json; Get-Content test.json
# Windows Command Prompt
api-gen generate --schema user_schema.json --count 1 --output test.json && type test.json
Print records directly to the terminal (no file)
api-gen generate --schema user_schema.json --count 3 --output - --format json
Pipe NDJSON records into another process
# Linux / macOS — pipe into any tool that reads one JSON object per line
api-gen generate --schema user_schema.json --count 1000 --output - --format ndjson | ./my-loader.sh
# Pretty-print with Python
api-gen generate --schema user_schema.json --count 5 --output - | python -m json.tool
Generate data without validating (faster for large datasets)
Validation checks that every generated record matches your schema. Skipping it is safe when you trust your schema and need speed.
api-gen generate --schema user_schema.json --count 50000 --output big_dataset.json --no-validate
Generate to NDJSON for Elasticsearch / Logstash bulk import
api-gen generate \
--schema user_schema.json \
--count 10000 \
--output users.ndjson \
--format ndjson \
--seed 42
Debug why a schema is not working
api-gen generate --schema user_schema.json --count 1 --output debug.json --verbose
Using with pytest
This is one of the most powerful ways to use the package — generate fresh test data automatically inside your tests.
Basic example
# tests/test_user_api.py
import pytest
import requests
from api_test_data_generator.generator import DataGenerator

@pytest.fixture
def user_generator():
    """Create a generator once, reuse it across tests."""
    return DataGenerator.from_file("schemas/user_schema.json", seed=42)

def test_create_user(user_generator):
    """Test that the API accepts a valid user."""
    user = user_generator.generate_record()
    response = requests.post("http://localhost:8000/users", json=user)
    assert response.status_code == 201
    assert response.json()["email"] == user["email"]

def test_create_100_users(user_generator):
    """Load test: create 100 users and verify all succeed."""
    users = user_generator.generate_bulk(100)
    for user in users:
        response = requests.post("http://localhost:8000/users", json=user)
        assert response.status_code == 201
Parametrize tests with generated data
import pytest
from api_test_data_generator.generator import DataGenerator

def get_test_users(count=5):
    gen = DataGenerator.from_file("schemas/user_schema.json", seed=99)
    return gen.generate_bulk(count)

@pytest.mark.parametrize("user", get_test_users(5))
def test_user_validation(user):
    """Run the same test for 5 different users."""
    assert "@" in user["email"]
    assert len(user["user_id"]) == 36  # UUID length
Use inline schema — no file needed
from api_test_data_generator.generator import DataGenerator

def test_order_processing():
    schema = {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "format": "uuid"},
            "amount": {"type": "number", "minimum": 1.0, "maximum": 500.0},
            "currency": {"enum": ["USD", "EUR", "GBP", "KES", "NGN"]}
        },
        "required": ["order_id", "amount", "currency"]
    }
    gen = DataGenerator.from_dict(schema, seed=1)
    orders = gen.generate_bulk(10)
    for order in orders:
        assert order["amount"] >= 1.0
        assert order["currency"] in ["USD", "EUR", "GBP", "KES", "NGN"]
Reproducible data with seeds
By default, every run generates different random data. If you need the same data every time — for example, to compare test results across runs or share data with a colleague — use the --seed option.
# These two commands produce identical output
api-gen generate --schema user_schema.json --count 10 --output users.json --seed 42
api-gen generate --schema user_schema.json --count 10 --output users.json --seed 42
In Python:
from api_test_data_generator.generator import DataGenerator
# Both generators produce the same records
gen1 = DataGenerator.from_file("user_schema.json", seed=42)
gen2 = DataGenerator.from_file("user_schema.json", seed=42)
records1 = gen1.generate_bulk(10)
records2 = gen2.generate_bulk(10)
assert records1 == records2 # Always True
Use a different seed number to get different (but still repeatable) data:
api-gen generate --schema user_schema.json --count 10 --output dataset_a.json --seed 1
api-gen generate --schema user_schema.json --count 10 --output dataset_b.json --seed 2
Error messages and what they mean
| Error message | What it means | How to fix it |
|---|---|---|
| `Schema file not found` | The path to your schema file is wrong | Check the file path and make sure the file exists |
| `Failed to parse schema` | Your JSON or YAML file has a syntax error | Validate your JSON at jsonlint.com or your YAML at yamllint.com |
| `Unsupported schema format` | You used a file extension other than `.json`, `.yaml`, or `.yml` | Rename your file to use one of those extensions |
| `Record failed schema validation` | A generated record does not match your schema | Check your schema for conflicting rules (e.g. `minimum` > `maximum`) |
| `Cannot export an empty record list to CSV` | You tried to export 0 records | Make sure `--count` is at least 1 |
| `No generator registered for type '...'` | You used an unsupported field type | See the field types reference for valid options |
| `CSV format does not support stdout output` | You used `--output -` with `--format csv` | Use a file path for CSV, or switch to `json` or `ndjson` |
Development setup
Follow these steps if you want to modify the code or contribute.
1. Clone or extract the project
# Linux / macOS
cd path/to/api_test_data_generator
# Windows PowerShell / Command Prompt
cd path\to\api_test_data_generator
2. Install in editable mode with all dependencies
Linux / macOS
pip install -e ".[dev,all]"
Windows (PowerShell)
cmd /c "pip install -e .[dev,all]"
Windows (Command Prompt)
pip install -e .[dev,all]
3. Run the tests
# All platforms
pytest
Run with more detail:
pytest -v
Run a specific test file:
pytest tests/test_field_types.py -v
Run without the coverage requirement:
pytest --no-cov
4. Check code style
# All platforms
flake8 api_test_data_generator/ --max-line-length=100
5. Build the package
# All platforms
pip install build
python -m build
The built files appear in dist/:
- `.whl` — installable wheel (use this for `pip install`)
- `.tar.gz` — source distribution
Project structure
api_test_data_generator/
│
├── api_test_data_generator/ ← the actual package code
│ ├── generator/
│ │ ├── core.py ← DataGenerator class (main entry point)
│ │ ├── schema_loader.py ← reads .json and .yaml schema files
│ │ ├── field_types.py ← one class per field type (string, integer, etc.)
│ │ ├── validators.py ← checks generated data matches the schema
│ │ └── exceptions.py ← custom error classes
│ │
│ ├── exporters/
│ │ ├── json_exporter.py ← saves records as a JSON array
│ │ ├── ndjson_exporter.py ← saves records as NDJSON (one object per line)
│ │ └── csv_exporter.py ← saves records as CSV (flattens nested data)
│ │
│ ├── cli/
│ │ └── main.py ← "api-gen generate" and "api-gen preview" commands
│ │
│ └── utils/
│ ├── seed_manager.py ← manages the random seed globally
│ └── randomizer.py ← helper functions for random data
│
├── tests/ ← 169 tests, ~95% coverage
├── examples/
│ ├── user_schema.json ← example user schema
│ └── order_schema.yaml ← example order schema
├── pyproject.toml ← package configuration
└── README.md
Platform quick reference
| Task | Linux / macOS | Windows PowerShell | Windows CMD |
|---|---|---|---|
| Line continuation | `\` | `` ` `` (backtick) | `^` |
| Install with extras | `".[dev,all]"` | `cmd /c "pip install -e .[dev,all]"` | `.[dev,all]` |
| Path separator | `/` | `\` or `/` (both work) | `\` |
| Run pytest | `pytest` | `pytest` | `pytest` |
| Run flake8 | `flake8 pkg/` | `flake8 pkg/` | `flake8 pkg/` |
| Build package | `python -m build` | `python -m build` | `python -m build` |
FAQ
Q: Why not just use Faker directly? Faker is great for generating individual values. This tool wraps Faker (and other generators) to produce structured multi-field records that match an API contract, handle optional fields automatically, validate output against your schema, and export to JSON/NDJSON/CSV — all with a single command. See the full comparison above.
Q: How is this different from just writing test data by hand? Writing data by hand is fine for 5–10 records. This tool is useful when you need hundreds or thousands of records, need them to be realistic (real-looking names, emails, UUIDs), or need the same data to be reproducible across test runs.
Q: Do I need to know JSON Schema to use this? Not deeply. The examples in this README cover the most common cases. You can copy and adapt them without knowing the full JSON Schema specification.
Q: Will the same seed always produce the same data? Yes, as long as your schema does not change and you use the same version of the tool. If you update your schema, the same seed may produce different records.
Q: My schema has a "faker" key but I am getting generic strings instead of names. Why?
The faker method name is probably misspelled. Check the full list of available methods at https://faker.readthedocs.io/en/master/providers.html. When a method name is not recognised, the tool logs a warning message and falls back to a plain string so your generation run is not interrupted.
Q: What is the difference between JSON and NDJSON output?
JSON produces a single array file ([{...}, {...}]) — good for loading the whole dataset at once. NDJSON writes one JSON object per line — good for streaming, log ingestion tools (Elasticsearch, Logstash, Kafka consumers), and processing large datasets without loading everything into memory.
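The streaming advantage of NDJSON can be shown in a few lines: each line is a complete JSON document, so records are parsed one at a time instead of loading the whole dataset. The snippet uses an in-memory string; with a real file you would iterate over `open("users.ndjson")` the same way.

```python
import io
import json

# Two NDJSON records — one complete JSON object per line.
ndjson = '{"user_id": "1", "name": "Sarah"}\n{"user_id": "2", "name": "Michael"}\n'

names = []
for line in io.StringIO(ndjson):  # with a real file: open("users.ndjson")
    record = json.loads(line)     # each line parses independently
    names.append(record["name"])

print(names)  # → ['Sarah', 'Michael']
```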
Q: Can I stream output to another process without writing a file?
Yes. Use --output - with --format json or --format ndjson. The records are printed to stdout and can be piped to any tool. CSV does not support stdout output.
Q: Can I use this with any testing framework? Yes. It returns plain Python dicts and lists, so it works with pytest, unittest, or any other framework. You can also use it to pre-generate data files and load them separately.
Q: How fast is it? Generating 10 000 records takes under 2 seconds on a standard laptop.
License
MIT — free to use, modify, and distribute.