forgery

License: MIT. Requires Python 3.11+.

Fake data at the speed of Rust.

A high-performance fake data generation library for Python, powered by Rust. Designed to be 50-100x faster than Faker for batch operations.

Installation

pip install forgery

From source (for development)

git clone https://github.com/williajm/forgery.git
cd forgery
pip install maturin
maturin develop --release

Quick Start

from forgery import fake

# Generate 10,000 names in one fast call
names = fake.names(10_000)

# Single values work too
email = fake.email()
name = fake.name()

# Deterministic output with seeding
fake.seed(42)
data1 = fake.names(100)
fake.seed(42)
data2 = fake.names(100)
assert data1 == data2

Features

  • Batch-first design: Generate thousands of values in a single call
  • 50-100x faster than Faker for batch operations
  • Multi-locale support: 7 locales with locale-specific data
  • Deterministic seeding: Reproducible output for testing
  • Type hints: Full type stub support for IDE autocompletion
  • Familiar API: Method names match Faker for easy migration

Locale Support

forgery supports 7 locales with locale-specific names, addresses, phone numbers, and more:

Locale | Language | Country
en_US | English | United States (default)
en_GB | English | United Kingdom
de_DE | German | Germany
fr_FR | French | France
es_ES | Spanish | Spain
it_IT | Italian | Italy
ja_JP | Japanese | Japan

from forgery import Faker

# Default locale is en_US
fake = Faker()
fake.names(5)  # American names

# Use a different locale
german = Faker("de_DE")
german.names(5)  # German names

japanese = Faker("ja_JP")
japanese.addresses(3)  # Japanese addresses with prefecture

Each locale provides:

  • Names: First names, last names, and full names in the local language
  • Addresses: Cities, regions/states, postal codes in the correct format
  • Phone numbers: Country-specific formats and country codes
  • Companies: Local company names and job titles
  • Colors: Color names in the local language
  • SSN/National IDs: Country-specific formats (US SSN, UK NINO, DE Steuer-ID, etc.)
  • License plates: Country-specific formats

API

Module-level functions (use default instance)

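# Import the single-value functions as well; they are called further down
from forgery import seed, names, emails, integers, uuids, name, email, integer, uuid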

seed(42)  # Seed for reproducibility

# Batch generation (fast path)
names(1000)           # list[str] of full names
emails(1000)          # list[str] of email addresses
integers(1000, 0, 100)  # list[int] in range
uuids(1000)           # list[str] of UUIDv4

# Single values
name()                # str
email()               # str
integer(0, 100)       # int
uuid()                # str

Faker class (independent instances)

from forgery import Faker

# Each instance has its own RNG state
fake1 = Faker()
fake2 = Faker()

fake1.seed(42)
fake2.seed(99)

# Generate independently
fake1.names(100)
fake2.emails(100)

Available Generators

Names & Identity

Batch | Single | Description
names(n) | name() | Full names (first + last)
first_names(n) | first_name() | First names
last_names(n) | last_name() | Last names

Contact Information

Batch | Single | Description
emails(n) | email() | Email addresses
safe_emails(n) | safe_email() | Safe domain emails (@example.com, etc.)
free_emails(n) | free_email() | Free provider emails (@gmail.com, etc.)
phone_numbers(n) | phone_number() | Phone numbers in (XXX) XXX-XXXX format

Numbers & Identifiers

Batch | Single | Description
integers(n, min, max) | integer(min, max) | Random integers in range
floats(n, min, max) | float_(min, max) | Random floats in range. Note: float_ avoids shadowing Python's float builtin
uuids(n) | uuid() | UUID v4 strings
md5s(n) | md5() | Random 32-char hex strings (MD5-like format, not cryptographic hashes)
sha256s(n) | sha256() | Random 64-char hex strings (SHA256-like format, not cryptographic hashes)

Dates & Times

Batch | Single | Description
dates(n, start, end) | date(start, end) | Random dates (YYYY-MM-DD)
datetimes(n, start, end) | datetime_(start, end) | Random datetimes (ISO 8601). Note: datetime_ avoids shadowing Python's datetime module
dates_of_birth(n, min_age, max_age) | date_of_birth(min_age, max_age) | Birth dates for given age range

Addresses

Batch | Single | Description
street_addresses(n) | street_address() | Street addresses (e.g., "123 Main Street")
cities(n) | city() | City names
states(n) | state() | State names
countries(n) | country() | Country names
zip_codes(n) | zip_code() | ZIP codes (5 or 9 digit)
addresses(n) | address() | Full addresses

Company & Business

Batch | Single | Description
companies(n) | company() | Company names
jobs(n) | job() | Job titles
catch_phrases(n) | catch_phrase() | Business catch phrases

Network

Batch | Single | Description
urls(n) | url() | URLs with https://
domain_names(n) | domain_name() | Domain names
ipv4s(n) | ipv4() | IPv4 addresses
ipv6s(n) | ipv6() | IPv6 addresses
mac_addresses(n) | mac_address() | MAC addresses

Web & HTML

Batch | Single | Description
url_paths(n) | url_path() | URL paths (e.g., "/blog/products/42")
url_slugs(n) | url_slug() | URL slugs (e.g., "ultimate-guide-2024")
query_strings(n) | query_string() | Query strings (e.g., "?page=2&sort=date")
meta_descriptions(n) | meta_description() | HTML meta description tags
og_tags_batch(n) | og_tags() | Open Graph meta tag sets (multi-line)
hreflang_tags_batch(n) | hreflang_tags() | Hreflang link tag sets with x-default
img_tags(n, ratio) | img_tag(ratio) | Image tags (configurable missing alt ratio)
content_type_headers(n) | content_type_header() | Content-Type header values
http_headers_batch(n) | http_headers() | HTTP response header dicts
robots_txts(n) | robots_txt() | robots.txt file contents
html_pages(n, ...) | html_page(...) | Full HTML5 pages with configurable SEO elements
- | website(pages, domain) | Interlinked website (dict of URL → HTML)

from forgery import Faker

fake = Faker()
fake.seed(42)

# Generate a full HTML page with SEO elements
page = fake.html_page(
    headings=4,
    internal_links=5,
    images=3,
    include_og_tags=True,
    domain="mysite.com",
)

# Generate an interlinked website for crawl testing
site = fake.website(pages=20, domain="example.com")
# site = {"https://example.com/": "<html>...", "https://example.com/blog/guide": "<html>...", ...}
# Every page is reachable from the homepage via link traversal

Finance

Batch | Single | Description
credit_cards(n) | credit_card() | Credit card numbers (valid Luhn)
credit_card_providers(n) | credit_card_provider() | Card network name (Visa, Mastercard, Amex, Discover)
credit_card_expires(n) | credit_card_expire() | Expiry date in MM/YY format
credit_card_security_codes(n) | credit_card_security_code() | CVV: 3 digits (Visa/MC/Discover) or 4 digits (Amex)
credit_card_fulls(n) | credit_card_full() | Complete card info dict (provider, number, expire, security_code, name)
ibans(n) | iban() | IBAN numbers (valid checksum)
bics(n) | bic() | BIC/SWIFT codes (8 or 11 characters)
bank_accounts(n) | bank_account() | Bank account numbers (8-17 digits)
bank_names(n) | bank_name() | Bank names (locale-specific)
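
The table above states that generated card numbers pass the Luhn check and that IBANs carry a valid checksum. As a minimal sketch of how a test suite might verify that, the two validators below use only the standard library. The helper names luhn_valid and iban_valid are illustrative and are not part of forgery; the sample inputs in the usage note are well-known public test values, not forgery output.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if `iban` passes the mod-97 check used by IBANs."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Letters map to 10..35, digits stay as they are.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, luhn_valid("4111111111111111") and iban_valid("GB82 WEST 1234 5698 7654 32") both return True, while changing the last digit of either input makes the check fail.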

Currency

Batch | Single | Description
currency_codes(n) | currency_code() | ISO 4217 currency codes (e.g., "USD", "EUR")
currency_names(n) | currency_name() | Currency names in English (e.g., "United States Dollar")
currencies(n) | currency() | (code, name) tuples
prices(n, min, max) | price(min, max) | Prices with 2 decimal places

UK Banking

Batch | Single | Description
sort_codes(n) | sort_code() | UK sort codes (XX-XX-XX format)
uk_account_numbers(n) | uk_account_number() | UK account numbers (exactly 8 digits)
transaction_amounts(n, min, max) | transaction_amount(min, max) | Transaction amounts (2 decimal places)
transactions(n, balance, start, end) | - | Full transaction records with running balance

Passwords

Batch | Single | Description
passwords(n, ...) | password(...) | Random passwords with configurable character sets

Password options:

  • length: Password length (default: 12)
  • uppercase: Include uppercase letters (default: True)
  • lowercase: Include lowercase letters (default: True)
  • digits: Include digits (default: True)
  • symbols: Include symbols (default: True)

Text & Lorem Ipsum

Batch | Single | Description
sentences(n, word_count) | sentence(word_count) | Lorem ipsum sentences
paragraphs(n, sentence_count) | paragraph(sentence_count) | Lorem ipsum paragraphs
texts(n, min_chars, max_chars) | text(min_chars, max_chars) | Text blocks with length limits

Colors

Batch | Single | Description
colors(n) | color() | Color names
hex_colors(n) | hex_color() | Hex color codes (#RRGGBB)
rgb_colors(n) | rgb_color() | RGB tuples (r, g, b)

Geographic

Batch | Single | Description
latitudes(n) | latitude() | Random latitude in [-90.0, 90.0]
longitudes(n) | longitude() | Random longitude in [-180.0, 180.0]
coordinates(n) | coordinate() | (latitude, longitude) tuples

User Agents

Batch | Single | Description
user_agents(n) | user_agent() | Random browser user agent string (any browser)
chromes(n) | chrome() | Chrome user agent string
firefoxes(n) | firefox() | Firefox user agent string
safaris(n) | safari() | Safari user agent string

Booleans

Batch | Single | Description
booleans(n, probability) | boolean(probability) | Random booleans (default: 50% True)

String Pattern Templates

Batch | Single | Description
numerify_batch(pattern, n) | numerify(pattern) | Replace # with random digits (0-9)
letterify_batch(pattern, n) | letterify(pattern) | Replace ? with random lowercase letters (a-z)
bothify_batch(pattern, n) | bothify(pattern) | Replace # with digits and ? with lowercase letters
lexify_batch(pattern, n) | lexify(pattern) | Replace ? with random uppercase letters (A-Z)

from forgery import Faker

fake = Faker()
fake.numerify("###-###-####")   # "847-321-9056"
fake.letterify("??-??")         # "kx-bp"
fake.bothify("??-####")         # "mz-7314"
fake.lexify("???-###")          # "QWR-###" (only ? is replaced)
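
The substitution rules in the table above can be sketched in pure Python. This is an illustration of the documented semantics only, not forgery's Rust implementation; fill_pattern and the four wrappers below are hypothetical helpers that take an explicit random.Random instance.

```python
import random
import string
from typing import Optional


def fill_pattern(pattern: str, rng: random.Random,
                 digits: bool, letters: Optional[str]) -> str:
    """Replace '#' with a digit and/or '?' with a letter; copy everything else."""
    out = []
    for ch in pattern:
        if ch == "#" and digits:
            out.append(rng.choice(string.digits))
        elif ch == "?" and letters is not None:
            out.append(rng.choice(letters))
        else:
            out.append(ch)  # literal characters pass through unchanged
    return "".join(out)


def numerify(pattern: str, rng: random.Random) -> str:
    return fill_pattern(pattern, rng, digits=True, letters=None)


def letterify(pattern: str, rng: random.Random) -> str:
    return fill_pattern(pattern, rng, digits=False, letters=string.ascii_lowercase)


def bothify(pattern: str, rng: random.Random) -> str:
    return fill_pattern(pattern, rng, digits=True, letters=string.ascii_lowercase)


def lexify(pattern: str, rng: random.Random) -> str:
    # Only '?' is replaced, so any '#' in the pattern stays literal.
    return fill_pattern(pattern, rng, digits=False, letters=string.ascii_uppercase)
```

With these rules, lexify("???-###", rng) always ends in "-###", which matches the comment in the example above.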

Barcode

Batch | Single | Description
ean13s(n) | ean13() | EAN-13 barcodes (valid check digit)
ean8s(n) | ean8() | EAN-8 barcodes (valid check digit)
upc_as(n) | upc_a() | UPC-A barcodes (valid check digit)
upc_es(n) | upc_e() | UPC-E barcodes (valid check digit)
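
To check the "valid check digit" claim for EAN-13 and UPC-A values in your own tests, a sketch along these lines may help. It implements the standard alternating 1/3 weighting; the function names are illustrative and not part of forgery, and treating a UPC-A code as an EAN-13 with a leading zero is the usual convention rather than something this README states.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def ean13_valid(code: str) -> bool:
    """Return True if `code` is 13 digits with a correct check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


def upc_a_valid(code: str) -> bool:
    """A 12-digit UPC-A is checked as an EAN-13 with a leading zero."""
    return len(code) == 12 and ean13_valid("0" + code)
```

For instance, ean13_valid("4006381333931") and upc_a_valid("036000291452") both return True.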

ISBN

Batch | Single | Description
isbn10s(n) | isbn10() | ISBN-10 with hyphens (valid check digit, may end in X)
isbn13s(n) | isbn13() | ISBN-13 with hyphens (978/979 prefix, valid check digit)
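
The "may end in X" note follows from the ISBN-10 checksum being taken modulo 11, where a check value of 10 is written as X. A minimal validator sketch for both formats is shown below; the function names are illustrative helpers rather than forgery APIs, and the sample values in the usage note are published example ISBNs.

```python
def isbn10_valid(isbn: str) -> bool:
    """Validate a hyphenated or plain ISBN-10 (weights 10 down to 1, mod 11)."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10  # X is only legal as the final check character
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn13_valid(isbn: str) -> bool:
    """Validate a hyphenated or plain ISBN-13 (978/979 prefix, mod 10)."""
    chars = isbn.replace("-", "")
    if len(chars) != 13 or not (chars.isascii() and chars.isdigit()):
        return False
    if chars[:3] not in ("978", "979"):
        return False
    # Same alternating 1/3 weighting as EAN-13, check digit included.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(chars))
    return total % 10 == 0
```

Here isbn10_valid("0-8044-2957-X") and isbn13_valid("978-0-306-40615-7") both return True.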

File/System

Batch | Single | Description
file_names(n) | file_name() | File names with extension (e.g., "report.pdf")
file_extensions(n) | file_extension() | File extensions (e.g., "pdf", "csv")
mime_types(n) | mime_type() | MIME types (e.g., "application/pdf")
file_paths(n) | file_path_() | File paths (e.g., "/home/user/documents/report.pdf")

Commerce/Product

Batch | Single | Description
product_names(n) | product_name() | Product names (e.g., "Ergonomic Steel Chair")
product_categories(n) | product_category() | Product categories (e.g., "Electronics")
departments(n) | department() | Store departments (e.g., "Home & Garden")
product_materials(n) | product_material() | Product materials (e.g., "Cotton", "Steel")

SSN/National ID

Batch | Single | Description
ssns(n) | ssn() | Locale-specific national ID numbers

Formats by locale:

Locale | Format | Example
en_US | SSN (XXX-XX-XXXX) | "123-45-6789"
en_GB | NI Number (XX 99 99 99 X) | "AB 12 34 56 C"
de_DE | Steuer-ID (11 digits) | "12345678901"
fr_FR | NSS (15 digits with check key) | "185076923400145"
es_ES | DNI (8 digits + letter) | "12345678Z"
it_IT | Codice Fiscale (16 alphanumeric) | "RSSMRA85M01H501Z"
ja_JP | My Number (12 digits with check) | "123456789012"

Vehicle/Automotive

Batch | Single | Description
license_plates(n) | license_plate() | Locale-specific license plates
vehicle_makes(n) | vehicle_make() | Vehicle manufacturers (e.g., "Toyota")
vehicle_models(n) | vehicle_model() | Vehicle models (e.g., "Camry")
vehicle_years(n) | vehicle_year() | Model years (1990-2026)
vins(n) | vin() | 17-character VINs (valid check digit, no I/O/Q)

License plate formats by locale:

Locale | Format | Example
en_US | ABC-1234 | "KHX-4829"
en_GB | AB12 CDE | "LM65 NXR"
de_DE | X AB 1234 | "B KL 3847"
fr_FR | AB-123-CD | "FG-482-HJ"
es_ES | 1234 ABC | "4829 FKH"
it_IT | AB 123 CD | "FG 482 HJ"
ja_JP | 300 12-34 | "500 38-47"
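
The vins() row in the Vehicle/Automotive table promises a valid check digit and no I, O, or Q. Assuming forgery follows the North American scheme (position 9 holds the check character, computed modulo 11), which the README does not spell out, a validator could look like the sketch below. The names vin_check_char and vin_valid are illustrative.

```python
# Transliteration table: digits map to themselves; I, O and Q are not allowed.
_VIN_VALUES = {str(d): d for d in range(10)}
_VIN_VALUES.update(zip("ABCDEFGH", range(1, 9)))
_VIN_VALUES.update(zip("JKLMN", range(1, 6)))
_VIN_VALUES.update({"P": 7, "R": 9})
_VIN_VALUES.update(zip("STUVWXYZ", range(2, 10)))

# Position weights; the check position (index 8) has weight 0.
_VIN_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_char(vin: str) -> str:
    """Compute the expected check character for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in _VIN_VALUES for ch in vin):
        raise ValueError("VIN must be 17 characters without I, O or Q")
    remainder = sum(_VIN_VALUES[ch] * w for ch, w in zip(vin, _VIN_WEIGHTS)) % 11
    return "X" if remainder == 10 else str(remainder)


def vin_valid(vin: str) -> bool:
    """Return True if position 9 matches the computed check character."""
    try:
        return vin_check_char(vin) == vin.upper()[8]
    except ValueError:
        return False
```

As a usage example, vin_valid("1M8GDM9AXKP042788") returns True, and replacing the X in position 9 with any other character makes it return False.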

Package Registry Data

For seeding test databases of package registries (PyPI, npm, Maven, Cargo, RubyGems). Cross-ecosystem primitives share one API; ecosystem-specific shapes have their own methods.

Cross-ecosystem primitives

Batch | Single | Description
commit_shas(n) | commit_sha() | 40-hex-char git commit SHA
short_commit_shas(n) | short_commit_sha() | 7-hex-char short SHA
semvers(n) | semver() | SemVer MAJOR.MINOR.PATCH
semver_prereleases(n) | semver_prerelease() | Pre-release (e.g. 1.2.3-alpha.1+build.5)
calvers(n) | calver() | CalVer in mixed schemes (YYYY.MM.DD, YY.MM, ...)
spdx_licenses(n) | spdx_license() | SPDX identifier (50 common IDs)
git_usernames(n) | git_username() | GitHub/GitLab/Bitbucket-compatible username

Ecosystem-specific versions (where SemVer alone doesn't cover the format)

Batch | Single | Description
pypi_versions(n) | pypi_version() | PEP 440 (pre/post/dev releases)
maven_versions(n) | maven_version() | Maven version with qualifiers (-SNAPSHOT, .RELEASE, ...)

Version constraints

Batch | Single | Description
pypi_version_specifiers(n) | pypi_version_specifier() | PEP 440 (e.g. >=1.2,<2.0, ~=1.0)
npm_version_ranges(n) | npm_version_range() | npm (e.g. ^1.2.3, ~1.2.3, 1.x)
cargo_version_reqs(n) | cargo_version_req() | Cargo (e.g. ^1.0, ~1.2)
maven_version_ranges(n) | maven_version_range() | Maven (e.g. [1.0,2.0))
gem_version_requirements(n) | gem_version_requirement() | RubyGems (e.g. ~> 1.2)

Package identity

Batch | Single | Description
pypi_package_names(n) | pypi_package_name() | PEP 503 normalised (lowercase [a-z0-9-])
npm_package_names(n) | npm_package_name() | Plain or @scope/pkg (~30% scoped)
cargo_package_names(n) | cargo_package_name() | Rust-ident flavour
gem_names(n) | gem_name() | RubyGems gem name
maven_group_ids(n) | maven_group_id() | Reverse domain (e.g. com.example.tools)
maven_artifact_ids(n) | maven_artifact_id() | Lowercase with hyphens
maven_coordinates(n) | maven_coordinate() | GAV (group:artifact:version)

Full requirement lines

Batch | Single | Description
pypi_requirements(n) | pypi_requirement() | e.g. requests>=2.0.0,<3.0.0

from forgery import Faker

fake = Faker()
fake.seed(42)
fake.pypi_requirement()       # 'requests>=2.0.0,<3.0.0'
fake.maven_coordinate()       # 'com.example.tools:widget-core:1.2.3-SNAPSHOT'
fake.npm_package_name()       # '@types/fast-parser'
fake.spdx_license()           # 'Apache-2.0'
fake.git_username()           # 'tiny-logger42'
fake.commit_sha()             # 'a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2'

The nine batch methods below accept unique=True for no-duplicate output, matching the names(n, unique=True) pattern — useful when seeding registry tables that have a unique-name constraint. Exhausting the combinatorial pool raises ValueError:

fake.pypi_package_names(100, unique=True)   # 100 distinct package names
fake.maven_coordinates(500, unique=True)    # 500 distinct GAVs
fake.spdx_licenses(60, unique=True)         # ValueError: only 50 SPDX IDs available

Methods with unique support: pypi_package_names, npm_package_names, cargo_package_names, gem_names, maven_group_ids, maven_artifact_ids, maven_coordinates, git_usernames, spdx_licenses.

Profile

Batch | Single | Description
profiles(n) | profile() | Complete personal profiles (returns dict)

Each profile dict contains: first_name, last_name, name, email, phone, address, city, state, zip_code, country, company, job, date_of_birth.

from forgery import Faker

fake = Faker()
fake.seed(42)
p = fake.profile()
# {"first_name": "Ryan", "last_name": "Grant", "name": "Ryan Grant",
#  "email": "rgrant@example.com", "phone": "(555) 123-4567", ...}

Unique Value Generation

For batch methods that select from finite lists (names, cities, countries, etc.), you can request unique values:

from forgery import Faker

fake = Faker()
fake.seed(42)

# Generate 50 unique names (no duplicates)
unique_names = fake.names(50, unique=True)
assert len(unique_names) == len(set(unique_names))

# Generate 20 unique cities
unique_cities = fake.cities(20, unique=True)

# Generate 50 unique countries
unique_countries = fake.countries(50, unique=True)

Important Notes:

  • Unique generation will raise ValueError if you request more unique values than are available in the underlying data set.
  • Performance: Unique generation uses O(n) memory (stores all outputs in a HashSet) and can be O(n × 100) time in worst case due to retry logic. For very large unique batches, consider whether duplicates are actually problematic for your use case.
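
The notes above describe a set-backed retry strategy. A rough pure-Python sketch of that idea follows. It assumes a retry budget of 100 attempts per value, which is one reading of the O(n × 100) figure, and it is not forgery's actual Rust code; unique_sample is a hypothetical helper.

```python
import random


def unique_sample(pool, n, rng, max_retries_per_value=100):
    """Draw n distinct values from `pool` by rejection sampling.

    Memory is O(n) because every accepted value is kept in a set.
    """
    if n > len(set(pool)):
        raise ValueError(
            f"requested {n} unique values but only {len(set(pool))} are available"
        )
    seen = set()
    out = []
    for _ in range(n):
        for _attempt in range(max_retries_per_value):
            candidate = rng.choice(pool)
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
                break
        else:
            # Every attempt collided with an already-used value.
            raise ValueError("retry budget exhausted while looking for a new value")
    return out
```

Collisions become more frequent as n approaches the pool size, which is why the worst-case time grows with the retry budget rather than with n alone.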

Financial Transaction Generation

Generate realistic bank transaction data with running balances:

from forgery import Faker

fake = Faker()
fake.seed(42)

# Generate 50 transactions from Jan to Mar 2024, starting with £1000 balance
txns = fake.transactions(50, 1000.0, "2024-01-01", "2024-03-31")

for txn in txns[:3]:
    print(f"{txn['date']} | {txn['transaction_type']:15} | {txn['amount']:>10.2f} | {txn['balance']:>10.2f}")
# 2024-01-03 | Card Payment    |    -42.50 |     957.50
# 2024-01-05 | Direct Debit    |   -125.00 |     832.50
# 2024-01-08 | Faster Payment  |   1250.00 |    2082.50

Each transaction dict contains:

  • reference: 8-character alphanumeric reference
  • date: Transaction date (YYYY-MM-DD)
  • amount: Transaction amount (negative for debits)
  • transaction_type: e.g., "Card Payment", "Direct Debit", "Salary"
  • description: Merchant or payee name
  • balance: Running balance after transaction
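
Since each record carries both amount and balance, a test can confirm that the running balance is internally consistent. The sketch below assumes amounts and balances are numbers rounded to two decimal places; balances_consistent is an illustrative helper, and the sample rows are the three lines printed in the example above.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def to_money(value) -> Decimal:
    """Convert a number to a Decimal rounded to two places."""
    return Decimal(str(value)).quantize(CENT)


def balances_consistent(transactions, opening_balance) -> bool:
    """Check that each balance equals the previous balance plus the amount."""
    running = to_money(opening_balance)
    for txn in transactions:
        running += to_money(txn["amount"])
        if running != to_money(txn["balance"]):
            return False
    return True


sample = [
    {"date": "2024-01-03", "transaction_type": "Card Payment", "amount": -42.50, "balance": 957.50},
    {"date": "2024-01-05", "transaction_type": "Direct Debit", "amount": -125.00, "balance": 832.50},
    {"date": "2024-01-08", "transaction_type": "Faster Payment", "amount": 1250.00, "balance": 2082.50},
]
consistent = balances_consistent(sample, 1000.0)
```

Decimal arithmetic is used here so that float rounding noise does not produce false mismatches.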

Structured Data Generation

Generate entire datasets with a single call using schema definitions:

records()

Returns a list of dictionaries:

from forgery import records, seed

seed(42)
data = records(1000, {
    "id": "uuid",
    "name": "name",
    "email": "email",
    "age": ("int", 18, 65),
    "salary": ("float", 30000.0, 150000.0),
    "hire_date": ("date", "2020-01-01", "2024-12-31"),
    "bio": ("text", 50, 200),
    "status": ("choice", ["active", "inactive", "pending"]),
})

# data[0] = {"id": "88917925-...", "name": "Austin Bell", "age": 50, ...}

records_tuples()

Returns a list of tuples (faster, values in alphabetical key order):

from forgery import records_tuples, seed

seed(42)
data = records_tuples(1000, {
    "age": ("int", 18, 65),
    "name": "name",
})
# data[0] = (50, "Ryan Grant")  # (age, name) - alphabetical order
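
Because the tuples carry no field names, a consumer has to rebuild the column order from the schema. A small sketch of that mapping, assuming "alphabetical key order" matches Python's sorted() on the schema keys (the README does not define the collation, and tuples_to_dicts is a hypothetical helper):

```python
def tuples_to_dicts(rows, schema):
    """Pair each tuple with the schema keys in sorted order."""
    keys = sorted(schema)
    return [dict(zip(keys, row)) for row in rows]


schema = {"name": "name", "age": ("int", 18, 65)}
rows = [(50, "Ryan Grant")]  # sample row taken from the example above
as_dicts = tuples_to_dicts(rows, schema)
```

The declaration order of the schema dict does not matter here; only the sorted key order does.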

records_arrow()

Returns a PyArrow RecordBatch for high-performance data processing:

import pyarrow as pa
from forgery import records_arrow, seed

seed(42)
batch = records_arrow(100_000, {
    "id": "uuid",
    "name": "name",
    "age": ("int", 18, 65),
    "salary": ("float", 30000.0, 150000.0),
})

# batch is a pyarrow.RecordBatch
print(batch.num_rows)     # 100000
print(batch.num_columns)  # 4
print(batch.schema)
# age: int64 not null
# id: string not null
# name: string not null
# salary: double not null

# Convert to pandas DataFrame
df = batch.to_pandas()

# Or to Polars DataFrame
import polars as pl
df_polars = pl.from_arrow(batch)

Note: Requires pyarrow to be installed: pip install pyarrow

The records_arrow() function generates data in columnar format, which is more efficient for large batches and integrates seamlessly with the Arrow ecosystem (PyArrow, Polars, pandas, DuckDB, etc.).

Serialized Output Formats

Generate records directly as serialized strings or bytes, avoiding the overhead of creating Python objects just to serialize them.

records_csv()

Returns a CSV string with a header row (fields in alphabetical order):

from forgery import records_csv, seed

seed(42)
csv_str = records_csv(1000, {
    "name": "name",
    "email": "email",
    "age": ("int", 18, 65),
})
# age,email,name
# 50,austin.bell@example.com,Austin Bell
# ...

records_json()

Returns a JSON array of objects:

from forgery import records_json, seed

seed(42)
json_str = records_json(1000, {
    "name": "name",
    "age": ("int", 18, 65),
    "active": "boolean",
})
# [{"active":true,"age":50,"name":"Austin Bell"},...]

Integer and float values are JSON numbers, booleans are JSON booleans, and tuples (e.g., RGB colors, coordinates) become JSON arrays.

records_ndjson()

Returns newline-delimited JSON (one JSON object per line, no trailing newline):

from forgery import records_ndjson, seed

seed(42)
ndjson_str = records_ndjson(1000, {
    "id": "uuid",
    "name": "name",
})
# {"id":"88917925-...","name":"Austin Bell"}
# {"id":"a3c1e7f2-...","name":"Maria Garcia"}
# ...
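
Since the output has one JSON object per line and no trailing newline, it can be read back with the standard json module. A minimal sketch follows; parse_ndjson is an illustrative helper, and the sample string uses made-up placeholder ids rather than real forgery output.

```python
import json


def parse_ndjson(text: str) -> list:
    """Parse newline-delimited JSON into a list of Python objects."""
    # Skipping empty lines also tolerates an optional trailing newline.
    return [json.loads(line) for line in text.split("\n") if line]


sample = (
    '{"id":"id-0001","name":"Austin Bell"}\n'
    '{"id":"id-0002","name":"Maria Garcia"}'
)
parsed = parse_ndjson(sample)
```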

records_parquet()

Returns Parquet file content as bytes (uses the Arrow path internally).

Note: Like records_arrow(), this uses column-major generation. With a fixed seed and multi-column schema, the row data will differ from the row-major methods (records_csv, records_json, records_ndjson, records_sql).

from forgery import records_parquet, seed

seed(42)
parquet_bytes = records_parquet(100_000, {
    "id": "uuid",
    "name": "name",
    "salary": ("float", 30000.0, 150000.0),
})

# Write to disk
with open("data.parquet", "wb") as f:
    f.write(parquet_bytes)

# Or load directly with PyArrow
import pyarrow.parquet as pq
import io
table = pq.read_table(io.BytesIO(parquet_bytes))

records_sql()

Returns ANSI SQL INSERT statements with properly escaped values:

from forgery import records_sql, seed

seed(42)
sql = records_sql(1000, {
    "name": "name",
    "email": "email",
    "age": ("int", 18, 65),
}, "users")
# INSERT INTO "users" ("age", "email", "name") VALUES
# (50, 'austin.bell@example.com', 'Austin Bell'),
# ...
# (34, 'maria.garcia@gmail.com', 'Maria Garcia');

For large batches, multiple INSERT statements are generated with up to 1000 rows each. Column names are double-quoted and string values use single-quote escaping.
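
The quoting and batching rules described above can be sketched in pure Python as follows. This is an illustration of the documented output shape, not forgery's implementation; quote_ident, sql_literal, and insert_statements are hypothetical helpers, and the handling of None and booleans is an assumption.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value) -> str:
    """Render a Python value as an SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(rows, table, batch_size=1000):
    """Build INSERT statements with at most `batch_size` rows each."""
    if not rows:
        return []
    columns = sorted(rows[0])  # alphabetical column order
    header = "INSERT INTO {} ({}) VALUES".format(
        quote_ident(table), ", ".join(quote_ident(c) for c in columns)
    )
    statements = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(row[c]) for c in columns) + ")"
            for row in chunk
        )
        statements.append(header + "\n" + values + ";")
    return statements
```

A name such as O'Brien is emitted as 'O''Brien', so the statement stays valid without parameter binding.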

Streaming File Writer

For datasets that exceed available memory, records_to_file() generates records in bounded-memory chunks and writes each chunk to disk before generating the next. Memory usage is proportional to chunk_size, not total n.

from forgery import Faker

fake = Faker()
fake.seed(42)

# Generate 100 million records — memory stays at ~500-800 MB
fake.records_to_file(
    100_000_000,
    {"id": "uuid", "name": "name", "amount": ("float", 0.01, 9999.99)},
    "transactions.parquet",
    chunk_size=1_000_000,  # records per chunk (default: 1M, max: 10M)
)

Supported formats: CSV (.csv), NDJSON (.ndjson/.jsonl), SQL (.sql), Parquet (.parquet). Format is auto-detected from the file extension, or set explicitly with format="csv".

SQL format requires a table parameter:

from forgery import records_to_file, seed

seed(42)
records_to_file(
    50_000_000,
    {"name": "name", "email": "email"},
    "users.sql",
    table="users",
    chunk_size=500_000,
)

Progress callback — track progress with an optional callback:

from forgery import records_to_file, seed

seed(42)
records_to_file(
    10_000_000,
    {"name": "name", "email": "email"},
    "users.csv",
    on_progress=lambda written, total: print(f"\r{written/total:.0%}", end=""),
)

Memory estimation — plan chunk sizes based on available RAM:

from forgery import Faker

schema = {"id": "uuid", "name": "name", "amount": ("float", 0.01, 9999.99)}
est = Faker.estimate_memory(1_000_000, schema)
print(f"~{est / 1024**2:.0f} MB per 1M records")

All streaming formats use row-major generation, so the same seed produces identical data across CSV, NDJSON, SQL, and Parquet output.
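
The bounded-memory idea in this section can be sketched with the standard library: generate one chunk, write it, discard it, and repeat. The helper below is a hypothetical stand-in for illustration (CSV only, rows produced by a caller-supplied function) and does not reflect how records_to_file() is implemented internally.

```python
import csv
import os
import random
import tempfile


def write_csv_in_chunks(path, n, make_row, fieldnames,
                        chunk_size=1000, on_progress=None):
    """Write n rows to a CSV file, holding at most chunk_size rows in memory."""
    written = 0
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        while written < n:
            count = min(chunk_size, n - written)
            chunk = [make_row() for _ in range(count)]  # only this chunk is alive
            writer.writerows(chunk)
            written += count
            if on_progress is not None:
                on_progress(written, n)
    return written


rng = random.Random(42)
progress_log = []
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "amounts.csv")
    total = write_csv_in_chunks(
        target,
        2500,
        make_row=lambda: {
            "id": rng.randrange(10**9),
            "amount": round(rng.uniform(0.01, 9999.99), 2),
        },
        fieldnames=["id", "amount"],
        chunk_size=1000,
        on_progress=lambda written, n: progress_log.append((written, n)),
    )
    with open(target, newline="") as handle:
        row_count = sum(1 for _ in csv.reader(handle))  # header + data rows
```

Peak memory depends on chunk_size and the row width, not on n, which is the property the section describes.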

Schema Field Types

Type | Syntax | Example
Simple types | "type_name" | "name", "email", "uuid", "int", "float"
Integer range | ("int", min, max) | ("int", 18, 65)
Float range | ("float", min, max) | ("float", 0.0, 100.0)
Text with limits | ("text", min_chars, max_chars) | ("text", 50, 200)
Date range | ("date", start, end) | ("date", "2020-01-01", "2024-12-31")
Choice | ("choice", [options]) | ("choice", ["a", "b", "c"])

All simple types from the generators above are supported: name, first_name, last_name, email, safe_email, free_email, phone, uuid, int, float, date, datetime, street_address, city, state, country, zip_code, address, company, job, catch_phrase, url, domain_name, ipv4, ipv6, mac_address, credit_card, iban, sentence, paragraph, text, color, hex_color, rgb_color, md5, sha256, latitude, longitude, coordinate, boolean, ssn, file_name, file_extension, mime_type, file_path, license_plate, vehicle_make, vehicle_model, vehicle_year, vin, ean13, ean8, upc_a, upc_e, isbn10, isbn13, product_name, product_category, department, product_material, url_path, url_slug, query_string.

Async Generation

For large datasets (millions of records), async methods prevent blocking the Python event loop:

records_async()

import asyncio
from forgery import records_async, seed

async def main():
    seed(42)
    records = await records_async(1_000_000, {
        "id": "uuid",
        "name": "name",
        "email": "email",
    })
    print(f"Generated {len(records)} records")

asyncio.run(main())

records_tuples_async()

import asyncio
from forgery import records_tuples_async, seed

async def main():
    seed(42)
    records = await records_tuples_async(1_000_000, {
        "age": ("int", 18, 65),
        "name": "name",
    })
    return records

asyncio.run(main())

records_arrow_async()

import asyncio
from forgery import records_arrow_async, seed

async def main():
    seed(42)
    batch = await records_arrow_async(1_000_000, {
        "id": "uuid",
        "name": "name",
        "salary": ("float", 30000.0, 150000.0),
    })
    return batch.to_pandas()

asyncio.run(main())

All async methods accept an optional chunk_size parameter (default: 10,000) that controls how frequently control is yielded to the event loop. Smaller chunks yield more frequently but have slightly higher overhead.
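
To illustrate what "yielding control to the event loop" between chunks means, here is a small asyncio sketch. It is a conceptual model only: generate_async is a hypothetical pure-Python function, whereas forgery does its generation in Rust. A background heartbeat task shows that other coroutines get a turn while generation is in progress.

```python
import asyncio
import random


async def generate_async(n, make_value, chunk_size=10_000):
    """Generate n values, yielding to the event loop after each chunk."""
    out = []
    for start in range(0, n, chunk_size):
        count = min(chunk_size, n - start)
        out.extend(make_value() for _ in range(count))
        await asyncio.sleep(0)  # let other tasks run before the next chunk
    return out


async def main():
    rng = random.Random(42)
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    task = asyncio.create_task(heartbeat())
    values = await generate_async(50_000, lambda: rng.randint(18, 65))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(values), ticks


count, ticks = asyncio.run(main())
```

With a smaller chunk_size the heartbeat runs more often, at the cost of more context switches, which mirrors the trade-off described above.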

Note: Async methods use a snapshot of the RNG state at call time. The main Faker instance's RNG is not advanced, so calling the same async method twice with the same seed produces identical results. For unique results across multiple async calls, use different seeds or different Faker instances.

Arrow async chunking caveat: For records_arrow_async(), when n > chunk_size, the output differs from records_arrow() due to column-major RNG consumption within each chunk. If you need identical results to the sync version, set chunk_size >= n. The records_async() and records_tuples_async() methods always match their sync counterparts regardless of chunk size.

Custom Providers

Register your own data providers for domain-specific generation:

Basic Custom Provider

from forgery import Faker

fake = Faker()

# Register a uniform (equal probability) provider
fake.add_provider("team", ["Engineering", "Sales", "HR", "Marketing"])

# Generate values
team = fake.generate("team")
teams = fake.generate_batch("team", 100)

Weighted Custom Provider

# Register a weighted provider (higher weights = more likely)
fake.add_weighted_provider("status", [
    ("active", 80),    # 80% probability
    ("inactive", 20),  # 20% probability
])

# Generate with weighted distribution
statuses = fake.generate_batch("status", 1000)
# Expect ~800 "active", ~200 "inactive"
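
The weights are relative, so (80, 20) behaves the same as (4, 1). As a sketch of the same distribution using only the standard library (weighted_batch is a hypothetical helper, not how forgery implements weighted providers):

```python
import random
from collections import Counter


def weighted_batch(pairs, n, rng):
    """Draw n values from (value, weight) pairs, proportionally to weight."""
    values = [value for value, _ in pairs]
    weights = [weight for _, weight in pairs]
    return rng.choices(values, weights=weights, k=n)


rng = random.Random(42)
counts = Counter(weighted_batch([("active", 80), ("inactive", 20)], 1000, rng))
```

The counts fluctuate around 800 and 200 from run to run unless the generator is seeded, so tests should assert a tolerance band rather than exact numbers.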

Custom Providers in Records

Custom providers integrate seamlessly with records():

from forgery import Faker

fake = Faker()
fake.add_provider("team", ["Eng", "Sales", "HR"])
fake.add_weighted_provider("priority", [("high", 20), ("medium", 50), ("low", 30)])

data = fake.records(1000, {
    "id": "uuid",
    "name": "name",
    "team": "team",              # Custom provider
    "priority": "priority",      # Weighted custom provider
})

Provider Management

fake.has_provider("team")  # Check if provider exists
fake.list_providers()      # List all custom provider names
fake.remove_provider("team")  # Remove a provider

Module-level Convenience

from forgery import add_provider, generate, generate_batch, seed

seed(42)
add_provider("tier", ["gold", "silver", "bronze"])
tier = generate("tier")
tiers = generate_batch("tier", 100)

Note: Custom provider names cannot conflict with built-in types (e.g., "name", "email", "uuid").

Performance

Benchmark generating 100,000 items:

Names:
  forgery.names():  0.015s
  Faker.name():     1.523s
  Speedup: 101x

Emails:
  forgery.emails():  0.021s
  Faker.email():     2.134s
  Speedup: 101x

Benchmark generating 1,000,000 items:

Names:
  forgery.names():   0.108s
  Faker.name():     47.111s
  Speedup: 436x

Emails:
  forgery.emails():   0.167s
  Faker.email():     46.984s
  Speedup: 281x

Seeding Contract

  • seed(n) affects the default fake instance only
  • Each Faker instance has its own independent RNG state
  • Single-threaded determinism only: Results are reproducible within one thread
  • No cross-version guarantee: Output may differ between forgery versions

Parallel Generation

For large batches, enable parallel generation to split work across multiple CPU cores:

from forgery import Faker

fake = Faker()
fake.seed(42)
fake.set_parallel(True)  # Auto-detect thread count

# All batch methods now run in parallel
names = fake.names(1_000_000)      # ~3.3x faster than sequential
emails = fake.emails(1_000_000)
uuids = fake.uuids(1_000_000)

# Explicit thread count (useful for reproducibility across machines)
fake.set_parallel(True, num_threads=4)

# Check current settings
fake.get_parallel()      # True
fake.get_num_threads()   # 4

# Disable parallel
fake.set_parallel(False)

Determinism contract:

  • Same seed + same num_threads = identical output
  • Changing num_threads produces different output
  • unique=True always uses the sequential path
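
One way to see why output depends on num_threads is to model each worker with its own RNG derived from the seed and the worker index. The sketch below is an assumption about the general technique, not a description of forgery's Rayon-based implementation; parallel_ints and the seed derivation are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _chunk(seed, index, count):
    """Generate one worker's share from an RNG derived from (seed, index)."""
    rng = random.Random(f"{seed}:{index}")  # string seeds are accepted
    return [rng.randrange(1_000_000) for _ in range(count)]


def parallel_ints(seed, n, num_threads):
    """Split n values across num_threads workers and join them in order."""
    base, extra = divmod(n, num_threads)
    sizes = [base + (1 if i < extra else 0) for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = pool.map(_chunk, [seed] * num_threads, range(num_threads), sizes)
        return [value for part in parts for value in part]
```

Calling parallel_ints(42, 1000, 4) twice gives the same list, while parallel_ints(42, 1000, 2) differs because the chunk boundaries, and therefore the RNG assigned to each position, change.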

Performance (names benchmark):

Batch Size | Sequential | Parallel | Speedup
10,000 | 443 µs | 753 µs | 0.6x (overhead)
100,000 | 8.5 ms | 2.5 ms | 3.4x
1,000,000 | 83 ms | 25 ms | 3.3x

Auto-detection ensures parallelism is only used when beneficial (minimum 1,000 items per thread).

Thread Safety

forgery is NOT thread-safe. Each Faker instance maintains mutable RNG state.

For multi-threaded applications, create one Faker instance per thread:

from concurrent.futures import ThreadPoolExecutor
from forgery import Faker

def generate_names(seed: int) -> list[str]:
    fake = Faker()  # Create per-thread instance
    fake.seed(seed)
    return fake.names(1000)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(generate_names, range(4)))

Do NOT share a Faker instance across threads.

Note: set_parallel(True) uses Rayon's internal thread pool for parallel generation within a single Faker instance. This is different from sharing a Faker across Python threads, which remains unsafe.

Development

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install maturin
pip install maturin

# Build and install locally
maturin develop --release

# Run tests
cargo test          # Rust tests
pytest              # Python tests

# Run benchmarks
python tests/benchmarks/bench_vs_faker.py

License

MIT
