Skip to main content

Comprehensive fake data generator for Taiwanese (zh_TW) data

Project description

nlpFaker

PyPI version Python CI License: MIT

A comprehensive fake data generator built specifically for Taiwan (zh_TW). Zero dependencies.

Features

  • 16 generators covering personal, contact, business, finance, date/time, and vehicle data
  • Taiwan-specific: National IDs with valid checksums, 統一編號, ROC calendar dates, 09XX mobile numbers, real city/district/road addresses
  • Checksum validation: generated IDs, tax IDs, and credit cards pass real validation algorithms
  • Reproducible: seed support for deterministic output
  • Zero dependencies: stdlib only, installs in seconds
  • CLI included: generate data from the command line
  • Export: JSON, CSV, SQL INSERT out of the box

Architecture

classDiagram
    class BaseGenerator {
        <<abstract>>
        +_rng: Random
        +_load_lines(filename) list~str~
        +_load_weighted(filename) tuple
        +_weighted_choice(items, weights) str
        +generate()* str
    }

    class NlpFaker {
        +_rng: Random
        +name() str
        +id_number() str
        +mobile() str
        +address() str
        +email() str
        +company() str
        +landline() str
        +date() str
        +date_roc() str
        +credit_card() str
        +profile() dict
        +batch(field, n) list
    }

    BaseGenerator <|-- NameGenerator
    BaseGenerator <|-- IDNumberGenerator
    BaseGenerator <|-- MobileGenerator
    BaseGenerator <|-- AddressGenerator
    BaseGenerator <|-- EmailGenerator
    BaseGenerator <|-- CompanyGenerator
    BaseGenerator <|-- LandlineGenerator
    BaseGenerator <|-- DateTimeGenerator
    BaseGenerator <|-- LicensePlateGenerator
    BaseGenerator <|-- TaxIDGenerator
    BaseGenerator <|-- InvoiceGenerator
    BaseGenerator <|-- CreditCardGenerator
    BaseGenerator <|-- BankAccountGenerator
    BaseGenerator <|-- JobGenerator
    BaseGenerator <|-- CredentialGenerator

    NlpFaker --> NameGenerator : lazy init
    NlpFaker --> IDNumberGenerator : lazy init
    NlpFaker --> MobileGenerator : lazy init
    NlpFaker --> AddressGenerator : lazy init
    NlpFaker --> EmailGenerator : lazy init
    NlpFaker --> CompanyGenerator : lazy init
    NlpFaker --> LandlineGenerator : lazy init
    NlpFaker --> DateTimeGenerator : lazy init
    NlpFaker --> LicensePlateGenerator : lazy init
    NlpFaker --> TaxIDGenerator : lazy init
    NlpFaker --> InvoiceGenerator : lazy init
    NlpFaker --> CreditCardGenerator : lazy init
    NlpFaker --> BankAccountGenerator : lazy init
    NlpFaker --> JobGenerator : lazy init
    NlpFaker --> CredentialGenerator : lazy init

    class Exporter {
        +to_json(records, path)$
        +to_csv(records, path)$
        +to_sql(records, table, path)$
    }

    class Validators {
        +validate_id_number(id_str)$ bool
        +validate_tax_id(tax_str)$ bool
        +validate_mobile(mobile_str)$ bool
        +validate_credit_card(card_str)$ bool
    }

Installation

pip install nlpFaker

Or with uv:

uv add nlpFaker

Quick Start

from nlpFaker import NlpFaker

fake = NlpFaker(seed=42)

fake.name()           # "陳怡君"
fake.id_number()      # "A123456789"
fake.mobile()         # "0912345678"
fake.address()        # "台北市大安區和平東路二段42號3樓"
fake.email()          # "ming.chen@gmail.com"
fake.company()        # "台灣科技股份有限公司"
fake.date_roc()       # "民國113年03月15日"
fake.credit_card()    # "4539123456789012"

API Reference

Personal & Identity

Method Description Example
name() Full name (last + first) 陳怡君
first_name() First name (1-2 chars) 怡君
last_name() Last name
id_number() National ID with valid checksum A123456789
username() Random username coolstar42
password(length=12) Mixed-character password xK9#mP2qLf!3

Contact

Method Description Example
mobile() Mobile phone (09XX format) 0912345678
landline() Landline with area code (02) 27481234
email() Email with TW domains ming.chen@gmail.com
address() Full Taiwan address 台北市大安區和平東路二段42號3樓
postal_code() 3-digit postal code 106

Business

Method Description Example
company() Company name 台灣科技股份有限公司
job_title() Job title 軟體工程師
tax_id() 統一編號 with valid checksum 12345670
invoice() 統一發票 number AB-12345678

Finance

Method Description Example
credit_card(card_type=None) Luhn-valid (Visa/MC/JCB) 4539123456789012
bank_account() Bank code + account number 812-1234567890
bank_code() 3-digit bank code 812

Date & Time

Method Description Example
date(start, end) Date (YYYY/MM/DD) 2024/03/15
date_roc(start, end) ROC calendar (民國) 民國113年03月15日
time() Time (HH:MM:SS) 14:30:22
datetime(start, end) Full datetime 2024/03/15 14:30:22

Vehicle

Method Description Example
license_plate(vehicle="car") TW license plate ABC-1234

Profile & Batch Generation

# Complete fake person profile
profile = fake.profile()
# {'name': '陳怡君', 'id_number': 'A123456789', 'email': '...',
#  'mobile': '...', 'landline': '...', 'address': '...',
#  'postal_code': '...', 'company': '...', 'job_title': '...',
#  'birthday': '...', 'credit_card': '...', 'bank_account': '...',
#  'username': '...'}

# Batch generation
names = fake.batch("name", 100)
ids = fake.batch("id_number", 50)

Validation

Validate real or generated Taiwan data:

from nlpFaker import validate_id_number, validate_tax_id, validate_mobile, validate_credit_card

validate_id_number("A123456789")  # True/False — checksum verification
validate_tax_id("12345670")       # True/False — 統一編號 checksum
validate_mobile("0912345678")     # True/False — format check
validate_credit_card("4539...")   # True/False — Luhn algorithm

Export

from nlpFaker import NlpFaker, Exporter

fake = NlpFaker(seed=1)
records = [fake.profile() for _ in range(1000)]

Exporter.to_json(records, "data.json")
Exporter.to_csv(records, "data.csv")
Exporter.to_sql(records, "people", "data.sql")

CLI

nlpfaker name -n 10                    # 10 random names
nlpfaker profile -n 5 -f json          # 5 profiles as JSON
nlpfaker id_number -s 42               # reproducible with seed
nlpfaker mobile -n 100 -o phones.txt   # output to file

Development

git clone https://github.com/danghoangnhan/nlpFaker.git
cd nlpFaker
uv sync --group dev
uv run python -m pytest -v
uv run ruff check nlpFaker/

License

MIT


中文說明

簡介

nlpFaker 是一個專為 台灣 (zh_TW) 設計的假資料產生器,零外部相依套件。

提供 16 種產生器,涵蓋個人資料、聯絡方式、商業、金融、日期時間及車輛資訊,所有產生的身分證字號、統一編號、信用卡號碼皆通過真實的校驗演算法驗證。

特色

  • 台灣專屬:身分證字號(含檢查碼)、統一編號、民國年曆、09XX 手機號碼、真實縣市/區/路名地址
  • 16 種產生器:姓名、身分證、手機、地址、電子郵件、公司名稱、市話、日期時間、車牌、郵遞區號、統一編號、統一發票、信用卡、銀行帳號、職稱、帳號密碼
  • 可重現:支援 seed 參數,輸出結果可重現
  • 零相依:僅使用 Python 標準函式庫
  • 內建 CLI:可從命令列直接產生資料
  • 匯出功能:支援 JSON、CSV、SQL INSERT

快速開始

from nlpFaker import NlpFaker

fake = NlpFaker(seed=42)

fake.name()           # 產生姓名,例如 "陳怡君"
fake.id_number()      # 產生身分證字號(含有效檢查碼)
fake.mobile()         # 產生手機號碼,例如 "0912345678"
fake.address()        # 產生完整地址
fake.company()        # 產生公司名稱
fake.date_roc()       # 產生民國日期,例如 "民國113年03月15日"
fake.tax_id()         # 產生統一編號(含有效檢查碼)
fake.credit_card()    # 產生信用卡號碼(通過 Luhn 演算法)

# 產生完整個人資料
profile = fake.profile()

# 批次產生
names = fake.batch("name", 100)  # 產生 100 個姓名

驗證功能

from nlpFaker import validate_id_number, validate_tax_id

validate_id_number("A123456789")  # 驗證身分證字號檢查碼
validate_tax_id("12345670")       # 驗證統一編號檢查碼

命令列工具

nlpfaker name -n 10              # 產生 10 個隨機姓名
nlpfaker profile -n 5 -f json    # 產生 5 筆完整個人資料(JSON 格式)
nlpfaker id_number -s 42         # 使用固定種子產生身分證字號

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlpfaker-2.0.1.tar.gz (37.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nlpfaker-2.0.1-py3-none-any.whl (32.4 kB view details)

Uploaded Python 3

File details

Details for the file nlpfaker-2.0.1.tar.gz.

File metadata

  • Download URL: nlpfaker-2.0.1.tar.gz
  • Upload date:
  • Size: 37.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlpfaker-2.0.1.tar.gz
Algorithm Hash digest
SHA256 68cabaa60b1861d65b6103c2acff649dd7a502722573933d9f3e28d91101b74f
MD5 d6f509fa4244437429b2c3e11e33a719
BLAKE2b-256 e6ed5efcc42e5ee33efeff81d935b948fa599c9bcee0b1ec351b7fe898fb4f50

See more details on using hashes here.

Provenance

The following attestation bundles were made for nlpfaker-2.0.1.tar.gz:

Publisher: publish.yml on danghoangnhan/nlpFaker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nlpfaker-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: nlpfaker-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 32.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlpfaker-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ac93a2f91b24177c3bd81f9403c05552d97788d75c0c4961758e3665ba7c765f
MD5 33554afb8fa039fc5392e29d646c3c20
BLAKE2b-256 bf4f5c288fdb771089a59c7b435f92934e24020c7543cadd3ce27214870b6a30

See more details on using hashes here.

Provenance

The following attestation bundles were made for nlpfaker-2.0.1-py3-none-any.whl:

Publisher: publish.yml on danghoangnhan/nlpFaker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page