
Generate realistic Indian fake data — Faker, but with India as a first-class citizen.



The Faker library India deserves. Generate production-quality Indian synthetic data in 8 languages.

हिन्दी | മലയാളം | தமிழ் | తెలుగు | বাংলা | ಕನ್ನಡ | ગુજરાતી | मराठी

Python 3.8+ | License: MIT | ML Ready | 8 Languages | Verhoeff Checksum

indic-faker generates realistic Indian fake data: names in 8 native scripts, Aadhaar numbers with valid Verhoeff checksums, state-aware addresses with real pincodes, UPI IDs, salary data in LPA, company names (Pvt Ltd, LLP), IIT/NIT/IIM names, and batch export for AI/ML datasets. It's Faker, but with India as a first-class citizen.

Every method that isn't India-specific (.ipv4(), .text(), .uuid4()) automatically passes through to vanilla Faker — so you only need one import.
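
A pass-through like this is commonly built on Python's `__getattr__` hook, which runs only when normal attribute lookup fails. The sketch below shows the pattern under that assumption; `GenericFaker` and `IndiaFirstFaker` are hypothetical stand-ins, not indic-faker's actual classes.

```python
class GenericFaker:
    """Hypothetical stand-in for a general-purpose generator such as Faker."""

    def ipv4(self):
        return "192.168.1.1"


class IndiaFirstFaker:
    """Hypothetical wrapper: India-specific methods live here, the rest is delegated."""

    def __init__(self, fallback):
        self._fallback = fallback

    def financial_year(self):
        return "2024-25"

    def __getattr__(self, name):
        # Called only when `name` is not found on this object, so the
        # India-specific methods defined above always take precedence.
        return getattr(self._fallback, name)


fake = IndiaFirstFaker(GenericFaker())
print(fake.financial_year())  # handled by the wrapper itself
print(fake.ipv4())            # delegated to the fallback object
```

Unknown names still raise `AttributeError`, because the fallback's own lookup fails in the usual way.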

⚡ Why indic-faker?

India-specific features in indic-faker:

  • Names in 8 Indian scripts (native + Latin)
  • Aadhaar with Verhoeff checksum
  • GSTIN with valid state codes + checksum
  • INR formatting in lakhs/crores (₹12,34,567)
  • UPI IDs (@okicici, @ybl, @paytm)
  • Salary in LPA (₹18.5 LPA)
  • profile(): complete Indian identity
  • to_dataframe(1000): batch for ML
  • messy_address() with "Nr.", "Opp.", old city names
  • IIT / NIT / IIM / medical college names
  • Indian company names (Pvt Ltd, LLP, HUF)
  • CIN, DL, Voter ID, Passport numbers
  • Faker pass-through (.ipv4() just works)

🚀 Install

pip install indic-faker

For ML/AI dataset export (pandas DataFrame):

pip install indic-faker[ml]

That's it. No cloning, no setup — just install and use.

⚡ Quick Start

from indic_faker import IndicFaker

fake = IndicFaker(language="ml")  # Malayalam

# Names — Latin by default, native on demand
fake.name()                    # "Rajesh Krishnan"
fake.name(script="native")    # "രാജേഷ് കൃഷ്ണൻ"

# Indian ID numbers (algorithm-validated)
fake.aadhaar()                 # "3847 2918 4721"  ← Verhoeff checksum ✓
fake.pan()                     # "ABCPK1234F"
fake.gstin()                   # "32ABCPK1234F1Z5" ← Kerala state code ✓

# Contact
fake.phone()                   # "+91 94471 82931"
fake.upi_id()                  # "rajesh.krishnan@okicici"

# Money — Indian comma system (lakhs/crores, not millions)
fake.amount_inr()              # "₹4,29,150.00"
fake.salary_lpa()              # "₹12.5 LPA"

# Address — state-aware with valid pincodes
fake.address()                 # "TC 14/2341, Pettah, TVM - 695024"
fake.messy_address()           # "14/2341, Nr. Temple,Pettah, Trivandrum"

# Company & Career
fake.company_indian()          # "Sharma Technologies Pvt. Ltd."
fake.college()                 # "IIT Bombay"
fake.job_title()               # "Senior Software Engineer"

# Faker pass-through — these just work
fake.ipv4()                    # "192.168.1.1"
fake.text()                    # "Lorem ipsum..."
fake.uuid4()                   # "a1b2c3d4-..."

🔤 8 Indian Languages

Every name is available in both native script and Latin transliteration. Default is Latin for database compatibility.

Code | Language | Script | Example (Native) | Example (Latin)
hi | Hindi | देवनागरी | दिनेश तिवारी | Dinesh Tiwari
ml | Malayalam | മലയാളം | രവി പണിക്കർ | Ravi Panikkar
ta | Tamil | தமிழ் | சதீஷ் சிவக்குமார் | Satheesh Sivakumar
te | Telugu | తెలుగు | నారాయణ కుమార్ | Narayana Kumar
bn | Bengali | বাংলা | রাহুল মিত্র | Rahul Mitra
kn | Kannada | ಕನ್ನಡ | ಸಂತೋಷ್ ಪಾಟೀಲ್ | Santosh Patil
gu | Gujarati | ગુજરાતી | તેજસ પંડ્યા | Tejas Pandya
mr | Marathi | मराठी | योगेश पवार | Yogesh Pawar
# Switch languages at any time
fake = IndicFaker(language="ta")
fake.name()                       # "Murugan Natarajan"
fake.name(script="native")        # "முருகன் நடராஜன்"

fake.set_language("bn")           # Switch to Bengali mid-session
fake.name(script="native")        # "সৌরভ গাঙ্গুলী"
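
One way to keep the Latin and native forms of a name in step is to store them as pairs and choose the script at lookup time. The sketch below assumes that layout; `NAMES` and `pick_name` are hypothetical, and the sample pairs come from the language table above.

```python
import random

# (Latin, native) pairs keyed by language code.
NAMES = {
    "hi": [("Dinesh Tiwari", "दिनेश तिवारी")],
    "ta": [("Satheesh Sivakumar", "சதீஷ் சிவக்குமார்")],
    "bn": [("Rahul Mitra", "রাহুল মিত্র")],
}


def pick_name(language, script="latin", rng=random):
    """Return one name for `language` in the requested script."""
    latin, native = rng.choice(NAMES[language])
    if script == "latin":
        return latin
    if script == "native":
        return native
    raise ValueError(f"unknown script: {script!r}")


print(pick_name("ta"))
print(pick_name("ta", script="native"))
```

Because both forms come from the same pair, the transliteration always matches the native spelling.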

👤 Profile — Complete Indian Identity

Generate a consistent, complete identity in a single call. Every field belongs to the same person.

profile = fake.profile()
{
  "name": "Rajesh Krishnan",
  "name_native": "രാജേഷ് കൃഷ്ണൻ",
  "gender": "male",
  "dob": "15/08/1990",
  "age": 35,
  "language": "ml",
  "aadhaar": "3847 2918 4721",
  "pan": "ABCPK1234F",
  "phone": "+91 94471 82931",
  "email": "rajesh.krishnan@gmail.com",
  "address": "TC 14/2341, Pettah, TVM - 695024",
  "city": "Thiruvananthapuram",
  "state": "Kerala",
  "pincode": "695024",
  "bank_account": {"ifsc": "SBIN0001234", "account": "38291847291", "bank": "SBI"},
  "upi_id": "rajesh.krishnan@okicici",
  "employer": "Infosys",
  "job_title": "Software Engineer",
  "salary": "₹12.5 LPA",
  "college": "NIT Calicut",
  "degree": "B.Tech"
}

Pick specific fields only:

fake.profile(fields=["name", "aadhaar", "phone", "email"])

📊 Batch Generation for AI/ML

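Generate thousands of realistic Indian records in a single call, as a pandas DataFrame, a JSON string, a CSV string, or a raw list of dicts. DataFrame export needs the ml extra (pip install indic-faker[ml]).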

# 🔥 Generate 1000 records as pandas DataFrame
df = fake.to_dataframe(1000)
df.to_csv("indian_test_data.csv", index=False)

# Custom fields only
df = fake.to_dataframe(500, fields=["name", "phone", "city", "salary"])

# JSON output
json_str = fake.to_json(100)

# CSV string
csv_str = fake.to_csv(100)

# Raw list of dicts
records = fake.generate_batch(100)
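
For readers who want to post-process a batch themselves, a list of dicts can be serialized with the standard library alone. This is a minimal sketch, not the library's exporter; `records_to_csv` and `records_to_json` are hypothetical helpers and the two sample records are illustrative.

```python
import csv
import io
import json


def records_to_csv(records, fields=None):
    """Serialize a list of dicts to a CSV string, optionally keeping only `fields`."""
    if not records:
        return ""
    fieldnames = list(fields) if fields else list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def records_to_json(records):
    # ensure_ascii=False keeps native-script names readable instead of \u escapes.
    return json.dumps(records, ensure_ascii=False, indent=2)


records = [
    {"name": "Rajesh Krishnan", "name_native": "രാജേഷ് കൃഷ്ണൻ", "city": "Thiruvananthapuram"},
    {"name": "Ravi Panikkar", "name_native": "രവി പണിക്കർ", "city": "Thiruvananthapuram"},
]
print(records_to_csv(records, fields=["name", "city"]))
print(records_to_json(records))
```

Passing `ensure_ascii=False` matters for Indic scripts: without it, every native character is written as an escape sequence.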

🆔 ID Numbers

Every ID is generated in its correct format, with a valid checksum where the format has one (Aadhaar, GSTIN), so the values pass format-level validation.

# Aadhaar — Verhoeff checksum validated (not just random 12 digits)
fake.aadhaar()                     # "3847 2918 4721"
fake.aadhaar(formatted=False)      # "384729184721"

# PAN — entity-type aware
fake.pan()                         # "ABCPK1234F" (Person)
fake.pan(entity_type="C")          # "XYZCK5678G" (Company)

# GSTIN — correct state code + modular checksum
fake.gstin()                       # "32ABCPK1234F1Z5"
fake.gstin(state="MH")             # "27..." (Maharashtra)

# Others
fake.dl_number()                   # "KL-09-2020-0012345"
fake.voter_id()                    # "ABC1234567"
fake.passport()                    # "A1234567"
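
The Verhoeff scheme mentioned above appends one check digit computed from three small lookup tables. The sketch below is a standalone implementation of the published algorithm, not indic-faker's source; `fake_aadhaar` is a hypothetical helper, and the rule that the first digit is 2 to 9 is an assumption about the Aadhaar format.

```python
import random

# Dihedral-group multiplication table.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Position-dependent permutation table.
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# Inverse table.
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> str:
    """Return the check digit to append to `payload` (digits only)."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return str(_INV[c])


def verhoeff_is_valid(number: str) -> bool:
    """True if `number` (payload plus check digit) passes the Verhoeff check."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def fake_aadhaar(rng: random.Random, formatted: bool = True) -> str:
    # Assumption: the first digit is 2-9; the last digit is the check digit.
    payload = str(rng.randint(2, 9)) + "".join(str(rng.randint(0, 9)) for _ in range(10))
    number = payload + verhoeff_check_digit(payload)
    if formatted:
        return " ".join(number[i:i + 4] for i in range(0, 12, 4))
    return number


print(verhoeff_check_digit("236"))  # 3
print(fake_aadhaar(random.Random(42)))
```

Unlike a plain modulo-10 sum, Verhoeff catches every single-digit error and every swap of two adjacent digits, which is why a random 12-digit string is very unlikely to validate.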

📍 Addresses

State-aware with valid pincodes, building number formats, and landmarks.

fake.address()             # "14/2341, MG Road, Pettah, TVM - 695024"
fake.full_address()        # "TC 14/2341, MG Road, Near Temple, Pettah, Kerala - 695024"
fake.city()                # "Thiruvananthapuram"
fake.district()            # "Ernakulam"
fake.village()             # "Punalur, Kollam District, Kerala"
fake.pincode()             # "695024" (valid for Kerala)
fake.landmark()            # "Near Government Hospital"

# 🔥 Messy address — simulates real-world Indian user input
fake.messy_address()
# "14/2341, Nr. Temple,Pettah, Trivandrum"
#  ↑ abbreviations  ↑ old city names  ↑ missing pincode
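
The three kinds of noise annotated above (abbreviations, old city names, a missing pincode) can be modelled as simple text transformations on a clean address. The sketch below is one possible approach, not the library's implementation; `messify` and both substitution tables are hypothetical.

```python
import random
import re

# Illustrative substitution tables (assumptions, not the library's data).
ABBREVIATIONS = {"Near": "Nr.", "Opposite": "Opp.", "Road": "Rd"}
OLD_CITY_NAMES = {"Thiruvananthapuram": "Trivandrum", "Mumbai": "Bombay"}


def messify(address: str, rng: random.Random) -> str:
    """Degrade a clean address the way hand-typed input often looks."""
    for full, short in ABBREVIATIONS.items():
        address = re.sub(rf"\b{re.escape(full)}\b", short, address)
    for current, old in OLD_CITY_NAMES.items():
        address = address.replace(current, old)
    # Drop a trailing " - 695024" style pincode.
    address = re.sub(r"\s*-\s*\d{6}\s*$", "", address)
    # Randomly lose the space after some commas.
    head, *rest = address.split(", ")
    return head + "".join(("," if rng.random() < 0.5 else ", ") + part for part in rest)


clean = "14/2341, Near Temple, Pettah, Thiruvananthapuram - 695024"
print(messify(clean, random.Random(7)))
```

Data like this is useful for testing address parsers and deduplication logic, which tend to break on exactly these inconsistencies.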

💰 Finance

# INR with Indian comma system (lakhs/crores, NOT millions/billions)
fake.amount_inr()                      # "₹4,29,150.00"
fake.amount_inr(100000, 10000000)      # "₹45,82,391.20"

# Banking
fake.bank_account()  # {"ifsc": "SBIN0001234", "account": "38291847291", "bank": "SBI"}
fake.ifsc()                            # "HDFC0001234"
fake.upi_id()                          # "rajesh.k@okicici"
fake.bank_name()                       # "HDFC Bank"
fake.credit_card_indian()              # "4532 1234 5678 9012"
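
Indian digit grouping keeps the last three digits together and then groups the remaining digits in pairs, so 1234567 reads as 12,34,567 (12 lakh 34 thousand 567). The sketch below shows that grouping; `format_inr` is a hypothetical helper, not the library's function.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_inr(amount, paise=True):
    """Format `amount` with the rupee sign and lakh/crore digit grouping."""
    step = Decimal("0.01") if paise else Decimal("1")
    value = Decimal(str(amount)).quantize(step, rounding=ROUND_HALF_UP)
    sign = "-" if value < 0 else ""
    whole, _, frac = str(abs(value)).partition(".")

    # The rightmost group has three digits; every group to its left has two.
    head, tail = whole[:-3], whole[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    groups.append(tail)

    text = ",".join(groups)
    return f"{sign}₹{text}" + (f".{frac}" if paise else "")


print(format_inr(429150))                 # ₹4,29,150.00
print(format_inr(4582391.2))              # ₹45,82,391.20
print(format_inr(1234567, paise=False))   # ₹12,34,567
```

`Decimal` is used here so that rounding to paise does not pick up binary floating-point artifacts.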

🏢 Company Data

fake.company_indian()    # "Sharma Technologies Pvt. Ltd."
fake.company_type()      # "LLP"
fake.cin()               # "U12345MH2020PLC123456"
fake.gst_invoice()       # "INV/2024-25/001234"

🎓 Education

fake.college()                  # "IIT Bombay"
fake.iit()                      # "IIT Madras"
fake.nit()                      # "NIT Trichy"
fake.iim()                      # "IIM Ahmedabad"
fake.medical_college()          # "AIIMS Delhi"
fake.university()               # "Anna University"
fake.degree()                   # "B.Tech"
fake.engineering_branch()       # "Computer Science"
fake.education_record()         # {"college": "BITS Pilani", "degree": "B.Tech", ...}

💼 Jobs & Salary

fake.job_title()                   # "Senior Software Engineer"
fake.employer()                    # "Razorpay"
fake.salary_lpa()                  # "₹18.5 LPA"
fake.salary_lpa(level="fresher")   # "₹5.2 LPA"
fake.salary_lpa(level="cxo")       # "₹180.0 LPA"
fake.salary_monthly()              # "₹1,54,167"
fake.job_record()                  # {"title": ..., "employer": ..., "salary_lpa": ...}

Salary bands: fresher, junior, mid, senior, lead, director, vp, cxo
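
LPA stands for lakhs per annum, and one lakh is 100,000 rupees. The sketch below converts an LPA figure to a monthly amount, assuming the monthly value is simply the annual figure divided by 12 with no deductions; `lpa_to_monthly_rupees` is a hypothetical helper.

```python
LAKH = 100_000  # rupees in one lakh


def lpa_to_monthly_rupees(lpa: float) -> int:
    """Convert an annual salary in lakhs (LPA) to whole rupees per month."""
    return round(lpa * LAKH / 12)


print(lpa_to_monthly_rupees(18.5))  # 154167
```

Under that assumption, ₹18.5 LPA works out to ₹1,54,167 per month, which matches the two example values shown above.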

📅 Indian Dates

fake.date_indian()         # "15/08/2024"  (DD/MM/YYYY, not MM/DD)
fake.date_of_birth()       # "23/11/1990"
fake.financial_year()      # "2024-25"     (Indian FY format)
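
The Indian financial year runs from 1 April to 31 March, so a date in January to March belongs to the year that started the previous April. The sketch below derives both formats from a `datetime.date`; the helper names are hypothetical.

```python
from datetime import date


def format_indian(d: date) -> str:
    """Format a date as DD/MM/YYYY."""
    return d.strftime("%d/%m/%Y")


def financial_year(d: date) -> str:
    """Return the Indian financial year (1 April to 31 March) containing `d`."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


print(format_indian(date(2024, 8, 15)))   # 15/08/2024
print(financial_year(date(2024, 8, 15)))  # 2024-25
print(financial_year(date(2025, 3, 31)))  # 2024-25
print(financial_year(date(2025, 4, 1)))   # 2025-26
```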

🌍 Cross-State Generation

India has massive internal migration. Decouple language from geography:

# Tamil name living in Delhi
fake = IndicFaker(language="ta", state="DL")
fake.name()       # "Murugan Natarajan"   ← Tamil name
fake.address()    # "H.No. 123, Dwarka, Delhi - 110075"  ← Delhi address
fake.gstin()      # "07..."               ← Delhi GST code
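
A GSTIN begins with a two-digit state code, which is why the state setting rather than the language decides its prefix. The sketch below builds one from a state and a PAN. The check-character routine follows the commonly described scheme (base-36 code points with alternating weights 1 and 2); treat it as an assumption rather than indic-faker's code. `STATE_CODES`, `build_gstin`, and the fixed entity number are hypothetical.

```python
import string

_ALPHABET = string.digits + string.ascii_uppercase  # code points 0-9, then A-Z

# State codes for the three states used in this README's examples.
STATE_CODES = {"DL": "07", "MH": "27", "KL": "32"}


def gstin_check_char(first14: str) -> str:
    """Compute the 15th character from the first 14."""
    total = 0
    for index, ch in enumerate(first14):
        factor = 1 if index % 2 == 0 else 2  # weights alternate 1, 2, 1, 2, ...
        product = _ALPHABET.index(ch) * factor
        total += product // 36 + product % 36
    return _ALPHABET[(36 - total % 36) % 36]


def build_gstin(state: str, pan: str, entity_no: str = "1") -> str:
    """State code + PAN + entity number + literal 'Z' + check character."""
    body = STATE_CODES[state] + pan + entity_no + "Z"
    return body + gstin_check_char(body)


def is_valid_gstin(gstin: str) -> bool:
    return len(gstin) == 15 and gstin_check_char(gstin[:14]) == gstin[14]


# A Tamil-language profile located in Delhi still gets the "07" prefix.
print(build_gstin("DL", "ABCPK1234F"))
```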

🎲 Reproducibility

fake1 = IndicFaker(seed=42)
fake2 = IndicFaker(seed=42)

assert fake1.name() == fake2.name()       # Always True
assert fake1.aadhaar() == fake2.aadhaar() # Always True
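
Seeded reproducibility of this kind usually comes from giving each generator its own `random.Random` instance instead of using the module-level functions. The sketch below shows the idea; `SeededGenerator` is a hypothetical class, and the phone format (a 10-digit mobile number starting with 6 to 9) is an assumption.

```python
import random


class SeededGenerator:
    """Hypothetical generator that owns a private random.Random instance."""

    def __init__(self, seed=None):
        # A private RNG means two generators never disturb each other's sequence.
        self._rng = random.Random(seed)

    def phone(self):
        first = self._rng.choice("6789")
        rest = "".join(self._rng.choice("0123456789") for _ in range(9))
        return f"+91 {first}{rest[:4]} {rest[4:]}"


a = SeededGenerator(seed=42)
b = SeededGenerator(seed=42)
assert [a.phone() for _ in range(3)] == [b.phone() for _ in range(3)]
```

Calling the global `random.seed()` would also work for a single generator, but any other code that draws random numbers would then shift the sequence.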

📋 Complete API Reference

Person

Method | Returns | Example
name() | Full name (Latin) | "Rajesh Krishnan"
name(script="native") | Full name (native) | "രാജേഷ് കൃഷ്ണൻ"
name_male() | Male name | "Arjun Sharma"
name_female() | Female name | "Priya Nair"
first_name() | Random first name | "Harish"
last_name() | Surname | "Patel"
prefix() | Honorific | "Mr." / "श्री"

ID Numbers

Method | Returns | Example
aadhaar() | Aadhaar (Verhoeff ✓) | "3847 2918 4721"
pan() | PAN number | "ABCPK1234F"
gstin() | GSTIN (checksum ✓) | "32ABCPK1234F1Z5"
dl_number() | Driving License | "KL-09-2020-0012345"
voter_id() | Voter ID | "ABC1234567"
passport() | Passport | "A1234567"

Address

Method | Returns | Example
address() | Full address | "14/2341, MG Road, TVM - 695024"
messy_address() | Messy address | "14/2341, Nr. Temple,TVM"
city() | City name | "Thiruvananthapuram"
pincode() | Valid pincode | "695024"
village() | Village + district | "Punalur, Kollam, Kerala"
landmark() | Landmark | "Near Govt Hospital"

Finance

Method | Returns | Example
amount_inr() | INR amount | "₹4,29,150.00"
bank_account() | Bank details dict | {"ifsc": ..., "account": ..., "bank": ...}
upi_id() | UPI ID | "rajesh.k@okicici"
ifsc() | IFSC code | "SBIN0001234"

Company

Method | Returns | Example
company_indian() | Company name | "Sharma Tech Pvt. Ltd."
company_type() | Type | "LLP"
cin() | CIN number | "U12345MH2020PLC123456"
gst_invoice() | Invoice number | "INV/2024-25/001234"

Education

Method | Returns | Example
college() | College name | "IIT Bombay"
iit() / nit() / iim() | Premier institute | "IIT Madras"
degree() | Degree | "B.Tech"
education_record() | Full record dict | {"college": ..., "cgpa": 8.45}

Job & Salary

Method | Returns | Example
job_title() | Job title | "Senior Software Engineer"
employer() | Employer | "Razorpay"
salary_lpa() | Salary in LPA | "₹18.5 LPA"
salary_monthly() | Monthly salary | "₹1,54,167"
job_record() | Full record dict | {"title": ..., "salary_lpa": ...}

Batch Generation

Method | Returns | Example
profile() | Complete identity dict | {"name": ..., "aadhaar": ...}
generate_batch(n) | List of n profiles | [{...}, {...}, ...]
to_csv(n) | CSV string | "name,aadhaar,...\n..."
to_json(n) | JSON string | '[{"name": ...}]'
to_dataframe(n) | pandas DataFrame | DataFrame(n rows)

🤝 Contributing

We welcome contributions! Here's how you can help make indic-faker even better:

  • 🔤 Add a new language — Create src/indic_faker/data/names/<lang_code>.py
  • 🏘️ Expand address data — Add more cities/villages/districts
  • 💡 New provider ideas — Create a provider in src/indic_faker/providers/
  • 🐛 Bug fixes — Found wrong data? Open an issue or PR
# Development setup
git clone https://github.com/adwaith-0/indic-faker.git
cd indic-faker
pip install -e ".[dev]"
pytest tests/ -v

All 86 tests must pass before submitting a PR.


📜 License

MIT License — free for everyone, forever. Use it in personal projects, startups, enterprises, and everything in between.



Built with ❤️ for Indian developers

Because AI deserves real Indian test data, not "John Smith, 123 Main St"


