Generate realistic Indian fake data — Faker, but with India as a first-class citizen.
The Faker library India deserves. Generate production-quality Indian synthetic data in 8 languages.
हिन्दी | മലയാളം | தமிழ் | తెలుగు | বাংলা | ಕನ್ನಡ | ગુજરાતી | मराठी
indic-faker generates realistic Indian fake data: names in 8 native scripts, Aadhaar numbers with valid Verhoeff checksums, state-aware addresses with real pincodes, UPI IDs, salary data in LPA, company names (Pvt Ltd, LLP), IIT/NIT/IIM names, and batch export for AI/ML datasets.
Every method that isn't India-specific (.ipv4(), .text(), .uuid4()) automatically passes through to vanilla Faker — so you only need one import.
⚡ Why indic-faker?
| Feature | vanilla Faker | indic-faker |
|---|---|---|
| Names in 8 Indian scripts (native + Latin) | ❌ | ✅ |
| Aadhaar with Verhoeff checksum | ❌ | ✅ |
| GSTIN with valid state codes + checksum | ❌ | ✅ |
| INR formatting in lakhs/crores (₹12,34,567) | ❌ | ✅ |
| UPI IDs (@okicici, @ybl, @paytm) | ❌ | ✅ |
| Salary in LPA (₹18.5 LPA) | ❌ | ✅ |
| `profile()`: complete Indian identity | ❌ | ✅ |
| `to_dataframe(1000)`: batch for ML | ❌ | ✅ |
| `messy_address()` with "Nr.", "Opp.", old city names | ❌ | ✅ |
| IIT / NIT / IIM / medical college names | ❌ | ✅ |
| Indian company names (Pvt Ltd, LLP, HUF) | ❌ | ✅ |
| CIN, DL, Voter ID, Passport numbers | ❌ | ✅ |
| Faker pass-through (`.ipv4()` just works) | N/A | ✅ |
🚀 Install
pip install indic-faker
For ML/AI dataset export (pandas DataFrame):
pip install indic-faker[ml]
That's it. No cloning, no setup — just install and use.
⚡ Quick Start
from indic_faker import IndicFaker
fake = IndicFaker(language="ml") # Malayalam
# Names — Latin by default, native on demand
fake.name() # "Rajesh Krishnan"
fake.name(script="native") # "രാജേഷ് കൃഷ്ണൻ"
# Indian ID numbers (algorithm-validated)
fake.aadhaar() # "3847 2918 4721" ← Verhoeff checksum ✓
fake.pan() # "ABCPK1234F"
fake.gstin() # "32ABCPK1234F1Z5" ← Kerala state code ✓
# Contact
fake.phone() # "+91 94471 82931"
fake.upi_id() # "rajesh.krishnan@okicici"
# Money — Indian comma system (lakhs/crores, not millions)
fake.amount_inr() # "₹4,29,150.00"
fake.salary_lpa() # "₹12.5 LPA"
# Address — state-aware with valid pincodes
fake.address() # "TC 14/2341, Pettah, TVM - 695024"
fake.messy_address() # "14/2341, Nr. Temple,Pettah, Trivandrum"
# Company & Career
fake.company_indian() # "Sharma Technologies Pvt. Ltd."
fake.college() # "IIT Bombay"
fake.job_title() # "Senior Software Engineer"
# Faker pass-through — these just work
fake.ipv4() # "192.168.1.1"
fake.text() # "Lorem ipsum..."
fake.uuid4() # "a1b2c3d4-..."
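The pass-through behaviour can be pictured as plain attribute delegation. The sketch below is an assumption about the mechanism, not indic-faker's actual source: `PassThroughFaker` and `StubFaker` are hypothetical names, and `StubFaker` stands in for a real vanilla Faker instance so the example runs without extra installs.

```python
class StubFaker:
    """Stand-in for a vanilla Faker instance (hypothetical, for illustration)."""

    def ipv4(self) -> str:
        return "192.168.1.1"


class PassThroughFaker:
    """Handles India-specific methods itself and delegates everything else."""

    def __init__(self, base) -> None:
        self._base = base

    def salary_lpa(self) -> str:
        # An India-specific method defined on the wrapper itself.
        return "₹12.5 LPA"

    def __getattr__(self, name: str):
        # Python calls __getattr__ only when normal lookup fails, so wrapper
        # methods take priority and unknown names fall through to the base.
        return getattr(self._base, name)


fake = PassThroughFaker(StubFaker())
print(fake.salary_lpa())  # handled by the wrapper
print(fake.ipv4())        # delegated to the wrapped object
```

A name that neither object defines still raises `AttributeError`, so typos are not silently swallowed.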
🔤 8 Indian Languages
Every name is available in both native script and Latin transliteration. Default is Latin for database compatibility.
| Code | Language | Script | Example (Native) | Example (Latin) |
|---|---|---|---|---|
| `hi` | Hindi | देवनागरी | दिनेश तिवारी | Dinesh Tiwari |
| `ml` | Malayalam | മലയാളം | രവി പണിക്കർ | Ravi Panikkar |
| `ta` | Tamil | தமிழ் | சதீஷ் சிவக்குமார் | Satheesh Sivakumar |
| `te` | Telugu | తెలుగు | నారాయణ కుమార్ | Narayana Kumar |
| `bn` | Bengali | বাংলা | রাহুল মিত্র | Rahul Mitra |
| `kn` | Kannada | ಕನ್ನಡ | ಸಂತೋಷ್ ಪಾಟೀಲ್ | Santosh Patil |
| `gu` | Gujarati | ગુજરાતી | તેજસ પંડ્યા | Tejas Pandya |
| `mr` | Marathi | मराठी | योगेश पवार | Yogesh Pawar |
# Switch languages at any time
fake = IndicFaker(language="ta")
fake.name() # "Murugan Natarajan"
fake.name(script="native") # "முருகன் நடராஜன்"
fake.set_language("bn") # Switch to Bengali mid-session
fake.name(script="native") # "সৌরভ গাঙ্গুলী"
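One way to keep the Latin and native forms of a name in sync is to store them as pairs and pick a pair before choosing the script. This is a minimal sketch under that assumption; `FIRST_NAMES` and `pick_first_name` are hypothetical and the library's real data layout may differ. The names are taken from the examples in this README.

```python
import random

# (Latin, native) pairs keyed by language code. The pairing layout is an
# assumption made for this sketch.
FIRST_NAMES = {
    "ta": [("Murugan", "முருகன்"), ("Satheesh", "சதீஷ்")],
    "bn": [("Rahul", "রাহুল")],
}


def pick_first_name(language: str, script: str = "latin", rng=None) -> str:
    """Pick one pair, then return the requested script from that pair."""
    rng = rng or random.Random()
    latin, native = rng.choice(FIRST_NAMES[language])
    return native if script == "native" else latin


print(pick_first_name("bn"))                   # Latin form
print(pick_first_name("bn", script="native"))  # native form of the same name
```

Because the pair is chosen first, the same random state always yields the same person in either script.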
👤 Profile — Complete Indian Identity
Generate a consistent, complete identity in a single call. Every field belongs to the same person.
profile = fake.profile()
{
"name": "Rajesh Krishnan",
"name_native": "രാജേഷ് കൃഷ്ണൻ",
"gender": "male",
"dob": "15/08/1990",
"age": 35,
"language": "ml",
"aadhaar": "3847 2918 4721",
"pan": "ABCPK1234F",
"phone": "+91 94471 82931",
"email": "rajesh.krishnan@gmail.com",
"address": "TC 14/2341, Pettah, TVM - 695024",
"city": "Thiruvananthapuram",
"state": "Kerala",
"pincode": "695024",
"bank_account": {"ifsc": "SBIN0001234", "account": "38291847291", "bank": "SBI"},
"upi_id": "rajesh.krishnan@okicici",
"employer": "Infosys",
"job_title": "Software Engineer",
"salary": "₹12.5 LPA",
"college": "NIT Calicut",
"degree": "B.Tech"
}
Pick specific fields only:
fake.profile(fields=["name", "aadhaar", "phone", "email"])
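Consistency across fields usually comes from deriving dependent values from a few base values instead of generating each one independently. The sketch below illustrates that idea only; `age_on` and `build_profile` are hypothetical helpers, not the library's implementation.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


def build_profile(name: str, dob: date, today: date, fields=None) -> dict:
    # Email and UPI handle are derived from the name so they match the person.
    handle = name.lower().replace(" ", ".")
    profile = {
        "name": name,
        "dob": dob.strftime("%d/%m/%Y"),
        "age": age_on(dob, today),
        "email": f"{handle}@gmail.com",
        "upi_id": f"{handle}@okicici",
    }
    if fields is not None:
        # Keep only the requested keys, in the requested order.
        profile = {key: profile[key] for key in fields}
    return profile


print(build_profile("Rajesh Krishnan", date(1990, 8, 15), date(2025, 9, 1)))
```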
📊 Batch Generation for AI/ML
The killer feature for data scientists: generate thousands of realistic Indian records in a single call.
# 🔥 Generate 1000 records as pandas DataFrame
df = fake.to_dataframe(1000)
df.to_csv("indian_test_data.csv", index=False)
# Custom fields only
df = fake.to_dataframe(500, fields=["name", "phone", "city", "salary"])
# JSON output
json_str = fake.to_json(100)
# CSV string
csv_str = fake.to_csv(100)
# Raw list of dicts
records = fake.generate_batch(100)
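The four export helpers above can all be thought of as views over the same list of dicts. The following is a rough stdlib-only sketch of that relationship; `make_record` is a hypothetical stand-in for a generated profile, and the function bodies are assumptions, not the library's code.

```python
import csv
import io
import json


def make_record(i: int) -> dict:
    # Hypothetical stand-in for one generated profile.
    return {"name": f"Person {i}", "city": "Kochi", "salary": "₹12.5 LPA"}


def generate_batch(n: int) -> list:
    return [make_record(i) for i in range(n)]


def to_json(n: int) -> str:
    # ensure_ascii=False keeps ₹ and native-script text human-readable.
    return json.dumps(generate_batch(n), ensure_ascii=False)


def to_csv(n: int) -> str:
    records = generate_batch(n)
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(2))
```

When writing such CSV output to disk for spreadsheet tools, an encoding that preserves non-ASCII text (for example UTF-8) is needed for the rupee sign and native scripts.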
🆔 ID Numbers
All IDs follow the official formats. Aadhaar and GSTIN also carry valid checksums, so they pass format-level validation.
# Aadhaar — Verhoeff checksum validated (not just random 12 digits)
fake.aadhaar() # "3847 2918 4721"
fake.aadhaar(formatted=False) # "384729184721"
# PAN — entity-type aware
fake.pan() # "ABCPK1234F" (Person)
fake.pan(entity_type="C") # "XYZCK5678G" (Company)
# GSTIN — correct state code + modular checksum
fake.gstin() # "32ABCPK1234F1Z5"
fake.gstin(state="MH") # "27..." (Maharashtra)
# Others
fake.dl_number() # "KL-09-2020-0012345"
fake.voter_id() # "ABC1234567"
fake.passport() # "A1234567"
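For readers unfamiliar with the Verhoeff scheme mentioned above, here is a compact sketch of the standard algorithm. The tables are the published Verhoeff tables; the function names and the `fake_aadhaar` helper are hypothetical and do not claim to mirror indic-faker's internals. Keeping the first digit in 2-9 follows the commonly cited rule that Aadhaar numbers do not start with 0 or 1.

```python
import random

# Multiplication table of the dihedral group D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Position-dependent permutation table (period 8).
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# Inverse of each element under D.
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> str:
    """Return the digit that makes payload + digit pass validation."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def verhoeff_is_valid(number: str) -> bool:
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def fake_aadhaar(rng: random.Random) -> str:
    # 11 random digits plus one check digit, grouped as 4-4-4.
    payload = str(rng.randint(2, 9)) + "".join(str(rng.randint(0, 9)) for _ in range(10))
    digits = payload + verhoeff_check_digit(payload)
    return " ".join(digits[i:i + 4] for i in range(0, 12, 4))


print(verhoeff_check_digit("236"))  # → 3
```

The scheme detects every single-digit error, which is why a purely random 12-digit string almost never passes.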
📍 Addresses
Addresses are state-aware: pincodes, building-number formats, and landmarks match the configured state.
fake.address() # "14/2341, MG Road, Pettah, TVM - 695024"
fake.full_address() # "TC 14/2341, MG Road, Near Temple, Pettah, Kerala - 695024"
fake.city() # "Thiruvananthapuram"
fake.district() # "Ernakulam"
fake.village() # "Punalur, Kollam District, Kerala"
fake.pincode() # "695024" (valid for Kerala)
fake.landmark() # "Near Government Hospital"
# 🔥 Messy address — simulates real-world Indian user input
fake.messy_address()
# "14/2341, Nr. Temple,Pettah, Trivandrum"
# Note the abbreviation (Nr.), the old city name (Trivandrum), and the missing pincode.
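The kinds of noise described above can be reproduced with a few text substitutions. This is an illustrative sketch only: `messify` and both lookup tables are hypothetical, and the actual rules used by `messy_address()` are not documented here.

```python
import re

# Hypothetical substitution tables for this sketch.
ABBREVIATIONS = {"Near": "Nr.", "Opposite": "Opp.", "Road": "Rd."}
OLD_CITY_NAMES = {
    "Thiruvananthapuram": "Trivandrum",
    "Mumbai": "Bombay",
    "Chennai": "Madras",
    "Kolkata": "Calcutta",
}


def messify(address: str) -> str:
    # Drop a trailing pincode such as " - 695024".
    address = re.sub(r"\s*-\s*\d{6}\s*$", "", address)
    # Swap full words for abbreviations and current names for old ones.
    for full, short in {**ABBREVIATIONS, **OLD_CITY_NAMES}.items():
        address = re.sub(rf"\b{re.escape(full)}\b", short, address)
    return address


clean = "14/2341, Near Temple, Pettah, Thiruvananthapuram - 695024"
print(messify(clean))  # → 14/2341, Nr. Temple, Pettah, Trivandrum
```

A real generator would likely apply each rule with some probability so that the output varies from record to record.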
💰 Finance
# INR with Indian comma system (lakhs/crores, NOT millions/billions)
fake.amount_inr() # "₹4,29,150.00"
fake.amount_inr(100000, 10000000) # "₹45,82,391.20"
# Banking
fake.bank_account() # {"ifsc": "SBIN0001234", "account": "38291847291", "bank": "SBI"}
fake.ifsc() # "HDFC0001234"
fake.upi_id() # "rajesh.k@okicici"
fake.bank_name() # "HDFC Bank"
fake.credit_card_indian() # "4532 1234 5678 9012"
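The Indian comma system groups the last three digits together and every two digits after that. A minimal sketch of that grouping follows; `format_inr` is a hypothetical helper written for this example, not the function the library uses.

```python
def format_inr(amount: float) -> str:
    """Format a rupee amount with lakh/crore grouping and two decimals."""
    whole, paise = f"{amount:.2f}".split(".")
    sign = "-" if whole.startswith("-") else ""
    whole = whole.lstrip("-")
    head, tail = whole[:-3], whole[-3:]
    groups = []
    # Peel off two digits at a time from the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    groups.append(tail)
    return f"{sign}₹{','.join(groups)}.{paise}"


print(format_inr(429150))     # → ₹4,29,150.00
print(format_inr(4582391.2))  # → ₹45,82,391.20
```

For real monetary calculations `decimal.Decimal` would be a safer input type than `float`; the float version is kept here for brevity.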
🏢 Company Data
fake.company_indian() # "Sharma Technologies Pvt. Ltd."
fake.company_type() # "LLP"
fake.cin() # "U12345MH2020PLC123456"
fake.gst_invoice() # "INV/2024-25/001234"
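A CIN such as the one above packs several fields into 21 characters: listing status, a 5-digit industry code, a 2-letter state code, the year of incorporation, a 3-letter company class, and a 6-digit registration number. The parser below is a sketch for checking that shape; `CIN_PATTERN` and `parse_cin` are hypothetical names and the regex checks structure only, not whether the individual codes exist.

```python
import re

CIN_PATTERN = re.compile(
    r"^(?P<listing>[LU])"           # L = listed, U = unlisted
    r"(?P<industry>\d{5})"          # industry classification code
    r"(?P<state>[A-Z]{2})"          # state of registration
    r"(?P<year>\d{4})"              # year of incorporation
    r"(?P<company_class>[A-Z]{3})"  # e.g. PLC, PTC
    r"(?P<registration>\d{6})$"     # registration number
)


def parse_cin(cin: str) -> dict:
    match = CIN_PATTERN.match(cin)
    if match is None:
        raise ValueError(f"not a well-formed CIN: {cin!r}")
    return match.groupdict()


print(parse_cin("U12345MH2020PLC123456"))
```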
🎓 Education
fake.college() # "IIT Bombay"
fake.iit() # "IIT Madras"
fake.nit() # "NIT Trichy"
fake.iim() # "IIM Ahmedabad"
fake.medical_college() # "AIIMS Delhi"
fake.university() # "Anna University"
fake.degree() # "B.Tech"
fake.engineering_branch() # "Computer Science"
fake.education_record() # {"college": "BITS Pilani", "degree": "B.Tech", ...}
💼 Jobs & Salary
fake.job_title() # "Senior Software Engineer"
fake.employer() # "Razorpay"
fake.salary_lpa() # "₹18.5 LPA"
fake.salary_lpa(level="fresher") # "₹5.2 LPA"
fake.salary_lpa(level="cxo") # "₹180.0 LPA"
fake.salary_monthly() # "₹1,54,167"
fake.job_record() # {"title": ..., "employer": ..., "salary_lpa": ...}
Salary bands: fresher → junior → mid → senior → lead → director → vp → cxo
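LPA means lakhs per annum, so the monthly figure is the annual figure times 1,00,000 divided by 12. The sketch below shows that arithmetic and the band ordering listed above; `lpa_to_monthly` and `is_more_senior` are hypothetical helpers, and no salary ranges per band are assumed.

```python
BANDS = ["fresher", "junior", "mid", "senior", "lead", "director", "vp", "cxo"]


def lpa_to_monthly(lpa: float) -> int:
    # 1 lakh = 100,000 rupees; spread the annual amount over 12 months.
    return round(lpa * 100_000 / 12)


def is_more_senior(band_a: str, band_b: str) -> bool:
    # Bands are ordered from most junior to most senior.
    return BANDS.index(band_a) > BANDS.index(band_b)


print(lpa_to_monthly(18.5))  # → 154167
```

This matches the `₹1,54,167` monthly example shown for an 18.5 LPA salary, before any tax or deductions.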
📅 Indian Dates
fake.date_indian() # "15/08/2024" (DD/MM/YYYY, not MM/DD)
fake.date_of_birth() # "23/11/1990"
fake.financial_year() # "2024-25" (Indian FY format)
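Both conventions are easy to reproduce with the standard library. The Indian financial year runs from 1 April to 31 March, so dates in January to March belong to the year that started the previous April. The helper names below are hypothetical and written only to illustrate the formats.

```python
from datetime import date


def date_indian(d: date) -> str:
    # Day first, then month, then four-digit year.
    return d.strftime("%d/%m/%Y")


def financial_year(d: date) -> str:
    # The financial year starts on 1 April.
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


print(date_indian(date(2024, 8, 15)))     # → 15/08/2024
print(financial_year(date(2024, 8, 15)))  # → 2024-25
print(financial_year(date(2025, 3, 31)))  # → 2024-25
```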
🌍 Cross-State Generation
India has massive internal migration. Decouple language from geography:
# Tamil name living in Delhi
fake = IndicFaker(language="ta", state="DL")
fake.name() # "Murugan Natarajan" ← Tamil name
fake.address() # "H.No. 123, Dwarka, Delhi - 110075" ← Delhi address
fake.gstin() # "07..." ← Delhi GST code
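Decoupling amounts to keeping language and state as separate settings, with the state falling back to a default only when it is not given. The sketch below is an assumption about how that could be modelled: `Locale`, `DEFAULT_STATE`, and the fallback choices are hypothetical, while the GST state codes shown (07 Delhi, 27 Maharashtra, 32 Kerala, 33 Tamil Nadu) are the official ones.

```python
from dataclasses import dataclass
from typing import Optional

GST_STATE_CODES = {"DL": "07", "MH": "27", "KL": "32", "TN": "33"}

# Hypothetical language-to-state fallbacks, used only when no state is set.
DEFAULT_STATE = {"ml": "KL", "ta": "TN", "mr": "MH"}


@dataclass
class Locale:
    language: str
    state: Optional[str] = None

    @property
    def effective_state(self) -> str:
        # An explicit state always wins over the language default.
        return self.state or DEFAULT_STATE[self.language]

    @property
    def gstin_prefix(self) -> str:
        return GST_STATE_CODES[self.effective_state]


print(Locale("ta").gstin_prefix)              # → 33
print(Locale("ta", state="DL").gstin_prefix)  # → 07
```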
🎲 Reproducibility
fake1 = IndicFaker(seed=42)
fake2 = IndicFaker(seed=42)
assert fake1.name() == fake2.name() # Always True
assert fake1.aadhaar() == fake2.aadhaar() # Always True
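Seeded reproducibility typically relies on each generator owning its own random number generator. The sketch below shows the idea with the standard library; `SeededGenerator` is a hypothetical class, not the library's implementation. Starting mobile numbers with 6-9 follows the usual Indian numbering convention.

```python
import random


class SeededGenerator:
    def __init__(self, seed=None) -> None:
        # A private Random instance avoids touching the global RNG state,
        # so two generators with the same seed stay in lockstep.
        self._rng = random.Random(seed)

    def phone(self) -> str:
        first = self._rng.choice("6789")
        rest = "".join(self._rng.choice("0123456789") for _ in range(9))
        number = first + rest
        return f"+91 {number[:5]} {number[5:]}"


a = SeededGenerator(seed=42)
b = SeededGenerator(seed=42)
assert a.phone() == b.phone()  # same seed, same sequence
```

Values only match when both generators make the same calls in the same order, since every call advances the random state.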
📋 Complete API Reference
Person
| Method | Returns | Example |
|---|---|---|
| `name()` | Full name (Latin) | "Rajesh Krishnan" |
| `name(script="native")` | Full name (native) | "രാജേഷ് കൃഷ്ണൻ" |
| `name_male()` | Male name | "Arjun Sharma" |
| `name_female()` | Female name | "Priya Nair" |
| `first_name()` | Random first name | "Harish" |
| `last_name()` | Surname | "Patel" |
| `prefix()` | Honorific | "Mr." / "श्री" |
ID Numbers
| Method | Returns | Example |
|---|---|---|
| `aadhaar()` | Aadhaar (Verhoeff ✓) | "3847 2918 4721" |
| `pan()` | PAN number | "ABCPK1234F" |
| `gstin()` | GSTIN (checksum ✓) | "32ABCPK1234F1Z5" |
| `dl_number()` | Driving License | "KL-09-2020-0012345" |
| `voter_id()` | Voter ID | "ABC1234567" |
| `passport()` | Passport | "A1234567" |
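The GSTIN check character is commonly described as a base-36 weighted checksum over the first 14 characters: 2-digit state code, 10-character PAN, entity number, and the letter Z. The sketch below implements that commonly described scheme; the function names are hypothetical and this is not taken from indic-faker's source. The sample value is a widely circulated illustrative GSTIN.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def gstin_check_char(first14: str) -> str:
    total = 0
    for i, ch in enumerate(first14):
        # Weights alternate 1, 2, 1, 2, ... from the left.
        product = ALPHABET.index(ch) * (1 if i % 2 == 0 else 2)
        # Add the base-36 "digits" of the product.
        total += product // 36 + product % 36
    return ALPHABET[(36 - total % 36) % 36]


def gstin_is_valid(gstin: str) -> bool:
    return len(gstin) == 15 and gstin_check_char(gstin[:14]) == gstin[14]


print(gstin_check_char("27AAPFU0939F1Z"))  # → V
```

A full validator would also confirm that the leading two digits are a real state code and that characters 3 to 12 form a well-formed PAN.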
Address
| Method | Returns | Example |
|---|---|---|
| `address()` | Full address | "14/2341, MG Road, TVM - 695024" |
| `messy_address()` | Messy address | "14/2341, Nr. Temple,TVM" |
| `city()` | City name | "Thiruvananthapuram" |
| `pincode()` | Valid pincode | "695024" |
| `village()` | Village + district | "Punalur, Kollam, Kerala" |
| `landmark()` | Landmark | "Near Govt Hospital" |
Finance
| Method | Returns | Example |
|---|---|---|
| `amount_inr()` | INR amount | "₹4,29,150.00" |
| `bank_account()` | Bank details dict | {"ifsc": ..., "account": ..., "bank": ...} |
| `upi_id()` | UPI ID | "rajesh.k@okicici" |
| `ifsc()` | IFSC code | "SBIN0001234" |
Company
| Method | Returns | Example |
|---|---|---|
| `company_indian()` | Company name | "Sharma Tech Pvt. Ltd." |
| `company_type()` | Type | "LLP" |
| `cin()` | CIN number | "U12345MH2020PLC123456" |
| `gst_invoice()` | Invoice number | "INV/2024-25/001234" |
Education
| Method | Returns | Example |
|---|---|---|
| `college()` | College name | "IIT Bombay" |
| `iit()` / `nit()` / `iim()` | Premier institute | "IIT Madras" |
| `degree()` | Degree | "B.Tech" |
| `education_record()` | Full record dict | {"college": ..., "cgpa": 8.45} |
Job & Salary
| Method | Returns | Example |
|---|---|---|
| `job_title()` | Job title | "Senior Software Engineer" |
| `employer()` | Employer | "Razorpay" |
| `salary_lpa()` | Salary in LPA | "₹18.5 LPA" |
| `salary_monthly()` | Monthly salary | "₹1,54,167" |
| `job_record()` | Full record dict | {"title": ..., "salary_lpa": ...} |
Batch Generation
| Method | Returns | Example |
|---|---|---|
| `profile()` | Complete identity dict | {"name": ..., "aadhaar": ...} |
| `generate_batch(n)` | List of n profiles | [{...}, {...}, ...] |
| `to_csv(n)` | CSV string | "name,aadhaar,...\n..." |
| `to_json(n)` | JSON string | '[{"name": ...}]' |
| `to_dataframe(n)` | pandas DataFrame | DataFrame(n rows) |
🤝 Contributing
We welcome contributions! Here's how you can help make indic-faker even better:
- 🔤 Add a new language: create `src/indic_faker/data/names/<lang_code>.py`
- 🏘️ Expand address data: add more cities/villages/districts
- 💡 New provider ideas: create a provider in `src/indic_faker/providers/`
- 🐛 Bug fixes: found wrong data? Open an issue or PR
# Development setup
git clone https://github.com/adwaith-0/indic-faker.git
cd indic-faker
pip install -e ".[dev]"
pytest tests/ -v
All 86 tests must pass before submitting a PR.
📜 License
MIT License — free for everyone, forever. Use it in personal projects, startups, enterprises, and everything in between.