A simple library for generating short, unique identifiers using characters from various Indian language scripts.
Indic UID
Generate short, unique identifiers using characters from Indian language scripts (Devanagari, Gujarati, Kannada, Tamil, Telugu, Bengali) and English.
Perfect for referral codes, short URLs, or any system requiring unique identifiers with an Indian language touch!
✨ Features
- 🎯 Multiple Scripts: Support for 6 Indian scripts (Devanagari, Gujarati, Kannada, Tamil, Telugu, Bengali) plus English
- 🔐 Cryptographically Secure: Uses Python's secrets module for secure random generation
- 🎲 Flexible Generation: Choose between pure random or pronounceable IDs
- ✅ Validation Built-in: Validate IDs and detect their script
- 📊 Collision Analysis: Calculate probability of ID collisions
- 🚀 Zero Dependencies: Lightweight, no external dependencies
- 🎨 Simple API: Easy to use, intuitive interface
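To illustrate what secure random generation with the secrets module can look like, here is a minimal sketch. It is not the library's actual implementation: the `POOL` string and the `random_id` helper are hypothetical names chosen for this example, and the real character pools are larger.

```python
import secrets

# Hypothetical, deliberately small pool for illustration only.
POOL = "अकखगघ" + "அகஙசஞ" + "abcde"


def random_id(length: int = 6, pool: str = POOL) -> str:
    """Draw each character independently using the OS-backed CSPRNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # secrets.choice avoids the predictable state of the random module.
    return "".join(secrets.choice(pool) for _ in range(length))


print(random_id())
```

Using `secrets.choice` rather than `random.choice` matters when IDs act as hard-to-guess tokens, such as referral or coupon codes.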
📦 Installation
pip install indic-uid
Note
The package is installed as indic-uid, but imported in Python as indic_uid
🚀 Quick Start
from indic_uid import generate_id
# Generate a random 6-character ID
id = generate_id()
print(id) # Example: 'कஆಗઘతঅ'
# Generate with specific length
id = generate_id(length=8)
print(id) # Example: 'अकஆগઘతaక'
# Generate from specific script
id = generate_id(scripts=['devanagari'])
print(id) # Example: 'घअकआइश'
# Generate pronounceable IDs (alternates vowels and consonants)
id = generate_id(pronounceable=True, scripts=['devanagari'])
print(id) # Example: 'कअखइगउ'
📖 Usage Examples
Basic ID Generation
from indic_uid import generate_id, generate_batch
# Default: 6 characters, all scripts
id = generate_id()
# Custom length
short_id = generate_id(length=4)
long_id = generate_id(length=10)
# Generate multiple IDs at once
ids = generate_batch(count=10, length=5)
for id in ids:
    print(id)
Script Selection
# Single script
devanagari_id = generate_id(scripts=['devanagari'])
tamil_id = generate_id(scripts=['tamil'])
# Multiple scripts
mixed_id = generate_id(scripts=['devanagari', 'tamil', 'gujarati'])
# Available scripts
from indic_uid import get_available_scripts
print(get_available_scripts())
# Output: ['devanagari', 'gujarati', 'kannada', 'tamil', 'telugu', 'bengali', 'english']
Pronounceable IDs
Generate IDs that alternate between consonants and vowels for easier pronunciation:
# Pronounceable ID (easier to speak)
id = generate_id(pronounceable=True, scripts=['devanagari'])
print(id) # Example: 'कअखइगउ' (ka-a-kha-i-ga-u)
# Works with any script
id = generate_id(pronounceable=True, scripts=['tamil'])
print(id) # Example: 'கஅசஇடஉ'
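The alternation described above can be sketched as follows. This is an assumption about how such a scheme might work, not the library's code: the `CONSONANTS` and `VOWELS` sets and the `pronounceable_id` helper are illustrative, and the library's own sets may differ.

```python
import secrets

# Illustrative Devanagari sets (assumed; not taken from the library).
CONSONANTS = "कखगघचजटडतदनपबमयरलवसह"
VOWELS = "अआइईउऊएऐओऔ"


def pronounceable_id(length: int = 6) -> str:
    """Alternate consonant, vowel, consonant, vowel, ..."""
    chars = []
    for position in range(length):
        # Even positions take a consonant, odd positions take a vowel.
        pool = CONSONANTS if position % 2 == 0 else VOWELS
        chars.append(secrets.choice(pool))
    return "".join(chars)


print(pronounceable_id())
```

Alternation shrinks the ID space. With these example sets, a 6-character ID has 20³ × 10³ = 8,000,000 possibilities, compared with 30⁶ = 729,000,000 when all 30 characters are allowed in every position, so pronounceable IDs collide sooner at the same length.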
Validation
from indic_uid import generate_id, is_valid_id, get_script_of_char
# Validate an ID
id = generate_id()
print(is_valid_id(id)) # True
print(is_valid_id('abc123')) # False
# Validate with specific script
id = generate_id(scripts=['kannada'])
print(is_valid_id(id, scripts=['kannada'])) # True
print(is_valid_id(id, scripts=['tamil'])) # False
# Validate length
print(is_valid_id(id, length=6)) # True if ID is 6 chars
# Detect script of a character
char = 'क'
print(get_script_of_char(char)) # 'devanagari'
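One plausible way to detect a character's script is to check which Unicode block its code point falls in. The sketch below uses the standard block ranges for each script. The `script_of` helper is hypothetical, and the library may instead test membership in its own explicit character sets, which would be stricter than a block check.

```python
# Unicode block ranges (inclusive) for the supported Indian scripts.
SCRIPT_BLOCKS = {
    "devanagari": (0x0900, 0x097F),
    "bengali": (0x0980, 0x09FF),
    "gujarati": (0x0A80, 0x0AFF),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
    "kannada": (0x0C80, 0x0CFF),
}


def script_of(char: str):
    """Return the script name for a single character, or None if unknown."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Assumption: "english" means the ASCII letters a-z and A-Z.
    if "a" <= char <= "z" or "A" <= char <= "Z":
        return "english"
    code_point = ord(char)
    for name, (low, high) in SCRIPT_BLOCKS.items():
        if low <= code_point <= high:
            return name
    return None


print(script_of("क"))  # devanagari
print(script_of("1"))  # None
```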
Collision Probability Analysis
from indic_uid import calculate_collision_probability, estimate_safe_id_count
# Calculate collision probability for your use case
prob = calculate_collision_probability(
    num_ids=10000, # Expected number of IDs
    length=6, # ID length
    num_scripts=6 # Number of scripts used
)
print(f"Collision probability: {prob:.2e}")
# Output: Collision probability: 5.45e-10 (extremely low!)
# Estimate safe number of IDs
safe_count = estimate_safe_id_count(
    length=6,
    num_scripts=6,
    max_collision_prob=0.000001 # 1 in a million
)
print(f"Safe ID count: {safe_count:,}")
# Output: Safe ID count: 13,856,406
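For intuition about where such figures come from, the standard birthday-problem approximation can be sketched as below. This README does not state which formula the library uses, so the helpers here (`collision_probability`, `safe_id_count`, and the `pool_size` parameter) are assumptions for illustration, and their results may differ from the outputs quoted above.

```python
import math


def collision_probability(num_ids: int, pool_size: int, length: int) -> float:
    """Birthday approximation: p ≈ 1 - exp(-n(n-1) / (2N)), with N = pool_size ** length."""
    space = pool_size ** length
    exponent = -num_ids * (num_ids - 1) / (2 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


def safe_id_count(pool_size: int, length: int, max_prob: float) -> int:
    """Largest n (approximately) whose collision probability stays under max_prob."""
    if not 0 < max_prob < 1:
        raise ValueError("max_prob must be between 0 and 1")
    space = pool_size ** length
    # Invert the approximation: n ≈ sqrt(2N * ln(1 / (1 - p))).
    return math.isqrt(int(2 * space * -math.log1p(-max_prob)))


print(f"{collision_probability(10_000, 240, 6):.2e}")
print(f"{safe_id_count(240, 6, 1e-6):,}")
```

The probability grows roughly with the square of the number of IDs, so ten times as many IDs means about a hundred times the collision risk.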
🎯 Use Cases
Referral System
from indic_uid import generate_id
def create_referral_code(user_id):
    """Generate a unique referral code for a user."""
    referral_code = generate_id(length=6, scripts=['devanagari'])
    # Store mapping: referral_code -> user_id in database
    return referral_code
# Usage
code = create_referral_code(user_id=12345)
print(f"Your referral code: {code}")
# Share: https://yourapp.com/ref/{code}
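Random generation alone does not guarantee uniqueness, so the storage step usually needs a collision check. The sketch below shows one way to retry on collision. It uses a plain dict as a stand-in for the database and a hypothetical `fake_generate_id` in place of `generate_id` so that it runs without the package installed; `create_unique_code` is likewise an illustrative name.

```python
import secrets


def fake_generate_id(length: int = 6) -> str:
    """Stand-in for indic_uid.generate_id (assumption, for a self-contained example)."""
    return "".join(secrets.choice("कखगघचजटडतद") for _ in range(length))


def create_unique_code(user_id, store, generate=fake_generate_id, max_attempts=5):
    """Generate a code, retrying if it already exists in the store."""
    for _ in range(max_attempts):
        code = generate()
        if code not in store:
            store[code] = user_id  # referral_code -> user_id
            return code
    raise RuntimeError("could not find an unused code; consider a longer ID")


codes = {}
print(create_unique_code(12345, codes))
```

With a real database, a unique constraint on the code column plus a retry on the constraint violation avoids the race between the check and the insert.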
Short URLs
from indic_uid import generate_id
def create_short_url(long_url):
    """Create a short URL identifier."""
    short_id = generate_id(length=5, scripts=['devanagari', 'gujarati'])
    # Store mapping: short_id -> long_url
    return f"https://short.link/{short_id}"
url = create_short_url("https://example.com/very/long/url/path")
print(url) # https://short.link/कખગઘચ
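Non-ASCII characters in a URL path are percent-encoded as UTF-8 bytes when the URL is sent over the wire, even though browsers usually display the readable form. The sketch below shows the round trip with the standard library; the `short_url` helper is a hypothetical name for this example.

```python
from urllib.parse import quote, unquote


def short_url(short_id: str, base: str = "https://short.link/") -> str:
    """Build a URL with the ID percent-encoded for safe transport."""
    return base + quote(short_id, safe="")


url = short_url("क")
print(url)  # https://short.link/%E0%A4%95

# Decode the path segment back to the original ID before looking it up.
print(unquote(url.rsplit("/", 1)[1]))  # क
```

Each of these characters takes three bytes in UTF-8, so a 5-character ID becomes 45 ASCII characters once encoded, which is worth knowing if URL length matters.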
Coupon Codes
from indic_uid import generate_id
def create_coupon_code(campaign_name):
    """Generate pronounceable coupon codes."""
    code = generate_id(
        length=6,
        pronounceable=True,
        scripts=['devanagari']
    )
    return code.upper() # No-op here: Indian scripts have no uppercase (only English letters would change)
coupon = create_coupon_code("diwali_sale")
print(f"Coupon: {coupon}")
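When a user types or pastes a coupon code, it helps to normalize the input before comparing it with the stored code. The following is a small sketch under that assumption; `normalize_code` is a hypothetical helper and not part of the library.

```python
import unicodedata


def normalize_code(raw: str) -> str:
    """Trim whitespace, fold case for English letters, and apply Unicode NFC."""
    return unicodedata.normalize("NFC", raw.strip().casefold())


# Indic letters have no case, so only surrounding whitespace changes here.
print(normalize_code("  कअखइगउ "))  # कअखइगउ
print(normalize_code("AbC"))  # abc
```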
📊 Collision Probability
With default settings (6 characters, all 7 scripts):
| Number of IDs | Collision Probability | Odds |
|---|---|---|
| 1,000 | 5.45 × 10⁻¹² | 1 in 183 billion |
| 10,000 | 5.45 × 10⁻¹⁰ | 1 in 1.8 billion |
| 100,000 | 5.45 × 10⁻⁸ | 1 in 18 million |
| 1,000,000 | 5.45 × 10⁻⁶ | 1 in 183,000 |
Conclusion: Extremely safe for most applications. Even with 1 million IDs, the collision probability is about 0.0005% (5.45 × 10⁻⁶).
🌐 Supported Scripts
| Script | Example Characters | Total Characters |
|---|---|---|
| Devanagari (Hindi) | अ, क, ख, ग, घ | 41 |
| Gujarati | અ, ક, ખ, ગ, ઘ | 41 |
| Kannada | ಅ, ಕ, ಖ, ಗ, ಘ | 44 |
| Tamil | அ, க, ங, ச, ஞ | 30 |
| Telugu | అ, క, ఖ, గ, ఘ | 44 |
| Bengali | অ, ক, খ, গ, ঘ | 40 |
| English | a, b, c, d, e | 26 |
Total character pool: 266 characters across all scripts (240 across the six Indian scripts, plus 26 English letters)
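The character counts in the table determine how many distinct IDs a given configuration can produce. A minimal sketch, assuming every position may hold any character from the selected scripts (the `POOL_SIZES` mapping copies the table; `id_space` is a hypothetical helper):

```python
# Character counts copied from the Supported Scripts table.
POOL_SIZES = {
    "devanagari": 41,
    "gujarati": 41,
    "kannada": 44,
    "tamil": 30,
    "telugu": 44,
    "bengali": 40,
    "english": 26,
}


def id_space(scripts=None, length: int = 6) -> int:
    """Number of distinct IDs: (combined pool size) ** length."""
    names = list(POOL_SIZES) if scripts is None else scripts
    pool = sum(POOL_SIZES[name] for name in names)
    return pool ** length


print(f"{id_space():,}")  # all scripts, 6 characters
print(f"{id_space(['devanagari']):,}")  # Devanagari only, 6 characters
```

Restricting IDs to a single script cuts the space sharply, so single-script IDs may need extra length to keep collisions equally unlikely.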
🔧 API Reference
generate_id(length=6, scripts=None, pronounceable=False)
Generate a unique ID.
Parameters:
- length (int): Length of the ID (default: 6)
- scripts (list): List of script names to use (default: all scripts)
- pronounceable (bool): Alternate vowels/consonants (default: False)
Returns: String containing the generated ID
Example:
id = generate_id(length=8, scripts=['devanagari', 'tamil'])
generate_batch(count, length=6, scripts=None, pronounceable=False)
Generate multiple IDs at once.
Parameters:
- count (int): Number of IDs to generate
- length (int): Length of each ID
- scripts (list): List of script names
- pronounceable (bool): Alternate vowels/consonants
Returns: List of generated IDs
Example:
ids = generate_batch(100, length=5)
is_valid_id(id_string, scripts=None, length=None)
Validate if a string is a valid Indic ID.
Parameters:
- id_string (str): The ID to validate
- scripts (list): Expected scripts (default: any)
- length (int): Expected length (default: any)
Returns: Boolean
Example:
is_valid = is_valid_id('कखगઘచঅ', scripts=['devanagari'])
get_script_of_char(char)
Identify which script a character belongs to.
Parameters:
char(str): A single character
Returns: Script name (str) or None
Example:
script = get_script_of_char('क') # Returns 'devanagari'
get_available_scripts()
Get list of all available scripts.
Returns: List of script names
Example:
scripts = get_available_scripts()
# ['devanagari', 'gujarati', 'kannada', 'tamil', 'telugu', 'bengali', 'english']
calculate_collision_probability(num_ids, length=6, num_scripts=6)
Calculate collision probability for given parameters.
Parameters:
- num_ids (int): Expected number of IDs
- length (int): ID length
- num_scripts (int): Number of scripts being used
Returns: Float between 0 and 1
estimate_safe_id_count(length=6, num_scripts=6, max_collision_prob=0.000001)
Estimate safe number of IDs for given collision probability.
Parameters:
- length (int): ID length
- num_scripts (int): Number of scripts
- max_collision_prob (float): Maximum acceptable collision probability
Returns: Integer (estimated safe count)
🤝 Contributing
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Setup
# Clone the repo
git clone https://github.com/aayush-mor13/indic-uid.git
cd indic-uid
# Install in development mode
pip install -e .
# Install dev dependencies
pip install pytest pytest-cov
# Run tests
pytest
# Run tests with coverage
pytest --cov=indic_uid
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
👥 Authors
- Aayush Mor - @aayush-mor13
- Jay Gala - @jaygala223
🙏 Acknowledgments
- Inspired by the beauty and diversity of Indian scripts
- Built for the Indian developer community
- Thanks to all contributors!
📫 Support
- 🐛 Report a bug
- 💡 Request a feature
- ⭐ Star this repo if you find it useful!
📈 Changelog
v0.1.0 (2025-01-22)
- Initial release
- Support for 6 Indian scripts + English
- Random and pronounceable ID generation
- Validation and script detection
- Collision probability analysis
Made with ❤️ for the Indian developer community
File details
Details for the file indic_uid-0.1.0.tar.gz.
File metadata
- Download URL: indic_uid-0.1.0.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e39cc110f9081e3c2004a5a43c32743aa4a4f8155b7af2efe32221ef2f406765 |
| MD5 | 7e48b098c2a7a00b68d7520bd003d70c |
| BLAKE2b-256 | a81fbb6089d3876690b22e0abe04add967ac8eb82ba0c9103586ee0e93e446a3 |
File details
Details for the file indic_uid-0.1.0-py3-none-any.whl.
File metadata
- Download URL: indic_uid-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1afcdb85f1cd20a3db0258fdcba3b0e14028637b47fbfb4867b9c899c1ea15a0 |
| MD5 | 4ea104c05837e9a9f6c502de2d31f0f8 |
| BLAKE2b-256 | 75ebc91855fa9aabcccc182b27addf08213a9032af822988ad4a3aa2edc02573 |