High-performance extensible transliteration library for Sanskrit and Indic scripts
Shlesha - High-Performance Schema-Driven Transliteration Library
A next-generation transliteration library built with schema-driven architecture for Sanskrit and Indic scripts. Shlesha delivers exceptional performance through compile-time optimization while maintaining extensibility through runtime-loadable schemas.
🚀 Quick Start for Developers
New to Shlesha? Get up and running in one command:
./scripts/quick-start.sh
This sets up the Rust environment, Python bindings, and WASM support, then runs all tests.
For detailed setup instructions, see DEVELOPER_SETUP.md.
📚 Complete Documentation: See DOCUMENTATION_INDEX.md for all guides and references.
⚡ Performance Highlights
Shlesha's throughput is competitive with the fastest transliteration libraries:
- 1.07x - 2.96x slower than Vidyut (the industry leader)
- 10.52 MB/s for Indic script conversions
- 6-10x better performance than our original targets
- Dramatically faster than Aksharamukha and Dharmamitra
- Schema-generated converters perform identically to hand-coded ones
🏗️ Revolutionary Schema-Based Architecture
Compile-Time Code Generation
Shlesha generates its converters at compile time from declarative YAML schemas:
# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
  name: "slp1"
  script_type: "roman"
  description: "Sanskrit Library Phonetic Basic"
target: "iso15919"
mappings:
  vowels:
    "A": "ā"
    "I": "ī"
    "U": "ū"
    # ... more mappings
# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
  name: "bengali"
  script_type: "brahmic"
  description: "Bengali/Bangla script"
mappings:
  vowels:
    "অ": "अ" # Bengali A → Devanagari A
    "আ": "आ" # Bengali AA → Devanagari AA
    # ... more mappings
Build-Time Optimization
The build system automatically generates highly optimized converters:
# Build output showing schema processing
warning: Processing YAML schemas...
warning: Generating optimized converters with Handlebars templates...
warning: Created 18 schema-generated converters with O(1) lookups
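To make the "O(1) lookups" in the build output concrete, here is a minimal Python sketch of what a schema-generated converter reduces to: a hash map built once from the schema's mappings and consulted once per character. The names `BENGALI_TO_DEVA` and `convert` are illustrative and are not Shlesha's generated code; only the two mappings shown in the Bengali schema are included.

```python
# Illustrative stand-in for a generated converter (not Shlesha's actual output).
BENGALI_TO_DEVA = {
    "অ": "अ",  # Bengali A -> Devanagari A
    "আ": "आ",  # Bengali AA -> Devanagari AA
}


def convert(text, table):
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


print(convert("অআ", BENGALI_TO_DEVA))
```

Each character costs one dictionary lookup, so conversion time grows linearly with input length regardless of how many mappings the schema defines.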
🎯 Hub-and-Spoke Architecture
Smart Multi-Hub Design
- Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
- ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
- Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
- Direct Conversion: Bypass hubs when possible for maximum performance
Intelligent Routing
The system automatically determines the optimal conversion path:
// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant
// Single hub - one conversion
transliterator.transliterate("धर्म", "devanagari", "iso")?; // deva→iso
// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali
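The routing rules illustrated by the three calls above can be sketched as a small path planner. This is a simplified model written in Python, not Shlesha's router: the script sets, `hub_of`, and `plan_route` are hypothetical names, and the "direct conversion" shortcut that bypasses hubs is deliberately left out.

```python
# Hypothetical, reduced script sets for illustration.
INDIC_SCRIPTS = {"devanagari", "bengali", "tamil", "gujarati"}
ROMAN_SCHEMES = {"iso15919", "itrans", "iast", "slp1"}


def hub_of(script):
    """Return the hub format a script converts through."""
    if script in INDIC_SCRIPTS:
        return "devanagari"
    if script in ROMAN_SCHEMES:
        return "iso15919"
    raise ValueError(f"unknown script: {script}")


def plan_route(source, target):
    """List the formats a text passes through, including both endpoints."""
    if source == target:
        return [source]  # passthrough, nothing to convert
    path = [source]
    source_hub, target_hub = hub_of(source), hub_of(target)
    if path[-1] != source_hub:
        path.append(source_hub)
    if source_hub != target_hub:
        path.append(target_hub)  # cross-hub hop
    if path[-1] != target:
        path.append(target)
    return path


print(" -> ".join(plan_route("itrans", "bengali")))
```

Under this model a conversion never needs more than three hops, which matches the itrans→iso→deva→bengali path in the last example.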
📚 Supported Scripts (15+ Scripts, 210+ Conversion Pairs)
Indic Scripts (Schema-Generated)
- Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
- Bengali (bengali, bn) - Bengali/Bangla script
- Tamil (tamil, ta) - Tamil script
- Telugu (telugu, te) - Telugu script
- Gujarati (gujarati, gu) - Gujarati script
- Kannada (kannada, kn) - Kannada script
- Malayalam (malayalam, ml) - Malayalam script
- Odia (odia, od) - Odia/Oriya script
- Gurmukhi (gurmukhi, pa) - Punjabi script
- Sinhala (sinhala, si) - Sinhala script
Romanization Schemes (Schema-Generated)
- ISO-15919 (iso15919, iso) - International standard
- ITRANS (itrans) - Indian languages TRANSliteration
- SLP1 (slp1) - Sanskrit Library Phonetic Basic
- Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
- Velthuis (velthuis) - TeX-compatible scheme
- WX (wx) - ASCII-based notation
Hand-Coded Scripts (Premium Quality)
- IAST (iast) - International Alphabet of Sanskrit Transliteration
- Kolkata (kolkata) - Regional romanization scheme
- Grantha (grantha) - Classical Sanskrit script
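The "210+ conversion pairs" in the section heading is consistent with counting ordered source/target pairs over 15 scripts. That reading is an inference from the arithmetic, not something the project states; the short Python check below uses placeholder script names.

```python
from itertools import permutations


def conversion_pairs(scripts):
    """Every ordered (source, target) pair where source differs from target."""
    return list(permutations(scripts, 2))


# Placeholder names; only the count matters here.
scripts = [f"script_{i}" for i in range(15)]
print(len(conversion_pairs(scripts)))  # 15 * 14 = 210
```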
🛠️ Usage Examples
Rust Library
use shlesha::Shlesha;
let transliterator = Shlesha::new();
// High-performance cross-script conversion
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"
// Roman to Indic conversion (SLP1 writes dh as D and ṣ as z)
let result = transliterator.transliterate("Darmakzetra", "slp1", "tamil")?;
println!("{}", result); // "தர்மக்ஷேத்ர"
// Schema-generated converters in action
let result = transliterator.transliterate("Darmakzetra", "slp1", "iast")?;
println!("{}", result); // "dharmakṣetra"
Python Bindings (PyO3)
import shlesha
# Create transliterator with all schema-generated converters
transliterator = shlesha.Shlesha()
# Fast schema-based conversion
result = transliterator.transliterate("ধর্ম", "bengali", "telugu")
print(result) # "ధర్మ"
# Performance with metadata tracking
result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")
print(f"Output: {result.output}") # "dharmakr"
print(f"Unknown tokens: {len(result.metadata.unknown_tokens)}")
# Runtime extensibility
scripts = shlesha.get_supported_scripts()
print(f"Supports {len(scripts)} scripts: {scripts}")
Command Line Interface
# Schema-generated high-performance conversion
shlesha transliterate --from slp1 --to devanagari "Darmakzetra"
# Output: धर्मक्षेत्र
# Cross-script conversion via dual hubs
shlesha transliterate --from itrans --to tamil "dharma"
# Output: தர்ம
# List all schema-generated + hand-coded scripts
shlesha scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...
WebAssembly (Browser/Node.js)
import init, { WasmShlesha } from './pkg/shlesha.js';
async function demo() {
  await init();
  const transliterator = new WasmShlesha();
  // Schema-generated converter performance in browser
  const result = transliterator.transliterate("કર્મ", "gujarati", "devanagari");
  console.log(result); // "कर्म"
  // Runtime script discovery
  const scripts = transliterator.listSupportedScripts();
  console.log(`${scripts.length} scripts available`);
}
🔧 Runtime Schema Loading
NEW: Shlesha now supports runtime schema loading across all APIs, enabling you to add custom scripts without recompilation.
Rust API
use shlesha::Shlesha;
let mut transliterator = Shlesha::new();
// Load custom schema from YAML content
let custom_schema = r#"
metadata:
  name: "my_custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "My custom transliteration scheme"
target: "iso15919"
mappings:
  vowels:
    "a": "a"
    "e": "ē"
  consonants:
    "k": "k"
    "t": "ṭ"
"#;
// Load the schema at runtime
transliterator.load_schema_from_string(custom_schema, "my_custom_script")?;
// Use immediately without recompilation
let result = transliterator.transliterate("kate", "my_custom_script", "devanagari")?;
println!("{}", result); // "कटे" (k, a, ṭ, ē under the mappings above)
// Schema management
let info = transliterator.get_schema_info("my_custom_script").unwrap();
println!("Loaded {} with {} mappings", info.name, info.mapping_count);
Python API
import shlesha
transliterator = shlesha.Shlesha()
# Load schema from YAML string
yaml_content = """
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom transliteration"
target: "iso15919"
mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
"""
# Runtime loading
transliterator.load_schema_from_string(yaml_content, "custom_script")
# Immediate usage
result = transliterator.transliterate("ka", "custom_script", "devanagari")
print(result) # "क"
# Schema info
info = transliterator.get_schema_info("custom_script")
print(f"Script: {info['name']}, Mappings: {info['mapping_count']}")
# Schema management
transliterator.remove_schema("custom_script")
transliterator.clear_runtime_schemas()
JavaScript/WASM API
import init, { WasmShlesha } from './pkg/shlesha.js';
async function loadCustomScript() {
  await init();
  const transliterator = new WasmShlesha();
  // Define custom schema
  const yamlContent = `
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom script"
target: "iso15919"
mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
`;
  // Load at runtime
  transliterator.loadSchemaFromString(yamlContent, "custom_script");
  // Use immediately
  const result = transliterator.transliterate("ka", "custom_script", "devanagari");
  console.log(result); // "क"
  // Get schema information
  const info = transliterator.getSchemaInfo("custom_script");
  console.log(`Name: ${info.name}, Mappings: ${info.mapping_count}`);
}
Key Runtime Features
- ✅ Load from YAML strings - No file system required
- ✅ Load from file paths - For development workflows
- ✅ Schema validation - Automatic error checking
- ✅ Hot reloading - Add/remove schemas dynamically
- ✅ Schema introspection - Get metadata about loaded schemas
- ✅ Memory management - Clear schemas when done
- ✅ Cross-platform - Identical API across Rust, Python, WASM
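The load, validate, inspect, and remove lifecycle listed above can be modeled with a small in-memory registry. The Python sketch below is an assumption-laden illustration, not Shlesha's implementation: `RuntimeSchemaRegistry` and its methods are hypothetical, the schema is passed as an already-parsed dictionary to avoid a YAML dependency, and validation is reduced to two checks.

```python
class RuntimeSchemaRegistry:
    """Hypothetical in-memory registry of runtime-loaded schemas."""

    def __init__(self):
        self._schemas = {}

    def load(self, schema, name):
        # Minimal validation: require metadata.name and at least one mapping.
        metadata = schema.get("metadata", {})
        mappings = schema.get("mappings", {})
        if not metadata.get("name"):
            raise ValueError("schema is missing metadata.name")
        if not any(mappings.values()):
            raise ValueError("schema has no mappings")
        self._schemas[name] = schema

    def info(self, name):
        schema = self._schemas[name]
        count = sum(len(group) for group in schema["mappings"].values())
        return {"name": schema["metadata"]["name"], "mapping_count": count}

    def remove(self, name):
        """Return True if a schema was removed, False if none was loaded."""
        return self._schemas.pop(name, None) is not None

    def clear(self):
        self._schemas.clear()


registry = RuntimeSchemaRegistry()
registry.load(
    {
        "metadata": {"name": "custom_script", "script_type": "roman"},
        "mappings": {"vowels": {"a": "a"}, "consonants": {"k": "k"}},
    },
    "custom_script",
)
print(registry.info("custom_script"))
```

Keeping runtime schemas in a separate registry like this is one way to let them be cleared without touching the converters compiled into the library.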
Use Cases
Development & Testing
// Test schema variations quickly
transliterator.load_schema_from_string(variant_a, "test_a")?;
transliterator.load_schema_from_string(variant_b, "test_b")?;
// Compare results immediately
Dynamic Applications
# User uploads custom transliteration scheme
user_schema = request.files['schema'].read().decode('utf-8')
transliterator.load_schema_from_string(user_schema, user_id)
# Use immediately in application
Configuration-Driven Systems
// Load schemas from configuration
config.schemas.forEach(schema => {
  transliterator.loadSchemaFromString(schema.content, schema.name);
});
⚡ Performance & Benchmarks
Competitive Performance Analysis
Recent benchmarks show Shlesha delivers industry-competitive performance:
| Library | SLP1→ISO (71 chars) | ITRANS→ISO (71 chars) | Architecture |
|---|---|---|---|
| Vidyut | 1.75 MB/s | 1.92 MB/s | Direct conversion |
| Shlesha | 0.93 MB/s | 1.04 MB/s | Schema-generated hub |
| Performance Ratio | 1.89x slower | 1.85x slower | Extensible |
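The ratio row can be re-derived from the throughput rows. The Python snippet below does that with the rounded figures from the table; `slowdown` is just a helper name chosen for this example.

```python
# Throughput figures (MB/s) copied from the table above.
vidyut = {"SLP1→ISO": 1.75, "ITRANS→ISO": 1.92}
shlesha = {"SLP1→ISO": 0.93, "ITRANS→ISO": 1.04}


def slowdown(reference, candidate):
    """How many times slower the candidate is than the reference."""
    return reference / candidate


ratios = {pair: slowdown(vidyut[pair], shlesha[pair]) for pair in vidyut}
for pair, ratio in ratios.items():
    print(f"{pair}: {ratio:.2f}x slower")
```

With these rounded inputs the ratios come out at about 1.88 and 1.85, within 0.01 of the table's values; the small difference on the first is presumably because the table was computed from unrounded measurements.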
Performance Achievements
- ✅ 6-10x better than original performance targets
- ✅ Only 1.07x - 2.96x slower than Vidyut (industry leader)
- ✅ 10.52 MB/s for Indic script conversions
- ✅ Dramatically faster than Aksharamukha/Dharmamitra
- ✅ Schema-generated = hand-coded performance
Architecture Trade-offs
| Aspect | Shlesha | Vidyut |
|---|---|---|
| Performance | 1.07x - 2.96x slower than Vidyut | Best-in-class |
| Extensibility | Runtime schemas | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited |
| Architecture | Hub-and-spoke | Direct conversion |
| Bindings | Rust/Python/WASM/CLI | Rust only |
🏗️ Schema-Driven Development
Adding New Scripts
Adding support for new scripts is now trivial with schemas:
# schemas/new_script.yaml
metadata:
  name: "NewScript"
  description: "Description of the script"
  unicode_block: "NewScript"
  has_implicit_vowels: true
mappings:
  vowels:
    - source: "𑀅" # New script character
      target: "अ" # Devanagari equivalent
    # ... add more mappings
# Rebuild to include new script
cargo build
# New script automatically available!
Template-Based Generation
Converters are generated using Handlebars templates for consistency:
{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
    {{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
    deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}
impl {{pascal_case metadata.name}}Converter {
    pub fn new() -> Self {
        // Generated O(1) lookup tables
        let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
        {{#each character_mappings}}
        {{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
        {{/each}}
        // ... template continues
    }
}
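To show what the `{{#each character_mappings}}` loop in the template expands to, here is a rough Python imitation of that one step. It is not the project's build script: `snake_case`, `pascal_case`, and `render_inserts` are simplified stand-ins for the Handlebars helpers, written only to illustrate the expansion.

```python
def snake_case(name):
    """Simplified helper: lower-case and normalise separators to underscores."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def pascal_case(name):
    """Simplified helper: harvard_kyoto -> HarvardKyoto."""
    return "".join(part.capitalize() for part in snake_case(name).split("_") if part)


def render_inserts(script_name, character_mappings):
    """Emit one Rust `insert` line per mapping, as the #each loop would."""
    var = f"{snake_case(script_name)}_to_deva"
    return "\n".join(
        f"{var}.insert('{m['source']}', '{m['target']}');"
        for m in character_mappings
    )


mappings = [{"source": "অ", "target": "अ"}, {"source": "আ", "target": "आ"}]
print(f"pub struct {pascal_case('bengali')}Converter {{ /* ... */ }}")
print(render_inserts("bengali", mappings))
```

Because the output is plain Rust source, the generated table is compiled like hand-written code, which is the basis for the claim that schema-generated and hand-coded converters perform alike.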
🧪 Quality Assurance
Comprehensive Test Suite
- ✅ 127 passing tests covering all functionality
- ✅ Schema-generated converter tests for all 14 generated converters
- ✅ Performance regression tests ensuring schema = hand-coded speed
- ✅ Cross-script conversion matrix testing all 210+ pairs
- ✅ Unknown character handling with graceful degradation
Build System Validation
# Test schema-generated converters maintain performance
cargo test --lib
# Verify all conversions work
cargo test comprehensive_bidirectional_tests
# Performance benchmarks
cargo run --example shlesha_vs_vidyut_benchmark
🔧 Build Configuration & Features
Schema Processing Features
# Default: Schema-generated + hand-coded converters
cargo build
# Development mode with schema recompilation
cargo build --features "schema-dev"
# Minimal build (hand-coded only)
cargo build --no-default-features --features "hand-coded-only"
# All features (Python + WASM + CLI)
cargo build --features "python,wasm,cli"
Runtime Extensibility
let mut transliterator = Shlesha::new();
// Load additional schemas at runtime from a file path
transliterator.load_schema("path/to/new_script.yaml")?;
// Schema registry access
let scripts = transliterator.list_supported_scripts();
println!("Dynamically loaded: {:?}", scripts);
🚀 Advanced Features
Metadata Collection
// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")?;
if let Some(metadata) = result.metadata {
    println!("Conversion: {} → {}", metadata.source_script, metadata.target_script);
    for unknown in metadata.unknown_tokens {
        println!("Unknown '{}' at position {}", unknown.token, unknown.position);
    }
}
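The idea behind unknown-token tracking can be sketched without the library. The Python example below flags characters that fall outside the Unicode Devanagari block and records where they occur. It assumes positions are counted in code points, which may differ from how Shlesha reports them, and `find_unknown_tokens` is a hypothetical helper rather than part of the API.

```python
# Unicode Devanagari block: U+0900 to U+097F.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)


def find_unknown_tokens(text):
    """Return characters outside the Devanagari block with their code-point index."""
    return [
        {"token": ch, "position": i}
        for i, ch in enumerate(text)
        if ord(ch) not in DEVANAGARI_BLOCK and not ch.isspace()
    ]


for unknown in find_unknown_tokens("धर्मkr"):
    print(f"Unknown '{unknown['token']}' at position {unknown['position']}")
```

For the mixed input used above, the two Latin letters are reported at indices 4 and 5, since the Devanagari part occupies four code points (the virama counts as one).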
Script Characteristics
// Schema-aware script properties
let registry = ScriptConverterRegistry::default();
// Indic scripts have implicit vowels
assert!(registry.script_has_implicit_vowels("bengali").unwrap());
assert!(registry.script_has_implicit_vowels("devanagari").unwrap());
// Roman schemes don't
assert!(!registry.script_has_implicit_vowels("itrans").unwrap());
assert!(!registry.script_has_implicit_vowels("slp1").unwrap());
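Why the implicit-vowel flag matters is easiest to see in a toy converter. In Devanagari a bare consonant carries an inherent "a" unless a virama or a vowel sign follows, so a converter to a Roman scheme has to insert that vowel itself. The Python sketch below covers only a handful of characters and is not how Shlesha's hub is implemented; the tables and `deva_to_roman` are illustrative.

```python
# Tiny illustrative tables; a real converter covers the whole script.
CONSONANTS = {"ध": "dh", "र": "r", "म": "m", "क": "k"}
VOWEL_SIGNS = {"\u093e": "ā", "\u093f": "i", "\u0947": "ē"}
VIRAMA = "\u094d"  # suppresses the inherent vowel


def deva_to_roman(text):
    out = []
    pending_a = False  # True while the last consonant still owes its inherent "a"
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch == VIRAMA:
            pending_a = False  # consonant cluster: no vowel in between
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # explicit vowel replaces the inherent one
            pending_a = False
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            out.append(ch)  # unknown characters pass through
    if pending_a:
        out.append("a")
    return "".join(out)


print(deva_to_roman("धर्म"))
```

Roman schemes such as ITRANS or SLP1 write every vowel explicitly, so converting between them needs no such bookkeeping.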
Hub Processing Control
// Fine-grained control over conversion paths
let hub = Hub::new();
// Direct hub operations
let iso_text = hub.deva_to_iso("धर्म")?; // Devanagari → ISO
let deva_text = hub.iso_to_deva("dharma")?; // ISO → Devanagari
// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata("धर्म")?;
📖 Documentation
Complete Documentation Suite
- Architecture Guide - Deep dive into hub-and-spoke design
- Schema Reference - Complete schema format documentation
- Performance Guide - Optimization techniques and benchmarks
- API Reference - Complete function and type reference
- Developer Setup - Development environment setup
- Release System - Automated release workflow overview
- Deployment Guide - Complete deployment and environment setup
- crates.io RC Support - Release candidate publishing guide
- Security Setup - Token management and environment security
- Contributing Guide - Guidelines for contributors
Quick Reference
# Generate documentation
cargo doc --open
# Run all examples
cargo run --example shlesha_vs_vidyut_benchmark
cargo run --example roman_allocation_analysis
# Performance testing
cargo bench
🚀 Releases
Shlesha uses an automated release system for publishing to multiple package registries:
Quick Release
# Guided release process
./scripts/release.sh
Package Installation
# Python (PyPI)
pip install shlesha
# WASM (npm)
npm install shlesha-wasm
# Rust (crates.io)
cargo add shlesha
See DEPLOYMENT.md for complete release documentation.
🤝 Contributing
We welcome contributions! Shlesha's schema-driven architecture makes adding new scripts easier than ever:
- Add Schema: Create TOML/YAML mapping file
- Test: Run test suite to verify
- Benchmark: Ensure performance maintained
- Submit: Open PR with schema and tests
See CONTRIBUTING.md for detailed guidelines.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Unicode Consortium for Indic script standards
- ISO-15919 for romanization standardization
- Sanskrit Library for SLP1 encoding schemes
- Vidyut Project for performance benchmarking standards
- Rust Community for excellent tools (PyO3, wasm-pack, handlebars)
Shlesha - Where performance meets extensibility through intelligent schema-driven design.
File details
Details for the file shlesha-0.1.4.tar.gz.
File metadata
- Download URL: shlesha-0.1.4.tar.gz
- Upload date:
- Size: 286.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26399cd8fdf1fafbc9f72d4fa93a905e027e7ed3ce349f0d5314a9c5fb15b169 |
| MD5 | 7c0febcaf17a03280428e785af1dd3c9 |
| BLAKE2b-256 | 9727ea3e1765bdb29940bcdc48aafb601f46d64117ae2ef88c98378a5bd05dec |
Provenance
The following attestation bundles were made for shlesha-0.1.4.tar.gz:
- Publisher: python.yml on udapaana/shlesha
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: shlesha-0.1.4.tar.gz
- Subject digest: 26399cd8fdf1fafbc9f72d4fa93a905e027e7ed3ce349f0d5314a9c5fb15b169
- Sigstore transparency entry: 256258302
- Sigstore integration time:
- Permalink: udapaana/shlesha@61c49e5ee85c2372fc0fcfc13b821ef2c05f9728
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/udapaana
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python.yml@61c49e5ee85c2372fc0fcfc13b821ef2c05f9728
- Trigger Event: push