High-performance extensible transliteration library for Sanskrit and Indic scripts

Shlesha - High-Performance Schema-Driven Transliteration Library

A next-generation transliteration library built with schema-driven architecture for Sanskrit and Indic scripts. Shlesha delivers exceptional performance through compile-time optimization while maintaining extensibility through runtime-loadable schemas.

🚀 Quick Start for Developers

New to Shlesha? Get up and running in one command:

./scripts/quick-start.sh

This sets up the Rust environment, Python bindings, and WASM support, then runs all tests.

For detailed setup instructions, see DEVELOPER_SETUP.md.

📚 Complete Documentation: See DOCUMENTATION_INDEX.md for all guides and references.


⚡ Performance Highlights

Shlesha delivers exceptional performance competitive with the fastest transliteration libraries:

  • Only 1.07x - 2.96x slower than Vidyut (industry-leading performance)
  • 10.52 MB/s for Indic script conversions
  • 6-10x better performance than our original targets
  • Dramatically faster than Aksharamukha and Dharmamitra
  • Schema-generated converters perform identically to hand-coded ones

🏗️ Revolutionary Schema-Based Architecture

Compile-Time Code Generation

Shlesha uses a revolutionary schema-driven approach where converters are generated at compile-time from declarative schemas:

# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
  name: "slp1"
  script_type: "roman"
  description: "Sanskrit Library Phonetic Basic"

target: "iso15919"

mappings:
  vowels:
    "A": "ā"
    "I": "ī" 
    "U": "ū"
    # ... more mappings

# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
  name: "bengali"
  script_type: "brahmic"
  description: "Bengali/Bangla script"

mappings:
  vowels:
    "অ": "अ"    # Bengali A → Devanagari A
    "আ": "आ"    # Bengali AA → Devanagari AA
    # ... more mappings

Build-Time Optimization

The build system automatically generates highly optimized converters:

# Build output showing schema processing
warning: Processing YAML schemas...
warning: Generating optimized converters with Handlebars templates...
warning: Created 18 schema-generated converters with O(1) lookups
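
The "O(1) lookups" in the build log amount to turning each schema's mappings into hash tables. The sketch below illustrates that idea in Python; it is not Shlesha's generated Rust. The schema is an inline dict rather than parsed YAML, and `build_tables` is a hypothetical helper name.

```python
# Illustrative only: flatten a schema's mapping categories into forward and
# reverse hash tables. `build_tables` is a hypothetical helper, not a Shlesha API.

SLP1_SCHEMA = {
    "metadata": {"name": "slp1", "script_type": "roman"},
    "target": "iso15919",
    "mappings": {
        "vowels": {"A": "ā", "I": "ī", "U": "ū"},
    },
}


def build_tables(schema):
    """Return (forward, reverse) dicts built from every mapping category."""
    forward = {}
    for category in schema["mappings"].values():
        forward.update(category)
    reverse = {}
    for source, target in forward.items():
        # A reverse table only works if no two sources share a target.
        if target in reverse:
            raise ValueError(f"ambiguous reverse mapping for {target!r}")
        reverse[target] = source
    return forward, reverse


forward, reverse = build_tables(SLP1_SCHEMA)
print(forward["A"])  # ā
print(reverse["ī"])  # I
```

Building the reverse table alongside the forward one makes ambiguous mappings fail early instead of silently breaking round trips.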

🎯 Hub-and-Spoke Architecture

Smart Multi-Hub Design

  • Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
  • ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
  • Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
  • Direct Conversion: Bypass hubs when possible for maximum performance

Intelligent Routing

The system automatically determines the optimal conversion path:

// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant

// Single hub - one conversion 
transliterator.transliterate("धर्म", "devanagari", "iso")?; // deva→iso

// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali
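
The routing rules above can be sketched as a small path planner. This is a Python illustration under stated assumptions: the script sets are abbreviated, `hub_for` and `plan_route` are hypothetical names, and the direct hub-bypassing conversions mentioned earlier are not modelled.

```python
# Illustrative route planner for a two-hub design. The script sets and the
# function names are assumptions made for this sketch, not Shlesha internals.

ROMAN = {"iso15919", "iast", "itrans", "slp1", "harvard_kyoto", "velthuis", "wx"}
INDIC = {"devanagari", "bengali", "tamil", "telugu", "gujarati", "kannada"}


def hub_for(script):
    """Return the hub that a script converts through."""
    if script in ROMAN:
        return "iso15919"
    if script in INDIC:
        return "devanagari"
    raise KeyError(f"unknown script: {script}")


def plan_route(source, target):
    """List the scripts a conversion passes through, endpoints included."""
    if source == target:
        return [source]  # passthrough, nothing to convert
    route = [source]
    for stop in (hub_for(source), hub_for(target), target):
        if stop != route[-1]:  # skip a hop when we are already at that stop
            route.append(stop)
    return route


print(plan_route("devanagari", "devanagari"))  # ['devanagari']
print(plan_route("devanagari", "iso15919"))    # ['devanagari', 'iso15919']
print(plan_route("itrans", "bengali"))         # ['itrans', 'iso15919', 'devanagari', 'bengali']
```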

📚 Supported Scripts (15+ Scripts, 210+ Conversion Pairs)

Indic Scripts (Schema-Generated)

  • Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
  • Bengali (bengali, bn) - Bengali/Bangla script
  • Tamil (tamil, ta) - Tamil script
  • Telugu (telugu, te) - Telugu script
  • Gujarati (gujarati, gu) - Gujarati script
  • Kannada (kannada, kn) - Kannada script
  • Malayalam (malayalam, ml) - Malayalam script
  • Odia (odia, od) - Odia/Oriya script
  • Gurmukhi (gurmukhi, pa) - Punjabi script
  • Sinhala (sinhala, si) - Sinhala script

Romanization Schemes (Schema-Generated)

  • ISO-15919 (iso15919, iso) - International standard
  • ITRANS (itrans) - Indian languages TRANSliteration
  • SLP1 (slp1) - Sanskrit Library Phonetic Basic
  • Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
  • Velthuis (velthuis) - TeX-compatible scheme
  • WX (wx) - ASCII-based notation

Hand-Coded Scripts (Premium Quality)

  • IAST (iast) - International Alphabet of Sanskrit Transliteration
  • Kolkata (kolkata) - Regional romanization scheme
  • Grantha (grantha) - Classical Sanskrit script
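
The "210+" figure in the section heading is consistent with counting ordered source-to-target pairs among 15 scripts; that reading is an assumption, not something the README states. The three lists above name 19 identifiers (10 + 6 + 3), which would give 342 ordered pairs by the same count.

```python
# Ordered source -> target pairs among n scripts, excluding identity conversions.
def conversion_pairs(n_scripts):
    return n_scripts * (n_scripts - 1)


print(conversion_pairs(15))  # 210
print(conversion_pairs(19))  # 342
```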

🛠️ Usage Examples

Rust Library

use shlesha::Shlesha;

let transliterator = Shlesha::new();

// High-performance cross-script conversion
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"

// Roman to Indic conversion (the input uses diacritics, so the source is IAST)
let result = transliterator.transliterate("dharmakṣetra", "iast", "tamil")?;
println!("{}", result); // "தர்மக்ஷேத்ர"

// Schema-generated converters in action (SLP1 writes dh as "D" and ṣ as "z")
let result = transliterator.transliterate("Darmakzetra", "slp1", "iast")?;
println!("{}", result); // "dharmakṣetra"

Python Bindings (PyO3)

import shlesha

# Create transliterator with all schema-generated converters
transliterator = shlesha.Shlesha()

# Fast schema-based conversion
result = transliterator.transliterate("ধর্ম", "bengali", "telugu")
print(result)  # "ధర్మ"

# Performance with metadata tracking
result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")
print(f"Output: {result.output}")  # "dharmakr"
print(f"Unknown tokens: {len(result.metadata.unknown_tokens)}")

# Runtime extensibility
scripts = shlesha.get_supported_scripts()
print(f"Supports {len(scripts)} scripts: {scripts}")

Command Line Interface

# Schema-generated high-performance conversion (SLP1 writes dh as "D" and ṣ as "z")
shlesha transliterate --from slp1 --to devanagari "Darmakzetra"
# Output: धर्मक्षेत्र

# Cross-script conversion via dual hubs  
shlesha transliterate --from itrans --to tamil "dharma"
# Output: தர்ம

# List all schema-generated + hand-coded scripts
shlesha scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...

WebAssembly (Browser/Node.js)

import init, { WasmShlesha } from './pkg/shlesha.js';

async function demo() {
    await init();
    const transliterator = new WasmShlesha();
    
    // Schema-generated converter performance in browser
    const result = transliterator.transliterate("કર્મ", "gujarati", "devanagari");
    console.log(result); // "कर्म"
    
    // Runtime script discovery
    const scripts = transliterator.listSupportedScripts();
    console.log(`${scripts.length} scripts available`);
}

🔧 Runtime Schema Loading

NEW: Shlesha now supports runtime schema loading across all APIs, enabling you to add custom scripts without recompilation.

Rust API

use shlesha::Shlesha;

let mut transliterator = Shlesha::new();

// Load custom schema from YAML content
let custom_schema = r#"
metadata:
  name: "my_custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "My custom transliteration scheme"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
    "e": "ē"
  consonants:
    "k": "k"
    "t": "ṭ"
"#;

// Load the schema at runtime
transliterator.load_schema_from_string(custom_schema, "my_custom_script")?;

// Use immediately without recompilation
let result = transliterator.transliterate("kate", "my_custom_script", "devanagari")?;
println!("{}", result); // "कटे" (k + a + ṭ + ē under the mappings above)

// Schema management
let info = transliterator.get_schema_info("my_custom_script").unwrap();
println!("Loaded {} with {} mappings", info.name, info.mapping_count);

Python API

import shlesha

transliterator = shlesha.Shlesha()

# Load schema from YAML string
yaml_content = """
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom transliteration"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
"""

# Runtime loading
transliterator.load_schema_from_string(yaml_content, "custom_script")

# Immediate usage
result = transliterator.transliterate("ka", "custom_script", "devanagari")
print(result)  # "क"

# Schema info
info = transliterator.get_schema_info("custom_script")
print(f"Script: {info['name']}, Mappings: {info['mapping_count']}")

# Schema management
transliterator.remove_schema("custom_script")
transliterator.clear_runtime_schemas()

JavaScript/WASM API

import init, { WasmShlesha } from './pkg/shlesha.js';

async function loadCustomScript() {
    await init();
    const transliterator = new WasmShlesha();
    
    // Define custom schema
    const yamlContent = `
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom script"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
`;
    
    // Load at runtime
    transliterator.loadSchemaFromString(yamlContent, "custom_script");
    
    // Use immediately
    const result = transliterator.transliterate("ka", "custom_script", "devanagari");
    console.log(result); // "क"
    
    // Get schema information
    const info = transliterator.getSchemaInfo("custom_script");
    console.log(`Name: ${info.name}, Mappings: ${info.mapping_count}`);
}

Key Runtime Features

  • Load from YAML strings - No file system required
  • Load from file paths - For development workflows
  • Schema validation - Automatic error checking
  • Hot reloading - Add/remove schemas dynamically
  • Schema introspection - Get metadata about loaded schemas
  • Memory management - Clear schemas when done
  • Cross-platform - Identical API across Rust, Python, WASM
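
The load, inspect, and remove lifecycle listed above can be sketched as a small in-memory registry. This is a Python illustration, not Shlesha's implementation: the class name, the dict-based schema, and the validation rules are all assumptions.

```python
# Illustrative in-memory schema registry. Method names loosely mirror the
# runtime API shown in this README; the validation rules are assumptions.

class SchemaRegistry:
    REQUIRED_METADATA = ("name", "script_type")

    def __init__(self):
        self._schemas = {}

    def load(self, schema, name):
        """Validate a schema dict and register it under `name`."""
        metadata = schema.get("metadata", {})
        missing = [key for key in self.REQUIRED_METADATA if key not in metadata]
        if missing:
            raise ValueError(f"schema metadata is missing: {missing}")
        if not schema.get("mappings"):
            raise ValueError("schema has no mappings")
        self._schemas[name] = schema

    def info(self, name):
        """Return summary metadata, or None if the schema is not loaded."""
        schema = self._schemas.get(name)
        if schema is None:
            return None
        count = sum(len(group) for group in schema["mappings"].values())
        return {"name": name, "mapping_count": count}

    def remove(self, name):
        """Drop one schema; return True if it was present."""
        return self._schemas.pop(name, None) is not None

    def clear(self):
        """Drop every runtime-loaded schema."""
        self._schemas.clear()


registry = SchemaRegistry()
registry.load(
    {
        "metadata": {"name": "custom_script", "script_type": "roman"},
        "target": "iso15919",
        "mappings": {"vowels": {"a": "a"}, "consonants": {"k": "k"}},
    },
    "custom_script",
)
print(registry.info("custom_script"))  # {'name': 'custom_script', 'mapping_count': 2}
```

Validating at load time keeps a malformed schema from being registered at all, which matters when schemas come from user uploads.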

Use Cases

Development & Testing

// Test schema variations quickly
transliterator.load_schema_from_string(variant_a, "test_a")?;
transliterator.load_schema_from_string(variant_b, "test_b")?;
// Compare results immediately

Dynamic Applications

# User uploads custom transliteration scheme
user_schema = request.files['schema'].read().decode('utf-8')
transliterator.load_schema_from_string(user_schema, user_id)
# Use immediately in application

Configuration-Driven Systems

// Load schemas from configuration
config.schemas.forEach(schema => {
    transliterator.loadSchemaFromString(schema.content, schema.name);
});

⚡ Performance & Benchmarks

Competitive Performance Analysis

Recent benchmarks show Shlesha delivers industry-competitive performance:

| Library | SLP1→ISO (71 chars) | ITRANS→ISO (71 chars) | Architecture |
|---|---|---|---|
| Vidyut | 1.75 MB/s | 1.92 MB/s | Direct conversion |
| Shlesha | 0.93 MB/s | 1.04 MB/s | Schema-generated hub |
| Performance Ratio | 1.89x slower | 1.85x slower | Extensible |
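
The ratio row is the reference throughput divided by the measured throughput. The short check below recomputes it from the rounded MB/s figures in the benchmark table. It gives 1.88 rather than 1.89 for SLP1→ISO, presumably because the published ratio was derived from unrounded measurements (an assumption).

```python
# Throughput figures copied from the benchmark table, in MB/s.
vidyut = {"slp1_to_iso": 1.75, "itrans_to_iso": 1.92}
shlesha = {"slp1_to_iso": 0.93, "itrans_to_iso": 1.04}


def slowdown(reference, measured):
    """How many times slower `measured` is than `reference`."""
    return reference / measured


ratios = {pair: round(slowdown(vidyut[pair], shlesha[pair]), 2) for pair in vidyut}
print(ratios)  # {'slp1_to_iso': 1.88, 'itrans_to_iso': 1.85}
```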

Performance Achievements

  • 6-10x better than original performance targets
  • Only 1.07x - 2.96x slower than Vidyut (industry leader)
  • 10.52 MB/s for Indic script conversions
  • Dramatically faster than Aksharamukha/Dharmamitra
  • Schema-generated = hand-coded performance

Architecture Trade-offs

| Aspect | Shlesha | Vidyut |
|---|---|---|
| Performance | Excellent (2-3x slower) | Best-in-class |
| Extensibility | Runtime schemas | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited |
| Architecture | Hub-and-spoke | Direct conversion |
| Bindings | Rust/Python/WASM/CLI | Rust only |

🏗️ Schema-Driven Development

Adding New Scripts

Adding support for new scripts is now trivial with schemas:

# schemas/new_script.yaml
metadata:
  name: "NewScript"
  description: "Description of the script"
  unicode_block: "NewScript"
  has_implicit_vowels: true

mappings:
  vowels:
    - source: "𑀅"  # New script character
      target: "अ"   # Devanagari equivalent
    # ... add more mappings

# Rebuild to include new script
cargo build
# New script automatically available!

Template-Based Generation

Converters are generated using Handlebars templates for consistency:

{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
    {{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
    deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}

impl {{pascal_case metadata.name}}Converter {
    pub fn new() -> Self {
        // Generated O(1) lookup tables
        let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
        {{#each character_mappings}}
        {{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
        {{/each}}
        // ... template continues
    }
}
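
To make the generate-then-compile flow concrete, here is a minimal Python analogue that renders source code for a lookup table from a schema and then loads it. Shlesha itself renders Rust with Handlebars; the template text, the flat `mappings` dict, and `render_converter` are stand-ins chosen for this sketch.

```python
# Illustrative code generation: render Python source for a lookup table from a
# schema dict, then load it. This only mimics the Handlebars-to-Rust process.
from string import Template

CONVERTER_TEMPLATE = Template(
    '"""$description converter generated from schema"""\n'
    "${name}_TO_DEVA = $table\n"
    "\n"
    "def ${name}_to_deva(text):\n"
    "    return ''.join(${name}_TO_DEVA.get(ch, ch) for ch in text)\n"
)


def render_converter(schema):
    """Return generated source code for one schema."""
    name = schema["metadata"]["name"]
    if not name.isidentifier():
        raise ValueError(f"schema name is not a valid identifier: {name!r}")
    return CONVERTER_TEMPLATE.substitute(
        description=schema["metadata"]["description"],
        name=name,
        table=repr(schema["mappings"]),  # repr() quotes every key and value
    )


bengali = {
    "metadata": {"name": "bengali", "description": "Bengali/Bangla script"},
    "mappings": {"অ": "अ", "আ": "आ"},
}
source = render_converter(bengali)
namespace = {}
exec(source, namespace)  # stands in for the compile step that follows generation
print(namespace["bengali_to_deva"]("অআ"))  # अआ
```

Because the table is baked into the generated source, the converter does no schema parsing at run time, which is the property the build-time approach relies on.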

🧪 Quality Assurance

Comprehensive Test Suite

  • 127 passing tests covering all functionality
  • Schema-generated converter tests for all 14 generated converters
  • Performance regression tests ensuring schema = hand-coded speed
  • Cross-script conversion matrix testing all 210+ pairs
  • Unknown character handling with graceful degradation

Build System Validation

# Test schema-generated converters maintain performance
cargo test --lib

# Verify all conversions work
cargo test comprehensive_bidirectional_tests

# Performance benchmarks
cargo run --example shlesha_vs_vidyut_benchmark

🔧 Build Configuration & Features

Schema Processing Features

# Default: Schema-generated + hand-coded converters
cargo build

# Development mode with schema recompilation
cargo build --features "schema-dev"

# Minimal build (hand-coded only)
cargo build --no-default-features --features "hand-coded-only"

# All features (Python + WASM + CLI)
cargo build --features "python,wasm,cli"

Runtime Extensibility

let mut transliterator = Shlesha::new();

// Load additional schemas from a file path at runtime
transliterator.load_schema("path/to/new_script.yaml")?;

// Schema registry access
let scripts = transliterator.list_supported_scripts();
println!("Dynamically loaded: {:?}", scripts);

🚀 Advanced Features

Metadata Collection

// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")?;

if let Some(metadata) = result.metadata {
    println!("Conversion: {} → {}", metadata.source_script, metadata.target_script);
    for unknown in metadata.unknown_tokens {
        println!("Unknown '{}' at position {}", unknown.token, unknown.position);
    }
}
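
The idea behind unknown-token tracking can be shown with a per-character table: characters without a mapping are passed through unchanged and recorded with their position. This is a simplified Python sketch; the function name and result shape are assumptions and do not reproduce the Shlesha API.

```python
# Illustrative unknown-token tracking with a per-character table.

def transliterate_with_metadata(text, table):
    """Map each character through `table`; pass unknown ones through and record them."""
    output = []
    unknown_tokens = []
    for position, char in enumerate(text):
        if char in table:
            output.append(table[char])
        else:
            output.append(char)  # graceful degradation: keep the input as-is
            unknown_tokens.append({"token": char, "position": position})
    return {"output": "".join(output), "unknown_tokens": unknown_tokens}


bengali_to_deva = {"অ": "अ", "আ": "आ"}
result = transliterate_with_metadata("অxআ", bengali_to_deva)
print(result["output"])          # अxआ
print(result["unknown_tokens"])  # [{'token': 'x', 'position': 1}]
```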

Script Characteristics

// Schema-aware script properties
let registry = ScriptConverterRegistry::default();

// Indic scripts have implicit vowels
assert!(registry.script_has_implicit_vowels("bengali").unwrap());
assert!(registry.script_has_implicit_vowels("devanagari").unwrap());

// Roman schemes don't
assert!(!registry.script_has_implicit_vowels("itrans").unwrap());
assert!(!registry.script_has_implicit_vowels("slp1").unwrap());
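
Why the implicit-vowel flag matters: a Brahmic consonant carries an inherent "a", so a converter coming from a roman scheme must drop an explicit "a" after a consonant and insert a virama wherever a consonant has no vowel. The Python sketch below shows this for a handful of letters; the tables are deliberately tiny and all names are hypothetical.

```python
# Illustrative ISO-15919 -> Devanagari conversion for a few letters, showing
# how the inherent "a" and the virama interact.

CONSONANTS = {"k": "क", "dh": "ध", "m": "म", "r": "र"}
INDEPENDENT_VOWELS = {"a": "अ", "ā": "आ"}
VOWEL_SIGNS = {"a": "", "ā": "ा"}  # the inherent "a" needs no sign
VIRAMA = "्"


def tokenize(text):
    """Greedy longest-match split into one- or two-letter tokens."""
    tokens, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in CONSONANTS:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def iso_to_devanagari(text):
    out = []
    after_consonant = False
    for token in tokenize(text):
        if token in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # previous consonant had no vowel
            out.append(CONSONANTS[token])
            after_consonant = True
        elif token in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[token] if after_consonant else INDEPENDENT_VOWELS[token])
            after_consonant = False
        else:
            raise ValueError(f"unsupported token: {token!r}")
    if after_consonant:
        out.append(VIRAMA)  # word-final bare consonant
    return "".join(out)


print(iso_to_devanagari("dharma"))  # धर्म
print(iso_to_devanagari("k"))       # क्
```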

Hub Processing Control

// Fine-grained control over conversion paths
let hub = Hub::new();

// Direct hub operations
let iso_text = hub.deva_to_iso("धर्म")?;  // Devanagari → ISO
let deva_text = hub.iso_to_deva("dharma")?;  // ISO → Devanagari

// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata("धर्म")?;

📖 Documentation

Complete Documentation Suite

Quick Reference

# Generate documentation
cargo doc --open

# Run all examples
cargo run --example shlesha_vs_vidyut_benchmark
cargo run --example roman_allocation_analysis  

# Performance testing
cargo bench

🚀 Releases

Shlesha uses an automated release system for publishing to multiple package registries:

Quick Release

# Guided release process
./scripts/release.sh

Package Installation

# Python (PyPI)
pip install shlesha

# WASM (npm)  
npm install shlesha-wasm

# Rust (crates.io)
cargo add shlesha

See DEPLOYMENT.md for complete release documentation.

🤝 Contributing

We welcome contributions! Shlesha's schema-driven architecture makes adding new scripts easier than ever:

  1. Add Schema: Create TOML/YAML mapping file
  2. Test: Run test suite to verify
  3. Benchmark: Ensure performance maintained
  4. Submit: Open PR with schema and tests

See CONTRIBUTING.md for detailed guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Unicode Consortium for Indic script standards
  • ISO-15919 for romanization standardization
  • Sanskrit Library for SLP1 encoding schemes
  • Vidyut Project for performance benchmarking standards
  • Rust Community for excellent tools (PyO3, wasm-pack, handlebars)

Shlesha - Where performance meets extensibility through intelligent schema-driven design.
