Transliterate Chechen text from Cyrillic to Latin script using the Chechen Latin alphabet
Chechen Transliterator
A Python library for transliterating Chechen text from Cyrillic to Latin script using the Chechen Latin alphabet.
Installation
pip install ce-translit
Quick Start
import ce_translit
# Simple usage - transliterate Chechen text
text = "Нохчийн мотт"
result = ce_translit.transliterate(text)
print(result) # Outputs: "Noxçiyŋ mott"
Features
- Simple API: Clean, single-function interface
- Linguistically Accurate: Handles all Chechen-specific rules
- Context-Aware: Special handling for letter position rules
- Customizable: Advanced options for specialized use cases
- Pure Python: No external dependencies
- Memory Efficient: Uses minimal memory and efficient string handling
Detailed Usage
Basic Usage
import ce_translit
# Transliterate a single word
word_result = ce_translit.transliterate("дош") # "doş"
# Transliterate a sentence
sentence = "Муха ду хьал де?"
sentence_result = ce_translit.transliterate(sentence) # "Muxa du ẋal de?"
Advanced Usage with Custom Rules
from ce_translit import Transliterator
# Create a custom transliterator with your own rules
custom_transliterator = Transliterator(
    # Custom letter mapping
    mapping={
        **Transliterator()._mapping,  # First define base mapping
        # Then override specific mappings
        "й": "j",
        # Append completely new mappings
        "1": "j",
    },
    # Override blacklist (words that should keep the regular 'н' at the end)
    blacklist=["дин", "гӏан", "сан"],
    # Override unsurelist (words that should use 'ŋ[REPLACE]' at the end)
    unsurelist=["шун", "бен", "цӏен"],
)
# Use the custom transliterator
result = custom_transliterator.transliterate("1аж дин шун")
If you omit the base mapping (the **Transliterator()._mapping entry) from the custom mapping, the custom transliterator uses only the mappings you provide.
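The difference between including and omitting the base mapping comes from ordinary Python dict unpacking. The sketch below uses plain dictionaries only; `base_mapping` is a hypothetical three-letter stand-in, not the library's actual table.

```python
# Hypothetical stand-in for the library's base mapping (illustration only).
base_mapping = {"д": "d", "й": "y", "ш": "ş"}

# Unpack the base first, then override and extend: later keys win.
merged = {**base_mapping, "й": "j", "1": "j"}

# Without the unpacking, only the explicitly listed keys exist.
custom_only = {"й": "j", "1": "j"}

print(merged)
print(custom_only)
```

With `merged`, every base letter is still covered and only 'й' changes. With `custom_only`, letters such as 'д' have no mapping at all.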
Override just one of the lists by defining it separately
from ce_translit import Transliterator
# Define your own list
my_blacklist = ["дин", "гӏан", "сан"]
# Create a custom transliterator with defined blacklist
custom_transliterator = Transliterator(blacklist=my_blacklist)
result = custom_transliterator.transliterate("дин")
Special Transliteration Rules
The library handles several special rules in Chechen transliteration:
- Letter 'е':
  - At the start of a word → 'ye' (ex: "елар" → "yelar")
  - After 'ъ' → 'ye' (ex: "шелъелча" → "şelyelça")
  - In other positions → 'e' (ex: "мела" → "mela")
- Letter 'н' at end of words:
  - Regular handling → 'ŋ' (ex: "сан" → "saŋ")
  - Blacklisted words keep 'n' (ex: "хан" → "xan")
  - Unsurelist words use 'ŋ[REPLACE]' (ex: "шун" → "şuŋ[REPLACE]")
- Standalone 'а':
  - When 'а' is a standalone word → 'ə' (ex: "а" → "ə")
- Special Character Combinations:
  - 'къ' → 'q̇'
  - 'хь' → 'ẋ'
  - 'гӏ' → 'ġ'
Technical Details
Performance
The library is optimized for both startup time and runtime performance:
- Data is loaded once at import time
- Efficient string handling for minimal memory usage
- Uses sets for O(1) lookups in blacklists and unsure lists
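A minimal sketch of these three ideas, assuming nothing about the library's internals: `_RAW_MAP`, `MAPPING`, `BLACKLIST`, `keeps_final_n`, and `translit_chars` are hypothetical names, and the inline JSON string stands in for a packaged data file.

```python
import json

# Hypothetical inline JSON standing in for a packaged mapping file.
_RAW_MAP = '{"д": "d", "о": "o", "ш": "ş"}'

# Parsed once, when the module is imported; every call reuses this dict.
MAPPING = json.loads(_RAW_MAP)

# A frozenset gives average O(1) membership tests, unlike a list scan.
BLACKLIST = frozenset(["дин", "хан"])


def keeps_final_n(word: str) -> bool:
    """Return True if the word is in the sample blacklist."""
    return word in BLACKLIST


def translit_chars(text: str) -> str:
    """Map characters one by one and build the result with a single join."""
    return "".join(MAPPING.get(ch, ch) for ch in text)


print(translit_chars("дош"))
print(keeps_final_n("хан"), keeps_final_n("сан"))
```

Building the output with one `"".join(...)` call avoids creating a new intermediate string for every appended character.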
Development
Setting up the Development Environment
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install development tools
pip install --upgrade hatch pytest
# Run tests
hatch run test
# Build the package
hatch build
# Test the built package (adjust the version to match the wheel in dist/)
pip install --force-reinstall dist/ce_translit-1.0.1-py3-none-any.whl
Running Tests
# Install test dependencies
pip install pytest
# Run tests
pytest
Repository Structure
ce-translit-py/
├── src/
│   └── ce_translit/
│       ├── __init__.py            # Public API
│       ├── _transliterator.py     # Core implementation
│       └── data/
│           └── cyrl_latn_map.json # Character mapping
├── tests/
│   └── test_transliterator.py
├── LICENSE
├── README.md
└── pyproject.toml
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Feel free to submit issues or pull requests on the GitHub repository.
Related Projects
- ce-translit-js - JavaScript version of this library
Download files
File details
Details for the file ce_translit-1.0.1.tar.gz.
File metadata
- Download URL: ce_translit-1.0.1.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f382a84b9b1e5788c902de3a814da301f7a4a3811ad130a9d3cb8bc101311204 |
| MD5 | cb66b0f36c7e71cd2b4869f9e78e6578 |
| BLAKE2b-256 | c9c71ffbcb9c7af1e12cccf5882a6ad0e7dfa34c21d2dcc32ac2e25e8de40dd4 |
Provenance
The following attestation bundles were made for ce_translit-1.0.1.tar.gz:
Publisher: python-publish.yml on chechen-language/ce-translit-py
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: ce_translit-1.0.1.tar.gz
  - Subject digest: f382a84b9b1e5788c902de3a814da301f7a4a3811ad130a9d3cb8bc101311204
  - Sigstore transparency entry: 219436980
  - Sigstore integration time:
  - Permalink: chechen-language/ce-translit-py@0ead028ad8a6dbf849c010624dcbc08ccda72e5e
  - Branch / Tag: refs/tags/v1.0.1
  - Owner: https://github.com/chechen-language
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: python-publish.yml@0ead028ad8a6dbf849c010624dcbc08ccda72e5e
  - Trigger Event: release
File details
Details for the file ce_translit-1.0.1-py3-none-any.whl.
File metadata
- Download URL: ce_translit-1.0.1-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f81259fc5d5f35fc965f4dfb2d1c1f3c2a0faaef14f7ce066a18210c9bc6846c |
| MD5 | d09d2f0331f21f0c7b0eed63bef6c63d |
| BLAKE2b-256 | 13494e33b5560b5cb9a3d61a9794bf777bbeb413d41248af43e1627a5c09ceb3 |
Provenance
The following attestation bundles were made for ce_translit-1.0.1-py3-none-any.whl:
Publisher: python-publish.yml on chechen-language/ce-translit-py
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: ce_translit-1.0.1-py3-none-any.whl
  - Subject digest: f81259fc5d5f35fc965f4dfb2d1c1f3c2a0faaef14f7ce066a18210c9bc6846c
  - Sigstore transparency entry: 219436981
  - Sigstore integration time:
  - Permalink: chechen-language/ce-translit-py@0ead028ad8a6dbf849c010624dcbc08ccda72e5e
  - Branch / Tag: refs/tags/v1.0.1
  - Owner: https://github.com/chechen-language
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: python-publish.yml@0ead028ad8a6dbf849c010624dcbc08ccda72e5e
  - Trigger Event: release