A Python library for Hinglish (Hindi+English code-mixed) NLP: detection, tokenization, transliteration, stop-word removal.
hinglishswd
A Python library for Hinglish (Hindi+English code-mixed) NLP.
Features
- Language detection: English / Hindi (Devanagari) / Hinglish (Latin-script Hindi)
- Tokenization: punctuation-aware splitting for Hinglish, spaCy-based for Devanagari Hindi
- Transliteration: Hinglish → Hindi Devanagari (via indic-transliteration)
- Translation: Hinglish/Hindi → English (via deep-translator / Google Translate)
- Stop word removal: built-in Hindi + Hinglish stop word lists
- Pipeline API: a single call for full processing
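The README does not say how language detection works. As a rough illustration only, the sketch below uses one common heuristic: any Devanagari character (Unicode block U+0900 to U+097F) marks the text as Hindi, and Latin-script text counts as Hinglish when enough of its words appear in a small romanized-Hindi lexicon. The lexicon, the `threshold` parameter, and the function name are assumptions, not hinglishswd's implementation.

```python
import re

# Devanagari occupies the Unicode block U+0900 to U+097F.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
LATIN_WORD = re.compile(r"[A-Za-z]+")

# Hypothetical, deliberately tiny lexicon of common romanized Hindi words;
# a usable detector would need a much larger list or a trained model.
HINGLISH_MARKERS = frozenset({
    "hai", "hain", "ho", "mai", "mera", "meri", "kya", "kal", "aaj",
    "nahi", "bahut", "acha", "accha", "gaya", "gayi", "kar", "rahe", "raha",
    "mujhe", "aap", "kahan", "khana", "khaana", "paani", "chahiye",
})


def detect_language(text: str, threshold: float = 0.3) -> str:
    """Heuristic sketch: return 'hindi', 'hinglish' or 'english'."""
    # Any Devanagari character at all is treated as Hindi.
    if DEVANAGARI.search(text):
        return "hindi"
    words = [w.lower() for w in LATIN_WORD.findall(text)]
    if not words:
        return "english"
    # Fraction of Latin-script words found in the romanized-Hindi lexicon.
    hits = sum(w in HINGLISH_MARKERS for w in words)
    return "hinglish" if hits / len(words) >= threshold else "english"


print(detect_language("kal mai khaana khane gaya"))  # hinglish
```

The threshold trades off false positives (English sentences containing a stray "ho") against missed Hinglish sentences that use words outside the lexicon.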
Installation
```bash
pip install hinglishswd
```
Quick Start
```python
from hinglishswd import HinglishNLP

nlp = HinglishNLP()

# Language detection
nlp.detect("kal mai khaana khane gaya") # "hinglish"
nlp.detect("Hello, how are you?") # "english"
nlp.detect("आज मौसम बहुत अच्छा है") # "hindi"

# Tokenization
nlp.tokenize("kal mai khaana khane gaya") # ['kal', 'mai', 'khaana', 'khane', 'gaya']

# Transliteration (Hinglish -> Devanagari)
nlp.transliterate("kal mai khaana khane gaya") # "कल मै खान खने गय"

# Translation (Hinglish -> English)
nlp.translate("kal mai khaana khane gaya") # "I went to eat yesterday"

# Full pipeline
result = nlp.pipeline("aaj mausam bahut acha hai")
# {
# "text": "aaj mausam bahut acha hai",
# "language": "hinglish",
# "tokens": ["aaj", "mausam", "bahut", "acha", "hai"],
# "tokens_no_stopwords": ["aaj", "mausam", "acha"],
# "devanagari": "आज मौसम बहुत अच्छा है",
# "english": "the weather is very good today"
# }
```
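The `tokens` and `tokens_no_stopwords` fields in the pipeline output come from tokenization followed by stop-word filtering. A minimal sketch of that two-step flow for Latin-script input, assuming a regex tokenizer and a small illustrative stop-word set (neither is the package's actual code or list):

```python
import re

# Hypothetical tokenizer: a word is a run of ASCII letters/digits, optionally
# with one internal apostrophe ("don't"); surrounding punctuation is dropped.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")

# Illustrative stop words only; the package ships its own lists in stopwords.py.
SAMPLE_STOPWORDS = frozenset({"hai", "hain", "bahut", "ka", "ki", "ke", "ko", "aur", "bhi"})


def tokenize(text: str) -> list[str]:
    """Split Latin-script text into word tokens, ignoring punctuation."""
    return TOKEN_RE.findall(text)


def remove_stopwords(tokens: list[str], stopwords: frozenset = SAMPLE_STOPWORDS) -> list[str]:
    """Drop tokens whose lowercase form is in the stop-word set."""
    return [t for t in tokens if t.lower() not in stopwords]


tokens = tokenize("aaj mausam bahut acha hai!")
print(tokens)                    # ['aaj', 'mausam', 'bahut', 'acha', 'hai']
print(remove_stopwords(tokens))  # ['aaj', 'mausam', 'acha']
```

This sketch ignores Devanagari input entirely; the feature list says the package handles that case with spaCy instead.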
Module-level API
```python
from hinglishswd import (
    detect_language,
    tokenize, tokenize_sentences,
    transliterate, hinglish_to_devanagari,
    to_english, hinglish_to_english, translate_pipeline,
    remove_stopwords,
)

lang = detect_language("Aap kahan ho?")
tokens = tokenize("Mujhe paani chahiye")
dev = hinglish_to_devanagari("mera naam rahul hai")
en = hinglish_to_english("aaj kya kar rahe ho?")
```
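`tokenize_sentences` is imported above but not demonstrated. One plausible approach, shown here as a hypothetical `split_sentences` helper rather than the package's own function, is to split after sentence-final punctuation, treating the Devanagari danda (।, U+0964) and double danda (॥, U+0965) like a period:

```python
import re

# Split on whitespace that follows ., !, ?, danda (U+0964) or double danda (U+0965).
# The lookbehind keeps the punctuation attached to its sentence.
SENTENCE_END = re.compile(r"(?<=[.!?\u0964\u0965])\s+")


def split_sentences(text: str) -> list[str]:
    """Hypothetical sentence splitter for mixed Latin/Devanagari text."""
    return [s.strip() for s in SENTENCE_END.split(text.strip()) if s.strip()]


print(split_sentences("Aap kahan ho? Main theek hoon."))  # ['Aap kahan ho?', 'Main theek hoon.']
```

This naive rule also splits after abbreviations such as "Dr."; a more robust splitter would need an exception list.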
Package Structure
```text
hinglishswd/
├── __init__.py
├── core.py            # HinglishNLP class (main API)
├── detect.py          # Language detection
├── tokenize.py        # Tokenization
├── transliterate.py   # Script conversion (indic-transliteration)
├── translate.py       # Translation (deep-translator)
└── stopwords.py       # Hindi + Hinglish stop words
```
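The layout suggests that core.py's HinglishNLP acts as a facade over the other modules. A minimal sketch of that design, with every stage injected as a plain function so it runs without the real modules; the class name `MiniPipeline` and the stand-in stages are hypothetical, and only the output keys are taken from the pipeline example above:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MiniPipeline:
    """Hypothetical facade: each processing stage is an injected function."""
    detect: Callable[[str], str]
    tokenize: Callable[[str], list]
    remove_stopwords: Callable[[list], list]
    transliterate: Callable[[str], str]
    translate: Callable[[str], str]

    def run(self, text: str) -> dict:
        # Tokenize once and reuse the tokens for stop-word filtering.
        tokens = self.tokenize(text)
        return {
            "text": text,
            "language": self.detect(text),
            "tokens": tokens,
            "tokens_no_stopwords": self.remove_stopwords(tokens),
            "devanagari": self.transliterate(text),
            "english": self.translate(text),
        }


# Trivial stand-in stages (placeholders, not real outputs).
nlp = MiniPipeline(
    detect=lambda t: "hinglish",
    tokenize=str.split,
    remove_stopwords=lambda ts: [t for t in ts if t not in {"bahut", "hai"}],
    transliterate=lambda t: t,  # stand-in: returns the input unchanged
    translate=lambda t: "",     # stand-in: avoids any network call
)
result = nlp.run("aaj mausam bahut acha hai")
```

Injecting the stages keeps the facade testable offline; in a real setup the translate stage would be the only one that needs the network.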
Dependencies
- indic-transliteration: Hinglish ↔ Devanagari transliteration
- deep-translator: Google Translate-based translation (optional, for .translate())
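Because deep-translator is optional, a common pattern is to check for it lazily and fail with an actionable message only when translation is actually requested. The sketch below assumes deep-translator's `GoogleTranslator(source=..., target=...).translate(text)` interface; the helper names are hypothetical and not part of hinglishswd:

```python
import importlib.util


def translator_available() -> bool:
    """True if the optional deep-translator package can be imported."""
    return importlib.util.find_spec("deep_translator") is not None


def translate_to_english(text: str) -> str:
    """Hypothetical helper: translate text to English via Google Translate."""
    if not translator_available():
        raise ImportError(
            "Translation needs the optional dependency: pip install deep-translator"
        )
    # Deferred import: the rest of the code works without deep-translator installed.
    from deep_translator import GoogleTranslator

    # Network call; the returned wording can change over time.
    return GoogleTranslator(source="auto", target="en").translate(text)
```

Deferring the import means detection, tokenization and stop-word removal keep working offline even when deep-translator is absent.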
License
MIT
Download files
Source Distribution
hinglishswd-0.1.1.tar.gz (6.8 kB)
Built Distribution
hinglishswd-0.1.1-py3-none-any.whl (7.7 kB)
File details
Details for the file hinglishswd-0.1.1.tar.gz.
File metadata
- Download URL: hinglishswd-0.1.1.tar.gz
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 262362e6f3853721ee5c703f9e8a7601d57f410718474fa1e5a51bc3cb2175b8 |
| MD5 | 357ee37ab65d19fe874fb78afebf5df2 |
| BLAKE2b-256 | 002d70d99e1203b586df6c89b92abb0e8001a9ad719f07d56f72b33fa94e1427 |
File details
Details for the file hinglishswd-0.1.1-py3-none-any.whl.
File metadata
- Download URL: hinglishswd-0.1.1-py3-none-any.whl
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81bbd33f90b8ca3c585de68aa948b9636ac484cce2cdb7360e110a98481c3403 |
| MD5 | 568995e5e30091eadaa3969d03cee75f |
| BLAKE2b-256 | 35d37dacc89b08d013661f25bd390ae2da347c68a5c41e978ab6ad66b74f413c |