A Python library for Hinglish (Hindi+English code-mixed) NLP: detection, tokenization, transliteration, stop-word removal.

hinglishswd

A Python library for Hinglish (Hindi+English code-mixed) NLP.

Features

  • Language detection — English / Hindi (Devanagari) / Hinglish (Latin-script Hindi)
  • Tokenization — Punctuation-aware splitting for Hinglish, spaCy-based for Devanagari Hindi
  • Transliteration — Hinglish → Hindi Devanagari (via indic-transliteration)
  • Translation — Hinglish/Hindi → English (via deep-translator Google Translate)
  • Stop word removal — Built-in Hindi + Hinglish stop word lists
  • Pipeline API — Single-call for full processing
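
The three detection classes above can be illustrated with a small standard-library sketch. This is not the library's implementation: the function name `guess_language` and the tiny hint lexicon are hypothetical, and the heuristic (Devanagari code points first, then a lookup of common romanized Hindi words) is an assumption about how such a detector might work.

```python
import re

# Devanagari Unicode block (U+0900 to U+097F).
DEVANAGARI = re.compile(r"[\u0900-\u097F]")

# Hypothetical mini lexicon of romanized Hindi words; a real list would be far larger.
HINGLISH_HINTS = {"kal", "mai", "hai", "aaj", "kya", "nahi", "bahut", "mera", "aap", "kahan", "ho"}


def guess_language(text):
    """Return 'hindi', 'hinglish', or 'english' using simple script and lexicon checks."""
    if DEVANAGARI.search(text):
        return "hindi"
    words = re.findall(r"[a-z]+", text.lower())
    if any(word in HINGLISH_HINTS for word in words):
        return "hinglish"
    return "english"


print(guess_language("kal mai khaana khane gaya"))  # hinglish
print(guess_language("Hello, how are you?"))        # english
print(guess_language("आज मौसम बहुत अच्छा है"))        # hindi
```

A lexicon lookup like this misclassifies Hinglish sentences that use none of the listed words, so it should be read only as an illustration of the idea.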

Installation

pip install hinglishswd
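
To confirm the install from Python, one option is to query the package metadata with the standard library. This is a generic sketch; the helper name `installed_version` is hypothetical and not part of hinglishswd.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("hinglishswd"))
```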

Quick Start

from hinglishswd import HinglishNLP

nlp = HinglishNLP()

# Language detection
nlp.detect("kal mai khaana khane gaya")    # "hinglish"
nlp.detect("Hello, how are you?")           # "english"
nlp.detect("आज मौसम बहुत अच्छा है")         # "hindi"

# Tokenization
nlp.tokenize("kal mai khaana khane gaya")   # ['kal', 'mai', 'khaana', 'khane', 'gaya']

# Transliteration (Hinglish -> Devanagari)
nlp.transliterate("kal mai khaana khane gaya")  # "कल मै खान खने गय"

# Translation (Hinglish -> English)
nlp.translate("kal mai khaana khane gaya")  # "I went to eat yesterday"

# Full pipeline
result = nlp.pipeline("aaj mausam bahut acha hai")
# {
#   "text": "aaj mausam bahut acha hai",
#   "language": "hinglish",
#   "tokens": ["aaj", "mausam", "bahut", "acha", "hai"],
#   "tokens_no_stopwords": ["aaj", "mausam", "acha"],
#   "devanagari": "आज मौसम बहुत अच्छा है",
#   "english": "the weather is very good today"
# }
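
As a small follow-up, the pipeline result can be post-processed like any dictionary. The sketch below copies the documented result as a literal (so it runs without the library or network access) and works out which tokens were dropped as stop words; the variable names are illustrative only.

```python
# Literal copy of the documented pipeline output, so this runs standalone.
result = {
    "text": "aaj mausam bahut acha hai",
    "language": "hinglish",
    "tokens": ["aaj", "mausam", "bahut", "acha", "hai"],
    "tokens_no_stopwords": ["aaj", "mausam", "acha"],
    "devanagari": "आज मौसम बहुत अच्छा है",
    "english": "the weather is very good today",
}

# Tokens present in the full list but missing after stop-word removal.
kept = set(result["tokens_no_stopwords"])
removed = [token for token in result["tokens"] if token not in kept]

print(removed)  # ['bahut', 'hai']
```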

Module-level API

from hinglishswd import (
    detect_language,
    tokenize, tokenize_sentences,
    transliterate, hinglish_to_devanagari,
    to_english, hinglish_to_english, translate_pipeline,
    remove_stopwords,
)

lang = detect_language("Aap kahan ho?")
tokens = tokenize("Mujhe paani chahiye")
dev = hinglish_to_devanagari("mera naam rahul hai")
en = hinglish_to_english("aaj kya kar rahe ho?")
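
For readers who want a feel for what tokenization and stop-word removal do on Latin-script Hinglish, here is a minimal standard-library sketch. It is not how `tokenize` or `remove_stopwords` are implemented: the names `simple_tokenize` and `drop_stopwords`, the regular expression, and the five-word stop list are all assumptions made for illustration, and the sketch handles ASCII text only.

```python
import re

# Runs of ASCII letters or digits; punctuation is discarded (an assumption).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

# Hypothetical stop list; the library ships its own, larger lists.
STOP_WORDS = {"hai", "bahut", "ka", "ki", "ko"}


def simple_tokenize(text):
    """Split Latin-script text into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text)


def drop_stopwords(tokens):
    """Remove tokens whose lowercase form is in the stop list."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


print(simple_tokenize("Aap kahan ho?"))
# ['Aap', 'kahan', 'ho']
print(drop_stopwords(simple_tokenize("aaj mausam bahut acha hai")))
# ['aaj', 'mausam', 'acha']
```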

Package Structure

hinglishswd/
├── __init__.py
├── core.py              # HinglishNLP class (main API)
├── detect.py            # Language detection
├── tokenize.py          # Tokenization
├── transliterate.py     # Script conversion (indic-transliteration)
├── translate.py         # Translation (deep-translator)
└── stopwords.py         # Hindi + Hinglish stop words

Dependencies

  • indic-transliteration — Hinglish ↔ Devanagari transliteration
  • deep-translator — Google Translate-based translation (optional, for .translate())
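
Because translation depends on an optional package and an online service, callers may want a fallback when it is unavailable. The wrapper below is a hedged sketch: `translate_or_fallback` is a hypothetical helper, and it assumes that a missing dependency or a network failure surfaces as a Python exception from whatever translate function is passed in. The demo uses stub functions so it runs without the library.

```python
def translate_or_fallback(translate_fn, text, fallback=None):
    """Call translate_fn(text); on any error return fallback, or the input text if none is given."""
    try:
        return translate_fn(text)
    except Exception:
        return text if fallback is None else fallback


# Stub standing in for a translator whose backend is unavailable.
def broken_translate(text):
    raise ConnectionError("translation service unreachable")


# Stub standing in for a working translator.
def working_translate(text):
    return "I went to eat yesterday"


print(translate_or_fallback(broken_translate, "kal mai khaana khane gaya"))
# kal mai khaana khane gaya
print(translate_or_fallback(working_translate, "kal mai khaana khane gaya"))
# I went to eat yesterday
```

In real use, `translate_fn` could be a bound method such as `nlp.translate` from the Quick Start.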

License

Modified MIT License (see LICENSE). Large-scale commercial users (more than 100 million monthly active users or more than $20 million in monthly revenue) must display "hinglishswd" in their UI.
