Parse and format South Asian numeric strings (Bangla/English, lakh-crore system).
Project description
doshomik
Parse and format South Asian numeric strings — Bangla and English digits, lakh-crore system, currency markers.
from doshomik import parse, format
parse("১ কোটি ২৫ লক্ষ") # → 12_500_000
parse("2.5 crore") # → 25_000_000
parse("৳ 1,25,000") # → 125_000
format(12_500_000, script="bangla", grouping="lakh") # → "১,২৫,০০,০০০"
format(12_500_000, script="english", grouping="lakh") # → "1,25,00,000"
format(12_500_000, script="english", grouping="western") # → "12,500,000"
Installation
pip
pip install doshomik
uv
uv add doshomik
Requires Python 3.10 or later. No runtime dependencies.
Usage
Parsing
parse(s) accepts a string containing South Asian numeric notation and returns a plain int.
from doshomik import parse
# Bangla digits
parse("১২৩") # → 123
parse("০") # → 0
# English digits
parse("125000") # → 125_000
# Multiplier words (Bangla and English)
parse("১ হাজার") # → 1_000
parse("1 hazar") # → 1_000
parse("1 thousand") # → 1_000
parse("১ লাখ") # → 100_000
parse("1 lakh") # → 100_000
parse("1 lac") # → 100_000
parse("১ কোটি") # → 10_000_000
parse("1 crore") # → 10_000_000
# Compound expressions
parse("১ কোটি ২৫ লক্ষ") # → 12_500_000
parse("1 crore 25 lakh") # → 12_500_000
parse("1 crore 50 lakh 25 hazar") # → 15_025_000
# Decimal multipliers
parse("2.5 crore") # → 25_000_000
parse("1.5 lakh") # → 150_000
# Grouped numbers (lakh and western styles)
parse("1,25,000") # → 125_000
parse("1,000,000") # → 1_000_000
# Currency markers are stripped automatically
parse("৳ 1,25,000") # → 125_000
parse("500 taka") # → 500
parse("500 tk.") # → 500
parse("bdt 250") # → 250
# International (as used in English-language BD press)
parse("1 million") # → 1_000_000
parse("1 billion") # → 1_000_000_000
# Mixed script
parse("১০ lakh") # → 1_000_000
parse("10 লাখ") # → 1_000_000
What parse accepts:
| Category | Examples |
|---|---|
| Bangla digits | ০১২৩৪৫৬৭৮৯ |
| English digits | 0123456789 |
| Thousands | hazar, hajar, thousand, k, হাজার |
| Lakhs | lakh, lac, lakhs, লাখ, লক্ষ |
| Crores | crore, crores, cr, কোটি |
| International | million, mn, m, billion, bn, b |
| Grouping commas | lakh-style (1,25,000) and western (1,000,000) |
| Currency markers | ৳, tk, tk., taka, টাকা, rs, rs., bdt |
| Decimal multipliers | 2.5 crore, 1.5 lakh |
What parse does NOT accept:
- Bare decimals without a multiplier (
"1.5"raisesParseError— result is not an integer) - Strings with no numeric content (
"abc","") - Standalone multipliers (
"crore","লাখ")
Formatting
format(n, *, script, grouping) converts a plain int to a formatted string.
from doshomik import format
# script="bangla" | "english"
# grouping="lakh" | "western" | "none"
format(125_000, script="english", grouping="lakh") # → "1,25,000"
format(125_000, script="bangla", grouping="lakh") # → "১,২৫,০০০"
format(125_000, script="english", grouping="western") # → "125,000"
format(125_000, script="bangla", grouping="western") # → "১২৫,০০০"
format(125_000, script="english", grouping="none") # → "125000"
format(125_000, script="bangla", grouping="none") # → "১২৫০০০"
# Negative numbers
format(-1_000_000, script="english", grouping="lakh") # → "-10,00,000"
format(-1_000_000, script="bangla", grouping="lakh") # → "-১০,০০,০০০"
Lakh grouping follows the South Asian convention: the rightmost group has 3 digits, every group to the left has 2 digits.
12,50,00,000 → "12 crore 50 lakh"
1,25,000 → "1 lakh 25 thousand"
Default arguments: script="bangla", grouping="lakh" — i.e., format(n) produces a Bangla-script lakh-grouped string.
Error handling
from doshomik import parse, format
from doshomik import ParseError, FormatError, DoshomikError
# DoshomikError is the base class for both ParseError and FormatError
try:
value = parse("not a number")
except ParseError as e:
print(e) # no numeric content found in 'not a number'
try:
result = format(1.5, script="english", grouping="lakh")
except FormatError as e:
print(e) # n must be int, got float
# Catch either with the base class
try:
parse("crore") # multiplier without preceding number
except DoshomikError as e:
print(type(e).__name__, e)
ParseError is raised when:
- The input contains no numeric content
- A multiplier appears without a preceding number
- The computed result is not a whole integer (e.g.
"1.5"with no multiplier)
FormatError is raised when:
nis not a plainint(floats, strings,None, andboolare all rejected)scriptis not"bangla"or"english"(case-sensitive)groupingis not"lakh","western", or"none"(case-sensitive)
API reference
parse(s: str) -> int
Parse a South Asian numeric string and return an integer.
| Parameter | Type | Description |
|---|---|---|
s |
str |
Input string to parse |
Raises ParseError on invalid input.
format(n: int, *, script: str = "bangla", grouping: str = "lakh") -> str
Format an integer as a South Asian numeric string.
| Parameter | Type | Default | Description |
|---|---|---|---|
n |
int |
— | Integer to format (booleans rejected) |
script |
str |
"bangla" |
"bangla" or "english" |
grouping |
str |
"lakh" |
"lakh", "western", or "none" |
Raises FormatError on invalid arguments.
Exceptions
| Exception | Base | When |
|---|---|---|
DoshomikError |
Exception |
Base class |
ParseError |
DoshomikError |
Input cannot be parsed |
FormatError |
DoshomikError |
Arguments are invalid |
Contributing
git clone https://github.com/rayhan-mahmuud/doshomik
cd doshomik
uv sync --all-groups
# Run the full test suite (pytest + coverage)
uv run pytest
# Type check
uv run mypy
# Lint and format
uv run ruff check src tests
uv run ruff format src tests
Tests target Python 3.10–3.13. Property-based tests use Hypothesis.
Changelog
See CHANGELOG.md.
License
MIT — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file doshomik-0.1.1.tar.gz.
File metadata
- Download URL: doshomik-0.1.1.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e4fc9c88fe871486507fd46e69235dc11dbe0c69cfaea021926f32d3ca089de2
|
|
| MD5 |
0de0308c8c7a4bcd45905eaa3cbd88ca
|
|
| BLAKE2b-256 |
e272e874c7795ea373ebaf5b3aa1164c1a441caea7dbe28a1b6c82c984345035
|
Provenance
The following attestation bundles were made for doshomik-0.1.1.tar.gz:
Publisher:
publish.yml on rayhan-mahmuud/doshomik
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
doshomik-0.1.1.tar.gz -
Subject digest:
e4fc9c88fe871486507fd46e69235dc11dbe0c69cfaea021926f32d3ca089de2 - Sigstore transparency entry: 1349123146
- Sigstore integration time:
-
Permalink:
rayhan-mahmuud/doshomik@17ebd20191ba021912501a61eb96a1a2899bc729 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/rayhan-mahmuud
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@17ebd20191ba021912501a61eb96a1a2899bc729 -
Trigger Event:
release
-
Statement type:
File details
Details for the file doshomik-0.1.1-py3-none-any.whl.
File metadata
- Download URL: doshomik-0.1.1-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8f404bdfc1485fa2ae1b009d3f7b3774f421218c3e01cd02d1b27505982b6b19
|
|
| MD5 |
d5b2c4e388360bdda7c78198ecc1986a
|
|
| BLAKE2b-256 |
6f2514af92afddd8ced2001b2c8f72e5bd6dd370e1e4c49da921c185496be0c6
|
Provenance
The following attestation bundles were made for doshomik-0.1.1-py3-none-any.whl:
Publisher:
publish.yml on rayhan-mahmuud/doshomik
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
doshomik-0.1.1-py3-none-any.whl -
Subject digest:
8f404bdfc1485fa2ae1b009d3f7b3774f421218c3e01cd02d1b27505982b6b19 - Sigstore transparency entry: 1349123509
- Sigstore integration time:
-
Permalink:
rayhan-mahmuud/doshomik@17ebd20191ba021912501a61eb96a1a2899bc729 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/rayhan-mahmuud
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@17ebd20191ba021912501a61eb96a1a2899bc729 -
Trigger Event:
release
-
Statement type: