
Parse and format South Asian numeric strings (Bangla/English, lakh-crore system).


doshomik


Parse and format South Asian numeric strings — Bangla and English digits, lakh-crore system, currency markers.

from doshomik import parse, format

parse("১ কোটি ২৫ লক্ষ")   # → 12_500_000
parse("2.5 crore")          # → 25_000_000
parse("৳ 1,25,000")         # → 125_000

format(12_500_000, script="bangla",   grouping="lakh")    # → "১,২৫,০০,০০০"
format(12_500_000, script="english",  grouping="lakh")    # → "1,25,00,000"
format(12_500_000, script="english",  grouping="western") # → "12,500,000"

Installation

pip

pip install doshomik

uv

uv add doshomik

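uv (standalone script / tool)

doshomik is a library with no command-line entry point, so uvx doshomik does not apply. Import it in your project instead, or make it available to a one-off script with:

uv run --with doshomik your_script.py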

Requires Python 3.10 or later. No runtime dependencies.

Usage

Parsing

parse(s) accepts a string containing South Asian numeric notation and returns a plain int.

from doshomik import parse

# Bangla digits
parse("১২৩")           # → 123
parse("০")             # → 0

# English digits
parse("125000")        # → 125_000

# Multiplier words (Bangla and English)
parse("১ হাজার")       # → 1_000
parse("1 hazar")       # → 1_000
parse("1 thousand")    # → 1_000

parse("১ লাখ")         # → 100_000
parse("1 lakh")        # → 100_000
parse("1 lac")         # → 100_000

parse("১ কোটি")        # → 10_000_000
parse("1 crore")       # → 10_000_000

# Compound expressions
parse("১ কোটি ২৫ লক্ষ")         # → 12_500_000
parse("1 crore 25 lakh")         # → 12_500_000
parse("1 crore 50 lakh 25 hazar") # → 15_025_000

# Decimal multipliers
parse("2.5 crore")    # → 25_000_000
parse("1.5 lakh")     # → 150_000

# Grouped numbers (lakh and western styles)
parse("1,25,000")     # → 125_000
parse("1,000,000")    # → 1_000_000

# Currency markers are stripped automatically
parse("৳ 1,25,000")   # → 125_000
parse("500 taka")     # → 500
parse("500 tk.")      # → 500
parse("bdt 250")      # → 250

# International (as used in English-language BD press)
parse("1 million")    # → 1_000_000
parse("1 billion")    # → 1_000_000_000

# Mixed script
parse("১০ lakh")      # → 1_000_000
parse("10 লাখ")       # → 1_000_000
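Mixed-script input works because Bangla digits (U+09E6 to U+09EF) map one-to-one onto ASCII 0 to 9. A minimal sketch of that normalization step, assuming a parser translates digits before doing anything else; the helper name normalize_digits is hypothetical and not part of doshomik's API:

```python
# Bangla digits ০ to ৯ (U+09E6 to U+09EF) map one-to-one onto ASCII 0 to 9.
BANGLA_TO_ASCII = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")


def normalize_digits(s: str) -> str:
    """Replace every Bangla digit in s with its ASCII equivalent."""
    return s.translate(BANGLA_TO_ASCII)


print(normalize_digits("১০ lakh"))     # 10 lakh
print(normalize_digits("৳ ১,২৫,০০০"))  # ৳ 1,25,000
```

Python's int() already accepts Unicode decimal digits (int("১২৩") returns 123), but translating the whole string first leaves commas, decimal points, and multiplier words untouched for later steps.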

What parse accepts:

| Category | Examples |
|---|---|
| Bangla digits | ০১২৩৪৫৬৭৮৯ |
| English digits | 0123456789 |
| Thousands | hazar, hajar, thousand, k, হাজার |
| Lakhs | lakh, lac, lakhs, লাখ, লক্ষ |
| Crores | crore, crores, cr, কোটি |
| International | million, mn, m, billion, bn, b |
| Grouping commas | lakh-style (1,25,000) and western (1,000,000) |
| Currency markers | ৳, tk, tk., taka, টাকা, rs, rs., bdt |
| Decimal multipliers | 2.5 crore, 1.5 lakh |
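To make the table concrete, here is a minimal, hypothetical sketch of how a parser over these tokens could work. It is not doshomik's implementation: it assumes whitespace-separated tokens, builds its token table from the list above, raises ValueError where doshomik raises ParseError, and uses decimal.Decimal so that "2.5 crore" is computed exactly rather than through floating point.

```python
import re
from decimal import Decimal

# Hypothetical token tables built from the "What parse accepts" list;
# doshomik's real tables and matching rules may differ.
MULTIPLIERS: dict[str, int] = {
    **dict.fromkeys(["hazar", "hajar", "thousand", "k", "হাজার"], 1_000),
    **dict.fromkeys(["lakh", "lac", "lakhs", "লাখ", "লক্ষ"], 100_000),
    **dict.fromkeys(["crore", "crores", "cr", "কোটি"], 10_000_000),
    **dict.fromkeys(["million", "mn", "m"], 1_000_000),
    **dict.fromkeys(["billion", "bn", "b"], 1_000_000_000),
}
CURRENCY = {"৳", "tk", "tk.", "taka", "টাকা", "rs", "rs.", "bdt"}
BANGLA_TO_ASCII = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")
# Digits with optional comma groups (lakh or western style) and an optional fraction.
NUMBER = re.compile(r"\d+(?:,\d+)*(?:\.\d+)?")


def parse_sketch(s: str) -> int:
    """Parse whitespace-separated South Asian numeric text into an int."""
    tokens = s.translate(BANGLA_TO_ASCII).lower().split()
    total = Decimal(0)
    pending: Decimal | None = None  # number waiting for a possible multiplier
    seen_number = False
    for tok in tokens:
        if tok in CURRENCY:
            continue  # currency markers carry no numeric value
        if tok in MULTIPLIERS:
            if pending is None:
                raise ValueError(f"multiplier {tok!r} without a preceding number")
            total += pending * MULTIPLIERS[tok]
            pending = None
        elif NUMBER.fullmatch(tok):
            if pending is not None:
                raise ValueError(f"two numbers in a row in {s!r}")
            pending = Decimal(tok.replace(",", ""))
            seen_number = True
        else:
            raise ValueError(f"unrecognized token {tok!r}")
    if not seen_number:
        raise ValueError(f"no numeric content found in {s!r}")
    if pending is not None:
        total += pending  # trailing bare number, e.g. "500" in "500 taka"
    if total != total.to_integral_value():
        raise ValueError(f"result is not a whole integer: {s!r}")
    return int(total)
```

Accumulating a running total and multiplying each number by the word that follows it is what lets compound forms such as "1 crore 50 lakh 25 hazar" add up correctly; the final whole-integer check is what rejects a bare "1.5".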

What parse does NOT accept:

  • Bare decimals without a multiplier ("1.5" raises ParseError because the result is not an integer)
  • Strings with no numeric content ("abc", "")
  • Standalone multipliers ("crore", "লাখ")

Formatting

format(n, *, script, grouping) converts a plain int to a formatted string.

from doshomik import format

# script="bangla" | "english"
# grouping="lakh"  | "western" | "none"

format(125_000, script="english", grouping="lakh")    # → "1,25,000"
format(125_000, script="bangla",  grouping="lakh")    # → "১,২৫,০০০"
format(125_000, script="english", grouping="western") # → "125,000"
format(125_000, script="bangla",  grouping="western") # → "১২৫,০০০"
format(125_000, script="english", grouping="none")    # → "125000"
format(125_000, script="bangla",  grouping="none")    # → "১২৫০০০"
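Western and "none" grouping need no custom logic in Python: the "," option of the format specification inserts a comma every three digits, and str() gives the ungrouped form. A sketch under that assumption, producing Bangla script by translating ASCII digits after grouping. format_simple is a hypothetical name, and lakh grouping is deliberately left out:

```python
ASCII_TO_BANGLA = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")


def format_simple(n: int, *, script: str, grouping: str) -> str:
    """Format n with western or no grouping, in Bangla or English digits."""
    if grouping == "western":
        text = f"{n:,}"  # comma every three digits; a leading minus sign is kept
    elif grouping == "none":
        text = str(n)
    else:
        raise ValueError(f"this sketch handles only 'western' and 'none', got {grouping!r}")
    # Group on ASCII digits first, then swap digits; any script other than
    # "bangla" is treated as English here (no validation in this sketch).
    return text.translate(ASCII_TO_BANGLA) if script == "bangla" else text
```

Grouping before swapping digits means the comma positions never depend on the script, which matches the paired English/Bangla examples above.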

# Negative numbers
format(-1_000_000, script="english", grouping="lakh")    # → "-10,00,000"
format(-1_000_000, script="bangla",  grouping="lakh")    # → "-১০,০০,০০০"

Lakh grouping follows the South Asian convention: the rightmost group has 3 digits, every group to the left has 2 digits.

12,50,00,000  →  "12 crore 50 lakh"
 1,25,000     →  "1 lakh 25 thousand"
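The rule above (last three digits, then pairs) fits in a few lines. This is a hypothetical helper, group_lakh, written for illustration rather than taken from doshomik; the sign is split off first so that negative numbers group the same way as positive ones:

```python
def group_lakh(n: int) -> str:
    """Group the digits of n South Asian style: last 3 digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups: list[str] = []
    # Peel two-digit groups off the right end of the head.
    while len(head) > 2:
        groups.append(head[-2:])
        head = head[:-2]
    groups.append(head)  # leftmost group: one or two digits
    groups.reverse()
    return sign + ",".join(groups + [tail])


print(group_lakh(125_000))      # 1,25,000
print(group_lakh(125_000_000))  # 12,50,00,000
print(group_lakh(-1_000_000))   # -10,00,000
```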

Defaults: script="bangla" and grouping="lakh", so format(n) alone produces a Bangla-script, lakh-grouped string.

Error handling

from doshomik import parse, format
from doshomik import ParseError, FormatError, DoshomikError

# DoshomikError is the base class for both ParseError and FormatError
try:
    value = parse("not a number")
except ParseError as e:
    print(e)  # no numeric content found in 'not a number'

try:
    result = format(1.5, script="english", grouping="lakh")
except FormatError as e:
    print(e)  # n must be int, got float

# Catch either with the base class
try:
    parse("crore")  # multiplier without preceding number
except DoshomikError as e:
    print(type(e).__name__, e)

ParseError is raised when:

  • The input contains no numeric content
  • A multiplier appears without a preceding number
  • The computed result is not a whole integer (e.g. "1.5" with no multiplier)

FormatError is raised when:

  • n is not a plain int (floats, strings, None, and bool are all rejected)
  • script is not "bangla" or "english" (case-sensitive)
  • grouping is not "lakh", "western", or "none" (case-sensitive)
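The FormatError rules above can be sketched as a small validation function. The exception classes here are stand-ins mirroring the hierarchy described in this section, and check_format_args is a hypothetical helper. Note the exact type check: bool is a subclass of int in Python, so isinstance(True, int) is true and an isinstance test alone would let booleans through.

```python
class DoshomikError(Exception):
    """Stand-in for the library's base exception."""


class FormatError(DoshomikError):
    """Stand-in for the library's FormatError."""


SCRIPTS = ("bangla", "english")
GROUPINGS = ("lakh", "western", "none")


def check_format_args(n: object, script: str, grouping: str) -> None:
    """Raise FormatError if the arguments break the rules listed above."""
    # Exact type check rejects bool, float, str and None. It also rejects other
    # int subclasses (e.g. IntEnum members), which the real library may not.
    if type(n) is not int:
        raise FormatError(f"n must be int, got {type(n).__name__}")
    # Membership tests on plain strings are case-sensitive.
    if script not in SCRIPTS:
        raise FormatError(f"script must be one of {SCRIPTS}, got {script!r}")
    if grouping not in GROUPINGS:
        raise FormatError(f"grouping must be one of {GROUPINGS}, got {grouping!r}")
```

Because FormatError derives from DoshomikError in this sketch, a single except DoshomikError clause catches it, mirroring the catch-all pattern in the error-handling example.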

API reference

parse(s: str) -> int

Parse a South Asian numeric string and return an integer.

| Parameter | Type | Description |
|---|---|---|
| s | str | Input string to parse |

Raises ParseError on invalid input.

format(n: int, *, script: str = "bangla", grouping: str = "lakh") -> str

Format an integer as a South Asian numeric string.

| Parameter | Type | Default | Description |
|---|---|---|---|
| n | int | | Integer to format (booleans rejected) |
| script | str | "bangla" | "bangla" or "english" |
| grouping | str | "lakh" | "lakh", "western", or "none" |

Raises FormatError on invalid arguments.

Exceptions

| Exception | Base | When raised |
|---|---|---|
| DoshomikError | Exception | Base class |
| ParseError | DoshomikError | Input cannot be parsed |
| FormatError | DoshomikError | Arguments are invalid |

Contributing

git clone https://github.com/rayhanmahmuud/doshomik
cd doshomik
uv sync --all-groups

# Run the full test suite (pytest + coverage)
uv run pytest

# Type check
uv run mypy

# Lint and format
uv run ruff check src tests
uv run ruff format src tests

Tests target Python 3.10–3.13. Property-based tests use Hypothesis.

Changelog

See CHANGELOG.md.

License

MIT — see LICENSE.



Download files


Source Distribution

doshomik-0.1.0.tar.gz (6.2 kB)

Built Distribution


doshomik-0.1.0-py3-none-any.whl (8.0 kB)

File details

Details for the file doshomik-0.1.0.tar.gz.

File metadata

  • Download URL: doshomik-0.1.0.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7

File hashes

Hashes for doshomik-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b837d1c7f06e6f9388dfb4eb1a9c38d88a9f44d5ea781fc4a67b0374c11be40 |
| MD5 | 78379c94b10ef5c7a9b850dbaa748e68 |
| BLAKE2b-256 | 847fba9f19371c0bbff17fb835335dfab59793c5fe385c893a9a41e95b152512 |


File details

Details for the file doshomik-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: doshomik-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7

File hashes

Hashes for doshomik-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71d152bfb59f81e880e3d83324d750defc7bd663c57389018be58c245e3da295 |
| MD5 | 12a8ca1e63682a72d5e89b2dca0d3a11 |
| BLAKE2b-256 | edf83a8eb620e34180cc1038faf220116e46ad07001732425b796f81433b65bc |

