Sentence-length utilities for counting the number of words or characters in a sentence

len_sentence

A Python library for counting words and characters in sentences across different languages and writing systems.

Overview

len_sentence counts sentence length in a way that adapts to the language and writing system. It uses ISO 15924 script codes to choose the counting method for each language, covering both space-separated languages such as English and character-based languages such as Chinese and Japanese.

Features

  • Multi-language support: Handles various writing systems including Latin, Chinese (Traditional/Simplified), Japanese, Thai, Khmer, Myanmar, Tibetan, and more
  • ISO 15924 compliance: Uses standard 4-letter script codes to identify the writing system
  • Special character handling: Properly processes numbers, punctuation, and language-specific separators
  • Intelligent counting: Automatically switches between word counting and character counting based on the script type
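
The switch between word counting and character counting can be pictured with a minimal sketch. This is not the len_sentence implementation: the helper name `sketch_count` and the two script groupings are assumptions made only for this illustration, and the library's real results may differ (for example in how punctuation is treated).

```python
# Illustrative sketch only: NOT the len_sentence implementation.
# The script groupings below are assumptions made for this example.
CHARACTER_SCRIPTS = {"Hans", "Hant", "Jpan"}  # count characters
WORD_SCRIPTS = {"Latn", "Cyrl", "Arab"}       # count space-separated words


def sketch_count(sentence: str, script: str) -> int:
    """Count words or characters depending on the script code (hypothetical helper)."""
    if script in WORD_SCRIPTS:
        # str.split() with no argument splits on any run of whitespace.
        return len(sentence.split())
    if script in CHARACTER_SCRIPTS:
        # Count every character that is not whitespace.
        return sum(1 for ch in sentence if not ch.isspace())
    raise ValueError(f"unsupported script code: {script!r}")


print(sketch_count("Hello world!", "Latn"))  # 2 (two space-separated tokens)
print(sketch_count("你好世界", "Hans"))        # 4 (four characters)
```

The point of the sketch is the dispatch on the script code: the caller states the writing system, and the counting rule follows from it.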

Installation

pip install len_sentence

Supported Language Scripts

The goal of this library is to be able to count words in every language of the world.

API Reference

count_sentence(sentence, lang_code)

Count the number of words or characters in a sentence based on the language script.

Parameters:

  • sentence (str): The sentence to count
  • lang_code (str): Four-letter ISO 15924 script code; two-letter ISO 639-1 language codes have limited support

Returns:

  • int: Number of words or characters in the sentence

Raises:

  • ValueError: If the language code is not valid
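
Because an invalid code raises ValueError, callers that process mixed or untrusted input may want to catch it. The wrapper below is a sketch under that assumption. `safe_count` is a hypothetical helper, not part of len_sentence, and it takes the counting function as an argument so the snippet runs on its own.

```python
from typing import Callable, Optional


def safe_count(
    counter: Callable[[str, str], int],
    sentence: str,
    lang_code: str,
) -> Optional[int]:
    """Return the count, or None when the counter rejects lang_code.

    Hypothetical helper for illustration; `counter` is any function with
    the (sentence, lang_code) -> int shape documented above.
    """
    try:
        return counter(sentence, lang_code)
    except ValueError:
        # The documented failure mode: the language code is not valid.
        return None
```

With the library installed, this would be called as `safe_count(count_sentence, "Hello world!", "Latn")`. Returning None keeps a batch job running, though re-raising with more context may be preferable when a bad code indicates a bug.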

Examples

from len_sentence import count_sentence

# Different scripts, different counting methods
count_sentence("Hello world!", "Latn")
count_sentence("你好世界", "Hans")
count_sentence("こんにちは", "Jpan")
count_sentence("བཀྲ་ཤིས་བདེ་ལེགས", "Tibt")
count_sentence("مرحبا بالعالم", "Arab")
count_sentence("Привет мир", "Cyrl")
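
Some scripts separate units with their own marks rather than spaces. As one hedged illustration of what handling a language-specific separator can look like, the sketch below splits Tibetan text on the tsheg (U+0F0B) and shad (U+0F0D) marks. The function `count_tibetan_syllables` is hypothetical, and whether len_sentence counts Tibetan this way is an assumption, not something the documentation states.

```python
import re

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence punctuation

# Split on any run of tsheg, shad, or whitespace.
_SEPARATORS = re.compile(f"[{TSHEG}{SHAD}\\s]+")


def count_tibetan_syllables(sentence: str) -> int:
    """Count tsheg-delimited syllables (hypothetical helper, illustration only)."""
    # Drop empty pieces left by leading or trailing separators.
    return len([piece for piece in _SEPARATORS.split(sentence) if piece])


print(count_tibetan_syllables("བཀྲ་ཤིས་བདེ་ལེགས"))
```

The greeting used in the examples contains three tsheg marks, so splitting on them yields four syllables.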

Development

Setup

git clone <repository-url>
cd len-sentence
pip install -e .

Building and Publishing

python setup.py sdist bdist_wheel
twine upload dist/*

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Author

JackyHe398
Email: hekinghung@gmail.com


Note: This library uses ISO 15924 script codes. For the complete list of script codes, refer to the ISO 15924 code list published by the Unicode Consortium.

Download files

Source distribution: len_sentence-0.1.1.tar.gz (4.1 kB)

  • Uploaded via: twine/6.1.0 CPython/3.13.7 (Trusted Publishing)
  • Publisher: publish.yaml on JackyHe398/len-sentence
  • SHA256: 5fbe5b683587231995855394d7d80be1ae074bae2bf626058ad5e6270d1f1f41
  • MD5: 23bfdca5e3b26819f93998de0da9f86c
  • BLAKE2b-256: 09103fd0cef04a1a1f9a652db0cf29fdc2156d58d87fc245ef4ba44417b6f5f1

Built distribution: len_sentence-0.1.1-py3-none-any.whl (4.6 kB, Python 3)

  • Uploaded via: twine/6.1.0 CPython/3.13.7 (Trusted Publishing)
  • Publisher: publish.yaml on JackyHe398/len-sentence
  • SHA256: 2da9466cb0168fbb7c667d787fca613a2ea2f9878be016a6d2fe2c2e203da483
  • MD5: 3a46e6a65261d01ef7c1a0a0f89e232e
  • BLAKE2b-256: d3096d77eff1243d2dfd1b57eccf0cadafaf06d8c4b9d5dfb4b60630878281fa