Sentence-length utilities for counting the number of words or characters in a sentence

len_sentence

A Python library for counting words and characters in sentences across different languages and writing systems.

Overview

len_sentence provides intelligent sentence length counting that adapts to different languages and writing systems. It uses ISO15924 script codes to determine the appropriate counting method for each language, handling everything from space-separated languages like English to character-based languages like Chinese and Japanese.

Features

  • Multi-language support: Handles various writing systems including Latin, Chinese (Traditional/Simplified), Japanese, Thai, Khmer, Myanmar, Tibetan, and more
  • ISO15924 compliance: Uses standard 4-letter script codes for language identification
  • Special character handling: Properly processes numbers, punctuation, and language-specific separators
  • Intelligent counting: Automatically switches between word counting and character counting based on the script type
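
To make the switching idea concrete, here is a minimal sketch of script-based dispatch. It is not the library's implementation: the function name `sketch_count`, the set `CHAR_COUNTED_SCRIPTS`, and the rule of skipping whitespace and punctuation are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical set of scripts counted per character.
# The library's real list and rules may differ.
CHAR_COUNTED_SCRIPTS = {"Hans", "Hant", "Jpan", "Thai", "Khmr", "Mymr"}


def sketch_count(sentence: str, script_code: str) -> int:
    """Count characters for unsegmented scripts, whitespace-separated words otherwise."""
    if script_code in CHAR_COUNTED_SCRIPTS:
        # Count every character that is neither whitespace nor punctuation.
        return sum(
            1
            for ch in sentence
            if not ch.isspace() and not unicodedata.category(ch).startswith("P")
        )
    # Space-separated scripts: split on whitespace and count the tokens.
    return len(sentence.split())
```

Under these assumptions, a Latin-script sentence is counted by tokens and a Han-script sentence by characters. Scripts with their own separators, such as Tibetan, would need extra handling that this sketch leaves out.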

Installation

pip install len_sentence

Supported Language Scripts

The goal of this library is to count words in every language of the world.

API Reference

count_sentence(sentence, lang_code)

Count the number of words or characters in a sentence based on the language script.

Parameters:

  • sentence (str): The sentence to count
  • lang_code (str): 4-letter ISO15924 script code
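
ISO 15924 script codes are four letters long, written with an initial capital (for example `Latn` or `Hans`). The sketch below checks that shape before a code is passed on. The helper name `looks_like_script_code` is hypothetical, and whether the library itself is case-sensitive is not stated here, so treat this as an optional pre-check rather than the library's own validation.

```python
import re


def looks_like_script_code(code: str) -> bool:
    """Return True if `code` has the ISO 15924 shape: one uppercase letter, three lowercase."""
    return re.fullmatch(r"[A-Z][a-z]{3}", code) is not None
```

A code can pass this shape check and still be unsupported, so it does not replace handling the `ValueError` that `count_sentence` may raise.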

Returns:

  • int: Number of words or characters in the sentence

Raises:

  • ValueError: If the language code is not valid
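
When script codes come from user input or external data, it may be convenient to catch the `ValueError` in one place. The wrapper below is a sketch, not part of the library: `safe_count` is a hypothetical helper that takes the counting function as an argument and returns `None` when the code is rejected.

```python
from typing import Callable, Optional


def safe_count(
    counter: Callable[[str, str], int], sentence: str, lang_code: str
) -> Optional[int]:
    """Return the count, or None when the counter rejects the script code."""
    try:
        return counter(sentence, lang_code)
    except ValueError:
        # The counter signalled an invalid script code.
        return None
```

With the library installed, it would be called as `safe_count(count_sentence, "Hello world!", "Latn")`.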

Examples

from len_sentence import count_sentence

# Different scripts, different counting methods
count_sentence("Hello world!", "Latn")
count_sentence("你好世界", "Hans")
count_sentence("こんにちは", "Jpan")
count_sentence("བཀྲ་ཤིས་བདེ་ལེགས", "Tibt")
count_sentence("مرحبا بالعالم", "Arab")
count_sentence("Привет мир", "Cyrl")

Development

Setup

git clone <repository-url>
cd len-sentence
pip install -e .

Building and Publishing

python setup.py sdist bdist_wheel
twine upload dist/*

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Author

JackyHe398
Email: hekinghung@gmail.com


Note: This library uses ISO15924 script codes. For a complete list of supported scripts, refer to the Unicode Script Codes specification.
