Sentence-length utilities for counting the number of words or characters in a sentence
len_sentence
A Python library for counting words and characters in sentences across different languages and writing systems.
Overview
len_sentence provides intelligent sentence length counting that adapts to different languages and writing systems. It uses ISO15924 script codes to determine the appropriate counting method for each language, handling everything from space-separated languages like English to character-based languages like Chinese and Japanese.
Features
- Multi-language support: Handles various writing systems including Latin, Chinese (Traditional/Simplified), Japanese, Thai, Khmer, Myanmar, Tibetan, and more
- ISO15924 compliance: Uses standard 4-letter script codes for language identification
- Special character handling: Properly processes numbers, punctuation, and language-specific separators
- Intelligent counting: Automatically switches between word counting and character counting based on the script type
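The switch between word counting and character counting can be pictured with a minimal sketch. This is not the library's implementation: the function name `sketch_count`, the `CHARACTER_SCRIPTS` grouping, and the Tibetan rule (splitting on the tsheg mark) are assumptions made for illustration only.

```python
import re

# Hypothetical grouping of ISO 15924 codes; the library's own tables may differ.
CHARACTER_SCRIPTS = {"Hans", "Hant", "Hani", "Jpan"}

# Tibetan syllables are delimited by the tsheg (U+0F0B); the shad (U+0F0D) ends a clause.
TIBETAN_SEPARATORS = re.compile("[\u0f0b\u0f0d\\s]+")


def sketch_count(sentence: str, script: str) -> int:
    """Count units in `sentence` using a rule chosen by the script code."""
    if script in CHARACTER_SCRIPTS:
        # Character-based scripts: count letters and digits, skip spaces and punctuation.
        return sum(1 for ch in sentence if ch.isalnum())
    if script == "Tibt":
        # Count the non-empty pieces between separators.
        return len([part for part in TIBETAN_SEPARATORS.split(sentence) if part])
    # Fallback for space-separated scripts such as Latn, Cyrl, or Arab.
    # Scripts written without spaces (Thai, Khmer, Myanmar) would need real
    # word segmentation, which this sketch does not attempt.
    return len(sentence.split())
```

The dictionary-free dispatch keeps the sketch short; a real implementation would likely need a fuller script table and proper segmentation for scripts that do not mark word boundaries.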
Installation
pip install len_sentence
Supported Language Scripts
The goal of this library is to count words in all of the world's languages.
API Reference
count_sentence(sentence, lang_code)
Count the number of words or characters in a sentence based on the language script.
Parameters:
sentence (str): The sentence to count
lang_code (str): 4-letter ISO15924 script code; 2-letter ISO639-1 codes have limited support
Returns:
int: Number of words or characters in the sentence
Raises:
ValueError: If the language code is not valid
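To illustrate how a 2-letter ISO639-1 code might be resolved to a 4-letter ISO15924 script code before counting, here is a hedged sketch. The helper `normalize_code` and the `ISO639_TO_SCRIPT` table are hypothetical and are not part of the documented API; the library's actual mapping and validation rules are not shown in this README.

```python
# Hypothetical, deliberately small mapping from ISO 639-1 language codes
# to ISO 15924 script codes. "zh" is mapped to Hans here as an assumption;
# Chinese can equally be written in Hant.
ISO639_TO_SCRIPT = {
    "en": "Latn",
    "ru": "Cyrl",
    "ar": "Arab",
    "zh": "Hans",
    "ja": "Jpan",
    "bo": "Tibt",
    "th": "Thai",
}


def normalize_code(lang_code: str) -> str:
    """Return a 4-letter script code, or raise ValueError for unknown input."""
    code = lang_code.strip()
    if len(code) == 4 and code.isascii() and code.isalpha():
        # ISO 15924 codes are title-cased: "latn" becomes "Latn".
        # Only the shape is checked here, not membership in the standard.
        return code.capitalize()
    if len(code) == 2 and code.lower() in ISO639_TO_SCRIPT:
        return ISO639_TO_SCRIPT[code.lower()]
    raise ValueError(f"unsupported language or script code: {lang_code!r}")
```

Raising `ValueError` mirrors the documented behaviour of `count_sentence` for invalid codes, so callers can handle both cases the same way.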
Examples
from len_sentence import count_sentence
# Different scripts, different counting methods
count_sentence("Hello world!", "Latn")
count_sentence("你好世界", "Hans")
count_sentence("こんにちは", "Jpan")
count_sentence("བཀྲ་ཤིས་བདེ་ལེགས", "Tibt")
count_sentence("مرحبا بالعالم", "Arab")
count_sentence("Привет мир", "Cyrl")
Development
Setup
git clone <repository-url>
cd len-sentence
pip install -e .
Building and Publishing
python setup.py sdist bdist_wheel
twine upload dist/*
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Author
JackyHe398
Email: hekinghung@gmail.com
Note: This library uses ISO15924 script codes. For a complete list of supported scripts, refer to the Unicode Script Codes specification.
File details
Details for the file len_sentence-0.1.1.tar.gz.
File metadata
- Download URL: len_sentence-0.1.1.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fbe5b683587231995855394d7d80be1ae074bae2bf626058ad5e6270d1f1f41 |
| MD5 | 23bfdca5e3b26819f93998de0da9f86c |
| BLAKE2b-256 | 09103fd0cef04a1a1f9a652db0cf29fdc2156d58d87fc245ef4ba44417b6f5f1 |
Provenance
The following attestation bundles were made for len_sentence-0.1.1.tar.gz:
Publisher: publish.yaml on JackyHe398/len-sentence
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: len_sentence-0.1.1.tar.gz
- Subject digest: 5fbe5b683587231995855394d7d80be1ae074bae2bf626058ad5e6270d1f1f41
- Sigstore transparency entry: 532544507
- Sigstore integration time:
- Permalink: JackyHe398/len-sentence@ad1237dafcb425ff67009ca7e17481eafc47c414
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/JackyHe398
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@ad1237dafcb425ff67009ca7e17481eafc47c414
- Trigger Event: push
File details
Details for the file len_sentence-0.1.1-py3-none-any.whl.
File metadata
- Download URL: len_sentence-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2da9466cb0168fbb7c667d787fca613a2ea2f9878be016a6d2fe2c2e203da483 |
| MD5 | 3a46e6a65261d01ef7c1a0a0f89e232e |
| BLAKE2b-256 | d3096d77eff1243d2dfd1b57eccf0cadafaf06d8c4b9d5dfb4b60630878281fa |
Provenance
The following attestation bundles were made for len_sentence-0.1.1-py3-none-any.whl:
Publisher: publish.yaml on JackyHe398/len-sentence
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: len_sentence-0.1.1-py3-none-any.whl
- Subject digest: 2da9466cb0168fbb7c667d787fca613a2ea2f9878be016a6d2fe2c2e203da483
- Sigstore transparency entry: 532544528
- Sigstore integration time:
- Permalink: JackyHe398/len-sentence@ad1237dafcb425ff67009ca7e17481eafc47c414
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/JackyHe398
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@ad1237dafcb425ff67009ca7e17481eafc47c414
- Trigger Event: push