ToCount: Lightweight Token Estimator

Overview

ToCount is a lightweight, extensible Python library that estimates the number of tokens in a text using either rule-based or machine learning methods. Designed for flexibility, speed, and accuracy, it exposes a unified interface across estimation strategies, which makes it well suited to prompt analysis, token budgeting, and optimizing interactions with token-based systems.

Installation

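PyPI

  • Run pip install tocount==0.2

Source code

  • Download the source distribution (tocount-0.2.tar.gz) or the latest source from the GitHub repository
  • Run pip install . in the project root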

Models

Rule-Based

Model Name              MAE      MSE          R²
RULE_BASED.UNIVERSAL    106.70   381,647.81   0.8175
RULE_BASED.GPT_4        152.34   571,795.89   0.7266
RULE_BASED.GPT_3_5      161.93   652,923.59   0.6878
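To illustrate what a rule-based estimator does, here is a minimal sketch that averages two widely quoted rules of thumb for English text with OpenAI tokenizers: about four characters per token and about 0.75 words per token. It is not ToCount's RULE_BASED implementation; the function name rule_based_estimate and both ratios are assumptions chosen for illustration.

```python
import math

# Hypothetical ratios (assumptions, not ToCount's constants): roughly four
# characters per token and roughly 0.75 words per token for English text.
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def rule_based_estimate(text: str) -> int:
    """Average a character-based rule and a word-based rule, rounded up."""
    if not text.strip():
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    return math.ceil((by_chars + by_words) / 2)


print(rule_based_estimate("How are you?"))  # (12/4 + 3/0.75) / 2 = 3.5 -> 4
```

Rules like these are cheap and dependency-free, but their error grows on code, non-English text, and unusual formatting, which is consistent with the higher MAE of the rule-based models in the table.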

Tiktoken R50K

Model Name                     MAE     MSE          R²
TIKTOKEN_R50K.LINEAR_ALL       71.38   183,897.01   0.8941
TIKTOKEN_R50K.LINEAR_ENGLISH   23.35   14,127.92    0.9887
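The LINEAR model names suggest linear regressions fitted to token counts from tiktoken's r50k_base encoding. The sketch below fits a one-feature ordinary least-squares line (token count as a function of character count) in pure Python. The feature choice, the helper names fit_linear and make_linear_estimator, and the synthetic training data are all assumptions; ToCount's actual features and coefficients may differ.

```python
from typing import Callable, Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_linear_estimator(
    texts: Sequence[str], token_counts: Sequence[float]
) -> Callable[[str], int]:
    """Fit token_count ~ slope * len(text) + intercept and return an estimator."""
    slope, intercept = fit_linear([len(t) for t in texts], token_counts)

    def estimate(text: str) -> int:
        # Round to a whole token count and never return a negative value.
        return max(0, round(slope * len(text) + intercept))

    return estimate


# Synthetic training pairs (hypothetical): in practice the targets would be
# exact counts from a reference tokenizer such as tiktoken's r50k_base encoding.
texts = ["a" * n for n in (4, 8, 12, 16)]
counts = [2, 3, 4, 5]
estimate = make_linear_estimator(texts, counts)
print(estimate("x" * 40))  # slope 0.25, intercept 1.0 -> 11
```

Once fitted, such a model needs only a multiply and an add per text, so it avoids running the tokenizer at estimation time.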

ℹ️ The training and testing data are taken from LMSYS-Chat-1M [1] and WildChat [2].
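The metric columns in the tables above are standard regression metrics comparing estimated token counts with reference counts: mean absolute error (MAE), mean squared error (MSE) and, in the final column, what appears to be the coefficient of determination (R²). The helper below computes all three in pure Python; the name regression_metrics is hypothetical and the sample numbers are illustrative, not ToCount's evaluation data.

```python
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE and R^2 for predicted vs. actual token counts."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 = 1 - SS_res / SS_tot; undefined when all actual values are equal.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "R2": r2}


print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 40]))
```

Because MSE squares each error, a few long texts with large misses can dominate it, which is why MSE values in the tables are orders of magnitude larger than MAE.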

Usage

>>> from tocount import estimate_text_tokens, TextEstimator
>>> estimate_text_tokens("How are you?", estimator=TextEstimator.RULE_BASED.UNIVERSAL)
4
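A common use of an estimate is token budgeting: checking whether a prompt fits a limit before sending it. The sketch below accepts any estimator as a plain callable, so it could wrap estimate_text_tokens from the example above; the function fits_budget, its reserve parameter, and the stand-in estimator used in the demo are assumptions, and an estimate only approximates a real tokenizer's count.

```python
import math
from typing import Callable

# Any function that maps text to an estimated token count. With ToCount
# installed, one could pass, for example:
#   lambda t: estimate_text_tokens(t, estimator=TextEstimator.RULE_BASED.UNIVERSAL)
Estimator = Callable[[str], int]


def fits_budget(text: str, budget: int, estimator: Estimator, reserve: int = 0) -> bool:
    """Return True if the estimated tokens plus a reserve fit within the budget.

    `reserve` (hypothetical parameter) holds back tokens, e.g. for the model's
    reply or as slack for estimation error.
    """
    if budget < 0 or reserve < 0:
        raise ValueError("budget and reserve must be non-negative")
    return estimator(text) + reserve <= budget


def chars_over_four(text: str) -> int:
    # Stand-in estimator for this demo only: about four characters per token.
    return math.ceil(len(text) / 4)


prompt = "Summarize the following report in three bullet points."
print(fits_budget(prompt, budget=16, estimator=chars_over_four, reserve=2))  # 14 + 2 <= 16 -> True
```

Passing the estimator as a callable keeps the budgeting logic independent of which estimation strategy is chosen.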

Issues & bug reports

Just open an issue and describe the problem; we'll check it ASAP. Alternatively, send an email to tocount@openscilab.com.

  • Please complete the issue template

You can also join our Discord server.

References

[1] Zheng, Lianmin, et al. "LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset." International Conference on Learning Representations (ICLR), 2024 (Spotlight).
[2] Zhao, Wenting, et al. "WildChat: 1M ChatGPT Interaction Logs in the Wild." International Conference on Learning Representations (ICLR), 2024 (Spotlight).

Show your support

Star this repo

Give a ⭐️ if this project helped you!

Donate to our project

If you like our project (and we hope you do), please consider supporting us. Our project is not, and never will be, run for profit; we need the money only so we can keep doing what we do ;-)

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

0.2 - 2025-10-02

Added

  • TIKTOKEN_R50K.LINEAR_ALL model
  • TIKTOKEN_R50K.LINEAR_ENGLISH model

Changed

  • README.md updated

0.1 - 2025-08-30

Added

  • RULE_BASED.UNIVERSAL model
  • RULE_BASED.GPT_4 model
  • RULE_BASED.GPT_3_5 model

Download files

Source Distribution

tocount-0.2.tar.gz (9.9 kB)

Uploaded Source

Built Distribution

tocount-0.2-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file tocount-0.2.tar.gz.

File metadata

  • Download URL: tocount-0.2.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tocount-0.2.tar.gz
Algorithm Hash digest
SHA256 e654be1bc2ee838f5421764de94b2caea999ad80a296de7c5b842d0a4a3e153b
MD5 39f87e9ea18ad43e18e2f35f4459b899
BLAKE2b-256 287dd5d134f78fafa3d23797773dd5b1374f9fa5cf0f8f113b5b78038289fd57
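To confirm that a downloaded file is intact, hash it locally and compare the result with the digest listed above. A minimal sketch using Python's standard hashlib module follows; the helper names sha256_of and matches and the local file path are assumptions. (pip can also enforce digests through its hash-checking mode.)

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for tocount-0.2.tar.gz.
EXPECTED_SHA256 = "e654be1bc2ee838f5421764de94b2caea999ad80a296de7c5b842d0a4a3e153b"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("tocount-0.2.tar.gz")  # hypothetical local download path
    if archive.exists():
        print("OK" if matches(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```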

File details

Details for the file tocount-0.2-py3-none-any.whl.

File metadata

  • Download URL: tocount-0.2-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tocount-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4ae60245f47dcfc7a5c26b09a1caf0a40ff4b6febdc556532cc4fba9a626912e
MD5 572484bbc1c79efcb55f28187427dba3
BLAKE2b-256 f6d688336fa5c5151b1b68875fa02dfc485e9211c2befb21a98a18215fd1675d
