
ToCount: Lightweight Token Estimator


Overview

ToCount is a lightweight and extensible Python library for estimating token counts from text inputs using both rule-based and machine learning methods. Designed for flexibility, speed, and accuracy, ToCount provides a unified interface for different estimation strategies, making it ideal for tasks like prompt analysis, token budgeting, and optimizing interactions with token-based systems.


Installation

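PyPI

  • Run pip install tocount==0.4

Source code

  • Download and extract the source distribution (tocount-0.4.tar.gz), then run pip install . from the extracted directory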

Models

Rule-Based

Model Name              (unlabeled)  MAE     RMSE    MedAE  (unlabeled)
RULE_BASED.UNIVERSAL    0.8175       106.70  617.78  18     0.6377
RULE_BASED.GPT_3_5      0.7266       152.34  756.17  35     0.4828
RULE_BASED.GPT_4        0.6878       161.93  808.04  40     0.4502

Tiktoken R50K

Model Name                      (unlabeled)  MAE     RMSE    MedAE  (unlabeled)
TIKTOKEN_R50K.LINEAR_ALL        0.7334       152.39  733.40  28.55  0.4826
TIKTOKEN_R50K.LINEAR_ENGLISH    0.8703       62.76   508.20  8.87   0.7287

Tiktoken CL100K

Model Name                        (unlabeled)  MAE    RMSE    MedAE  (unlabeled)
TIKTOKEN_CL100K.LINEAR_ALL        0.9127       64.09  298.02  15.73  0.6804
TIKTOKEN_CL100K.LINEAR_ENGLISH    0.9711       27.43  185.07  6.34   0.8527

Tiktoken O200K

Model Name                       (unlabeled)  MAE    RMSE    MedAE  (unlabeled)
TIKTOKEN_O200K.LINEAR_ALL        0.9563       38.23  197.16  9.70   0.7818
TIKTOKEN_O200K.LINEAR_ENGLISH    0.9730       26.00  177.54  5.96   0.8581

ℹ️ The training and testing datasets are taken from Lmsys-chat-1m [1] and Wildchat [2].
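
The MAE, RMSE, and MedAE columns in the tables above are standard regression error metrics. As a rough illustration of what they measure, here is a minimal sketch that assumes the usual definitions (mean absolute error, root mean squared error, median absolute error). The function name error_metrics and the sample counts are hypothetical; this is not ToCount's evaluation code.

```python
import math
import statistics


def error_metrics(actual, predicted):
    """Return MAE, RMSE, and MedAE for paired true/estimated token counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error of each estimate against its true count.
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = statistics.fmean(abs_errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in abs_errors]))
    medae = statistics.median(abs_errors)
    return {"MAE": mae, "RMSE": rmse, "MedAE": medae}


# Hypothetical true vs. estimated token counts (illustrative numbers only).
actual = [10, 20, 30, 40]
predicted = [12, 18, 30, 50]
metrics = error_metrics(actual, predicted)
print(metrics)
```

A single large miss (here, 50 against 40) raises RMSE much more than MedAE, which may help explain why the RMSE values in the tables sit well above the corresponding MedAE values.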

Usage

>>> from tocount import estimate_text_tokens, TextEstimator
>>> estimate_text_tokens("How are you?", estimator=TextEstimator.RULE_BASED.UNIVERSAL)
4
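
For the token budgeting use case mentioned in the overview, an estimator can be plugged into a small helper that keeps messages until a budget is reached. The following is only a sketch: select_within_budget and whitespace_estimate are hypothetical names that are not part of ToCount, and the stand-in estimator simply counts whitespace-separated words so that the example runs without the library installed.

```python
def select_within_budget(texts, budget, estimate):
    """Keep texts in order while the running estimated token total fits the budget."""
    kept = []
    used = 0
    for text in texts:
        cost = estimate(text)
        if used + cost > budget:
            break  # stop at the first text that would exceed the budget
        kept.append(text)
        used += cost
    return kept, used


def whitespace_estimate(text):
    # Hypothetical stand-in estimator: counts whitespace-separated words.
    return len(text.split())


messages = [
    "How are you?",
    "Tell me a joke.",
    "Summarize this long article for me please.",
]
kept, used = select_within_budget(messages, budget=8, estimate=whitespace_estimate)
print(kept, used)
```

With ToCount installed, the stand-in could be replaced by a wrapper such as lambda t: estimate_text_tokens(t, estimator=TextEstimator.RULE_BASED.UNIVERSAL), using the call shown in the usage example above.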

Issues & bug reports

Just file an issue and describe the problem. We'll check it ASAP! You can also send an email to tocount@openscilab.com.

  • Please complete the issue template

You can also join our Discord server.

References

[1] Zheng, Lianmin, et al. "Lmsys-chat-1m: A large-scale real-world llm conversation dataset." International Conference on Learning Representations (ICLR) 2024 Spotlights.
[2] Zhao, Wenting, et al. "Wildchat: 1m chatgpt interaction logs in the wild." International Conference on Learning Representations (ICLR) 2024 Spotlights.

Show your support

Star this repo

Give a ⭐️ if this project helped you!

Donate to our project

If you like our project, and we hope that you do, please consider supporting us. Our project is not, and never will be, run for profit. We need the money just so we can continue doing what we do ;-)


Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

0.4 - 2025-12-17

Added

  • Logo

Changed

  • TIKTOKEN_CL100K.LINEAR_ALL model updated
  • TIKTOKEN_CL100K.LINEAR_ENGLISH model updated
  • TIKTOKEN_O200K.LINEAR_ALL model updated
  • TIKTOKEN_O200K.LINEAR_ENGLISH model updated
  • TIKTOKEN_R50K.LINEAR_ALL model updated
  • TIKTOKEN_R50K.LINEAR_ENGLISH model updated

0.3 - 2025-10-21

Added

  • TIKTOKEN_CL100K.LINEAR_ALL model
  • TIKTOKEN_CL100K.LINEAR_ENGLISH model
  • TIKTOKEN_O200K.LINEAR_ALL model
  • TIKTOKEN_O200K.LINEAR_ENGLISH model

Changed

  • README.md updated
  • Python 3.14 added to test.yml

0.2 - 2025-10-02

Added

  • TIKTOKEN_R50K.LINEAR_ALL model
  • TIKTOKEN_R50K.LINEAR_ENGLISH model

Changed

  • README.md updated

0.1 - 2025-08-30

Added

  • RULE_BASED.UNIVERSAL model
  • RULE_BASED.GPT_4 model
  • RULE_BASED.GPT_3_5 model

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tocount-0.4.tar.gz (12.4 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tocount-0.4-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file tocount-0.4.tar.gz.

File metadata

  • Download URL: tocount-0.4.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tocount-0.4.tar.gz
Algorithm Hash digest
SHA256 c2a52bb02d3ee9b734ae6ac49291a733d4c47267dd7727542ca7c956fda31abd
MD5 ac1e043fe949dd8aa7427003a002aec4
BLAKE2b-256 b55ce47ca5a6216b9f80b4b8c1d38b40131658f929bf85497de6bedd27b7028c

See more details on using hashes here.
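
One way to use the digests listed above is to recompute the SHA256 of a downloaded file and compare it with the published value. A minimal sketch using Python's standard hashlib module follows; the helper name sha256_matches is hypothetical, and the file is assumed to be in the current directory when the helper is called.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, sha256_matches("tocount-0.4.tar.gz", "c2a52bb02d3ee9b734ae6ac49291a733d4c47267dd7727542ca7c956fda31abd") should return True for an intact copy of the source distribution listed above.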

File details

Details for the file tocount-0.4-py3-none-any.whl.

File metadata

  • Download URL: tocount-0.4-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tocount-0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 57ea521773613045460d1bd8032c6d41e764d149bcd4b84495b192a1186fba0a
MD5 af10cbf5b177e83d8b53defe643af1f8
BLAKE2b-256 9146fa7a2450ada7d313e4d648ddd30bb2e64d6b2425619e0001527d73922fee

See more details on using hashes here.
