ToCount: Lightweight Token Estimator


Overview

ToCount is a lightweight and extensible Python library for estimating token counts from text inputs using both rule-based and machine learning methods. Designed for flexibility, speed, and accuracy, ToCount provides a unified interface for different estimation strategies, making it ideal for tasks like prompt analysis, token budgeting, and optimizing interactions with token-based systems.

Installation

PyPI

  • Run pip install tocount==0.3

Source code

Models

Rule-Based

Model Name              MAE       MSE           R²
RULE_BASED.UNIVERSAL    106.70    381,647.81    0.8175
RULE_BASED.GPT_4        152.34    571,795.89    0.7266
RULE_BASED.GPT_3_5      161.93    652,923.59    0.6878

Tiktoken R50K

Model Name                      MAE      MSE          R²
TIKTOKEN_R50K.LINEAR_ALL        71.38    183897.01    0.8941
TIKTOKEN_R50K.LINEAR_ENGLISH    23.35    14127.92     0.9887

Tiktoken CL100K

Model Name                        MAE      MSE         R²
TIKTOKEN_CL100K.LINEAR_ALL        41.85    47949.48    0.9545
TIKTOKEN_CL100K.LINEAR_ENGLISH    21.12    17597.20    0.9839

Tiktoken O200K

Model Name                       MAE      MSE         R²
TIKTOKEN_O200K.LINEAR_ALL        25.53    20195.32    0.9777
TIKTOKEN_O200K.LINEAR_ENGLISH    20.24    15887.99    0.9859

ℹ️ The training and testing datasets are taken from LMSYS-Chat-1M [1] and WildChat [2].
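The tables above report MAE and MSE alongside a third score that appears to be the coefficient of determination (R²). As a rough illustration of how such scores are typically computed from true and estimated token counts, here is a minimal sketch. The function name `regression_metrics` and the sample numbers are hypothetical and are not part of ToCount; the project's own evaluation code may differ.

```python
from typing import Sequence, Tuple


def regression_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MAE, MSE, R^2) for paired true and estimated token counts.

    Hypothetical helper for illustration; not part of the ToCount API.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - a for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the miss, in tokens.
    mae = sum(abs(e) for e in errors) / n
    # Mean squared error: penalizes large misses more heavily.
    mse = sum(e * e for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot, the share of variance the estimator explains.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")

    return mae, mse, r2


# Made-up counts, only to show the call shape.
mae, mse, r2 = regression_metrics([10, 20, 30], [12, 18, 33])
print(f"MAE={mae:.2f} MSE={mse:.2f} R2={r2:.4f}")
```

Read this way, a lower MAE or MSE and an R² closer to 1 indicate estimates that track the real tokenizer more closely.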

Usage

>>> from tocount import estimate_text_tokens, TextEstimator
>>> estimate_text_tokens("How are you?", estimator=TextEstimator.RULE_BASED.UNIVERSAL)
4
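Because the estimators carry a nonzero error (see the MAE column in the Models section), it can help to pad an estimate before comparing it against a token limit. The following is a minimal sketch of such a budget check. `fits_budget`, `default_estimate`, and the 10% margin are hypothetical names and values chosen for illustration. The fallback estimator, used only when tocount is not installed, is a crude stand-in and not ToCount's algorithm.

```python
from typing import Callable

try:
    from tocount import estimate_text_tokens, TextEstimator

    def default_estimate(text: str) -> int:
        # Same call as in the usage example above.
        return estimate_text_tokens(text, estimator=TextEstimator.RULE_BASED.UNIVERSAL)

except ImportError:

    def default_estimate(text: str) -> int:
        # Hypothetical stand-in (about 4 characters per token), NOT ToCount's method.
        return max(1, len(text) // 4)


def fits_budget(
    text: str,
    budget: int,
    estimate: Callable[[str], int] = default_estimate,
    margin: float = 0.1,
) -> bool:
    """Return True if the estimate, padded by a safety margin, fits the budget."""
    padded = estimate(text) * (1.0 + margin)
    return padded <= budget


if fits_budget("How are you?", budget=4096):
    print("Prompt is expected to fit within the budget.")
```

Passing the estimator as a callable keeps the check independent of any particular model, so another estimator can be swapped in without changing the budgeting logic.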

Issues & bug reports

Just file an issue and describe the problem; we'll check it as soon as possible. You can also send an email to tocount@openscilab.com.

  • Please complete the issue template

You can also join our Discord server.

References

[1] Zheng, Lianmin, et al. "LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset." International Conference on Learning Representations (ICLR) 2024 Spotlights.
[2] Zhao, Wenting, et al. "WildChat: 1M ChatGPT Interaction Logs in the Wild." International Conference on Learning Representations (ICLR) 2024 Spotlights.

Show your support

Star this repo

Give a ⭐️ if this project helped you!

Donate to our project

If you like our project, and we hope that you do, please consider supporting us. Our project is not, and never will be, run for profit. We need the money only so that we can continue doing what we do ;-)

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

0.3 - 2025-10-21

Added

  • TIKTOKEN_CL100K.LINEAR_ALL model
  • TIKTOKEN_CL100K.LINEAR_ENGLISH model
  • TIKTOKEN_O200K.LINEAR_ALL model
  • TIKTOKEN_O200K.LINEAR_ENGLISH model

Changed

  • README.md updated
  • Python 3.14 added to test.yml

0.2 - 2025-10-02

Added

  • TIKTOKEN_R50K.LINEAR_ALL model
  • TIKTOKEN_R50K.LINEAR_ENGLISH model

Changed

  • README.md updated

0.1 - 2025-08-30

Added

  • RULE_BASED.UNIVERSAL model
  • RULE_BASED.GPT_4 model
  • RULE_BASED.GPT_3_5 model

Download files

Source Distribution

tocount-0.3.tar.gz (12.1 kB)

Built Distribution

tocount-0.3-py3-none-any.whl (7.5 kB)

File details

Details for the file tocount-0.3.tar.gz.

File metadata

  • Download URL: tocount-0.3.tar.gz
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tocount-0.3.tar.gz

Algorithm      Hash digest
SHA256         fde7bf644f993c0af1df627bc0430c585d8bad0d2804336db287f3c212505f83
MD5            22e205eeafc00b55a831438d3474ddc4
BLAKE2b-256    cf580108fff088a6d9cbfe2a66207c7e23bd335854bebca5843da178e6aaa85d
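
A published digest is only useful if it is compared against the file that was actually downloaded. The following is a minimal sketch of that check using Python's standard hashlib module. The helper names `sha256_of_chunks`, `sha256_of_file`, and `digest_matches` are hypothetical, and the local path in the commented usage is an assumption about where the archive was saved.

```python
import hashlib
from typing import Iterable

# SHA256 digest listed above for tocount-0.3.tar.gz.
EXPECTED_SHA256 = "fde7bf644f993c0af1df627bc0430c585d8bad0d2804336db287f3c212505f83"


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA256 digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so it is never loaded into memory at once."""

    def read_chunks():
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    return sha256_of_chunks(read_chunks())


def digest_matches(actual_hex: str, expected_hex: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


# Usage, assuming the archive sits in the current directory:
# ok = digest_matches(sha256_of_file("tocount-0.3.tar.gz"), EXPECTED_SHA256)
# print("digest OK" if ok else "digest MISMATCH, do not install")
```

pip can perform the same comparison itself when a requirements file pins hashes and is installed with `--require-hashes`.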

File details

Details for the file tocount-0.3-py3-none-any.whl.

File metadata

  • Download URL: tocount-0.3-py3-none-any.whl
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tocount-0.3-py3-none-any.whl

Algorithm      Hash digest
SHA256         0360b2f021048d1c27ec14ed5c4d8ec5f8be23372de8cd0bdffe996e93de2aaa
MD5            df8694dffa1312eba3fc70690bbf418e
BLAKE2b-256    cbd8b2c2eb3fe9c11fe66fce49698a8d8ce452ad73bbe7e212c85deed4dc8690
