ToCount: Lightweight Token Estimator

Overview

ToCount is a lightweight and extensible Python library for estimating token counts from text inputs using both rule-based and machine learning methods. Designed for flexibility, speed, and accuracy, ToCount provides a unified interface for different estimation strategies, making it ideal for tasks like prompt analysis, token budgeting, and optimizing interactions with token-based systems.

Installation

PyPI

Check Python Packaging User Guide
Run pip install tocount==0.5

Source code

Download Version 0.5 or Latest Source
Run pip install .
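
To confirm that either installation route worked, the installed version can be read back from the package metadata. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name and is not part of ToCount.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("tocount")
    print(f"tocount version: {found}" if found else "tocount is not installed")
```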

Models

Rule-Based

Model Name	R²	MAE	RMSE	MedAE	D²
`RULE_BASED.UNIVERSAL`	0.8175	106.70	617.78	18	0.6377
`RULE_BASED.GPT_3_5`	0.7266	152.34	756.17	35	0.4828
`RULE_BASED.GPT_4`	0.6878	161.93	808.04	40	0.4502

Tiktoken R50K

Model Name	R²	MAE	RMSE	MedAE	D²
`TIKTOKEN_R50K.LINEAR_ALL`	0.7334	152.39	733.40	28.55	0.4826
`TIKTOKEN_R50K.LINEAR_ENGLISH`	0.8703	62.76	508.20	8.87	0.7287

Tiktoken CL100K

Model Name	R²	MAE	RMSE	MedAE	D²
`TIKTOKEN_CL100K.LINEAR_ALL`	0.9127	64.09	298.02	15.73	0.6804
`TIKTOKEN_CL100K.LINEAR_ENGLISH`	0.9711	27.43	185.07	6.34	0.8527

Tiktoken O200K

Model Name	R²	MAE	RMSE	MedAE	D²
`TIKTOKEN_O200K.LINEAR_ALL`	0.9563	38.23	197.16	9.70	0.7818
`TIKTOKEN_O200K.LINEAR_ENGLISH`	0.9730	26.00	177.54	5.96	0.8581

Deepseek R1

Model Name	R²	MAE	RMSE	MedAE	D²
`DEEPSEEK_R1.LINEAR_ALL`	0.9531	40.66	212.11	10.71	0.7741
`DEEPSEEK_R1.LINEAR_ENGLISH`	0.9696	28.44	192.36	6.36	0.8477

Qwen QwQ

Model Name	R²	MAE	RMSE	MedAE	D²
`QWEN_QWQ.LINEAR_ALL`	0.9342	45.50	257.97	12.17	0.7542
`QWEN_QWQ.LINEAR_ENGLISH`	0.9570	29.06	236.10	6.68	0.8457

Llama 3.1

Model Name	R²	MAE	RMSE	MedAE	D²
`LLAMA_3_1.LINEAR_ALL`	0.9538	44.37	207.58	11.70	0.7578
`LLAMA_3_1.LINEAR_ENGLISH`	0.9731	26.59	177.94	6.24	0.8564

ℹ️ The training and testing datasets are drawn from LMSYS-Chat-1M [1] and WildChat [2].
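
For readers unfamiliar with the column headers in the tables above, the following sketch shows how such metrics are conventionally computed from paired actual and estimated token counts. It is not the project's evaluation code. In particular, the source does not say which D² variant is reported; the sketch assumes the absolute-error form (one minus the ratio of total absolute error to total absolute deviation from the median). The function name and the sample numbers are made up for illustration.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Compute R2, MAE, RMSE, MedAE and D2 for paired actual/estimated counts."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    y_mean, y_median = mean(y_true), median(y_true)
    # Baselines: squared deviation from the mean, absolute deviation from the median.
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    abs_dev = sum(abs(t - y_median) for t in y_true)
    return {
        "R2": 1 - sum(sq_err) / ss_tot,
        "MAE": mean(abs_err),
        "RMSE": math.sqrt(mean(sq_err)),
        "MedAE": median(abs_err),
        # Assumption: D2 is the absolute-error variant; the source does not specify.
        "D2": 1 - sum(abs_err) / abs_dev,
    }


# Made-up token counts for illustration only; not taken from the benchmark datasets.
actual = [10, 20, 30, 40]
estimated = [12, 18, 33, 37]
metrics = regression_metrics(actual, estimated)
```

Higher R² and D² and lower MAE, RMSE, and MedAE indicate a better estimator. A large gap between MAE and MedAE, as in the tables above, suggests that a small number of inputs account for much of the total error.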

Usage

>>> from tocount import estimate_text_tokens, TextEstimator
>>> estimate_text_tokens("How are you?", estimator=TextEstimator.RULE_BASED.UNIVERSAL)
4
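
As one way to apply this to token budgeting, the sketch below checks whether a prompt's estimated size fits a budget once a safety margin is added, since the estimates are approximate. `fits_budget`, the 10% default margin, and the fallback estimator used when ToCount is not installed are all assumptions for illustration and are not part of the library; only `estimate_text_tokens` and `TextEstimator.RULE_BASED.UNIVERSAL` come from the example above.

```python
from typing import Callable

try:
    from tocount import estimate_text_tokens, TextEstimator

    def estimate(text: str) -> int:
        # Same call as in the usage example above.
        return estimate_text_tokens(text, estimator=TextEstimator.RULE_BASED.UNIVERSAL)
except ImportError:
    def estimate(text: str) -> int:
        # Hypothetical stand-in so the sketch runs without ToCount installed.
        return max(1, len(text) // 4)


def fits_budget(
    prompt: str,
    budget: int,
    estimate_fn: Callable[[str], int] = estimate,
    safety_margin: float = 0.1,
) -> bool:
    """Return True if the estimate, inflated by the safety margin, fits the budget."""
    padded = estimate_fn(prompt) * (1 + safety_margin)
    return padded <= budget


if __name__ == "__main__":
    print(fits_budget("How are you?", budget=4096))
```

The margin should be chosen with the error figures in the Models section in mind: an estimator with a larger MAE calls for a more generous margin.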

Issues & bug reports

To report a bug, open an issue and describe the problem; we'll check it ASAP. Alternatively, send an email to tocount@openscilab.com.

Please complete the issue template.

You can also join our Discord server.

References

[1] Zheng, Lianmin, et al. "LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset." International Conference on Learning Representations (ICLR), 2024, Spotlight.

[2] Zhao, Wenting, et al. "WildChat: 1M ChatGPT Interaction Logs in the Wild." International Conference on Learning Representations (ICLR), 2024, Spotlight.

Show your support

Star this repo

Give a ⭐️ if this project helped you!

Donate to our project

If you like our project, and we hope that you do, please consider supporting us. Our project is not, and never will be, run for profit. We need the money only so that we can continue doing what we do.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

0.5 - 2026-01-02

Added

DEEPSEEK_R1.LINEAR_ALL model
DEEPSEEK_R1.LINEAR_ENGLISH model
QWEN_QWQ.LINEAR_ALL model
QWEN_QWQ.LINEAR_ENGLISH model
LLAMA_3_1.LINEAR_ALL model
LLAMA_3_1.LINEAR_ENGLISH model

Changed

README.md updated

0.4 - 2025-12-17

Added

Logo

Changed

TIKTOKEN_CL100K.LINEAR_ALL model updated
TIKTOKEN_CL100K.LINEAR_ENGLISH model updated
TIKTOKEN_O200K.LINEAR_ALL model updated
TIKTOKEN_O200K.LINEAR_ENGLISH model updated
TIKTOKEN_R50K.LINEAR_ALL model updated
TIKTOKEN_R50K.LINEAR_ENGLISH model updated

0.3 - 2025-10-21

Added

TIKTOKEN_CL100K.LINEAR_ALL model
TIKTOKEN_CL100K.LINEAR_ENGLISH model
TIKTOKEN_O200K.LINEAR_ALL model
TIKTOKEN_O200K.LINEAR_ENGLISH model

Changed

README.md updated
Python 3.14 added to test.yml

0.2 - 2025-10-02

Added

TIKTOKEN_R50K.LINEAR_ALL model
TIKTOKEN_R50K.LINEAR_ENGLISH model

Changed

README.md updated

0.1 - 2025-08-30

Added

RULE_BASED.UNIVERSAL model
RULE_BASED.GPT_4 model
RULE_BASED.GPT_3_5 model

Release history

Version	Release date
0.5	Jan 2, 2026
0.4	Dec 16, 2025
0.3	Oct 21, 2025
0.2	Oct 2, 2025
0.1	Aug 30, 2025
0.0.0	Mar 4, 2025

Download files

Source Distribution

tocount-0.5.tar.gz (15.2 kB), uploaded Jan 2, 2026, Source

Built Distribution

tocount-0.5-py3-none-any.whl (8.5 kB), uploaded Jan 2, 2026, Python 3

File details

Details for the file tocount-0.5.tar.gz.

File metadata

Download URL: tocount-0.5.tar.gz
Upload date: Jan 2, 2026
Size: 15.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tocount-0.5.tar.gz
Algorithm	Hash digest
SHA256	`7b2eda67538576c665dc76001d35ccab99414897b4850f29f469d0196952d299`
MD5	`6122d741ba6afceced8fe459b23836eb`
BLAKE2b-256	`863c43c8bc8fe6ca455b39b1d60518661c49c7a14260661bcad9a4f4ba9311d4`
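
A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the standard library; `sha256_matches` is a hypothetical helper name, and the file is assumed to have been saved in the current directory under its original name.

```python
import hashlib
import os

# SHA256 digest published above for tocount-0.5.tar.gz.
EXPECTED_SHA256 = "7b2eda67538576c665dc76001d35ccab99414897b4850f29f469d0196952d299"


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Hash the file in chunks and compare the result with the expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = "tocount-0.5.tar.gz"  # assumed download location
    if os.path.exists(archive):
        print("digest OK" if sha256_matches(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```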

File details

Details for the file tocount-0.5-py3-none-any.whl.

File metadata

Download URL: tocount-0.5-py3-none-any.whl
Upload date: Jan 2, 2026
Size: 8.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tocount-0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a5a6ed223e757a4570a99aec393628a9188052eaf8ee086cf2579ac2a7f5c035`
MD5	`df43d9029def0f2267010ff85927be67`
BLAKE2b-256	`961d1ffdabc806d36f108fd618561bdd0f015556a91b1315b3a09beac34ad52c`
